Sampling Biases in Network Path Measurements and What to Do About It, Srikanth Kandula, Ratul Mahajan, in Proc. ACM IMC'09, November 2009, Chicago, IL.
The paper considers the difference in the results obtained by sampled network path measurements with the actual performance. It shows that source sampling is a source of bias, and that picking only a few sources to gather data, and then extrapolating from this data leads to poor results.
The paper then presents three methods to improve the interpolation from the data. One is to consider only the tail of the path, to remove the source bias. This proves relatively ineffective compared to the other two methods. The next method is to embed the coordinates in a low dimensional space and to compute the missing paths in the coordinate space. This provides the best improvement for latency. Finally, the last method consist in clustering the nodes into an organization which somewhat mimics the typical network architecture of an access network near the source, a core network in the middle, and then another access network at the destination. This proves the best method for capacity.
The paper is interesting, mostly empirical, looking at the methods inspired from removing sampling bias in other context and seeing if they work in this context.
The embedding method is interesting to me, owing to our work on embedding graphs for routing. They use vivaldi, for which code exists, but has some limitations.