Thursday, January 28, 2010

Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions

Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions, Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec, Proc. of ACM KDD'08, August 2008, Las Vegas, NV.

This paper analyzes mobile phone calls data from the Sprint network and tries to create a model of the underlying graphs associated with different metrics: Partners (number of callers and callees associated with each user); calls (total number of calls made or received by a given user); duration (total duration of calls for each user). For each of these, the distribution has a power law tail, but a power law distribution does not fit well the rest of the distribution. The lognormal distribution fits better, except for the tail where it is off. So the paper considers the Double Pareto Log Normal (DPLN) distribution and explains how to find the best fit for that distribution.

This distribution would be useful for creating synthetic models to evaluate different social network parameters.

Understanding the Spreading Patterns of Mobile Phone Viruses

Understanding the Spreading Patterns of Mobile Phone Viruses, Pu Wang, Marta Gonzalez, Cesar Hidalgo, Albert-Laszlo Barabasi, in Science Express, April 2009, DOI: 10.1126/science.1167053

This paper studies data from cell phone usage and mobility and extrapolate a mobility model and a connectivity graph from proximity and from call connections. In the former graph, two points are connected if they are within a given distance of each other; in the latter, they connect if a phone call is made between two points (users).

From the graph, they can derive some basic properties. Also, if an Operating System is assigned to nodes on a graph, then the propagation of viruses targeting that OS can be computed. There is a critical threshold for the market share of a given OS (that is the fraction of cell phones using this OS in the graph) so that the virus becomes epidemic.

The topology of the graph is interesting (similar to that of Mobile Call Graphs by Seshadri et al) and the characterization could be useful in social network studies. The mathematical tools to compute the size of the cluster is also of interest. It's a cute paper. I would be cautious with the results for bluetooth virus propagation, which is extrapolated from the graph data, but is not really validated. I guess the extrapolation cannot be that off due to the multiple data points in between the extrapolated trajectories.

Tuesday, January 12, 2010

Fast, Memory Efficient Flow Rate Estimation Using Runs

Fast, Memory Efficient Flow Rate Estimation Using Runs, Fang Hao, Murali Kodialam, T.V. Lakshman, Shantidev Mohanty, in IEEE/ACM Transactionson Networking, Vol. 15, No. 6, December 2007.

The paper suggests to estimate the rate of flows by basically looking indirectly at the flow rate. The quantity is the number of 2-runs, that is the number of time that consecutive arrivals belong to the same flow. The occurrence of a 2-run will depend on the rate of the flow, and from observing the flows during a long enough time, one can infer the fraction of traffic belong to a flow by counting the two-runs.

This is similar in concept to the Flageolet & Martin paper, which observes some metric which is derived from the underlying process. F&M toss a geometric random variable and keep the max every time a flow works through the system; while this paper considers another derivative function of the rate. I believe I have seen a version of F&M used for the purpose of this paper, which seems natural.

The idea is cute, and leads to some straightforward analysis of the accuracy of the method. I was curious about how the assumptions of the model (independent Poisson processes for each flow, stationarity, etc.) would translate in a real link, but for a link with enough traffic, the method seems to work quite well.

Sampling Biases in Network Path Measurements and What to Do About It

Sampling Biases in Network Path Measurements and What to Do About It, Srikanth Kandula, Ratul Mahajan, in Proc. ACM IMC'09, November 2009, Chicago, IL.

The paper considers the difference in the results obtained by sampled network path measurements with the actual performance. It shows that source sampling is a source of bias, and that picking only a few sources to gather data, and then extrapolating from this data leads to poor results.

The paper then presents three methods to improve the interpolation from the data. One is to consider only the tail of the path, to remove the source bias. This proves relatively ineffective compared to the other two methods. The next method is to embed the coordinates in a low dimensional space and to compute the missing paths in the coordinate space. This provides the best improvement for latency. Finally, the last method consist in clustering the nodes into an organization which somewhat mimics the typical network architecture of an access network near the source, a core network in the middle, and then another access network at the destination. This proves the best method for capacity.

The paper is interesting, mostly empirical, looking at the methods inspired from removing sampling bias in other context and seeing if they work in this context.

The embedding method is interesting to me, owing to our work on embedding graphs for routing. They use vivaldi, for which code exists, but has some limitations.