Wednesday, July 21, 2010

A Rant on Reviews

We submitted a paper to a conference (ICNP) which is quite selective (30 papers accepted out of 170 submissions). Grade-wise, the reviews of our paper are decent (three 'accept if room' and one 'accept'). The conference being very competitive, our paper did not get accepted, but I admit to being irritated enough by the strange reasoning in some of the reviews to pen this post.

One reviewer states: "The evaluation considers only two network graph types... In both cases, they only consider relatively small networks, up to 10^5 nodes. Hence, while the results obtained are indicative, they don't really support the claims of 'extensive simulations' and 'close to optimal on a wide variety of topologies'."

This floored me: 10^5 nodes for a routing simulation is actually A LOT. If you want to evaluate your routing on PlanetLab, you get 800 nodes. If you use NS, you can't go over 1,000 nodes. The other papers we looked at get to 1,000 nodes, maybe 10,000 at the most. We had to implement our own emulator from scratch to get to 300,000 nodes. And it's hardware limitations that stopped us at that scale: with 4 GB of RAM, the matrix describing the network connectivity, if not optimized, is already 10^10 bits for a 10^5-node network.
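To make the hardware point concrete, here is the back-of-the-envelope arithmetic (a sketch; only the 4 GB and 10^5 figures come from the discussion above, everything else is illustrative):

```python
# Back-of-the-envelope: memory for a naive n x n adjacency matrix,
# packed at 1 bit per entry.
def adjacency_matrix_gib(n, bits_per_entry=1):
    return n * n * bits_per_entry / 8 / 2**30

for n in (10**4, 10**5, 3 * 10**5):
    print(f"n = {n:>7,d} nodes -> {adjacency_matrix_gib(n):6.2f} GiB")

# n = 10^5 already needs ~1.16 GiB for the matrix alone, and
# n = 3*10^5 needs ~10.5 GiB -- far past a 4 GB machine, which is
# why the matrix has to be optimized (e.g., stored as adjacency lists).
```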

The reviewer keeps going: "instead of using home-brewn [sic] scale free graphs, it would be better to use graph generators that are known to produce topologically similar graphs to the Internet; consider e.g. BRITE or INET." This refers to the fact that we use, to quote the reviewer again, "Barabasi-Albert scale free networks, which are known to be somewhat different from the real graphs met in the internet."

So this reviewer wants us to use LARGE graphs, generated from BRITE, but NOT Barabasi-Albert. Looking at the BRITE documentation, however, one can see that BRITE defines LARGE as 10^5 (see Item 4 on the wish list (link to pdf)); and that BRITE uses...

[screenshot of the BRITE documentation]

... Barabasi-Albert to create Internet-like topologies.
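For what it's worth, generating the Barabasi-Albert graphs at issue is routine; a minimal sketch with networkx (the parameter values are illustrative, not the ones from our paper):

```python
import networkx as nx

# Barabasi-Albert preferential attachment: each new node attaches to
# m existing nodes, yielding the scale-free degree distribution that
# BRITE also uses for its Internet-like topologies.
n, m = 10**5, 2  # illustrative sizes
G = nx.barabasi_albert_graph(n, m, seed=42)
print(G.number_of_nodes(), G.number_of_edges())
```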

Another reviewer mentions: The main problem with the paper is that there is no comparison with recent research in the area of designing routing protocols with small stretch and small per-node state. Specifically, there are at least two lines of recent research that are worth comparing against:
- Practical implementation of Compact Routing
[the reviewer mentions S4]
- Routing based on distributed Delaunay Triangulation (which guarantees that greedy routing always works)
[and the reviewer cites two more papers on Delaunay triangulation].

But we can't compare with compact routing because, as our paper states: "The idea of trading-off path stretch for routing table size is the core component of the work on compact routing. Thorup et al. show that it is possible to guarantee a path stretch of three with a route table size growing as O(\sqrt{n \log(n)})." This is much higher than our polylogarithmic scheme. S4 selects K beacons which have to maintain paths between them, and each node has an associated cluster, where the cluster size is about \sqrt{n} when the number of beacons is \sqrt{n}. Our scheme keeps state on the order of \log(n); how can we compare?
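To put numbers on "much higher", here is a quick check at the scales we simulate (a sketch; the log base and constants are illustrative, since both schemes hide constant factors):

```python
import math

# Per-node state: compact routing ~ sqrt(n log n), versus polylog(n).
for n in (10**4, 10**5, 3 * 10**5):
    compact = math.sqrt(n * math.log(n))
    polylog = math.log2(n)
    print(f"n = {n:>7,d}: sqrt(n log n) ~ {compact:6.0f}  vs  log2(n) ~ {polylog:.1f}")
```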

We can't compare with routing on Delaunay triangulations (DT) because we consider graphs, not Euclidean spaces. Plus, DT requires planar connectivity, which we don't assume. We start from a graph G=(V,E) and try to find a representation in a Euclidean space. To get a DT, you need to be in the Euclidean space first!

So the "main problem" with our paper is that we don't consider work that, while excellent, is not relevant to our results (granted, the reviewer gave us 'accept if room', so the main problem was not a deal breaker for him either). Still, this drives home the point about hypercriticality.

Wednesday, June 16, 2010

Designing and Embedding Reliable Virtual Infrastructures

First time we put something on ArXiv:

Designing and Embedding Reliable Virtual Infrastructures, Wai-Leong Yeow, Cédric Westphal, Ulaş C. Kozat, http://arxiv.org/abs/1005.5367, Submitted May 28th.

This is a technical report version of a paper that was accepted at the ACM SIGCOMM VISA workshop, which will take place in September 2010 in New Delhi, India. I am co-chair of the VISA workshop*, which is on the topic of Infrastructure Virtualization. This is a very exciting, very current area of research, so hopefully you can attend the workshop!

* I hasten to add that any paper that was co-authored by a VISA chair has been handled out-of-band by the other chairs.

Tuesday, May 18, 2010

Maranello: Practical Partial Packet Recovery for 802.11

Maranello: Practical Partial Packet Recovery for 802.11, Bo Han and Aaron Schulman, Francesco Gringoli, Neil Spring and Bobby Bhattacharjee, Lorenzo Nava, Lusheng Ji, Seungjoon Lee, and Robert Miller, in Proc. NSDI'10, San Jose, CA, April 2010.

This is another paper that builds a system that works. There is no super deep theory behind it, but a lot of effort to implement something in a real environment. It is basically a partial packet recovery mechanism which is compatible with 802.11 and works as follows:
- the sender sends a packet to the receiver. Normally, if the packet is received correctly, an Ack is sent; if not, no Ack is sent, and after a time-out the sender attempts a retransmission. In Maranello, if the packet is not received correctly, the receiver still transmits a response, except it is a Nack carrying checksums of the blocks within the packet. Based on these checksums, the sender can retransmit only the corrupted blocks rather than the whole packet (a toy sketch of this idea follows below).
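Here is a toy illustration of the block-checksum idea (a minimal sketch, not Maranello's actual implementation; the block size, CRC choice, and framing are all made up for illustration):

```python
import zlib

BLOCK = 64  # bytes per block (illustrative)

def block_checksums(packet: bytes):
    """Per-block checksums that the receiver returns in its Nack."""
    return [zlib.crc32(packet[i:i + BLOCK])
            for i in range(0, len(packet), BLOCK)]

def blocks_to_retransmit(sent: bytes, nack_checksums):
    """Sender side: diff the Nack's checksums against its own copy and
    retransmit only the blocks that got corrupted in flight."""
    own = block_checksums(sent)
    return [i for i, c in enumerate(own)
            if i >= len(nack_checksums) or nack_checksums[i] != c]

# Example: one byte of a 4-block packet is corrupted in transit.
sent = bytes(range(256))
received = bytearray(sent)
received[100] ^= 0xFF  # bit errors land in block 1
print(blocks_to_retransmit(sent, block_checksums(bytes(received))))  # [1]
```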

This speeds up the retransmission and reduces the congestion on the air interface. As a consequence, throughput is roughly 50% higher than 802.11 and delay is significantly reduced.

The system is evaluated alongside traditional 802.11, which shows that both systems can coexist peacefully.

This was pointed out to me as an excellent paper by a Stanford faculty member, and indeed it did not disappoint. I have no idea where the name Maranello comes from. I mean, I know of the city in Italy, but not how it relates to partial packet recovery.

Reverse traceroute

Reverse traceroute, Ethan Katz-Bassett, Harsha V. Madhyastha, Vijay Kumar Adhikari, Colin Scott, Justine Sherry, Peter van Wesep, Thomas Anderson, and Arvind Krishnamurthy, in Proc. NSDI'10, San Jose, CA, April 2010.

I did not attend the conference, as it conflicted with other meetings, but I have been going through the proceedings. I'm starting with this paper since it got the best paper award, and I might post about a few more.

It's a cute and useful paper. They build a tool to compute the reverse route from the destination back to the source (as opposed to traceroute, which identifies the route from the source to the destination). It's full of good little tricks, it's nicely put together, and it's practical.

The main idea is to combine two IP options with source IP spoofing. The IP options are record route (IP-RR), which logs up to 9 intermediate relays in the probe packet, and the timestamp option, which is used to confirm whether a specific router is on the path of the probe.

The goal is to find the path D->S, where S is the source and D the destination. Traceroute gives S->D, and they want the reverse. From a set of vantage points, they pick one, say V, that is fewer than 8 hops away from the destination. They spoof a packet from V as being from S, with the IP-RR option set, and send it to D. D returns it to S (due to the spoofed source), and the reply logs the first hop(s) on the D->S path, because the choice of V guarantees there is room left in the header for at least one more address.
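For flavor, here is roughly what such a spoofed record-route probe could look like with scapy (a sketch only: the addresses are placeholders, sending spoofed packets needs raw-socket privileges, and many networks filter them):

```python
from scapy.all import IP, ICMP, IPOption_RR, send

S = "198.51.100.10"  # the source whose reverse path we want (placeholder)
D = "203.0.113.20"   # the destination (placeholder)

# Run from a vantage point V within 8 hops of D: spoofing S as the
# source makes D reply toward S, with record-route slots still free
# to log the first hop(s) of the D -> S path.
probe = IP(src=S, dst=D, options=[IPOption_RR()]) / ICMP()
send(probe)
```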

If the routers don't support IP-RR, or if there is no V fewer than 8 hops from D, then they use the timestamp option to validate whether each router in a set of guesses is on the path or not.

This is not perfect, but neither is traceroute, and the results offered by the mechanism are much more useful and accurate than assuming the path is symmetric.

Wednesday, April 21, 2010

SmartRE: An Architecture for Coordinated Network-wide Redundancy Elimination

SmartRE: An Architecture for Coordinated Network-wide Redundancy Elimination, Ashok Anand, Vyas Sekar, Aditya Akella, in Proc. of ACM SIGCOMM'09, August 2009, Barcelona, Spain.

Traffic in networks exhibits some redundancy, and this paper presents an architecture which attempts to encode packets in order to remove these redundancies. It takes into account some specific constraints (encoding and decoding rates) and determines where to cache packets and how to ensure that a packet encoded against the information at one router will find a decoder further down the path.
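The underlying redundancy elimination primitive is easy to sketch (a toy version; SmartRE itself uses Rabin fingerprints over sliding windows plus coordinated cache assignments across routers, none of which appears here):

```python
import hashlib

cache = {}  # fingerprint -> chunk, mirrored at the encoder and decoder

def encode(payload: bytes, chunk=32):
    """Replace already-seen chunks with short references."""
    out = []
    for i in range(0, len(payload), chunk):
        c = payload[i:i + chunk]
        fp = hashlib.sha1(c).digest()[:8]
        if fp in cache:
            out.append(("ref", fp))  # 8-byte reference instead of the chunk
        else:
            cache[fp] = c
            out.append(("raw", c))   # first sighting: send the bytes
    return out

def decode(encoded):
    return b"".join(cache[x] if kind == "ref" else x for kind, x in encoded)

pkt = b"HELLO WORLD! " * 8
first = encode(pkt)    # populates the cache: all "raw" entries
second = encode(pkt)   # repeated payload: all "ref" entries
assert decode(second) == pkt
```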

The proposed approach approximates an omniscient encoder relatively well and removes most of the redundancy. The paper also demonstrates that network-wide redundancy removal is preferable to hop-by-hop removal.

While the approach is promising, it assumes that there is redundancy in the traffic. There are some WAN optimization products and thus a market, but some of the evaluation assumes only 10% redundancy. Setting up a complex caching and encoding/decoding infrastructure for such a gain does not seem worth it. I would like to see the true redundancy gain better characterized before dwelling on how to remove these redundancies.

I attended Sigcomm in BCN, and I don't recall seeing this talk, even though I most likely did: it was in a morning session, and I don't think I missed any morning session. Funny how remembering the talk would have saved me from re-discovering these results.

Thursday, January 28, 2010

Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions

Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions, Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, Jure Leskovec, Proc. of ACM KDD'08, August 2008, Las Vegas, NV.

This paper analyzes mobile phone call data from the Sprint network and tries to model the graphs underlying different metrics: partners (number of callers and callees associated with each user); calls (total number of calls made or received by a given user); and duration (total duration of calls for each user). For each of these, the distribution has a power-law tail, but a power-law distribution does not fit the body of the distribution well. The lognormal distribution fits the body better, but is off in the tail. So the paper considers the Double Pareto LogNormal (DPLN) distribution and explains how to find the best fit for that distribution.
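The body-versus-tail tension is easy to reproduce (a sketch with scipy on synthetic data; nothing here is the Sprint dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in: a lognormal body with a heavier Pareto tail mixed in.
data = np.concatenate([rng.lognormal(3.0, 1.0, 9000),
                       (rng.pareto(1.5, 1000) + 1) * 50])

shape, loc, scale = stats.lognorm.fit(data, floc=0)

# The fitted lognormal tracks the body, but its tail decays too fast:
q = np.quantile(data, 0.999)
print("empirical P(X > q):", (data > q).mean())
print("lognormal P(X > q):", stats.lognorm.sf(q, shape, loc=loc, scale=scale))
```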

This distribution would be useful for creating synthetic models to evaluate different social network parameters.

Understanding the Spreading Patterns of Mobile Phone Viruses

Understanding the Spreading Patterns of Mobile Phone Viruses, Pu Wang, Marta Gonzalez, Cesar Hidalgo, Albert-Laszlo Barabasi, in Science Express, April 2009, DOI: 10.1126/science.1167053

This paper studies data from cell phone usage and mobility and extrapolates a mobility model and connectivity graphs, one built from proximity and one from call connections. In the former graph, two points are connected if they are within a given distance of each other; in the latter, they are connected if a phone call is made between the two users.

From the graph, they can derive some basic properties. Also, if an operating system is assigned to the nodes of the graph, then the propagation of viruses targeting that OS can be computed. There is a critical threshold for the market share of a given OS (that is, the fraction of cell phones in the graph running this OS) above which a virus becomes epidemic.
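Since a virus written for one OS can only hop between phones running that OS, the epidemic question is essentially a percolation one: does the subgraph of same-OS phones have a giant connected component? A minimal sketch of that computation (networkx; the graph model and market shares are illustrative, not the paper's call graph):

```python
import random
import networkx as nx

random.seed(1)
G = nx.barabasi_albert_graph(10_000, 2)  # stand-in for the call graph

def giant_fraction(G, share):
    """Keep each phone with probability `share` (the OS market share);
    return the largest same-OS component as a fraction of all phones."""
    keep = [v for v in G if random.random() < share]
    H = G.subgraph(keep)
    if H.number_of_nodes() == 0:
        return 0.0
    return max(len(c) for c in nx.connected_components(H)) / len(G)

for share in (0.01, 0.05, 0.10, 0.25, 0.50):
    print(f"market share {share:.2f} -> giant component {giant_fraction(G, share):.3f}")
```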

The topology of the graph is interesting (similar to that of Mobile Call Graphs by Seshadri et al.) and the characterization could be useful in social network studies. The mathematical tools to compute the cluster sizes are also of interest. It's a cute paper. I would be cautious with the results on Bluetooth virus propagation, which are extrapolated from the graph data but not really validated. I guess the extrapolation cannot be that far off, given the multiple data points in between the extrapolated trajectories.

Tuesday, January 12, 2010

Fast, Memory Efficient Flow Rate Estimation Using Runs

Fast, Memory Efficient Flow Rate Estimation Using Runs, Fang Hao, Murali Kodialam, T.V. Lakshman, Shantidev Mohanty, in IEEE/ACM Transactions on Networking, Vol. 15, No. 6, December 2007.

The paper suggests estimating the rate of flows by looking at the flow rate indirectly. The observed quantity is the number of 2-runs, that is, the number of times that two consecutive arrivals belong to the same flow. The probability of a 2-run depends on the rate of the flow, so by observing the flows for long enough, one can infer the fraction of traffic belonging to each flow by counting the 2-runs.
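The estimator fits in a few lines: if flow i carries a fraction p_i of the traffic and arrivals are well mixed, two consecutive arrivals both belong to i with probability p_i^2, so p_i is recovered as the square root of the observed 2-run frequency (a toy sketch under exactly those mixing assumptions):

```python
import random
from collections import Counter
from math import sqrt

random.seed(0)
rates = {"A": 0.5, "B": 0.3, "C": 0.2}
stream = random.choices(list(rates), weights=list(rates.values()), k=200_000)

# A 2-run: two consecutive arrivals from the same flow.
runs = Counter(a for a, b in zip(stream, stream[1:]) if a == b)
pairs = len(stream) - 1

for flow, p in rates.items():
    est = sqrt(runs[flow] / pairs)  # P(2-run of flow) = p^2
    print(f"flow {flow}: true {p:.2f}, estimated {est:.3f}")
```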

This is similar in concept to the Flajolet & Martin paper, which also observes a metric derived from the underlying process. F&M draw a geometric random variable and keep the maximum every time an item of a flow passes through the system, while this paper considers a different function derived from the rate. I believe I have seen a version of F&M used for the purpose of this paper, which seems natural.

The idea is cute, and it leads to some straightforward analysis of the accuracy of the method. I was curious how the assumptions of the model (independent Poisson processes for each flow, stationarity, etc.) would translate to a real link, but for a link with enough traffic, the method seems to work quite well.

Sampling Biases in Network Path Measurements and What to Do About It

Sampling Biases in Network Path Measurements and What to Do About It, Srikanth Kandula, Ratul Mahajan, in Proc. ACM IMC'09, November 2009, Chicago, IL.

The paper considers the difference between the results obtained from sampled network path measurements and the actual performance. It shows that source sampling introduces bias: picking only a few sources to gather data and then extrapolating from this data leads to poor results.

The paper then presents three methods to improve the interpolation from the data. The first is to consider only the tail of the path, to remove the source bias; this proves relatively ineffective compared to the other two. The second is to embed the nodes in a low-dimensional coordinate space and to compute the missing paths in that space; this provides the best improvement for latency. The third is to cluster the nodes into an organization which somewhat mimics the typical network architecture (an access network near the source, a core network in the middle, and another access network at the destination); this proves the best method for capacity.

The paper is interesting and mostly empirical: it takes methods inspired by sampling-bias removal in other contexts and checks whether they work in this one.

The embedding method is interesting to me, owing to our work on embedding graphs for routing. They use Vivaldi, for which code exists, though it has some limitations.
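The heart of Vivaldi is a spring-relaxation update; a minimal sketch of the rule (the constants and the two-node setup are illustrative, and real deployments add height vectors and adaptive timesteps):

```python
import numpy as np

def vivaldi_update(xi, xj, rtt, delta=0.25):
    """Move node i's coordinate so its distance to j better matches the RTT."""
    d = np.linalg.norm(xi - xj)
    err = rtt - d  # positive: coordinates too close, push apart
    direction = (xi - xj) / d if d > 0 else np.random.randn(len(xi))
    return xi + delta * err * direction

# Two nodes converging toward a measured 100 ms RTT:
xi, xj = np.zeros(2), np.array([10.0, 0.0])
for _ in range(50):
    xi = vivaldi_update(xi, xj, rtt=100.0)
print(np.linalg.norm(xi - xj))  # close to 100
```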