Thursday, October 22, 2009

IEEE NOMS rebuttal.

I had no clue that IEEE NOMS would have a rebuttal phase. We only submitted to the conference mostly because it was in Japan (if accepted, we'll combine with a trip to the mothership) and the deadline coincided with our post-doc taking a paternity leave, so was not aware of the process of the conference.

I found the rebuttal phase from the reviewer point of view quite useless during the Globecom PC. From the author point of view, the sample is much smaller; it is quite strange that the reviews are provided without the grade, so I'm unable to say if the reviewers liked the paper or not. Obviously they are in between, they don't think it's the best or worst thing ever, but it's hard to decipher if it's leaning one way or the other. I guess it's mostly due to the format of the question, which emphasizes both positive and negative: strength, weakness and detailed (editorial) comments.

I'm not sure how to reply to the reviews, though, not knowing if the job is to not screw it up, or if it's pointless to try to change someone's mind (can a reviewer of an unconvincing paper be turned around in 3,000 characters?)

An Analysis of Packet Sampling in the Frequency Domain

An Analysis of Packet Sampling in the Frequency Domain, Luigi Alfredo Grieco, Chadi Barakat, in Proc. of ACM SIGCOM IMC'09, November 2009, Chicago.

I have been looking with great interest at the technical program for IMC'09 (see here and a few more papers to come). I have to say the stuff from other communities look quite exciting. Grass greener and all.

Anyway, this paper is a very cute (and simple) thing: it considers the Fourier transform of the packet size from a source-destination pair, and assesses a rule for the sampling rate in order to avoid aliasing effects when reconstructing the signal from the Fourier transform.

This is somewhat preliminary and applications to compressing the signal or to use it for anomaly detection are not presented (but hinted at). I would have thought the community of traffic measurement would have done this kind of transforms on the signal a long time ago.

On the Interplay Between Routing and Signal Representation for Compressive Sensing in Wireless Sensor Networks

On the Interplay Between Routing and Signal Representation for Compressive Sensing in Wireless Sensor Networks, Giorgio Quer, Riccardo Masiero, Danilele Munaretto, Michele Rossi, Joerg Widmer, Michele Zorzi.

This paper addresses the same problem as the paper right below. Actually, it's interesting that both papers cite versions each other, which is quite rare.

This paper is a bit more optimistic in the use of compressed sensing in conjunction with routing and data aggregation in sensor networks, but still raises some issues into which random projection would be best for real data (the best results are obtained for synthetic data). It seems there is a trial-and-error phase to see what compressed sensing bring to this problem, with no definite answer thus far.

Disclosure: one of the authors here, Joerg, is my colleague in Munich.

Spatially-Localized Compressed Sensing and Routing in Multi-Hop Sensor Networks

Spatially-Localized Compressed Sensing and Routing in Multi-Hop Sensor Networks, Sungwon Lee, Sundeep Pattem, Maheswaran Sathiamoorthy, Bhaskar Krishnamachari, Antonio Ortega in Proc. of the 3rd International Conference on GeoSensor Networks, 2009.

This paper considers the use of compressed sensing with data aggregation in sensor networks. The paper proposes a clustering scheme to compressed the data along the cluster, and reconstruct the data later (either independently within a cluster, or jointly, assuming some spatial correlation between the clusters). The results are not very promising for the approach, once energy costs (for transmission) are taken into account. But there are not too many references using compressed sensing in concert with routing applications.

Wednesday, October 21, 2009

Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement

Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement Abhishek Kumar, Jun (Jim) Xu, Jia Wang, Oliver Spatschek, Li (Erran) Li, in Proc. of the 3rd ACM SIGCOMM conference on Internet Measurement 2003.

The paper uses a bloom filters for counting traffic. It does so by having l Bloom filters (BF), and each time a flow comes in, it chooses one of the l BFs at random. The idea is that, by counting how many different BFs are hit by the request, one can assess an estimate of the number of packets. If the BFs are chosen with equal probability, then after l log(l) packets, all of the l would return a positive reply to a query membership for the packet. Of course, that would restrict the counting to low values, so they don't select the BF uniformly, but at random with a geometric distribution.

If that reminds you of the paper below, it's basically the same idea (the authors were most likely unaware of that paper, since they don't cite it, and don't make use of the theory developed in the Flajolet/Martin paper).

The theory should be a bit more involved than Flajolet/Martin since the BF introduce some noise (false positive) that basically would tend to overestimate a bit the packet count if the F/M analysis was used straight out of the box. So this paper derives another way of inverting the counting from the BFs which, in the simulations, does not skew either over/under but seems to have its error centered around the original value.

Anyway, it seems that this counting using geometric random variables is quite popular a structure.

RouteBricks: Exploiting Parallelism to Scale Software Routers

RouteBricks: Exploiting Parallelism to Scale Software Routers, Mihai Doberscu, Norbert Egi, Katerina Argyrako, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, Sylvia Ratnasamy

This paper got Best Paper award at SOSP, and I got pointed to it by a blog post here. Actually, I believe that Katerina and/or Sylvia discussed the concept of RouteBricks during some lunch during SIGCOMM in Barcelona, so I kinda had heard of it. Congrats for the award!

Paper is deserving of it, for sure. It presents an architecture to create a router using commodity hardware (well, sophisticated commodity hardware, prototype Nehalem servers, you can't build this in your garage yet) to achieve the performance of a Cisco router at a fraction of the cost. SOSP has the slides and the recorded audio of the presentation on the website, which is pretty neat.

Most of the details are in how to avoid the bottlenecks in the architecture to spread the traffic over several servers rather than into a choking point. They use valiant load balancing, and a multi-hop butterfly topology to connect them. They also need some optimization in the processing to get to faster rates (multi-queue NIC, batch processing).

My thoughts reading this was: what a good idea, and what a nicely written paper! Also, what kind of reliability do you get from this platform (I would assume Cisco routers have some up-time requirements, fault tolerance, hot swappable line cards, whatever, to keep the ISP happy); I was thinking about reliability from the cellular network operator point of view (since I was thinking if this router design could translate into, say, a commodity base station. Of course, on 2nd thought, what un-commoditize the BS is the air interface, not the switching part of it). I also would like to know what Cisco margins are, since they can only compare the cost of their router to a list price.

Weighted Bloom Filter

Weighted Bloom Filter, Jehoshua Bruck, Jie Gao, Anxiao Jiang, in Proc. IEEE International Symposium on Information Theory (ISIT'06), July 2006.

This paper makes an interesting observation: a Bloom filter functions as follows. For each element, if the element belongs to a specific set, then the element is hashed into a bit vector of length m. It is hashed by computing k hash functions, each pointing to a bit in the bit vector which is then flipped to 1 (if it is 0).

The observation of the paper is that, if you know a priori the frequency of the requests for each elements, then you should not allocate the same number of bits to each element in the bit vector, but vary the number of hash functions k as a function of the popularity of the request.

The paper goes on to produce a function which gives the number of hash vectors for each element depending on the popularity. The counter-argument to this would be naturally that it is only shifting the problem around: the membership query structure is optimized, but at the cost of creating another structure to assign the popularity of the item.

Actually, paper answers this objection: it shows that a simple bucketing of "hot" items vs "cold" items lead to improvement in the performance of the Weighted Bloom Filter vs the regular Bloom Filter. So there is no need for a complicated structure to define the popularity.

I had a chuckle thinking how to verify if an item is hot or cold: query the membership in the "hot" set using a Bloom filter!

I have met one of the authors (Andrew) when I visited CalTech back in 2005, where he gave just for me a dry run presentation of his MobiHoc paper on the NP complexity of locating a user through angle triangulation.

Probabilistic Counting Algorithms for Data Base Applications

Probabilistic Counting Algorithms for Data Base Applications Philippe Flajolet, Nigel Martin, Journal of Computer and System Sciences, Vol. 31, No. 2, October 1985.

This seems to be a classic paper. It was pointed as a major reference in a review request, so I went to read it. I'm glad I did, it is very cute. It basically describes a counting schemes for knowing the number of distinct items in a set. It is a probabilistic mechanism with a single pass and low storage requirement.

It works as follows: for every item in the set, it computes a hash. The hash maps to a random variable with a geometric distribution, such that it is equal 0 with probability 1/2, 1 wp 1/4, 2 wp 1/8, etc. up to 2^L-1. Consider a bit vector of length 2^L, and flip the i-th bit to 1 if the geometric random variable points to i (keep it to 0 otherwise). Since two items in the data base that are identical will hash to the same value, then the number of times that the geometric random variables is generated is a function of the number of distinct items.

For a large number, the values of 0, 1, ..etc will be one in the bit vector. The first non-1 bit is used to estimate the number of different items. Denote by R the left-most entry in the bit vector which has value 0. Then the paper shows that R is about log(\phi n), with \phi = 0.77351, with a relatively small variance, such that the number n can be estimated from R relatively accurately. The size of the counter is this bitvector L, which has size of O(log(n)).

The paper makes some of the proofs formal, but the analysis is relatively complex. But I like the idea of looking at some properties of a sequence of random variables to estimate the number of elements in the sequence. I believe I have seen somewhere a reference where the longest run of a particular value is used to estimate the number of items (for instance, for head/tail coin toss, the longest run of heads will grow as O(log(n)), if I recall well, which would lead to a log(log(n)) bit size for such scheme).

The interesting question is: what other proxy can be used to compress the count, and can be inverted relatively easily to yield a good estimate of the original data.

Architectures for the Future Networks and the Next Generation Internet: A Survey

Architectures for the Future Networks and the Next Generation Internet: A Survey, Subharthi Paul, Jianli Pan, Raj Jain, Washington University in St Louis Technical Report, 2009-69.

This survey attempts to cover possible architectures for the future network. It is very relevant to me, since I had at one point the intent of putting together a similar technical report. And after reading this one, I realize that I could still do it: while this paper covers a lot of ground, much of the papers I read on the topic don't overlap with this: it is such a wide topic to cover that only 60 pages and 200 references don't seem to do it justice!

The survey contains basically eight sections, one on the Internet 3.0 vision at WUSTL, and one each on security, content delivery, DTNs, management and control, service architectures, routing, and future Internet infrastructure design. I was interested mostly in the management & control, routing and future Internet design (GENI). I could use this as a reference in those fields, as it contains a lot of the must-read papers and it explains in great details the different control frameworks for GENI.

Where's that Phone?: Geolocating IP Addresses on 3G Networks

Where's that Phone?: Geolocating IP Addresses on 3G Networks, Mahesh Balakrishnan, Iqbal Mohomed, Venugopalan Ramasubramanian, in Proc. of SIGCOMM Internet Measurement Conference'09, November 2009.

I was looking at the program for IMC'09 and this paper caught my interest. It is a very basic experimental study which shows that:
  • for a given mobile device, the IP address assigned by the operator (in most cases, AT&T wireless) is not static, but changes a lot over time. This creates issues (or opportunities) for IP-filtered web sites.

  • for a given IP address, the location of the device cannot be predicted: the same IP address could be assigned at different times to device in wildly different locations. This has implications, in particular for the service which hopes to translate the IP address into location-based advertising.

  • the delay information is the best predictor for location: there is correlation in the application delay and the location of the device.

  • Those are quite interesting results to keep in mind, as the number of mobile devices connected to the Internet through 3G is increasing rapidly. So it is a really cute observation.

    Peer Review Does Not Work,... or Does It?

    In my inbox:

    From: ISPR/KGCM 2009 []
    Sent: Thursday, February 12, 2009 3:38 PM
    Subject: Invitation to a Symposium on Peer Reviewing

    Only 8% members of the Scientific Research Society agreed that "peer review works well as it is." (Chubin and Hackett, 1990; p.192).

    "A recent U.S. Supreme Court decision and an analysis of the peer review system substantiate complaints about this fundamental aspect of scientific research." (Horrobin, 2001).

    Horrobin concludes that peer review "is a non-validated charade whose processes generate results little better than does chance." (Horrobin, 2001). This has been statistically proven and reported by an increasing number of journal editors.

    Since a growing number of studies conclude that peer review is flawed and ineffective as it is being implemented, why not apply scientific and engineering research and methods to the peer review process?

    This is the purpose of the International Symposium on Peer Reviewing:... which will be held on July 10-13, 2009, in Orlando, Florida, USA.

    So peer review is ineffective and flawed and sucks. And the punch line?

    All Submitted papers will be reviewed using a double-blind (at least three reviewers), non-blind, and participative peer review.

    Time to submit a randomly generated paper!.