Wednesday, October 21, 2009

Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement

Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement Abhishek Kumar, Jun (Jim) Xu, Jia Wang, Oliver Spatschek, Li (Erran) Li, in Proc. of the 3rd ACM SIGCOMM conference on Internet Measurement 2003.

The paper uses a bloom filters for counting traffic. It does so by having l Bloom filters (BF), and each time a flow comes in, it chooses one of the l BFs at random. The idea is that, by counting how many different BFs are hit by the request, one can assess an estimate of the number of packets. If the BFs are chosen with equal probability, then after l log(l) packets, all of the l would return a positive reply to a query membership for the packet. Of course, that would restrict the counting to low values, so they don't select the BF uniformly, but at random with a geometric distribution.

If that reminds you of the paper below, it's basically the same idea (the authors were most likely unaware of that paper, since they don't cite it, and don't make use of the theory developed in the Flajolet/Martin paper).

The theory should be a bit more involved than Flajolet/Martin since the BF introduce some noise (false positive) that basically would tend to overestimate a bit the packet count if the F/M analysis was used straight out of the box. So this paper derives another way of inverting the counting from the BFs which, in the simulations, does not skew either over/under but seems to have its error centered around the original value.

Anyway, it seems that this counting using geometric random variables is quite popular a structure.

No comments: