Wednesday, October 21, 2009

Weighted Bloom Filter

Weighted Bloom Filter, Jehoshua Bruck, Jie Gao, Anxiao Jiang, in Proc. IEEE International Symposium on Information Theory (ISIT'06), July 2006.

This paper makes an interesting observation: a Bloom filter functions as follows. For each element, if the element belongs to a specific set, then the element is hashed into a bit vector of length m. It is hashed by computing k hash functions, each pointing to a bit in the bit vector which is then flipped to 1 (if it is 0).

The observation of the paper is that, if you know a priori the frequency of the requests for each elements, then you should not allocate the same number of bits to each element in the bit vector, but vary the number of hash functions k as a function of the popularity of the request.

The paper goes on to produce a function which gives the number of hash vectors for each element depending on the popularity. The counter-argument to this would be naturally that it is only shifting the problem around: the membership query structure is optimized, but at the cost of creating another structure to assign the popularity of the item.

Actually, paper answers this objection: it shows that a simple bucketing of "hot" items vs "cold" items lead to improvement in the performance of the Weighted Bloom Filter vs the regular Bloom Filter. So there is no need for a complicated structure to define the popularity.

I had a chuckle thinking how to verify if an item is hot or cold: query the membership in the "hot" set using a Bloom filter!

I have met one of the authors (Andrew) when I visited CalTech back in 2005, where he gave just for me a dry run presentation of his MobiHoc paper on the NP complexity of locating a user through angle triangulation.

1 comment:

Hannu said...

Cedric, interesting to note that we are both looking into the applicability of Bloom filters!

Hannu F