Searching through large databases with bloom filter | Oxford Protein Informatics Group

Searching through large databases is often a linear time problem. Here I compare the performance of applying a bloom filter and using the regular std::find command in C++:Codes are from: https://codereview.stackexchange.com/questions/179135/bloom-filter-implementation-in-c

Simple example
Strings in the database:
Hello World!
sweet potato
C++
alpha
beta
gamma
Number of strings in the database: 6
Number of strings in the test set: 10
Strings to look for:
Bloom filter takes (ms):51
Regular search takes takes (ms):39

Then I tested it on Ligity-like hashes in the form of AAA000, where AAA were the three vertex types and 000 were the three edge length indices. I generate a list of random hashes in this configuration and varied the number of prefixes – where the first n characters are the same.

Applying it to the Ligity-like hashes reinforces two ideas: a) having a good hashing function that adequately separate out items are important b) bloom filter requires the prior overhead in encoding the hashes, but this overhead helps save a significant amount of time when a large number of searches were carried out.

Author

Wing Ki (Catherine) Wong

View all posts