BitFunnel Engineering Diary

We're open sourcing BitFunnel, a library for high performance indexing, retrieval, and ranking of documents. Today the code runs at massive scale inside of Bing's data centers, but our dream is to make the code available and relevant to anyone, anywhere who values search. As we release each module, we will document our key design decisions here on this blog.

Searching for Primes

What do prime numbers have to do with BitFunnel? It turns out we use them to test our matching engine. One of the challenges in bringing up a new search engine is figuring out how to test it. If you happen to have another working search engine that has ingested the same corpus, you’re in luck - just compare its output with that of your new search engine. Well, that’s the theory, anyway. (read more...)

Sample Data

I’ve been trying to make it really easy to get started with BitFunnel, but we still have a ways to go. From the beginning we put a lot of effort into ensuring our code would build and run on Linux, OSX, and Windows, and we set up CI on Appveyor and Travis to help us quickly spot breaks on any OS. This has kept the build in good shape, but it seems that the system is still hard to configure and run, especially for those who don’t use it on a day-to-day basis. (read more...)

BitFunnel performance estimation

Hi! I’m going to talk about two things today. First, I’m going to talk about one way to think about performance. That is, one way you can reason about performance. Second, I’m going to talk about search. We’re going to look at search as a case study because, when talking about performance, it’s often useful to have something concrete to reason about. (read more...)

A Small Query Language

A challenge in bringing BitFunnel to open source is providing functionality that was previously supplied by portions of Bing upstream of BitFunnel. BitFunnel was designed as a library that takes, as input, a tree of TermMatchNodes which represents a boolean expression combining terms and phrases using logical operators like and, or, and not. The Bing search pipeline does a ton of work on the query itself before presenting a TermMatchNode tree to BitFunnel. (read more...)