Sure. As you pointed out, when we first conceived of Bleve, one of the new and different things we were bringing to the table was this notion of an indexing scheme which would take the whole index and represent it as keys and values. Now, if we could represent the entire index as just keys and values, what it meant was any key-value store would do – and at the time, 2014 was like a hotbed of key-value stores; there’s LevelDB, RocksDB, all this excitement going on about key-value stores… So we thought “This is great. Even if we choose wrong now, we could just plug in a faster key-value store later and that will solve all of our problems.” That was the initial idea that we conceived. And to be fair, it did allow a lot of flexibility early on in the project.
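To make the "index as keys and values" idea concrete, here is a minimal sketch in Go. It is not Bleve's actual row format: the `KVStore` interface, the `memStore` type, and the `"t" + term + 0x00 + docID` key layout are all hypothetical names and choices made up for illustration. The point is only that once every index entry is a key-value row, any store that can set a key and scan a key prefix could sit behind the interface.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KVStore is a hypothetical minimal interface; any store that can set a
// key and scan a key prefix could be plugged in behind it.
type KVStore interface {
	Set(key, val []byte)
	PrefixScan(prefix []byte) [][]byte // matching keys, in sorted order
}

// memStore is an in-memory stand-in for a real key-value store.
type memStore struct{ rows map[string][]byte }

func newMemStore() *memStore { return &memStore{rows: map[string][]byte{}} }

func (m *memStore) Set(key, val []byte) { m.rows[string(key)] = val }

func (m *memStore) PrefixScan(prefix []byte) [][]byte {
	var keys []string
	for k := range m.rows {
		if strings.HasPrefix(k, string(prefix)) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	out := make([][]byte, len(keys))
	for i, k := range keys {
		out[i] = []byte(k)
	}
	return out
}

// termKey builds an illustrative row key: "t" + term + 0x00 + docID.
func termKey(term, docID string) []byte {
	return []byte("t" + term + "\x00" + docID)
}

// IndexDoc writes one row per (term, document) pair. The value is a
// placeholder byte; a real index would store frequencies, positions, etc.
func IndexDoc(kv KVStore, docID, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		kv.Set(termKey(term, docID), []byte{1})
	}
}

// Search returns the IDs of documents that contain term.
func Search(kv KVStore, term string) []string {
	prefix := []byte("t" + strings.ToLower(term) + "\x00")
	var ids []string
	for _, k := range kv.PrefixScan(prefix) {
		ids = append(ids, string(k[len(prefix):]))
	}
	return ids
}

func main() {
	kv := newMemStore()
	IndexDoc(kv, "doc1", "fast key value store")
	IndexDoc(kv, "doc2", "pure Go key value store")
	fmt.Println(Search(kv, "key")) // both documents contain "key"
	fmt.Println(Search(kv, "go"))  // only doc2 contains "go"
}
```

Note that the sketch repeats the term and the document ID in every row key, which hints at why an encoding like this can end up larger and slower than a purpose-built one, whichever store holds it.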

A good example was at the time BoltDB was one of the only pure Go key-value stores. And pure Go was, again, a benefit to us, because we’d already been burned by cgo in some other projects. So the idea that there was this pure Go – you could use the go get command without having to set up a bunch of other C libraries first, and it would work. So the fact that we had support for BoltDB was huge early on.

But as I alluded to, it all revolved around the fact that the index could be distilled down to sets of keys and values. And what we learned over time was it didn’t matter which key-value store we used, it was that encoding itself, that representation of all the index as keys and values – that in and of itself was not a particularly good encoding, either for storage size in terms of writing the index, or for query time, being able to answer queries quickly.

So as I said, we learned, basically – because Couchbase ultimately wrote another key-value store called Moss; I spoke about Moss at GopherCon… Moss is great for everything that it is, but it was still just another faster key-value store that ultimately didn’t solve that problem. So coming out of Moss in the 2017-2018 timeframe – as you said, we started our new indexing scheme called Scorch. The insight was basically – the project had grown up. In the beginning, people loved the flexibility, “I can just pick and choose whatever key-value store I want”, but what we found later was users didn’t care which key-value store. They wanted it to work; it should do everything it says on the box, and it should be as fast as you can make it go, and it should be as small as you can make it go. People want us to own the implementation of the bytes on disk; they don’t wanna worry about that, they don’t wanna have to upgrade to a new version of LevelDB in the future to fix some issue… They want us to own those problems.

[00:40:17.02] So the approach basically involved – okay, let’s set this old index scheme aside; we’re gonna have a new index scheme, which is not built on top of a key-value store, it’s gonna just write its own representation of the bytes directly to disk. Yeah, we have to own that piece now, and that was something we were comfortable with doing… And we had to sort of engineer that.
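As a rough illustration of what "owning the bytes" can look like, the sketch below packs one term's list of document numbers into a compact byte slice using a count followed by varint-encoded gaps. This is an assumption-laden toy, not Scorch's actual on-disk format; the function names `EncodePostings` and `DecodePostings` are hypothetical, and the only library calls used are from Go's standard `encoding/binary` package.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendUvarint appends v to buf using Go's standard varint encoding.
func appendUvarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// EncodePostings packs ascending document numbers as a count followed by
// the gap between each number and the one before it. Small gaps take one
// byte each. (Hypothetical layout, for illustration only.)
func EncodePostings(docNums []uint64) []byte {
	buf := appendUvarint(nil, uint64(len(docNums)))
	var prev uint64
	for _, n := range docNums {
		buf = appendUvarint(buf, n-prev)
		prev = n
	}
	return buf
}

// DecodePostings reverses EncodePostings.
func DecodePostings(data []byte) ([]uint64, error) {
	count, n := binary.Uvarint(data)
	if n <= 0 {
		return nil, errors.New("postings: bad count")
	}
	data = data[n:]
	var out []uint64
	var prev uint64
	for i := uint64(0); i < count; i++ {
		gap, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, errors.New("postings: truncated data")
		}
		data = data[n:]
		prev += gap
		out = append(out, prev)
	}
	return out, nil
}

func main() {
	enc := EncodePostings([]uint64{3, 7, 8, 1000})
	fmt.Println("encoded bytes:", len(enc))
	dec, err := DecodePostings(enc)
	fmt.Println(dec, err)
}
```

The trade-off is the one described above: a custom layout like this can be much smaller than one row per term and document, but the project now has to define, version, and maintain that format itself.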

You mentioned that talk I gave at GopherCon UK… I really enjoyed giving that talk, because as you said, I tried to not just sugarcoat it and show you the finished product and say “Look, we went off to rewrite this thing, and here it is. It’s awesome.” In a nutshell, that’s how a lot of tech talks are… And I felt that just wasn’t honest. It was hard getting to where we got, and I thought the more interesting story was sort of going through all those things. Again, for anybody who’s interested, it is a talk worth going back to. I hope that holds up over time, and people still enjoy it.

So that did lead us to bringing in Scorch. At the time I gave that talk, Scorch was still pretty new… But Scorch is production-ready today. It’s still not the default with Bleve, for reasons that are, again, disappointing… Bleve is like a lot of early Go projects. It got popular before there was good versioning. It predated even vendoring.

The trouble we have now is there are a lot of people that have adopted it that are using the old index scheme, so we need to be mindful of them, we need to have an upgrade path that doesn’t break things… So again, Go modules is like a hot topic for Bleve right now, and that’s one of the things that at Bluge Labs I hope to spend a lot of time working on for Bleve. Anyway, that’s where we are today. Again, we all recommend that people using Bleve use the Scorch index scheme, even though it’s not the default yet as of today.