Recently I have been reading a series of very interesting posts by an Aphyr who tests distributed databases and then publishes their performance under network partition conditions, clock skews, dead nodes and other interesting failure conditions.
The tool he uses is called Jepsen which he wrote himself, and the repo itself includes the set of tests that he used on various databases. After reading all the posts about how various supposedly industry-standard (I guess this is the industry standard) databases which claim to be quite reliable fail under precisely the conditions they are not supposed to fail under, I decided to try the tool out myself and document the experience.
My first thought about this tool was that it would be a great application of Docker, and it turns out he had already written a Docker harness for Jepsen. This was great since you could do all sorts of things in the container environment of Docker and still simulate network partitions and all those nasty things without needing too much hardware.
I checked out Jepsen and ran the scripts the way it told me to.
1 2 3 4 5 6 7
And then it goes off and builds a bunch of containers from scratch.
Then at the very end (after maybe 15 mins of downloading and installing) it goes
1 2 3 4
Excellent I can run it now!
So I open another terminal and type it in and instantly I’m transported to where I should be to start working with Jepsen. Hooray!
1 2 3 4 5 6 7 8 9 10 11 12
Let’s try the next command. It looks to me like I should try running the test on etcd with 10 clients?
It then goes off and spews a large amount of output
1 2 3 4 5 6 7 8 9 10 11 12 13
It basically goes off and does a large number of reads and writes. At the very end, it comes back with this:
Wow that was quick. I did not expect to run into an etcd problem so fast. I did not even specify any failures.
Seems like I had some code to read. After fumbling around I discovered the source file.
1 2 3 4 5 6 7 8 9
Looks like the Jepsen nemesis was baked into the test script. Well thats fine for now.
The next step basically to me was to find out what was wrong. There was a large amount of data output so there must be a tool of some sort that can tell me what was wrong. (Or so I imagined)
After a bit of googling I came across another project called Knossos.
I didn’t know very much about Clojure so I just followed through the README anyway. So first I cloned knossos
I dug around to find the .edn file from my Jepsen run and discovered it in /jepsen/etcd/store/latest/history.edn
Then I ran the command in the README
Excellent. Yes. I knew that.
Scrolling down a little in the README I discovered it could generate the linearizability graph too. That was cool so I tried, by following the code in the README
1 2 3 4 5 6 7 8 9 10 11
I am then dropped into the Clojure prompt, where I try clumsily to guess how things are done:
1 2 3 4 5 6 7 8 9
Okay, I had no idea what that meant. Looks like there are still kinks.
In short I have a few thoughts:
- Using docker for this is way cool
- The analysis tool sucks
- This can be so much more general
Using docker is way cool
While Aphyr says that he uses his own LXC stuff to work with, I think its cool that he does provide a Docker interface. However he isn’t taking advantage of a few things docker can do:
- Prebuilt docker containers
- Networking in docker that can be used to simulate faults
- Skewing clocks in the container
- Adding and removing new containers, simulating scaling effects and changing configuration
It is probably because the tool is in its early stages of development, and really he knows how to use it the best, so he can do his stuff the best. But I can’t help but feel he himself would benefit from some improvement in this domain. Perhaps he does not use it right now, or maybe he does in another way. I’ve only sat with this code for the last 4 hours.
The analysis tool sucks
It needs to recover from weird errors nicely and maybe still present some amount of information. Unfortunately it doesn’t, so I will need to spend some time figuring out what I did wrong. Once I get it then I will be able to truly start analysing databases myself.
This can be so much more general
All in all it looks like a large amount of the code done here is still quite research quality and tests, while composable seemed like they had to be baked into the testing code still. On the surface it felt to me that skew clocks, network failures, network delays and failing environments were things that did not have to be part of the test and could be external.
None of this makes any of Jepsen less cool
In my opinion all one should really need to do is specify what to do in order to perform a read, write or CAS, and the testing tool should attempt a combination of the three to figure out how a database puts up with it.
Its also likely to be the case because different databases are different. Some are queues, some are data structures, some are RDBMS and some are key value stores. However they all should have the same idea that storing data stores, and automated analysis that shows if it is CP or AP and how would be incredible. This is something I might work on in a later post when I have time.
The world needs more of these. This run was an eye-opener to real simulations of faults, and also the idea that distributed databases are actually extremely hard to understand and build. It also has shown me how one person who makes an effort to do something advanced and difficult and documents it in a clear manner can so quickly change the landscape of a field.
Thanks to him I am sure I and a large number of people now understand the definitions and jargon behind distributed databases which have been hidden behind walls of text in white papers and difficult to read (read longwinded) articles, and this really opens the door to helping people build better and more resilient systems.
This probably will not be the last post on distributed databases on the blog as I, as are many others, interested in how things can scale safely. We have more or less solved it for webservers, but scaling databases, that is clearly a different beast.