Building an Experiment Framework
I have been working on an experiment framework recently that lets us change experiment values on the fly. We have needed this for a while: we push a feature to 1% of our users, something goes wrong, and we want to turn off the experiment as soon as possible. Instead of setting a variable in a file to False, which requires a code deploy, we chose to put our experiment switches in ZooKeeper.
We chose ZK since it’s a great centralized service for maintaining configuration and it provides distributed synchronization. Since all our Python code is gevented, we needed a pure-Python client so that we could monkeypatch the socket; gevent cannot patch sockets that are opened inside a C extension. Unfortunately the default python-zk binding is written in C, and it has a bunch of memory leaks as well. So we looked around for a pure-Python client and found one, Kazoo. We wrote a bunch of wrappers on top of the Kazoo layer to handle callbacks and other required plumbing.
from kazoo_utils import DataWatcher
watcher = DataWatcher("/experiments/show_new_homepage")
value, znode = watcher.get_data()
# Based on the value we choose to proceed with experiment or not.
watcher.watch(should_show_new_homepage)
# should_show_new_homepage function will determine if we should show new
# home page or not.
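For readers who do not have our kazoo_utils wrapper, here is a minimal sketch of how a similar watcher could be built on Kazoo's own DataWatch recipe. This is not our actual wrapper: the class name CachedNode and its attributes are made up for illustration, and the only thing it assumes is a client object that exposes DataWatch(path, func), as KazooClient does.

```python
class CachedNode(object):
    """Keeps the latest value of one ZK node in memory (hypothetical helper)."""

    def __init__(self, client, path, default=b"0"):
        self.path = path
        self._default = default
        self.value = default
        # DataWatch calls the function once with the current data and then
        # again every time the node's data changes.
        client.DataWatch(path, self._on_change)

    def _on_change(self, data, stat, event=None):
        # data is None when the node does not exist or has been deleted,
        # so fall back to the default in that case.
        self.value = data if data is not None else self._default


# Usage against a real cluster (needs a running ZooKeeper, so commented out):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="127.0.0.1:2181")
#   zk.start()
#   homepage = CachedNode(zk, "/experiments/show_new_homepage")
#   int(homepage.value)  # read from memory, no network round trip
```

Request handlers then read the value from memory, and the network traffic is limited to the change notifications.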
In this scenario we have just designed a new home page and want to show it as an experiment to 1% of our user base, so we write a value of 1 to the ZK node.
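from kazoo.client import KazooClient
zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()  # connect before issuing any commands
zk.ensure_path("/experiments/show_new_homepage")
zk.set("/experiments/show_new_homepage", b"1")  # node data must be bytes
value, stat = zk.get("/experiments/show_new_homepage")
# value is b"1"; stat is the node's ZnodeStat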
Wherever we have logic to show the homepage, we check whether the current user is selected for the experiment.
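# memoize this function so that we don't have to read the value every time
# (the memoized results must be cleared when the node's value changes)
@memoize
def selected_for_experiment(user_id, experiment):
    # experiment is the ZK path, e.g. "/experiments/show_new_homepage"
    value, stat = zk.get(experiment)
    # A value of N selects buckets 0..N-1, i.e. N% of users.
    return user_id % 100 < int(value)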
We simply mod the logged-in user’s id by 100 and check if it’s less than the experiment value. Once we realize that an experiment has a bug or is not performing well, we simply turn off the experiment and push a bug fix at our convenience. One cool thing about our DataWatcher is that it launches a greenlet and registers a WATCH on the node. If the ZK read fails, the greenlet keeps retrying to connect to ZK. All of this happens in a greenlet while our Python server is serving requests. Cool, ain’t it?
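To make the arithmetic concrete, here is a small standalone sketch of the bucketing. The helper name in_experiment is made up for illustration, and it uses a strict less-than so that a value of N selects N out of every 100 ids.

```python
def in_experiment(user_id, percent):
    """True if user_id lands in one of the first `percent` of 100 buckets."""
    return user_id % 100 < percent


user_ids = range(10000)

# With the switch at 1, only ids whose remainder is 0 are selected.
selected = [uid for uid in user_ids if in_experiment(uid, 1)]
print(len(selected))  # 100, which is 1% of 10000
print(selected[:3])   # [0, 100, 200]

# With the switch at 0, nobody is selected: the experiment is off.
print(any(in_experiment(uid, 0) for uid in user_ids))  # False
```

Because the bucket depends only on the user id, a given user sees the same variant on every request as long as the value stays the same.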
I think Redis can do the same thing, but I don’t think it’s the right tool. You can store an experiment value in a key and cache it in the Python process, then spawn a greenlet to SUBSCRIBE to a Redis channel and update the cached value in the process. This might work fine if you think building and deploying a ZK cluster is too much work.
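Here is a rough sketch of that Redis variant, not something we run. The function names, the "experiments:" channel prefix, and the one-channel-per-experiment layout are all assumptions for illustration; the message dicts follow the shape that redis-py's pubsub.listen() yields.

```python
def apply_message(cache, message, prefix="experiments:"):
    """Update the in-process cache from one redis-py pub/sub message dict."""
    if message.get("type") != "message":
        return  # skip subscribe/unsubscribe confirmations
    channel = message["channel"]
    data = message["data"]
    # redis-py returns bytes unless decode_responses=True is set.
    if isinstance(channel, bytes):
        channel = channel.decode("utf-8")
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    if channel.startswith(prefix):
        cache[channel[len(prefix):]] = int(data)


def listen_forever(pubsub, cache):
    """Blocks on the subscription; meant to run in its own greenlet."""
    for message in pubsub.listen():
        apply_message(cache, message)


# Wiring it up (needs redis-py, gevent and a running Redis, so commented out):
#   import gevent, redis
#   cache = {}
#   r = redis.Redis()
#   p = r.pubsub()
#   p.subscribe("experiments:show_new_homepage")
#   gevent.spawn(listen_forever, p, cache)
# and to flip the switch from anywhere:
#   r.set("experiments:show_new_homepage", 0)
#   r.publish("experiments:show_new_homepage", 0)
```

One caveat with this approach: Redis pub/sub does not replay messages, so a process that was disconnected when the value changed has to re-read the key after it reconnects.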