
Building an Experiment Framework

I have recently been working on an experiment framework that lets us change experiment values on the fly. We have needed this for a while: we would push a feature to 1% of our users, something would go wrong, and we would want to turn the experiment off as soon as possible. Instead of setting a variable in a file to False, which requires a code deploy, we chose to put our experiment switches in ZooKeeper.

We chose ZooKeeper (ZK) because it is a centralized service for maintaining configuration and it provides distributed synchronization. Since all our Python code runs on gevent, we needed a pure-Python client so that we could monkeypatch the socket. Unfortunately the default Python binding for ZK is written in C, and it has a bunch of memory leaks as well. So we looked around for a pure-Python client and found one: Kazoo. We wrote a few wrappers on top of the Kazoo layer to handle callbacks and other required plumbing.

    from kazoo_utils import DataWatcher
    watcher = DataWatcher("/experiments/show_new_homepage")
    value, znode = watcher.get_data()
    # Based on the value we choose to proceed with experiment or not.
    watcher.watch(should_show_new_homepage)
    # should_show_new_homepage function will determine if we should show new
    # home page or not.

In this scenario we had just designed a new home page and wanted to show it as an experiment to 1% of our user base, so we write a value of 1 to the ZK node.

    from kazoo.client import KazooClient
    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()  # connect before issuing any requests
    zk.ensure_path("/experiments/show_new_homepage")
    zk.set("/experiments/show_new_homepage", b"1")  # znode data must be bytes
    value, stat = zk.get("/experiments/show_new_homepage")
    # value == b"1"

Wherever we have logic to show the homepage, we check whether the current user is selected for the experiment.

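    # memoize this function so that we don't have to read the value every time
    # (the cached results have to be cleared whenever the watched value changes)
    @memoize
    def selected_for_experiment(user_id, experiment):
        value, stat = zk.get(experiment)
        # value is the percentage of users in the experiment:
        # buckets 0 .. value-1 out of 100 are selected
        return user_id % 100 < int(value)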

We simply take the logged-in user's id modulo 100 and check whether it is less than the experiment value. Once we realize that an experiment has a bug or is not performing well, we simply turn off the experiment and push a bug fix at our convenience. One cool thing about our DataWatcher is that it launches a greenlet and registers a WATCH on the node. If the ZK read fails, the greenlet keeps retrying to connect to ZK. All of this happens in a greenlet while our Python server is serving requests. Cool, ain't it?
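Our DataWatcher wrapper is not shown here, but the idea can be sketched roughly as follows. This is a minimal illustration, not our actual code: `ExperimentSwitch` and `attach` are hypothetical names, and the only Kazoo call it relies on is `DataWatch`, which invokes a callback with `(data, stat)` whenever the node changes. The switch keeps the latest percentage in memory, so the per-request check never has to talk to ZK.

```python
class ExperimentSwitch(object):
    """Hypothetical helper: caches one experiment's rollout percentage."""

    def __init__(self):
        self.percent = 0  # default to "experiment off"

    def on_change(self, data, stat):
        """Callback in the (data, stat) shape that kazoo's DataWatch passes."""
        try:
            self.percent = int(data)
        except (TypeError, ValueError):
            # Node missing (data is None) or not a number: fail closed.
            self.percent = 0

    def selected(self, user_id):
        # Buckets 0 .. percent-1 out of 100 are in the experiment.
        return user_id % 100 < self.percent


def attach(zk, path, switch):
    """Register the callback on an already started KazooClient.

    Assumption: zk is a kazoo.client.KazooClient; DataWatch calls
    switch.on_change once immediately and again on every change.
    """
    zk.DataWatch(path, switch.on_change)
```

With a value of 1, exactly one bucket out of 100 is selected, and writing 0 to the node switches the experiment off for everyone the next time the callback fires.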

I think Redis can do the same thing, but I don't think it's the right tool. You can store an experiment value in a key and cache it in the Python process, then spawn a greenlet that SUBSCRIBEs to a Redis channel and updates the cached value whenever a change is published. This might work fine if you think building and deploying a ZK cluster is too much work.
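For the curious, here is roughly what the Redis variant could look like. This is only a sketch under a few assumptions: the function names, the `experiments` channel, and the `name=value` payload format are all made up for illustration, and the message dicts are in the shape that redis-py's `pubsub().listen()` yields.

```python
def apply_message(cache, message):
    """Update the in-process cache from one pub/sub message dict."""
    if message.get("type") != "message":
        return  # skip subscribe/unsubscribe confirmations
    payload = message["data"]
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8")
    # Assumed payload format: "<experiment name>=<percentage>"
    name, sep, value = payload.partition("=")
    if not sep:
        return
    try:
        cache[name] = int(value)
    except ValueError:
        pass  # ignore malformed updates and keep the old value


def listen_forever(pubsub, cache):
    """Body of the greenlet: blocks on the channel and applies updates."""
    for message in pubsub.listen():
        apply_message(cache, message)


# Wiring (assumes redis-py and gevent are installed):
#   p = redis.Redis().pubsub()
#   p.subscribe("experiments")
#   gevent.spawn(listen_forever, p, cache)
```

Keep in mind that Redis pub/sub does not replay messages, so a subscriber that was disconnected would also need to re-read the key when it reconnects.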
