PyMysql is a pure python mysql client, which is trying to be a drop in replacement for python-mysqldb. There is a install_as_MySQLdb function which when called makes PyMysql to be a drop in replacement for MySQLdb. I am excited about this project because its so easy to install and get started. Installing MySQLdb is never straight forward because its a C binding and looks for mysql_config binary.
Once you start to identify that your site is growing fast you will see a ton of bottlenecks at your sql layer. Its usually your ORM doing some whacky stuff, once you tweak it to do simple queries you are fine. But as your traffic grows and you have about 20-30 app servers running 20-30 app processes you are under pressure again. You are making a ton of connections to your mysql layer. Even you pool your connections you are handling quite a lot of connections. At this point you will need to fork your mysql into a different data layer. That is making it like a thrift API or a Rest Api like service to get data.
I personally think that writing SQL is easy and becomes very easy when your site grows. Because you cannot JOIN tables and expect it to work at scale. Here is a quick example of using PyMysql
def get_mysql_connection(shard_name): # really important to set autocommit to True or else mysql returns query cache conn = pymysql.connect(**settings.DATABASES[shard_name]) conn.autocommit(1) return conn # in your python controller conn = get_mysql_connection(settings.DATABASES["slave001"]) cursor = conn.cursor() sql = "DELETE FROM follow_user WHERE user_id1=%s AND user_id2=%s" data = cursor.execute(sql, (user_id1,user_id2)) cursor.connection.commit() cursor.close() conn.close() return str(data)
Just get a connection open a cursor run your query with PyMysql escaping your sql parameters (to avoid injections) and boom done. Here is a simple SQL insert.
sql = """INSERT INTO users (username, password) VALUES (%s, %s)""" data = cursor.execute(sql, (username,password,) cursor.connection.commit() user_id = cursor.connection.insert_id() cursor.close() return user_id
Now comes the most interesting part. Running your queries asynchronously. Once you import the PyMysql module you can use gevent / eventlet to monkey patch the socket module. You have the power to spawn multiple queries in parallel with greenlets.
def query(sql, database): conn = get_mysql_connection(settings.DATABASES[database]) cursor= conn.cursor() data = cursor.execute(sql) cursor.close() conn.close() return data # in ur controllers jobs = [gevent.spawn(query, (sql,database)) for sql, database in some_array] gevent.joinall(jobs, timeout=2) # will block untill both queries complete what_you_want = [job.value for job in jobs]
Recently Tarek Ziade has blogged how they used gevent+PyMysql+Gunicorn for Mozilla’s sync server. I want to cover about connection pooling in a separate blog post.