As an active member of django-syncr I have gained some experience with making API calls and parsing data from XML/JSON. It is a lot of fun working on such a project. Recently I wrote the Readernaut and Brightkite syncr apps, which pull a user’s data and write it into a Django model.

Setting up a Django model depends on the data retrieved from the API call. First read the documentation for the API call to learn what data is returned and what each tag is called. You can experiment with a curl call or a basic HTTP call to see how the data is returned. I prefer XML over JSON because I found more libraries for parsing XML than JSON. Once you have decided which data should go into the model, write the model.

The next step is to write a Python function which makes an API call, parses the data and inserts it into the Django model. I will explain this with an example: the Readernaut API. Readernaut has a very basic API with no authentication. So first you need to read the documentation and identify the endpoints for each API call.

In the view you build the URL for the API call, make the request and then parse the returned data. Python’s urllib module makes this very easy.

import urllib.request  # Python 3; on Python 2 the call was urllib.urlopen
url = "http://readernaut.com/api/v1/yashh/books/"
data = urllib.request.urlopen(url).read()
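
The same request can be wrapped in a small helper that takes the username as a parameter. This is only a sketch: the names `books_url` and `fetch_books_xml` are made up for illustration, the URL pattern is copied from the example URL, and it assumes Python 3's `urllib.request` (on Python 2 the call was `urllib.urlopen`).

```python
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "http://readernaut.com/api/v1"  # taken from the example URL


def books_url(username):
    """Build the books endpoint URL for one user (illustrative helper)."""
    return "%s/%s/books/" % (API_ROOT, quote(username, safe=""))


def fetch_books_xml(username, opener=urlopen):
    """Return the raw response body; `opener` can be swapped out in tests."""
    response = opener(books_url(username))
    try:
        return response.read()
    finally:
        response.close()
```

Passing the opener as an argument is a design choice for this sketch only: it lets you try the function with a fake response object before pointing it at the live API.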

Instead of hardcoding the username you can pass it as an argument to make the code reusable. Here urllib makes the request and the returned data is stored in a variable. data now contains XML which needs to be parsed. There are many ways to parse XML. The most basic approach is Python’s built-in XML parser. It is easy to use, but the code gets quite verbose. You can use

  • BeautifulSoup
  • Feedparser

Both libraries have good support for parsing XML on the fly. They are fast and efficient provided the returned XML is not huge (not several megabytes). However, I am apparently too lazy to use them. I started using xml2dict, which accepts XML as input and returns a Python dictionary.

from xml2dict import XML2Dict
x = XML2Dict()
r1 = x.fromstring(data)  # not named dict, which would shadow the built-in

r1 contains the data parsed from XML. Now it is as simple as writing a for loop that iterates over the data, checks whether each record already exists in the database and inserts it if it does not.

for book in r1["reader_books"]["reader_book"]:
    # logic to extract data from the Python dict
    # and call get_or_create with that data
    pass
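
For comparison, the same loop can be sketched with Python's built-in `xml.etree.ElementTree`, which needs no extra library. The sample XML below is made up: only the `reader_books`, `reader_book` and `reader_book_id` names come from the code in this post, while `title` and the `extract_books` helper are hypothetical. Check the real API response before relying on any of these names.

```python
import xml.etree.ElementTree as ET

# Made-up sample: only reader_books / reader_book / reader_book_id come
# from this post; <title> is a hypothetical field.
SAMPLE_XML = """
<reader_books>
  <reader_book>
    <reader_book_id>101</reader_book_id>
    <title>First Book</title>
  </reader_book>
  <reader_book>
    <reader_book_id>102</reader_book_id>
    <title>Second Book</title>
  </reader_book>
</reader_books>
"""


def extract_books(xml_text):
    """Return one plain dict per <reader_book> element."""
    root = ET.fromstring(xml_text.strip())
    books = []
    for node in root.iter("reader_book"):
        books.append({
            "book_id": node.findtext("reader_book_id"),
            "title": node.findtext("title"),
        })
    return books


books = extract_books(SAMPLE_XML)
```

Each dict in `books` then holds the values you would hand to the model, with `book_id` serving as the lookup key.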

Another important thing is to check whether the object already exists in our database. There are a couple of ways to do it. First you need to identify an entity in your model to perform the check on. Suppose, in the Readernaut example, you grab a user’s list of books and try to insert them into the database. Here the unique entity would be the book id. Every book in Readernaut has a unique book id, which makes it easy to check whether the book exists. Similarly, Brightkite provides an alphanumeric id for every checkin. However, there might be situations where you cannot find a unique entity. In that case you can rely on the datetime attribute of each record. This is reliable only as long as no two objects share the same datetime (rare, but possible).

To check whether an object exists in the database you can use Django’s built-in get_or_create function. To use this method, put all the data of a record except the unique entity into a dictionary of defaults. Then this piece of code will check whether the record exists in the database. If it doesn’t exist, it will create one.

obj, created = Model.objects.get_or_create(
    unique_entity=record.unique_entity, defaults=default_dict)
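
To make that behaviour concrete without a database, here is a plain-Python imitation of the same idea. It is not Django's implementation: the `store` dict, this `get_or_create` helper and the `title` field are all invented for illustration. It only mirrors the contract described above, where the unique entity is used for the lookup and the defaults are used only when a new record has to be created.

```python
def get_or_create(store, unique_entity, defaults):
    """Return (record, created), looking records up by unique_entity."""
    if unique_entity in store:
        return store[unique_entity], False
    # Not found: build the record from defaults plus the lookup key.
    record = dict(defaults, unique_entity=unique_entity)
    store[unique_entity] = record
    return record, True


store = {}  # stands in for the database table
first, created_first = get_or_create(store, "101", {"title": "First Book"})
# A second sync of the same book finds the stored record and ignores defaults.
again, created_again = get_or_create(store, "101", {"title": "Changed"})
```

Running the sync twice leaves a single record in `store`, which is the property that makes repeated syncs safe.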

Another method is to put the get call in a try clause and the creation code in the except clause.

try:
    book = Book.objects.get(book_id=book["reader_book_id"]["value"])
except Book.DoesNotExist:
    # values is a dict of field names to values; create() already saves
    new_book = Book.objects.create(**values)

Here the try clause gets executed first. If there is no error, execution carries on. If a DoesNotExist exception is raised, the except clause runs, thereby creating the object.

Another issue we often come across is parsing a timestamp string from the API into a Python datetime object. We initially used time.strptime with the default fromtimestamp method, but later we adopted a better approach.

import time
from datetime import datetime
pub_time = time.strptime(created_at_from_API,
                         "%a %b %d %H:%M:%S +0000 %Y")
pub_time = datetime.fromtimestamp(time.mktime(pub_time))
# pub_time is the datetime object we want

Note: The offset in the timestamp returned by the API may vary with daylight saving time, so the hardcoded +0000 would have to be altered every time. Instead we used the dateutil library, which parses a string and extracts a datetime object from it.

import dateutil.parser
import dateutil.tz
pub_time = dateutil.parser.parse(created_at_from_API)
if pub_time.tzinfo:
    # convert to local time, then drop tzinfo to get a naive datetime
    local_tz = dateutil.tz.tzlocal()
    pub_time = pub_time.astimezone(local_tz).replace(tzinfo=None)

With dateutil you can also convert the datetime to your local time. All the time zone conversion is handled with ease.
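
If adding dateutil is not an option, Python 3's standard `datetime.strptime` can also read the offset through `%z` instead of a hardcoded +0000. This is a minimal sketch, not what django-syncr does: it assumes the API always uses the string layout shown earlier, the sample timestamps are made up, the helper names are hypothetical, and it normalises to UTC rather than to local time so the result does not depend on the machine it runs on.

```python
from datetime import datetime, timezone

API_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # %z reads the offset, e.g. +0000


def parse_api_time(text):
    """Parse the API string into a timezone-aware datetime."""
    return datetime.strptime(text, API_TIME_FORMAT)


def to_naive_utc(aware):
    """Convert to UTC, then drop tzinfo so it can be stored as a naive value."""
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


utc_stamp = parse_api_time("Tue Mar 03 18:30:00 +0000 2009")
offset_stamp = parse_api_time("Tue Mar 03 19:30:00 +0100 2009")
```

Both sample strings describe the same instant, so they compare equal once parsed, whichever offset the API happened to send.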

This is something we do in almost all the apps in django-syncr. Recently I wrote python-github, a wrapper over the GitHub API. Feel free to grab it and use it.