Exporting Django Querysets to CSV

Exporting Django Querysets to CSV

At Azavea, we have numerous client projects that must provide exportable data in CSV format. The reasons for this range from simple data exchange to complex export-modify-import workflows. In order to make this process easier for django projects, we made a simple utility, django-queryset-csv, for exporting django querysets directly to HTTP responses with CSVs attached.

So you have something like this:

installation:

 pip install django-queryset-csv

models.py:

from django.db import models

class Activity(models.Model):
    name = models.CharField(max_length=50, verbose_name="Name of Activity")

class Person(models.Model):
    name = models.CharField(max_length=50, verbose_name="Person's name")
    address = models.CharField(max_length=255)
    info = models.TextField(verbose_name="Info on Person")
    hobby = models.ForeignKey(Activity)

views.py:

import djqscsv
from models import Person

def get_csv(request):
    qs = Person.objects.all()
    return djqscsv.render_to_csv_response(qs)

Pain Points

Why bother? Can’t you write a for-loop to export a CSV in a dozen or so lines of code? It turns out there are a few pain points we run into over and over again with CSV exports:

  • We’re currently using python 2.7 for our django projects, where the provided CSV library has poor support for unicode characters. This has to be addressed somehow, usually by utf-8 encoding python strings before writing them to the CSV.
  • Adding a BOM character to CSVs with utf8 encoding is required for them to open properly in Microsoft Excel.

These are delicate behaviors that we prefer to have handled by a dedicated library with adequate unit test coverage.

Utilities

In addition, we found ourselves repeatedly wanting the same helper utilities, and to have them work together consistently and predictably.

  • The ability to generate a filename automatically based on the underlying data.
    filename = djqscsv.generate_filename(person_qs)
    # filename => 'person_export.csv'
  • The ability to generate timestamps for each export
    filename = djqscsv.generate_filename(person_qs, append_datestamp=True)
    # filename => 'person_export_20140224.csv'
  • The ability to generate column headers from the underlying data, sane defaults, custom overrides, or some flexible combination of the three.
    render_to_csv_response(person_qs)
    # column headers will be 'ID', "Person's name",
    # 'address', 'Info on person', 'hobby_id'
    # these are derived from the django model definition of Person.
    render_to_csv_response(person_qs, field_header_map={'id': 'The Id'})
    # column headers will be 'The Id', "Person's name",
    # 'address', 'Info on person', 'hobby_id'
    render_to_csv_response(person_qs, use_verbose_names=False)
    # column headers will be 'id', 'name', 'address', 'info', 'hobby_id'

    In this case, field_header_map takes precedence if provided, followed by verbose_name if specified and not disabled, followed finally by the underlying model field names.

Advanced Usage

Moving this library into production, we quickly discovered some advanced features that we needed the library to support.

Foreign Keys

The most obvious is foreign key fields. This is supported using the .values() method of a queryset, which is able to walk relationships in the same fashion as other ORM directives, using double-underscores. Note that you can’t make use of verbose names in this case, but you can provide custom overrides in the field_header_map argument:

person_qs = Person.objects.values('id', 'name', 'hobby__name')
render_to_csv_response(person_qs, field_header_map={'hobby__name': 'Name of Activity'})
# column headers will be 'id', 'name', and 'Name of Activity'

Asynchronous Export

Sometimes, you can’t return a CSV response in the same thread. We ran into this problem because sometimes CSV exports, or the queries that produce them, take too long for the request/response cycle, and tying down a web worker for that period is unnacceptable. For this case, django-queryset-csv provides a lower-level function for writing CSV data to a file-like object of any kind:

csv_file = TemporaryFile()
djqscsv.write_csv(person_qs, csv_file)

Final Thoughts

We’re using django-queryset-csv in production, but the project remains in beta. We hope it’ll make CSV exports a little less painful for you too. Please report all bugs via github.