Importing data into a database

The core of gffutils is a sqlite3 database. If you’re familiar with SQL, you can check out the Schema.

If you have a well-formatted GFF or GTF file, simply create a database with the gffutils.create_db() function:

import gffutils
db = gffutils.create_db(filename, database_filename)

This is a one-time operation: it parses the file, infers the relationships among the features in the file, and stores the features and relationships in the file database_filename. Once it completes, you only need to attach to the existing database_filename like this:

db = gffutils.FeatureDB(database_filename)
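
The create-once, attach-afterwards pattern above can be wrapped in a small helper. This is only a sketch: open_or_create_db is a hypothetical name, not part of gffutils, and it assumes that an existing file at db_path is a complete database built from the same input.

```python
import os


def open_or_create_db(gff_path, db_path):
    """Attach to db_path if it already exists, otherwise build it from gff_path.

    Hypothetical helper, not part of gffutils.
    """
    # Imported inside the function so the helper can be defined even in an
    # environment where gffutils is not installed.
    import gffutils

    if os.path.exists(db_path):
        # Cheap: attach to the existing sqlite3 file.
        return gffutils.FeatureDB(db_path)

    # Expensive, one-time step: parse the file and write the database.
    return gffutils.create_db(gff_path, db_path)
```

The first call with a given db_path pays the parsing cost, and later calls only attach. The helper does not detect a changed input file or a partially written database, so delete the database file to force a rebuild.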

You can now use the tools in gffutils to work with the data.

In practice, however, GFF and GTF files do not always exactly match the official specification. These difficult-to-work-with files are the raison d’être for gffutils, which provides many different parameters for customization and for handling special cases.

The goal of these parameters is ultimately to assign a unique ID to each feature in the file; this ID is used as the primary key for the features table. The section Database IDs details these settings, and the Examples show all sorts of tricks for getting improperly formatted files to work with gffutils.
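
To see why every feature needs its own ID, the following sketch uses only Python's built-in sqlite3 module. The three-column table is a deliberately simplified stand-in and is an assumption for illustration, not the actual gffutils schema (see the Schema for that). The feature values and the insert_feature helper are likewise made up.

```python
import sqlite3

# In-memory database with a simplified, illustrative features table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features (id TEXT PRIMARY KEY, seqid TEXT, featuretype TEXT)"
)


def insert_feature(conn, feature_id, seqid, featuretype):
    """Insert one row; return False if the ID is already taken."""
    try:
        conn.execute(
            "INSERT INTO features VALUES (?, ?, ?)",
            (feature_id, seqid, featuretype),
        )
        return True
    except sqlite3.IntegrityError:
        # The primary key constraint rejects a second row with the same ID.
        return False


first = insert_feature(conn, "exon1", "chr1", "exon")
duplicate = insert_feature(conn, "exon1", "chr1", "exon")
renamed = insert_feature(conn, "exon1_b", "chr1", "exon")

row_count = conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
print(first, duplicate, renamed, row_count)
```

The second insert is rejected because its ID collides with the first, which is the situation the ID-related parameters are meant to resolve for files whose features lack unique identifiers.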