gffutils.inspect.inspect

gffutils.inspect.inspect(data, look_for=['featuretype', 'chrom', 'attribute_keys', 'feature_count'], limit=None, verbose=True)

Inspect a GFF or GTF data source.

This function is useful for figuring out the different featuretypes found in a file (for potential removal before creating a FeatureDB).

Returns a dictionary with a key for each item in look_for and a corresponding value that is a dictionary of how many of each unique item were found.

There will always be a feature_count key, indicating how many features were looked at. If limit is provided and the data source contains at least that many features, feature_count will equal limit.

For example, if look_for is ['chrom', 'featuretype'], then the result will be a dictionary like:

{
    'chrom': {
        'chr1': 500,
        'chr2': 435,
        'chr3': 200,
        ...
    },

    'featuretype': {
        'gene': 150,
        'exon': 324,
        ...
    },

    'feature_count': 5000
}
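As a rough illustration of how such a result might be used to decide which featuretypes to drop before building a FeatureDB, the sketch below ranks the counts and filters out unwanted types. The counts, the `rank_counts` helper, the `unwanted` set, and the filename in the comment are all hypothetical. The dictionary is written out by hand so that the example runs without gffutils installed.

```python
# Hypothetical result, shaped like the dictionary above. With gffutils
# installed it would come from a call along the lines of:
#     from gffutils.inspect import inspect
#     result = inspect("annotation.gff",
#                      look_for=["chrom", "featuretype"], verbose=False)
# where "annotation.gff" is a placeholder filename.
result = {
    "chrom": {"chr1": 500, "chr2": 435, "chr3": 200},
    "featuretype": {"gene": 150, "exon": 324, "CDS": 658, "chromosome": 3},
    "feature_count": 1135,
}


def rank_counts(counts):
    """Return (item, count) pairs, most frequent first (hypothetical helper)."""
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


# Featuretypes we have decided, for this example only, not to load.
unwanted = {"chromosome"}

# Featuretypes to keep, ordered from most to least frequent.
keep = [ft for ft, _ in rank_counts(result["featuretype"]) if ft not in unwanted]

for featuretype, n in rank_counts(result["featuretype"]):
    marker = "drop" if featuretype in unwanted else "keep"
    print(f"{featuretype}\t{n}\t{marker}")
```

Note that the value under feature_count is a plain integer, while the other keys map to dictionaries of counts.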
Parameters:
  • data (str, FeatureDB instance, or iterator of Features) – If data is a string, assume it’s a GFF or GTF filename. If it’s a FeatureDB instance, then its all_features() method will be automatically called. Otherwise, assume it’s an iterable of Feature objects.

  • look_for (list) –

    List of things to keep track of. Options are:

    • any attribute of a Feature object, such as chrom, source, start, stop, strand.

    • "attribute_keys", which will look at all the individual attribute keys of each feature.

  • limit (int) – Number of features to look at. Default is no limit.

  • verbose (bool) – Report how many features have been processed.

Return type:

dict
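
To make the roles of look_for, "attribute_keys", and limit concrete, here is a minimal sketch of the kind of tallying described above. It is not the gffutils implementation. `FakeFeature` and `tally` are hypothetical stand-ins that only mimic the documented behaviour on plain Python objects.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a gffutils Feature; only the fields used below.
FakeFeature = namedtuple("FakeFeature", ["chrom", "featuretype", "attributes"])


def tally(features, look_for=("featuretype", "chrom", "attribute_keys"), limit=None):
    """Count unique values for each item in look_for (illustrative only)."""
    # feature_count is always reported, so it does not get its own Counter.
    counters = {name: Counter() for name in look_for if name != "feature_count"}
    feature_count = 0
    for feature in features:
        # Stop once `limit` features have been examined.
        if limit is not None and feature_count >= limit:
            break
        for name, counter in counters.items():
            if name == "attribute_keys":
                # Count every attribute key present on this feature.
                counter.update(feature.attributes.keys())
            else:
                # Count the value of an ordinary attribute such as chrom.
                counter[getattr(feature, name)] += 1
        feature_count += 1
    result = {name: dict(counter) for name, counter in counters.items()}
    result["feature_count"] = feature_count
    return result


features = [
    FakeFeature("chr1", "gene", {"ID": ["g1"]}),
    FakeFeature("chr1", "exon", {"ID": ["e1"], "Parent": ["g1"]}),
    FakeFeature("chr2", "exon", {"ID": ["e2"], "Parent": ["g2"]}),
]

# Only the first two features are examined because limit=2.
summary = tally(features, look_for=["chrom", "attribute_keys"], limit=2)
print(summary)
```

Because limit=2 stops the loop before the third feature, chr2 never appears in the summary, and feature_count is 2.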