Changes in v0.10
Expand ~ to the user’s home directory for filenames (issue #105).
When merging, make merging attributes optional (issue #107)
Use a proper context manager for open files, fixes issue #110.
Dramatically improved merging routine – many thanks to Nolan Wood @innovate-invent (#130).
Previously, when merging, the second feature’s attributes were not deep-copied, resulting in unintended changes to the underlying dict (#133, thanks Nolan Wood @innovate-invent).
Fixed an issue where, when imputing intron features, attributes were being pulled from the first (or last) exon (#139, thanks @stekaz).
Support creating Feature objects using empty values for attributes (#144).
Ensure that tests work post-installation (#145, https://github.com/daler/gffutils/pull/145; thanks Michael Crusoe @mr-c)
Improvements to FeatureDB.update, especially with respect to handling autoincrementing feature IDs. Previously, upon updating a db with another, autoincrement integers would restart at 1. Thanks Nolan Wood (@innovate-invent) and @abhishekkumaresan (#149)
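The deep-copy fix from #133 can be illustrated with plain dictionaries. This is a minimal sketch and not the actual gffutils merging routine: `merge_attributes` is a hypothetical helper, and attributes are assumed to be dicts that map keys to lists of values.

```python
import copy


def merge_attributes(first, second):
    """Return a new dict combining two attribute dicts (hypothetical helper).

    Both inputs are deep-copied, so changing the merged result never
    modifies the dictionaries that were passed in.
    """
    merged = copy.deepcopy(first)
    for key, values in copy.deepcopy(second).items():
        if key not in merged:
            # Safe to reuse: `values` belongs to the copy, not to `second`.
            merged[key] = values
            continue
        for value in values:
            if value not in merged[key]:
                merged[key].append(value)
    return merged


a = {"gene_id": ["g1"], "tag": ["basic"]}
b = {"gene_id": ["g1"], "tag": ["CCDS"], "note": ["x"]}
merged = merge_attributes(a, b)
merged["note"].append("changed after merge")
print(merged)
print(b["note"])  # the second feature's attributes are untouched
```

Without the deep copy of `second`, the list stored under `note` would be shared between `merged` and `b`, and the final `append` would silently change `b` as well.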
Changes in v0.9
Long-overdue release with performance improvements and better handling of corner-case GFF and GTF files.
performance tests (thanks Andrew Lando)
performance improvements by building additional indexes (thanks Andrew Lando)
performance improvements by running analyze features on the created table (thanks Andrew Lando). Existing databases that have not had this run will trigger a warning suggesting that it be run to speed up queries dramatically.
add test for corner-case GTFs (issue #79)
add fix for corner-case GFFs where "=" is both a separator between fields as well as part of a value inside a field even when not quoted (issue #82)
fix handling of corner-case GFFs that are completely missing a start or end position (issue #85)
improvements to test framework
All percent-encoded characters are decoded upon parsing (regardless of whether the GFF3 spec says they should have been encoded in the first place), and then re-encoded when converting the Feature to a string (issue #98). Only characters specified in the GFF3 spec are re-encoded. Note that some GFF files have spaces encoded as %20, but spaces should not be encoded according to the GFF3 spec. In this case, they will be decoded into spaces upon parsing, but not re-encoded when converting to a string. Set gffutils.constants.ignore_url_escape_characters=True to disable any encoding/decoding behavior.
improved testing framework
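The decode-everything, re-encode-selectively behavior described for issue #98 can be sketched with the standard library. This is an illustration only, not the gffutils implementation: `RESERVED`, `decode_value`, and `encode_value` are hypothetical names, and the reserved set is assumed to be the column 9 characters `;`, `=`, `&`, `,` plus `%`, tab, newline, and carriage return (other control characters are left out for brevity).

```python
from urllib.parse import unquote

# Assumed reserved set: the column 9 separators, "%" itself, and the
# tab, newline, and carriage-return characters.
RESERVED = {c: "%{:02X}".format(ord(c)) for c in ";=&,%\t\n\r"}


def decode_value(raw):
    """Decode every percent-escape, whether or not the spec required it."""
    return unquote(raw)


def encode_value(text):
    """Re-encode only the reserved characters; spaces are left as-is."""
    return "".join(RESERVED.get(ch, ch) for ch in text)


raw = "kinase%20domain%3B%20putative"
decoded = decode_value(raw)
print(decoded)                # kinase domain; putative
print(encode_value(decoded))  # kinase domain%3B putative
```

The `%20` escapes do not survive the round trip because a space is not in the reserved set, while the semicolon is escaped again so that it cannot be mistaken for an attribute separator.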
Changes in v0.8.7.1
Fixes bug in gffutils.pybedtools_integration.tsses where iterating over large
databases and using the as_bed6=True argument could cause a deadlock.
Changes in v0.8.7
New module gffutils.pybedtools_integration. In particular, the
gffutils.pybedtools_integration.tsses() function provides many options
for creating a GTF, GFF, or BED file of transcription start sites (TSSes) from
a gffutils database.
Changes in v0.8.6.1
Only a warning – and not an ImportError – is raised if BioPython is not installed.
Lots of updates in the testing framework to use docker containers on travis-ci.org.
Changes in v0.8.4
To summarize, there are some publicly available GTF files that don’t match the
GTF specification and have transcripts and genes already added. By default,
gffutils assumes a GTF matches spec and that there are no transcript or gene
features. It infers transcript and gene extents from exons alone. So for these
off-spec GTF files, gffutils would do a lot of extra work inferring the
transcript and gene extents, and then it would try to insert the inferred
features back into the database. Since they were already there, this triggered
the feature-merging routines.
The point is, if you didn’t specifically tell gffutils to skip this step, all
of this extra merging work would cause database creation to take far longer
than it should have (possibly 10-100x longer).
With v0.8.4, if you create a database out of a GTF file and there are
transcript or gene features in it, gffutils will emit a warning and
a recommendation to disable inferring transcripts and/or genes to speed things
up.
The new keyword arguments for controlling this in create_db() are
disable_infer_transcripts and disable_infer_genes. These are both set
to False by default.
The previous, soon-to-be-deprecated way of doing this was to use
infer_gene_extent=False. The new equivalent is to use
disable_infer_transcripts=True and disable_infer_genes=True. If you use the
old method, it will be automatically converted to the new method and a warning
will be emitted.
This new behavior is more flexible since it gives us the ability to infer transcripts if genes exist, or infer genes if transcripts exist (rather than the previous all-or-nothing approach).
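The conversion from the old keyword to the new ones can be sketched as follows. This is a hedged illustration of the described behavior, not the code gffutils actually runs: `resolve_infer_options` is a hypothetical helper, and the warning class and message are assumptions.

```python
import warnings


def resolve_infer_options(infer_gene_extent=None,
                          disable_infer_transcripts=False,
                          disable_infer_genes=False):
    """Map the old keyword onto the two new ones (hypothetical helper)."""
    if infer_gene_extent is not None:
        # Old-style call: warn, then translate to the new flags.
        warnings.warn(
            "infer_gene_extent will be deprecated; use "
            "disable_infer_transcripts and disable_infer_genes instead",
            DeprecationWarning,
            stacklevel=2,
        )
        disable_infer_transcripts = not infer_gene_extent
        disable_infer_genes = not infer_gene_extent
    return disable_infer_transcripts, disable_infer_genes


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    options = resolve_infer_options(infer_gene_extent=False)

print(options)      # (True, True)
print(len(caught))  # 1
```

Keeping the two flags separate is what allows the mixed cases, for example `resolve_infer_options(disable_infer_genes=True)`, which still infers transcripts while leaving existing gene features alone.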
Changes in v0.8.3.1
Thanks to Sven-Eric Schelhorn (@schellhorn on github), this version fixes a bug where, if multiple gffutils processes try to create databases from GTF files simultaneously, the resulting databases would be incomplete and incorrect.
Changes in v0.8.3
New inspect.inspect() function for examining the contents of a GFF or GTF file.
New Feature.sequence() method to extract the sequence for a feature (uses pyfaidx).
When creating or updating a database, the provided transform function can return a value evaluating to False, which will cause that feature to be skipped.
create_db() can use remote gzipped files as input
New FeatureDB.delete() method to delete features from a database
Initial support for BioPython SeqFeature objects
limit kwarg can now be used for FeatureDB.children() to restrict returned features to a genomic range
FeatureDB.interfeatures() can now update attributes
Much more flexible FeatureDB.region() that allows slice-like operations.
Improved FeatureDB.update() so that entire features (rather than just attributes) can be replaced or updated (thanks Rintze Zelle for ideas and testing)
fix a bug when using a function as an argument to the create_db() function (thanks @moritzbuck on github)
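The transform behavior listed above (a falsy return value skips the feature) can be sketched without gffutils. Everything here is a stand-in for illustration: the `Feature` namedtuple replaces the real Feature class, and `skip_short_exons` and `apply_transform` are hypothetical names, not part of the library.

```python
from collections import namedtuple

# Stand-in for a parsed feature; gffutils uses its own Feature class.
Feature = namedtuple("Feature", ["seqid", "featuretype", "start", "end"])


def skip_short_exons(feature):
    """Example transform: drop exons shorter than 10 bp, keep the rest."""
    if feature.featuretype == "exon" and feature.end - feature.start + 1 < 10:
        return False
    return feature


def apply_transform(features, transform):
    """Yield transformed features, skipping any falsy transform result."""
    for feature in features:
        result = transform(feature)
        if not result:
            continue
        yield result


features = [
    Feature("chr1", "gene", 1, 500),
    Feature("chr1", "exon", 1, 5),
    Feature("chr1", "exon", 100, 250),
]
kept = list(apply_transform(features, skip_short_exons))
print([f.featuretype for f in kept])  # ['gene', 'exon']
```

A transform written this way acts as both a modifier and a filter, so a separate filtering pass over the input file is not needed.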