metaseq.results_table.EdgeRResults
class metaseq.results_table.EdgeRResults(data, db=None, header_check=True, **kwargs)
Bases: metaseq.results_table.DifferentialExpressionResults
Class for working with results from edgeR.
Just like a DifferentialExpressionResults object, but sets the pval_column, lfc_column, and mean_column to the names used in edgeR’s output.
The underlying pandas.DataFrame is always available as the data attribute.
Any attribute not defined explicitly on this class is looked up on the underlying pandas.DataFrame.
Parameters:
data : string or pandas.DataFrame
    If a string, it is assumed to be a filename and is read with pandas.read_table(data, **import_kwargs).
db : string or gffutils.FeatureDB
    Optional database that can be used to generate features.
import_kwargs : dict
    These arguments are passed to pandas.read_table() if data is a filename.
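The attribute lookup described above (anything not defined on the class is looked for on the underlying DataFrame) can be illustrated with a small delegation pattern. This is a minimal sketch only, not metaseq's actual code: `TinyTable` and `ResultsSketch` are hypothetical stand-ins, and the "FDR" key is just an assumed column name.

```python
class TinyTable:
    """Hypothetical stand-in for pandas.DataFrame, used only for illustration."""

    def __init__(self, rows):
        self.rows = rows  # list of dicts, one per feature

    def head(self, n=5):
        return self.rows[:n]


class ResultsSketch:
    """Wrapper that forwards unknown attributes to its ``data`` attribute."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # __getattr__ runs only after normal lookup has failed, so attributes
        # defined on the wrapper itself always take precedence.
        if name == "data":
            # Guard against infinite recursion if ``data`` is not set yet.
            raise AttributeError(name)
        return getattr(self.data, name)


results = ResultsSketch(TinyTable([{"FDR": 0.01}, {"FDR": 0.5}, {"FDR": 0.2}]))
first_two = results.head(2)  # ``head`` is found on the wrapped table
```

With this pattern the wrapped table stays reachable through `results.data`, while its methods can also be called directly on the wrapper.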
Methods
TSS([upstream, downstream]): Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
TTS([upstream, downstream]): Creates a BED/GFF file of the 3’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
__init__(data[, db, header_check])
align_with(other): Align the dataframe’s index with another.
attach_db(db): Attach a gffutils.FeatureDB for access to features.
changed([thresh, idx]): Changed features.
copy()
disenriched([thresh, idx]): Disenriched features.
downregulated([thresh, idx]): Downregulated features.
enriched([thresh, idx]): Enriched features.
features([ignore_unknown]): Generator of features.
five_prime([upstream, downstream]): Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
genes_in_common(other): Convenience method for getting the genes found in both dataframes.
genes_with_peak(peaks[, transform_func, ...]): Returns a boolean index of genes that have a peak nearby.
ma_plot(thresh[, up_kwargs, dn_kwargs, ...]): MA plot.
radviz(column_names[, transforms]): Radviz plot.
reindex_to(x[, attribute]): Returns a copy that only has rows corresponding to feature names in x.
scatter(x, y[, xfunc, yfunc, xscale, ...]): Do-it-all method for making annotated scatterplots.
strip_unknown_features(): Remove features not found in the gffutils.FeatureDB.
three_prime([upstream, downstream]): Creates a BED/GFF file of the 3’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object.
unchanged([thresh, idx]): Unchanged features.
update(dataframe): Updates the current data with a new dataframe.
upregulated([thresh, idx]): Upregulated features.
-
__init__(data, db=None, header_check=True, **kwargs)
-
TSS(upstream=1, downstream=0)
Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object. Needs an attached database.
Parameters:
upstream, downstream : int
    Number of base pairs upstream and downstream to include.
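How upstream and downstream interact with strand is easy to get wrong, so here is a minimal sketch of a strand-aware 5’ window. It is an illustration under stated assumptions, not the code metaseq or pybedtools runs: `five_prime_window` is a hypothetical helper, and coordinates are assumed to be 0-based and half-open (BED style).

```python
def five_prime_window(start, stop, strand, upstream=1, downstream=0):
    """Return (start, stop) of a window around the 5' end of a feature.

    Coordinates are assumed to be 0-based and half-open, as in BED files;
    that convention is an assumption of this sketch.
    """
    if strand == "-":
        # On the minus strand the 5' end is the larger coordinate, and
        # "upstream" points toward higher coordinates.
        new_start, new_stop = stop - downstream, stop + upstream
    else:
        new_start, new_stop = start - upstream, start + downstream
    # Never extend past the beginning of the chromosome.
    return max(new_start, 0), new_stop


plus_window = five_prime_window(100, 200, "+", upstream=10, downstream=5)
minus_window = five_prime_window(100, 200, "-", upstream=10, downstream=5)
```

The 3’ case (TTS, three_prime) would mirror this by swapping which end of the feature serves as the reference point.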
-
TTS(upstream=0, downstream=1)
Creates a BED/GFF file of the 3’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object. Needs an attached database.
Parameters:
upstream, downstream : int
    Number of base pairs upstream and downstream to include.
-
align_with(other)
Align the dataframe’s index with another.
-
attach_db(db)
Attach a gffutils.FeatureDB for access to features.
Useful if you want to attach a db after this instance has already been created.
Parameters:
db : gffutils.FeatureDB
-
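changed(thresh=0.05, idx=True)
Changed features.
Parameters:
thresh : float
    Only features whose significance value (pval_column) is <= thresh will be returned.
idx : bool
    If True, a boolean index will be returned. If False, a new object will be returned that has been subsetted.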
-
disenriched(thresh=0.05, idx=True)
Disenriched features.
Parameters:
thresh : float
    Only features whose significance value (pval_column) is <= thresh will be returned.
idx : bool
    If True, a boolean index will be returned. If False, a new object will be returned that has been subsetted.
-
downregulated(thresh=0.05, idx=True)
Downregulated features.
Parameters:
thresh : float
    Only features whose significance value (pval_column) is <= thresh will be returned.
idx : bool
    If True, a boolean index will be returned. If False, a new object will be returned that has been subsetted.
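The difference between idx=True and idx=False can be shown with plain Python. This is a minimal sketch, not metaseq's implementation: it assumes that "downregulated" means a significance value at or below thresh together with a negative log fold change, and the column names "FDR" and "logFC" are assumed stand-ins for pval_column and lfc_column.

```python
def downregulated_sketch(rows, thresh=0.05, idx=True,
                         pval_column="FDR", lfc_column="logFC"):
    """Return a boolean mask (idx=True) or the matching rows (idx=False)."""
    mask = [r[pval_column] <= thresh and r[lfc_column] < 0 for r in rows]
    if idx:
        return mask
    return [r for r, keep in zip(rows, mask) if keep]


rows = [
    {"gene": "a", "FDR": 0.001, "logFC": -2.0},  # significant, down
    {"gene": "b", "FDR": 0.200, "logFC": -1.5},  # down, but not significant
    {"gene": "c", "FDR": 0.010, "logFC": 1.2},   # significant, but up
]
mask = downregulated_sketch(rows)
subset = downregulated_sketch(rows, idx=False)
```

A boolean mask is convenient when several selections need to be combined before subsetting, while idx=False returns the subset directly.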
-
enriched(thresh=0.05, idx=True)
Enriched features.
Parameters:
thresh : float
    Only features whose significance value (pval_column) is <= thresh will be returned.
idx : bool
    If True, a boolean index will be returned. If False, a new object will be returned that has been subsetted.
-
features(ignore_unknown=False)
Generator of features.
If a gffutils.FeatureDB is attached, returns a pybedtools.Interval for every feature in the dataframe’s index.
Parameters:
ignore_unknown : bool
    If True, silently ignores features that are not found in the db.
-
five_prime(upstream=1, downstream=0)
Creates a BED/GFF file of the 5’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object. Needs an attached database.
Parameters:
upstream, downstream : int
    Number of base pairs upstream and downstream to include.
-
genes_in_common(other)
Convenience method for getting the genes found in both dataframes.
-
genes_with_peak(peaks, transform_func=None, split=False, intersect_kwargs=None, id_attribute='ID', *args, **kwargs)
Returns a boolean index of genes that have a peak nearby.
Parameters:
peaks : string or pybedtools.BedTool
    If a string, it is assumed to be the filename of a BED/GFF/GTF file of intervals; otherwise the pybedtools.BedTool object is used directly.
transform_func : callable
    This function will be applied to each gene object returned by self.features(). Additional args and kwargs are passed to transform_func. For example, if you’re looking for peaks within 1 kb upstream of TSSs, then pybedtools.featurefuncs.TSS would be a useful transform_func, and you could supply additional kwargs of upstream=1000 and downstream=0.
    This function can return iterables of features, too. For example, you might want to look for peaks falling within the exons of a gene. In this case, transform_func should return an iterable of pybedtools.Interval objects. The only requirement is that the name field of any feature matches the index of the dataframe.
intersect_kwargs : dict
    kwargs passed to pybedtools.BedTool.intersect.
id_attribute : str
    The attribute in the GTF or GFF file that contains the ID of the gene. For meaningful results to be returned, a gene’s ID must also be found in the index of the dataframe.
    For GFF files, you’d typically use id_attribute="ID". For GTF files, you’d typically use id_attribute="gene_id".
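The role of transform_func is perhaps easiest to see in a stripped-down model of the overlap test. The sketch below is only an illustration of the idea, using plain tuples instead of pybedtools objects; `overlaps`, `upstream_window`, and `genes_with_peak_sketch` are hypothetical helpers, and 0-based half-open coordinates are assumed.

```python
def overlaps(a, b):
    """True if two (chrom, start, stop) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def upstream_window(gene, upstream=1000, downstream=0):
    """Hypothetical transform_func: window around the 5' end of a gene."""
    chrom, start, stop, strand = gene
    if strand == "-":
        return (chrom, max(stop - downstream, 0), stop + upstream)
    return (chrom, max(start - upstream, 0), start + downstream)


def genes_with_peak_sketch(genes, peaks, transform_func=None, **kwargs):
    """Map each gene name to True if any peak overlaps its (transformed) interval."""
    result = {}
    for name, gene in genes.items():
        region = transform_func(gene, **kwargs) if transform_func else gene[:3]
        # A transform may return a single interval or an iterable of them.
        regions = [region] if isinstance(region, tuple) else list(region)
        result[name] = any(overlaps(r, p) for r in regions for p in peaks)
    return result


genes = {
    "geneA": ("chr1", 5000, 8000, "+"),
    "geneB": ("chr1", 20000, 22000, "-"),
    "geneC": ("chr2", 5000, 8000, "+"),
}
peaks = [("chr1", 4500, 4700), ("chr1", 21000, 21100)]

in_gene_body = genes_with_peak_sketch(genes, peaks)
in_promoter = genes_with_peak_sketch(
    genes, peaks, transform_func=upstream_window, upstream=1000, downstream=0
)
```

Here geneB has a peak inside its body but none in its upstream window, while geneA shows the opposite pattern, which is why the choice of transform changes the resulting index.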
-
ma_plot(thresh, up_kwargs=None, dn_kwargs=None, zero_line=None, **kwargs)
MA plot.
Plots the average read count across treatments (x-axis) vs the log2 fold change (y-axis).
Additional kwargs are passed to self.scatter (useful ones might include genes_to_highlight).
Parameters:
thresh : float
    Features with values <= thresh will be highlighted in the plot.
up_kwargs, dn_kwargs : None or dict
    Kwargs passed to matplotlib’s scatter(), used for styling up/down regulated features (defined by thresh and col).
zero_line : None or dict
    Kwargs passed to matplotlib’s axhline(0).
-
radviz(column_names, transforms={}, **kwargs)
Radviz plot.
Useful for exploratory visualization, a radviz plot can show multivariate data in 2D. Conceptually, the variables (here, specified in column_names) are distributed evenly around the unit circle. Then each point (here, each row in the dataframe) is attached to each variable by a spring, where the stiffness of the spring is proportional to the value of the corresponding variable. The final position of a point represents the equilibrium position with all springs pulling on it.
In practice, each variable is normalized to 0-1 (by subtracting the minimum and dividing by the range).
This is a very exploratory plot. The order of column_names will affect the results, so it’s best to try a couple of different orderings. For other caveats, see [1].
Additional kwargs are passed to self.scatter, so subsetting, callbacks, and other configuration can be performed using options for that method (e.g., genes_to_highlight is particularly useful).
Parameters:
column_names : list
    Which columns of the dataframe to consider. The columns provided should only include numeric data, and they should not contain any NaN, inf, or -inf values.
transforms : dict
    Dictionary mapping column names to transformations that will be applied just for the radviz plot. For example, np.log1p is a useful function. If a column name is not in this dictionary, it will be used as-is.
ax : matplotlib.Axes
    If not None, then plot the radviz on this axes. If None, then a new figure will be created.
kwargs : dict
    Additional arguments are passed to self.scatter. Note that not all possible kwargs for self.scatter are necessarily useful for a radviz plot (for example, marginal histograms would not be meaningful).
Notes
This method adds two new variables to self.data: “radviz_x” and “radviz_y”. It then calls the self.scatter method, using these new variables.
The data transformation was adapted from the pandas.tools.plotting.radviz function.
References
[1] Hoffman, P.E. et al. (1997) DNA visual and analytic data mining. In the Proceedings of the IEEE Visualization. Phoenix, AZ, pp. 437-441.
[2] http://www.agocg.ac.uk/reports/visual/casestud/brunsdon/radviz.htm
[3] http://pandas.pydata.org/pandas-docs/stable/visualization.html#radviz
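The spring analogy in the radviz description amounts to a weighted average: each point sits at the mean of the variable anchors, weighted by its normalized values. The following is a minimal pure-Python sketch of that calculation under stated assumptions, not the code metaseq or pandas runs. `radviz_xy` is a hypothetical helper, values are min-max scaled, and placing an all-zero row at the origin is a choice made for this sketch.

```python
import math


def radviz_xy(rows):
    """Return one (x, y) position per row for a list of equal-length numeric rows."""
    n_vars = len(rows[0])
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]

    # One anchor per variable, spaced evenly around the unit circle.
    anchors = [
        (math.cos(2 * math.pi * i / n_vars), math.sin(2 * math.pi * i / n_vars))
        for i in range(n_vars)
    ]

    points = []
    for row in rows:
        # Min-max scale each value to 0-1; a constant column gets weight 0.
        weights = [
            (v - lo) / span if span else 0.0
            for v, lo, span in zip(row, lows, spans)
        ]
        total = sum(weights)
        if total == 0:
            # No spring pulls on this point; the origin is this sketch's choice.
            points.append((0.0, 0.0))
            continue
        # Equilibrium position: anchors averaged with the weights as stiffness.
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        points.append((x, y))
    return points


rows = [[10, 0, 0], [0, 5, 0], [0, 0, 2], [10, 5, 2]]
points = radviz_xy(rows)
```

A row dominated by one variable lands on that variable's anchor, while a row with equal normalized values in every column is pulled to the center. Because anchors follow column order, reordering the columns moves the points.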
-
reindex_to(x, attribute='Name')
Returns a copy that only has rows corresponding to feature names in x.
Parameters:
x : str or pybedtools.BedTool
    BED, GFF, GTF, or VCF file whose “Name” field (that is, the value returned by feature['Name']), or any other attribute specified by attribute, holds the feature names.
attribute : str
    Attribute containing the name of the feature to use as the index.
-
scatter(x, y, xfunc=None, yfunc=None, xscale=None, yscale=None, xlab=None, ylab=None, genes_to_highlight=None, label_genes=False, marginal_histograms=False, general_kwargs={'picker': True, 'alpha': 0.2, 'color': 'k'}, general_hist_kwargs=None, offset_kwargs={}, label_kwargs=None, ax=None, one_to_one=None, callback=None, xlab_prefix=None, ylab_prefix=None, sizefunc=None, hist_size=0.3, hist_pad=0.0, nan_offset=0.015, pos_offset=0.99, linelength=0.01, neg_offset=0.005, figure_kwargs=None)
Do-it-all method for making annotated scatterplots.
Parameters:
x, y : array-like
    Variables to plot. Must be column names in the self.data DataFrame, for example “baseMeanA” and “baseMeanB”.
xfunc, yfunc : callable
    Functions to apply to x and y, respectively, before plotting (for example, a log2 function). The default in the signature, None, applies no transformation.
xlab, ylab : string
    Labels for the x and y axes; the default is to combine the function names of xfunc and yfunc with the variable names x and y, e.g., “log2(baseMeanA)”.
ax : None or Axes object
    If ax=None, makes a new figure and returns the Axes object; otherwise, plots onto ax.
general_kwargs : dict
    Kwargs for matplotlib.scatter; specifies how all points look.
genes_to_highlight : list of (index, dict) tuples
    Provides fine control over colors. It is a list of (ind, kwargs) tuples, where each ind specifies genes to plot with kwargs. Each dictionary updates a copy of general_kwargs. If a kwargs dictionary has a “name” key, its value must be a list that’s the same length as ind. It will be used to label the genes in ind using label_kwargs.
callback : callable
    Function to call upon clicking a point. Must accept a single argument, which is the gene ID. Default is to print the gene name, but an example of another useful callback would be a mini-browser connected to a genomic_signal object from which the expression data were calculated.
one_to_one : None or dict
    If not None, a dictionary of matplotlib.plot kwargs that will be used to plot a 1:1 line.
label_kwargs : dict
    Kwargs for labeled genes (e.g., dict(style='italic')). Will only be used if an entry in genes_to_highlight has a name key.
offset_kwargs : dict
    Kwargs to be passed to matplotlib.transforms.offset_copy, used for adjusting the positioning of gene labels in relation to the actual point.
xlab_prefix, ylab_prefix : str
    Optional label prefix that will be added to the beginning of xlab and/or ylab.
hist_size : float
    Size of marginal histograms.
hist_pad : float
    Spacing between marginal histograms.
nan_offset, pos_offset, neg_offset : float
    Offset, in units of “fraction of axes”, for the NaN, +inf, and -inf “rug plots”.
linelength : float
    Line length for the rug plots.
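The nan_offset, pos_offset, and neg_offset parameters suggest that values which become NaN or infinite after xfunc/yfunc are drawn as rug plots at the edges of the axes rather than as ordinary points. How such values might be separated out is sketched below. This is an assumption-laden illustration, not the method's actual logic: `safe_log2` and `partition_for_rugs` are hypothetical helpers, and `safe_log2` mimics array-style log2 behavior (0 gives -inf, negatives give NaN).

```python
import math


def safe_log2(value):
    """log2 that returns -inf for 0 and NaN for negatives instead of raising."""
    if value == 0:
        return float("-inf")
    if value < 0:
        return float("nan")
    return math.log2(value)


def partition_for_rugs(values, func=None):
    """Group value indices by how the transformed value would be drawn."""
    groups = {"finite": [], "nan": [], "pos_inf": [], "neg_inf": []}
    for i, value in enumerate(values):
        transformed = func(value) if func is not None else value
        if math.isnan(transformed):
            key = "nan"
        elif math.isinf(transformed):
            key = "pos_inf" if transformed > 0 else "neg_inf"
        else:
            key = "finite"
        groups[key].append(i)
    return groups


groups = partition_for_rugs([8.0, 0.0, -1.0, float("inf"), 2.0], func=safe_log2)
```

Only the "finite" indices would go into the main scatter; the other three groups are the candidates for the rug marks positioned by the offset parameters.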
-
strip_unknown_features()
Remove features not found in the gffutils.FeatureDB. This will typically include ‘ambiguous’, ‘no_feature’, etc., but can also be useful if the database was created from a different annotation than the one used to create the table.
-
three_prime(upstream=0, downstream=1)
Creates a BED/GFF file of the 3’ end of each feature represented in the table and returns the resulting pybedtools.BedTool object. Needs an attached database.
Parameters:
upstream, downstream : int
    Number of base pairs upstream and downstream to include.
-
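unchanged(thresh=0.05, idx=True)
Unchanged features.
Parameters:
thresh : float
    Significance threshold, as used by changed().
idx : bool
    If True, a boolean index will be returned. If False, a new object will be returned that has been subsetted.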
-
update(dataframe)
Updates the current data with a new dataframe.
This extra step is required to get around the fancy pandas.DataFrame indexing (like .ix, .iloc, etc).
-
upregulated(thresh=0.05, idx=True)
Upregulated features.
Parameters:
thresh : float
    Only features whose significance value (pval_column) is <= thresh will be returned.
idx : bool
    If True, a boolean index will be returned. If False, a new object will be returned that has been subsetted.
-