pybedtools.bedtool.BedTool.randomstats
BedTool.randomstats(other, iterations, new=False, genome_fn=None, include_distribution=False, **kwargs)
Dictionary of results from many randomly shuffled intersections.
Sends args and kwargs to BedTool.randomintersection() and compiles the results into a dictionary with useful stats. Requires NumPy.
If include_distribution is True, the dictionary includes the full distribution of randomized overlaps; otherwise, the distribution is deleted to save memory.
This is one possible way of assigning significance to overlaps between two files. See, for example:
Negre N, Brown CD, Shah PK, Kheradpour P, Morrison CA, et al. 2010 A Comprehensive Map of Insulator Elements for the Drosophila Genome. PLoS Genet 6(1): e1000814. doi:10.1371/journal.pgen.1000814
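To make the idea of "compiling results into a dictionary" concrete, here is a minimal standalone sketch of how an observed overlap count might be summarized against a list of shuffled counts. It is not the pybedtools implementation: the helper name summarize_shuffles is hypothetical, the counts are made up, the mid-rank treatment of ties is an assumption, and it uses the standard library rather than NumPy so that it runs anywhere. Only the three dictionary keys shown in the example usage on this page are reproduced.

```python
from statistics import median


def summarize_shuffles(actual, randomized):
    """Hypothetical helper (not part of pybedtools): compare an observed
    overlap count with overlap counts from shuffled intervals."""
    n = len(randomized)
    if n == 0:
        raise ValueError("need at least one randomized count")
    below = sum(1 for c in randomized if c < actual)
    equal = sum(1 for c in randomized if c == actual)
    return {
        "actual": actual,
        "median randomized": float(median(randomized)),
        # Assumption: mid-rank percentile, where ties count for half.
        "percentile": 100.0 * (below + 0.5 * equal) / n,
    }


# Made-up counts for illustration only; these are not pybedtools output.
counts = [1, 2, 2, 3, 1, 2, 4, 2, 1, 2]
stats = summarize_shuffles(3, counts)
print(stats["median randomized"])
print(stats["percentile"])
```

In this toy data, eight of the ten shuffled counts fall below the observed count and one ties with it, which is what places the observed value high in the randomized distribution.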
Example usage:
Make chromsizes a very small genome for this example:
>>> import pybedtools
>>> chromsizes = {'chr1':(1,1000)}
>>> a = pybedtools.example_bedtool('a.bed').set_chromsizes(chromsizes)
>>> b = pybedtools.example_bedtool('b.bed')
>>> try:
...     results = a.randomstats(b, 100, debug=True)
... except ImportError:
...     pass
results is a dictionary that you can inspect.
(Note that the following examples are not run as part of the doctests to avoid forcing users to install NumPy just to pass tests)
The actual overlap:
print(results['actual'])
3
The median of all randomized overlaps:
print(results['median randomized'])
2.0
The percentile of the actual overlap in the distribution of randomized overlaps, which can be used to get an empirical p-value:
print(results['percentile'])
90.0
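One simple way to turn the percentile into an empirical p-value is sketched below. This is an assumption about how a reader might use the value, not something randomstats does for you: the helper empirical_p_value is hypothetical, and it gives a one-sided p-value for enrichment (more overlap than expected by chance).

```python
def empirical_p_value(percentile):
    """Hypothetical helper (not part of pybedtools): one-sided p-value
    for enrichment from a percentile given on a 0-100 scale."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    # Fraction of the randomized distribution at or above the actual value,
    # under the assumption that the percentile counts values below it.
    return (100.0 - percentile) / 100.0


p = empirical_p_value(90.0)
print(round(p, 3))
```

Note that the resolution of such a p-value is limited by the number of iterations: with 100 shuffles, the smallest nonzero step is 0.01, so more iterations are needed to resolve smaller p-values.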