pybedtools.bedtool.BedTool.randomstats

BedTool.randomstats(other, iterations, new=False, genome_fn=None, include_distribution=False, **kwargs)

Dictionary of results from many randomly shuffled intersections.

Passes its arguments and keyword arguments to BedTool.randomintersection() and compiles the results into a dictionary of summary statistics. Requires NumPy.

If include_distribution is True, the returned dictionary includes the full distribution of randomized overlaps; otherwise, the distribution is deleted to save memory.

This is one possible way of assigning significance to overlaps between two files. See, for example:

Negre N, Brown CD, Shah PK, Kheradpour P, Morrison CA, et al. (2010) A Comprehensive Map of Insulator Elements for the Drosophila Genome. PLoS Genet 6(1): e1000814. doi:10.1371/journal.pgen.1000814
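To make the idea of summarizing a shuffled distribution concrete, here is a minimal sketch in plain Python. It is not the pybedtools implementation: the function name summarize_overlaps, the input counts, and the tie-handling rule for the percentile are all assumptions made for illustration. Only the three dictionary keys are taken from the example output shown on this page.

```python
from statistics import median


def summarize_overlaps(actual, randomized):
    """Compare an observed overlap count with counts from shuffled data.

    Illustrative only; this is not how pybedtools computes its results.
    """
    n = len(randomized)
    if n == 0:
        raise ValueError("need at least one randomized count")
    # Assumption: the percentile is the percentage of randomized counts
    # that are less than or equal to the actual count.
    at_or_below = sum(1 for count in randomized if count <= actual)
    return {
        "actual": actual,
        "median randomized": float(median(randomized)),
        "percentile": 100.0 * at_or_below / n,
    }


# Hypothetical overlap counts from 10 shuffles, made up for this sketch.
randomized_counts = [1, 2, 2, 1, 3, 2, 4, 2, 1, 2]
summary = summarize_overlaps(3, randomized_counts)
print(summary["actual"], summary["median randomized"], summary["percentile"])
```

A real run would use many more iterations; ten counts are used here only to keep the arithmetic easy to follow.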

Example usage:

Make chromsizes a very small genome for this example:

>>> chromsizes = {'chr1':(1,1000)}
>>> a = pybedtools.example_bedtool('a.bed').set_chromsizes(chromsizes)
>>> b = pybedtools.example_bedtool('b.bed')
>>> try:
...     results = a.randomstats(b, 100, debug=True)
... except ImportError:
...     pass

results is a dictionary that you can inspect.

(Note that the following examples are not run as doctests, to avoid forcing users to install NumPy just to pass the tests.)

The actual overlap:

print(results['actual'])
3

The median of all randomized overlaps:

print(results['median randomized'])
2.0

The percentile of the actual overlap in the distribution of randomized overlaps, which can be used to get an empirical p-value:

print(results['percentile'])
90.0
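
One way to turn the percentile into an empirical p-value is sketched below, assuming a one-sided test for enrichment (more overlap than expected by chance). The helper empirical_p_value is hypothetical and not part of pybedtools, and the lower bound of 1/iterations is a conservative convention assumed here, not something the library applies. The exact result also depends on how ties with the actual overlap are counted in the percentile.

```python
def empirical_p_value(percentile, iterations):
    """One-sided empirical p-value for enrichment from a percentile (0-100).

    Hypothetical helper for illustration; not a pybedtools function.
    """
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Fraction of the randomized distribution above the actual overlap.
    p = (100.0 - percentile) / 100.0
    # Assumption: a finite number of shuffles cannot resolve p-values
    # smaller than 1/iterations, so report that bound instead of 0.
    return max(p, 1.0 / iterations)


# Values from the example above: percentile 90.0 from 100 iterations.
p = empirical_p_value(90.0, 100)
print(p)
```

With only 100 iterations the smallest p-value this can report is 0.01, so more iterations are needed to resolve smaller values.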