pybedtools.contrib.IntersectionMatrix

class pybedtools.contrib.IntersectionMatrix(beds, genome, iterations, dbfn=None, force=False)[source]

Class to handle many pairwise comparisons of interval files

__init__(beds, genome, iterations, dbfn=None, force=False)[source]

Class to handle and keep track of many pairwise comparisons of interval files.

A lightweight database approach is used to minimize computational time.

The database stores filenames and calculation timestamps; re-calculating a matrix using the same interval files will only re-calculate values for those files whose modification times are newer than the timestamp in the database.
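The timestamp check described above can be illustrated with a minimal sketch using only the standard library. This is not pybedtools's implementation: the table name `results`, its columns, and the helpers `open_db`, `record`, and `is_fresh` are hypothetical names chosen for illustration.

```python
import os
import sqlite3
import time


def open_db(path=":memory:"):
    """Open (or create) a small results database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "fa TEXT, fb TEXT, value REAL, timestamp REAL, "
        "PRIMARY KEY (fa, fb))"
    )
    return conn


def record(conn, fa, fb, value):
    """Store a result together with the time it was computed."""
    conn.execute(
        "INSERT OR REPLACE INTO results (fa, fb, value, timestamp) "
        "VALUES (?, ?, ?, ?)",
        (fa, fb, value, time.time()),
    )
    conn.commit()


def is_fresh(conn, fa, fb):
    """Return True only if a stored result exists and is newer than both files."""
    row = conn.execute(
        "SELECT timestamp FROM results WHERE fa = ? AND fb = ?", (fa, fb)
    ).fetchone()
    if row is None:
        return False  # nothing stored yet, so the pair must be computed
    newest_input = max(os.path.getmtime(fa), os.path.getmtime(fb))
    return row[0] > newest_input
```

A caller would skip any pair for which `is_fresh` returns True and recompute the rest, so editing one input file invalidates only the comparisons that involve it.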

beds is a list of bed files.

genome is the string assembly name, e.g., “hg19” or “dm3”.

dbfn is the filename of the database you’d like to use to track what’s been completed.

Example usage:

First, get a list of bed files to use:

    >>> beds = [
    ...     pybedtools.example_filename(i) for i in [
    ...         'Cp190_Kc_Bushey_2009.bed',
    ...         'CTCF_Kc_Bushey_2009.bed',
    ...         'SuHw_Kc_Bushey_2009.bed',
    ...         'BEAF_Kc_Bushey_2009.bed'
    ...     ]]

Set some parameters. “dm3” is the genome to use; info will be stored in “ex.db”. force=True means to overwrite what’s in the database:

    >>> # In practice, you'll want many more iterations...
    >>> im = IntersectionMatrix(beds, 'dm3',
    ...     dbfn='ex.db', iterations=3, force=True)
    >>> # Use 4 CPUs for randomization
    >>> matrix = im.create_matrix(verbose=True, processes=4)
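Once a matrix is available, its values can be laid out as a pairwise table. The sketch below assumes the matrix is a dictionary of dictionaries indexed as `matrix[fa][fb]`, with each final value supporting lookup by key (plain dicts stand in here for database rows). The function `format_matrix`, the key name `actual`, and the sample numbers are hypothetical and do not reproduce the output of print_matrix.

```python
def format_matrix(matrix, key):
    """Return a tab-separated pairwise table of matrix[fa][fb][key]."""
    names = sorted(matrix)
    # Header row: an empty corner cell followed by the column file names.
    lines = ["\t".join([""] + names)]
    for fa in names:
        cells = [str(matrix[fa][fb][key]) for fb in names]
        lines.append("\t".join([fa] + cells))
    return "\n".join(lines)


# Made-up values for illustration; plain dicts stand in for database rows.
matrix = {
    "a.bed": {"a.bed": {"actual": 10}, "b.bed": {"actual": 4}},
    "b.bed": {"a.bed": {"actual": 4}, "b.bed": {"actual": 7}},
}

table = format_matrix(matrix, "actual")
print(table)
```

Returning a string rather than printing directly keeps the formatting step easy to test and to redirect to a file.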

Methods

__init__(beds, genome, iterations[, dbfn, force])

Class to handle and keep track of many pairwise comparisons of interval files.

add_row(results)

Inserts data into the db. results is a dictionary as returned by BedTool.randomstats.

create_matrix([verbose])

Returns a matrix (implemented as a dictionary) whose final values are sqlite3.Row objects from the database.

done(fa, fb, iterations)

Retrieves the row from the db and returns True only if a row exists and its timestamp is newer than the input files.

get_row(fa, fb, iterations)

Return the sqlite3.Row from the database corresponding to files fa and fb; returns None if not found.

print_matrix(matrix, key)

Prints a pairwise matrix of values.

run_and_insert(fa, fb, **kwargs)
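
The lookup behavior listed for get_row above (a sqlite3.Row on a hit, None when nothing is found) matches how the standard sqlite3 module behaves when `row_factory` is set to `sqlite3.Row`. The sketch below shows that behavior in isolation; the table name `intersections`, its columns, and the `lookup` helper are assumptions made for illustration, not the schema pybedtools uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # fetched rows become sqlite3.Row objects

# Hypothetical schema: one row per (filea, fileb, iterations) comparison.
conn.execute(
    "CREATE TABLE intersections "
    "(filea TEXT, fileb TEXT, iterations INTEGER, actual REAL)"
)
conn.execute(
    "INSERT INTO intersections VALUES (?, ?, ?, ?)",
    ("a.bed", "b.bed", 3, 42.0),
)


def lookup(conn, fa, fb, iterations):
    """Return the matching sqlite3.Row, or None if no such row exists."""
    return conn.execute(
        "SELECT * FROM intersections "
        "WHERE filea = ? AND fileb = ? AND iterations = ?",
        (fa, fb, iterations),
    ).fetchone()


hit = lookup(conn, "a.bed", "b.bed", 3)
miss = lookup(conn, "a.bed", "c.bed", 3)

print(hit["actual"])  # values are accessible by column name
print(miss)
```

Because `fetchone()` returns None for an empty result, a caller can test the return value directly before reading any columns.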