pybedtools.contrib.long_range_interaction.cis_trans_interactions¶
- pybedtools.contrib.long_range_interaction.cis_trans_interactions(iterator, n, extra, verbose=True)[source]¶
Converts the output from
tag_bedpe
into a pandas DataFrame containing information about regions that contact each other in cis (same fragment) or trans (different fragments).For example, given a BEDPE file representing 3D interactions in the genome, we want to identify which transcription start sites are connected to distal regions containing a peak.
>>> bedpe = pybedtools.example_bedtool('test_bedpe.bed') >>> print(bedpe) chr1 1 10 chr1 50 90 pair1 5 + - x1 chr1 2 15 chr1 200 210 pair2 1 + + y1
>>> tsses = pybedtools.example_bedtool('test_tsses.bed') >>> print(tsses) chr1 5 6 gene1 chr1 60 61 gene2 chr1 88 89 gene3
>>> peaks = pybedtools.example_bedtool('test_peaks.bed') >>> print(peaks) chr1 3 4 peak1 50 .
Here’s what the tracks look like. Note that pair1 is evidence of a gene1-gene2 interaction and a gene1-gene3 interaction:
TRACKS: 1 2 / 5 6 / 8 9 / 20 0123456789012345678901 / 2345678901234567890123 / 012345678901234 / 0123456789 pair1 |||||||||------------ / --------|||||||||||||| / ||||||||||||||| / pair2 |||||||||||||------- / ---------------------- / --------------- / |||||||||| tsses 1 / 2 / 3 peaks 1
>>> from collections import OrderedDict >>> queries = OrderedDict() >>> queries['tss'] = tsses >>> queries['pk'] = peaks >>> iterator, n, extra = pybedtools.contrib.long_range_interaction.tag_bedpe(bedpe, queries) >>> for (label, group1, group2) in iterator: ... group1 = sorted(group1, key=lambda x: str(x)) ... group2 = sorted(group2, key=lambda x: str(x)) ... for i in group1: ... print(i, end='') ... for i in group2: ... print(i, end='') chr1 1 10 pair1 5 + x1 pk chr1 3 4 peak1 50 . 1 chr1 1 10 pair1 5 + x1 tss chr1 5 6 gene1 1 chr1 50 90 pair1 5 - x1 tss chr1 60 61 gene2 1 chr1 50 90 pair1 5 - x1 tss chr1 88 89 gene3 1 chr1 2 15 pair2 1 + y1 pk chr1 3 4 peak1 50 . 1 chr1 2 15 pair2 1 + y1 tss chr1 5 6 gene1 1 chr1 200 210 pair2 1 + y1 . . -1 -1 . 0
Now we run the same thing, but now aggregate it. Note that each piece of interaction evidence has its own line. The first line shows that pair1 has gene1 and peak1 in the same fragment, and that they are connected to gene2. The second line shows again that gene1 and peak1 are in the same fragmet and that they are also connected to gene3:
>>> import pandas; pandas.set_option('display.max_columns', 10) >>> iterator, n, extra = pybedtools.contrib.long_range_interaction.tag_bedpe(bedpe, {'tss': tsses, 'pk': peaks}) >>> df = pybedtools.contrib.long_range_interaction.cis_trans_interactions(iterator, n, extra) >>> print(df.sort_values(list(df.columns)).reset_index(drop=True)) target_label target_name cis_label cis_name distal_label distal_name label 0 pk peak1 tss gene1 . . pair2 1 pk peak1 tss gene1 tss gene2 pair1 2 pk peak1 tss gene1 tss gene3 pair1 3 tss gene1 pk peak1 . . pair2 4 tss gene1 pk peak1 tss gene2 pair1 5 tss gene1 pk peak1 tss gene3 pair1 6 tss gene2 tss gene3 pk peak1 pair1 7 tss gene2 tss gene3 tss gene1 pair1 8 tss gene3 tss gene2 pk peak1 pair1 9 tss gene3 tss gene2 tss gene1 pair1
If we only care about genes:
>>> print((df[df.target_label == 'tss']).sort_values(list(df.columns)).reset_index(drop=True)) target_label target_name cis_label cis_name distal_label distal_name label 0 tss gene1 pk peak1 . . pair2 1 tss gene1 pk peak1 tss gene2 pair1 2 tss gene1 pk peak1 tss gene3 pair1 3 tss gene2 tss gene3 pk peak1 pair1 4 tss gene2 tss gene3 tss gene1 pair1 5 tss gene3 tss gene2 pk peak1 pair1 6 tss gene3 tss gene2 tss gene1 pair1
Note that in pair2, there is no evidence of interaction between gene1 and gene2.
What interacts distally with gene2’s TSS?
>>> assert set(df.loc[df.target_name == 'gene2', 'distal_name']).difference('.') == set([u'gene1', u'peak1'])