pybedtools.contrib.venn_maker.cleaned_intersect

pybedtools.contrib.venn_maker.cleaned_intersect(items)

Perform interval intersections such that the end products have identical features for overlapping intervals.

The VennDiagram package does set intersection, not interval intersection. So the goal here is to represent intersecting intervals as intersecting sets of strings.

Doing a simple BEDTools intersectBed call doesn’t do the trick (even with the -u argument). As a concrete example, what would the string be for an intersection of the feature “chr1:1-100” in file x and “chr1:50-200” in file y?
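The problem can be seen directly by naming each feature with a chrom:start-end string, as the sets on this page do, and taking an ordinary set intersection. The two features below overlap on chr1, yet their string sets have nothing in common. This is a minimal pure-Python illustration, not pybedtools code; the as_string helper is hypothetical.

```python
def as_string(chrom: str, start: int, end: int) -> str:
    """Hypothetical helper: format a feature as a chrom:start-end string."""
    return f"{chrom}:{start}-{end}"


x = {as_string("chr1", 1, 100)}
y = {as_string("chr1", 50, 200)}

# The intervals share bases 50-100, but the strings differ,
# so a plain set intersection finds nothing in common.
common = x & y
print(common)  # set()
```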

The method used here is to substitute the intervals in y that overlap x with the corresponding elements in x. This means that in the resulting sets, the overlapping features are identical. To continue the example, both x and y would contain the item “chr1:1-100” (the feature from x), simply indicating that the two intervals overlapped.
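The substitution described above can be sketched in pure Python. This is an illustration under stated assumptions, not the pybedtools implementation: features are (chrom, start, end) tuples with BED-style half-open coordinates, and substitute_overlaps, overlaps and as_string are hypothetical helpers.

```python
from typing import List, Set, Tuple

# A feature as (chrom, start, end), using BED-style half-open coordinates.
Interval = Tuple[str, int, int]


def as_string(iv: Interval) -> str:
    chrom, start, end = iv
    return f"{chrom}:{start}-{end}"


def overlaps(a: Interval, b: Interval) -> bool:
    # Same chromosome and the half-open ranges share at least one base.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def substitute_overlaps(x: List[Interval], y: List[Interval]) -> Tuple[Set[str], Set[str]]:
    """Hypothetical sketch: string sets for x and y in which every y feature
    that overlaps x is replaced by the overlapping x feature(s)."""
    x_set = {as_string(iv) for iv in x}
    y_set: Set[str] = set()
    for y_iv in y:
        hits = [x_iv for x_iv in x if overlaps(x_iv, y_iv)]
        if hits:
            # Borrow x's names so the overlap appears as identical strings.
            y_set.update(as_string(h) for h in hits)
        else:
            # No overlap: y keeps its own feature.
            y_set.add(as_string(y_iv))
    return x_set, y_set


x_set, y_set = substitute_overlaps([("chr1", 1, 100)], [("chr1", 50, 200)])
print(x_set & y_set)  # {'chr1:1-100'}
```

The nested loop is quadratic and only meant to show the idea; for real files the overlap step would normally be delegated to BEDTools.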

Venn diagrams are not well suited for nested overlaps or multi-overlaps. To illustrate, try drawing the 2-way Venn diagram of the following two files. Specifically, what number goes in the middle: the number of features in x that intersect y (1), or the number of features in y that intersect x (2)?

x:
    chr1  1  100
    chr1 500 6000

y:
    chr1 50 100
    chr1 80 200
    chr9 777 888
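The two candidate numbers for the middle of the diagram can be checked by counting overlaps in each direction for the two files above. This is a minimal pure-Python sketch assuming BED-style half-open coordinates; overlaps and count_hitting are hypothetical helpers, not part of pybedtools.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open BED-style intervals on the same chromosome.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_hitting(a: List[Interval], b: List[Interval]) -> int:
    """Number of features in `a` that overlap at least one feature in `b`."""
    return sum(any(overlaps(iv, other) for other in b) for iv in a)


x = [("chr1", 1, 100), ("chr1", 500, 6000)]
y = [("chr1", 50, 100), ("chr1", 80, 200), ("chr9", 777, 888)]

print(count_hitting(x, y))  # 1
print(count_hitting(y, x))  # 2
```

The answer depends on which file is counted, which is exactly why a single "middle" number is ambiguous here.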

In this case, this function will return the following sets:

x:
    chr1:1-100
    chr1:500-6000

y:
    chr1:1-100
    chr9:777-888

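This means that while the number of items in x never changes, the number in y can. In the example above, y shrinks from three features to two, because its two chr1 features both collapse onto the same feature from x. Conversely, if there are 2 features in x that overlap one feature in y, then y will gain those two features in place of its single original feature.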
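Both directions of this size change can be reproduced with a small pure-Python sketch. It assumes BED-style half-open coordinates and uses hypothetical helpers (overlaps, name, substituted_y); it mirrors the behavior described here rather than the actual pybedtools source. The first case reproduces the y set listed above; the second shows y growing.

```python
from typing import List, Set, Tuple

Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open BED-style intervals on the same chromosome.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def name(iv: Interval) -> str:
    return f"{iv[0]}:{iv[1]}-{iv[2]}"


def substituted_y(x: List[Interval], y: List[Interval]) -> Set[str]:
    """Hypothetical helper: y's string set after swapping in overlapping x features."""
    out: Set[str] = set()
    for y_iv in y:
        hits = [name(x_iv) for x_iv in x if overlaps(x_iv, y_iv)]
        # Keep y's own name only when nothing in x overlaps it.
        out.update(hits or [name(y_iv)])
    return out


# Shrinks: two y features collapse onto the same x feature.
x1 = [("chr1", 1, 100), ("chr1", 500, 6000)]
y1 = [("chr1", 50, 100), ("chr1", 80, 200), ("chr9", 777, 888)]
print(sorted(substituted_y(x1, y1)))  # ['chr1:1-100', 'chr9:777-888']

# Grows: one y feature overlaps two x features and is replaced by both.
x2 = [("chr1", 1, 100), ("chr1", 150, 300)]
y2 = [("chr1", 50, 200)]
print(sorted(substituted_y(x2, y2)))  # ['chr1:1-100', 'chr1:150-300']
```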

This strategy is extended to multiple intersections; see the source for details.