metaseq.plotutils.clustered_sortind
metaseq.plotutils.clustered_sortind(x, k=10, scorefunc=None)
Uses MiniBatch k-means clustering to cluster matrix into groups.
Each cluster of rows is then sorted by scorefunc. By default, this is the maximum peak height when all rows in a cluster are averaged, i.e. cluster.mean(axis=0).max().
Returns the index that will sort the rows of x and a list of “breaks”. breaks is essentially a cumulative row count for each cluster boundary. In other words, after plotting the array you can use axhline on each “break” to plot the cluster boundary.
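To make the relationship between the sort index and the breaks concrete, here is a minimal NumPy-only sketch. It does not call metaseq and is not the library's implementation: the cluster labels are assumed to be already known (hypothetical values), and the data are invented for illustration.

```python
import itertools
import numpy as np

# Hypothetical data: 6 rows whose cluster labels are assumed to be known.
x = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 1.2, 0.0],
    [3.0, 0.0, 0.0],
    [0.0, 4.8, 0.0],
    [2.8, 0.0, 0.0],
])
labels = np.array([0, 1, 0, 2, 1, 2])

# Default score described above: peak of the column-wise mean of each cluster.
# Every row receives the score of the cluster it belongs to.
scores = np.zeros(len(labels))
for label in np.unique(labels):
    members = labels == label
    scores[members] = x[members].mean(axis=0).max()

# A stable sort on the scores keeps rows of the same cluster contiguous.
ind = np.argsort(scores, kind="stable")

# Each break is the cumulative row count at a cluster boundary.
breaks = []
pos = 0
for _, group in itertools.groupby(labels[ind]):
    pos += len(list(group))
    breaks.append(pos)

print(ind.tolist())  # [0, 2, 3, 5, 1, 4]
print(breaks)        # [2, 4, 6]
```

After plotting x[ind] as a heatmap, calling axhline at each value in breaks would draw the boundaries between the three clusters.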
If k is a list or tuple, each value is tried in turn and the one with the lowest mean distance from cluster centers is selected.
Parameters:
- x – Matrix whose rows are to be clustered
- k – Number of clusters to create, or a list of candidate cluster counts; the optimum will be chosen from the list
- scorefunc – Optional function for sorting rows within clusters. Must accept a single argument of a NumPy array.
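As an illustration of the scorefunc parameter, the sketch below defines two custom scoring functions alongside the documented default. The function names total_signal and peak_position are hypothetical, and the sketch assumes that a scorefunc receives the sub-array of rows belonging to one cluster and returns a single number, as the default does.

```python
import numpy as np


def total_signal(cluster):
    """Hypothetical score: summed signal of the cluster's mean profile."""
    return cluster.mean(axis=0).sum()


def peak_position(cluster):
    """Hypothetical score: column index where the mean profile peaks."""
    return int(cluster.mean(axis=0).argmax())


# A small made-up cluster of two rows, used only to show what each score returns.
cluster = np.array([
    [0.0, 2.0, 1.0],
    [0.0, 4.0, 1.0],
])

# The documented default score, for comparison.
default_score = cluster.mean(axis=0).max()

print(default_score)           # 3.0
print(total_signal(cluster))   # 4.0
print(peak_position(cluster))  # 1
```

Under these assumptions, either function could be supplied through the documented signature, for example clustered_sortind(x, k=10, scorefunc=peak_position).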