metaseq.plotutils.clustered_sortind

metaseq.plotutils.clustered_sortind(x, k=10, scorefunc=None)

Uses MiniBatch k-means clustering to cluster the rows of the matrix x into k groups.

Each cluster of rows is then sorted by scorefunc. By default, this is the max peak height when all rows in a cluster are averaged, i.e. cluster.mean(axis=0).max().
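As a minimal sketch of what the default score computes, the snippet below applies cluster.mean(axis=0).max() to a small made-up array. The name default_score and the sample data are illustrative assumptions, not part of metaseq.

```python
import numpy as np

def default_score(cluster):
    # Average the rows column-wise to get the cluster's mean profile,
    # then take the height of the tallest point in that profile.
    return cluster.mean(axis=0).max()

# Hypothetical cluster of two rows and three columns.
cluster = np.array([
    [0.0, 2.0, 1.0],
    [0.0, 4.0, 3.0],
])

score = default_score(cluster)  # mean profile is [0, 3, 2], so the score is 3.0
```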

Returns the index that will sort the rows of x and a list of “breaks”. breaks is essentially a cumulative row count for each cluster boundary. In other words, after plotting the array you can use axhline on each “break” to plot the cluster boundary.
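To illustrate what a cumulative row count per cluster boundary looks like, here is a small sketch that derives breaks from a list of cluster labels already arranged in sorted row order. The labels are made up, and this shows only the meaning of the return value, not necessarily how metaseq computes it.

```python
from itertools import groupby

# Hypothetical cluster labels for six rows, listed in sorted row order
# so that members of the same cluster sit next to each other.
sorted_labels = [2, 2, 2, 0, 1, 1]

breaks = []
pos = 0
for _, group in groupby(sorted_labels):
    # Each run of identical labels is one cluster; add its row count.
    pos += len(list(group))
    breaks.append(pos)

# breaks is now [3, 4, 6]: clusters end after rows 3, 4 and 6.
```

With matplotlib, each value could then be passed to axhline on the axes that show the sorted array. Depending on how the array is drawn, a half-row offset may be needed to place the line exactly between rows.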

If k is a list or tuple, each value is tried in turn and the one with the lowest mean distance from cluster centers is selected.
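The selection criterion can be sketched as follows. This is an illustration under stated assumptions, not metaseq's code: the labelings are hypothetical stand-ins for clustering results, each center is taken to be the plain centroid of its members, and "mean distance" is read as the mean Euclidean distance from each row to the center of its own cluster.

```python
import numpy as np

def mean_distance_to_centers(x, labels):
    """Mean Euclidean distance from each row to the centroid of its cluster."""
    total = 0.0
    for label in np.unique(labels):
        members = x[labels == label]
        center = members.mean(axis=0)
        total += np.linalg.norm(members - center, axis=1).sum()
    return total / len(x)

x = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# Hypothetical labelings standing in for clustering results at k=1 and k=2.
candidates = {
    1: np.array([0, 0, 0, 0]),
    2: np.array([0, 0, 1, 1]),
}

mean_dists = {k: mean_distance_to_centers(x, labels)
              for k, labels in candidates.items()}

# Keep the candidate k with the lowest mean distance.
best_k = min(mean_dists, key=mean_dists.get)
```

Under this reading, larger values of k tend to give smaller mean distances, so the candidate list itself largely determines the outcome.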

Parameters:
  • x – Matrix whose rows are to be clustered
  • k – Number of clusters to create, or a list of candidate cluster counts; the optimum will be chosen from the list
  • scorefunc – Optional function for sorting rows within clusters. Must accept a single NumPy array as its only argument.
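A custom scorefunc might look like the sketch below. The name peak_position and the sample data are hypothetical; the only documented requirement is that the function accepts a single NumPy array.

```python
import numpy as np

def peak_position(cluster):
    # Hypothetical scorer: column index at which the cluster's
    # mean profile reaches its maximum.
    return cluster.mean(axis=0).argmax()

# Made-up cluster of two rows and three columns.
cluster = np.array([
    [0.0, 1.0, 5.0],
    [0.0, 3.0, 7.0],
])

score = peak_position(cluster)  # mean profile is [0, 2, 6], peak in column 2

# It would be passed by keyword, following the signature documented above:
# ind, breaks = metaseq.plotutils.clustered_sortind(x, k=10, scorefunc=peak_position)
```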