cmstats module
- cmstats.correlateCMaps(ccMapObjOne, ccMapObjTwo): To calculate correlation between two Hi-C maps.
- cmstats.correlateGCMaps(gcmapOne, gcmapTwo): To calculate correlation between common Hi-C maps from two gcmap files.
- cmstats.getAvgContactByDistance(ccmaps[, …]): To calculate average contact as a function of distance.
correlateCMaps(ccMapObjOne, ccMapObjTwo, ignore_triangular=True, diagonal_offset=1, corrType='pearson', blockSize=None, slideStepSize=1, cutoffPercentile=None, workDir=None, outFile=None, logHandler=None)
To calculate correlation between two Hi-C maps.
This function can be used to calculate either Pearson or Spearman rank-order correlation between two Hi-C maps. It can also ignore the lower-triangular part of the matrix, along with the bins closest to the diagonal (set by diagonal_offset), to avoid duplicate values and the large values found near the diagonal.
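The masking idea can be sketched in Python, assuming both maps are already available as equally sized square NumPy arrays rather than CCMAP objects. The helper triangular_corr is hypothetical and is not gcMapExplorer's implementation; it keeps the upper triangle starting diagonal_offset diagonals above the main one (via numpy.triu_indices) and passes the paired values to scipy.stats.pearsonr or scipy.stats.spearmanr.

```python
import numpy as np
from scipy import stats


def triangular_corr(map_one, map_two, diagonal_offset=1, corr_type="pearson"):
    """Correlate the upper-triangular parts of two square contact maps.

    Hypothetical helper: illustrates the masking idea only, not
    gcMapExplorer's implementation.
    """
    map_one = np.asarray(map_one, dtype=float)
    map_two = np.asarray(map_two, dtype=float)
    if map_one.shape != map_two.shape or map_one.shape[0] != map_one.shape[1]:
        raise ValueError("maps must be square and of equal shape")

    # k=diagonal_offset: 0 keeps the main diagonal, k > 0 skips k diagonals.
    rows, cols = np.triu_indices(map_one.shape[0], k=diagonal_offset)
    x, y = map_one[rows, cols], map_two[rows, cols]

    if corr_type == "pearson":
        result = stats.pearsonr(x, y)
    elif corr_type == "spearman":
        result = stats.spearmanr(x, y)
    else:
        raise ValueError("corr_type must be 'pearson' or 'spearman'")
    # Both results unpack to (coefficient, two-tailed p-value).
    corr, pvalue = result
    return float(corr), float(pvalue)
```

Because only upper-triangular pairs are compared, changing values in the lower triangle (or on the skipped diagonals) does not change the result.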
Parameters:
- ccMapObjOne (gcMapExplorer.lib.ccmap.CCMAP) – First gcMapExplorer.lib.ccmap.CCMAP instance containing Hi-C data.
- ccMapObjTwo (gcMapExplorer.lib.ccmap.CCMAP) – Second gcMapExplorer.lib.ccmap.CCMAP instance containing Hi-C data.
- ignore_triangular (bool) – Whether the entire matrix is considered or only one half (the triangular region) of the matrix.
- diagonal_offset (int) – If ignore_triangular=True, the number of bins ignored from the diagonal in the triangular region of the matrix. diagonal_offset = 0 refers to the main diagonal; diagonal_offset > 0 ignores this many bins from the diagonal.
- corrType (str) – Correlation type. Use pearson for Pearson correlation and spearman for Spearman rank-order correlation.
- blockSize (str) – Size of the block that is slid along the diagonal to calculate block-wise correlations. It should be given as a resolution, for example 1mb, 500kb, 5mb or 2.5mb. If None, the correlation of the whole map is calculated. The sliding step of the block is set by slideStepSize.
- slideStepSize (int) – Step size, in bins, by which blocks are shifted for block-wise correlation. If slideStepSize is large, consecutive blocks might not overlap.
- cutoffPercentile (float) – Cutoff percentile for discarding values. If given, values below this percentile are excluded from the correlation calculation.
- workDir (str) – Name of the working directory where temporary files are kept. If workDir = None, files are generated in the OS's temporary directory.
- outFile (str) – Name of the output file. Only written for block-wise correlation.
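One possible reading of cutoffPercentile is sketched below. It assumes the threshold is computed separately for each map and that a pair of values is kept only when both reach their map's threshold; the source does not say how the two maps are combined, so treat this as an assumption. apply_percentile_cutoff is a hypothetical helper.

```python
import numpy as np


def apply_percentile_cutoff(x, y, cutoff_percentile):
    """Drop value pairs that fall below a percentile cutoff.

    Hypothetical helper. Assumption: the cutoff is computed separately for
    each input, and a pair is kept only if both values reach it.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if cutoff_percentile is None:
        return x, y  # no cutoff: keep every pair
    x_min = np.percentile(x, cutoff_percentile)
    y_min = np.percentile(y, cutoff_percentile)
    keep = (x >= x_min) & (y >= y_min)
    return x[keep], y[keep]
```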
Returns:
- corr (float or list) – Correlation coefficient, or a list of coefficients for block-wise correlation.
- pvalue/centers (float or list) – If blockSize=None, the two-tailed p-value is returned. For block-wise correlation, a list of block centers is returned.
See also
- scipy.stats.pearsonr for Pearson correlation.
- scipy.stats.spearmanr for Spearman rank-order correlation.
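Block-wise correlation can be sketched as a square block sliding along the main diagonal. Both helpers are hypothetical: resolution_to_bins assumes only kb and mb suffixes, and blockwise_corr assumes Pearson correlation on each block's upper triangle and reports block centers in bins, which may differ from what gcMapExplorer returns.

```python
import numpy as np

_UNITS = {"kb": 1_000, "mb": 1_000_000}


def resolution_to_bins(resolution, binsize):
    """Convert a string such as '2.5mb' or '500kb' into a number of bins."""
    text = resolution.strip().lower()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(round(float(text[: -len(suffix)]) * factor / binsize))
    raise ValueError(f"unsupported resolution: {resolution!r}")


def blockwise_corr(map_one, map_two, block_bins, step=1, diagonal_offset=1):
    """Slide a square block along the main diagonal and correlate each block."""
    map_one = np.asarray(map_one, dtype=float)
    map_two = np.asarray(map_two, dtype=float)
    n = map_one.shape[0]
    # Same upper-triangular mask is reused for every block.
    rows, cols = np.triu_indices(block_bins, k=diagonal_offset)
    corrs, centers = [], []
    for start in range(0, n - block_bins + 1, step):
        stop = start + block_bins
        x = map_one[start:stop, start:stop][rows, cols]
        y = map_two[start:stop, start:stop][rows, cols]
        corrs.append(float(np.corrcoef(x, y)[0, 1]))
        # Assumption: a block's center is reported in bins.
        centers.append(start + block_bins / 2)
    return corrs, centers
```

With step >= block_bins, consecutive blocks no longer overlap, which is the situation the slideStepSize description warns about.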
correlateGCMaps(gcmapOne, gcmapTwo, outFile=None, blockSize=None, slideStepSize=1, name=None, cutoffPercentile=None, ignore_triangular=True, diagonal_offset=1, corrType='pearson', workDir=None, logHandler=None)
To calculate correlation between common Hi-C maps from two gcmap files.
This function can be used to calculate either Pearson or Spearman rank-order correlation between the common maps present in two gcmap files. It can also ignore the lower-triangular part of each matrix, along with the bins closest to the diagonal (set by diagonal_offset), to avoid duplicate values and the large values found near the diagonal.
Note
If block-wise correlation is requested through the blockSize option, both outFile and name are also required for further processing. The block-wise correlation will be stored in an output file in HDF5 format.

Parameters:
- gcmapOne (str) – First gcmap file.
- gcmapTwo (str) – Second gcmap file.
- outFile (str) – Name of the output file. Only written for block-wise correlation.
- blockSize (str) – Size of the block that is slid along the diagonal to calculate block-wise correlations. It should be given as a resolution, for example 1mb, 500kb, 5mb or 2.5mb. If None, the correlation of the whole map is calculated. The sliding step of the block is set by slideStepSize.
- slideStepSize (int) – Step size, in bins, by which blocks are shifted for block-wise correlation. If slideStepSize is large, consecutive blocks might not overlap.
- name (str) – Title of the dataset in the HDF5 output file. If the blockSize option is used, name is required.
- cutoffPercentile (float) – Cutoff percentile for discarding values. If given, values below this percentile are excluded from the correlation calculation.
- ignore_triangular (bool) – Whether the entire matrix is considered or only one half (the triangular region) of the matrix.
- diagonal_offset (int) – If ignore_triangular=True, the number of bins ignored from the diagonal in the triangular region of the matrix. diagonal_offset = 0 refers to the main diagonal; diagonal_offset > 0 ignores this many bins from the diagonal.
- corrType (str) – Correlation type. Use pearson for Pearson correlation and spearman for Spearman rank-order correlation.
- workDir (str) – Name of the working directory where temporary files are kept. If workDir = None, files are generated in the OS's temporary directory.
Returns: - mapList (list) – List of chromosomes
- corrs (list) – Correlation coefficient of each chromosome
- pvalue (list) – 2-tailed p-value for correlation coefficient of each chromosome.
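The per-chromosome loop can be sketched as follows, assuming each gcmap file has already been read into a dictionary that maps chromosome names to square NumPy arrays (reading the HDF5 file itself is not shown). correlate_common_maps is a hypothetical helper that uses Pearson correlation only; it skips chromosomes missing from either input and keeps the order of the first input.

```python
import numpy as np
from scipy import stats


def correlate_common_maps(maps_one, maps_two, diagonal_offset=1):
    """Per-chromosome Pearson correlation over maps present in both inputs.

    Hypothetical helper. Assumption: each gcmap has already been read into a
    dict mapping chromosome name -> square NumPy array of equal shape.
    """
    map_list, corrs, pvalues = [], [], []
    for name in maps_one:                 # keeps the first input's order
        if name not in maps_two:
            continue                      # only common maps are compared
        a = np.asarray(maps_one[name], dtype=float)
        b = np.asarray(maps_two[name], dtype=float)
        rows, cols = np.triu_indices(a.shape[0], k=diagonal_offset)
        corr, pvalue = stats.pearsonr(a[rows, cols], b[rows, cols])
        map_list.append(name)
        corrs.append(float(corr))
        pvalues.append(float(pvalue))
    return map_list, corrs, pvalues
```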
getAvgContactByDistance(ccmaps, stats='median', removeOutliers=False, outliersThreshold=3.5)
To calculate average contact as a function of distance.
Parameters:
- ccmaps (gcMapExplorer.lib.ccmap.CCMAP or list[gcMapExplorer.lib.ccmap.CCMAP]) – Input contact maps.
- stats (str) – Statistic used for scaling. Accepted values are mean and median.
- removeOutliers (bool) – If True, outliers are removed before the statistic is calculated.
- outliersThreshold (float) – Modified z-score used as the threshold. Observations with a modified z-score (based on the median absolute deviation) greater than this value are classified as outliers.
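The usual definition of the modified z-score is M = 0.6745 (x - median) / MAD, where MAD is the median absolute deviation. Assuming gcMapExplorer follows that definition, the outlier rule can be sketched with the hypothetical helper remove_outliers; treating a zero MAD as "no outliers" is also an assumption.

```python
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Drop values whose modified z-score exceeds `threshold`.

    Hypothetical helper using the modified z-score
    M = 0.6745 * (x - median) / MAD, with MAD the median absolute deviation.
    """
    values = list(values)
    if not values:
        return values
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return values  # assumption: no spread, so nothing is flagged
    return [v for v in values if abs(0.6745 * (v - med) / mad) <= threshold]
```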
Returns:
- avg_contacts (numpy.array) – A one-dimensional numpy array of average contacts, where the index is the distance between two locations in bins for the given resolution/binsize. For example, if ccmap.binsize=100000 and avg_contacts[4]=1234.56, then the average contact at a distance of 400000 b is 1234.56.
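The core computation can be sketched by reducing each diagonal of the map, since diagonal k holds contacts between bins that are k bins apart. avg_contact_by_distance is hypothetical; it takes NumPy arrays instead of CCMAP objects, applies no outlier removal, and, for several maps, assumes the values on each diagonal are pooled before the statistic is taken.

```python
import numpy as np


def avg_contact_by_distance(matrices, stats="median"):
    """Average contact for each distance (in bins) of square contact maps.

    Hypothetical helper, not gcMapExplorer's implementation. Assumption: with
    several maps, values on the same diagonal are pooled before reducing.
    """
    reducers = {"mean": np.mean, "median": np.median}
    if stats not in reducers:
        raise ValueError("stats must be 'mean' or 'median'")
    if isinstance(matrices, np.ndarray):
        matrices = [matrices]
    matrices = [np.asarray(m, dtype=float) for m in matrices]

    max_dist = max(m.shape[0] for m in matrices)
    avg_contacts = np.zeros(max_dist)
    for k in range(max_dist):
        # Diagonal k holds contacts between bins i and i + k.
        pooled = [np.diagonal(m, offset=k) for m in matrices if k < m.shape[0]]
        avg_contacts[k] = reducers[stats](np.concatenate(pooled))
    return avg_contacts
```

Because the index counts bins, the distance in base pairs is index * binsize: with binsize = 100000, avg_contacts[4] is the average contact at 400000 b.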