cmstats module
cmstats.correlateCMaps(ccMapObjOne, ccMapObjTwo)
To calculate correlation between two Hi-C maps
cmstats.correlateGCMaps(gcmapOne, gcmapTwo)
To calculate correlation between common Hi-C maps from two gcmap files
cmstats.getAvgContactByDistance(ccmaps[, …])
To calculate average contact as a function of distance
correlateCMaps(ccMapObjOne, ccMapObjTwo, ignore_triangular=True, diagonal_offset=1, corrType='pearson', blockSize=None, slideStepSize=1, cutoffPercentile=None, workDir=None, outFile=None, logHandler=None)
To calculate correlation between two Hi-C maps
This function can be used to calculate either the Pearson or the Spearman rank-order correlation between two Hi-C maps. It also ignores the lower-triangular half of the matrix, together with a diagonal offset, to avoid duplicate values and large values.
Parameters: - ccMapObjOne (gcMapExplorer.lib.ccmap.CCMAP) – First gcMapExplorer.lib.ccmap.CCMAP instance containing Hi-C data.
- ccMapObjTwo (gcMapExplorer.lib.ccmap.CCMAP) – Second gcMapExplorer.lib.ccmap.CCMAP instance containing Hi-C data.
- ignore_triangular (bool) – Whether the entire matrix is considered or only one triangular half of the matrix is considered.
- diagonal_offset (int) – If ignore_triangular=True, it determines how many bins are ignored from the diagonal in the triangular half of the matrix. diagonal_offset = 0 is the main diagonal; diagonal_offset > 0 means ignore this many bins from the diagonal.
- corrType (str) – Correlation type. For Pearson and Spearman rank-order correlation, use pearson and spearman, respectively.
- blockSize (str) – To calculate block-wise correlations by sliding a block of the given size along diagonals. It should be given as a resolution, for example 1mb, 500kb, 5mb, 2.5mb etc. If None, the correlation of the whole map is calculated. The sliding step of the block depends on slideStepSize.
- slideStepSize (int) – Step size in bins by which blocks are shifted for block-wise correlation. If slideStepSize is large, the blocks might not overlap.
- cutoffPercentile (float) – Cutoff percentile to discard values during correlation calculation. If a cutoff percentile is given, values less than this percentile value will not be considered during correlation calculation.
- workDir (str) – Name of the working directory, where temporary files will be kept. If workDir = None, files will be generated in the OS-based temporary directory.
- outFile (str) – Name of output file. Only written for block-wise correlation.
Returns: - corr (float or list) – Correlation coefficient
- pvalue/centers (float or list) – If blockSize=None, the 2-tailed p-value is returned. For block-wise correlation, a list of block centers is returned.
See also
- scipy.stats.pearsonr for Pearson correlation.
- scipy.stats.spearmanr for Spearman rank-order correlation.
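The effect of ignore_triangular and diagonal_offset can be illustrated on plain NumPy arrays. The following is only a minimal sketch, not the gcMapExplorer implementation: the function name upper_triangle_pearson is hypothetical, the maps are assumed to be dense square arrays of equal shape, and diagonal_offset is assumed to map directly onto the k argument of numpy.triu_indices. It computes the Pearson coefficient only; no p-value is returned.

```python
import numpy as np


def upper_triangle_pearson(map_one, map_two, diagonal_offset=1):
    """Pearson correlation over the upper triangle of two square maps.

    Hypothetical helper for illustration; bins closer to the main diagonal
    than `diagonal_offset` are skipped (0 keeps the main diagonal).
    """
    a = np.asarray(map_one, dtype=float)
    b = np.asarray(map_two, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("both maps must be square and have the same shape")

    # Indices of the upper triangle, starting `diagonal_offset` diagonals
    # above the main diagonal. The lower triangle is never visited, so
    # symmetric duplicates are not counted twice.
    rows, cols = np.triu_indices(a.shape[0], k=diagonal_offset)
    x, y = a[rows, cols], b[rows, cols]

    # Off-diagonal entry of the 2x2 correlation matrix is the coefficient.
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.random((6, 6))
    m = m + m.T                      # symmetric, like a contact map
    other = 3.0 * m + 2.0            # perfectly linearly related copy
    print(upper_triangle_pearson(m, other, diagonal_offset=1))
```

Skipping the diagonal matters in practice because the near-diagonal bins of a Hi-C map usually hold the largest counts and would otherwise dominate the coefficient.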
correlateGCMaps(gcmapOne, gcmapTwo, outFile=None, blockSize=None, slideStepSize=1, name=None, cutoffPercentile=None, ignore_triangular=True, diagonal_offset=1, corrType='pearson', workDir=None, logHandler=None)
To calculate correlation between common Hi-C maps from two gcmap files
This function can be used to calculate either the Pearson or the Spearman rank-order correlation between the common maps present in two gcmap files. It also ignores the lower-triangular half of the matrix, together with a diagonal offset, to avoid duplicate values and large values.
Note
If block-wise correlation calculation is initiated through the blockSize option, outFile and name are also required for further processing. The block-wise correlations will be stored in the output file in HDF5 format.
Parameters: - gcmapOne (str) – First gcmap file.
- gcmapTwo (str) – Second gcmap file.
- outFile (str) – Name of output file. Only written for block-wise correlation.
- blockSize (str) – To calculate block-wise correlations by sliding a block of the given size along diagonals. It should be given as a resolution, for example 1mb, 500kb, 5mb, 2.5mb etc. If None, the correlation of the whole map is calculated. The sliding step of the block depends on slideStepSize.
- slideStepSize (int) – Step size in bins by which blocks are shifted for block-wise correlation. If slideStepSize is large, the blocks might not overlap.
- name (str) – Title of the dataset in the HDF5 output file. If the blockSize option is used, name is an essential argument.
- cutoffPercentile (float) – Cutoff percentile to discard values during correlation calculation. If a cutoff percentile is given, values less than this percentile value will not be considered during correlation calculation.
- ignore_triangular (bool) – Whether the entire matrix is considered or only one triangular half of the matrix is considered.
- diagonal_offset (int) – If ignore_triangular=True, it determines how many bins are ignored from the diagonal in the triangular half of the matrix. diagonal_offset = 0 is the main diagonal; diagonal_offset > 0 means ignore this many bins from the diagonal.
- corrType (str) – Correlation type. For Pearson and Spearman rank-order correlation, use pearson and spearman, respectively.
- workDir (str) – Name of the working directory, where temporary files will be kept. If workDir = None, files will be generated in the OS-based temporary directory.
Returns: - mapList (list) – List of chromosomes
- corrs (list) – Correlation coefficient of each chromosome
- pvalue (list) – 2-tailed p-value for correlation coefficient of each chromosome.
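The block-wise mode described by blockSize and slideStepSize can be pictured as a square window sliding along the main diagonal. The sketch below is an illustration under stated assumptions, not the library's code: resolution_to_bases and blockwise_pearson are hypothetical names, the maps are dense NumPy arrays, every value inside a block is used (no triangular or percentile filtering), and block centers are reported as bin indices, which may differ from what cmstats actually stores.

```python
import numpy as np


def resolution_to_bases(resolution):
    """Convert strings such as '500kb' or '2.5mb' to a number of bases.

    Hypothetical helper; only the 'kb', 'mb' and 'b' suffixes are handled.
    """
    text = resolution.strip().lower()
    for suffix, factor in (("kb", 1_000), ("mb", 1_000_000), ("b", 1)):
        if text.endswith(suffix):
            return int(round(float(text[: -len(suffix)]) * factor))
    raise ValueError(f"unrecognized resolution: {resolution!r}")


def blockwise_pearson(map_one, map_two, binsize, block_size, slide_step_size=1):
    """Pearson correlation of square blocks slid along the main diagonal."""
    a = np.asarray(map_one, dtype=float)
    b = np.asarray(map_two, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("both maps must be square and have the same shape")

    block_bins = resolution_to_bases(block_size) // binsize
    if block_bins < 2:
        raise ValueError("block must span at least two bins")

    corrs, centers = [], []
    # Move the window by `slide_step_size` bins; a step equal to or larger
    # than `block_bins` yields blocks that no longer overlap.
    for start in range(0, a.shape[0] - block_bins + 1, slide_step_size):
        stop = start + block_bins
        x = a[start:stop, start:stop].ravel()
        y = b[start:stop, start:stop].ravel()
        corrs.append(float(np.corrcoef(x, y)[0, 1]))
        centers.append(start + block_bins // 2)   # center as a bin index
    return corrs, centers


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.random((10, 10))
    corrs, centers = blockwise_pearson(m, 2.0 * m, binsize=100_000,
                                       block_size="500kb", slide_step_size=1)
    print(len(corrs), centers)
```

With a 100 kb bin size, a 500kb block covers five bins, so a step of 5 or more gives the non-overlapping case mentioned for slideStepSize.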
getAvgContactByDistance(ccmaps, stats='median', removeOutliers=False, outliersThreshold=3.5)
To calculate average contact as a function of distance
Parameters: - ccmaps (gcMapExplorer.lib.ccmap.CCMAP or list[gcMapExplorer.lib.ccmap.CCMAP]) – Input contact maps
- stats (str) – Statistics for scaling. Accepted methods are mean and median.
- removeOutliers (bool) – If True, outliers will be removed before calculating input statistics.
- outliersThreshold (float) – The modified z-score to use as a threshold. Observations with a modified z-score (based on the median absolute deviation) greater than this value will be classified as outliers.
Returns: - avg_contacts (numpy.array) – A one-dimensional numpy array containing average contacts, where the index is the distance between two locations for the given resolution/binsize. For example, if ccmap.binsize=100000 and avg_contacts[4]=1234.56, then at a distance of 400000 b, the average contact is 1234.56.
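Averaging contacts by distance amounts to taking one statistic per diagonal of the map, since all bins on the d-th diagonal are d bins apart. The following is a rough sketch of that idea on a dense NumPy array, with hypothetical function names; it is not the gcMapExplorer code. The outlier filter uses the commonly cited modified z-score, 0.6745 * (x - median) / MAD, and it is an assumption here that the absolute score is compared with the threshold and that a diagonal with zero MAD is left unfiltered.

```python
import numpy as np


def modified_z_score_mask(values, threshold=3.5):
    """Boolean mask that is True for values kept and False for outliers."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))   # median absolute deviation
    if mad == 0:
        # Scores are undefined; keep everything (an assumption of this sketch).
        return np.ones(values.shape, dtype=bool)
    scores = 0.6745 * (values - median) / mad
    return np.abs(scores) <= threshold


def avg_contact_by_distance(matrix, stats="median",
                            remove_outliers=False, outliers_threshold=3.5):
    """Return one average per diagonal; index d means a distance of d bins."""
    m = np.asarray(matrix, dtype=float)
    reducers = {"mean": np.mean, "median": np.median}
    if stats not in reducers:
        raise ValueError("stats must be 'mean' or 'median'")

    averages = np.empty(m.shape[0])
    for distance in range(m.shape[0]):
        diagonal = np.diagonal(m, offset=distance)
        if remove_outliers:
            diagonal = diagonal[modified_z_score_mask(diagonal, outliers_threshold)]
        averages[distance] = reducers[stats](diagonal)
    return averages


if __name__ == "__main__":
    idx = np.arange(5)
    toy = np.abs(idx[:, None] - idx[None, :]).astype(float)
    avg = avg_contact_by_distance(toy, stats="mean")
    binsize = 100_000
    for d, value in enumerate(avg):
        print(f"{d * binsize} b: {value}")
```

Multiplying the array index by the bin size gives the genomic distance, which matches the avg_contacts[4] example for a 100000 b bin size.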