cmstats module
cmstats.correlateCMaps (ccMapObjOne, ccMapObjTwo)
Calculate the correlation between two Hi-C maps
cmstats.correlateGCMaps (gcmapOne, gcmapTwo)
Calculate the correlation between common Hi-C maps from two gcmap files
cmstats.getAvgContactByDistance (ccmaps[, ...])
Calculate the average contact as a function of distance

correlateCMaps(ccMapObjOne, ccMapObjTwo, ignore_triangular=True, diagonal_offset=1, corrType='pearson', blockSize=None, slideStepSize=1, cutoffPercentile=None, workDir=None, outFile=None, logHandler=None)
Calculate the correlation between two Hi-C maps.
This function calculates either the Pearson or the Spearman rank-order correlation between two Hi-C maps. By default (ignore_triangular=True, diagonal_offset=1) it ignores the lower-triangular part of the matrix and the bins at the diagonal, which avoids duplicate values and the large values found at the diagonal.
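The effect of restricting the calculation to one triangle with a diagonal offset can be sketched with NumPy alone. This is a minimal illustration, not gcMapExplorer's implementation: the helper name upper_triangle_pearson is hypothetical, it works on plain arrays rather than CCMAP objects, and it assumes that diagonal_offset maps directly onto the k argument of numpy.triu_indices.

```python
import numpy as np


def upper_triangle_pearson(map_one, map_two, diagonal_offset=1):
    """Pearson correlation over the upper triangle of two square matrices.

    Hypothetical helper: diagonal_offset is passed as k to np.triu_indices,
    so 0 keeps the main diagonal and 1 skips it (an assumption about how
    the documented parameter behaves).
    """
    map_one = np.asarray(map_one, dtype=float)
    map_two = np.asarray(map_two, dtype=float)
    if map_one.shape != map_two.shape:
        raise ValueError("both maps must have the same shape")
    # Indices of the upper triangle, starting diagonal_offset bins
    # above the main diagonal.
    rows, cols = np.triu_indices(map_one.shape[0], k=diagonal_offset)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
    return float(np.corrcoef(map_one[rows, cols], map_two[rows, cols])[0, 1])


# Toy symmetric matrices standing in for two contact maps.
rng = np.random.default_rng(0)
base = rng.random((6, 6))
first = base + base.T
second = first + 0.1 * rng.random((6, 6))
print(upper_triangle_pearson(first, second))
```

Because a contact map is symmetric, each off-diagonal value appears twice in the full matrix; taking only one triangle counts each pair of bins once.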
Parameters:
 ccMapObjOne (gcMapExplorer.lib.ccmap.CCMAP) – First gcMapExplorer.lib.ccmap.CCMAP instance containing Hi-C data.
 ccMapObjTwo (gcMapExplorer.lib.ccmap.CCMAP) – Second gcMapExplorer.lib.ccmap.CCMAP instance containing Hi-C data.
 ignore_triangular (bool) – Whether the entire matrix is considered or only one triangular half of the matrix is considered.
 diagonal_offset (int) – If ignore_triangular=True, it determines how many bins are ignored from the diagonal in the triangular half of the matrix. diagonal_offset = 0 is the main diagonal, diagonal_offset > 0 means ignore this many bins from the diagonal.
 corrType (str) – Correlation type. For Pearson and Spearman rank-order correlation, use pearson and spearman, respectively.
 blockSize (str) – Size of the block that is slid along the diagonal to calculate block-wise correlations. It should be given as a resolution, for example 1mb, 500kb, 5mb, 2.5mb, etc. If None, the correlation of the whole map is calculated. The sliding step of the block depends on slideStepSize.
 slideStepSize (int) – Step size, in bins, by which blocks are shifted for block-wise correlation. If slideStepSize is large, the blocks might not overlap.
 cutoffPercentile (float) – Cutoff percentile to discard values during the correlation calculation. If a cutoff percentile is given, values less than this percentile value are not considered.
 workDir (str) – Name of the working directory where temporary files are kept. If workDir = None, files are generated in the OS-based temporary directory.
 outFile (str) – Name of the output file. Only written for block-wise correlation.
Returns:
 corr (float or list) – Correlation coefficient.
 pvalue/centers (float or list) – If blockSize=None, the 2-tailed p-value is returned. For block-wise correlation, a list of block centers is returned.
See also
 scipy.stats.pearsonr for Pearson correlation.
 scipy.stats.spearmanr for Spearman rank-order correlation.
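The block-wise mode described by blockSize and slideStepSize can be pictured as a square window moving along the diagonal. The sketch below is an assumption-laden illustration rather than the library's code: the helper names resolution_to_bases and blockwise_pearson are hypothetical, the block is assumed to be a square sub-matrix on the diagonal, and block centers are assumed to be reported in bases.

```python
import numpy as np

# Hypothetical unit table for resolution strings such as "500kb" or "2.5mb".
UNITS = {"kb": 1_000, "mb": 1_000_000}


def resolution_to_bases(text):
    """Convert a resolution string like '2.5mb' to a number of bases."""
    text = text.strip().lower()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(round(float(text[: -len(suffix)]) * factor))
    return int(text)


def blockwise_pearson(map_one, map_two, binsize, block_size, slide_step=1):
    """Pearson correlation of square blocks slid along the diagonal.

    Returns (corrs, centers). Centers are given in bases here, which is
    an assumption; the documentation only says a list of block centers.
    """
    map_one = np.asarray(map_one, dtype=float)
    map_two = np.asarray(map_two, dtype=float)
    block_bins = resolution_to_bases(block_size) // binsize
    if block_bins < 2:
        raise ValueError("block must span at least two bins")
    n_bins = map_one.shape[0]
    corrs, centers = [], []
    for start in range(0, n_bins - block_bins + 1, slide_step):
        stop = start + block_bins
        a = map_one[start:stop, start:stop].ravel()
        b = map_two[start:stop, start:stop].ravel()
        corrs.append(float(np.corrcoef(a, b)[0, 1]))
        centers.append((start + stop) / 2 * binsize)
    return corrs, centers


rng = np.random.default_rng(0)
first = rng.random((10, 10))
second = first + 0.2 * rng.random((10, 10))
corrs, centers = blockwise_pearson(first, second, 100_000, "500kb", slide_step=5)
print(len(corrs), centers)
```

With a 5-bin block and a step of 5 bins the two blocks do not overlap, which matches the note on slideStepSize; a step of 1 would give heavily overlapping blocks.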

correlateGCMaps(gcmapOne, gcmapTwo, outFile=None, blockSize=None, slideStepSize=1, name=None, cutoffPercentile=None, ignore_triangular=True, diagonal_offset=1, corrType='pearson', workDir=None, logHandler=None)
Calculate the correlation between common Hi-C maps from two gcmap files.
This function calculates either the Pearson or the Spearman rank-order correlation between the common maps present in two gcmap files. By default (ignore_triangular=True, diagonal_offset=1) it ignores the lower-triangular part of the matrix and the bins at the diagonal, which avoids duplicate values and the large values found at the diagonal.
Note
If block-wise correlation is requested through the blockSize option, outFile and name are also required for further processing. The block-wise correlations are stored in the output file in HDF5 format.
Parameters:
 gcmapOne (str) – First gcmap file.
 gcmapTwo (str) – Second gcmap file.
 outFile (str) – Name of the output file. Only written for block-wise correlation.
 blockSize (str) – Size of the block that is slid along the diagonal to calculate block-wise correlations. It should be given as a resolution, for example 1mb, 500kb, 5mb, 2.5mb, etc. If None, the correlation of the whole map is calculated. The sliding step of the block depends on slideStepSize.
 slideStepSize (int) – Step size, in bins, by which blocks are shifted for block-wise correlation. If slideStepSize is large, the blocks might not overlap.
 name (str) – Title of the dataset in the HDF5 output file. If the blockSize option is used, name is an essential argument.
 cutoffPercentile (float) – Cutoff percentile to discard values during the correlation calculation. If a cutoff percentile is given, values less than this percentile value are not considered.
 ignore_triangular (bool) – Whether the entire matrix is considered or only one triangular half of the matrix is considered.
 diagonal_offset (int) – If ignore_triangular=True, it determines how many bins are ignored from the diagonal in the triangular half of the matrix. diagonal_offset = 0 is the main diagonal, diagonal_offset > 0 means ignore this many bins from the diagonal.
 corrType (str) – Correlation type. For Pearson and Spearman rank-order correlation, use pearson and spearman, respectively.
 workDir (str) – Name of the working directory where temporary files are kept. If workDir = None, files are generated in the OS-based temporary directory.
Returns:
 mapList (list) – List of chromosomes.
 corrs (list) – Correlation coefficient of each chromosome.
 pvalue (list) – 2-tailed p-value for the correlation coefficient of each chromosome.
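The idea of correlating only the maps common to both inputs can be sketched as follows. This is a simplified stand-in under stated assumptions: each gcmap file is represented by a plain dict mapping a chromosome name to a square array, the helper name correlate_common_maps is hypothetical, and neither file reading nor p-values are covered.

```python
import numpy as np


def correlate_common_maps(maps_one, maps_two, diagonal_offset=1):
    """Correlate maps whose names occur in both inputs.

    maps_one and maps_two are dicts of {name: square array}; they stand in
    for the contents of two gcmap files (an assumption for illustration).
    Returns (names, corrs) in sorted name order.
    """
    common = sorted(set(maps_one) & set(maps_two))
    corrs = []
    for name in common:
        a = np.asarray(maps_one[name], dtype=float)
        b = np.asarray(maps_two[name], dtype=float)
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for map {name!r}")
        # Use one triangle only, skipping diagonal_offset diagonals.
        rows, cols = np.triu_indices(a.shape[0], k=diagonal_offset)
        corrs.append(float(np.corrcoef(a[rows, cols], b[rows, cols])[0, 1]))
    return common, corrs


rng = np.random.default_rng(0)
chr1, chr2 = rng.random((5, 5)), rng.random((5, 5))
file_one = {"chr1": chr1, "chr2": chr2}
file_two = {"chr2": chr2 + 0.1 * rng.random((5, 5)), "chr3": chr1}
names, corrs = correlate_common_maps(file_one, file_two)
print(names, corrs)
```

Only chr2 is present in both inputs here, so it is the only map that gets a coefficient; maps found in just one input are skipped.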

getAvgContactByDistance(ccmaps, stats='median', removeOutliers=False, outliersThershold=3.5)
Calculate the average contact as a function of distance.
Parameters:
 ccmaps (gcMapExplorer.lib.ccmap.CCMAP or list[gcMapExplorer.lib.ccmap.CCMAP]) – Input contact maps.
 stats (str) – Statistics for scaling. Accepted methods are mean and median.
 removeOutliers (bool) – If True, outliers are removed before calculating the input statistics.
 outliersThershold (float) – The modified z-score to use as a threshold. Observations with a modified z-score (based on the median absolute deviation) greater than this value are classified as outliers.
Returns:
 avg_contacts (numpy.array) – A one-dimensional numpy array containing average contacts, where the index is the distance between two locations for the given resolution/binsize.
 For example, if ccmap.binsize=100000 and avg_contacts[4]=1234.56, then at a distance of 400000 b, the average contact is 1234.56.
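The relation between array index and genomic distance can be illustrated by averaging each diagonal of a matrix. The sketch below is not the library's implementation: the helper names are hypothetical, it takes a single plain array instead of CCMAP objects, every bin on a diagonal is included (the handling of missing data is not covered), and the outlier filter uses the common modified z-score form 0.6745 * (x - median) / MAD, which is assumed to correspond to outliersThershold.

```python
import numpy as np


def keep_mask_modified_zscore(values, threshold=3.5):
    """Boolean mask that is True for values not classified as outliers."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # No spread around the median: the score is undefined, keep all.
        return np.ones(values.shape, dtype=bool)
    score = 0.6745 * (values - median) / mad
    return np.abs(score) <= threshold


def avg_contact_by_distance(matrix, stats="median", remove_outliers=False,
                            threshold=3.5):
    """Average of each diagonal; index d is a separation of d bins."""
    matrix = np.asarray(matrix, dtype=float)
    reducer = {"mean": np.mean, "median": np.median}[stats]
    avg = np.zeros(matrix.shape[0])
    for distance in range(matrix.shape[0]):
        # The d-th diagonal holds all bin pairs separated by d bins.
        diagonal = np.diagonal(matrix, offset=distance)
        if remove_outliers:
            diagonal = diagonal[keep_mask_modified_zscore(diagonal, threshold)]
        avg[distance] = reducer(diagonal)
    return avg


# Toy matrix whose value at (i, j) equals the bin separation |i - j|.
index = np.arange(5)
toy = np.abs(index[:, None] - index[None, :])
binsize = 100_000
avg = avg_contact_by_distance(toy, stats="mean")
for distance, value in enumerate(avg):
    print(distance * binsize, value)
```

Multiplying the index by the bin size gives the genomic distance, as in the avg_contacts[4] example above.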