# Comparison between original IC algorithm and gcMapExplorer implementation¶

Here, we compare the results from original IC matrix balancing algorithm with the gcMapExplorer implmentation.

Note: To run the below calculation, iced should be installed.

Note

• Below we use already normalized maps that were calculated during previous tutorial.

Importing required modules:

In [1]:
import gcMapExplorer.lib as gmlib
import numpy as np
import iced

import matplotlib as mpl
import matplotlib.pyplot as plt

# To show inline plots
%matplotlib inline
plt.style.use('ggplot')              # Theme for plotting

Wrapper function to normalize map using original algorithm

Follwing function perform several steps:

• Extract a new matrix from input matrix by removing any rows and columns with missing data
• Normalize it using iced module
• Construct normalized matrix from normalization vector
• Expand normalized matrix by re-inserting rows and column with missing data
In [2]:
def normIC_from_iced(ccmap):
# Discard rows and columns with missing data
bNonZeros = gmlib.ccmapHelpers.get_nonzeros_index(ccmap.matrix)
A = (ccmap.matrix[bNonZeros,:])[:,bNonZeros]

# Calculate normalization using iced module
outA = iced.normalization.ICE_normalization(A[:], eps=1e-4, max_iter=3000)

# Initialize ouput ccmap object
normCCMap = ccmap.copy(fill=0.0)
normCCMap.make_editable()
normCCMap.bNoData = ~bNonZeros

# Store normalized values to output ccmap
dsm_i = 0
ox = normCCMap.matrix.shape[0]
idx_fill = np.nonzero( ~normCCMap.bNoData )
for i in range(ox):
if not normCCMap.bNoData[i]:
normCCMap.matrix[i, idx_fill] = outA[dsm_i]
normCCMap.matrix[idx_fill, i] = outA[dsm_i]
dsm_i += 1

# Get the maximum and minimum value except zero
normCCMap.minvalue = ma.min()
normCCMap.maxvalue = ma.max()

return normCCMap

## Normalization using original algorithm from iced module¶

Here, we read raw maps from a gcmap file, normalize it using above function and save the normalized maps to output gcmap file.

In [3]:
gcMapInputFile = 'cmaps/CooMatrix/rawObserved_100kb.gcmap'
gcMapOutFile = 'norm_comparison/normIC_iced.gcmap'

# Get list of maps in ascending order
gcmap = gmlib.gcmap.GCMAP(gcMapInputFile)
mapList = gcmap.mapNameList.copy()
del gcmap

for mapName in mapList:
print('Calculating for {0}...'.format(mapName))
normKR_ccmap = normIC_from_iced(ccMap)

if normKR_ccmap is not None:

del ccMap
del normKR_ccmap
INFO:addCCMap2GCMap: Generating downsampled maps for [chr21] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr21] ...
Calculating for chr21...
Calculating for chr22...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr22] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr22] ...
Calculating for chr20...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr20] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr20] ...
Calculating for chr15...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr15] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr15] ...
Calculating for chr5...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr5] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr5] ...
Calculating for chr1...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr1] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr1] ...

## Correlation Calculation¶

Below we calculate correlation between above normalized maps (original algorithm) and previously normalized maps (gcMapExplorer implementation).

In [4]:
gcMapOne = 'norm_comparison/normIC_iced.gcmap'   # Normalization using original algorithm
gcMapTwo = 'normalized/normIC_100kb.gcmap'       # Normalization using gcMapExplorer module

mapList, corrs, pvalues = gmlib.cmstats.correlateGCMaps(gcMapOne, gcMapTwo, corrType='pearson')

print('Pearson Correlation gcMapExplorer-IC vs original-IC')
print('====================================================')
print('\nChromosome     Corr      P-values')
print('--------------------------------------------------------')
for i in range(len(mapList)):
print('{0:5} {1:12.5} {2:12.5}'.format( mapList[i], corrs[i], pvalues[i]))
print('--------------------------------------------------------')
INFO:correlateGCMaps: Performing calculation for chr1 ...
INFO:correlateGCMaps:     Finished calculation for chr1 ...

INFO:correlateGCMaps: Performing calculation for chr5 ...
INFO:correlateGCMaps:     Finished calculation for chr5 ...

INFO:correlateGCMaps: Performing calculation for chr15 ...
INFO:correlateGCMaps:     Finished calculation for chr15 ...

INFO:correlateGCMaps: Performing calculation for chr20 ...
INFO:correlateGCMaps:     Finished calculation for chr20 ...

INFO:correlateGCMaps: Performing calculation for chr21 ...
INFO:correlateGCMaps:     Finished calculation for chr21 ...

INFO:correlateGCMaps: Performing calculation for chr22 ...
INFO:correlateGCMaps:     Finished calculation for chr22 ...

Pearson Correlation gcMapExplorer-IC vs original-IC
====================================================

Chromosome     Corr      P-values
--------------------------------------------------------
chr1           1.0          0.0
chr5           1.0          0.0
chr15          1.0          0.0
chr20          1.0          0.0
chr21          1.0          0.0
chr22          1.0          0.0
--------------------------------------------------------

## Results¶

As can be seen above output table, both normalized maps correlate perfectly, therefore, gcMapExplorer implemented normalization yielded almost same result as of the original algorithm.

To plot comparison between values from original and gcMapExplorer method

Below function is used to plot values from a single map.

In [5]:
def plot_comparison_for_chromosome(mapName, fig, idx):
# Read normalized map from original algorithm

# Read normalized map from gcMapExplorer implemented code

# Make ccmap matrix file available for reading

# Mask any bins that have zero value, since both map are calculated from same raw map,
# same bins will have missing data

ax = fig.add_subplot(3,2,idx)                                 # Axes plot
ax.set_yscale("log", basey=10, nonposy='clip')                # Set Y-axis to log scale
ax.set_xscale("log", basey=10, nonposy='clip')                # Set X-axis to log scale
ax.set_title('Comparison {0}'.format(mapName))                # Title plot

# Plot for Pearson correlation
# Plot only those bins that do not have missing data (> zero)

ax.set_xlabel('Original IC Normalized values')
ax.set_ylabel('gcMapExplorer IC Normalized values')

del origICNormCCmap
del gcMapExpNormCCmap

Now, we plot normalized values for some chromosome maps using above written function, which also shows that the obtaind values are similar from both implementations.

In [6]:
fig = plt.figure(figsize=(13,19))                             # Figure size
fig.subplots_adjust(hspace=0.3, wspace=0.25)                  # Space between sub-plots
mpl.rcParams.update({'font.size': 12})                        # Font-size

# Plot for chromosomes
plot_comparison_for_chromosome('chr1', fig, 1)                # For chr1
plot_comparison_for_chromosome('chr5', fig, 2)                # For chr5
plot_comparison_for_chromosome('chr15', fig, 3)               # For chr15
plot_comparison_for_chromosome('chr20', fig, 4)               # For chr20
plot_comparison_for_chromosome('chr21', fig, 5)               # For chr21
plot_comparison_for_chromosome('chr22', fig, 6)               # For chr22

# plt.savefig('compare_IC.png', dpi=300)
plt.show()