Comparison between original IC algorithm and gcMapExplorer implementation

Here, we compare the results from original IC matrix balancing algorithm with the gcMapExplorer implmentation.

Note: To run the below calculation, iced should be installed.

Note

  • Below we use already normalized maps that were calculated during previous tutorial.

Importing required modules:

[1]:
import gcMapExplorer.lib as gmlib
import numpy as np
import iced

import matplotlib as mpl
import matplotlib.pyplot as plt

# To show inline plots
%matplotlib inline
plt.style.use('ggplot')              # Theme for plotting

Wrapper function to normalize map using original algorithm

Follwing function perform several steps:

  • Extract a new matrix from input matrix by removing any rows and columns with missing data
  • Normalize it using iced module
  • Construct normalized matrix from normalization vector
  • Expand normalized matrix by re-inserting rows and column with missing data
[2]:
def normIC_from_iced(ccmap):
    # Discard rows and columns with missing data
    ccmap.make_readable()
    bNonZeros = gmlib.ccmapHelpers.get_nonzeros_index(ccmap.matrix)
    A = (ccmap.matrix[bNonZeros,:])[:,bNonZeros]

    # Calculate normalization using iced module
    outA = iced.normalization.ICE_normalization(A[:], eps=1e-4, max_iter=3000)

    # Initialize ouput ccmap object
    normCCMap = ccmap.copy(fill=0.0)
    normCCMap.make_editable()
    normCCMap.bNoData = ~bNonZeros

    # Store normalized values to output ccmap
    dsm_i = 0
    ox = normCCMap.matrix.shape[0]
    idx_fill = np.nonzero( ~normCCMap.bNoData )
    for i in range(ox):
        if not normCCMap.bNoData[i]:
            normCCMap.matrix[i, idx_fill] = outA[dsm_i]
            normCCMap.matrix[idx_fill, i] = outA[dsm_i]
            dsm_i += 1

    # Get the maximum and minimum value except zero
    ma = np.ma.masked_equal(outA, 0.0, copy=False)
    normCCMap.minvalue = ma.min()
    normCCMap.maxvalue = ma.max()

    normCCMap.make_unreadable()
    ccmap.make_unreadable()

    return normCCMap

Normalization using original algorithm from iced module

Here, we read raw maps from a gcmap file, normalize it using above function and save the normalized maps to output gcmap file.

[3]:
gcMapInputFile = 'cmaps/CooMatrix/rawObserved_100kb.gcmap'
gcMapOutFile = 'norm_comparison/normIC_iced.gcmap'

# Get list of maps in ascending order
gcmap = gmlib.gcmap.GCMAP(gcMapInputFile)
gcmap.loadSmallestMap()
mapList = gcmap.mapNameList.copy()
del gcmap

for mapName in mapList:
    print('Calculating for {0}...'.format(mapName))
    ccMap = gmlib.gcmap.loadGCMapAsCCMap(gcMapInputFile, mapName=mapName)
    normKR_ccmap = normIC_from_iced(ccMap)

    if normKR_ccmap is not None:
        gmlib.gcmap.addCCMap2GCMap(normKR_ccmap, gcMapOutFile,generateCoarse=True, coarsingMethod='sum',)

    del ccMap
    del normKR_ccmap
INFO:addCCMap2GCMap: Opened file [norm_comparison/normIC_iced.gcmap] for reading writing..
INFO:addCCMap2GCMap: Data for chr21 is already present in [norm_comparison/normIC_iced.gcmap], replacing it...
INFO:addCCMap2GCMap: Adding data to [norm_comparison/normIC_iced.gcmap] for [chr21] ...
INFO:addCCMap2GCMap:     ...Finished adding data for [chr21] ...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr21] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr21] ...
INFO:addCCMap2GCMap: Closed file [norm_comparison/normIC_iced.gcmap]...
Calculating for chr21...
Calculating for chr22...
INFO:addCCMap2GCMap: Opened file [norm_comparison/normIC_iced.gcmap] for reading writing..
INFO:addCCMap2GCMap: Data for chr22 is already present in [norm_comparison/normIC_iced.gcmap], replacing it...
INFO:addCCMap2GCMap: Adding data to [norm_comparison/normIC_iced.gcmap] for [chr22] ...
INFO:addCCMap2GCMap:     ...Finished adding data for [chr22] ...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr22] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr22] ...
INFO:addCCMap2GCMap: Closed file [norm_comparison/normIC_iced.gcmap]...
Calculating for chr20...
INFO:addCCMap2GCMap: Opened file [norm_comparison/normIC_iced.gcmap] for reading writing..
INFO:addCCMap2GCMap: Data for chr20 is already present in [norm_comparison/normIC_iced.gcmap], replacing it...
INFO:addCCMap2GCMap: Adding data to [norm_comparison/normIC_iced.gcmap] for [chr20] ...
INFO:addCCMap2GCMap:     ...Finished adding data for [chr20] ...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr20] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr20] ...
INFO:addCCMap2GCMap: Closed file [norm_comparison/normIC_iced.gcmap]...
Calculating for chr15...
INFO:addCCMap2GCMap: Opened file [norm_comparison/normIC_iced.gcmap] for reading writing..
INFO:addCCMap2GCMap: Data for chr15 is already present in [norm_comparison/normIC_iced.gcmap], replacing it...
INFO:addCCMap2GCMap: Adding data to [norm_comparison/normIC_iced.gcmap] for [chr15] ...
INFO:addCCMap2GCMap:     ...Finished adding data for [chr15] ...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr15] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr15] ...
INFO:addCCMap2GCMap: Closed file [norm_comparison/normIC_iced.gcmap]...
Calculating for chr5...
INFO:addCCMap2GCMap: Opened file [norm_comparison/normIC_iced.gcmap] for reading writing..
INFO:addCCMap2GCMap: Data for chr5 is already present in [norm_comparison/normIC_iced.gcmap], replacing it...
INFO:addCCMap2GCMap: Adding data to [norm_comparison/normIC_iced.gcmap] for [chr5] ...
INFO:addCCMap2GCMap:     ...Finished adding data for [chr5] ...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr5] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr5] ...
INFO:addCCMap2GCMap: Closed file [norm_comparison/normIC_iced.gcmap]...
Calculating for chr1...
INFO:addCCMap2GCMap: Opened file [norm_comparison/normIC_iced.gcmap] for reading writing..
INFO:addCCMap2GCMap: Data for chr1 is already present in [norm_comparison/normIC_iced.gcmap], replacing it...
INFO:addCCMap2GCMap: Adding data to [norm_comparison/normIC_iced.gcmap] for [chr1] ...
INFO:addCCMap2GCMap:     ...Finished adding data for [chr1] ...
INFO:addCCMap2GCMap: Generating downsampled maps for [chr1] ...
INFO:addCCMap2GCMap:     ... Finished downsampling for [chr1] ...
INFO:addCCMap2GCMap: Closed file [norm_comparison/normIC_iced.gcmap]...

Correlation Calculation

Below we calculate correlation between above normalized maps (original algorithm) and previously normalized maps (gcMapExplorer implementation).

[4]:
gcMapOne = 'norm_comparison/normIC_iced.gcmap'   # Normalization using original algorithm
gcMapTwo = 'normalized/normIC_100kb.gcmap'       # Normalization using gcMapExplorer module

mapList, corrs, pvalues = gmlib.cmstats.correlateGCMaps(gcMapOne, gcMapTwo, corrType='pearson')

print('Pearson Correlation gcMapExplorer-IC vs original-IC')
print('====================================================')
print('\nChromosome     Corr      P-values')
print('--------------------------------------------------------')
for i in range(len(mapList)):
    print('{0:5} {1:12.5} {2:12.5}'.format( mapList[i], corrs[i], pvalues[i]))
print('--------------------------------------------------------')
INFO:correlateGCMaps: Performing calculation for chr1 ...
INFO:correlateGCMaps:     Finished calculation for chr1 ...

INFO:correlateGCMaps: Performing calculation for chr5 ...
INFO:correlateGCMaps:     Finished calculation for chr5 ...

INFO:correlateGCMaps: Performing calculation for chr15 ...
INFO:correlateGCMaps:     Finished calculation for chr15 ...

INFO:correlateGCMaps: Performing calculation for chr20 ...
INFO:correlateGCMaps:     Finished calculation for chr20 ...

INFO:correlateGCMaps: Performing calculation for chr21 ...
INFO:correlateGCMaps:     Finished calculation for chr21 ...

INFO:correlateGCMaps: Performing calculation for chr22 ...
INFO:correlateGCMaps:     Finished calculation for chr22 ...

Pearson Correlation gcMapExplorer-IC vs original-IC
====================================================

Chromosome     Corr      P-values
--------------------------------------------------------
chr1           1.0          0.0
chr5           1.0          0.0
chr15          1.0          0.0
chr20          1.0          0.0
chr21          1.0          0.0
chr22          1.0          0.0
--------------------------------------------------------

Results

As can be seen above output table, both normalized maps correlate perfectly, therefore, gcMapExplorer implemented normalization yielded almost same result as of the original algorithm.


To plot comparison between values from original and gcMapExplorer method

Below function is used to plot values from a single map.

[5]:
def plot_comparison_for_chromosome(mapName, fig, idx):
    # Read normalized map from original algorithm
    origICNormCCmap = gmlib.gcmap.loadGCMapAsCCMap('norm_comparison/normIC_iced.gcmap', mapName=mapName)

    # Read normalized map from gcMapExplorer implemented code
    gcMapExpNormCCmap = gmlib.gcmap.loadGCMapAsCCMap('normalized/normIC_100kb.gcmap', mapName=mapName)

    # Make ccmap matrix file available for reading
    origICNormCCmap.make_readable()
    gcMapExpNormCCmap.make_readable()

    # Mask any bins that have zero value, since both map are calculated from same raw map,
    # same bins will have missing data
    origICNormCCmapMasked = np.ma.masked_equal(origICNormCCmap.matrix, 0.0, copy=False)
    gcMapExpNormCCmapMasked = np.ma.masked_equal(gcMapExpNormCCmap.matrix, 0.0, copy=False)

    ax = fig.add_subplot(3,2,idx)                                 # Axes plot
    ax.set_yscale("log", basey=10, nonposy='clip')                # Set Y-axis to log scale
    ax.set_xscale("log", basey=10, nonposy='clip')                # Set X-axis to log scale
    ax.set_title('Comparison {0}'.format(mapName))                # Title plot

    # Plot for Pearson correlation
    # Plot only those bins that do not have missing data (> zero)
    ax.scatter(origICNormCCmapMasked.compressed(), gcMapExpNormCCmapMasked.compressed(), marker='o', s=6)

    ax.set_xlabel('Original IC Normalized values')
    ax.set_ylabel('gcMapExplorer IC Normalized values')

    del origICNormCCmap
    del gcMapExpNormCCmap

Now, we plot normalized values for some chromosome maps using above written function, which also shows that the obtaind values are similar from both implementations.

[6]:
fig = plt.figure(figsize=(13,19))                             # Figure size
fig.subplots_adjust(hspace=0.3, wspace=0.25)                  # Space between sub-plots
mpl.rcParams.update({'font.size': 12})                        # Font-size

# Plot for chromosomes
plot_comparison_for_chromosome('chr1', fig, 1)                # For chr1
plot_comparison_for_chromosome('chr5', fig, 2)                # For chr5
plot_comparison_for_chromosome('chr15', fig, 3)               # For chr15
plot_comparison_for_chromosome('chr20', fig, 4)               # For chr20
plot_comparison_for_chromosome('chr21', fig, 5)               # For chr21
plot_comparison_for_chromosome('chr22', fig, 6)               # For chr22

# plt.savefig('compare_IC.png', dpi=300)
plt.show()
../_images/modules_examples_compare_IC_norm_13_0.png