# ccmapHelpers module

| Name | Description |
| --- | --- |
| ccmapHelpers.MemoryMappedArray | Convenient wrapper for a numpy memory-mapped array file |
| ccmapHelpers.MemoryMappedArray.copy(self) | Copy this numpy memory-mapped array and generate a new one |
| ccmapHelpers.MemoryMappedArray.copy_from(...) | Copy values from the source MemoryMappedArray |
| ccmapHelpers.MemoryMappedArray.copy_to(self, ...) | Copy values to the destination MemoryMappedArray |
| ccmapHelpers.get_nonzeros_index(matrix[, ...]) | Get a numpy array of bool values marking all rows/columns that have NO missing data |
| ccmapHelpers.remove_zeros(matrix[, ...]) | Remove rows/columns with missing data (zero values) |

## gcMapExplorer.ccmapHelpers

### get_nonzeros_index(matrix, thershold_percentile=None, thershold_data_occup=None)

Return a numpy array of bool values marking every row/column that has NO missing data.

Parameters:

- matrix (numpy.memmap or gcMapExplorer.lib.ccmap.CCMAP.matrix) – Input matrix.
- thershold_percentile (int) – Filters the map by discarding the rows/columns with the largest numbers of missing data. The value should be between 1 and 100; rows and columns above this percentile are discarded. For example, if the value is 99, every row/column containing more zeros (missing data) than the 99th-percentile zero count is discarded. The percentile is calculated as follows: all blank rows are removed, the number of zeros in each remaining row is counted, and the zero count at the thershold_percentile percentile is obtained. Any row containing more zeros than this value is then marked, together with its corresponding column, as missing data. The percentile therefore sets the highest number of zeros (missing data) a row/column may contain.
- thershold_data_occup (float) – Filters the map by discarding the rows/columns with the largest numbers of missing data, based on the ratio (number of bins with data) / (total number of bins in the row/column). For example, if thershold_data_occup = 0.8, all rows containing more than 20% missing data are discarded. This parameter is better suited to low-resolution data, because such maps are likely to be much less sparse.

Returns:

- bData (numpy.array[bool]) – 1D array of True and False values:
  - True: the row/column has data above the threshold.
  - False: the row/column is treated as having no data (under the threshold).
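The two filters described above can be sketched in plain NumPy. This is a hypothetical, in-RAM illustration: `nonzero_index_sketch` is not a gcMapExplorer function, and it assumes a square, symmetric Hi-C matrix so that masking row i also masks column i. The library's own implementation (for example, its handling of memory-mapped input) may differ.

```python
import numpy as np


def nonzero_index_sketch(matrix, thershold_percentile=None, thershold_data_occup=None):
    """Hypothetical sketch of the documented row/column filters.

    Assumes a square, symmetric matrix, so a per-row mask also masks the
    corresponding column. Returns True where a row/column is kept.
    """
    m = np.asarray(matrix)
    n_bins = m.shape[1]
    zeros_per_row = np.count_nonzero(m == 0, axis=1)

    # Rows that are entirely zero (blank) never count as having data.
    bData = zeros_per_row < n_bins

    if thershold_percentile is not None and bData.any():
        # Percentile of zero counts, computed over non-blank rows only.
        cutoff = np.percentile(zeros_per_row[bData], thershold_percentile)
        # Rows with more zeros than the cutoff are treated as missing data.
        bData &= zeros_per_row <= cutoff

    if thershold_data_occup is not None:
        # Occupancy = (bins with data) / (total bins in the row).
        occupancy = np.count_nonzero(m, axis=1) / n_bins
        bData &= occupancy >= thershold_data_occup

    return bData
```

When both filters are given, this sketch keeps only rows that pass both; whether the library combines them the same way is an assumption.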
### remove_zeros(matrix, thershold_percentile=None, thershold_data_occup=None, workDir=None)

Remove rows/columns with missing data (zero values).

Parameters:

- matrix (numpy.memmap or gcMapExplorer.lib.ccmap.CCMAP.matrix) – Input matrix.
- thershold_percentile (int) – Filters the map by discarding the rows/columns with the largest numbers of missing data. The value should be between 1 and 100; rows and columns above this percentile are discarded. For example, if the value is 99, every row/column containing more zeros (missing data) than the 99th-percentile zero count is discarded. The percentile is calculated as follows: all blank rows are removed, the number of zeros in each remaining row is counted, and the zero count at the thershold_percentile percentile is obtained. Any row containing more zeros than this value is then marked, together with its corresponding column, as missing data. The percentile therefore sets the highest number of zeros (missing data) a row/column may contain.
- thershold_data_occup (float) – Filters the map by discarding the rows/columns with the largest numbers of missing data, based on the ratio (number of bins with data) / (total number of bins in the row/column). For example, if thershold_data_occup = 0.8, all rows containing more than 20% missing data are discarded. This parameter is better suited to low-resolution data, because such maps are likely to be much less sparse.
- workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the operating system's default temporary directory.

Returns:

- A (MemoryMappedArray) – MemoryMappedArray instance containing the new, truncated array as a memory-mapped file.
- bNoData (numpy.array[bool]) – 1D array of True and False values:
  - True: the row/column is treated as having no data (under the threshold).
  - False: the row/column has data above the threshold.
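The truncation step can be sketched as follows, assuming the boolean mask from get_nonzeros_index() is already available. `remove_zeros_sketch` is hypothetical: it writes the result with `numpy.memmap` into a file created by `tempfile.mkstemp`, which uses the operating system's temporary directory when `workDir` is None. Unlike the documented function, it returns a plain `numpy.memmap` plus its file path instead of a MemoryMappedArray.

```python
import os
import tempfile

import numpy as np


def remove_zeros_sketch(matrix, bData, workDir=None):
    """Hypothetical sketch: keep only rows/columns flagged True in bData.

    The truncated matrix is written to a new memory-mapped file in workDir
    (or the OS temporary directory when workDir is None).
    """
    m = np.asarray(matrix)
    keep = np.flatnonzero(bData)

    fd, path = tempfile.mkstemp(suffix='.dat', dir=workDir)
    os.close(fd)
    out = np.memmap(path, dtype=m.dtype, mode='w+', shape=(keep.size, keep.size))

    # Copy one row at a time so only a single input row is read per step.
    for new_i, old_i in enumerate(keep):
        out[new_i, :] = m[old_i, keep]
    out.flush()

    # bNoData is the complement of bData: True marks removed rows/columns.
    bNoData = ~np.asarray(bData, dtype=bool)
    return out, path, bNoData
```

Copying row by row keeps the working set close to one row of the input rather than the whole matrix, which is the point of writing the result to disk.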

## MemoryMappedArray class

class MemoryMappedArray

A convenient wrapper for a numpy memory-mapped array file.

For more details, see the NumPy memmap documentation.

Attributes:

- path2matrix (str) – Path to the numpy memory-mapped array file.
- arr (numpy.memmap) – Reference to the memory-mapped numpy array.
- workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the operating system's default temporary directory.
- dtype (str) – Data type of the array.

Parameters:

- shape (tuple) – Shape of the array.
- fill (int or float, optional) – Value used to fill the array.
- dtype (str) – Data type of the array.
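A minimal sketch of what constructing such an object involves, using only `numpy.memmap` and `tempfile`. `MemmapSketch` is a hypothetical stand-in that mirrors the documented attributes (path2matrix, arr, workDir, dtype) and constructor parameters (shape, fill, dtype); accepting workDir in the constructor and the `close()` helper are assumptions of this sketch.

```python
import os
import tempfile

import numpy as np


class MemmapSketch:
    """Hypothetical stand-in mirroring MemoryMappedArray's documented attributes."""

    def __init__(self, shape, fill=None, dtype='float64', workDir=None):
        self.workDir = workDir
        self.dtype = dtype
        # Create a fresh backing file in workDir, or in the OS temporary directory.
        fd, self.path2matrix = tempfile.mkstemp(suffix='.dat', dir=workDir)
        os.close(fd)
        # mode='w+' sizes the file for `shape`; new contents start as zeros.
        self.arr = np.memmap(self.path2matrix, dtype=dtype, mode='w+', shape=shape)
        if fill is not None:
            self.arr[...] = fill
            self.arr.flush()

    def close(self):
        """Drop the array reference and delete the backing file."""
        del self.arr
        os.remove(self.path2matrix)
```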
### copy(self)

Copy this numpy memory-mapped array and generate a new one.

Returns:

- out (MemoryMappedArray) – A new MemoryMappedArray instance containing a copy of the array.
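Copying amounts to allocating a second backing file of the same shape and dtype and copying the values into it. The helper `copy_memmap` below is hypothetical and works on a bare `numpy.memmap` rather than on a MemoryMappedArray.

```python
import os
import tempfile

import numpy as np


def copy_memmap(src, workDir=None):
    """Hypothetical sketch: duplicate a memmap into a new backing file."""
    fd, new_path = tempfile.mkstemp(suffix='.dat', dir=workDir)
    os.close(fd)
    new_arr = np.memmap(new_path, dtype=src.dtype, mode='w+', shape=src.shape)
    new_arr[...] = src  # element-wise copy; the source file is untouched
    new_arr.flush()
    return new_path, new_arr
```

Because the two arrays are backed by different files, writing to the copy leaves the original unchanged.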
### copy_from(self, src)

Copy values from the source MemoryMappedArray.

Parameters:

- src (MemoryMappedArray) – Source memory-mapped array providing the new values.

Returns: None

Raises:

- ValueError – If src is not a MemoryMappedArray instance.
### copy_to(self, dest)

Copy values to the destination MemoryMappedArray.

Parameters:

- dest (MemoryMappedArray) – Destination memory-mapped array.

Returns: None

Raises:

- ValueError – If dest is not a MemoryMappedArray instance.
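The documented contract of copy_from() and copy_to() (copy values in place, return None, raise ValueError for the wrong type) can be sketched with a hypothetical wrapper class. `ArrayHolder` uses in-RAM arrays in place of memory-mapped ones and assumes source and destination have the same shape.

```python
import numpy as np


class ArrayHolder:
    """Hypothetical stand-in for MemoryMappedArray: wraps an array in .arr."""

    def __init__(self, arr):
        self.arr = arr

    def copy_from(self, src):
        # Only accept the wrapper type, mirroring the documented ValueError.
        if not isinstance(src, ArrayHolder):
            raise ValueError('src is not an ArrayHolder instance')
        self.arr[...] = src.arr  # overwrite values in place; returns None

    def copy_to(self, dest):
        if not isinstance(dest, ArrayHolder):
            raise ValueError('dest is not an ArrayHolder instance')
        dest.arr[...] = self.arr
```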

# KnightRuizNorm class

class KnightRuizNorm

A modified Knight-Ruiz algorithm for matrix balancing

The original, ported Knight-Ruiz algorithm is modified so that the normalization can run either in memory (RAM) or on disk. This allows normalization of Hi-C maps ranging from small ones to huge maps that cannot be accommodated in RAM.

Parameters:

- A (numpy.ndarray or MemoryMappedArray) – Input matrix. Note: the matrix should not contain any row or column with all zero values (missing data for that row/column); such a matrix can be obtained from remove_zeros(). If memory='HDD', A should be a MemoryMappedArray.
- memory (str) – Accepted keywords are RAM and HDD:
  - RAM: All intermediate arrays are generated in memory (RAM). This version is faster, but the amount of RAM it requires depends on the input matrix size.
  - HDD: All intermediate arrays are generated as memory-mapped array files on the hard disk.
- workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the operating system's default temporary directory.
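For reference, the core balancing step can be sketched as an in-RAM NumPy port of the `bnewt` routine published by Knight and Ruiz (a Newton iteration with an inner preconditioned conjugate-gradient solve). This is a hypothetical sketch, not gcMapExplorer's modified implementation: `knight_ruiz_sketch` has no HDD mode, adds iteration caps (`max_outer`, `max_inner`) as safety guards, and expects a symmetric, non-negative matrix with no all-zero rows/columns. It returns a vector x such that diag(x) · A · diag(x) has row and column sums close to 1.

```python
import numpy as np


def knight_ruiz_sketch(A, tol=1e-6, delta=0.1, max_outer=100, max_inner=500):
    """Hypothetical in-RAM port of Knight and Ruiz's `bnewt` balancing.

    Returns a positive vector x such that diag(x) @ A @ diag(x) has
    row and column sums within about `tol` of 1.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    e = np.ones(n)
    g, etamax = 0.9, 0.1          # parameters of the inner-tolerance schedule
    eta = etamax
    stop_tol = tol * 0.5
    rt = tol ** 2                 # stop when ||1 - x*(A@x)||^2 <= tol^2

    x = e.copy()
    v = x * (A @ x)               # current row sums of diag(x) A diag(x)
    rk = 1.0 - v                  # residual
    rho_km1 = rk @ rk
    rout = rold = rho_km1

    for _ in range(max_outer):    # outer Newton iterations (cap added here)
        if rout <= rt:
            break
        k = 0
        y = e.copy()
        rho_km2 = rho_km1
        innertol = max(eta ** 2 * rout, rt)
        # Inner loop: preconditioned conjugate gradient for the Newton step.
        while rho_km1 > innertol and k < max_inner:
            k += 1
            if k == 1:
                Z = rk / v
                p = Z.copy()
                rho_km1 = rk @ Z
            else:
                p = Z + (rho_km1 / rho_km2) * p
            w = x * (A @ (x * p)) + v * p
            alpha = rho_km1 / (p @ w)
            ap = alpha * p
            ynew = y + ap
            # Keep y (and hence x) positive: stop at the boundary y >= delta.
            if ynew.min() <= delta:
                neg = ap < 0
                gamma = np.min((delta - y[neg]) / ap[neg])
                y = y + gamma * ap
                break
            y = ynew
            rk = rk - alpha * w
            rho_km2 = rho_km1
            Z = rk / v
            rho_km1 = rk @ Z
        x = x * y
        v = x * (A @ x)
        rk = 1.0 - v
        rho_km1 = rk @ rk
        rout = rho_km1
        # Adapt the inner-solve tolerance (forcing term) for the next step.
        rat = rout / rold
        rold = rout
        res_norm = np.sqrt(rout)
        eta_o = eta
        eta = g * rat
        if g * eta_o ** 2 > 0.1:
            eta = max(eta, g * eta_o ** 2)
        eta = max(min(eta, etamax), stop_tol / res_norm)
    return x
```

The division `rk / v` is why all-zero rows must be removed beforehand: such a row gives v = 0.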
## run(self, A, fl, OutMatrix, bNoData)

Perform Knight-Ruiz normalization

Parameters:

- A (numpy.ndarray or MemoryMappedArray.arr) – Input matrix. Note: the matrix should not contain any row or column with all zero values (missing data for that row/column); such a matrix can be obtained from remove_zeros(). Warning: if A was passed to KnightRuizNorm as a MemoryMappedArray, pass MemoryMappedArray.arr here instead of the MemoryMappedArray itself.
- fl (int) – Should be zero.
- OutMatrix (gcMapExplorer.lib.ccmap.CCMAP.matrix) – Output Hi-C map matrix to which the normalized matrix is written.
- bNoData (numpy.ndarray[bool]) – Boolean array indicating which rows/columns have missing data. It can be obtained from remove_zeros().
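How bNoData can tie the truncated input back to the full-size output is sketched below. This is hypothetical: `scatter_balanced` is not a gcMapExplorer function, a plain NumPy array stands in for OutMatrix, and it assumes the normalized values are diag(x) · A · diag(x) for a scaling vector x from the balancing step, written into the rows/columns not flagged in bNoData while flagged ones stay zero.

```python
import numpy as np


def scatter_balanced(A_small, x, bNoData, out):
    """Hypothetical sketch: write diag(x) @ A_small @ diag(x) into `out`.

    Rows/columns flagged True in bNoData receive no values and stay zero.
    """
    A_small = np.asarray(A_small, dtype=float)
    x = np.asarray(x, dtype=float)
    balanced = x[:, None] * A_small * x[None, :]  # same as diag(x) @ A @ diag(x)

    keep = np.flatnonzero(~np.asarray(bNoData, dtype=bool))
    if keep.size != A_small.shape[0]:
        raise ValueError('bNoData does not match the truncated matrix size')

    out[...] = 0
    out[np.ix_(keep, keep)] = balanced
    return out
```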