ccmapHelpers module

`ccmapHelpers.MemoryMappedArray`	Convenient wrapper for numpy memory mapped array file
`ccmapHelpers.MemoryMappedArray.copy`	Copy this numpy memory mapped array and generate a new one
`ccmapHelpers.MemoryMappedArray.copy_from`	Copy values from source `MemoryMappedArray`
`ccmapHelpers.MemoryMappedArray.copy_to`	Copy values to destination `MemoryMappedArray`
`ccmapHelpers.get_nonzeros_index`(matrix[, …])	To get a numpy array of bool values for all rows/columns which have NO missing data
`ccmapHelpers.remove_zeros`(matrix[, …])	To remove rows/columns with missing data (zero values)

gcMapExplorer.ccmapHelpers

get_nonzeros_index(matrix, threshold_percentile=None, threshold_data_occup=None, filterByDiagonal=False)

To get a numpy array of bool values for all rows/columns which have NO missing data

Parameters:

matrix (numpy.memmap or `gcMapExplorer.lib.ccmap.CCMAP.matrix`) – Input matrix
threshold_percentile (int) –
It can be used to filter the map by discarding the rows/columns with the largest numbers of missing data. `threshold_percentile` should be between 1 and 100. This option discards the rows and columns whose number of zeros lies above this percentile. For example, if this value is 99, rows or columns containing more zeros (missing data) than the number of zeros at the 99th percentile are discarded.

To calculate the percentile, all blank rows are removed, then the number of zeros is counted in each remaining row. Afterwards, the number of zeros at the `threshold_percentile` percentile is obtained. In the next step, if a row contains more zeros than this percentile value, the whole row and column are marked as missing data. This percentile value is the highest number of zeros (missing data) tolerated in a given row/column.
threshold_data_occup (float) –
It can be used to filter the map by discarding the rows/columns with the largest numbers of missing data. This ratio is (number of bins with data) / (total number of bins in the given row/column). For example, if threshold_data_occup = 0.8, then all rows containing more than 20% missing data are discarded.

Note that this parameter is suitable for low resolution data because such maps are likely to be much less sparse.

Returns:	bData – 1D-array containing `True` and `False` values.
* If `True`: row/column has data above the threshold
* If `False`: row/column has no data under the threshold
Return type:	numpy.array[bool]
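The two thresholds described above can be illustrated with a small NumPy-only sketch. It is not the gcMapExplorer implementation: the function name `nonzero_index_sketch` is hypothetical, `filterByDiagonal` is not modelled, and the exact comparison operators (`<=` for the percentile cutoff, `>=` for the occupancy ratio) are assumptions drawn from the parameter descriptions.

```python
import numpy as np

def nonzero_index_sketch(matrix, threshold_percentile=None, threshold_data_occup=None):
    """Return a bool array that is True for rows treated as having data.

    Hypothetical helper; it only mirrors the documented description.
    """
    matrix = np.asarray(matrix)
    n_bins = matrix.shape[1]
    zeros_per_row = np.count_nonzero(matrix == 0, axis=1)

    # Rows made only of zeros are blank and never count as having data.
    has_data = zeros_per_row < n_bins

    if threshold_percentile is not None and has_data.any():
        # Percentile of the zero counts, taken over non-blank rows only.
        cutoff = np.percentile(zeros_per_row[has_data], threshold_percentile)
        has_data &= zeros_per_row <= cutoff

    if threshold_data_occup is not None:
        # Occupancy = (bins with data) / (total bins in the row).
        occupancy = (n_bins - zeros_per_row) / n_bins
        has_data &= occupancy >= threshold_data_occup

    return has_data

demo = np.array([
    [5, 1, 0, 2],
    [1, 4, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 0, 3],
], dtype=float)

mask_default = nonzero_index_sketch(demo)
mask_occup = nonzero_index_sketch(demo, threshold_data_occup=0.75)
```

With the occupancy threshold, only the first row of `demo` is kept because it is the only one with at least 75% of its bins filled.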

remove_zeros(matrix, threshold_percentile=None, threshold_data_occup=None, workDir=None)

To remove rows/columns with missing data (zero values)

Parameters:

matrix (numpy.memmap or gcMapExplorer.lib.ccmap.CCMAP.matrix) – Input matrix
threshold_percentile (int) –
It can be used to filter the map by discarding the rows/columns with the largest numbers of missing data. threshold_percentile should be between 1 and 100. This option discards the rows and columns whose number of zeros lies above this percentile. For example, if this value is 99, rows or columns containing more zeros (missing data) than the number of zeros at the 99th percentile are discarded.

To calculate the percentile, all blank rows are removed, then the number of zeros is counted in each remaining row. Afterwards, the number of zeros at the threshold_percentile percentile is obtained. In the next step, if a row contains more zeros than this percentile value, the whole row and column are marked as missing data. This percentile value is the highest number of zeros (missing data) tolerated in a given row/column.
threshold_data_occup (float) –
It can be used to filter the map by discarding the rows/columns with the largest numbers of missing data. This ratio is (number of bins with data) / (total number of bins in the given row/column). For example, if threshold_data_occup = 0.8, then all rows containing more than 20% missing data are discarded.

Note that this parameter is suitable for low resolution data because such maps are likely to be much less sparse.
workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the temporary directory according to the OS type.

Returns:

A (MemoryMappedArray) – MemoryMappedArray instance containing new truncated array as memory mapped file
bNoData (numpy.array[bool]) – 1D-array containing True and False values.
* If True: row/column has no data under the threshold
* If False: row/column has data above the threshold
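As a rough illustration of the truncation step, the sketch below drops all-zero rows and columns from a small symmetric matrix and returns the mask alongside it. It is an assumption-laden stand-in: `remove_zeros_sketch` is a hypothetical name, no thresholds are applied, and it returns a plain in-memory array instead of a MemoryMappedArray.

```python
import numpy as np

def remove_zeros_sketch(matrix):
    """Drop rows/columns that contain only zeros (hypothetical helper)."""
    matrix = np.asarray(matrix)
    # True where the row has no data at all.
    b_no_data = ~np.any(matrix != 0, axis=1)
    keep = ~b_no_data
    # np.ix_ selects the kept rows and the kept columns in one step.
    truncated = matrix[np.ix_(keep, keep)]
    return truncated, b_no_data

full = np.array([
    [4.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 2.0],
])

small, b_no_data = remove_zeros_sketch(full)
```

Here `small` is the 2x2 matrix left after removing the empty middle row and column, and `b_no_data` records which positions were removed so that results can later be mapped back to the full-size map.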

MemoryMappedArray class

class MemoryMappedArray

Convenient wrapper for numpy memory mapped array file

For more details, see the Numpy memmap documentation.

path2matrix: str – Path to numpy memory mapped array file

arr: numpy.memmap – Pointer to memory mapped numpy array

workDir: str – Path to the directory where temporary intermediate files are generated. If None, files are generated in the temporary directory according to the OS type.

dtype: str – Data type of array

Parameters:

shape (tuple) – Shape of array
fill (int or float, optional) – Fill array with this value
dtype (str) – Data type of array
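To make the attributes above concrete, here is a minimal sketch of what a wrapper of this kind might do internally with `numpy.memmap` and `tempfile`. The function name `make_memmap_sketch`, the `work_dir` argument, and the file prefix/suffix are hypothetical; the real class may create and name its files differently.

```python
import os
import tempfile
import numpy as np

def make_memmap_sketch(shape, fill=None, dtype="float64", work_dir=None):
    """Create a file-backed array and return (path, array). Hypothetical helper."""
    # dir=None lets tempfile choose the OS temporary directory.
    fd, path = tempfile.mkstemp(prefix="mmap_", suffix=".dat", dir=work_dir)
    os.close(fd)
    # mode="w+" creates (or overwrites) the file with the requested size.
    arr = np.memmap(path, dtype=dtype, mode="w+", shape=shape)
    if fill is not None:
        arr[:] = fill
        arr.flush()
    return path, arr

path, arr = make_memmap_sketch((3, 3), fill=1.5)
total = float(arr.sum())
array_type = type(arr).__name__

# Release the mapping before deleting the backing file.
del arr
os.remove(path)
```

The array behaves like a regular NumPy array while its data lives in the file at `path`, which is why the file has to be removed explicitly once the mapping is released.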

copy

Copy this numpy memory mapped array and generate a new one

Returns:	out – A new `MemoryMappedArray` instance with copied arrays
Return type:	`MemoryMappedArray`

copy_from

Copy values from source MemoryMappedArray

Parameters:	src (`MemoryMappedArray`) – Source memory mapped array providing the new values
Return type:	None
Raises:	`ValueError` – if src is not a `MemoryMappedArray` instance

copy_to

Copy values to destination MemoryMappedArray

Parameters:	dest (`MemoryMappedArray`) – Destination memory mapped array
Return type:	None
Raises:	`ValueError` – if dest is not a `MemoryMappedArray` instance
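The copy semantics and the type check can be sketched with plain `numpy.memmap` objects standing in for MemoryMappedArray instances. The helper `copy_values` and the shape check are assumptions made for illustration only; they are not part of the documented API.

```python
import os
import tempfile
import numpy as np

def copy_values(src, dest):
    """Copy all values from src into dest (hypothetical stand-in for copy_from/copy_to)."""
    if not isinstance(src, np.memmap) or not isinstance(dest, np.memmap):
        raise ValueError("src and dest must both be memory mapped arrays")
    if src.shape != dest.shape:
        raise ValueError("src and dest must have the same shape")
    dest[:] = src[:]
    dest.flush()

with tempfile.TemporaryDirectory() as tmp:
    a = np.memmap(os.path.join(tmp, "a.dat"), dtype="float64", mode="w+", shape=(2, 2))
    b = np.memmap(os.path.join(tmp, "b.dat"), dtype="float64", mode="w+", shape=(2, 2))
    a[:] = [[1.0, 2.0], [3.0, 4.0]]

    copy_values(a, b)
    copied = np.array(b)  # in-memory copy that outlives the temporary files

    # A plain ndarray is rejected, mirroring the documented ValueError.
    try:
        copy_values(np.zeros((2, 2)), b)
        rejected = False
    except ValueError:
        rejected = True

    del a, b  # release the mappings before the directory is removed
```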

KnightRuizNorm class

class KnightRuizNorm

A modified Knight-Ruiz algorithm for matrix balancing

The original ported Knight-Ruiz algorithm is modified to perform the normalization using either memory (RAM) or disk. This allows the normalization of Hi-C maps ranging from small maps to huge maps that cannot be accommodated in RAM.

Parameters:

A (numpy.ndarray or MemoryMappedArray) –
Input matrix.
Note
- Matrix should not contain any row or column with all zero values (missing data for row/column). This type of matrix can be obtained from remove_zeros().
- If memory='HDD', A should be MemoryMappedArray
memory (str) –
Accepted keywords are RAM and HDD:
- RAM: All intermediate arrays are generated in memory (RAM). This version is faster; however, the amount of RAM it requires depends on the input matrix size.
- HDD: All intermediate arrays are generated as memory mapped array files on the hard disk.
workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the temporary directory according to the OS type.

run

Perform Knight-Ruiz normalization

Parameters:

A (numpy.ndarray or MemoryMappedArray.arr) –
Input matrix.
Note
- Matrix should not contain any row or column with all zero values (missing data for row/column). This type of matrix can be obtained from remove_zeros().
Warning
- If A was a MemoryMappedArray in KnightRuizNorm, then here A should be MemoryMappedArray.arr instead of the MemoryMappedArray itself.
fl (int) – Its value should be zero
OutMatrix (gcMapExplorer.lib.ccmap.CCMAP.matrix) – Output matrix of Hi-C map to which normalized matrix is returned.
bNoData (numpy.ndarray[bool]) – A boolean numpy.array indicating which rows/columns have missing data. It can be obtained from remove_zeros().
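The idea of balancing a truncated matrix and writing the result back into a full-size output can be sketched as follows. This is not the Knight-Ruiz algorithm: the sketch swaps in a much simpler symmetric scaling iteration (repeatedly dividing by the square root of the row sums) that reaches the same kind of result, unit row and column sums, on small dense matrices. The names `balance_sketch` and `expand_sketch`, and the way `b_no_data` is used to place values in the output, are assumptions for illustration.

```python
import numpy as np

def balance_sketch(A, max_iter=500, tol=1e-10):
    """Scale a symmetric non-negative matrix to unit row sums.

    Simple square-root row-sum scaling, used here as a stand-in
    for Knight-Ruiz; A must not contain all-zero rows/columns.
    """
    A = np.asarray(A, dtype=float)
    x = np.ones(A.shape[0])
    for _ in range(max_iter):
        row_sums = x * (A @ x)  # row sums of diag(x) @ A @ diag(x)
        if np.max(np.abs(row_sums - 1.0)) < tol:
            break
        x = x / np.sqrt(row_sums)
    return x[:, None] * A * x[None, :]

def expand_sketch(normalized, b_no_data, out_matrix):
    """Write the balanced values into the rows/columns that have data."""
    keep = ~np.asarray(b_no_data, dtype=bool)
    out_matrix[...] = 0.0
    out_matrix[np.ix_(keep, keep)] = normalized
    return out_matrix

truncated = np.array([[4.0, 1.0], [1.0, 2.0]])  # no all-zero rows/columns
b_no_data = np.array([False, True, False])       # middle bin had no data

balanced = balance_sketch(truncated)
out = expand_sketch(balanced, b_no_data, np.zeros((3, 3)))
```

The full-size `out` keeps zeros in the row and column flagged by `b_no_data`, while the remaining entries hold the balanced values.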