ccmapHelpers module
ccmapHelpers.MemoryMappedArray | Convenient wrapper for a numpy memory-mapped array file
ccmapHelpers.MemoryMappedArray.copy | Copy this memory-mapped array into a new MemoryMappedArray
ccmapHelpers.MemoryMappedArray.copy_from | Copy values from a source MemoryMappedArray
ccmapHelpers.MemoryMappedArray.copy_to | Copy values to a destination MemoryMappedArray
ccmapHelpers.get_nonzeros_index (matrix[, …]) | Get a numpy array of bool values marking all rows/columns that have no missing data
ccmapHelpers.remove_zeros (matrix[, …]) | Remove rows/columns with missing data (zero values)
gcMapExplorer.ccmapHelpers
- get_nonzeros_index(matrix, threshold_percentile=None, threshold_data_occup=None, filterByDiagonal=False)
Get a numpy array of bool values marking all rows/columns that have no missing data.
Parameters:
- matrix (numpy.memmap or gcMapExplorer.lib.ccmap.CCMAP.matrix) – Input matrix.
- threshold_percentile (int) – Filters the map by discarding the rows/columns with the largest numbers of missing data. The value should be between 1 and 100; rows and columns above this percentile are discarded. For example, if the value is 99, any row or column containing more zeros (missing data) than the number of zeros at the 99th percentile is discarded.
  To calculate the percentile, all blank rows are removed first, and the number of zeros is then counted in each remaining row. Next, the number of zeros at the given percentile is obtained. Finally, if a row contains more zeros than this percentile value, the whole row and the corresponding column are marked as missing data. The percentile thus sets the highest number of zeros (missing data) allowed in a given row/column.
- threshold_data_occup (float) – Filters the map by discarding the rows/columns with the largest numbers of missing data. The value is the ratio (number of bins with data) / (total number of bins in the given row/column). For example, if threshold_data_occup = 0.8, all rows containing more than 20% missing data are discarded.
  Note that this parameter is suited to low-resolution data, because such maps are likely to be much less sparse.
Returns: bData – 1D array containing True and False values.
* If True: row/column has data above the threshold
* If False: row/column has no data under the threshold
Return type: numpy.array[bool]
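The two filters described for get_nonzeros_index can be sketched with plain NumPy. This is only an illustration of the documented rules, not the library's implementation: the function name `sketch_nonzero_mask` and the `demo` matrix are hypothetical, and the `filterByDiagonal` option is left out because its behavior is not described here.

```python
import numpy as np


def sketch_nonzero_mask(matrix, threshold_percentile=None, threshold_data_occup=None):
    """Hypothetical helper: True where a row/column of a square map has usable data."""
    matrix = np.asarray(matrix)
    n = matrix.shape[0]
    zeros_per_row = np.count_nonzero(matrix == 0, axis=1)

    # A row made only of zeros is blank (missing data) in every case.
    has_data = zeros_per_row < n

    if threshold_percentile is not None and has_data.any():
        # Percentile of the zero counts, computed over non-blank rows only.
        cutoff = np.percentile(zeros_per_row[has_data], threshold_percentile)
        # Rows with more zeros than the cutoff are marked as missing data.
        has_data &= zeros_per_row <= cutoff

    if threshold_data_occup is not None:
        # Occupancy = (bins with data) / (total bins in the row).
        occupancy = (n - zeros_per_row) / n
        has_data &= occupancy >= threshold_data_occup

    return has_data


# Small symmetric demo map: row/column 0 is blank, rows 2 and 4 are sparser.
demo = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 0, 1, 1],
])
mask_blank_only = sketch_nonzero_mask(demo)
mask_occupancy = sketch_nonzero_mask(demo, threshold_data_occup=0.7)
mask_percentile = sketch_nonzero_mask(demo, threshold_percentile=50)
```

With no threshold, only the blank row is flagged; with either threshold in this demo, the two sparser rows are flagged as well.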
- remove_zeros(matrix, threshold_percentile=None, threshold_data_occup=None, workDir=None)
Remove rows/columns with missing data (zero values).
Parameters:
- matrix (numpy.memmap or gcMapExplorer.lib.ccmap.CCMAP.matrix) – Input matrix.
- threshold_percentile (int) – Filters the map by discarding the rows/columns with the largest numbers of missing data. The value should be between 1 and 100; rows and columns above this percentile are discarded. For example, if the value is 99, any row or column containing more zeros (missing data) than the number of zeros at the 99th percentile is discarded.
  To calculate the percentile, all blank rows are removed first, and the number of zeros is then counted in each remaining row. Next, the number of zeros at the given percentile is obtained. Finally, if a row contains more zeros than this percentile value, the whole row and the corresponding column are marked as missing data. The percentile thus sets the highest number of zeros (missing data) allowed in a given row/column.
- threshold_data_occup (float) – Filters the map by discarding the rows/columns with the largest numbers of missing data. The value is the ratio (number of bins with data) / (total number of bins in the given row/column). For example, if threshold_data_occup = 0.8, all rows containing more than 20% missing data are discarded.
  Note that this parameter is suited to low-resolution data, because such maps are likely to be much less sparse.
- workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the OS-specific temporary directory.
Returns:
- A (MemoryMappedArray) – MemoryMappedArray instance containing the new truncated array as a memory-mapped file
- bNoData (numpy.array[bool]) – 1D array containing True and False values.
  * If True: row/column has no data under the threshold
  * If False: row/column has data above the threshold
MemoryMappedArray class
- class MemoryMappedArray
Convenient wrapper for a numpy memory-mapped array file.
For more details, see: Numpy memmap.
- path2matrix (str) – Path to the numpy memory-mapped array file
- arr (numpy.memmap) – Pointer to the memory-mapped numpy array
- workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the OS-specific temporary directory.
- dtype (str) – Data type of the array
- copy
Copy this numpy memory-mapped array and generate a new one.
Returns: out – A new MemoryMappedArray instance with the copied array
Return type: MemoryMappedArray
- copy_from
Copy values from a source MemoryMappedArray.
Parameters: src (MemoryMappedArray) – Source memory-mapped array providing the new values
Return type: None
Raises: ValueError – if src is not a MemoryMappedArray instance
- copy_to
Copy values to a destination MemoryMappedArray.
Parameters: dest (MemoryMappedArray) – Destination memory-mapped array receiving the values
Return type: None
Raises: ValueError – if dest is not a MemoryMappedArray instance
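To make the wrapper idea concrete, here is a minimal sketch of a class built on numpy.memmap that mirrors the attributes and methods listed above. It is not the gcMapExplorer class: the name `MemmapSketch`, its constructor arguments (`shape`, `work_dir`, `dtype`) and the `close` method are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np


class MemmapSketch:
    """Hypothetical stand-in for a memory-mapped array wrapper."""

    def __init__(self, shape, work_dir=None, dtype="float64"):
        self.work_dir = work_dir
        self.dtype = dtype
        # Backing file in work_dir, or in the OS temporary directory when None.
        fd, self.path = tempfile.mkstemp(suffix=".tmp", dir=work_dir)
        os.close(fd)
        # mode="w+" creates the file and fills it with zeros.
        self.arr = np.memmap(self.path, dtype=dtype, mode="w+", shape=shape)

    def copy(self):
        """Return a new wrapper backed by its own file, with the same values."""
        out = MemmapSketch(self.arr.shape, work_dir=self.work_dir, dtype=self.dtype)
        out.arr[:] = self.arr[:]
        return out

    def copy_from(self, src):
        """Overwrite this array with the values of src."""
        if not isinstance(src, MemmapSketch):
            raise ValueError("src must be a MemmapSketch instance")
        self.arr[:] = src.arr[:]

    def copy_to(self, dest):
        """Overwrite dest with the values of this array."""
        if not isinstance(dest, MemmapSketch):
            raise ValueError("dest must be a MemmapSketch instance")
        dest.arr[:] = self.arr[:]

    def close(self):
        """Drop the mapping, then delete the backing file."""
        del self.arr
        os.remove(self.path)


first = MemmapSketch((2, 2))
first.arr[:] = [[1.0, 2.0], [3.0, 4.0]]
second = first.copy()        # independent file, same values
second.arr[0, 0] = 9.0       # does not touch `first`
third = MemmapSketch((2, 2))
third.copy_from(second)
```

Because each instance owns a separate backing file, modifying a copy leaves the original untouched.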
KnightRuizNorm class
- class KnightRuizNorm
A modified Knight-Ruiz algorithm for matrix balancing.
The original ported Knight-Ruiz algorithm is modified to implement the normalization using both memory (RAM) and disk. This allows normalization of Hi-C maps ranging from small maps to huge maps that cannot be accommodated in RAM.
Parameters:
- A (numpy.ndarray or MemoryMappedArray) – Input matrix.
  Note
  - The matrix should not contain any row or column with all zero values (missing data for the row/column). Such a matrix can be obtained from remove_zeros().
  - If memory='HDD', A should be a MemoryMappedArray.
- memory (str) – Accepted keywords are RAM and HDD:
  - RAM: All intermediate arrays are generated in memory (RAM). This version is faster; however, the amount of RAM it requires depends on the input matrix size.
  - HDD: All intermediate arrays are generated as memory-mapped array files on the hard disk.
- workDir (str) – Path to the directory where temporary intermediate files are generated. If None, files are generated in the OS-specific temporary directory.
- run
Perform Knight-Ruiz normalization.
Parameters:
- A (numpy.ndarray or MemoryMappedArray.arr) – Input matrix.
  Note
  - The matrix should not contain any row or column with all zero values (missing data for the row/column). Such a matrix can be obtained from remove_zeros().
  Warning
  If A was a MemoryMappedArray in KnightRuizNorm, then here A should be MemoryMappedArray.arr instead of MemoryMappedArray.
- fl (int) – Its value should be zero.
- OutMatrix (gcMapExplorer.lib.ccmap.CCMAP.matrix) – Output matrix of the Hi-C map, to which the normalized matrix is returned.
- bNoData (numpy.ndarray[bool]) – A numpy.array of bool values showing whether rows/columns have missing data. It can be obtained from remove_zeros().
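The goal of matrix balancing, and the role of bNoData when results are written back, can be illustrated with a small NumPy sketch. Note that this does not implement the Knight-Ruiz algorithm: it swaps in a simple damped fixed-point (Sinkhorn-type) iteration that reaches the same kind of result, a symmetric scaling in which every row and column sums to 1. The function names `sketch_balance` and `sketch_expand` are hypothetical, and the input is assumed to be a symmetric matrix with strictly positive entries.

```python
import numpy as np


def sketch_balance(A, max_iter=500, tol=1e-10):
    """Hypothetical helper: scale symmetric positive A so that rows sum to 1."""
    A = np.asarray(A, dtype=float)
    x = np.ones(A.shape[0])
    for _ in range(max_iter):
        Ax = A @ x
        # Row sums of diag(x) @ A @ diag(x) are x * (A @ x).
        if np.max(np.abs(x * Ax - 1.0)) < tol:
            break
        # Damped update; its fixed point satisfies x * (A @ x) == 1.
        x = np.sqrt(x / Ax)
    return x[:, None] * A * x[None, :]


def sketch_expand(balanced, b_no_data):
    """Place a truncated matrix back into full size, zeros where data is missing."""
    b_no_data = np.asarray(b_no_data, dtype=bool)
    keep = np.flatnonzero(~b_no_data)
    out = np.zeros((b_no_data.size, b_no_data.size))
    out[np.ix_(keep, keep)] = balanced
    return out


truncated_map = np.array([
    [2.0, 1.0, 1.0],
    [1.0, 3.0, 1.0],
    [1.0, 1.0, 4.0],
])
balanced = sketch_balance(truncated_map)

# Row/column 1 of the original 4x4 map had no data and was removed earlier.
b_no_data = np.array([False, True, False, False])
full_balanced = sketch_expand(balanced, b_no_data)
```

The expansion step shows why a blank-free input and a bNoData array go together: balancing runs on the truncated matrix, and the boolean array restores the original indexing afterwards.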