gcmap module
gcmap.GCMAP(hdf5[, mapName, chromAtX, …]) | To access a genome-wide contact map.
gcmap.GCMAP.checkMapExist([mapName, …]) | Check whether a map exists in the file.
gcmap.GCMAP.changeMap([mapName, chromAtX, …]) | Change the map to another chromosome.
gcmap.GCMAP.changeResolution(resolution) | Try to change to the contact map of a given resolution.
gcmap.GCMAP.toFinerResolution() | Try to change the contact map to the next finer resolution.
gcmap.GCMAP.toCoarserResolution() | Try to change the contact map to the next coarser resolution.
gcmap.GCMAP.loadSmallestMap([resolution]) | Load the smallest contact map.
gcmap.GCMAP.genMapNameList([sortBy]) | Generate a list of contact maps available in the gcmap file.
gcmap.GCMAP.performDownSampling([method]) | Downsample recursively and store the maps.
gcmap.GCMAP.downsampleMapToResolution(resolution) | Downsample the current map to a particular resolution.
gcmap.GCMAP.downsampleAllMapToResolution(…) | Downsample all maps to a particular resolution.
gcmap.loadGCMapAsCCMap(filename[, mapName, …]) | Load a map from a gcmap file as a gcMapExplorer.lib.ccmap.CCMAP.
gcmap.addCCMap2GCMap(cmap, filename[, …]) | Add a gcMapExplorer.lib.ccmap.CCMAP to a gcmap file.
gcmap.changeGCMapCompression(infile, …[, …]) | Change the compression method in a GCMAP file.
GCMAP class
class GCMAP(hdf5, mapName=None, chromAtX=None, chromAtY=None, resolution=None)
To access a genome-wide contact map.
It is similar to gcMapExplorer.lib.ccmap.CCMAP and contains the same attributes. Therefore, both gcMapExplorer.lib.ccmap.CCMAP and GCMAP can be used in the same way to access attributes. GCMAP also contains additional attributes, because it uses an HDF5 file to read the maps on demand.

Structure of gcmap file:
HDF5
 │
 ├──────── chr1 ──── Attributes : ['xlabel', 'ylabel', 'compression']
 │           │
 │           ├────── 10kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 20kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 40kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 60kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 80kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 160kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 320kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 640kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           │
 │           ├────── 10kb-bNoData ( 1D Numpy Array )
 │           ├────── 20kb-bNoData ( 1D Numpy Array )
 │           ├────── 40kb-bNoData ( 1D Numpy Array )
 │           ├────── 60kb-bNoData ( 1D Numpy Array )
 │           ├────── 80kb-bNoData ( 1D Numpy Array )
 │           ├────── 160kb-bNoData ( 1D Numpy Array )
 │           ├────── 320kb-bNoData ( 1D Numpy Array )
 │           └────── 640kb-bNoData ( 1D Numpy Array )
 │
 ├──────── chr2 ──── Attributes : ['xlabel', 'ylabel', 'compression']
 │           │
 │           ├────── 10kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 20kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 40kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 60kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 80kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 160kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 320kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           ├────── 640kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
 │           │
 │           ├────── 10kb-bNoData ( 1D Numpy Array )
 │           ├────── 20kb-bNoData ( 1D Numpy Array )
 │           ├────── 40kb-bNoData ( 1D Numpy Array )
 │           ├────── 60kb-bNoData ( 1D Numpy Array )
 │           ├────── 80kb-bNoData ( 1D Numpy Array )
 │           ├────── 160kb-bNoData ( 1D Numpy Array )
 │           ├────── 320kb-bNoData ( 1D Numpy Array )
 │           └────── 640kb-bNoData ( 1D Numpy Array )
 :
 :
 :
 └───── ...
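The layout above can be imitated with plain Python containers to show how each map group pairs a 2D matrix with a -bNoData array for every resolution. This is a minimal sketch, not the gcMapExplorer reader: the nested-dict stand-in for the HDF5 file and the helpers list_resolutions and map_exists are hypothetical, and a real file would be opened with h5py instead.

```python
# Hypothetical stand-in for a gcmap file: each key is a map (HDF5 group) and
# each list holds the dataset names stored in that group.
gcmap_layout = {
    "chr1": ["10kb", "20kb", "40kb",
             "10kb-bNoData", "20kb-bNoData", "40kb-bNoData"],
    "chr2": ["10kb", "20kb", "10kb-bNoData", "20kb-bNoData"],
}


def list_resolutions(layout, map_name):
    """Return resolutions that have both a 2D map and its -bNoData array."""
    names = set(layout[map_name])
    return [name for name in layout[map_name]
            if not name.endswith("-bNoData") and name + "-bNoData" in names]


def map_exists(layout, map_name, resolution):
    """Hypothetical analogue of checkMapExist for this dict stand-in."""
    return map_name in layout and resolution in list_resolutions(layout, map_name)


print(list_resolutions(gcmap_layout, "chr1"))    # ['10kb', '20kb', '40kb']
print(map_exists(gcmap_layout, "chr2", "40kb"))  # False
```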
Note
- Reading an entire map from HDF5 can be time-consuming for very large maps. Therefore, use this class when reading only a small region of a map, or when a map is read only once.
- To perform calculations, use gcMapExplorer.lib.gcmap.loadGCMapAsCCMap(), as it returns a gcMapExplorer.lib.ccmap.CCMAP.
- The class can be instantiated with either a gcmap file name or an h5py.File object as hdf5, for example:
>>> gcMapObj = gcMapExplorer.lib.gcmap.GCMAP(hdf5, 'chr22')  # To read chr22 vs chr22 map
>>> gcMapObj.matrix[200:400, 200:400]  # Read the region between bins 200 and 400 of the chr22 vs chr22 map
Parameters: - hdf5 (str or h5py.File) – Either a gcmap file name or an h5py file object, which is the entry point to the HDF5 file.
- mapName (str) – Name of the map. For an intra-chromosomal map, it can be the chromosome name, e.g. chr1 or chr2.
- chromAtX (str) – Chromosome along the X-axis. Not required for an intra-chromosomal map, because the same chromosome is present on both the X- and Y-axis.
- chromAtY (str) – Chromosome along the Y-axis. Not required for an intra-chromosomal map, because the same chromosome is present on both the X- and Y-axis. If chromAtY = None, both axes contain the same chromosome and the map is of ‘intra’ or ‘cis’ type.
- resolution (str) – Input resolution to read from the file.
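The rule stated for chromAtY can be written out directly. A minimal sketch, assuming only the documented behavior (chromAtY = None, or the same chromosome on both axes, means an intra-chromosomal map); map_type is a hypothetical helper, not the code GCMAP uses to set its mapType attribute.

```python
def map_type(chromAtX, chromAtY=None):
    """Classify a map as 'intra' or 'inter' chromosomal (hypothetical helper).

    chromAtY=None means the same chromosome lies on both axes.
    """
    if chromAtY is None or chromAtY == chromAtX:
        return "intra"
    return "inter"


print(map_type("chr1"))          # intra
print(map_type("chr1", "chr1"))  # intra
print(map_type("chr1", "chr2"))  # inter
```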
Attributes: - yticks (list) – Minimum and maximum locations along the Y-axis, e.g. yticks=[0, 400000]
- xticks (list) – Minimum and maximum locations along the X-axis, e.g. xticks=[0, 400000]
- binsize (int) – Resolution of the data. For 10kb resolution, binsize is 10000.
- title (str) – Title of the data
- xlabel (str) – Title for the X-axis, which is the chromosome name along the X-axis
- ylabel (str) – Title for the Y-axis, which is the chromosome name along the Y-axis
- shape (tuple) – Overall shape of the matrix
- minvalue (float) – Minimum value in the matrix
- maxvalue (float) – Maximum value in the matrix
- matrix (h5py.Dataset) – An HDF5 Dataset object pointing to the matrix/map. See the h5py documentation on datasets for more details.
- bNoData (numpy.ndarray) – A boolean NumPy array of matrix shape
- bLog (bool) – Whether the values in the matrix are on a log scale
- dtype (str) – Data type of the matrix/map
- mapType (str) – Type of map: intra or inter, i.e. intra- or inter-chromosomal map. If the chromosome along the X- and Y-axis is the same, the map is intra-chromosomal; otherwise, it is inter-chromosomal.
- hdf5 (h5py.File) – HDF5 file object instance
- fileOpened (bool) – Whether the HDF5 file was opened by the object itself, rather than an HDF5 file object being provided to it. A file opened by the object is closed before the object is destroyed.
- groupName (str) – Name of the current contact map, i.e. its group name in the HDF5 file
- resolution (str) – Resolution of the current contact map
- finestResolution (str) – Finest available resolution of the current contact map
- binsizes (list) – List of binsizes available for the current contact map
- mapNameList (list) – List of all contact maps available in the gcmap file
changeMap
(mapName=None, chromAtX=None, chromAtY=None, resolution=None)¶ Change the map for another chromosome
It can be used to change the map. For example, to access the map of ‘chr20’ instead of ‘chr22’, use this function.
Parameters: - mapName (str) – Name of map. It could be chromosome name in case of intra-chromosomal map.
e.g.:
chr1
orchr2
. - chromAtX (str) – chromosome at X-axis. In case of intra-chromosomal map, this is not required because both at X and Y axis same chromosome is present.
- chromAtX – chromosome at Y-axis. In case of intra-chromosomal map, this is not required because
both at X and Y axis same chromosome is present. If
chromAtY = None
, both x-axis and y-axis contains same chromosome and map is of ‘intra’ of ‘cis’ type. - resolution (str) – Input resolution to read from file.
For example:
>>> gcMapObj = gcMapExplorer.lib.gcmap.GCMAP(hdf5, 'chr22')  # To read chr22 vs chr22 map
>>> gcMapObj.matrix[200:400, 200:400]  # To access the region between bins 200 and 400 of the chr22 vs chr22 map
>>> gcMapObj.changeMap('chr20')  # Changed to read chr20 vs chr20 map
>>> gcMapObj.matrix[200:400, 200:400]  # Now, to access the region between bins 200 and 400 of the chr20 vs chr20 map
changeResolution(resolution)
Try to change to the contact map of a given resolution.
Parameters: resolution (str) – Resolution to change to.
Returns: success – True if the change was successful, otherwise False.
Return type: bool
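toFinerResolution() and toCoarserResolution(), listed in the summary, step to the neighboring resolution. A minimal sketch of choosing that neighbor from the binsizes attribute, assuming that finer means a smaller binsize and that a missing neighbor is reported rather than raised (in the spirit of the False return of changeResolution); next_binsize is a hypothetical helper, not part of GCMAP.

```python
def next_binsize(binsizes, current, finer=True):
    """Return the next finer (smaller) or coarser (larger) binsize.

    Hypothetical helper: returns None when no such binsize exists, e.g. when
    `current` is already the finest or coarsest one. Raises ValueError if
    `current` is not in `binsizes`.
    """
    ordered = sorted(set(binsizes))
    idx = ordered.index(current) + (-1 if finer else 1)
    if 0 <= idx < len(ordered):
        return ordered[idx]
    return None


binsizes = [10000, 20000, 40000, 80000]
print(next_binsize(binsizes, 20000))               # 10000
print(next_binsize(binsizes, 20000, finer=False))  # 40000
print(next_binsize(binsizes, 10000))               # None
```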
checkMapExist(mapName=None, chromAtX=None, chromAtY=None, resolution=None)
Check whether a map exists in the file.
Parameters: - mapName (str) – Name of the map. For an intra-chromosomal map, it can be the chromosome name, e.g. chr1 or chr2.
- chromAtX (str) – Chromosome along the X-axis. Not required for an intra-chromosomal map, because the same chromosome is present on both the X- and Y-axis.
- chromAtY (str) – Chromosome along the Y-axis. Not required for an intra-chromosomal map, because the same chromosome is present on both the X- and Y-axis. If chromAtY = None, both axes contain the same chromosome and the map is of ‘intra’ or ‘cis’ type.
- resolution (str) – Input resolution to read from the file.
Returns: doExist – True if the map is present, otherwise False.
Return type: bool
downsampleAllMapToResolution(resolution, method='sum')
Downsample all maps to a particular resolution.
By default, maps are generated at only a few resolutions. For example, when the finest resolution is 5 kb, downsampled maps at 10kb, 20kb, 40kb, 80kb, 160kb etc. are generated. If another resolution (e.g. 100kb) is required, this method can be used to generate it.
Parameters: - resolution (str) – Target resolution, e.g. 100kb.
- method (str) – Method of downsampling: sum, mean or max.
Returns: success – True or False
Return type: bool
downsampleMapToResolution(resolution, method='sum')
Downsample the current map to a particular resolution.
By default, maps are generated at only a few resolutions. For example, when the finest resolution is 5 kb, downsampled maps at 10kb, 20kb, 40kb, 80kb, 160kb etc. are generated. If another resolution (e.g. 100kb) is required, this method can be used to generate it.
Note
It only downsamples the current map. To downsample all maps, use gcmap.GCMAP.downsampleAllMapToResolution().
Parameters: - resolution (str) – Target resolution, e.g. 100kb.
- method (str) – Method of downsampling: sum, mean or max.
Returns: success – True or False
Return type: bool
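Downsampling to a coarser resolution merges square blocks of bins; going from 5 kb to 100 kb, for instance, merges blocks of 100000 // 5000 = 20 bins per axis. The sketch below shows what the sum, mean and max methods mean for a small matrix. It is an illustration under stated assumptions, not gcMapExplorer's implementation: downsample is a hypothetical function working on lists of lists, and merging leftover edge bins into a smaller final block is assumed.

```python
def downsample(matrix, factor, method="sum"):
    """Coarsen a 2D list-of-lists by merging factor x factor blocks of bins.

    Leftover rows/columns at the edge that do not fill a whole block are
    merged into a smaller final block (an assumption for this sketch).
    """
    reducers = {
        "sum": sum,
        "mean": lambda vals: sum(vals) / len(vals),
        "max": max,
    }
    if method not in reducers:
        raise ValueError(f"method must be one of {sorted(reducers)}")
    reduce = reducers[method]
    nrows, ncols = len(matrix), len(matrix[0])
    out = []
    for r0 in range(0, nrows, factor):
        row = []
        for c0 in range(0, ncols, factor):
            # Collect every value inside the current block.
            block = [matrix[r][c]
                     for r in range(r0, min(r0 + factor, nrows))
                     for c in range(c0, min(c0 + factor, ncols))]
            row.append(reduce(block))
        out.append(row)
    return out


m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(downsample(m, 2, "sum"))  # [[14, 22], [46, 54]]
print(downsample(m, 2, "max"))  # [[6, 8], [14, 16]]
```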
genMapNameList(sortBy='name')
Generate a list of contact maps available in the gcmap file.
The maps can be sorted either by name or by size. The listed maps are stored in GCMAP.mapNameList.
Parameters: sortBy (str) – Accepted keywords are name and size.
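A minimal sketch of the two sort orders, using hypothetical map sizes (number of bins per axis). A plain lexicographic sort places chr10 before chr2; whether genMapNameList uses this ordering or a natural sort is not stated here. The first name in the size-sorted list is the smallest map, the kind of map loadSmallestMap loads. gen_map_name_list is an illustrative helper, not the GCMAP method.

```python
# Hypothetical map sizes (number of bins per axis); illustrative values only.
map_sizes = {"chr1": 24896, "chr2": 24220, "chr10": 13380, "chr21": 4671}


def gen_map_name_list(sizes, sortBy="name"):
    """Return map names sorted by name or by size (illustrative helper)."""
    if sortBy == "name":
        return sorted(sizes)  # plain lexicographic order
    if sortBy == "size":
        return sorted(sizes, key=sizes.get)  # smallest map first
    raise ValueError("sortBy must be 'name' or 'size'")


print(gen_map_name_list(map_sizes))             # ['chr1', 'chr10', 'chr2', 'chr21']
print(gen_map_name_list(map_sizes, "size")[0])  # chr21
```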
get_ticks(binsize=None)
To get xticks and yticks for the matrix.
Parameters: binsize (int) – Number of bases in each bin (pixel or box) of the contact map.
Returns: - xticks (numpy.array) – 1D array containing positions along the X-axis
- yticks (numpy.array) – 1D array containing positions along the Y-axis
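get_ticks returns per-bin positions along each axis. A minimal sketch of how such positions can be derived from binsize, assuming that bin i starts at i * binsize (the library might use bin centers or ends instead); bin_start_positions is a hypothetical helper that returns plain lists rather than NumPy arrays.

```python
def bin_start_positions(nbins_x, nbins_y, binsize):
    """Return start positions (in bases) of every bin along X and Y.

    Assumes bin i covers [i * binsize, (i + 1) * binsize).
    """
    xticks = [i * binsize for i in range(nbins_x)]
    yticks = [i * binsize for i in range(nbins_y)]
    return xticks, yticks


x, y = bin_start_positions(4, 3, 10000)
print(x)  # [0, 10000, 20000, 30000]
print(y)  # [0, 10000, 20000]
```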
loadSmallestMap(resolution=None)
Load the smallest contact map.
Parameters: resolution (str) – Resolution to change to. When it is not provided, the finest resolution of the smallest map is loaded.
performDownSampling(method='sum')
Downsample recursively and store the maps.
It downsamples the maps and automatically adds them to the same input gcmap file. Downsampling works recursively, and downsampled maps are generated until the map size is less than 500.
Parameters: method (str) – Method of downsampling. Three methods are accepted: sum: sum of all values, mean: average of all values, and max: maximum of all values.
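The recursive scheme doubles the binsize at each step until the map is small enough. A minimal sketch of the resulting schedule, assuming that "size" means the number of bins per axis and that a partially filled last bin is kept (ceiling division); coarsening_schedule is a hypothetical helper and the 4671-bin input is an arbitrary example.

```python
def coarsening_schedule(finest_binsize, nbins, limit=500):
    """List (binsize, nbins) pairs produced by repeated factor-2 coarsening.

    Stops once the map has fewer than `limit` bins per axis.
    """
    schedule = [(finest_binsize, nbins)]
    while nbins >= limit:
        finest_binsize *= 2
        nbins = -(-nbins // 2)  # ceil(nbins / 2): keep a partial last bin
        schedule.append((finest_binsize, nbins))
    return schedule


for binsize, n in coarsening_schedule(10_000, 4_671):
    print(f"{binsize // 1000}kb: {n} bins")
# 10kb: 4671 bins, 20kb: 2336 bins, 40kb: 1168 bins, 80kb: 584 bins, 160kb: 292 bins
```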
gcMapExplorer.gcmap
loadGCMapAsCCMap(filename, mapName=None, chromAtX=None, chromAtY=None, resolution=None, workDir=None)
Load a map from a gcmap file as a gcMapExplorer.lib.ccmap.CCMAP.
Parameters: - filename (str) – Either a gcmap file, an h5py.File instance or GCMAP.hdf5, from which the contact map data will be read.
- mapName (str) – Name of the contact map, e.g. chr1 or chr2.
- chromAtX (str) – Name of the chromosome along the X-axis
- chromAtY (str) – Name of the chromosome along the Y-axis. If chromAtY = None, both axes contain the same chromosome and the map is of ‘intra’ or ‘cis’ type.
- resolution (str) – Resolution of the required map. If a contact map of the input resolution is not found, None is returned.
- workDir (str) – Name of the directory where temporary files will be kept. These files are deleted automatically.
Returns: object
Return type: None or gcMapExplorer.lib.ccmap.CCMAP
addCCMap2GCMap(cmap, filename, scaleoffset=None, compression='lzf', generateCoarse=True, coarseningMethod='sum', replaceCMap=True, logHandler=None)
Add a gcMapExplorer.lib.ccmap.CCMAP to a gcmap file.
Parameters: - cmap (gcMapExplorer.lib.ccmap.CCMAP) – An instance of gcMapExplorer.lib.ccmap.CCMAP, which will be added to the gcmap file.
- filename (str) – Name of the gcmap file, an h5py.File instance or GCMAP.hdf5, to which the output data will be written.
- scaleoffset (int) – For integer data, the number of bits to retain in the HDF5 file. With a value of 0, HDF5 automatically computes the number of bits required for lossless compression of each chunk. For floating-point data, the number of digits after the decimal point to retain. This can help to reduce the final file size. With None, data are stored without any loss of precision.
- compression (str) – Compression method. Presently allowed: lzf for LZF compression and gzip for GZIP compression.
- generateCoarse (bool) – Also generate all coarser maps, where the resolution is coarsened by a factor of two consecutively. E.g. for a 10 kb input resolution, downsampled maps of 20kb, 40kb, 80kb, 160kb, 320kb etc. will be generated until the map size is less than 500.
- coarseningMethod (str) – Method of downsampling. Three methods are accepted: sum: sum of all values, mean: average of all values, and max: maximum of all values.
- replaceCMap (bool) – Replace the entire old ccmap data, including all resolutions and coarsened data.
Returns: success – True if the addition was successful, otherwise False.
Return type: bool
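For floating-point data, scaleoffset keeps a fixed number of decimal digits. The sketch below only approximates the resulting precision loss by rounding each value to that many digits; it is not the HDF5 scale-offset filter itself, which also packs the scaled values into integers per chunk, and emulate_float_scaleoffset is a hypothetical name.

```python
def emulate_float_scaleoffset(values, digits):
    """Roughly emulate keeping `digits` decimal places of float data.

    Approximates the precision loss of scale-offset storage for floats;
    it does not reproduce the HDF5 filter's encoding.
    """
    scale = 10 ** digits
    return [round(v * scale) / scale for v in values]


data = [0.123456, 2.5, 17.98765]
print(emulate_float_scaleoffset(data, 3))  # [0.123, 2.5, 17.988]
```

A smaller scaleoffset value means fewer retained digits, so each stored value can differ from the original by up to half of the last retained decimal place.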
changeGCMapCompression(infile, outfile, compression, logHandler=None)
Change the compression method in a GCMAP file.
Currently, LZF and GZIP compression are allowed. LZF is fast with a moderate compression ratio, whereas GZIP is slower but achieves a higher compression ratio.
Warning
A GCMAP file with gzip compression can be read from any programming language using the HDF5 library; however, LZF compression can only be decompressed using the Python h5py package.
Parameters: