gcmap module

gcmap.GCMAP(hdf5[, mapName, chromAtX, …]) To access Genome wide contact map.
gcmap.GCMAP.checkMapExist([mapName, …]) Check if a map exists in the file
gcmap.GCMAP.changeMap([mapName, chromAtX, …]) Change the map for another chromosome
gcmap.GCMAP.changeResolution(resolution) Try to change the contact map to a given resolution.
gcmap.GCMAP.toFinerResolution() Try to change contact map to next finer resolution
gcmap.GCMAP.toCoarserResolution() Try to change contact map to next coarser resolution
gcmap.GCMAP.loadSmallestMap([resolution]) Load smallest sized contact map
gcmap.GCMAP.genMapNameList([sortBy]) Generate list of contact maps available in gcmap file
gcmap.GCMAP.performDownSampling([method]) Downsample recursively and store the maps
gcmap.GCMAP.downsampleMapToResolution(resolution) Downsample the current map to a particular resolution
gcmap.GCMAP.downsampleAllMapToResolution(…) Downsample all maps to a particular resolution
gcmap.loadGCMapAsCCMap(filename[, mapName, …]) Load a map from gcmap file as a gcMapExplorer.lib.ccmap.CCMAP.
gcmap.addCCMap2GCMap(cmap, filename[, …]) Add gcMapExplorer.lib.ccmap.CCMAP to a gcmap file
gcmap.changeGCMapCompression(infile, …[, …]) Change compression method in GCMAP file

GCMAP class

class GCMAP(hdf5, mapName=None, chromAtX=None, chromAtY=None, resolution=None)

To access Genome wide contact map.

It is similar to gcMapExplorer.lib.ccmap.CCMAP and contains the same attributes, so both gcMapExplorer.lib.ccmap.CCMAP and GCMAP can be used in the same way to access attributes. It also contains additional attributes because it uses an HDF5 file to read the maps on demand.

Structure of gcmap file:

HDF5
│
├──────── chr1 ──── Attributes : ['xlabel', 'ylabel', 'compression']
│           │
│           ├────── 10kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 20kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 40kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 60kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 80kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 160kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 320kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 640kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           │
│           ├────── 10kb-bNoData  ( 1D Numpy Array )
│           ├────── 20kb-bNoData  ( 1D Numpy Array )
│           ├────── 40kb-bNoData  ( 1D Numpy Array )
│           ├────── 60kb-bNoData  ( 1D Numpy Array )
│           ├────── 80kb-bNoData  ( 1D Numpy Array )
│           ├────── 160kb-bNoData ( 1D Numpy Array )
│           ├────── 320kb-bNoData ( 1D Numpy Array )
│           └────── 640kb-bNoData ( 1D Numpy Array )
│
├──────── chr2 ──── Attributes : ['xlabel', 'ylabel', 'compression']
│           │
│           ├────── 10kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 20kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 40kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 60kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 80kb  ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 160kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 320kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           ├────── 640kb ( 2D Numpy Array ) ─── Attributes : ['minvalue', 'maxvalue', 'xshape', 'yshape', 'binsize']
│           │
│           ├────── 10kb-bNoData  ( 1D Numpy Array )
│           ├────── 20kb-bNoData  ( 1D Numpy Array )
│           ├────── 40kb-bNoData  ( 1D Numpy Array )
│           ├────── 60kb-bNoData  ( 1D Numpy Array )
│           ├────── 80kb-bNoData  ( 1D Numpy Array )
│           ├────── 160kb-bNoData ( 1D Numpy Array )
│           ├────── 320kb-bNoData ( 1D Numpy Array )
│           └────── 640kb-bNoData ( 1D Numpy Array )
:
:
:
└───── ...
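
The layout above can be inspected directly with h5py. The following is a minimal sketch under stated assumptions, not the library's own writer or reader: it builds a tiny in-memory file that mimics the tree (the data values, the attribute values and the helper names build_toy_gcmap and describe_gcmap are all hypothetical), then walks it to list each map with its binsize and shape.

```python
import h5py
import numpy as np


def build_toy_gcmap():
    """Create an in-memory HDF5 file that mimics the layout shown above."""
    # driver="core" with backing_store=False keeps the file in memory only.
    f = h5py.File("toy.gcmap", "w", driver="core", backing_store=False)
    grp = f.create_group("chr1")
    grp.attrs["xlabel"] = "chr1"
    grp.attrs["ylabel"] = "chr1"
    grp.attrs["compression"] = "lzf"  # assumed attribute value
    for name, binsize, n in (("20kb", 20000, 4), ("10kb", 10000, 8)):
        data = np.arange(n * n, dtype=np.float64).reshape(n, n)
        dset = grp.create_dataset(name, data=data)
        dset.attrs["minvalue"] = float(data.min())
        dset.attrs["maxvalue"] = float(data.max())
        dset.attrs["xshape"] = n
        dset.attrs["yshape"] = n
        dset.attrs["binsize"] = binsize
        # One boolean flag per bin, stored next to the 2D map.
        grp.create_dataset(name + "-bNoData", data=np.zeros(n, dtype=bool))
    return f


def describe_gcmap(f):
    """Return {map name: [(resolution, binsize, shape), ...]}, finest first."""
    summary = {}
    for map_name, grp in f.items():
        names = [k for k in grp.keys() if not k.endswith("-bNoData")]
        names.sort(key=lambda k: int(grp[k].attrs["binsize"]))
        summary[map_name] = [
            (k, int(grp[k].attrs["binsize"]), grp[k].shape) for k in names
        ]
    return summary
```

Calling describe_gcmap(build_toy_gcmap()) lists the 10kb map before the 20kb map, because the entries are ordered by the binsize attribute rather than by dataset name.
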

Note

The class can be instantiated from either a gcmap file name or an h5py.File object:
>>> ccMapObj = gcMapExplorer.lib.gcmap.GCMAP(hdf5, 'chr22')    # To read chr22 vs chr22 map
>>> ccMapObj.matrix[200:400, 200:400]  # Read region between 200 and 400 of chr22 vs chr22 map.
Parameters:
  • hdf5 (str or h5py.File) – Either gcmap file name or h5py file object, which is an entry point to HDF5 file.
  • mapName (str) – Name of map. It could be chromosome name in case of intra-chromosomal map. e.g.: chr1 or chr2.
  • chromAtX (str) – Chromosome at X-axis. It is not required for an intra-chromosomal map, because the same chromosome is present on both the X and Y axes.
  • chromAtY (str) – Chromosome at Y-axis. It is not required for an intra-chromosomal map, because the same chromosome is present on both the X and Y axes. If chromAtY = None, both X-axis and Y-axis contain the same chromosome and the map is of ‘intra’ or ‘cis’ type.
  • resolution (str) – Input resolution to read from file.
yticks

list – Minimum and maximum locations along Y-axis. e.g. yticks=[0, 400000]

xticks

list – Minimum and maximum locations along X-axis. e.g. xticks=[0, 400000]

binsize

int – Resolution of data. In case of 10kb resolution, binsize is 10000.

title

str – Title of the data

xlabel

str – Title for X-axis, which is chromosome name along X-axis

ylabel

str – Title for Y-axis, which is chromosome name along Y-axis

shape

tuple – Overall shape of matrix

minvalue

float – Minimum value in matrix

maxvalue

float – Maximum value in matrix

matrix

h5py.Dataset – An HDF5 Dataset object pointing to the matrix/map. See the h5py Dataset documentation for more details.

bNoData

numpy.ndarray – A boolean numpy array of matrix shape

bLog

bool – Whether the values in the matrix are in log

dtype

str – Data type of matrix/map

mapType

str – Type of map: intra or inter chromosomal map. If chromosome along X- and Y- axis is same, then map is intra-chromosomal, otherwise map is inter-chromosomal.

hdf5

h5py.File – HDF5 file object instance

fileOpened

bool – Whether the file was opened inside the object or an HDF5 file object was provided to it. When the file is opened by the object, it is closed before the object is destroyed.

groupName

str – Name of current contact map or group name in HDF5 file

resolution

str – Resolution of current contact map

finestResolution

str – Finest available resolution of current contact map

binsizes

list – List of binsizes available for current contact map

mapNameList

list – List of all available contact maps in gcmap file
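
The binsize and resolution attributes above express the same quantity in two forms: an integer number of bases (10000) and a string (10kb). A minimal sketch of converting between them is given below. The helper names are hypothetical and are not part of the documented API, and support for a 'mb' suffix is an assumption, since only 'kb' strings appear in this document.

```python
import re

# Multipliers for the unit suffixes; "mb" is an assumed extension.
_UNITS = {"b": 1, "kb": 1000, "mb": 1000000}


def resolution_to_binsize(resolution):
    """Convert a resolution string such as '10kb' to a bin size in bases."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-z]*)", resolution.strip().lower())
    if match is None:
        raise ValueError("cannot parse resolution: %r" % resolution)
    value, unit = match.groups()
    unit = unit or "b"  # a bare number is read as bases
    if unit not in _UNITS:
        raise ValueError("unknown unit in resolution: %r" % resolution)
    return int(round(float(value) * _UNITS[unit]))


def binsize_to_resolution(binsize):
    """Convert a bin size in bases to the shortest exact resolution string."""
    if binsize % 1000000 == 0:
        return "%dmb" % (binsize // 1000000)
    if binsize % 1000 == 0:
        return "%dkb" % (binsize // 1000)
    return "%db" % binsize
```

For example, resolution_to_binsize('10kb') gives 10000, matching the binsize description above, and binsize_to_resolution(160000) gives '160kb'.
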

changeMap(mapName=None, chromAtX=None, chromAtY=None, resolution=None)

Change the map for another chromosome

It can be used to change the map. For example, to access the map of ‘chr20’ instead of ‘chr22’, use this function.

Parameters:
  • mapName (str) – Name of map. It could be chromosome name in case of intra-chromosomal map. e.g.: chr1 or chr2.
  • chromAtX (str) – Chromosome at X-axis. It is not required for an intra-chromosomal map, because the same chromosome is present on both the X and Y axes.
  • chromAtY (str) – Chromosome at Y-axis. It is not required for an intra-chromosomal map, because the same chromosome is present on both the X and Y axes. If chromAtY = None, both X-axis and Y-axis contain the same chromosome and the map is of ‘intra’ or ‘cis’ type.
  • resolution (str) – Input resolution to read from file.
For example:
>>> ccMapObj = gcMapExplorer.lib.gcmap.GCMAP(hdf5, 'chr22')    # To read chr22 vs chr22 map
>>> ccMapObj.matrix[200:400, 200:400]  # To access region between 200 and 400 of chr22 vs chr22 map.
>>> ccMapObj.changeMap('chr20')        # Changed to read chr20 vs chr20 map
>>> ccMapObj.matrix[200:400, 200:400]  # Now, to access region between 200 and 400 of chr20 vs chr20 map.
changeResolution(resolution)

Try to change the contact map to a given resolution.

Parameters: resolution (str) – Resolution to change to.
Returns: success – True if the change was successful, otherwise False.
Return type: bool
checkMapExist(mapName=None, chromAtX=None, chromAtY=None, resolution=None)

Check if a map exists in the file

It can be used to check whether a map exists in the file.

Parameters:
  • mapName (str) – Name of map. It could be chromosome name in case of intra-chromosomal map. e.g.: chr1 or chr2.
  • chromAtX (str) – Chromosome at X-axis. It is not required for an intra-chromosomal map, because the same chromosome is present on both the X and Y axes.
  • chromAtY (str) – Chromosome at Y-axis. It is not required for an intra-chromosomal map, because the same chromosome is present on both the X and Y axes. If chromAtY = None, both X-axis and Y-axis contain the same chromosome and the map is of ‘intra’ or ‘cis’ type.
  • resolution (str) – Input resolution to read from file.
Returns:

doExist – True if the map is present, otherwise False.

Return type:

bool

downsampleAllMapToResolution(resolution, method='sum')

Downsample all maps to a particular resolution

By default, maps are generated at only a few resolutions. For example, when the finest resolution is 5 kb, downsampled maps at 10kb, 20kb, 40kb, 80kb, 160kb etc. are generated. If another resolution (e.g. 100kb) is required, this method can be used.

Parameters:
  • resolution (str) – Resolution to downsample.
  • method (str) – Method of downsampling. Three accepted methods are sum: sum all values, mean: Average of all values and max: Maximum of all values.
Returns:

success – True or False

Return type:

bool
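
The three downsampling methods can be illustrated with a small NumPy sketch. This is not the library's implementation: the function name downsample_blocks is hypothetical, and zero-padding the trailing edge when the shape is not a multiple of the factor is an assumption made to keep the example short (it lowers the mean in edge blocks, and affects the max when values are negative).

```python
import numpy as np


def downsample_blocks(matrix, factor, method="sum"):
    """Merge factor x factor blocks of a 2D array using sum, mean or max."""
    reducers = {"sum": np.sum, "mean": np.mean, "max": np.max}
    if method not in reducers:
        raise ValueError("method must be one of: sum, mean, max")
    n_rows, n_cols = matrix.shape
    # Pad the trailing edges with zeros so both dimensions divide evenly.
    pad_rows = (-n_rows) % factor
    pad_cols = (-n_cols) % factor
    padded = np.pad(matrix, ((0, pad_rows), (0, pad_cols)), mode="constant")
    # Split each axis into (number of blocks, factor) and reduce the factor axes.
    blocks = padded.reshape(
        padded.shape[0] // factor, factor, padded.shape[1] // factor, factor
    )
    return reducers[method](blocks, axis=(1, 3))
```

With the resolutions mentioned above, going from a 5kb map to a 100kb map corresponds to factor = 100000 // 5000 = 20.
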

downsampleMapToResolution(resolution, method='sum')

Downsample the current map to a particular resolution

By default, maps are generated at only a few resolutions. For example, when the finest resolution is 5 kb, downsampled maps at 10kb, 20kb, 40kb, 80kb, 160kb etc. are generated. If another resolution (e.g. 100kb) is required, this method can be used.

Note

It only downsamples the current map. To downsample all maps, see gcmap.GCMAP.downsampleAllMapToResolution().

Parameters:
  • resolution (str) – Resolution to downsample
  • method (str) – Method of downsampling. Three accepted methods are sum: sum all values, mean: Average of all values and max: Maximum of all values.
Returns:

success – True or False

Return type:

bool

genMapNameList(sortBy='name')

Generate list of contact maps available in gcmap file

The maps can be sorted either by name or by size. The listed maps are in GCMAP.mapNameList.

Parameters: sortBy (str) – Accepted keywords are name and size.
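
As a rough illustration of the two sort keys, the sketch below orders map names either alphabetically or by map size. The function name, the dictionary of sizes and the ascending order for size are all assumptions; the library's actual ordering rules are not specified here.

```python
def sort_map_names(map_sizes, sort_by="name"):
    """Order map names by name or by size.

    map_sizes is a hypothetical dict mapping a map name to its size.
    """
    if sort_by == "name":
        # Plain lexicographic order; natural ordering is not attempted here.
        return sorted(map_sizes)
    if sort_by == "size":
        # Smallest map first (assumed direction).
        return sorted(map_sizes, key=lambda name: map_sizes[name])
    raise ValueError("sort_by must be 'name' or 'size'")
```
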
get_ticks(binsize=None)

To get xticks and yticks for the matrix

Parameters: binsize (int) – Number of bases in each bin (pixel or box) of the contact map.
Returns:
  • xticks (numpy.array) – 1D array containing positions along X-axis
  • yticks (numpy.array) – 1D array containing positions along Y-axis
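
One plausible way to derive such tick positions is to multiply the bin index by the bin size. The sketch below assumes that each tick marks the start of a bin and that the axes begin at an optional offset. The function name and its arguments are hypothetical, and the library may place ticks differently.

```python
import numpy as np


def ticks_from_shape(xshape, yshape, binsize, xstart=0, ystart=0):
    """Return genomic positions of bin starts along the X and Y axes."""
    xticks = xstart + np.arange(xshape) * binsize
    yticks = ystart + np.arange(yshape) * binsize
    return xticks, yticks
```
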
loadSmallestMap(resolution=None)

Load smallest sized contact map

Parameters: resolution (str) – Resolution to load. When it is not provided, the finest resolution of the smallest map is loaded.
performDownSampling(method='sum')

Downsample recursively and store the maps

It downsamples the maps and automatically adds them to the same input gcmap file. Downsampling works recursively, and downsampled maps are generated until the map has a size of less than 500.

Parameters: method (str) – Method of downsampling. Three accepted methods are sum: sum all values, mean: Average of all values and max: Maximum of all values.
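
The recursion described above can be sketched as a simple loop that doubles the bin size until the map becomes small. This sketch makes several assumptions: that "size" means the number of bins along one axis, that each step coarsens by a factor of two, and that the first map smaller than 500 is still produced before stopping. The function name is hypothetical.

```python
def coarser_levels(binsize, size, min_size=500):
    """List (binsize, size) pairs produced by repeated two-fold coarsening."""
    levels = []
    while size >= min_size:
        binsize *= 2
        size = -(-size // 2)  # ceiling division: a leftover bin still needs a slot
        levels.append((binsize, size))
    return levels
```

Under these assumptions, a 10kb map with 4000 bins would yield maps at 20kb, 40kb, 80kb and 160kb, the last one having 250 bins.
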
toCoarserResolution()

Try to change contact map to next coarser resolution

Returns: success – True if the change was successful, otherwise False.
Return type: bool
toFinerResolution()

Try to change contact map to next finer resolution

Returns: success – True if the change was successful, otherwise False.
Return type: bool
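
The idea behind stepping to the next finer or coarser resolution can be sketched with the list of available bin sizes (compare the binsizes attribute). The helper names below are hypothetical, and returning None where the methods above return False is a choice made for this sketch only.

```python
def next_finer(binsizes, current):
    """Return the largest available bin size below current, or None."""
    finer = [b for b in binsizes if b < current]
    return max(finer) if finer else None


def next_coarser(binsizes, current):
    """Return the smallest available bin size above current, or None."""
    coarser = [b for b in binsizes if b > current]
    return min(coarser) if coarser else None
```

When no finer or coarser map exists, the helpers return None, which corresponds to the unsuccessful case of the methods above.
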

gcMapExplorer.gcmap

loadGCMapAsCCMap(filename, mapName=None, chromAtX=None, chromAtY=None, resolution=None, workDir=None)

Load a map from gcmap file as a gcMapExplorer.lib.ccmap.CCMAP.

Parameters:
  • filename (str) – Either a gcmap file or h5py.File instance or GCMAP.hdf5 from which contact map data will be read.
  • mapName (str) – Name of contact map. e.g.: chr1 or chr2.
  • chromAtX (str) – Name of chromosome at X-axis
  • chromAtY (str) – Name of chromosome at Y-axis. If chromAtY = None, both X-axis and Y-axis contain the same chromosome and the map is of ‘intra’ or ‘cis’ type.
  • resolution (str) – Resolution of required map. If contact map of input resolution is not found, None will be returned.
  • workDir (str) – Name of directory where temporary files will be kept. These files will be automatically deleted.
Returns:

object

Return type:

None or gcMapExplorer.lib.ccmap.CCMAP

addCCMap2GCMap(cmap, filename, scaleoffset=None, compression='lzf', generateCoarse=True, coarseningMethod='sum', replaceCMap=True, logHandler=None)

Add gcMapExplorer.lib.ccmap.CCMAP to a gcmap file

Parameters:
  • cmap (gcMapExplorer.lib.ccmap.CCMAP) – An instance of gcMapExplorer.lib.ccmap.CCMAP, which will be added to gcmap file
  • filename (str) – Name of gcmap file or h5py.File instance or GCMAP.hdf5 to which output data will be written.
  • scaleoffset (int) – For integer data, this specifies the number of bits to retain in the HDF5 file. If the value is 0, HDF5 automatically computes the number of bits required for lossless compression of the chunk. For floating-point data, it indicates the number of digits after the decimal point to retain. This can help to reduce the final file size. If None, data will be stored without any loss of precision.
  • compression (str) – Compression method. Presently allowed : lzf for LZF compression and gzip for GZIP compression.
  • generateCoarse (bool) – Also generate all coarser maps, where the resolution is coarsened by a factor of two, consecutively. e.g.: In case of 10 kb input resolution, downsampled maps of 20kb, 40kb, 80kb, 160kb, 320kb etc. will be generated until the map size is less than 500.
  • coarseningMethod (str) – Method of downsampling. Three accepted methods are sum: sum all values, mean: Average of all values and max: Maximum of all values.
  • replaceCMap (bool) – Replace entire old ccmap data including resolutions and coarsened data.
Returns:

success – If addition was successful True otherwise False.

Return type:

bool

changeGCMapCompression(infile, outfile, compression, logHandler=None)

Change compression method in GCMAP file

Change the compression method in a GCMAP file. Currently, LZF and GZIP compression are allowed. LZF is a fast algorithm with a moderate compression ratio, whereas GZIP is slower but achieves a larger compression ratio.

Warning

A GCMAP file with gzip compression can be read from any programming language using the HDF5 library; however, LZF compression can only be decompressed using the Python h5py package.

Parameters:
  • infile (str) – Input GCMAP file
  • outfile (str) – Output GCMAP file
  • compression (str) – Method of compression: lzf or gzip
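
Changing the compression of an HDF5 file amounts to rewriting every dataset with the new filter. The sketch below shows that idea with plain h5py; it is not the implementation of changeGCMapCompression. The function name recompress_group is hypothetical, it works on open h5py groups rather than file names, it reads each dataset fully into memory, and updating a group-level 'compression' attribute is an assumption based on the file structure described earlier.

```python
import h5py
import numpy as np


def recompress_group(src_grp, dst_grp, compression):
    """Recursively copy src_grp into dst_grp, rewriting datasets with compression."""
    if compression not in ("lzf", "gzip"):
        raise ValueError("compression must be 'lzf' or 'gzip'")
    # Copy group attributes, then record the new method if one was recorded.
    for key, value in src_grp.attrs.items():
        dst_grp.attrs[key] = value
    if "compression" in src_grp.attrs:
        dst_grp.attrs["compression"] = compression
    for name, item in src_grp.items():
        if isinstance(item, h5py.Group):
            recompress_group(item, dst_grp.create_group(name), compression)
        else:
            # item[...] loads the whole array; large maps would need chunked copying.
            new = dst_grp.create_dataset(name, data=item[...], compression=compression)
            for key, value in item.attrs.items():
                new.attrs[key] = value
```

Because h5py.File objects behave as the root group, two open files can be passed directly, for example recompress_group(h5py.File(infile, 'r'), h5py.File(outfile, 'w'), 'gzip').
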