class HDF5Handler

HDF5Handler(filename[, title]) Handler for genomic data HDF5 file.
HDF5Handler.setTitle(title) Set title of the dataset
HDF5Handler.getChromList() To get list of all chromosomes present in hdf5 file
HDF5Handler.getResolutionList(chrom) To get all resolutions for given chromosome from hdf5 file
HDF5Handler.getDataNameList(chrom, resolution) List of all available arrays, named by their respective coarsening method, for the given chromosome and resolution
HDF5Handler.hasChromosome(chrom) Check whether the given chromosome is present in the HDF5 file
HDF5Handler.hasResolution(chrom, resolution) Check whether the given resolution is present for the given chromosome
HDF5Handler.hasDataName(chrom, resolution, ...) Check whether the given data name is present for the given chromosome and resolution
HDF5Handler.buildDataTree() Build data dictionary from the input hdf5 file
class HDF5Handler(filename, title=None)

Handler for genomic data HDF5 file.

This class acts as a handler for reading, writing, and modifying genomic data files in HDF5 format. The file is binary and is compressed with the zlib method to reduce its size on disk.

Structure of HDF5 file: /<Chromosome>/<Resolution>/<1D Numpy Array>

HDF5 ──────────────────────────> title
  ├──────── chr1
  │           ├───── 1kb
  │           │        ├──────── amean  ( Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value   ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean  ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value  ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value  ) (type: 1D Numpy Array)
  │           │
  │           ├────  5kb
  │           │        ├──────── amean  ( Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value   ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean  ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value  ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value  ) (type: 1D Numpy Array)
  │           │
  │           └────  ...
  │
  ├──────── chr2
  │           ├───── 1kb
  │           │        ├──────── amean  ( Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value   ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean  ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value  ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value  ) (type: 1D Numpy Array)
  │           └────  ..
  :
  :
  :
  └───── ...
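
The layout above can be reproduced directly with h5py. The following is a minimal sketch, not the library's own code: the file name, the helper `build_example_file`, the example values, and storing the title as a root attribute are all assumptions made for illustration. The file is kept in memory so nothing is written to disk.

```python
import h5py
import numpy as np


def build_example_file():
    """Create an in-memory HDF5 file laid out as /<Chromosome>/<Resolution>/<array>."""
    # driver="core" with backing_store=False keeps the file in memory only.
    h5 = h5py.File("layout_sketch.h5", "w", driver="core", backing_store=False)
    # Assumption: the title is stored as an attribute of the root group.
    h5.attrs["title"] = "example data"
    values = np.arange(10, dtype=np.float64)
    for chrom in ("chr1", "chr2"):
        for resolution in ("1kb", "5kb"):
            group = h5.require_group(f"{chrom}/{resolution}")
            # "gzip" is h5py's name for the zlib-based compression filter.
            group.create_dataset("amean", data=values, compression="gzip")
            group.create_dataset("max", data=values * 2, compression="gzip")
    return h5


h5 = build_example_file()
# Map each chromosome to the resolutions stored under it.
layout = {chrom: sorted(h5[chrom].keys()) for chrom in h5.keys()}
amean_shape = h5["chr1/1kb/amean"].shape
h5.close()
```

Each array is addressed by a path such as `chr1/1kb/amean`, which mirrors the three levels of the tree.
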
filename

str – HDF5 file name

title

str – Title of the data

hdf5

h5py.File – input/output stream to HDF5 file

data

dict – This dictionary is generated by HDF5Handler.buildDataTree(). This dictionary gives access to all data arrays.

Parameters:
  • filename (str) – HDF5 file name. e.g.: abcxyz.h5
  • title (str) – title or name of the data

Examples

from hiCMapAnalyze import genomicsDataHandler as gdh
import numpy as np

# Load available file
hdf5Hand = gdh.HDF5Handler('test.h5')

# Build the data structure; this step is essential for retrieving the data easily, as shown below
hdf5Hand.buildDataTree()

# Print shape and maximum value of chr1->1kb->amean array
print(hdf5Hand.data['chr1']['1kb']['amean'].shape, np.amax(hdf5Hand.data['chr1']['1kb']['amean']))

addDataByArray(Chrom, resolution, data_name, value_array, compression='lzf')

Add an array to the HDF5 file for the given chromosome, resolution and data name.

It can be used either to add a new data array or to replace existing data.

Parameters:
  • Chrom (str) – Chromosome name
  • resolution (str) – Resolution of data
  • data_name (str) – Name of data
  • value_array (numpy.ndarray) – An array containing values
  • compression (str) – Compression method for the stored array (default: 'lzf')
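
The add-or-replace behaviour described above could look roughly like this in plain h5py. This is a sketch under assumptions, not the actual implementation of addDataByArray: the helper name `add_or_replace_array` and the in-memory file are hypothetical, and replacing is done here by deleting the old dataset before creating the new one.

```python
import h5py
import numpy as np


def add_or_replace_array(h5, chrom, resolution, data_name, value_array, compression="lzf"):
    """Store value_array at /<chrom>/<resolution>/<data_name>, replacing any existing array."""
    # require_group creates the chromosome and resolution groups if they are missing.
    group = h5.require_group(f"{chrom}/{resolution}")
    if data_name in group:
        # An existing dataset may have a different shape, so remove it first.
        del group[data_name]
    return group.create_dataset(data_name, data=np.asarray(value_array), compression=compression)


with h5py.File("add_sketch.h5", "w", driver="core", backing_store=False) as h5:
    add_or_replace_array(h5, "chr1", "1kb", "amean", np.zeros(100))  # add new data
    add_or_replace_array(h5, "chr1", "1kb", "amean", np.ones(50))    # replace it
    stored_shape = h5["chr1/1kb/amean"].shape
    stored_compression = h5["chr1/1kb/amean"].compression
```

Deleting and recreating keeps the sketch simple when the replacement array has a different length than the original.
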
buildDataTree()

Build data dictionary from the input hdf5 file

To retrieve data from the HDF5 file, call this function first to build the dictionary HDF5Handler.data. This dictionary gives direct access to the data of any chromosome at a specific resolution.
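
A nested dictionary of this kind can be built by walking the three levels of the file. The sketch below is an illustration with h5py only; the function name `build_data_tree` and the example file are assumptions, and the real buildDataTree() may store different objects in the dictionary.

```python
import h5py
import numpy as np


def build_data_tree(h5):
    """Return {chromosome: {resolution: {data_name: dataset}}} for an open HDF5 file."""
    tree = {}
    for chrom, chrom_group in h5.items():
        tree[chrom] = {}
        for resolution, res_group in chrom_group.items():
            # Values are h5py datasets; slicing them reads the data as NumPy arrays.
            tree[chrom][resolution] = dict(res_group.items())
    return tree


with h5py.File("tree_sketch.h5", "w", driver="core", backing_store=False) as h5:
    h5.create_dataset("chr1/1kb/amean", data=np.arange(5, dtype=np.float64))
    h5.create_dataset("chr1/5kb/amean", data=np.arange(3, dtype=np.float64))
    data = build_data_tree(h5)
    amean = data["chr1"]["1kb"]["amean"][...]  # read the whole array into memory
    amean_shape = amean.shape
    amean_max = float(np.amax(amean))
```
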

close()

Close the HDF5 file

getChromList()

To get list of all chromosomes present in hdf5 file

Returns:chroms – List of all chromosomes present in hdf5 file
Return type:list
getDataNameList(chrom, resolution)

List of all available arrays, named by their respective coarsening method, for the given chromosome and resolution

Parameters:
  • chrom (str) – chromosome name
  • resolution (str) – resolution
Returns:

nameList – List of arrays by name of dataset

Return type:

list[str]

Raises:

KeyError – If the chromosome is not found in the HDF5 file, or if the resolution is not found for the given chromosome.

getResolutionList(chrom)

To get all resolutions for given chromosome from hdf5 file

Parameters:chrom (str) – chromosome name
Returns:resolutionList – A list of all available resolutions for the given chromosome
Return type:list[str]
Raises:KeyError – If chromosome not found in hdf5 file
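
The three listing methods (getChromList, getResolutionList, getDataNameList) correspond to reading group keys at successive levels of the file. The following h5py sketch shows the idea, including the KeyError for unknown names; the function names and the example file are hypothetical and do not reproduce the class's own code.

```python
import h5py
import numpy as np


def get_chrom_list(h5):
    """List the chromosome groups at the top level of the file."""
    return sorted(h5.keys())


def get_resolution_list(h5, chrom):
    """List resolutions for a chromosome; h5py raises KeyError if chrom is missing."""
    return sorted(h5[chrom].keys())


def get_data_name_list(h5, chrom, resolution):
    """List array names; KeyError if the chromosome or resolution is missing."""
    return sorted(h5[chrom][resolution].keys())


with h5py.File("list_sketch.h5", "w", driver="core", backing_store=False) as h5:
    h5.create_dataset("chr1/1kb/amean", data=np.arange(4.0))
    h5.create_dataset("chr1/1kb/max", data=np.arange(4.0))
    h5.create_dataset("chr1/5kb/amean", data=np.arange(2.0))
    chroms = get_chrom_list(h5)
    resolutions = get_resolution_list(h5, "chr1")
    names = get_data_name_list(h5, "chr1", "1kb")
    try:
        get_resolution_list(h5, "chrX")
        missing_raises = False
    except KeyError:
        missing_raises = True
```
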
hasChromosome(chrom)

Check whether the given chromosome is present in the HDF5 file

Parameters:chrom (str) – Chromosome name to look up in the file.
Returns:gotChromsome – True if the queried chromosome is present in the file, otherwise False.
Return type:bool
hasDataName(chrom, resolution, dataName)

Check whether the given data name is present for the given chromosome and resolution

Parameters:
  • chrom (str) – Chromosome name to look up in the file.
  • resolution (str) – Data resolution for the queried chromosome.
  • dataName (str) – Name of the data to look up for the given chromosome and resolution.
Returns:

gotDataName – True if the queried data for the given chromosome at the given resolution is present in the file, otherwise False.

Return type:

bool

hasResolution(chrom, resolution)

Check whether the given resolution is present for the given chromosome

Parameters:
  • chrom (str) – Chromosome name to look up in the file.
  • resolution (str) – Data resolution for the queried chromosome.
Returns:

gotResolution – True if the queried resolution of the given chromosome is present in the file, otherwise False.

Return type:

bool
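
Unlike the listing methods, the three has* checks return a boolean instead of raising KeyError. One way to sketch them with h5py is path membership on the open file; the function names and the example file are assumptions for illustration, not the class's implementation.

```python
import h5py
import numpy as np


def has_chromosome(h5, chrom):
    """True if a top-level group named chrom exists."""
    return chrom in h5


def has_resolution(h5, chrom, resolution):
    """True if /<chrom>/<resolution> exists; False if either level is missing."""
    return f"{chrom}/{resolution}" in h5


def has_data_name(h5, chrom, resolution, data_name):
    """True if /<chrom>/<resolution>/<data_name> exists."""
    return f"{chrom}/{resolution}/{data_name}" in h5


with h5py.File("has_sketch.h5", "w", driver="core", backing_store=False) as h5:
    h5.create_dataset("chr1/1kb/amean", data=np.arange(4.0))
    checks = (
        has_chromosome(h5, "chr1"),
        has_chromosome(h5, "chrX"),
        has_resolution(h5, "chr1", "1kb"),
        has_resolution(h5, "chr1", "5kb"),
        has_data_name(h5, "chr1", "1kb", "amean"),
        has_data_name(h5, "chrX", "1kb", "amean"),
    )
```

Checking first with these functions avoids having to catch KeyError when a chromosome, resolution, or data name may be absent.
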

open()

Open the HDF5 file

setTitle(title)

Set title of the dataset

It can be used to set or replace the title of the dataset. If the file is not yet open, the title is stored to the file when the file is opened.

Parameters:title (str) – The title of dataset.
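
The deferred behaviour described above can be sketched as follows. This is a hypothetical stand-in class, not HDF5Handler itself: the class name `TitledFile`, its method names, and the choice to store the title as a root attribute named "title" are assumptions.

```python
import h5py


class TitledFile:
    """Hypothetical stand-in showing how a title can be kept until the file is opened."""

    def __init__(self, filename, title=None):
        self.filename = filename
        self.title = title
        self.hdf5 = None  # no file handle until open() is called

    def open(self):
        # In-memory file for the sketch; a real handler would open a file on disk.
        self.hdf5 = h5py.File(self.filename, "w", driver="core", backing_store=False)
        if self.title is not None:
            # Assumption: the title lives in a root attribute named "title".
            self.hdf5.attrs["title"] = self.title

    def set_title(self, title):
        self.title = title
        if self.hdf5 is not None:
            # File already open: write the title immediately.
            self.hdf5.attrs["title"] = title

    def close(self):
        if self.hdf5 is not None:
            self.hdf5.close()
            self.hdf5 = None


handler = TitledFile("title_sketch.h5")
handler.set_title("first title")   # file not open yet, title is only remembered
handler.open()                     # title is written on open
title_after_open = handler.hdf5.attrs["title"]
handler.set_title("new title")     # file open, title is replaced in the file
title_after_replace = handler.hdf5.attrs["title"]
handler.close()
```
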