class HDF5Handler

HDF5Handler(filename[, title]) Handler for genomic data HDF5 file.
HDF5Handler.setTitle(title) Set title of the dataset
HDF5Handler.getChromList() To get list of all chromosomes present in hdf5 file
HDF5Handler.getResolutionList(chrom[, dataName]) To get all resolutions for given chromosome from hdf5 file
HDF5Handler.getDataNameList(chrom, resolution) List of all available arrays by respective coarse method name for given chromosome and resolution
HDF5Handler.hasChromosome(chrom) To check if given chromosome is present in hdf5 file
HDF5Handler.hasResolution(chrom, resolution[, dataName]) To check if given resolution for given chromosome is present
HDF5Handler.hasDataName(chrom, resolution, …) To check if given data for given resolution for given chromosome is present
HDF5Handler.buildDataTree() Build data dictionary from the input hdf5 file
HDF5Handler.addDataByArray(Chrom, …[, …]) Add array to the hdf5 file for given chromosome, resolution and data name.
class HDF5Handler(filename, title=None)

Handler for genomic data HDF5 file.

This class acts as a handler and can be used to read, write, and modify a genomic data file in HDF5 format. The file is binary, and its contents are compressed using the zlib method to reduce storage size.

Structure of HDF5 file: /<Chromosome>/<Resolution>/<1D Numpy Array>

HDF5 ──────────────────────────> title
  ├──────── chr1
  │           ├───── 1kb
  │           │        ├──────── amean  ( Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value   ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean  ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value  ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value  ) (type: 1D Numpy Array)
  │           │
  │           ├────  5kb
  │           │        ├──────── amean  ( Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value   ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean  ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value  ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value  ) (type: 1D Numpy Array)
  │           │
  │           └────  ...
  │
  ├──────── chr2
  │           ├───── 1kb
  │           │        ├──────── amean  ( Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value   ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean  ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value  ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value  ) (type: 1D Numpy Array)
  │           └────  ..
  :
  :
  :
  └───── ...
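
As an illustration of what the six dataset names in the tree stand for, the sketch below bins a short 1D signal and summarizes each bin with the six methods. It uses plain Python lists and the standard `statistics` module instead of NumPy arrays. The helper `coarse_grain`, the bin size, and the sample values are hypothetical, and this is not necessarily how the package computes its arrays.

```python
import statistics

# Map each dataset name from the tree above to a summary function.
COARSE_METHODS = {
    "amean": statistics.fmean,
    "median": statistics.median,
    "hmean": statistics.harmonic_mean,
    "gmean": statistics.geometric_mean,
    "min": min,
    "max": max,
}


def coarse_grain(values, bin_size):
    """Summarize consecutive bins of `values` with every coarse method."""
    bins = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]
    return {name: [func(b) for b in bins] for name, func in COARSE_METHODS.items()}


# Hypothetical fine-resolution signal, coarsened by a factor of two.
signal = [1.0, 4.0, 2.0, 8.0, 3.0, 3.0]
coarse = coarse_grain(signal, bin_size=2)
print(coarse["amean"])
print(coarse["max"])
```

Each key of `coarse` then plays the role of one dataset under a `<Chromosome>/<Resolution>` group.
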
filename

str – HDF5 file name

title

str – Title of the data

hdf5

h5py.File – input/output stream to HDF5 file

data

dict – This dictionary is generated by HDF5Handler.buildDataTree(). This dictionary gives access to all data arrays.

Parameters:
  • filename (str) – HDF5 file name. e.g.: abcxyz.h5
  • title (str) – title or name of the data

Examples

from gcMapExplorer import lib as gmlib
import numpy as np

# Load available file
hdf5Hand = gmlib.genomicsDataHandler.HDF5Handler('abcxyz.h5')

# Open the file
hdf5Hand.open()

# Build the data structure
hdf5Hand.buildDataTree()

# Print shape and maximum value of chr1->1kb->amean array
print(hdf5Hand.data['chr1']['1kb']['amean'].shape, np.amax(hdf5Hand.data['chr1']['1kb']['amean']))

# Close the file
hdf5Hand.close()

addDataByArray(Chrom, resolution, data_name, value_array, compression='lzf')

Add array to the hdf5 file for given chromosome, resolution and data name.

It can be used either to add a new data array or to replace existing data.

Parameters:
  • Chrom (str) – Chromosome name
  • resolution (str) – Resolution of data
  • data_name (str) – Name of data
  • value_array (numpy.ndarray) – An array containing values
  • compression (str) – Compression method used to store the array. Default: 'lzf'
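
The add-or-replace behaviour described for addDataByArray can be pictured with a nested dictionary standing in for the file. This is only a sketch: `add_data_by_array` below is a hypothetical stand-in, not the package's implementation, and it ignores compression entirely.

```python
def add_data_by_array(tree, chrom, resolution, data_name, value_array):
    """Store an array under tree[chrom][resolution][data_name].

    Missing chromosome and resolution levels are created on demand.
    Returns True when an existing array was replaced, False when it was added.
    """
    arrays = tree.setdefault(chrom, {}).setdefault(resolution, {})
    replaced = data_name in arrays
    arrays[data_name] = list(value_array)
    return replaced


tree = {}
first = add_data_by_array(tree, "chr1", "1kb", "amean", [0.5, 1.5])   # new array
second = add_data_by_array(tree, "chr1", "1kb", "amean", [2.0, 3.0])  # replaces it
print(first, second, tree["chr1"]["1kb"]["amean"])
```
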
buildDataTree()

Build data dictionary from the input hdf5 file

To retrieve data from the hdf5 file, this function should be called first to build the dictionary HDF5Handler.data. This dictionary gives direct access to the data of any chromosome at a specific resolution.
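
To make the shape of such a dictionary concrete, here is a minimal sketch that turns a flat listing of `/<Chromosome>/<Resolution>/<name>` paths into a nested dictionary. The function `build_tree` and the sample paths are hypothetical; the actual method reads from the open file and may work differently.

```python
def build_tree(datasets):
    """Turn {'/chrom/resolution/name': array} into tree[chrom][resolution][name]."""
    tree = {}
    for path, array in datasets.items():
        chrom, resolution, name = path.strip("/").split("/")
        tree.setdefault(chrom, {}).setdefault(resolution, {})[name] = array
    return tree


# Hypothetical flat listing of datasets, as a walk over the file might report them.
flat = {
    "/chr1/1kb/amean": [0.5, 1.5, 2.5],
    "/chr1/1kb/max": [1.0, 2.0, 3.0],
    "/chr1/5kb/amean": [1.5],
    "/chr2/1kb/amean": [4.0, 6.0],
}
data = build_tree(flat)
print(sorted(data))
print(data["chr1"]["1kb"]["max"])
```
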

close()

Close the hdf5 file

getChromList()

To get list of all chromosomes present in hdf5 file

Returns: chroms – List of all chromosomes present in hdf5 file
Return type: list

getDataNameList(chrom, resolution)

List of all available arrays by respective coarse method name for given chromosome and resolution

Parameters:
  • chrom (str) – chromosome name
  • resolution (str) – resolution
Returns:

nameList – List of arrays by name of dataset

Return type:

list[str]

Raises:

KeyError – If the chromosome is not found in the hdf5 file, or if the input resolution is not found for the input chromosome.

getResolutionList(chrom, dataName=None)

To get all resolutions for given chromosome from hdf5 file

Parameters:
  • chrom (str) – chromosome name
  • dataName (str) – Optional. If given, only the resolutions available for this data name are listed
Returns:

resolutionList – A list of all available resolutions for the given chromosome

Return type:

list[str]

Raises:

KeyError – If chromosome not found in hdf5 file
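
The lookup, the optional data-name filter, and the KeyError case can be sketched on a nested dictionary. The function `get_resolution_list` and the sample tree are hypothetical stand-ins for illustration, not the package's code.

```python
def get_resolution_list(tree, chrom, data_name=None):
    """List resolutions of `chrom`, optionally only those holding `data_name`."""
    resolutions = tree[chrom]  # raises KeyError for an unknown chromosome
    if data_name is None:
        return list(resolutions)
    return [res for res, arrays in resolutions.items() if data_name in arrays]


tree = {"chr1": {"1kb": {"amean": [1.0], "max": [2.0]}, "5kb": {"amean": [1.5]}}}
all_res = get_resolution_list(tree, "chr1")
max_res = get_resolution_list(tree, "chr1", data_name="max")
print(all_res, max_res)

try:
    get_resolution_list(tree, "chrX")
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)
```
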

hasChromosome(chrom)

To check if given chromosome is present in hdf5 file

Parameters: chrom (str) – Chromosome name to be looked up in the file.
Returns: gotChromosome – True if the queried chromosome is present in the file, otherwise False.
Return type: bool

hasDataName(chrom, resolution, dataName)

To check if given data for given resolution for given chromosome is present

Parameters:
  • chrom (str) – Chromosome name to be looked up in the file.
  • resolution (str) – Data resolution for the queried chromosome
  • dataName (str) – Name of the data to be queried for the given chromosome.
Returns:

gotDataName – True if the queried data is present in the file for the given chromosome at the given resolution, otherwise False.

Return type:

bool

hasResolution(chrom, resolution, dataName=None)

To check if given resolution for given chromosome is present

Parameters:
  • chrom (str) – Chromosome name to be looked up in the file.
  • resolution (str) – Data resolution for the queried chromosome
  • dataName (str) – Optional. If given, also check that the resolution is present for this data name
Returns:

gotResolution – True if the queried resolution of the given chromosome is present in the file, otherwise False.

Return type:

bool
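
The three presence checks follow the same pattern, which the sketch below shows on a nested dictionary. The functions and the sample tree are hypothetical; in particular, treating hasDataName as hasResolution with a data name is an assumption made for this illustration.

```python
def has_chromosome(tree, chrom):
    """True if the chromosome is a top-level key."""
    return chrom in tree


def has_resolution(tree, chrom, resolution, data_name=None):
    """True if the resolution exists for the chromosome.

    When data_name is given, the resolution must also hold that array.
    """
    arrays = tree.get(chrom, {}).get(resolution)
    if arrays is None:
        return False
    return data_name is None or data_name in arrays


def has_data_name(tree, chrom, resolution, data_name):
    """True if the named array exists at the given chromosome and resolution."""
    return has_resolution(tree, chrom, resolution, data_name)


tree = {"chr1": {"1kb": {"amean": [1.0]}, "5kb": {"amean": [1.5], "max": [2.0]}}}
print(has_chromosome(tree, "chr2"))
print(has_resolution(tree, "chr1", "1kb"))
print(has_resolution(tree, "chr1", "1kb", "max"))
print(has_data_name(tree, "chr1", "5kb", "max"))
```

Unlike the list getters, these checks return False for a missing chromosome instead of raising.
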

open()

Open the hdf5 file

setTitle(title)

Set title of the dataset

It can be used to set or replace the title of the dataset. If the file is not yet open, the title is written to the file when it is opened.

Parameters: title (str) – The title of the dataset.
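
The deferred write described for setTitle can be sketched with a toy class. `TitledFile` and its attributes are hypothetical and only mimic the described behaviour; they are not part of the package.

```python
class TitledFile:
    """Toy stand-in for a handler that defers writing its title until opened."""

    def __init__(self, title=None):
        self.title = title
        self.is_open = False
        self.stored_title = None  # what the "file" currently holds

    def set_title(self, title):
        self.title = title
        if self.is_open:
            self.stored_title = title  # file is open: write immediately

    def open(self):
        self.is_open = True
        if self.title is not None:
            self.stored_title = self.title  # write the pending title


handler = TitledFile()
handler.set_title("example title")
before_open = handler.stored_title  # nothing written yet
handler.open()
after_open = handler.stored_title
print(before_open, after_open)
```
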