class HDF5Handler
HDF5Handler(filename[, title]) | Handler for genomic data HDF5 file.
HDF5Handler.setTitle(title) | Set the title of the dataset.
HDF5Handler.getChromList() | Get the list of all chromosomes present in the hdf5 file.
HDF5Handler.getResolutionList(chrom[, dataName]) | Get all resolutions for the given chromosome from the hdf5 file.
HDF5Handler.getDataNameList(chrom, resolution) | List all available arrays by respective coarse method name for the given chromosome and resolution.
HDF5Handler.hasChromosome(chrom) | Check whether the given chromosome is present in the hdf5 file.
HDF5Handler.hasResolution(chrom, resolution) | Check whether the given resolution is present for the given chromosome.
HDF5Handler.hasDataName(chrom, resolution, …) | Check whether the given data is present for the given resolution of the given chromosome.
HDF5Handler.buildDataTree() | Build the data dictionary from the input hdf5 file.
HDF5Handler.addDataByArray(Chrom, …[, …]) | Add an array to the hdf5 file for the given chromosome, resolution and data name.
-
class HDF5Handler(filename, title=None)
Handler for genomic data HDF5 file.
This class acts as a handler and can be used to read, write, and modify a genomic data file in HDF5 format. The file is binary and is compressed using the zlib method to reduce its storage size.
Structure of HDF5 file:
/<Chromosome>/<Resolution>/<1D Numpy Array>
HDF5 ──────────────────────────> title
  ├──────── chr1
  │           ├───── 1kb
  │           │        ├──────── amean  (Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median (Median value)    (type: 1D Numpy Array)
  │           │        ├──────── hmean  (Harmonic mean)   (type: 1D Numpy Array)
  │           │        ├──────── gmean  (Geometric mean)  (type: 1D Numpy Array)
  │           │        ├──────── min    (Minimum value)   (type: 1D Numpy Array)
  │           │        └──────── max    (Maximum value)   (type: 1D Numpy Array)
  │           │
  │           ├───── 5kb
  │           │        ├──────── amean  (Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median (Median value)    (type: 1D Numpy Array)
  │           │        ├──────── hmean  (Harmonic mean)   (type: 1D Numpy Array)
  │           │        ├──────── gmean  (Geometric mean)  (type: 1D Numpy Array)
  │           │        ├──────── min    (Minimum value)   (type: 1D Numpy Array)
  │           │        └──────── max    (Maximum value)   (type: 1D Numpy Array)
  │           │
  │           └───── ...
  │
  ├──────── chr2
  │           ├───── 1kb
  │           │        ├──────── amean  (Arithmetic mean) (type: 1D Numpy Array)
  │           │        ├──────── median (Median value)    (type: 1D Numpy Array)
  │           │        ├──────── hmean  (Harmonic mean)   (type: 1D Numpy Array)
  │           │        ├──────── gmean  (Geometric mean)  (type: 1D Numpy Array)
  │           │        ├──────── min    (Minimum value)   (type: 1D Numpy Array)
  │           │        └──────── max    (Maximum value)   (type: 1D Numpy Array)
  │           └───── ..
  :
  :
  :
  └───── ...
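The layout above maps naturally onto a three-level nested mapping. As a minimal sketch that is not taken from gcMapExplorer, the following uses plain Python dictionaries and lists as stand-ins for HDF5 groups and 1D NumPy arrays. The helper `walk` is hypothetical, and the chromosome names, resolutions, and values are made up for illustration.

```python
# Stand-in for /<Chromosome>/<Resolution>/<data name> -> 1D array.
# Plain lists replace NumPy arrays; all values are illustrative only.
layout = {
    "chr1": {
        "1kb": {"amean": [0.5, 1.5, 2.0], "max": [1.0, 3.0, 4.0]},
        "5kb": {"amean": [1.2], "max": [4.0]},
    },
    "chr2": {
        "1kb": {"amean": [0.1, 0.2], "max": [0.3, 0.4]},
    },
}


def walk(tree):
    """Yield (chromosome, resolution, data name, array length) for every array."""
    for chrom in sorted(tree):
        for resolution in sorted(tree[chrom]):
            for name in sorted(tree[chrom][resolution]):
                yield chrom, resolution, name, len(tree[chrom][resolution][name])


rows = list(walk(layout))
for chrom, resolution, name, size in rows:
    print(f"/{chrom}/{resolution}/{name}: {size} values")
```

Each printed path corresponds to one leaf of the tree, which is the same addressing scheme used when indexing the `data` attribute as `data[chrom][resolution][name]`.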
-
filename
str – HDF5 file name
-
title
str – Title of the data
-
hdf5
h5py.File – input/output stream to HDF5 file
-
data
dict – This dictionary is generated by HDF5Handler.buildDataTree(). It gives access to all data arrays.
Parameters:
filename (str) – HDF5 file name
title (str) – Title of the data

Examples

    from gcMapExplorer import lib as gmlib
    import numpy as np

    # Load available file
    hdf5Hand = gmlib.genomicsDataHandler.HDF5Handler('abcxyz.h5')

    # Open the file
    hdf5Hand.open()

    # Build the data structure
    hdf5Hand.buildDataTree()

    # Print shape and maximum value of chr1->1kb->amean array
    print(hdf5Hand.data['chr1']['1kb']['amean'].shape, np.amax(hdf5Hand.data['chr1']['1kb']['amean']))
-
addDataByArray(Chrom, resolution, data_name, value_array, compression='lzf')
Add an array to the hdf5 file for the given chromosome, resolution and data name.
It can be used either to add a new data array or to replace existing data.
Parameters:
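The add-or-replace behaviour can be pictured with a nested dictionary: intermediate levels are created on demand, and an existing entry under the same name is overwritten. This is a stand-in sketch only. The function `add_data_by_array` below is hypothetical, operates on a plain dict rather than an HDF5 file, and has no counterpart to the `compression` argument.

```python
def add_data_by_array(tree, chrom, resolution, data_name, values):
    """Store values at tree[chrom][resolution][data_name].

    Returns True if an existing array was replaced, False if a new one was added.
    """
    # Create the chromosome and resolution levels if they do not exist yet.
    arrays = tree.setdefault(chrom, {}).setdefault(resolution, {})
    replaced = data_name in arrays
    arrays[data_name] = list(values)
    return replaced


tree = {}
first = add_data_by_array(tree, "chr1", "1kb", "amean", [0.5, 1.5])
second = add_data_by_array(tree, "chr1", "1kb", "amean", [9.0])
print(first, second, tree)
```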
-
buildDataTree()
Build data dictionary from the input hdf5 file.
To retrieve data from the hdf5 file, this function should be used to build the dictionary HDF5Handler.data. This dictionary gives direct access to the data of any chromosome at a specific resolution.
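As a rough illustration of what building such a dictionary involves, the sketch below folds HDF5-style dataset paths into a nested dictionary. It is not the library's implementation: `build_tree` is a hypothetical helper, and plain lists stand in for the arrays that would be read from the file.

```python
def build_tree(datasets):
    """Fold {'/chrom/resolution/name': array} into tree[chrom][resolution][name]."""
    tree = {}
    for path, values in datasets.items():
        parts = path.strip("/").split("/")
        if len(parts) != 3:
            raise ValueError(f"expected /<chrom>/<resolution>/<name>, got {path!r}")
        chrom, resolution, name = parts
        tree.setdefault(chrom, {}).setdefault(resolution, {})[name] = values
    return tree


# Hypothetical flat listing of datasets; values are illustrative only.
flat = {
    "/chr1/1kb/amean": [0.5, 1.5, 2.0],
    "/chr1/1kb/max": [1.0, 3.0, 4.0],
    "/chr2/5kb/amean": [1.2],
}
data = build_tree(flat)
print(max(data["chr1"]["1kb"]["amean"]))
```

Once the dictionary exists, every array is reachable with three key lookups, which is the access pattern used in the class example.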
-
close()
Close the hdf5 file.
-
getChromList()
Get the list of all chromosomes present in the hdf5 file.
Returns: chroms – List of all chromosomes present in hdf5 file
Return type: list
-
getDataNameList(chrom, resolution)
List of all available arrays by respective coarse method name for the given chromosome and resolution.
Parameters:
Returns: nameList – List of arrays by name of dataset
Return type: list
Raises: KeyError – If the chromosome is not found in the hdf5 file, or if the input resolution keyword is not found for the input chromosome.
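The two failure cases can be told apart by raising `KeyError` with a distinct message for each level. The sketch below assumes a nested dictionary as a stand-in for the file; `data_name_list` is a hypothetical function, not the library's `getDataNameList`, and the error messages are invented for illustration.

```python
def data_name_list(tree, chrom, resolution):
    """Return the data names stored under tree[chrom][resolution]."""
    if chrom not in tree:
        raise KeyError(f"{chrom} not found in file")
    if resolution not in tree[chrom]:
        raise KeyError(f"{resolution} not found for {chrom}")
    return list(tree[chrom][resolution])


tree = {"chr1": {"1kb": {"amean": [0.5], "median": [0.4]}}}
print(data_name_list(tree, "chr1", "1kb"))

# A missing resolution is reported separately from a missing chromosome.
try:
    data_name_list(tree, "chr1", "10kb")
except KeyError as err:
    print("lookup failed:", err)
```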
-
getResolutionList(chrom, dataName=None)
Get all resolutions for the given chromosome from the hdf5 file.
Parameters:
Returns: resolutionList – A list of all available resolutions for the given chromosome
Return type: list
Raises: KeyError – If the chromosome is not found in the hdf5 file
-
hasChromosome(chrom)
Check whether the given chromosome is present in the hdf5 file.
Parameters: chrom (str) – Chromosome name to look up in the file.
Returns: gotChromosome – True if the queried chromosome is present in the file, otherwise False.
Return type: bool
-
hasDataName(chrom, resolution, dataName)
Check whether the given data is present for the given resolution of the given chromosome.
Parameters:
Returns: gotDataName – True if the queried data for the given chromosome at the given resolution is present in the file, otherwise False.
Return type: bool
-
hasResolution(chrom, resolution, dataName=None)
Check whether the given resolution is present for the given chromosome.
Parameters:
Returns: gotResolution – True if the queried resolution of the given chromosome is present in the file, otherwise False.
Return type: bool
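The `has…` checks all follow the same pattern, each looking one level deeper into the hierarchy. The sketch below shows that pattern on a nested dictionary standing in for the file. The function names are hypothetical, and the treatment of the optional data name (when given, it must also exist at that resolution) is an assumption rather than documented behaviour.

```python
def has_chromosome(tree, chrom):
    """True if chrom is a top-level key."""
    return chrom in tree


def has_resolution(tree, chrom, resolution, data_name=None):
    """True if resolution exists for chrom.

    Assumption: when data_name is given, it must also exist at that resolution.
    """
    arrays = tree.get(chrom, {}).get(resolution)
    if arrays is None:
        return False
    return data_name is None or data_name in arrays


def has_data_name(tree, chrom, resolution, data_name):
    """True if data_name exists under tree[chrom][resolution]."""
    return data_name in tree.get(chrom, {}).get(resolution, {})


tree = {"chr1": {"1kb": {"amean": [0.5, 1.5]}}}

# Guard an access so that a missing entry never raises KeyError.
if has_data_name(tree, "chr1", "1kb", "amean"):
    print(tree["chr1"]["1kb"]["amean"])
```

Checking before indexing is one way to avoid the `KeyError` raised by the list-returning methods when a chromosome or resolution is absent.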
-
open()
Open the hdf5 file.
-