class HDF5Handler¶
HDF5Handler(filename[, title]) |
Handler for genomic data HDF5 file. |
HDF5Handler.setTitle(title) |
Set title of the dataset |
HDF5Handler.getChromList() |
To get list of all chromosomes present in hdf5 file |
HDF5Handler.getResolutionList(chrom) |
To get all resolutions for given chromosome from hdf5 file |
HDF5Handler.getDataNameList(chrom, resolution) |
List of all available arrays by respective coarse method name for given chromosome and resolution |
HDF5Handler.hasChromosome(chrom) |
Check whether the given chromosome is present in hdf5 file |
HDF5Handler.hasResolution(chrom, resolution) |
Check whether the given resolution is present for the given chromosome in hdf5 file |
HDF5Handler.hasDataName(chrom, resolution, ...) |
Check whether the given data name is present for the given chromosome and resolution in hdf5 file |
HDF5Handler.buildDataTree() |
Build data dictionary from the input hdf5 file |
-
class HDF5Handler(filename, title=None)¶ Handler for genomic data HDF5 file.
This class acts as a handler and can be used to read, write, and modify genomic data files in HDF5 format. The file is binary and is compressed using the zlib method to reduce storage space.

Structure of HDF5 file: /<Chromosome>/<Resolution>/<1D Numpy Array>

HDF5 ──────────────────────────> title
  ├──────── chr1
  │           ├───── 1kb
  │           │        ├──────── amean  ( Arithmetic mean ) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value    ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean   ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean  ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value   ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value   ) (type: 1D Numpy Array)
  │           │
  │           ├──── 5kb
  │           │        ├──────── amean  ( Arithmetic mean ) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value    ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean   ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean  ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value   ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value   ) (type: 1D Numpy Array)
  │           │
  │           └──── ...
  │
  ├──────── chr2
  │           ├───── 1kb
  │           │        ├──────── amean  ( Arithmetic mean ) (type: 1D Numpy Array)
  │           │        ├──────── median ( Median value    ) (type: 1D Numpy Array)
  │           │        ├──────── hmean  ( Harmonic mean   ) (type: 1D Numpy Array)
  │           │        ├──────── gmean  ( Geometric mean  ) (type: 1D Numpy Array)
  │           │        ├──────── min    ( Minimum value   ) (type: 1D Numpy Array)
  │           │        └──────── max    ( Maximum value   ) (type: 1D Numpy Array)
  │           └──── ..
  :
  :
  └───── ...
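The layout above can be reproduced directly with h5py to see what the handler is expected to read. The following is only a minimal sketch, not the handler's own code: it assumes the title is stored as a root attribute named `title` (the attribute name is a guess), uses h5py's `gzip` filter as the zlib-based compression, and writes to an in-memory file so nothing is left on disk. The helper names `write_sketch` and `list_paths` are hypothetical.

```python
import h5py
import numpy as np

# Coarse-method names taken from the structure shown above.
METHODS = ("amean", "median", "hmean", "gmean", "min", "max")


def write_sketch(h5, chroms=("chr1", "chr2"), resolutions=("1kb", "5kb"), length=10):
    """Write /<Chromosome>/<Resolution>/<1D array> datasets filled with zeros."""
    # Assumption: the title lives in a root attribute called "title".
    h5.attrs["title"] = "example"
    for chrom in chroms:
        for res in resolutions:
            group = h5.require_group(f"{chrom}/{res}")
            for name in METHODS:
                # "gzip" is h5py's zlib (DEFLATE) based filter.
                group.create_dataset(name, data=np.zeros(length), compression="gzip")


def list_paths(h5):
    """Return the sorted absolute paths of every dataset in the file."""
    paths = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            paths.append("/" + name)

    h5.visititems(visit)
    return sorted(paths)


# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("layout_sketch.h5", "w", driver="core", backing_store=False) as h5:
    write_sketch(h5)
    paths = list_paths(h5)
    title = h5.attrs["title"]
```

Each entry of `paths` has the form `/<Chromosome>/<Resolution>/<name>`, which mirrors the three levels of the tree.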
-
filename¶ str – HDF5 file name
-
title¶ str – Title of the data
-
hdf5¶ h5py.File – input/output stream to HDF5 file
-
data¶ dict – This dictionary is generated by
HDF5Handler.buildDataTree(). This dictionary gives access to all data arrays.
Examples

    from hiCMapAnalyze import genomicsDataHandler as gdh
    import numpy as np

    # Load available file
    hdf5Hand = gdh.HDF5Handler('test.h5')

    # Build the data structure; this is an essential step to retrieve the data easily, as shown below
    hdf5Hand.buildDataTree()

    # Print shape and maximum value of chr1->1kb->amean array
    print(hdf5Hand.data['chr1']['1kb']['amean'].shape, np.amax(hdf5Hand.data['chr1']['1kb']['amean']))
-
addDataByArray(Chrom, resolution, data_name, value_array, compression='lzf')¶ Add an array to the hdf5 file for the given chromosome, resolution and data name.
It can be used either to add a new data array or to replace existing data.
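The add-or-replace behaviour described for addDataByArray can be illustrated with plain h5py. This is a sketch under assumptions, not the method's actual implementation: the helper `add_or_replace` is hypothetical, and it replaces an existing dataset by deleting it and creating a new one, which is one common h5py pattern when the new array may differ in length.

```python
import h5py
import numpy as np


def add_or_replace(h5, chrom, resolution, data_name, value_array, compression="lzf"):
    """Store value_array at /<chrom>/<resolution>/<data_name>, replacing any old dataset."""
    # require_group creates the chromosome and resolution groups if they are missing.
    group = h5.require_group(f"{chrom}/{resolution}")
    if data_name in group:
        # Remove the old dataset so the new one may have a different shape.
        del group[data_name]
    return group.create_dataset(
        data_name, data=np.asarray(value_array), compression=compression
    )


# In-memory file, so the sketch leaves nothing on disk.
with h5py.File("add_sketch.h5", "w", driver="core", backing_store=False) as h5:
    add_or_replace(h5, "chr1", "1kb", "amean", np.arange(5.0))  # add new data
    add_or_replace(h5, "chr1", "1kb", "amean", np.arange(8.0))  # replace existing data
    final_shape = h5["chr1/1kb/amean"].shape
    final_compression = h5["chr1/1kb/amean"].compression
```

After the second call the dataset holds the eight-element array, and its `compression` property reports the filter that was requested.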
-
buildDataTree()¶ Build data dictionary from the input hdf5 file
To retrieve data from the hdf5 file, this function should be used to build the dictionary
HDF5Handler.data. This dictionary gives direct access to the data of any chromosome at a specific resolution.
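The shape of such a dictionary can be sketched without any HDF5 file. The example below is illustrative only and does not reproduce buildDataTree(): it assumes each dataset is identified by a `/<Chromosome>/<Resolution>/<name>` path and uses plain lists in place of NumPy arrays. The function name `build_tree` and the sample values are hypothetical.

```python
def build_tree(datasets):
    """Turn {'/chrom/resolution/name': array} into tree[chrom][resolution][name]."""
    tree = {}
    for path, array in datasets.items():
        # Every path is expected to have exactly three components.
        chrom, resolution, name = path.strip("/").split("/")
        tree.setdefault(chrom, {}).setdefault(resolution, {})[name] = array
    return tree


# Hypothetical flat listing of datasets; lists stand in for 1D NumPy arrays.
flat = {
    "/chr1/1kb/amean": [0.5, 1.5],
    "/chr1/1kb/max": [1.0, 3.0],
    "/chr2/5kb/amean": [2.0],
}
tree = build_tree(flat)
chr1_1kb_max = tree["chr1"]["1kb"]["max"]
```

Indexing then follows the same chromosome, resolution, data-name order as the file structure.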
-
close()¶ Close hdf5 file
-
getChromList()¶ To get list of all chromosomes present in hdf5 file
Returns: chroms – List of all chromosomes present in hdf5 file
Return type: list
-
getDataNameList(chrom, resolution)¶ List of all available arrays by respective coarse method name for given chromosome and resolution
Returns: nameList – List of arrays by name of dataset
Return type: list
Raises: KeyError – If chromosome is not found in hdf5 file, or if input resolution keyword is not found for input chromosome.
-
getResolutionList(chrom)¶ To get all resolutions for given chromosome from hdf5 file
Parameters: chrom (str) – chromosome name
Returns: resolutionList – A list of all available resolutions for the given chromosome
Return type: list[str]
Raises: KeyError – If chromosome not found in hdf5 file
-
hasChromosome(chrom)¶ Check whether the given chromosome is present in hdf5 file
Parameters: chrom (str) – Chromosome name to be looked up in file.
Returns: gotChromsome – True if the queried chromosome is present in file, otherwise False.
Return type: bool
-
hasDataName(chrom, resolution, dataName)¶ Check whether the given data name is present for the given chromosome and resolution in hdf5 file
Returns: gotDataName – True if the queried data of the given chromosome at the given resolution is present in file, otherwise False.
Return type: bool
-
hasResolution(chrom, resolution)¶ Check whether the given resolution is present for the given chromosome in hdf5 file
Returns: gotResolution – True if the queried resolution of the given chromosome is present in file, otherwise False.
Return type: bool
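The has* checks return a boolean, whereas the get*List methods are documented to raise KeyError for a missing chromosome or resolution. The difference can be sketched on a plain nested dictionary. This is not the handler's code: the function names below are hypothetical stand-ins, and it is an assumption that a check on a missing parent level simply yields False.

```python
def has_chromosome(tree, chrom):
    return chrom in tree


def has_resolution(tree, chrom, resolution):
    # Assumption: a missing chromosome gives False rather than an error.
    return chrom in tree and resolution in tree[chrom]


def has_data_name(tree, chrom, resolution, data_name):
    return has_resolution(tree, chrom, resolution) and data_name in tree[chrom][resolution]


def get_resolution_list(tree, chrom):
    """List resolutions for a chromosome; raise KeyError if it is absent."""
    if chrom not in tree:
        raise KeyError(f"{chrom} not found")
    return list(tree[chrom])


# Hypothetical tree with one chromosome, one resolution and two arrays.
tree = {"chr1": {"1kb": {"amean": [0.5], "max": [1.0]}}}

# Guard a lookup with a check instead of catching the exception.
resolutions = get_resolution_list(tree, "chr1") if has_chromosome(tree, "chr1") else []

try:
    get_resolution_list(tree, "chrX")
    missing_raised = False
except KeyError:
    missing_raised = True
```

Checking first is convenient when a missing chromosome is an expected case; catching KeyError suits code where absence indicates a real problem.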
-
open()¶ Open hdf5 file
-