class HDF5Handler
HDF5Handler(filename[, title]) – Handler for genomic data HDF5 file.
HDF5Handler.setTitle(title) – Set the title of the dataset.
HDF5Handler.getChromList() – Get the list of all chromosomes present in the HDF5 file.
HDF5Handler.getResolutionList(chrom[, dataName]) – Get all resolutions for the given chromosome from the HDF5 file.
HDF5Handler.getDataNameList(chrom, resolution) – List all available arrays, named by their respective coarsening method, for the given chromosome and resolution.
HDF5Handler.hasChromosome(chrom) – Check whether the given chromosome is present in the HDF5 file.
HDF5Handler.hasResolution(chrom, resolution) – Check whether the given resolution for the given chromosome is present.
HDF5Handler.hasDataName(chrom, resolution, …) – Check whether the given data for the given resolution of the given chromosome is present.
HDF5Handler.buildDataTree() – Build the data dictionary from the input HDF5 file.
HDF5Handler.addDataByArray(Chrom, …[, …]) – Add an array to the HDF5 file for the given chromosome, resolution and data name.
-
class HDF5Handler(filename, title=None)
Handler for genomic data HDF5 file.
This class acts as a handler that can be used to read, write, and modify a genomic data file in HDF5 format. The file is binary and is compressed using the zlib method to reduce its storage size.
Structure of HDF5 file:

    /<Chromosome>/<Resolution>/<1D Numpy Array>

    HDF5 ──────────────────────────> title
     ├──────── chr1
     │           ├───── 1kb
     │           │        ├──────── amean  ( Arithmetic mean ) (type: 1D Numpy Array)
     │           │        ├──────── median ( Median value    ) (type: 1D Numpy Array)
     │           │        ├──────── hmean  ( Harmonic mean   ) (type: 1D Numpy Array)
     │           │        ├──────── gmean  ( Geometric mean  ) (type: 1D Numpy Array)
     │           │        ├──────── min    ( Minimum value   ) (type: 1D Numpy Array)
     │           │        └──────── max    ( Maximum value   ) (type: 1D Numpy Array)
     │           │
     │           ├───── 5kb
     │           │        ├──────── amean  ( Arithmetic mean ) (type: 1D Numpy Array)
     │           │        ├──────── median ( Median value    ) (type: 1D Numpy Array)
     │           │        ├──────── hmean  ( Harmonic mean   ) (type: 1D Numpy Array)
     │           │        ├──────── gmean  ( Geometric mean  ) (type: 1D Numpy Array)
     │           │        ├──────── min    ( Minimum value   ) (type: 1D Numpy Array)
     │           │        └──────── max    ( Maximum value   ) (type: 1D Numpy Array)
     │           │
     │           └───── ...
     │
     ├──────── chr2
     │           ├───── 1kb
     │           │        ├──────── amean  ( Arithmetic mean ) (type: 1D Numpy Array)
     │           │        ├──────── median ( Median value    ) (type: 1D Numpy Array)
     │           │        ├──────── hmean  ( Harmonic mean   ) (type: 1D Numpy Array)
     │           │        ├──────── gmean  ( Geometric mean  ) (type: 1D Numpy Array)
     │           │        ├──────── min    ( Minimum value   ) (type: 1D Numpy Array)
     │           │        └──────── max    ( Maximum value   ) (type: 1D Numpy Array)
     │           └───── ..
     :
     :
     └──────── ...
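The layout above maps directly onto HDF5 groups and datasets. The following is a minimal sketch, not gcMapExplorer's own writer, assuming h5py and NumPy are installed; the file name, the per-chromosome bin counts, the random values, and the helper `write_example_file` are hypothetical. In h5py the zlib (DEFLATE) compressor is selected with `compression="gzip"`.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical example layout: chromosome -> resolution -> number of bins
LAYOUT = {"chr1": {"1kb": 1000, "5kb": 200}, "chr2": {"1kb": 800}}
METHODS = ["amean", "median", "hmean", "gmean", "min", "max"]


def write_example_file(path):
    """Write /<Chromosome>/<Resolution>/<1D array> datasets with h5py."""
    rng = np.random.default_rng(0)
    with h5py.File(path, "w") as f:
        for chrom, resolutions in LAYOUT.items():
            for resolution, n_bins in resolutions.items():
                for method in METHODS:
                    # Intermediate groups (chrom, resolution) are created from the path
                    f.create_dataset(
                        f"{chrom}/{resolution}/{method}",
                        data=rng.random(n_bins),
                        compression="gzip",  # zlib/DEFLATE filter in h5py
                    )


path = os.path.join(tempfile.mkdtemp(), "example.h5")
write_example_file(path)
with h5py.File(path, "r") as f:
    print(sorted(f.keys()))          # ['chr1', 'chr2']
    print(sorted(f["chr1"].keys()))  # ['1kb', '5kb']
    print(f["chr1/5kb"]["amean"].shape)  # (200,)
```

Writing each array under a slash-separated path keeps the code close to the `/<Chromosome>/<Resolution>/<1D Numpy Array>` pattern shown above.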
-
filename (str) – HDF5 file name
-
title (str) – Title of the data
-
hdf5 (h5py.File) – Input/output stream to the HDF5 file
-
data (dict) – Dictionary generated by HDF5Handler.buildDataTree(). It gives access to all data arrays.
Parameters: filename (str) – HDF5 file name; title (str, optional) – Title of the data
Examples

    from gcMapExplorer import lib as gmlib
    import numpy as np

    # Load available file
    hdf5Hand = gmlib.genomicsDataHandler.HDF5Handler('abcxyz.h5')

    # Open the file
    hdf5Hand.open()

    # Build the data structure
    hdf5Hand.buildDataTree()

    # Print shape and maximum value of chr1->1kb->amean array
    print(hdf5Hand.data['chr1']['1kb']['amean'].shape, np.amax(hdf5Hand.data['chr1']['1kb']['amean']))

    # Close the file when done
    hdf5Hand.close()
-
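addDataByArray(Chrom, resolution, data_name, value_array, compression='lzf')
Add an array to the HDF5 file for the given chromosome, resolution and data name.
It can be used either to add a new data array or to replace existing data.
Parameters: Chrom – Chromosome name; resolution – Resolution (e.g. 1kb); data_name – Name of the data array (e.g. amean); value_array – 1D NumPy array to store; compression – Compression method (default 'lzf')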
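The add-or-replace behaviour described above can be sketched with plain h5py as follows. This is a hypothetical stand-in (`add_data_by_array`), not gcMapExplorer's implementation; it assumes h5py and NumPy are installed and that replacing means deleting the old dataset and writing a new one, since an HDF5 dataset created with a fixed shape cannot be resized in place.

```python
import os
import tempfile

import h5py
import numpy as np


def add_data_by_array(f, chrom, resolution, data_name, value_array, compression="lzf"):
    """Hypothetical stand-in for addDataByArray: add or replace one 1D array."""
    # Create /<chrom>/<resolution> if it does not exist yet
    group = f.require_group(f"{chrom}/{resolution}")
    if data_name in group:
        # Existing data: remove it so an array of any new length can be written
        del group[data_name]
    group.create_dataset(data_name, data=np.asarray(value_array), compression=compression)


path = os.path.join(tempfile.mkdtemp(), "add_demo.h5")
with h5py.File(path, "a") as f:
    add_data_by_array(f, "chr1", "1kb", "amean", np.arange(10, dtype=float))
    add_data_by_array(f, "chr1", "1kb", "amean", np.ones(5))  # replaces the array
    print(f["chr1/1kb/amean"][:])  # [1. 1. 1. 1. 1.]
```

Note that HDF5 does not reclaim the space of a deleted dataset automatically; repacking the file (for example with the `h5repack` tool) recovers it.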
-
buildDataTree()
Build the data dictionary from the input HDF5 file.
To retrieve data from the HDF5 file, this function should be used to build the dictionary HDF5Handler.data. This dictionary gives direct access to the data of any chromosome at a specific resolution.
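As a rough illustration of the nested dictionary that HDF5Handler.data exposes, the sketch below walks any group-like mapping three levels deep (chromosome, resolution, data name). The function `build_data_tree` is hypothetical, not the library's code. Because `h5py.File` and `h5py.Group` also support iteration over member names and item access, the same function should work on an open h5py file; its leaves would then be `h5py.Dataset` objects, which read from disk only when sliced.

```python
def build_data_tree(root):
    """Nest chromosome -> resolution -> data name -> array from a group-like mapping."""
    tree = {}
    for chrom in root:
        tree[chrom] = {}
        for resolution in root[chrom]:
            group = root[chrom][resolution]
            # One entry per data array (amean, median, ...) at this resolution
            tree[chrom][resolution] = {name: group[name] for name in group}
    return tree


# Nested dicts stand in for an open HDF5 file in this example
fake_file = {"chr1": {"1kb": {"amean": [1.0, 2.0], "max": [3.0, 4.0]}}}
data = build_data_tree(fake_file)
print(data["chr1"]["1kb"]["max"])  # [3.0, 4.0]
```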
-
close()
Close the HDF5 file.
-
getChromList()
Get the list of all chromosomes present in the HDF5 file.
Returns: chroms – List of all chromosomes present in the HDF5 file
Return type: list
-
getDataNameList(chrom, resolution)
List all available arrays, named by their respective coarsening method, for the given chromosome and resolution.
Parameters: chrom (str) – Chromosome name; resolution (str) – Resolution (e.g. 1kb)
Returns: nameList – List of array names in the dataset
Return type: list
Raises: KeyError – If the chromosome is not found in the HDF5 file, or if the input resolution is not found for the input chromosome.
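The listing methods and their KeyError behaviour can be sketched over any group-like mapping. The helpers `get_chrom_list` and `get_data_name_list` are hypothetical stand-ins for getChromList and getDataNameList, and nested dicts replace the HDF5 file so the example runs without one. With h5py, member names are returned in alphanumeric order by default, whereas dicts keep insertion order.

```python
def get_chrom_list(root):
    """All chromosome names at the top level of the file."""
    return list(root.keys())


def get_data_name_list(root, chrom, resolution):
    """Data-array names for one chromosome and resolution.

    Raises KeyError when the chromosome or the resolution is missing,
    matching the documented behaviour of getDataNameList.
    """
    if chrom not in root:
        raise KeyError(f"Chromosome {chrom!r} not found in file")
    if resolution not in root[chrom]:
        raise KeyError(f"Resolution {resolution!r} not found for {chrom!r}")
    return list(root[chrom][resolution].keys())


fake_file = {"chr1": {"1kb": {"amean": [], "median": []}}, "chr2": {}}
print(get_chrom_list(fake_file))                     # ['chr1', 'chr2']
print(get_data_name_list(fake_file, "chr1", "1kb"))  # ['amean', 'median']
```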
-
getResolutionList(chrom, dataName=None)
Get all resolutions for the given chromosome from the HDF5 file.
Parameters: chrom (str) – Chromosome name; dataName (str, optional) – Name of a data array
Returns: resolutionList – List of all available resolutions for the given chromosome
Return type: list
Raises: KeyError – If the chromosome is not found in the HDF5 file
-
hasChromosome(chrom)
Check whether the given chromosome is present in the HDF5 file.
Parameters: chrom (str) – Chromosome name to look up in the file.
Returns: gotChromosome – True if the queried chromosome is present in the file, otherwise False.
Return type: bool
-
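hasDataName(chrom, resolution, dataName)
Check whether the given data for the given resolution of the given chromosome is present.
Parameters: chrom (str) – Chromosome name; resolution (str) – Resolution (e.g. 1kb); dataName (str) – Name of the data array (e.g. amean)
Returns: gotDataName – True if the queried data for the given chromosome at the given resolution is present in the file, otherwise False.
Return type: bool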
-
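hasResolution(chrom, resolution, dataName=None)
Check whether the given resolution for the given chromosome is present.
Parameters: chrom (str) – Chromosome name; resolution (str) – Resolution (e.g. 1kb); dataName (str, optional) – Name of a data array
Returns: gotResolution – True if the queried resolution of the given chromosome is present in the file, otherwise False.
Return type: bool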
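The three presence checks can be sketched as nested membership tests. The functions below are hypothetical stand-ins for hasChromosome, hasResolution and hasDataName, written against any group-like mapping; the optional dataName argument of hasResolution is not modelled because its effect is not documented here. Each check tests the parent level first, so a missing chromosome returns False instead of raising.

```python
def has_chromosome(root, chrom):
    return chrom in root


def has_resolution(root, chrom, resolution):
    # Short-circuit: only index root[chrom] when the chromosome exists
    return chrom in root and resolution in root[chrom]


def has_data_name(root, chrom, resolution, data_name):
    return has_resolution(root, chrom, resolution) and data_name in root[chrom][resolution]


fake_file = {"chr1": {"1kb": {"amean": []}}}
print(has_chromosome(fake_file, "chr1"))                  # True
print(has_resolution(fake_file, "chrX", "1kb"))           # False
print(has_data_name(fake_file, "chr1", "1kb", "amean"))   # True
```

On an open h5py file, a single path test such as `"chr1/1kb/amean" in f` performs the same check in one step and also returns False when an intermediate group is missing.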
-
open()
Open the HDF5 file.
-