class BigWigHandler

BigWigHandler(filenames[, …]) Handle bigWig files and convert them to h5 files
BigWigHandler.getBigWigInfo() Retrieve chromosome names and their sizes
BigWigHandler.bigWigtoWig([outfilenames]) Generate Wig files
BigWigHandler.saveAsH5(filename[, …]) Save data to h5 file.
class BigWigHandler(filenames, pathTobigWigToWig=None, pathTobigWigInfo=None, chromName=None, methodToCombine='mean', workDir=None, maxEntryWrite=10000000)

Handle bigWig files and convert them to h5 files

This class converts a bigWig file to an h5 file. It can also combine several bigWig files that originate from replicate experiments.

Warning

At present, the bigWigToWig and bigWigInfo programs are not available for Windows. Therefore, this class will fail on Windows.

bigWigFileNames

str or list[str] – List of bigWig file names including path

pathTobigWigToWig

str – Path to the bigWigToWig program. It can be downloaded from http://hgdownload.cse.ucsc.edu/admin/exe/ for MacOSX and Linux. If the path to the program is already present in the configuration file, it is taken from there.

If it is not present in the configuration file, the path must be provided as input. It is then stored in the configuration file for later use.

pathTobigWigInfo

str – Path to the bigWigInfo program. It can be downloaded from http://hgdownload.cse.ucsc.edu/admin/exe/ for MacOSX and Linux. If the path to the program is already present in the configuration file, it is taken from there.

If it is not present in the configuration file, the path must be provided as input. It is then stored in the configuration file for later use.

WigFileNames

list[str] – List of Wig file names, either automatically generated or given by the user

chromName

str – Name of the target chromosome. If provided, only data for this chromosome is extracted and stored in the h5 file.

wigHandle

WigHandler – WigHandler instance to parse Wig file and save data as hdf5 file

chromSizeInfo

dict – A dictionary containing chromosome size information

methodToCombine

str – Method to combine bigWig/Wig files. Presently accepted keywords are: mean, min and max

maxEntryWrite

int – Number of lines read from the Wig file at a time. After this many lines, the data is dumped to a temporary numpy array file.

Parameters:
  • filenames (str or list[str]) – A bigWig file or list of bigWig files including path
  • pathTobigWigToWig (str) –

    Path to the bigWigToWig program. It can be downloaded from http://hgdownload.cse.ucsc.edu/admin/exe/ for MacOSX and Linux. If the path to the program is already present in the configuration file, it is taken from there.

    If it is not present in the configuration file, the path must be provided as input. It is then stored in the configuration file for later use.

  • pathTobigWigInfo (str) –

    Path to the bigWigInfo program. It can be downloaded from http://hgdownload.cse.ucsc.edu/admin/exe/ for MacOSX and Linux. If the path to the program is already present in the configuration file, it is taken from there.

    If it is not present in the configuration file, the path must be provided as input. It is then stored in the configuration file for later use.

  • chromName (str) – Name of the target chromosome. If provided, only data for this chromosome is extracted and stored in the h5 file.
  • methodToCombine (str) – Method to combine bigWig/Wig files. Presently accepted keywords are: mean, min and max
  • maxEntryWrite (int) – Number of lines read from the Wig file at a time. After this many lines, the data is dumped to a temporary numpy array file. Reduce this number to lower memory (RAM) usage, because larger values need more RAM.
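
The methodToCombine keywords can be illustrated with a minimal sketch. It assumes that replicate tracks are combined position by position, which the text above does not state explicitly. The helper combine_replicates and the COMBINERS table are hypothetical names used only for illustration and are not part of gcMapExplorer.

```python
from statistics import mean

# Hypothetical lookup table mapping the documented keywords to functions.
COMBINERS = {"mean": mean, "min": min, "max": max}


def combine_replicates(tracks, method="mean"):
    """Combine equally long replicate tracks position by position.

    Assumption: combination is element-wise; this helper is not
    part of gcMapExplorer.
    """
    if method not in COMBINERS:
        raise ValueError("method must be one of: mean, min, max")
    combine = COMBINERS[method]
    # zip(*tracks) groups the values found at the same position.
    return [combine(values) for values in zip(*tracks)]


rep1 = [1.0, 4.0, 2.0]
rep2 = [3.0, 0.0, 2.0]
print(combine_replicates([rep1, rep2], "mean"))
print(combine_replicates([rep1, rep2], "max"))
```

With the class itself, the equivalent choice would be made through the methodToCombine argument of the constructor.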
_bigWigtoWig(bigWigFileName, outfilename)

Base method to generate Wig file from a bigWig file

Use BigWigHandler.bigWigtoWig() to automatically convert all bigWig files to Wig files.

Warning

Private method. Use it at your own risk. It is used internally in BigWigHandler.bigWigtoWig()

Parameters:
  • bigWigFileName (str) – Input bigWig file name.
  • outfilename (str) – Name of output Wig file.
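
For orientation, the sketch below builds the kind of command line that a single bigWig to Wig conversion might use. It assumes the UCSC bigWigToWig usage "bigWigToWig in.bigWig out.wig" with the optional -chrom=NAME flag. How BigWigHandler actually invokes the program is not documented here, and build_bigwigtowig_command is a hypothetical helper.

```python
def build_bigwigtowig_command(program, bigwig_file, out_wig, chrom=None):
    """Return the argument list for one bigWigToWig call.

    Hypothetical helper: it only builds the list and runs nothing.
    """
    command = [program, bigwig_file, out_wig]
    if chrom is not None:
        # Restrict the output to a single chromosome.
        command.append("-chrom=" + chrom)
    return command


cmd = build_bigwigtowig_command("./bigWigToWig", "first.bigWig",
                                "first.wig", chrom="chr1")
print(" ".join(cmd))
# The list could then be passed to subprocess.run(cmd, check=True).
```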
_checkBigWigInfoProgram(pathTobigWigInfo)

Check if bigWigInfo program is available or accessible.

If the program is not listed in the configuration file, the given path is checked for accessibility and then stored in the configuration file.

The path is stored in gcMapExplorer.lib.genomicsDataHandler.BigWigHandler.pathTobigWigInfo

Parameters:pathTobigWigInfo (str) – Path to bigWigInfo program
_checkBigWigToWigProgram(pathTobigWigToWig)

Check if bigWigToWig program is available or accessible.

If the program is not listed in the configuration file, the given path is checked for accessibility and then stored in the configuration file.

The path is stored in gcMapExplorer.lib.genomicsDataHandler.BigWigHandler.pathTobigWigToWig

Parameters:pathTobigWigToWig (str) – Path to bigWigToWig program
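
An accessibility check of this kind could look like the sketch below. It assumes that "accessible" means the path points to an existing file that the current user may execute. The actual criteria used by the two _check methods are not described here, and is_program_accessible is a hypothetical name.

```python
import os


def is_program_accessible(path):
    """Return True if path is an existing, executable file.

    Hypothetical helper; the checks done inside BigWigHandler may differ.
    """
    if not path:
        return False
    return os.path.isfile(path) and os.access(path, os.X_OK)


print(is_program_accessible("./bigWigInfo"))
```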
_getBigWigInfo(filename)

Base method to retrieve chromosome names and their sizes

  • Chromosome size information is stored for a given bigWig file. If the size of a chromosome is already present in the dictionary, the larger size is kept.
  • Use BigWigHandler.getBigWigInfo() to automatically retrieve chromosome size information from all bigWig files.

Warning

Private method. Use it at your own risk. It is used internally in BigWigHandler.getBigWigInfo()

Parameters:filename (str) – Input bigWig file
bigWigtoWig(outfilenames=None)

Generate Wig files

It uses the bigWigToWig program to convert bigWig files to Wig files. It uses BigWigHandler.chromSizeInfo to extract the listed chromosome data.

If outfilenames are provided, Wig files are generated with these names. Otherwise, Wig file names are generated randomly and listed in BigWigHandler.WigFileNames. Files generated with random names are deleted after execution.

Parameters:outfilenames (str or list of str) – List of Wig file names. If None, names are generated automatically, the files are created temporarily, and all of them are deleted after execution.
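
The temporary-file behaviour can be pictured with a short sketch. It assumes that random names are created with Python's tempfile module and removed once processing is done. The real naming scheme is not documented here, and make_temp_wig_names, remove_files and work_dir are hypothetical names.

```python
import os
import tempfile


def make_temp_wig_names(count, work_dir=None):
    """Create `count` empty files with random names ending in .wig."""
    names = []
    for _ in range(count):
        fd, name = tempfile.mkstemp(suffix=".wig", dir=work_dir)
        os.close(fd)  # only the name is needed, not the open handle
        names.append(name)
    return names


def remove_files(names):
    """Delete the given files, skipping any that no longer exist."""
    for name in names:
        if os.path.exists(name):
            os.remove(name)


wig_names = make_temp_wig_names(2)
try:
    pass  # the bigWig to Wig conversion and parsing would happen here
finally:
    remove_files(wig_names)  # clean up even if the conversion fails
```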
getBigWigInfo()

Retrieve chromosome names and their sizes

The bigWigInfo program is executed on all listed bigWig files, and chromosome names with their respective sizes are stored in the BigWigHandler.chromSizeInfo variable. When several bigWig files are listed, only the largest size of each chromosome is kept.

If BigWigHandler.chromName is provided, only target chromosome information is kept in BigWigHandler.chromSizeInfo dictionary.
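
The two rules above (keep the largest size per chromosome, then optionally keep only the target chromosome) can be sketched with plain dictionaries. merge_chrom_sizes is a hypothetical helper, and the sizes are made-up illustration values rather than output of bigWigInfo.

```python
def merge_chrom_sizes(per_file_sizes, chrom_name=None):
    """Merge chromosome sizes from several files, keeping the largest.

    Hypothetical helper, not part of gcMapExplorer.
    """
    merged = {}
    for sizes in per_file_sizes:
        for chrom, size in sizes.items():
            # Keep the larger of the stored and the new size.
            merged[chrom] = max(size, merged.get(chrom, 0))
    if chrom_name is not None:
        # Keep only the target chromosome, if it is present.
        merged = {c: s for c, s in merged.items() if c == chrom_name}
    return merged


file_a = {"chr1": 1000, "chr2": 800}  # made-up sizes
file_b = {"chr1": 1200, "chr2": 750}
print(merge_chrom_sizes([file_a, file_b]))          # {'chr1': 1200, 'chr2': 800}
print(merge_chrom_sizes([file_a, file_b], "chr2"))  # {'chr2': 800}
```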

saveAsH5(filename, tmpNumpyArrayFiles=None, title=None, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)

Save data to h5 file.

Parameters:
  • filename (str) – Output hdf5 file name with h5 extension.
  • tmpNumpyArrayFiles (TempNumpyArrayFiles (optional)) – Usually not required. This TempNumpyArrayFiles instance stores information about the temporary numpy array files. When converting a large number of bigWig files, passing it speeds up conversion significantly, because generating new temporary array files takes time and reusing the instance avoids generating them repeatedly.
  • title (str (optional)) – Title of the data
  • resolutions (list of str) –

    Additional input resolutions other than these default resolutions: ‘1kb’, ‘2kb’, ‘4kb’, ‘5kb’, ‘8kb’, ‘10kb’, ‘20kb’, ‘40kb’, ‘80kb’, ‘100kb’, ‘160kb’, ‘200kb’, ‘320kb’, ‘500kb’, ‘640kb’, and ‘1mb’.

    For Example: use resolutions=['25kb', '50kb', '75kb'] to add additional 25kb, 50kb and 75kb resolution data.

  • coarsening_methods (list of str) –

    Methods to coarsen or downsample the data when converting from 1-base to coarser resolutions. Presently, six methods are implemented.

    • 'min' -> Minimum value
    • 'max' -> Maximum value
    • 'amean' -> Arithmetic mean or average
    • 'hmean' -> Harmonic mean
    • 'gmean' -> Geometric mean
    • 'median' -> Median

    If None, all six methods are used. A subset of these methods may also be given. For example, coarsening_methods=['max', 'amean'] downsamples with only these two methods.

  • compression (str) – Data compression method in the HDF5 file: lzf or gzip.
  • keep_original (bool) – Whether the original data present in the bigWig file should be included in the HDF5 file. This significantly increases the size of the HDF5 file.
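
To make the coarsening methods concrete, the sketch below downsamples a short signal into fixed-size bins with the six documented method names. It is only an illustration using Python's statistics module and assumes non-overlapping bins. The helper coarsen is hypothetical and does not reflect how gcMapExplorer handles zeros or missing values.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

# Hypothetical mapping of the documented keywords to functions.
METHODS = {
    "min": min,
    "max": max,
    "amean": mean,
    "hmean": harmonic_mean,   # raises an error for negative values
    "gmean": geometric_mean,  # requires strictly positive values
    "median": median,
}


def coarsen(values, bin_size, methods=None):
    """Downsample `values` into non-overlapping bins of `bin_size`."""
    if methods is None:
        methods = list(METHODS)  # None selects every method
    bins = [values[i:i + bin_size] for i in range(0, len(values), bin_size)]
    return {name: [METHODS[name](b) for b in bins] for name in methods}


signal = [1.0, 2.0, 4.0, 8.0, 3.0, 3.0]
result = coarsen(signal, 2, ["max", "amean"])
print(result["max"])    # [2.0, 8.0, 3.0]
print(result["amean"])  # [1.5, 6.0, 3.0]
```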

Examples

from gcMapExplorer.lib import genomicsDataHandler as gdh

# start BigWigHandler to combine and convert two bigWig files
bigwig = gdh.BigWigHandler(['first.bigWig', 'second.bigWig'], './bigWigToWig', './bigWigInfo')

# Save hdf5 file with two additional resolutions
# and only two downsampling methods.
bigwig.saveAsH5('converted.h5', resolutions=['25kb', '50kb'], coarsening_methods=['max', 'amean'])