class WigHandler

WigHandler(filenames[, chromSizeInfo, …]) To convert Wig files to hdf5 file
WigHandler.parseWig() To parse Wig files
WigHandler.setChromosome(chromName) Set the target chromosome for reading and extracting from wig file
WigHandler.saveAsH5(hdf5Out[, title, …]) To convert Wig files to hdf5 file
WigHandler.getRawWigDataAsDictionary([dicOut]) To get an entire dictionary of data from Wig file
class WigHandler(filenames, chromSizeInfo=None, chromName=None, indexFile=None, tmpNumpyArrayFiles=None, methodToCombine='mean', workDir=None, maxEntryWrite=10000000)

To convert Wig files to hdf5 file

It parses wig files and saves all data to an hdf5 file at the given resolutions.

WigFileNames

list[str] – List of input Wig files.

Note

If WigHandler.chromName is provided, only one wig file is accepted.

chromName

str – Name of the target chromosome to be extracted from the wig file.

chromSizeInfo

dict – A dictionary containing chromosome size information.

_chromPointerInFile

dict – A dictionary containing position index of each chromosome in wig file.

indexFile

str – A JSON file containing the indices (positions in the wig file) and sizes of chromosomes. If the file does not exist at the given path, a new file is generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the JSON file, they are determined from the wig file and stored in the same JSON file. This file is helpful when the same wig file has to be read many times, because the step that determines the index and size of the chromosome is skipped.
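
The caching behaviour described for indexFile can be sketched in a few lines. This is a minimal illustration and not WigHandler's own code: the function name `load_or_update_index`, the `compute_entry` callback and the layout of the stored entry are all hypothetical.

```python
import json
import os
import tempfile


def load_or_update_index(index_file, chrom, compute_entry):
    """Return the cached entry for `chrom`, computing and storing it on a miss."""
    cache = {}
    if os.path.isfile(index_file):
        with open(index_file) as fin:
            cache = json.load(fin)
    if chrom not in cache:
        # The expensive scan of the wig file runs only when the entry is missing.
        cache[chrom] = compute_entry(chrom)
        with open(index_file, "w") as fout:
            json.dump(cache, fout, indent=2)
    return cache[chrom]


# Usage with a stand-in for the expensive wig scan (hypothetical entry layout).
calls = []


def fake_scan(chrom):
    calls.append(chrom)
    return {"index": 0, "size": 1000}


path = os.path.join(tempfile.mkdtemp(), "index.json")
first = load_or_update_index(path, "chr1", fake_scan)
second = load_or_update_index(path, "chr1", fake_scan)  # served from the JSON file
```

In this sketch the second call never invokes the scan, which is the saving the description above refers to.
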

methodToCombine

str – Method to combine bigWig/Wig files. Presently, the accepted keywords are: mean, min and max.
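
As a rough illustration of what the three keywords mean, the sketch below combines values position by position. It assumes that tracks are combined element-wise and have equal length; the source does not state how WigHandler performs the combination, and `combine_tracks` is a hypothetical helper.

```python
from statistics import mean

# Keyword -> reducing function, mirroring the documented keywords.
COMBINERS = {"mean": mean, "min": min, "max": max}


def combine_tracks(tracks, method="mean"):
    """Combine per-position values from several tracks of equal length."""
    if method not in COMBINERS:
        raise ValueError("unknown method: %r" % method)
    combine = COMBINERS[method]
    # zip(*tracks) yields one tuple of values per position.
    return [combine(values) for values in zip(*tracks)]


track_a = [1.0, 4.0, 2.0]
track_b = [3.0, 0.0, 2.0]
combined_mean = combine_tracks([track_a, track_b], "mean")
combined_max = combine_tracks([track_a, track_b], "max")
```
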

tmpNumpyArrayFiles

TempNumpyArrayFiles – This TempNumpyArrayFiles instance stores the temporary numpy array files information.

isWigParsed

bool – Whether Wig files are already parsed.

maxEntryWrite

int – Number of lines read from the Wig file at a time. After this many lines, the data is dumped to a temporary numpy array file.

Parameters:
  • filenames (str or list(str)) –

    List of input Wig files.

    Note

    If WigHandler.chromName is provided, only one wig file is accepted.

  • chromName (str) – Name of the target chromosome to be extracted from the wig file.
  • chromSizeInfo (dict) – A dictionary containing chromosome size information. Generated by BigWigHandler.getBigWigInfo().
  • indexFile (str) – A JSON file containing the indices (positions in the wig file) and sizes of chromosomes. If the file does not exist at the given path, a new file is generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the JSON file, they are determined from the wig file and stored in the same JSON file. This file is helpful when the same wig file has to be read many times, because the step that determines the index and size of the chromosome is skipped.
  • tmpNumpyArrayFiles (TempNumpyArrayFiles) – This TempNumpyArrayFiles instance stores the temporary numpy array files information.
  • methodToCombine (str) – Method to combine bigWig/Wig files. Presently, the accepted keywords are: mean, min and max.
  • maxEntryWrite (int) – Number of lines read from the Wig file at a time. After this many lines, the data is dumped to a temporary numpy array file. To reduce memory (RAM) usage, lower this number, because larger values need more RAM.
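
The memory trade-off behind maxEntryWrite can be pictured with a small buffering loop. This is a sketch only: `read_in_chunks` and the `flush` callback are hypothetical, and the real class writes to temporary numpy array files rather than to a Python list.

```python
def read_in_chunks(lines, max_entries, flush):
    """Buffer parsed (location, value) pairs and flush every `max_entries` lines."""
    locations, values = [], []
    for line in lines:
        location, value = line.split()
        locations.append(int(location))
        values.append(float(value))
        if len(locations) >= max_entries:
            flush(locations, values)          # buffer is full: dump it
            locations, values = [], []
    if locations:                             # dump the final partial buffer
        flush(locations, values)


flushed_sizes = []
data_lines = ["%d %.1f" % (i, i * 0.5) for i in range(1, 8)]
read_in_chunks(data_lines, max_entries=3,
               flush=lambda locs, vals: flushed_sizes.append(len(locs)))
```

At most `max_entries` parsed lines are held in memory at once, so a smaller value lowers peak RAM at the cost of more frequent writes.
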
_FillDataInNumpyArrayFile(ChromTitle, location_list, value_list)

Fill the extracted data from Wig file to temporary numpy array file

Warning

Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().

Parameters:
  • ChromTitle (str) – Name of chromosome
  • location_list (list of int) – List of locations for given chromosome
  • value_list (list of float) – List of values for respective chromosome location
_PerformDataCoarsening(Chrom, resolution, coarsening_method)

Base method to perform Data coarsening.

This method reads temporary Numpy array files and performs data coarsening using the given input method.

Warning

Private method. Use it at your own risk. It is used internally in WigHandler._StoreInHdf5File().

Parameters:
  • Chrom (str) – Chromosome name
  • resolution (str) – Resolution in words, for example '10kb'.
  • coarsening_method (str) – Name of method to use for data coarsening. Accepted keywords: min, max, median, amean, gmean and hmean.
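
To make the six keywords concrete, the following sketch reduces fixed-size bins of a value list with each method. It uses Python's `statistics` module and a hypothetical `coarsen` helper; it is not the routine used by WigHandler, which works on temporary numpy array files.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

# Keyword -> reducing function for one bin of values.
METHODS = {
    "min": min,
    "max": max,
    "median": median,
    "amean": mean,
    "gmean": geometric_mean,   # requires strictly positive values
    "hmean": harmonic_mean,    # requires non-negative values
}


def coarsen(values, bin_size, method):
    """Reduce consecutive bins of `bin_size` values to one value each."""
    reduce_bin = METHODS[method]
    return [reduce_bin(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]


signal = [1.0, 2.0, 4.0, 8.0, 3.0, 3.0]
coarse_max = coarsen(signal, 2, "max")
coarse_amean = coarsen(signal, 2, "amean")
```
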
_StoreInHdf5File(hdf5Out, title, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)

Base method to store coarsened data in hdf5 file.

The data is first coarsened and then stored in the h5 file.

Warning

Private method. Use it at your own risk. It is used internally in WigHandler.saveAsH5().

Parameters:
  • hdf5Out (str or HDF5Handler) – Name of output hdf5 file or instance of HDF5Handler
  • title (str) – Title of data
  • resolutions (list of str) –

    Additional input resolutions other than these default resolutions: ‘1kb’, ‘2kb’, ‘4kb’, ‘5kb’, ‘8kb’, ‘10kb’, ‘20kb’, ‘40kb’, ‘80kb’, ‘100kb’, ‘160kb’, ‘200kb’, ‘320kb’, ‘500kb’, ‘640kb’, and ‘1mb’.

    For Example: use resolutions=['25kb', '50kb', '75kb'] to add additional 25kb, 50kb and 75kb resolution data.

  • coarsening_methods (list of str) –

    Methods to coarsen or downsample the data when converting from 1-base to coarser resolutions. Presently, six methods are implemented.

    • 'min' -> Minimum value
    • 'max' -> Maximum value
    • 'amean' -> Arithmetic mean or average
    • 'hmean' -> Harmonic mean
    • 'gmean' -> Geometric mean
    • 'median' -> Median

    In case of None, all six methods will be considered. The user may select only a subset of these methods. For example: coarsening_methods=['max', 'amean'] can be used for downsampling by only these two methods.

  • compression (str) – data compression method in HDF5 file : lzf or gzip method.
  • keep_original (bool) – Whether original data present in wig file should be incorporated in HDF5 file. This will significantly increase size of HDF5 file.
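
The way additional resolutions extend the default list can be sketched as follows. The helpers are hypothetical, and the sketch assumes that 'kb' means 1000 bases and 'mb' means 1000000 bases, which the source does not state explicitly.

```python
import re

# Default resolutions as listed in the parameter description above.
DEFAULT_RESOLUTIONS = ['1kb', '2kb', '4kb', '5kb', '8kb', '10kb', '20kb',
                       '40kb', '80kb', '100kb', '160kb', '200kb', '320kb',
                       '500kb', '640kb', '1mb']

# Assumption: decimal units.
UNITS = {"b": 1, "kb": 1000, "mb": 1000000}


def resolution_to_bin_size(resolution):
    """Convert a resolution word such as '25kb' to a number of bases."""
    match = re.fullmatch(r"(\d+)(b|kb|mb)", resolution.lower())
    if match is None:
        raise ValueError("cannot parse resolution: %r" % resolution)
    return int(match.group(1)) * UNITS[match.group(2)]


def merge_resolutions(extra=None):
    """Union of default and additional resolutions, sorted by bin size."""
    merged = set(DEFAULT_RESOLUTIONS) | set(extra or [])
    return sorted(merged, key=resolution_to_bin_size)


all_resolutions = merge_resolutions(['25kb', '50kb', '75kb'])
```
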
_getChromSizeInfo(wigFileName, inputChrom=None)

Get chromosome sizes and index the wig file

This method parses a Wig file, extracts the size of each chromosome and indexes its position in the file.

It sets WigHandler._chromPointerInFile and WigHandler.chromSizeInfo.

Warning

Private method. Use it at your own risk. It is used internally during initialization and in WigHandler.setChromosome().

Parameters:
  • wigFileName (str) – Name of Wig File
  • inputChrom (str) – Name of target chromosome
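
One way to picture the index is as a byte offset per chromosome, recorded while scanning the file once. The sketch below handles only variableStep blocks, ignores span, and takes the largest position seen as the size; these are simplifying assumptions, and `index_wig` is a hypothetical function rather than the method documented here.

```python
import os
import tempfile


def index_wig(path):
    """Return ({chrom: byte offset of first declaration}, {chrom: largest position})."""
    pointers, sizes = {}, {}
    chrom = None
    with open(path, "rb") as fin:
        while True:
            offset = fin.tell()               # offset of the line about to be read
            raw = fin.readline()
            if not raw:
                break
            fields = raw.decode().split()
            if not fields:
                continue
            if fields[0] == "variableStep":
                params = dict(f.split("=", 1) for f in fields[1:])
                chrom = params["chrom"]
                pointers.setdefault(chrom, offset)
            elif chrom is not None and fields[0][0].isdigit():
                sizes[chrom] = max(sizes.get(chrom, 0), int(fields[0]))
    return pointers, sizes


wig_text = (
    "variableStep chrom=chr1\n"
    "10 1.5\n"
    "25 2.0\n"
    "variableStep chrom=chr2\n"
    "7 0.5\n"
)
wig_path = os.path.join(tempfile.mkdtemp(), "demo.wig")
with open(wig_path, "wb") as fout:
    fout.write(wig_text.encode())

pointers, sizes = index_wig(wig_path)

# Jump straight to chr2 without re-reading the chr1 block.
with open(wig_path, "rb") as fin:
    fin.seek(pointers["chr2"])
    chr2_header = fin.readline()
```
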
_getChromTitleBedgraph_parseWig(line)

To parse chromosome title from the format line of bedGraph format Wig file.

Warning

Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().

Parameters: line (str) – The line containing chromosome information from Wig file
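
For orientation, a bedGraph data line carries the chromosome name in its first column, followed by start, end and value. The sketch below assumes this standard whitespace-separated layout; `parse_bedgraph_line` is a hypothetical helper and not the private method itself.

```python
def parse_bedgraph_line(line):
    """Split a bedGraph data line into (chrom, start, end, value)."""
    chrom, start, end, value = line.split()[:4]
    return chrom, int(start), int(end), float(value)


record = parse_bedgraph_line("chr19\t49302000\t49302300\t-1.0")
chrom_title = record[0]
```
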
_getChromTitle_parseWig(line)

To parse chromosome title from the format line of fixedStep and variableStep format Wig file.

Warning

Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().

Parameters: line (str) – The line containing chromosome information from Wig file
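
In the WIG format, a fixedStep or variableStep declaration line is a keyword followed by key=value fields, one of which is chrom. A minimal sketch of extracting the chromosome title, using a hypothetical `parse_declaration` helper, could look like this:

```python
def parse_declaration(line):
    """Return (step type, {key: value}) for a WIG declaration line."""
    fields = line.split()
    params = dict(field.split("=", 1) for field in fields[1:])
    return fields[0], params


step_type, params = parse_declaration(
    "fixedStep chrom=chr3 start=400601 step=100 span=5")
chrom_title = params["chrom"]
```
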
_getSpan_parseWig(line)

To parse the span value from the format line of a fixedStep format Wig file.

Warning

Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().

Parameters: line (str) – The line containing chromosome information from Wig file
_getStartStepFixedStep_parseWig(line)

To parse the start and step values from the format line of a fixedStep format Wig file.

Warning

Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().

Parameters: line (str) – The line containing chromosome information from Wig file
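
The start, step and span fields determine where each value of a fixedStep block lies on the chromosome. The sketch below reads them from a declaration line and derives the start location of each value; both functions are hypothetical illustrations. In the WIG format, span is optional and defaults to 1.

```python
import re


def fixed_step_params(line):
    """Read start, step and span from a fixedStep declaration line."""
    found = dict(re.findall(r"(\w+)=(\S+)", line))
    start = int(found["start"])
    step = int(found["step"])
    span = int(found.get("span", 1))      # span is optional, default 1
    return start, step, span


def fixed_step_locations(start, step, n_values):
    """Start location of each of the `n_values` data lines in the block."""
    return [start + i * step for i in range(n_values)]


start, step, span = fixed_step_params(
    "fixedStep chrom=chr3 start=400601 step=100 span=5")
locations = fixed_step_locations(start, step, 3)
```

Each value then covers `span` bases beginning at its start location.
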
_loadChromSizeAndIndex()

Load chromosome sizes and indices from a json file

_parseWig(wigFileName)

Base method to parse a Wig file.

This method parses a Wig file and copies the extracted data to temporary numpy array files.

Warning

Private method. Use it at your own risk. It is used internally in WigHandler.parseWig().

Parameters: wigFileName (str) – Name of Wig File
_saveChromSizeAndIndex()

Save chromosomes sizes and indices dictionary to a json file

getRawWigDataAsDictionary(dicOut=None)

To get an entire dictionary of data from Wig file

It generates a dictionary of numpy arrays for each chromosome. These arrays are stored in temporary numpy array files of TempNumpyArrayFiles.

Parameters: dicOut (dict) – The output dictionary to which data will be added or replaced.
Returns: dicOut – The output dictionary.
Return type: dict
parseWig()

To parse Wig files

This method parses all Wig files listed in WigHandler.WigFileNames. The extracted data is then stored in the temporary numpy array file of the respective chromosome. These numpy array files can be used either for data coarsening or for further analysis.

saveAsH5(hdf5Out, title=None, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)

To convert Wig files to hdf5 file

Parameters:
  • hdf5Out (HDF5Handler or str) – Output hdf5 file name or HDF5Handler instance
  • title (str) – Title of the data
  • resolutions (list of str) –

    Additional input resolutions other than these default resolutions: ‘1kb’, ‘2kb’, ‘4kb’, ‘5kb’, ‘8kb’, ‘10kb’, ‘20kb’, ‘40kb’, ‘80kb’, ‘100kb’, ‘160kb’, ‘200kb’, ‘320kb’, ‘500kb’, ‘640kb’, and ‘1mb’.

    For Example: use resolutions=['25kb', '50kb', '75kb'] to add additional 25kb, 50kb and 75kb resolution data.

  • coarsening_methods (list of str) –

    Methods to coarsen or downsample the data when converting from 1-base to coarser resolutions. Presently, six methods are implemented.

    • 'min' -> Minimum value
    • 'max' -> Maximum value
    • 'amean' -> Arithmetic mean or average
    • 'hmean' -> Harmonic mean
    • 'gmean' -> Geometric mean
    • 'median' -> Median

    In case of None, all six methods will be considered. The user may select only a subset of these methods. For example: coarsening_methods=['max', 'amean'] can be used for downsampling by only these two methods.

  • compression (str) – data compression method in HDF5 file : lzf or gzip method.
  • keep_original (bool) – Whether original data present in wig file should be incorporated in HDF5 file. This will significantly increase size of HDF5 file.
setChromosome(chromName)

Set the target chromosome for reading and extracting from wig file

To read and convert the data of another chromosome from a wig file, set it here. After this, directly use WigHandler.saveAsH5() to save the data in an H5 file.

Parameters: chromName (str) – Name of new target chromosome