class WigHandler
WigHandler(filenames[, chromSizeInfo, …]) | To convert Wig files to hdf5 file
WigHandler.parseWig() | To parse Wig files
WigHandler.setChromosome(chromName) | Set the target chromosome for reading and extracting from wig file
WigHandler.saveAsH5(hdf5Out[, title, …]) | To convert Wig files to hdf5 file
WigHandler.getRawWigDataAsDictionary([dicOut]) | To get an entire dictionary of data from Wig file
-
class WigHandler(filenames, chromSizeInfo=None, chromName=None, indexFile=None, tmpNumpyArrayFiles=None, methodToCombine='mean', workDir=None, maxEntryWrite=10000000)
To convert Wig files to hdf5 file
It parses Wig files and saves all data to an hdf5 file for the given resolutions.
-
WigFileNames
list[str] – List of input Wig files.
Note
If WigHandler.chromName is provided, only one Wig file is accepted.
-
chromName
str – Name of the target chromosome to be extracted from the Wig file.
-
chromSizeInfo
dict – A dictionary containing chromosome size information.
-
_chromPointerInFile
dict – A dictionary containing the position index of each chromosome in the Wig file.
-
indexFile
str – A file in JSON format containing the indices (positions in the Wig file) and sizes of chromosomes. If the given file does not exist, a new file is generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the JSON file, they are determined from the Wig file and stored in the same JSON file. This file is helpful when the same Wig file has to be read many times, because the step that determines the index and size of each chromosome is skipped.
-
methodToCombine
str – Method to combine bigWig/Wig files. Presently, accepted keywords are: mean, min and max.
-
tmpNumpyArrayFiles
TempNumpyArrayFiles – This TempNumpyArrayFiles instance stores the temporary numpy array files information.
-
isWigParsed
bool – Whether Wig files are already parsed.
-
maxEntryWrite
int – Number of lines read from the Wig file at a time; after this, data is dumped to a temporary numpy array file.
Parameters:
- filenames (str or list(str)) – List of input Wig files.
  Note
  If WigHandler.chromName is provided, only one Wig file is accepted.
- chromName (str) – Name of the target chromosome to be extracted from the Wig file.
- chromSizeInfo (dict) – A dictionary containing chromosome size information. Generated by BigWigHandler.getBigWigInfo().
- indexFile (str) – A file in JSON format containing the indices (positions in the Wig file) and sizes of chromosomes. If the given file does not exist, a new file is generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the JSON file, they are determined from the Wig file and stored in the same JSON file. This file is helpful when the same Wig file has to be read many times, because the step that determines the index and size of each chromosome is skipped.
- tmpNumpyArrayFiles (TempNumpyArrayFiles) – This TempNumpyArrayFiles instance stores the temporary numpy array files information.
- methodToCombine (str) – Method to combine bigWig/Wig files. Presently, accepted keywords are: mean, min and max.
- maxEntryWrite (int) – Number of lines read from the Wig file at a time; after this, data is dumped to a temporary numpy array file. To reduce memory (RAM) usage, lower this number, because larger values need more RAM.
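The memory trade-off behind maxEntryWrite can be pictured with a minimal sketch: entries are buffered in memory and dumped once the buffer reaches the limit. This is an illustration only, not the library's implementation; `flush_in_chunks` is a hypothetical name and the in-memory `store` dictionary is a stand-in for the temporary numpy array file.

```python
def flush_in_chunks(entries, max_entry_write):
    """Buffer (location, value) pairs and flush every max_entry_write entries.

    Returns the filled store and the largest buffer size held in memory.
    """
    store = {}    # hypothetical stand-in for a temporary numpy array file
    buffer = []
    peak = 0
    for location, value in entries:
        buffer.append((location, value))
        peak = max(peak, len(buffer))
        if len(buffer) >= max_entry_write:
            store.update(buffer)   # dump the buffered data
            buffer.clear()
    store.update(buffer)           # flush whatever remains at the end
    return store, peak


entries = [(i, i * 0.5) for i in range(1, 11)]
store, peak = flush_in_chunks(entries, max_entry_write=4)
```

The peak buffer size never exceeds the limit, which is why a smaller value lowers RAM usage at the cost of more frequent writes.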
-
_FillDataInNumpyArrayFile(ChromTitle, location_list, value_list)
Fill the extracted data from the Wig file into a temporary numpy array file
Warning
Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().
Parameters:
- ChromTitle (str) – Name of chromosome
- location_list (list of int) – List of locations for given chromosome
- value_list (list of float) – List of values for the respective chromosome locations
-
_PerformDataCoarsening(Chrom, resolution, coarsening_method)
Base method to perform data coarsening.
This method reads the temporary numpy array files and performs data coarsening using the given input method.
Warning
Private method. Use it at your own risk. It is used internally in WigHandler._StoreInHdf5File().
Parameters:
-
_StoreInHdf5File(hdf5Out, title, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)
Base method to store coarsened data in hdf5 file.
The data is first coarsened and subsequently stored in the h5 file.
Warning
Private method. Use it at your own risk. It is used internally in WigHandler.saveAsH5().
Parameters:
- hdf5Out (str or HDF5Handler) – Name of output hdf5 file or instance of HDF5Handler
- title (str) – Title of data
- resolutions (list of str) – Additional input resolutions other than these default resolutions: '1kb', '2kb', '4kb', '5kb', '8kb', '10kb', '20kb', '40kb', '80kb', '100kb', '160kb', '200kb', '320kb', '500kb', '640kb', and '1mb'.
  For example: use resolutions=['25kb', '50kb', '75kb'] to add additional 25kb, 50kb and 75kb resolution data.
- coarsening_methods (list of str) – Methods to coarsen or downsample the data for converting from 1-base to coarser resolutions. Presently, six methods are implemented.
  * 'min' -> Minimum value
  * 'max' -> Maximum value
  * 'amean' -> Arithmetic mean or average
  * 'hmean' -> Harmonic mean
  * 'gmean' -> Geometric mean
  * 'median' -> Median
  In case of None, all six methods will be considered. The user may use only a subset of these methods. For example: coarsening_methods=['max', 'amean'] can be used for downsampling by only these two methods.
- compression (str) – Data compression method in HDF5 file: lzf or gzip method.
- keep_original (bool) – Whether original data present in wig file should be incorporated in HDF5 file. This will significantly increase size of HDF5 file.
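To make the coarsening keywords listed above concrete, here is a minimal sketch that downsamples a short signal into fixed-size bins. It is an assumption-laden illustration, not the library's code: `COARSENERS` and `coarsen` are hypothetical names, the reducers come from the Python standard library, and how the library treats zeros or missing values is not covered.

```python
import statistics

# Keyword -> reducing function, mirroring the keywords in the list above.
COARSENERS = {
    'min': min,
    'max': max,
    'amean': statistics.mean,
    'hmean': statistics.harmonic_mean,
    'gmean': statistics.geometric_mean,   # requires Python 3.8+
    'median': statistics.median,
}


def coarsen(values, bin_size, method):
    """Reduce each consecutive run of bin_size values to a single number."""
    reduce_bin = COARSENERS[method]
    return [reduce_bin(values[i:i + bin_size])
            for i in range(0, len(values), bin_size)]


signal = [1.0, 2.0, 4.0, 8.0, 3.0, 3.0, 3.0, 3.0]
coarse_max = coarsen(signal, 4, 'max')
coarse_amean = coarsen(signal, 4, 'amean')
```

The methods diverge on the first bin, where the values vary, and agree on the second, where they are constant.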
-
_getChromSizeInfo(wigFileName, inputChrom=None)
Get chromosome size and index wig file
This method parses a Wig file, extracts the chromosome sizes and indexes the file for each chromosome.
It sets WigHandler._chromPointerInFile and WigHandler.chromSizeInfo.
Warning
Private method. Use it at your own risk. It is used internally during initialization and in WigHandler.setChromosome().
Parameters:
-
_getChromTitleBedgraph_parseWig(line)
To parse chromosome title from the format line of bedGraph format Wig file.
Warning
Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().
Parameters: line (str) – The line containing chromosome information from Wig file
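For orientation, a minimal sketch of reading a chromosome name from bedGraph text follows. It assumes the standard four-column bedGraph data line (chrom, start, end, value); exactly which line the private method treats as the "format line" is not stated here, and `parse_bedgraph_line` is a hypothetical helper.

```python
def parse_bedgraph_line(line):
    """Split a four-column bedGraph data line into typed fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 columns, got %d" % len(fields))
    chrom, start, end, value = fields
    return chrom, int(start), int(end), float(value)


record = parse_bedgraph_line("chr19 49302000 49302300 -1.0")
chrom_title = record[0]
```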
-
_getChromTitle_parseWig(line)
To parse chromosome title from the format line of fixedStep and variableStep format Wig file.
Warning
Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().
Parameters: line (str) – The line containing chromosome information from Wig file
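A fixedStep or variableStep declaration line carries its settings as key=value tokens, so the chromosome title can be read from the `chrom` field. The following is a minimal sketch under that assumption; `parse_declaration` is a hypothetical helper, not the private method itself.

```python
def parse_declaration(line):
    """Return the declaration type and its key=value fields as a dict."""
    tokens = line.split()
    step_type = tokens[0]
    if step_type not in ('fixedStep', 'variableStep'):
        raise ValueError("not a declaration line: %r" % line)
    fields = dict(token.split('=', 1) for token in tokens[1:])
    return step_type, fields


step_type, fields = parse_declaration("variableStep chrom=chr19 span=150")
chrom = fields['chrom']
```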
-
_getSpan_parseWig(line)
To parse span value from the format line of fixedStep format Wig file.
Warning
Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().
Parameters: line (str) – The line containing chromosome information from Wig file
-
_getStartStepFixedStep_parseWig(line)
To parse start and step values from the format line of fixedStep format Wig file.
Warning
Private method. Use it at your own risk. It is used internally in WigHandler._parseWig().
Parameters: line (str) – The line containing chromosome information from Wig file
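As a rough illustration of what start, step and span mean in a fixedStep declaration, the sketch below extracts the three values and derives the positions of the first few data rows. `parse_fixed_step` and `positions` are hypothetical helpers; the default span of 1 follows the Wiggle format convention rather than anything stated on this page.

```python
def parse_fixed_step(line):
    """Return (start, step, span) from a fixedStep declaration line."""
    tokens = line.split()
    if tokens[0] != 'fixedStep':
        raise ValueError("not a fixedStep line: %r" % line)
    fields = dict(token.split('=', 1) for token in tokens[1:])
    start = int(fields['start'])
    step = int(fields['step'])
    span = int(fields.get('span', 1))   # span is optional; assumed default is 1
    return start, step, span


def positions(start, step, count):
    """Start position of each of the first `count` data rows."""
    return [start + i * step for i in range(count)]


start, step, span = parse_fixed_step(
    "fixedStep chrom=chr3 start=400601 step=100 span=5")
first_rows = positions(start, step, 3)
```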
-
_loadChromSizeAndIndex()
Load chromosome sizes and indices from a json file
-
_parseWig(wigFileName)
Base method to parse a Wig file.
This method parses a Wig file and copies the extracted data to temporary numpy array files.
Warning
Private method. Use it at your own risk. It is used internally in WigHandler.parseWig().
Parameters: wigFileName (str) – Name of Wig File
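The overall shape of Wig parsing can be sketched as a small generator that tracks the current declaration and yields one record per data row. This is a simplified illustration, not the library's parser: `iter_wig` is a hypothetical name, the span value is ignored, and bedGraph lines are not handled.

```python
import io


def iter_wig(handle):
    """Yield (chrom, position, value) from fixedStep/variableStep Wig text."""
    mode = chrom = None
    position = step = 0
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith(('#', 'track', 'browser')):
            continue   # skip blanks, comments and display lines
        if line.startswith(('fixedStep', 'variableStep')):
            tokens = line.split()
            mode = tokens[0]
            fields = dict(token.split('=', 1) for token in tokens[1:])
            chrom = fields['chrom']
            if mode == 'fixedStep':
                position = int(fields['start'])
                step = int(fields['step'])
            continue
        if mode == 'fixedStep':
            yield chrom, position, float(line)
            position += step   # positions advance implicitly
        elif mode == 'variableStep':
            pos_text, value_text = line.split()
            yield chrom, int(pos_text), float(value_text)
        else:
            raise ValueError("data line before any declaration: %r" % line)


WIG_TEXT = """track type=wiggle_0
variableStep chrom=chr1
10 1.5
20 2.5
fixedStep chrom=chr2 start=100 step=25
0.1
0.2
0.3
"""
records = list(iter_wig(io.StringIO(WIG_TEXT)))
```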
-
_saveChromSizeAndIndex()
Save chromosome sizes and indices dictionary to a json file
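The load-and-save pair amounts to a small JSON cache: look up a chromosome, and only when it is missing compute its entry and write the file back. The sketch below illustrates that pattern under assumptions; the JSON layout (`{"chr1": {"index": ..., "size": ...}}`), the helper names and the `fake_scan` callback are all hypothetical and may differ from what the library writes.

```python
import json
import os
import tempfile


def load_index(path):
    """Return the cached dictionary, or an empty one if the file is absent."""
    if not os.path.isfile(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


def save_index(path, index):
    with open(path, 'w') as handle:
        json.dump(index, handle, indent=2)


def lookup(path, chrom, compute):
    """Return the entry for chrom, computing and storing it on a cache miss."""
    index = load_index(path)
    if chrom not in index:
        index[chrom] = compute(chrom)
        save_index(path, index)
    return index[chrom]


calls = []


def fake_scan(chrom):
    # Stand-in for the expensive pass over the Wig file.
    calls.append(chrom)
    return {'index': 0, 'size': 1000}


with tempfile.TemporaryDirectory() as workdir:
    index_path = os.path.join(workdir, 'wig_index.json')
    first = lookup(index_path, 'chr1', fake_scan)
    second = lookup(index_path, 'chr1', fake_scan)   # served from the file
```

The second lookup does not trigger the scan, which is the saving described for indexFile when the same Wig file is read repeatedly.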
-
getRawWigDataAsDictionary(dicOut=None)
To get an entire dictionary of data from Wig file
It generates a dictionary of numpy arrays for each chromosome. These arrays are stored in temporary numpy array files of TempNumpyArrayFiles.
Parameters: dicOut (dict) – The output dictionary to which data will be added or replaced.
Returns: dicOut – The output dictionary.
Return type: dict
-
parseWig()
To parse Wig files
This method parses all Wig files listed in WigHandler.WigFileNames. The extracted data is stored in temporary numpy array files of the respective chromosomes. These numpy array files can be used either for data coarsening or for further analysis.
- To save as h5: Use WigHandler.saveAsH5().
- To perform analysis: Use WigHandler.getRawWigDataAsDictionary() to get a dictionary of numpy arrays.
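The two routes above can be sketched as thin wrappers around the documented method names. The wrappers take an already constructed handler as an argument, because the import path is not given on this page; `convert_wig`, `load_wig_arrays` and the `RecordingHandler` stub are hypothetical and exist only so the call order can be run and inspected without real Wig files.

```python
def convert_wig(handler, hdf5_out, title=None, resolutions=None):
    """Parse the Wig files, then write the coarsened data to an h5 file."""
    handler.parseWig()
    handler.saveAsH5(hdf5_out, title=title, resolutions=resolutions)


def load_wig_arrays(handler):
    """Parse the Wig files, then return the per-chromosome arrays."""
    handler.parseWig()
    return handler.getRawWigDataAsDictionary()


class RecordingHandler:
    """Stub that records calls; it is NOT the real WigHandler."""

    def __init__(self):
        self.calls = []

    def parseWig(self):
        self.calls.append('parseWig')

    def saveAsH5(self, hdf5Out, title=None, resolutions=None):
        self.calls.append('saveAsH5')

    def getRawWigDataAsDictionary(self, dicOut=None):
        self.calls.append('getRawWigDataAsDictionary')
        return {} if dicOut is None else dicOut


stub = RecordingHandler()
convert_wig(stub, 'output.h5', title='example')
arrays = load_wig_arrays(stub)
```

With the real class, the same wrappers would receive a WigHandler instance in place of the stub.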
-
saveAsH5(hdf5Out, title=None, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)
To convert Wig files to hdf5 file
Parameters:
- hdf5Out (HDF5Handler or str) – Output hdf5 file name or HDF5Handler instance
- title (str) – Title of the data
- resolutions (list of str) – Additional input resolutions other than these default resolutions: '1kb', '2kb', '4kb', '5kb', '8kb', '10kb', '20kb', '40kb', '80kb', '100kb', '160kb', '200kb', '320kb', '500kb', '640kb', and '1mb'.
  For example: use resolutions=['25kb', '50kb', '75kb'] to add additional 25kb, 50kb and 75kb resolution data.
- coarsening_methods (list of str) – Methods to coarsen or downsample the data for converting from 1-base to coarser resolutions. Presently, six methods are implemented.
  * 'min' -> Minimum value
  * 'max' -> Maximum value
  * 'amean' -> Arithmetic mean or average
  * 'hmean' -> Harmonic mean
  * 'gmean' -> Geometric mean
  * 'median' -> Median
  In case of None, all six methods will be considered. The user may use only a subset of these methods. For example: coarsening_methods=['max', 'amean'] can be used for downsampling by only these two methods.
- compression (str) – Data compression method in HDF5 file: lzf or gzip method.
- keep_original (bool) – Whether original data present in the Wig file should be incorporated in HDF5 file. This will significantly increase size of HDF5 file.
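How additional resolutions combine with the defaults can be sketched as follows. This is an illustration under assumptions: `resolution_to_bases` and `merge_resolutions` are hypothetical helpers, and the conversion treats 1kb as 1,000 bases and 1mb as 1,000,000 bases, which this page does not state explicitly.

```python
DEFAULT_RESOLUTIONS = ['1kb', '2kb', '4kb', '5kb', '8kb', '10kb', '20kb',
                       '40kb', '80kb', '100kb', '160kb', '200kb', '320kb',
                       '500kb', '640kb', '1mb']

# Assumed unit sizes in bases.
UNIT_TO_BASES = {'kb': 1_000, 'mb': 1_000_000}


def resolution_to_bases(resolution):
    """Convert a string such as '25kb' to a bin size in bases."""
    text = resolution.strip().lower()
    for unit, factor in UNIT_TO_BASES.items():
        if text.endswith(unit):
            return int(text[:-len(unit)]) * factor
    raise ValueError("unrecognized resolution: %r" % resolution)


def merge_resolutions(extra=None):
    """Union of default and extra resolutions, ordered from fine to coarse."""
    merged = set(DEFAULT_RESOLUTIONS) | set(extra or [])
    return sorted(merged, key=resolution_to_bases)


all_resolutions = merge_resolutions(['25kb', '50kb', '75kb'])
```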
-
setChromosome(chromName)
Set the target chromosome for reading and extracting from wig file
To read and convert data of another chromosome from a wig file, it can be set here. After this, directly use WigHandler.saveAsH5() to save data in H5 file.
Parameters: chromName (str) – Name of new target chromosome
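Switching the target chromosome is cheap when the position of each chromosome in the file is already known. The sketch below shows the general idea with byte offsets and `seek`; it assumes the index stores byte offsets of declaration lines, which is an interpretation of "position index" rather than something this page confirms, and `build_offsets` and `read_declaration_at` are hypothetical helpers.

```python
import io


def build_offsets(handle):
    """Map each chromosome to the byte offset of its first declaration line."""
    offsets = {}
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith((b'fixedStep', b'variableStep')):
            for token in line.split()[1:]:
                key, _, value = token.partition(b'=')
                if key == b'chrom':
                    offsets.setdefault(value.decode(), offset)
    return offsets


def read_declaration_at(handle, offset):
    """Jump straight to a stored offset and read the line found there."""
    handle.seek(offset)
    return handle.readline().decode().strip()


WIG_BYTES = (b"variableStep chrom=chr1\n10 1.5\n20 2.5\n"
             b"fixedStep chrom=chr2 start=100 step=25\n0.1\n0.2\n")
wig = io.BytesIO(WIG_BYTES)
offsets = build_offsets(wig)
chr2_line = read_declaration_at(wig, offsets['chr2'])
```

Reading chr2 this way skips every line that belongs to chr1, which is the benefit of keeping the index between runs.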
-