`class BEDHandler`

`BEDHandler`(filenames[, column, chromName, …])	To convert BED files to hdf5/h5 file
`BEDHandler.parseBed`()	To parse bed files
`BEDHandler.setChromosome`(chromName)	Set the target chromosome for reading and extracting from bed file
`BEDHandler.saveAsH5`(hdf5Out[, title, …])	To convert bed files to hdf5 file

class BEDHandler(filenames, column=7, chromName=None, indexFile=None, tmpNumpyArrayFiles=None, methodToCombine='mean', workDir=None, maxEntryWrite=10000000)

To convert BED files to hdf5/h5 file

It parses bed files and saves all data to an hdf5/h5 file at the given resolutions.

bedFileNames: list[str] – List of input bed files.

Note

If BEDHandler.chromName is provided, only one bed file is accepted.

column: int – The column number that is treated as the data column. The column number varies with the BED format. For example:

- ENCODE broadPeak format (BED 6+3): 7th column
- ENCODE gappedPeak format (BED 12+3): 13th column
- ENCODE narrowPeak format (BED 6+4): 7th column
- ENCODE RNA elements format (BED 6+3): 7th column
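To make the 1-based column numbering concrete, here is a minimal sketch of pulling the data column out of one tab-separated BED line. The function `parse_bed_line` is hypothetical and only illustrates the convention; it is not part of BEDHandler.

```python
def parse_bed_line(line, column=7):
    """Return (chrom, start, end, value) from one tab-separated BED line.

    `column` is 1-based, matching the numbering used above
    (for example 7 for the narrowPeak signalValue).
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    start = int(fields[1])  # BED start coordinate is 0-based
    end = int(fields[2])    # BED end coordinate is exclusive
    value = float(fields[column - 1])  # 1-based column -> 0-based index
    return chrom, start, end, value


# A narrowPeak (BED 6+4) line: signalValue sits in the 7th column.
line = "chr1\t9980\t10500\tpeak1\t0\t.\t3.45\t-1\t2.1\t250\n"
print(parse_bed_line(line, column=7))  # ('chr1', 9980, 10500, 3.45)
```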

chromName: str – Name of the target chromosome to be extracted from the bed file.

chromSizeInfo: dict – A dictionary containing chromosome size information.

_chromPointerInFile: dict – A dictionary containing the position index of each chromosome in the bed file.

indexFile: str – A file in json format containing the indices (positions in the bed file) and sizes of chromosomes. If this file is given as input but does not exist, a new file is generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the json file, they are determined from the bed file and stored in the same json file. This file is especially helpful when the same bed file has to be read many times, because the step that determines chromosome indices and sizes is skipped.
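The lookup-or-compute behaviour described above can be sketched as a small JSON cache. This is a sketch under assumptions: `load_or_build_index`, the `fake_scan` callback and the `{"pointer": ..., "size": ...}` JSON layout are hypothetical and need not match the file BEDHandler actually writes.

```python
import json
import os
import tempfile


def load_or_build_index(index_file, chrom, compute_index):
    """Return (pointer, size) for `chrom`, caching results in a JSON file.

    `compute_index(chrom)` is a hypothetical callback that scans the BED
    file and returns (byte_offset, chrom_size); it runs only on a cache miss.
    """
    cache = {}
    if os.path.exists(index_file):
        with open(index_file) as fh:
            cache = json.load(fh)

    if chrom not in cache:
        # Not cached yet: determine it from the BED file and store it.
        pointer, size = compute_index(chrom)
        cache[chrom] = {"pointer": pointer, "size": size}
        with open(index_file, "w") as fh:
            json.dump(cache, fh, indent=2)

    entry = cache[chrom]
    return entry["pointer"], entry["size"]


# Usage: the second lookup is served from the JSON file, not by rescanning.
calls = []


def fake_scan(chrom):
    calls.append(chrom)
    return 0, 248956422


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "bed_index.json")
    first = load_or_build_index(index_path, "chr1", fake_scan)
    second = load_or_build_index(index_path, "chr1", fake_scan)

print(first, second, calls)  # (0, 248956422) (0, 248956422) ['chr1']
```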

methodToCombine: str – Method used to combine data from multiple bed files. Presently, accepted keywords are: mean, min and max.
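As an illustration of what combining by mean, min or max means position by position, here is a hypothetical `combine_values` helper. It assumes each file has already been expanded into an equally long list of per-base values and that `None` marks a position a file does not cover; how BEDHandler itself treats uncovered positions is not stated here.

```python
def combine_values(per_file_values, method="mean"):
    """Combine per-base values from several BED files into one list.

    `per_file_values` holds one equally long list per input file;
    None marks a position that a file does not cover (an assumption).
    """
    reducers = {
        "mean": lambda xs: sum(xs) / len(xs),
        "min": min,
        "max": max,
    }
    if method not in reducers:
        raise ValueError("method must be one of: mean, min, max")
    reduce = reducers[method]

    combined = []
    for column in zip(*per_file_values):  # one tuple per base position
        present = [v for v in column if v is not None]
        combined.append(reduce(present) if present else None)
    return combined


file_a = [1.0, 4.0, None]
file_b = [3.0, 2.0, 5.0]
print(combine_values([file_a, file_b], "mean"))  # [2.0, 3.0, 5.0]
print(combine_values([file_a, file_b], "max"))   # [3.0, 4.0, 5.0]
```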

tmpNumpyArrayFiles: TempNumpyArrayFiles – This TempNumpyArrayFiles instance stores the temporary numpy array files information.

isBedParsed: bool – Whether the bed files have already been parsed.

maxEntryWrite: int – Number of lines read from the bed file at a time; after this many lines, the data is dumped into a temporary numpy array file.

Parameters:

filenames (str or list(str)) –
List of input bed files.

Note

If BEDHandler.chromName is provided, only one bed file is accepted.
column (int) –
The column number that is treated as the data column. The column number varies with the BED format. For example:
- ENCODE broadPeak format (BED 6+3): 7th column
- ENCODE gappedPeak format (BED 12+3): 13th column
- ENCODE narrowPeak format (BED 6+4): 7th column
- ENCODE RNA elements format (BED 6+3): 7th column
chromName (str) – Name of the target chromosome to be extracted from the bed file.
indexFile (str) – A file in json format containing the indices (positions in the bed file) and sizes of chromosomes. If this file is given as input but does not exist, a new file is generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the json file, they are determined from the bed file and stored in the same json file. This file is especially helpful when the same bed file has to be read many times, because the step that determines chromosome indices and sizes is skipped.
tmpNumpyArrayFiles (TempNumpyArrayFiles) – This TempNumpyArrayFiles instance stores the temporary numpy array files information.
methodToCombine (str) – Method used to combine data from multiple bed files. Presently, accepted keywords are: mean, min and max.
maxEntryWrite (int) – Number of lines read from the bed file at a time; after this many lines, the data is dumped into a temporary numpy array file. To reduce memory (RAM) usage, lower this number, because larger values require more RAM.
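The memory trade-off of maxEntryWrite comes from processing the input in fixed-size batches. A rough sketch, assuming that each yielded batch corresponds to one dump into a temporary array file; `read_in_batches` is hypothetical, and skipping `track`/`browser`/`#` header lines is an assumption about typical BED files rather than documented BEDHandler behaviour.

```python
def read_in_batches(lines, max_entry_write):
    """Yield lists of at most `max_entry_write` BED data lines.

    Each yielded batch stands in for one dump to a temporary array file,
    so peak memory is bounded by the batch size rather than the file size.
    """
    batch = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        batch.append(line)
        if len(batch) == max_entry_write:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


lines = ["track name=x\n"] + [f"chr1\t{i}\t{i + 1}\n" for i in range(5)]
print([len(b) for b in read_in_batches(lines, 2)])  # [2, 2, 1]
```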

_FillDataInNumpyArrayFile(ChromTitle, location_list, value_list)

Fill the data extracted from the bed file into a temporary numpy array file.

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler._parseBed().

Parameters:

ChromTitle (str) – Name of the chromosome
location_list (list of int) – List of locations for the given chromosome
value_list (list of float) – List of values for the respective chromosome locations
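A minimal sketch of filling per-base values into a chromosome-length array, using the standard-library `array` module in place of a numpy array file. `interval_to_locations` and `fill_chrom_array` are hypothetical; expanding a 0-based, end-exclusive BED interval into per-base locations and using 0.0 for uncovered positions are assumptions, not documented behaviour.

```python
from array import array


def interval_to_locations(start, end, value):
    """Expand a BED interval [start, end) into per-base locations and values."""
    locations = list(range(start, end))
    return locations, [value] * len(locations)


def fill_chrom_array(chrom_size, location_list, value_list):
    """Write each value at its 0-based location in a per-base array.

    Positions that receive no value stay at 0.0 in this sketch.
    """
    data = array("d", [0.0]) * chrom_size
    for loc, val in zip(location_list, value_list):
        data[loc] = val
    return data


locs, vals = interval_to_locations(2, 5, 1.5)
arr = fill_chrom_array(8, locs, vals)
print(list(arr))  # [0.0, 0.0, 1.5, 1.5, 1.5, 0.0, 0.0, 0.0]
```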

_PerformDataCoarsening(Chrom, resolution, coarse_method)

Base method to perform data coarsening.

This method reads temporary numpy array files and performs data coarsening using the given method.

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler._StoreInHdf5File().

Parameters:

Chrom (str) – Chromosome name
resolution (str) – Resolution as a word, for example '10kb'
coarse_method (str) – Name of the method to use for data coarsening. Accepted keywords: min, max, median, amean, gmean and hmean.
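The coarsening step can be sketched with Python's `statistics` module, which provides a reducer for each of the six accepted keywords (`geometric_mean` needs Python 3.8+ and strictly positive values). `resolution_to_bp` and `coarsen` are hypothetical, and reading 'kb' as 1,000 bp and 'mb' as 1,000,000 bp is an assumption about how resolution words are interpreted.

```python
import statistics

# Accepted coarsening keywords mapped to standard-library reducers.
COARSE_METHODS = {
    "min": min,
    "max": max,
    "median": statistics.median,
    "amean": statistics.mean,
    "gmean": statistics.geometric_mean,  # values must be > 0
    "hmean": statistics.harmonic_mean,   # values must be >= 0
}


def resolution_to_bp(resolution):
    """Convert a resolution word such as '10kb' or '1mb' to base pairs."""
    word = resolution.lower()
    if word.endswith("kb"):
        return int(word[:-2]) * 1_000
    if word.endswith("mb"):
        return int(word[:-2]) * 1_000_000
    raise ValueError(f"unrecognised resolution: {resolution!r}")


def coarsen(values, resolution, coarse_method):
    """Reduce per-base `values` to one value per bin of the given resolution."""
    bin_size = resolution_to_bp(resolution)
    reduce = COARSE_METHODS[coarse_method]
    # The last bin is shorter when len(values) is not a multiple of bin_size.
    return [reduce(values[i:i + bin_size]) for i in range(0, len(values), bin_size)]


values = [1.0, 3.0] * 500 + [4.0] * 1000 + [2.0] * 500  # 2,500 bases
print(coarsen(values, "1kb", "amean"))  # [2.0, 4.0, 2.0]
print(coarsen(values, "1kb", "max"))    # [3.0, 4.0, 2.0]
```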

_StoreInHdf5File(hdf5Out, title, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)

Base method to store coarsened data in an hdf5/h5 file.

The data is first coarsened and then stored in the h5 file.

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler.saveAsH5().

Parameters:

hdf5Out (str or HDF5Handler) – Name of output hdf5 file or instance of HDF5Handler
title (str) – Title of data
resolutions (list of str) –
Additional input resolutions other than the default resolutions: '1kb', '2kb', '4kb', '5kb', '8kb', '10kb', '20kb', '40kb', '80kb', '100kb', '160kb', '200kb', '320kb', '500kb', '640kb' and '1mb'.

For example, use resolutions=['25kb', '50kb', '75kb'] to add data at 25kb, 50kb and 75kb resolutions.
coarsening_methods (list of str) –
Methods used to coarsen (downsample) the data when converting from 1-base to coarser resolutions. Presently, six methods are implemented.
- 'min' -> Minimum value
- 'max' -> Maximum value
- 'amean' -> Arithmetic mean or average
- 'hmean' -> Harmonic mean
- 'gmean' -> Geometric mean
- 'median' -> Median
If None, all six methods are used. A subset of these methods may also be given. For example, coarsening_methods=['max', 'amean'] downsamples using only these two methods.
compression (str) – Data compression method for the HDF5 file: lzf or gzip.
keep_original (bool) – Whether the original data present in the bed file should be incorporated in the HDF5 file. This significantly increases the size of the HDF5 file.
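To show how resolutions and coarsening_methods interact, the hypothetical `plan_outputs` below lists the (resolution, method) pairs implied by a call: additional resolutions are merged with the defaults, and None selects every method. How these pairs are laid out inside the HDF5 file is not described here, so the sketch stops at the list; decimal 'kb'/'mb' units are an assumption.

```python
DEFAULT_RESOLUTIONS = [
    "1kb", "2kb", "4kb", "5kb", "8kb", "10kb", "20kb", "40kb",
    "80kb", "100kb", "160kb", "200kb", "320kb", "500kb", "640kb", "1mb",
]
ALL_METHODS = ["min", "max", "amean", "hmean", "gmean", "median"]


def _to_bp(word):
    """'25kb' -> 25000, '1mb' -> 1000000 (assumed decimal units)."""
    scale = {"kb": 1_000, "mb": 1_000_000}[word[-2:].lower()]
    return int(word[:-2]) * scale


def plan_outputs(resolutions=None, coarsening_methods=None):
    """List the (resolution, method) pairs implied by the two parameters."""
    all_res = set(DEFAULT_RESOLUTIONS) | set(resolutions or [])
    methods = ALL_METHODS if coarsening_methods is None else coarsening_methods
    return [(r, m) for r in sorted(all_res, key=_to_bp) for m in methods]


pairs = plan_outputs(resolutions=["25kb", "50kb", "75kb"],
                     coarsening_methods=["max", "amean"])
print(len(pairs))  # 38 (19 resolutions x 2 methods)
print(pairs[:2])   # [('1kb', 'max'), ('1kb', 'amean')]
```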

_getChromSizeInfo(bedFileName, inputChrom=None)

Get chromosome sizes and index the bed file.

This method parses a bed file, extracts the size of each chromosome and indexes the position of each chromosome in the file.

It sets BEDHandler._chromPointerInFile and BEDHandler.chromSizeInfo.
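One way to build these two dictionaries is a single pass that records the byte offset of each chromosome's first line and its largest end coordinate. This sketch assumes that lines of one chromosome are contiguous (for example, a sorted BED file) and that the largest end coordinate is an acceptable stand-in for chromosome size; `scan_bed_index` is hypothetical.

```python
import os
import tempfile


def scan_bed_index(path):
    """Return ({chrom: byte offset of its first line}, {chrom: max end})."""
    pointers, sizes = {}, {}
    with open(path, "rb") as fh:
        while True:
            offset = fh.tell()  # byte position of the line about to be read
            raw = fh.readline()
            if not raw:
                break
            if raw.startswith((b"#", b"track", b"browser")):
                continue  # skip comment and header lines
            fields = raw.decode().rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            chrom, end = fields[0], int(fields[2])
            pointers.setdefault(chrom, offset)  # first line of this chrom
            sizes[chrom] = max(sizes.get(chrom, 0), end)
    return pointers, sizes


content = b"chr1\t0\t100\ta\nchr1\t150\t300\tb\nchr2\t10\t50\tc\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".bed") as tmp:
    tmp.write(content)
pointers, sizes = scan_bed_index(tmp.name)
os.remove(tmp.name)
print(pointers)  # {'chr1': 0, 'chr2': 28}
print(sizes)     # {'chr1': 300, 'chr2': 50}
```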

Warning

Private method. Use it at your own risk. It is used internally during initialization and in BEDHandler.setChromosome().

Parameters:

bedFileName (str) – Name of the bed file
inputChrom (str) – Name of the target chromosome

_loadChromSizeAndIndex(): Load chromosome sizes and indices from a json file

_parseBed(bedFileName)

Base method to parse a bed file.

This method parses a bed file and copies the extracted data into temporary numpy array files.
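With a byte-offset index like _chromPointerInFile available, one chromosome can be read without scanning the whole file: seek to its pointer and stop at the first line of another chromosome. This hypothetical `read_chromosome` assumes contiguous chromosome blocks and tab-separated fields; it is a sketch, not the parser BEDHandler uses.

```python
import os
import tempfile


def read_chromosome(path, chrom, pointer, column=7):
    """Yield (start, end, value) for `chrom`, starting at byte `pointer`.

    Reading stops at the first line that belongs to a different chromosome.
    """
    with open(path, "rb") as fh:
        fh.seek(pointer)  # jump straight to the chromosome's first line
        for raw in fh:
            fields = raw.decode().rstrip("\n").split("\t")
            if fields[0] != chrom:
                break
            yield int(fields[1]), int(fields[2]), float(fields[column - 1])


content = b"chr1\t0\t100\t1.5\nchr2\t10\t50\t2.0\nchr2\t60\t90\t3.5\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".bed") as tmp:
    tmp.write(content)
records = list(read_chromosome(tmp.name, "chr2", pointer=15, column=4))
os.remove(tmp.name)
print(records)  # [(10, 50, 2.0), (60, 90, 3.5)]
```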

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler.parseBed().

Parameters:	bedFileName (str) – Name of the bed file

_saveChromSizeAndIndex(): Save the dictionary of chromosome sizes and indices to a json file

getRawWigDataAsDictionary(dicOut=None)

To get the entire data from the bed file as a dictionary

It generates a dictionary with one numpy array per chromosome. These arrays are stored in temporary numpy array files of TempNumpyArrayFiles.

Parameters:	dicOut (dict) – The output dictionary to which data will be added, or in which existing data will be replaced.
Returns:	dicOut – The output dictionary.
Return type:	dict

parseBed()

To parse bed files

This method parses all bed files listed in BEDHandler.bedFileNames. The extracted data is stored in temporary numpy array files of the respective chromosomes. These numpy array files can be used either for data coarsening or for further analysis.

To save as h5: Use BEDHandler.saveAsH5().
To perform analysis: Use BEDHandler.getRawWigDataAsDictionary() to get a dictionary of numpy arrays.

saveAsH5(hdf5Out, title=None, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)

To convert bed files to hdf5 file

It parses bed files, coarsens the data and stores it in the given hdf5/h5 file.

Parameters:

hdf5Out (HDF5Handler or str) – Output hdf5 file name or HDF5Handler instance
title (str) – Title of the data
resolutions (list of str) –
Additional input resolutions other than the default resolutions: '1kb', '2kb', '4kb', '5kb', '8kb', '10kb', '20kb', '40kb', '80kb', '100kb', '160kb', '200kb', '320kb', '500kb', '640kb' and '1mb'.

For example, use resolutions=['25kb', '50kb', '75kb'] to add data at 25kb, 50kb and 75kb resolutions.
coarsening_methods (list of str) –
Methods used to coarsen (downsample) the data when converting from 1-base to coarser resolutions. Presently, six methods are implemented.
- 'min' -> Minimum value
- 'max' -> Maximum value
- 'amean' -> Arithmetic mean or average
- 'hmean' -> Harmonic mean
- 'gmean' -> Geometric mean
- 'median' -> Median
If None, all six methods are used. A subset of these methods may also be given. For example, coarsening_methods=['max', 'amean'] downsamples using only these two methods.
compression (str) – Data compression method for the HDF5 file: lzf or gzip.
keep_original (bool) – Whether the original data present in the bed file should be incorporated in the HDF5 file. This significantly increases the size of the HDF5 file.

setChromosome(chromName)

Set the target chromosome for reading and extracting from bed file

Use this method to read and convert data of another chromosome from a bed file. After this, directly use BEDHandler.saveAsH5() to save the data in an H5 file.

Parameters:	chromName (str) – Name of new target chromosome
