class BEDHandler

BEDHandler(filenames[, column, chromName, …]) To convert BED files to hdf5/h5 file
BEDHandler.parseBed() To parse bed files
BEDHandler.setChromosome(chromName) Set the target chromosome for reading and extracting from bed file
BEDHandler.saveAsH5(hdf5Out[, title, …]) To convert bed files to hdf5 file
class BEDHandler(filenames, column=7, chromName=None, indexFile=None, tmpNumpyArrayFiles=None, methodToCombine='mean', workDir=None, maxEntryWrite=10000000)

To convert BED files to hdf5/h5 file

It parses bed files and saves all data to an hdf5/h5 file at the given resolutions.

bedFileNames

list[str] – List of input bed files.

Note

If BEDHandler.chromName is provided, only one bed file is accepted.

column

int – The number of the column that holds the data. The column number varies with the BED format. For example:

  • ENCODE broadPeak format (BED 6+3): 7th column
  • ENCODE gappedPeak format (BED 12+3): 13th column
  • ENCODE narrowPeak format (BED 6+4): 7th column
  • ENCODE RNA elements format (BED 6+3): 7th column
chromName

str – Name of the target chromosome to be extracted from the bed file.

chromSizeInfo

dict – A dictionary containing chromosome size information.

_chromPointerInFile

dict – A dictionary containing position index of each chromosome in bed file.

indexFile

str – A file in json format containing the indices (positions in the bed file) and sizes of chromosomes. If this file is given as input but does not exist, a new file will be generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the json file, they are determined from the bed file and stored in the same json file. This file is helpful when the same bed file has to be read many times, because the step that determines the index and size of each chromosome is skipped.

methodToCombine

str – Method used to combine bed files. Presently, the accepted keywords are: mean, min and max.

tmpNumpyArrayFiles

TempNumpyArrayFiles – This TempNumpyArrayFiles instance stores the temporary numpy array files information.

isBedParsed

bool – Whether bed files are already parsed.

maxEntryWrite

int – Number of lines read from the bed file at a time. After this many lines, the data is dumped to a temporary numpy array file.
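The indexFile attribute described above caches, for each chromosome, its position in the bed file and its size. The following is a minimal sketch of that idea using only the standard library. The JSON layout, the helper names `build_chrom_index` and `load_or_build_index`, and the choice of the largest end coordinate as the chromosome size are assumptions made for illustration; the file that BEDHandler actually writes may be organized differently.

```python
import json
import os


def build_chrom_index(bed_path):
    """Return {chrom: {"index": byte offset of first line, "size": largest end}}."""
    index = {}
    with open(bed_path, "rb") as handle:  # binary mode keeps byte offsets exact
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            fields = line.decode().split()
            # Skip blank lines, comments and 'track'/'browser' header lines.
            if len(fields) < 3 or fields[0] in ("track", "browser"):
                continue
            if fields[0].startswith("#"):
                continue
            chrom, end = fields[0], int(fields[2])
            entry = index.setdefault(chrom, {"index": offset, "size": 0})
            entry["size"] = max(entry["size"], end)
    return index


def load_or_build_index(bed_path, index_path):
    """Reuse the JSON index when it exists, otherwise build it and save it."""
    if os.path.exists(index_path):
        with open(index_path) as handle:
            return json.load(handle)
    index = build_chrom_index(bed_path)
    with open(index_path, "w") as handle:
        json.dump(index, handle, indent=2)
    return index
```

On a second call with the same `index_path`, the scan of the bed file is skipped entirely, which is the saving the attribute description refers to.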

Parameters:
  • filenames (str or list(str)) –

    List of input bed files.

    Note

    If BEDHandler.chromName is provided, only one bed file is accepted.

  • column (int) –

    The number of the column that holds the data. The column number varies with the BED format. For example:

    • ENCODE broadPeak format (BED 6+3): 7th column
    • ENCODE gappedPeak format (BED 12+3): 13th column
    • ENCODE narrowPeak format (BED 6+4): 7th column
    • ENCODE RNA elements format (BED 6+3): 7th column
  • chromName (str) – Name of the target chromosome to be extracted from the bed file.
  • indexFile (str) – A file in json format containing the indices (positions in the bed file) and sizes of chromosomes. If this file is given as input but does not exist, a new file will be generated. If the file exists, indices and sizes are taken from it. If the index and size of the input chromosome are not present in the json file, they are determined from the bed file and stored in the same json file. This file is helpful when the same bed file has to be read many times, because the step that determines the index and size of each chromosome is skipped.
  • tmpNumpyArrayFiles (TempNumpyArrayFiles) – This TempNumpyArrayFiles instance stores the temporary numpy array files information.
  • methodToCombine (str) – Method used to combine bed files. Presently, the accepted keywords are: mean, min and max.
  • maxEntryWrite (int) – Number of lines read from the bed file at a time. After this many lines, the data is dumped to a temporary numpy array file. To reduce memory (RAM) usage, lower this number, because larger values need more RAM.
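The interplay of column and maxEntryWrite can be sketched as a generator that reads the data column and hands over a batch whenever the buffer is full or the chromosome changes. This is an illustration only: `read_in_chunks` is a hypothetical helper, not part of BEDHandler, and the class may buffer and flush its data differently.

```python
def read_in_chunks(lines, column=7, max_entry_write=10000000):
    """Yield (chrom, starts, values) batches of at most max_entry_write rows."""
    chrom, starts, values = None, [], []
    for line in lines:
        fields = line.split()
        # Skip comments, header lines and rows too short to hold the data column.
        if line.startswith("#") or len(fields) < column:
            continue
        if fields[0] in ("track", "browser"):
            continue
        # Flush when the buffer is full or a new chromosome begins.
        if starts and (len(starts) >= max_entry_write or fields[0] != chrom):
            yield chrom, starts, values
            starts, values = [], []
        chrom = fields[0]
        starts.append(int(fields[1]))
        values.append(float(fields[column - 1]))  # 1-based column -> 0-based index
    if starts:
        yield chrom, starts, values
```

With a small `max_entry_write`, at most that many rows are held in memory at once, which is why lowering the value reduces RAM usage at the cost of more frequent writes.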
_FillDataInNumpyArrayFile(ChromTitle, location_list, value_list)

Fill the extracted data from bed file to temporary numpy array file

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler._parseBed().

Parameters:
  • ChromTitle (str) – Name of chromosome
  • location_list (list of int) – List of locations for given chromosome
  • value_list (list of float) – List of values for respective chromosome location
_PerformDataCoarsening(Chrom, resolution, coarse_method)

Base method to perform Data coarsening.

This method reads the temporary numpy array files and performs data coarsening using the given input method.

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler._StoreInHdf5File().

Parameters:
  • Chrom (str) – Chromosome name
  • resolution (str) – Resolution in words, for example '10kb'.
  • coarse_method (str) – Name of method to use for data coarsening. Accepted keywords: min, max, median, amean, gmean and hmean.
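To make the accepted keywords concrete, the sketch below bins a per-base list of values and reduces each bin with the chosen method, using the standard `statistics` module. The helpers `resolution_to_bases` and `coarsen` are hypothetical and only mirror the keyword names listed above; how BEDHandler treats zeros, missing positions or partial bins is not documented here and is not reproduced. Note that `gmean` in this sketch requires strictly positive values.

```python
import statistics

# Multipliers for the resolution suffixes used in this documentation.
UNITS = {"kb": 1_000, "mb": 1_000_000}

# Keyword -> reducing function, following the names accepted by coarse_method.
METHODS = {
    "min": min,
    "max": max,
    "median": statistics.median,
    "amean": statistics.fmean,
    "gmean": statistics.geometric_mean,
    "hmean": statistics.harmonic_mean,
}


def resolution_to_bases(resolution):
    """Convert a resolution word such as '10kb' or '1mb' to a number of bases."""
    text = resolution.lower()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    raise ValueError(f"unrecognized resolution: {resolution!r}")


def coarsen(values, bin_size, coarse_method):
    """Reduce consecutive bins of bin_size values with the named method."""
    reduce_bin = METHODS[coarse_method]
    return [
        reduce_bin(values[start:start + bin_size])
        for start in range(0, len(values), bin_size)
    ]
```

For example, `coarsen(values, resolution_to_bases('1kb'), 'max')` would keep the maximum of every 1000 consecutive values.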
_StoreInHdf5File(hdf5Out, title, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)

Base method to store coarsened data in hdf5/h5 file.

The data is first coarsened and then stored in the h5 file.

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler.saveAsH5().

Parameters:
  • hdf5Out (str or HDF5Handler) – Name of output hdf5 file or instance of HDF5Handler
  • title (str) – Title of data
  • resolutions (list of str) –

    Additional input resolutions other than these default resolutions: ‘1kb’, ‘2kb’, ‘4kb’, ‘5kb’, ‘8kb’, ‘10kb’, ‘20kb’, ‘40kb’, ‘80kb’, ‘100kb’, ‘160kb’, ‘200kb’, ‘320kb’, ‘500kb’, ‘640kb’, and ‘1mb’.

    For Example: use resolutions=['25kb', '50kb', '75kb'] to add additional 25kb, 50kb and 75kb resolution data.

  • coarsening_methods (list of str) –

    Methods to coarsen or downsample the data for converting from 1-base to coarser resolutions. Presently, six methods are implemented.

    • 'min' -> Minimum value
    • 'max' -> Maximum value
    • 'amean' -> Arithmetic mean or average
    • 'hmean' -> Harmonic mean
    • 'gmean' -> Geometric mean
    • 'median' -> Median

    In case of None, all six methods will be considered. The user may use only a subset of these methods. For example: coarsening_methods=['max', 'amean'] can be used for downsampling by only these two methods.

  • compression (str) – data compression method in HDF5 file : lzf or gzip method.
  • keep_original (bool) – Whether the original data present in the bed file should be incorporated in the HDF5 file. This will significantly increase the size of the HDF5 file.
_getChromSizeInfo(bedFileName, inputChrom=None)

Get chromosome sizes and index the bed file

This method parses a bed file, extracts the size of each chromosome and indexes the position of each chromosome in the file.

It sets BEDHandler._chromPointerInFile and BEDHandler.chromSizeInfo.

Warning

Private method. Use it at your own risk. It is used internally during initialization and in BEDHandler.setChromosome().

Parameters:
  • bedFileName (str) – Name of bed file
  • inputChrom (str) – Name of target chromosome
_loadChromSizeAndIndex()

Load chromosome sizes and indices from a json file

_parseBed(bedFileName)

Base method to parse a bed file.

This method parses a bed file and copies the extracted data to temporary numpy array files.

Warning

Private method. Use it at your own risk. It is used internally in BEDHandler.parseBed().

Parameters:bedFileName (str) – Name of bed File
_saveChromSizeAndIndex()

Save chromosome sizes and indices dictionary to a json file

getRawWigDataAsDictionary(dicOut=None)

To get an entire dictionary of data from bed file

It generates a dictionary of numpy arrays for each chromosome. These arrays are stored in temporary numpy array files of TempNumpyArrayFiles.

Parameters:dicOut (dict) – The output dictionary to which data will be added or replaced.
Returns:dicOut – The output dictionary.
Return type:dict
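A short sketch of how the returned dictionary might be consumed follows. It takes an already constructed handler as an argument, so no import path is assumed. `summarize_raw_data` is a hypothetical helper. The explicit parseBed() call and the assumption that the dictionary is keyed by chromosome name are not confirmed by this page.

```python
def summarize_raw_data(handler, collected=None):
    """Return {chromosome: number of stored values} for a BEDHandler-like object."""
    # Assumption: parsing first is harmless even if the method parses on demand.
    handler.parseBed()
    # Reuse the caller's dictionary when one is supplied, as dicOut allows.
    collected = {} if collected is None else collected
    data = handler.getRawWigDataAsDictionary(dicOut=collected)
    # Assumption: keys are chromosome names, values are array-like.
    return {chrom: len(values) for chrom, values in data.items()}
```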
parseBed()

To parse bed files

This method parses all bed files listed in BEDHandler.bedFileNames. The extracted data is then stored in the temporary numpy array files of the respective chromosomes. These numpy array files can be used either for data coarsening or for further analysis.

saveAsH5(hdf5Out, title=None, resolutions=None, coarsening_methods=None, compression='lzf', keep_original=False)

To convert bed files to hdf5 file

It parses bed files, coarsens the data and stores it in the given hdf5/h5 file.

Parameters:
  • hdf5Out (HDF5Handler or str) – Output hdf5 file name or HDF5Handler instance
  • title (str) – Title of the data
  • resolutions (list of str) –

    Additional input resolutions other than these default resolutions: ‘1kb’, ‘2kb’, ‘4kb’, ‘5kb’, ‘8kb’, ‘10kb’, ‘20kb’, ‘40kb’, ‘80kb’, ‘100kb’, ‘160kb’, ‘200kb’, ‘320kb’, ‘500kb’, ‘640kb’, and ‘1mb’.

    For Example: use resolutions=['25kb', '50kb', '75kb'] to add additional 25kb, 50kb and 75kb resolution data.

  • coarsening_methods (list of str) –

    Methods to coarsen or downsample the data for converting from 1-base to coarser resolutions. Presently, six methods are implemented.

    • 'min' -> Minimum value
    • 'max' -> Maximum value
    • 'amean' -> Arithmetic mean or average
    • 'hmean' -> Harmonic mean
    • 'gmean' -> Geometric mean
    • 'median' -> Median

    In case of None, all six methods will be considered. The user may use only a subset of these methods. For example: coarsening_methods=['max', 'amean'] can be used for downsampling by only these two methods.

  • compression (str) – data compression method in HDF5 file : lzf or gzip method.
  • keep_original (bool) – Whether the original data present in the bed file should be incorporated in the HDF5 file. This will significantly increase the size of the HDF5 file.
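Putting the constructor and saveAsH5() together, a conversion might look like the sketch below. The class is passed in as `handler_cls` so that no import path is assumed; in real use it would be BEDHandler. `convert_bed_to_h5` is a hypothetical wrapper, and the file names, title and chosen resolutions in the usage note are placeholders.

```python
def convert_bed_to_h5(handler_cls, bed_files, h5_path, title):
    """Create a handler for bed_files and write coarsened data to h5_path."""
    # Keyword names follow the constructor signature documented above.
    handler = handler_cls(bed_files, column=7, methodToCombine="mean")
    handler.saveAsH5(
        h5_path,
        title=title,
        resolutions=["25kb", "50kb"],         # added to the default resolutions
        coarsening_methods=["max", "amean"],  # subset of the six methods
        compression="lzf",
        keep_original=False,                  # keeps the output file smaller
    )
    return handler
```

A call could then read `convert_bed_to_h5(BEDHandler, ['peaks.bed'], 'peaks.h5', 'demo')`, where the bed file, output name and title are illustrative.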
setChromosome(chromName)

Set the target chromosome for reading and extracting from bed file

To read and convert data of another chromosome from a bed file, set the chromosome here. After this, directly use BEDHandler.saveAsH5() to save the data in the H5 file.

Parameters:chromName (str) – Name of new target chromosome
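The workflow described for setChromosome() can be sketched as a loop that switches the target chromosome and saves after each switch. The handler is passed in already constructed, so no import path is assumed. `convert_chromosomes` is a hypothetical helper, and whether repeated saveAsH5() calls on the same output file behave as desired should be checked against the library itself.

```python
def convert_chromosomes(handler, chrom_names, h5_path, title):
    """Convert several chromosomes from one bed file, one after another."""
    converted = []
    for chrom in chrom_names:
        handler.setChromosome(chrom)            # switch the target chromosome
        handler.saveAsH5(h5_path, title=title)  # then save it directly
        converted.append(chrom)
    return converted
```

Because only one bed file is accepted when a chromosome name is set, the handler in this sketch is expected to have been created from a single file.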