importer module

importer.CooMatrixHandler([inputFiles, ...]) To import ccmap from files similar to sparse matrix in Coordinate (COO) format
importer.CooMatrixHandler.save_ccmaps([...]) To save all Hi-C maps
importer.CooMatrixHandler.save_gcmap(outputFile) To save all Hi-C maps as a gcmap file
importer.CooMatrixHandler.setLabels(xlabels, ...) To set xlabels and ylabels for contact maps
importer.CooMatrixHandler.setOutputFileList(...) To set list of output files
importer.PairCooMatrixHandler(inputFile[, ...]) To import ccmap from files similar to paired sparse matrix in Coordinate (COO) format
importer.PairCooMatrixHandler.setGCMapOptions([...]) Set options for output gcmap file
importer.PairCooMatrixHandler.runConversion() Perform conversion and save to ccmap and/or gcmap file.
importer.HomerInputHandler([inputFiles, ...]) To import ccmap from Hi-C maps generated by HOMER
importer.HomerInputHandler.save_ccmaps(outdir) Import and save ccmap file
importer.HomerInputHandler.save_gcmap(outputFile) To save all Hi-C maps as a gcmap file
importer.BinsNContactFilesHandler(binFile, ...) To import Hi-C map from bin and contact file in list format
importer.BinsNContactFilesHandler.save_ccmaps(outdir) Import and save ccmap file
importer.BinsNContactFilesHandler.save_gcmap(...) To save all Hi-C maps as a gcmap file
importer.gen_map_from_locations_value(i, j, ...) To generate CCMAP object from three lists – i, j, value

CooMatrixHandler class

class CooMatrixHandler(inputFiles=None, inputCompressedFile=None, mapType='intra', resolution=None, coordinate='real', workDir=None, logHandler=None)

To import ccmap from files similar to sparse matrix in Coordinate (COO) format

Two types of coordinates are accepted:
  • coordinate='real': pairs of absolute binned locations on the chromosome.
  • coordinate='index': row and column indices of the matrix. Indices always start from zero at the absolute beginning of the chromosome; e.g. for 10kb, 0-10000 has index zero and 10000-20000 has index one. With this format, resolution must be provided explicitly.
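The relation between the two coordinate types can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper names parse_resolution, real_to_index and index_to_real are hypothetical, and resolution strings are assumed to use only a kb or mb suffix or a bare number of base pairs.

```python
def parse_resolution(resolution: str) -> int:
    """Convert a resolution string such as '10kb' or '1mb' to base pairs.

    Hypothetical helper: assumes only 'kb', 'mb' or a bare integer.
    """
    text = resolution.strip().lower()
    if text.endswith('kb'):
        return int(float(text[:-2]) * 1_000)
    if text.endswith('mb'):
        return int(float(text[:-2]) * 1_000_000)
    return int(text)


def real_to_index(location: int, binsize: int) -> int:
    """Map an absolute location to a zero-based bin index.

    Index 0 covers [0, binsize), index 1 covers [binsize, 2*binsize), ...
    """
    return location // binsize


def index_to_real(index: int, binsize: int) -> int:
    """Return the start location of the bin with the given index."""
    return index * binsize


binsize = parse_resolution('10kb')     # 10000 bp
print(real_to_index(10000, binsize))   # second 10 kb bin -> 1
```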

Warning

Each input file should contain the matrix for only one chromosome.

The following file format can be read as a text file, where the first and second columns are locations on the chromosome and the third column is the value:

20000000        20000000        2692.0
20000000        20100000        885.0
20100000        20100000        6493.0
20000000        20200000        15.0
20100000        20200000        52.0
20200000        20200000        2.0
20000000        20300000        18.0
20100000        20300000        40.0
...
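When resolution is None, the documentation only says that it will be determined from the input file. One plausible approach, sketched below as an assumption rather than the library's actual method, is to take the greatest common divisor of the gaps between distinct locations. The helper names read_coo_text and infer_binsize are hypothetical.

```python
import io
from functools import reduce
from math import gcd

# First lines of the example file shown above.
SAMPLE = """\
20000000        20000000        2692.0
20000000        20100000        885.0
20100000        20100000        6493.0
20000000        20200000        15.0
"""


def read_coo_text(handle):
    """Yield (location1, location2, value) triplets from a 3-column text stream."""
    for line in handle:
        fields = line.split()
        if len(fields) < 3:          # skip blank or malformed lines
            continue
        yield int(fields[0]), int(fields[1]), float(fields[2])


def infer_binsize(triplets):
    """Guess the bin size as the GCD of gaps between distinct locations.

    Assumption: every location is a bin start, and at least two distinct
    locations are present (otherwise reduce() raises TypeError).
    """
    locations = sorted({loc for i, j, _ in triplets for loc in (i, j)})
    gaps = [b - a for a, b in zip(locations, locations[1:])]
    return reduce(gcd, gaps)


triplets = list(read_coo_text(io.StringIO(SAMPLE)))
print(infer_binsize(triplets))       # -> 100000
```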
To instantiate this class, three scenarios are possible:
  • Text in Archive: Both a list of input files and a compressed file are given. The input files are looked up inside the compressed file.
  • Text: Only a list of input files is given. Data are read directly from the input files.
  • Archive: Only a compressed file is given. All files present in the compressed file are read.
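The choice among these three scenarios depends only on which arguments are given. The hypothetical helper determine_input_type below illustrates the rule; raising ValueError when neither argument is given is an assumption, not documented behavior.

```python
def determine_input_type(input_files=None, input_compressed_file=None):
    """Return 'Text in Archive', 'Text' or 'Archive' from the given arguments.

    Hypothetical helper illustrating the three documented scenarios.
    """
    if input_files is not None and input_compressed_file is not None:
        return 'Text in Archive'   # look for input_files inside the archive
    if input_files is not None:
        return 'Text'              # read input_files directly
    if input_compressed_file is not None:
        return 'Archive'           # read every file in the archive
    raise ValueError('Provide input files, a compressed file, or both.')
```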
Parameters:
  • inputFiles (str or list) – name of an input file or list of input files
  • inputCompressedFile (str) – name of the input tar archive
  • mapType (str) –

    Type of Hi-C map

    • intra: for intra-chromosomal map
    • inter: for inter-chromosomal map
  • resolution (str) – resolution of Hi-C map. Example: ‘100kb’, ‘10kb’ or ‘25kb’ etc. If None, resolution will be determined from the input file.
  • workDir (str) – Directory where temporary files will be stored.
inputFileList

list – List of input files. It could be None when not provided.

inputType

str – Type of input. One of three types, Text, Text in Archive or Archive, determined from user input.

inputCompressedFile

str – Name of input compressed file. It could be None when not provided.

outputFileList

list – List of output files

compressType

str – Format of compressed file. Either .tar, .zip or None.

compressHandle

ZipFile or TarFile – An object to handle compressed file. It could be ZipFile or TarFile instance depending on compressed format.

mapType

str – Type of Hi-C map:
  • intra: for intra-chromosomal map
  • inter: for inter-chromosomal map

coordinate

str – Coordinate type in the input text file. Either real for real locations or index for row and column indices.

resolution

str – resolution of Hi-C map. Example: ‘100kb’, ‘10kb’ or ‘25kb’ etc. If None, resolution will be determined from the input file.

If coordinate='index', resolution is essential for further processing.

workDir

str – Directory where temporary files will be stored. If not provided, directory name will be taken from configuration file.

save_ccmaps(outputFiles=None, xlabels=None, ylabels=None, compress=True)

To save all Hi-C maps

This function reads the input files one by one and saves each of them as a .ccmap file.

Parameters:
  • outputFiles (str or list) – Name of an output file or list of output files. For each input file, a respective output file will be generated; therefore, the numbers of input and output files must match.
  • xlabels (str or list) – Name of the data along X-axis or list of names of the data for respective input files.
  • ylabels (str or list) – Name of the data along Y-axis or list of names of the data for respective input files. If None, ylabels will be the same as xlabels.
  • compress (bool) – If True, the numpy array (matrix) file will be compressed to reduce storage space.
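The matching rules for outputFiles, xlabels and ylabels can be sketched as below. The helper names normalize_to_list and pair_outputs_and_labels are hypothetical, and the error handling is an assumption; the sketch only pairs arguments and does not read or write any ccmap file.

```python
def normalize_to_list(value):
    """Wrap a single string in a list; return lists unchanged and None as None."""
    if value is None or isinstance(value, list):
        return value
    return [value]


def pair_outputs_and_labels(input_files, output_files, xlabels=None, ylabels=None):
    """Match each input file with its output file and axis labels.

    Hypothetical helper mirroring the documented rules: the numbers of input
    and output files must match, and ylabels default to xlabels.
    """
    inputs = normalize_to_list(input_files)
    outputs = normalize_to_list(output_files)
    if inputs is None or outputs is None:
        raise ValueError('input and output files are required')
    if len(inputs) != len(outputs):
        raise ValueError(f'{len(inputs)} input files but {len(outputs)} output files')
    xs = normalize_to_list(xlabels) or [None] * len(inputs)
    ys = normalize_to_list(ylabels) or xs        # ylabels default to xlabels
    if len(xs) != len(inputs) or len(ys) != len(inputs):
        raise ValueError('one label per input file is required')
    return list(zip(inputs, outputs, xs, ys))
```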
save_gcmap(outputFile, xlabels=None, ylabels=None, coarseningMethod='sum', compression='lzf')

To save all Hi-C maps as a gcmap file

This function reads the input files one by one and saves all maps into a single .gcmap file.

Parameters:
  • outputFile (str) – Name of an output gcmap file.
  • xlabels (str or list) – Name of the data along X-axis or list of names of the data for respective input files.
  • ylabels (str or list) – Name of the data along Y-axis or list of names of the data for respective input files. If None, ylabels will be the same as xlabels.
  • coarseningMethod (str) – Method of downsampling. Three methods are accepted: sum (sum of all values), mean (average of all values) and max (maximum of all values).
  • compression (str) – Compression method. Currently allowed: lzf for LZF compression and gzip for GZIP compression.
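The three coarsening methods differ only in how the values of merged bins are combined. The following sketch, using a hypothetical coarsen_by_two on a square list-of-lists matrix, halves the resolution once. Folding a trailing odd bin into a smaller block is an assumption, not documented behavior.

```python
def coarsen_by_two(matrix, method='sum'):
    """Downsample a square matrix by merging each 2x2 block of bins.

    A trailing odd row/column is merged into a smaller (1x2, 2x1 or 1x1) block.
    """
    reducers = {
        'sum': sum,
        'mean': lambda values: sum(values) / len(values),
        'max': max,
    }
    if method not in reducers:
        raise ValueError(f'unknown coarsening method: {method}')
    reduce_block = reducers[method]
    n = len(matrix)
    size = (n + 1) // 2                      # number of coarse bins
    coarse = [[0.0] * size for _ in range(size)]
    for r in range(size):
        for c in range(size):
            block = [matrix[i][j]
                     for i in range(2 * r, min(2 * r + 2, n))
                     for j in range(2 * c, min(2 * c + 2, n))]
            coarse[r][c] = reduce_block(block)
    return coarse
```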
setLabels(xlabels, ylabels)

To set xlabels and ylabels for contact maps

xlabels and ylabels act as titles of the data along the X-axis and Y-axis respectively.

Parameters:
  • xlabels (str or list) – Name of the data along X-axis or list of names of the data for respective input files.
  • ylabels (str or list) – Name of the data along Y-axis or list of names of the data for respective input files.
setOutputFileList(outputFiles)

To set list of output files

Parameters:
  • outputFiles (str or list) – Name of an output file or list of output files. For each input file, a respective output file will be generated; therefore, the numbers of input and output files must match.

PairCooMatrixHandler class

class PairCooMatrixHandler(inputFile, ccmapOutDir=None, ccmapSuffix=None, gcmapOut=None, workDir=None, logHandler=None)

To import ccmap from files similar to paired sparse matrix in Coordinate (COO) format

This format is very similar to the COO format, with additional chromosome information. Therefore, maps for all chromosomes can be contained in a single file.

This type of format appeared in the following publication:

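The following file format can be read as a text file, where the first three columns give the chromosome, start and end of the first bin, the next three columns give the chromosome, start and end of the second bin, and the seventh column is the value: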

chr4     60000   75000   chr4    60000   75000   0.1163470887070292
chr4     60000   75000   chr4    105000  120000  0.01292745430078102
chr4     60000   75000   chr4    435000  450000  0.01292745430078102
chr4     75000   90000   chr4    75000   90000   0.05170981720312409
chr4     75000   90000   chr4    345000  360000  0.01292745430078102
chr4     90000   105000  chr4    90000   105000  0.01292745430078102
...
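Because every record carries its own chromosome names, a single paired file can be split into one COO matrix per chromosome pair. The sketch below, with a hypothetical group_pair_coo helper, keys each record on its bin start positions; this choice is an assumption made for illustration and is not necessarily how PairCooMatrixHandler works internally.

```python
import io
from collections import defaultdict

# A few records taken from the example above.
SAMPLE = """\
chr4     60000   75000   chr4    60000   75000   0.1163470887070292
chr4     60000   75000   chr4    105000  120000  0.01292745430078102
chr4     75000   90000   chr4    75000   90000   0.05170981720312409
"""


def group_pair_coo(handle):
    """Group paired-COO records by (chromosome1, chromosome2).

    Each value is a list of (start1, start2, value) triplets, i.e. a plain
    COO matrix for that chromosome pair, keyed on the bin start positions.
    """
    maps = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) < 7:          # skip blank or malformed lines
            continue
        chrom1, start1, _end1, chrom2, start2, _end2, value = fields[:7]
        maps[(chrom1, chrom2)].append((int(start1), int(start2), float(value)))
    return dict(maps)


maps = group_pair_coo(io.StringIO(SAMPLE))
print(sorted(maps))                  # -> [('chr4', 'chr4')]
```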
Parameters:
  • inputFile (str) – name of an input file
  • ccmapOutDir (str) – name of the directory where all ccmap files will be stored.
  • ccmapSuffix (str) – Suffix for ccmap file name.
  • gcmapOut (str) – Name of output gcmap file.
  • workDir (str) – Directory where temporary files will be stored.
inputFile

str – name of an input file

ccmapOutDir

str – name of the directory where all ccmap files will be stored.

ccmapSuffix

str – Suffix for ccmap file name.

gcmapOut

str – Name of output gcmap file.

gcmapOutOptions

dict – Dictionary for gcmap output options.

workDir

str – Directory where temporary files will be stored.

Examples

>>> pair_map_handle = PairCooMatrixHandler('GSM1863750_tethered_rep1_contacts.txt', gcmapOut='GSM1863750_tethered_rep1_contacts.gcmap')
>>> pair_map_handle.setGCMapOptions()
>>> pair_map_handle.runConversion()
runConversion()

Perform conversion and save to ccmap and/or gcmap file.

Reads the input file, processes the data and converts it to ccmap and/or gcmap files. For gcmap output, PairCooMatrixHandler.setGCMapOptions() should be called beforehand to set the necessary options.

setGCMapOptions(compression='lzf', generateCoarse=True, coarseningMethod='sum', replaceCMap=True)

Set options for output gcmap file

Parameters:
  • compression (str) – Compression method. Currently allowed: lzf for LZF compression and gzip for GZIP compression.
  • generateCoarse (bool) – If True, also generate all coarser maps, successively coarsening the resolution by a factor of two. For example, for a 10kb input resolution, downsampled maps of 20kb, 40kb, 80kb, 160kb, 320kb etc. are generated until the map size is less than 500.
  • coarseningMethod (str) – Method of downsampling. Three methods are accepted: sum (sum of all values), mean (average of all values) and max (maximum of all values).
  • replaceCMap (bool) – If True, replace the entire old ccmap data, including resolutions and coarsened data.
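The sequence of coarser resolutions described for generateCoarse can be sketched as follows. The helper coarse_resolutions is hypothetical; it assumes that map size means the number of bins and that the first resolution whose map falls below 500 bins is still generated.

```python
def coarse_resolutions(binsize, n_bins, min_size=500):
    """List successively doubled bin sizes, stopping once the map is small.

    Assumption: a map of length L at bin size b has ceil(L / b) bins, and
    doubling stops right after the bin count drops below min_size.
    """
    length = binsize * n_bins          # approximate chromosome length in bp
    resolutions = []
    current = binsize
    size = n_bins
    while size >= min_size:
        current *= 2
        size = -(-length // current)   # ceiling division
        resolutions.append(current)
    return resolutions
```

For a 10kb map with 20000 bins (about 200 Mb), this yields bin sizes from 20kb up to 640kb, the last being the first map with fewer than 500 bins.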

HomerInputHandler class

class HomerInputHandler(inputFiles=None, inputCompressedFile=None, workDir=None, logHandler=None)

To import ccmap from Hi-C maps generated by HOMER

The HOMER package generates Hi-C interaction matrices as text files, which can be imported using this class. A HOMER interaction matrix file may contain data for all chromosomes; this class reads and saves the matrix of each chromosome separately.

To instantiate this class, three scenarios are possible:
  • Text in Archive: Both a list of input files and a compressed file are given. The input files are looked up inside the compressed file.
  • Text: Only a list of input files is given. Data are read directly from the input files.
  • Archive: Only a compressed file is given. All files present in the compressed file are read.
Parameters:
  • inputFiles (str or list) – Name of an input file or list of input files. If None, all files from the compressed file are used as input files.
  • inputCompressedFile (str) – Name of input compressed file. Accepted formats: tar.gz, tar.bz2 and zip.
  • workDir (str) – Directory where temporary files will be stored. If it is not provided, this value is taken from configuration file.
inputFileList

list – List of input files. It could be None when not provided.

inputType

str – Type of input. One of three types, Text, Text in Archive or Archive, determined from user input.

inputCompressedFile

str – Name of input compressed file. It could be None when not provided.

compressType

str – Format of compressed file. Either .tar, .zip or None.

compressHandle

ZipFile or TarFile – An object to handle compressed file. It could be ZipFile or TarFile instance depending on compressed format.

workDir

str – Directory where temporary files will be stored.

fIns

list[input file stream] – Input file stream for each input file

chromList

list[str] – List of chromosomes found in input files

resolution

str – Resolution of map

fTmpOutNames

list[str] – List of temporary output files where data for each chromosome are extracted separately

fTmpOut

list[output file stream] – List of output file streams for respective temporary output files

save_ccmaps(outdir, suffix=None)

Import and save ccmap file

Reads the input files, temporarily saves the data for each chromosome in a text file and imports these data into the native ccmap format using the CooMatrixHandler class.

Note

  • Output file names will be automatically generated as <chromosome>_<resolution>.ccmap format. e.g. chr12_10kb.ccmap.
  • A suffix can be added to all files as <chromosome>_<resolution><suffix>.ccmap format. e.g. if suffix='_RawObserved', file name is chr12_10kb_RawObserved.ccmap.
Parameters:
  • outdir (str) – Path to directory where all ccmaps have to be saved
  • suffix (str) – Any suffix to file name
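The naming convention in the notes above can be expressed as a small helper. ccmap_output_name is hypothetical; it appends the suffix verbatim, which is consistent with the chr12_10kb_RawObserved.ccmap example.

```python
import os


def ccmap_output_name(outdir, chromosome, resolution, suffix=None):
    """Build <outdir>/<chromosome>_<resolution><suffix>.ccmap.

    The suffix is appended verbatim, so it should carry its own leading
    underscore, e.g. '_RawObserved'.
    """
    name = f'{chromosome}_{resolution}{suffix or ""}.ccmap'
    return os.path.join(outdir, name)


print(ccmap_output_name('out', 'chr12', '10kb', '_RawObserved'))
```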
save_gcmap(outputFile, coarseningMethod='sum', compression='lzf')

To save all Hi-C maps as a gcmap file

This function reads the input files one by one and saves all maps into a single .gcmap file.

Parameters:
  • outputFile (str) – Name of an output gcmap file.
  • coarseningMethod (str) – Method of downsampling. Three methods are accepted: sum (sum of all values), mean (average of all values) and max (maximum of all values).
  • compression (str) – Compression method. Currently allowed: lzf for LZF compression and gzip for GZIP compression.

BinsNContactFilesHandler class

class BinsNContactFilesHandler(binFile, contactFile, workDir=None, logHandler=None)

To import Hi-C map from bin and contact file in list format

These types of files appear in the following GEO data:
Parameters:
  • binFile (str) – Bin file
  • contactFile (str) – Contact file in list format
  • workDir (str) – Directory where temporary files will be stored. If it is not provided, this value is taken from configuration file.
binFile

str – Bin file

contactFile

str – Contact file in list format

ChromSize

dict – Dictionary of chromosome sizes

ChromBinsInfo

dict – Dictionary of min (start) and max (end) bin number for each chromosome

npyBinFileList

dict – Dictionary of tuples (memmap stream, temporary numpy array file name)

binsize

int – Bin size of Hi-C map

ccmaps

dict – Dictionary of CCMAP instances for each chromosome
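The ChromSize, ChromBinsInfo and binsize attributes can all be derived from the bin file in a single pass. Since the bin file layout is not described here, the sketch below assumes hypothetical (bin_number, chromosome, start, end) records and a uniform bin size taken from the first bin; summarize_bins is likewise a hypothetical name.

```python
def summarize_bins(bin_records):
    """Derive per-chromosome size, bin-number range and the bin size.

    bin_records: iterable of (bin_number, chromosome, start, end) tuples.
    This record layout is an assumption made for illustration only.
    """
    chrom_size = {}        # chromosome -> largest bin end seen
    chrom_bins_info = {}   # chromosome -> (min bin number, max bin number)
    binsize = None
    for bin_number, chrom, start, end in bin_records:
        chrom_size[chrom] = max(chrom_size.get(chrom, 0), end)
        low, high = chrom_bins_info.get(chrom, (bin_number, bin_number))
        chrom_bins_info[chrom] = (min(low, bin_number), max(high, bin_number))
        if binsize is None:
            binsize = end - start   # assume uniform bins; last bin may be shorter
    return chrom_size, chrom_bins_info, binsize
```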

save_ccmaps(outdir, suffix=None)

Import and save ccmap file

Reads the input files, temporarily saves the data for each chromosome and imports these data into the native ccmap format.

Note

  • Output file names will be automatically generated as <chromosome>_<resolution>.ccmap format. e.g. chr12_10kb.ccmap.
  • A suffix can be added to all files as <chromosome>_<resolution><suffix>.ccmap format. e.g. if suffix='_RawObserved', file name is chr12_10kb_RawObserved.ccmap.
Parameters:
  • outdir (str) – Path to directory where all ccmaps have to be saved
  • suffix (str) – Any suffix to file name
save_gcmap(outputFile, coarseningMethod='sum', compression='lzf')

To save all Hi-C maps as a gcmap file

This function reads the input files one by one and saves all maps into a single .gcmap file.

Parameters:
  • outputFile (str) – Name of an output gcmap file.
  • coarseningMethod (str) – Method of downsampling. Three methods are accepted: sum (sum of all values), mean (average of all values) and max (maximum of all values).
  • compression (str) – Compression method. Currently allowed: lzf for LZF compression and gzip for GZIP compression.

Other functions of importer module

gen_map_from_locations_value(i, j, value, resolution=None, mapType='intra', workDir=None, logHandler=None)

To generate CCMAP object from three lists – i, j, value

Parameters:
  • i (list[int]) – List of first locations from each pair
  • j (list[int]) – List of second locations from each pair
  • value (list[float]) – List of values for the respective locations
  • resolution (str) – Resolution of Hi-C map
  • mapType (str) – Hi-C map type: intra or inter chromosomal map
  • workDir (str) – Directory where temporary files will be stored. If it is not provided, this value is taken from configuration file.
Returns:

ccMapObj – A CCMAP object

Return type:

gcMapExplorer.lib.ccmap.CCMAP
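The core of building a contact map from i, j and value lists is scattering each value into a matrix. The hypothetical dense_from_locations below does only that: it assumes i and j are already zero-based bin indices and that, for intra-chromosomal maps listing one triangle, the matrix is symmetric. It does not create a CCMAP object.

```python
def dense_from_locations(i, j, value, n_bins, symmetric=True):
    """Build an n_bins x n_bins matrix (list of lists) from COO triplets.

    i and j are zero-based bin indices. With symmetric=True (assumed for
    intra-chromosomal maps listing only one triangle) each off-diagonal value
    is mirrored to (j, i).
    """
    if not (len(i) == len(j) == len(value)):
        raise ValueError('i, j and value must have the same length')
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for row, col, val in zip(i, j, value):
        matrix[row][col] = val
        if symmetric and row != col:
            matrix[col][row] = val
    return matrix
```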