importer module
importer.CooMatrixHandler([inputFiles, …]) | To import ccmap from files similar to sparse matrix in Coordinate (COO) format
importer.CooMatrixHandler.save_ccmaps([…]) | To save all Hi-C maps
importer.CooMatrixHandler.save_gcmap(outputFile) | To save all Hi-C maps as a gcmap file
importer.CooMatrixHandler.setLabels(xlabels, …) | To set xlabels and ylabels for contact maps
importer.CooMatrixHandler.setOutputFileList(…) | To set list of output files
importer.PairCooMatrixHandler(inputFile[, …]) | To import ccmap from files similar to paired sparse matrix Coordinate (COO) format
importer.PairCooMatrixHandler.setGCMapOptions([…]) | Set options for output gcmap file
importer.PairCooMatrixHandler.runConversion() | Perform conversion and save to ccmap and/or gcmap file.
importer.HomerInputHandler([inputFiles, …]) | To import ccmap from Hi-C maps generated by HOMER
importer.HomerInputHandler.save_ccmaps(outdir) | Import and save ccmap file
importer.HomerInputHandler.save_gcmap(outputFile) | To save all Hi-C maps as a gcmap file
importer.BinsNContactFilesHandler(binFile, …) | To import Hi-C map from bin and contact file in list format
importer.BinsNContactFilesHandler.save_ccmaps(outdir) | Import and save ccmap file
importer.BinsNContactFilesHandler.save_gcmap(…) | To save all Hi-C maps as a gcmap file
importer.gen_map_from_locations_value(i, j, …) | To generate CCMAP object from three lists – i, j, value
CooMatrixHandler class
- class CooMatrixHandler(inputFiles=None, inputCompressedFile=None, mapType='intra', resolution=None, coordinate='real', workDir=None, logHandler=None)
  To import ccmap from files similar to sparse matrix in Coordinate (COO) format
- Two types of coordinates are accepted:
  - with coordinate='real': pair of absolute binned locations on the chromosome.
  - with coordinate='index': row and column index of the matrix. The index should always start from zero at the absolute beginning of the chromosome, e.g. for 10kb resolution, 0-10000 should have an index of zero and 10000-20000 an index of one. If the file is in this format, the resolution should be provided explicitly.
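The relation between the two coordinate types can be sketched as below. This is only an illustration, not the library's own code: the helper names `resolution_to_bp` and `location_to_index` are hypothetical, and the parser assumes resolutions are written with a `kb` or `mb` suffix as in the examples on this page.

```python
def resolution_to_bp(resolution):
    """Convert a resolution string such as '10kb' to base pairs (hypothetical helper)."""
    units = {"kb": 1_000, "mb": 1_000_000}
    text = resolution.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    # Assumption: a bare number is already given in base pairs.
    return int(text)


def location_to_index(location, resolution):
    """Map a real (base-pair) bin start to a zero-based matrix index."""
    return location // resolution_to_bp(resolution)


# With 10kb bins, 0-10000 is index zero and 10000-20000 is index one.
first = location_to_index(0, "10kb")
second = location_to_index(10000, "10kb")
```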
Warning
Input file should contain matrix for only one chromosome.
The following file format can be read as a text file, where the first and second columns are locations on the chromosome and the third column is the value:
20000000 20000000 2692.0
20000000 20100000 885.0
20100000 20100000 6493.0
20000000 20200000 15.0
20100000 20200000 52.0
20200000 20200000 2.0
20000000 20300000 18.0
20100000 20300000 40.0
. . .
. . .
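To make the three-column layout concrete, here is a minimal sketch that reads such text into three parallel lists. It does not use the importer module; `read_coo_text` is a hypothetical helper, and skipping malformed or elided rows is an assumption of this sketch rather than documented behaviour.

```python
import io


def read_coo_text(handle):
    """Read 'location location value' rows into three parallel lists."""
    i, j, value = [], [], []
    for line in handle:
        fields = line.split()
        if len(fields) != 3:
            continue  # not a data row
        try:
            row, col, val = int(fields[0]), int(fields[1]), float(fields[2])
        except ValueError:
            continue  # e.g. an elided row such as ". . ."
        i.append(row)
        j.append(col)
        value.append(val)
    return i, j, value


sample = io.StringIO(
    "20000000 20000000 2692.0\n"
    "20000000 20100000 885.0\n"
    "20100000 20100000 6493.0\n"
    ". . .\n"
)
i, j, value = read_coo_text(sample)
```

The three lists have the same shape as the `i`, `j` and `value` arguments of `gen_map_from_locations_value` documented on this page.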
- To instantiate this class, three scenarios are possible:
  - Text in Archive: Both a list of input files and a compressed file are given. The input files are looked up in the compressed file.
  - Text: Only a list of input files is given. Data is read directly from the input files.
  - Archive: Only a compressed file is given. All files present in the compressed file are read.
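The three scenarios amount to a simple decision on which arguments were supplied. A minimal sketch follows; the function `determine_input_type` is hypothetical and only mirrors the rule stated above, and raising an error when neither argument is given is an assumption.

```python
def determine_input_type(input_files=None, input_compressed_file=None):
    """Return the input type implied by the supplied arguments."""
    if input_files is not None and input_compressed_file is not None:
        return "Text in Archive"
    if input_files is not None:
        return "Text"
    if input_compressed_file is not None:
        return "Archive"
    # Assumption: at least one of the two inputs is required.
    raise ValueError("Provide input files, a compressed file, or both.")


both = determine_input_type(["chr1.txt"], "maps.tar.gz")
only_text = determine_input_type(["chr1.txt"])
only_archive = determine_input_type(input_compressed_file="maps.tar.gz")
```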
Parameters:
- inputFiles (str or list) – Name of an input file or list of input files.
- inputCompressedFile (str) – Name of input tar archive.
- mapType (str) – Type of Hi-C map:
  - intra: for intra-chromosomal map
  - inter: for inter-chromosomal map
- resolution (str) – Resolution of Hi-C map. Example: ‘100kb’, ‘10kb’ or ‘25kb’ etc. If None, resolution will be determined from the input file.
- workDir (str) – Directory where temporary files will be stored.
- inputFileList
  list – List of input files. It could be None when not provided.
- inputType
  str – Type of input. Three types: Text, Text in Archive and Archive are determined from user input.
- inputCompressedFile
  str – Name of input compressed file. It could be None when not provided.
- outputFileList
  str – List of output files.
- compressType
  str – Format of compressed file. It could be either .tar or .zip or None.
- compressHandle
  ZipFile or TarFile – An object to handle the compressed file. It could be a ZipFile or TarFile instance depending on the compressed format.
- mapType
  str – Type of Hi-C map:
  - intra: for intra-chromosomal map
  - inter: for inter-chromosomal map
- coordinate
  str – Coordinate type in input text file. It could be either real for real locations or index for row and column indices.
- resolution
  str – Resolution of Hi-C map. Example: ‘100kb’, ‘10kb’ or ‘25kb’ etc. If None, resolution will be determined from the input file. If coordinate='index', resolution is essential for further processing.
- workDir
  str – Directory where temporary files will be stored. If not provided, the directory name will be taken from the configuration file.
- save_ccmaps(outputFiles=None, xlabels=None, ylabels=None, compress=True)
  To save all Hi-C maps.
  This function reads input files one by one and saves each as a .ccmap file.
  Parameters:
  - outputFiles (str or list) – Name of an output file or list of output files. For each input file, a respective output file will be generated; therefore, the number of input and output files should match.
  - xlabels (str or list) – Name of the data along X-axis or list of names of the data for respective input files.
  - ylabels (str or list) – Name of the data along Y-axis or list of names of the data for respective input files. If it is None, ylabels will be the same as xlabels.
  - compress (bool) – If True, the numpy array (matrix) file will be compressed to reduce storage space.
- save_gcmap(outputFile, xlabels=None, ylabels=None, coarseningMethod='sum', compression='lzf')
  To save all Hi-C maps as a gcmap file.
  This function reads input files one by one and saves them in a .gcmap file.
  Parameters:
  - outputFile (str) – Name of an output gcmap file.
  - xlabels (str or list) – Name of the data along X-axis or list of names of the data for respective input files.
  - ylabels (str or list) – Name of the data along Y-axis or list of names of the data for respective input files. If it is None, ylabels will be the same as xlabels.
  - coarseningMethod (str) – Method of downsampling. Three accepted methods are sum: sum of all values, mean: average of all values and max: maximum of all values.
  - compression (str) – Compression method. Presently allowed: lzf for LZF compression and gzip for GZIP compression.
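The three downsampling methods can be illustrated on a small matrix. The sketch below is not the library's implementation: `coarsen_by_two` is a hypothetical helper that merges 2 x 2 blocks of bins, and the handling of incomplete edge blocks (reduced over the cells that exist) is an assumption.

```python
def coarsen_by_two(matrix, method="sum"):
    """Merge 2 x 2 blocks of a matrix (list of lists) with sum, mean or max."""
    reducers = {
        "sum": sum,
        "mean": lambda cells: sum(cells) / len(cells),
        "max": max,
    }
    reduce_block = reducers[method]
    n_rows, n_cols = len(matrix), len(matrix[0])
    coarse = []
    for r in range(0, n_rows, 2):
        coarse_row = []
        for c in range(0, n_cols, 2):
            # Collect the cells of this block; edge blocks may be smaller.
            block = [
                matrix[rr][cc]
                for rr in range(r, min(r + 2, n_rows))
                for cc in range(c, min(c + 2, n_cols))
            ]
            coarse_row.append(reduce_block(block))
        coarse.append(coarse_row)
    return coarse


fine = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
by_sum = coarsen_by_two(fine, "sum")    # [[14, 22], [46, 54]]
by_mean = coarsen_by_two(fine, "mean")  # [[3.5, 5.5], [11.5, 13.5]]
by_max = coarsen_by_two(fine, "max")    # [[6, 8], [14, 16]]
```

For raw contact counts, `sum` preserves the total number of contacts, which may be why it is the default here; that reading is an inference, not something this page states.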
- setLabels(xlabels, ylabels)
  To set xlabels and ylabels for contact maps.
  xlabel and ylabel act as titles of the data along the X-axis and Y-axis, respectively.
Parameters:
PairCooMatrixHandler class
- class PairCooMatrixHandler(inputFile, ccmapOutDir=None, ccmapSuffix=None, gcmapOut=None, workDir=None, logHandler=None)
  To import ccmap from files similar to paired sparse matrix Coordinate (COO) format
This format is very similar to the COO format, with additional chromosome information. Therefore, maps for all chromosomes can be contained in a single file.
- This type of format appeared with the following publication:
The following file format can be read as a text file, where the first three columns (chromosome, start, end) give the first bin, the next three columns give the second bin and the last column is the value:
chr4 60000 75000 chr4 60000 75000 0.1163470887070292
chr4 60000 75000 chr4 105000 120000 0.01292745430078102
chr4 60000 75000 chr4 435000 450000 0.01292745430078102
chr4 75000 90000 chr4 75000 90000 0.05170981720312409
chr4 75000 90000 chr4 345000 360000 0.01292745430078102
chr4 90000 105000 chr4 90000 105000 0.01292745430078102
. . .
. . .
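Because each row carries its own chromosome names, a single file can be split into one COO-style list per chromosome. The sketch below shows the idea only; `split_pair_coo` is a hypothetical helper, the column interpretation (chromosome, start, end for each bin) is inferred from the sample rows, and keeping only rows where both bins lie on the same chromosome is an assumption.

```python
from collections import defaultdict


def split_pair_coo(lines):
    """Group seven-column rows into per-chromosome (start1, start2, value) triples."""
    maps = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # not a data row, e.g. an elided ". . ." row
        chrom1, start1, _end1, chrom2, start2, _end2, val = fields
        if chrom1 != chrom2:
            continue  # assumption: only intra-chromosomal contacts are kept
        try:
            maps[chrom1].append((int(start1), int(start2), float(val)))
        except ValueError:
            continue  # skip rows with non-numeric fields
    return dict(maps)


rows = [
    "chr4 60000 75000 chr4 60000 75000 0.1163470887070292",
    "chr4 60000 75000 chr4 105000 120000 0.01292745430078102",
    "chr4 75000 90000 chr4 75000 90000 0.05170981720312409",
]
per_chromosome = split_pair_coo(rows)
```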
Parameters:
- inputFile
  str – Name of an input file.
- ccmapOutDir
  str – Name of the directory where all ccmap files will be stored.
- ccmapSuffix
  str – Suffix for ccmap file name.
- gcmapOut
  str – Name of output gcmap file.
- gcmapOutOptions
  dict – Dictionary for gcmap output options.
- workDir
  str – Directory where temporary files will be stored.
Examples
>>> pair_map_handle = PairCooMatrixHandler('GSM1863750_tethered_rep1_contacts.txt', gcmapOut='GSM1863750_tethered_rep1_contacts.gcmap')
>>> pair_map_handle.setGCMapOptions()
>>> pair_map_handle.runConversion()
- runConversion()
  Perform conversion and save to ccmap and/or gcmap file.
  Read the input file, process the data, and convert it to a ccmap or gcmap file. For output gcmap, PairCooMatrixHandler.setGCMapOptions() should be called to set the necessary options.
- setGCMapOptions(compression='lzf', generateCoarse=True, coarseningMethod='sum', replaceCMap=True)
  Set options for output gcmap file.
  Parameters:
  - compression (str) – Compression method. Presently allowed: lzf for LZF compression and gzip for GZIP compression.
  - generateCoarse (bool) – Also generate all coarser maps, where the resolution is coarsened by a factor of two, consecutively. E.g. for a 10 kb input resolution, downsampled maps of 20kb, 40kb, 80kb, 160kb, 320kb etc. will be generated until the map size is less than 500.
  - coarseningMethod (str) – Method of downsampling. Three accepted methods are sum: sum of all values, mean: average of all values and max: maximum of all values.
  - replaceCMap (bool) – Replace entire old ccmap data, including resolutions and coarsened data.
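The successive doubling described for generateCoarse can be sketched as a short loop. This is an illustration under stated assumptions, not the library's code: `coarse_resolutions` is a hypothetical helper, the map size is taken to be the number of bins along one axis, and the loop stops before emitting a level whose map would have fewer than 500 bins. The page does not say whether that last level is itself written.

```python
def coarse_resolutions(resolution_bp, n_bins, min_size=500):
    """List coarser resolutions (in bp), doubling until the map gets too small."""
    levels = []
    while True:
        resolution_bp *= 2
        n_bins = (n_bins + 1) // 2  # two fine bins merge into one coarse bin
        if n_bins < min_size:
            break  # assumption: maps smaller than min_size are not generated
        levels.append(resolution_bp)
    return levels


# A 10 kb map with 20000 bins along one axis.
levels = coarse_resolutions(10_000, 20_000)
```

With these inputs the sketch yields the 20kb, 40kb, 80kb, 160kb and 320kb levels named in the parameter description.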
HomerInputHandler class
- class HomerInputHandler(inputFiles=None, inputCompressedFile=None, workDir=None, logHandler=None)
  To import ccmap from Hi-C maps generated by HOMER
The HOMER package generates Hi-C interaction matrices as text files. Such a Hi-C interaction file can be imported using this class. A HOMER format interaction matrix file may contain data for all the chromosomes, while this class reads and saves the matrix of each chromosome separately.
- To instantiate this class, three scenarios are possible:
  - Text in Archive: Both a list of input files and a compressed file are given. The input files are looked up in the compressed file.
  - Text: Only a list of input files is given. Data is read directly from the input files.
  - Archive: Only a compressed file is given. All files present in the compressed file are read.
Parameters:
- inputFiles (str or list) – Name of an input file or list of input files. If None, all files from the compressed file are used as input files.
- inputCompressedFile (str) – Name of input compressed file. Accepted formats: tar.gz, tar.bz2 and zip.
- workDir (str) – Directory where temporary files will be stored. If it is not provided, this value is taken from the configuration file.
- inputFileList
  list – List of input files. It could be None when not provided.
- inputType
  str – Type of input. Three types: Text, Text in Archive and Archive are determined from user input.
- inputCompressedFile
  str – Name of input compressed file. It could be None when not provided.
- compressType
  str – Format of compressed file. It could be either .tar or .zip or None.
- compressHandle
  ZipFile or TarFile – An object to handle the compressed file. It could be a ZipFile or TarFile instance depending on the compressed format.
- workDir
  str – Directory where temporary files will be stored.
- fIns
  list[output file stream] – Input file stream for each input file.
- chromList
  list[str] – List of chromosomes found in input files.
- resolution
  str – Resolution of map.
- fTmpOutNames
  list[str] – List of temporary output files where data for each chromosome is extracted separately.
- fTmpOut
  list[output file stream] – List of output file streams for respective temporary output files.
- save_ccmaps(outdir, suffix=None)
  Import and save ccmap file.
  Read input files, save data temporarily in a text file for each chromosome and import these data to native ccmap format using the CooMatrixHandler class.
  Note
  - Output file names will be automatically generated in <chromosome>_<resolution>.ccmap format, e.g. chr12_10kb.ccmap.
  - A suffix can be added to all files in <chromosome>_<resolution><suffix>.ccmap format, e.g. if suffix='_RawObserved', the file name is chr12_10kb_RawObserved.ccmap.
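The naming rule can be written as a one-line format. The sketch below follows the two example file names given in the note, where the suffix already carries its leading underscore; `ccmap_file_name` is a hypothetical helper and joining the name with `outdir` is an assumption about how the output path is formed.

```python
import os


def ccmap_file_name(chromosome, resolution, suffix=None):
    """Build '<chromosome>_<resolution><suffix>.ccmap' (suffix is optional)."""
    return "{}_{}{}.ccmap".format(chromosome, resolution, suffix or "")


plain = ccmap_file_name("chr12", "10kb")
with_suffix = ccmap_file_name("chr12", "10kb", suffix="_RawObserved")

# Assumption: files are written directly inside the given output directory.
full_path = os.path.join("outdir", with_suffix)
```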
Parameters:
- save_gcmap(outputFile, coarseningMethod='sum', compression='lzf')
  To save all Hi-C maps as a gcmap file.
  This function reads input files one by one and saves them in a .gcmap file.
  Parameters:
  - outputFile (str) – Name of an output gcmap file.
  - coarseningMethod (str) – Method of downsampling. Three accepted methods are sum: sum of all values, mean: average of all values and max: maximum of all values.
  - compression (str) – Compression method. Presently allowed: lzf for LZF compression and gzip for GZIP compression.
BinsNContactFilesHandler class
- class BinsNContactFilesHandler(binFile, contactFile, workDir=None, logHandler=None)
  To import Hi-C map from bin and contact file in list format
- Files of this type appear in the following GEO data:
Parameters:
- binFile
  str – Bin file.
- contactFile
  str – Contact file in list format.
- ChromSize
  dict – Dictionary of chromosome sizes.
- ChromBinsInfo
  dict – Dictionary of min (start) and max (end) bin number for each chromosome.
- npyBinFileList
  dict – Dictionary containing tuples of (memmap stream, temporary numpy array file name).
- binsize
  int – Bin size of Hi-C map.
- ccmaps
  dict – Dictionary of CCMAP instances for each chromosome.
- save_ccmaps(outdir, suffix=None)
  Import and save ccmap file.
  Read input files, save data temporarily for each chromosome and import these data to native ccmap format.
  Note
  - Output file names will be automatically generated in <chromosome>_<resolution>.ccmap format, e.g. chr12_10kb.ccmap.
  - A suffix can be added to all files in <chromosome>_<resolution><suffix>.ccmap format, e.g. if suffix='_RawObserved', the file name is chr12_10kb_RawObserved.ccmap.
Parameters:
- save_gcmap(outputFile, coarseningMethod='sum', compression='lzf')
  To save all Hi-C maps as a gcmap file.
  This function reads input files one by one and saves them in a .gcmap file.
  Parameters:
  - outputFile (str) – Name of an output gcmap file.
  - coarseningMethod (str) – Method of downsampling. Three accepted methods are sum: sum of all values, mean: average of all values and max: maximum of all values.
  - compression (str) – Compression method. Presently allowed: lzf for LZF compression and gzip for GZIP compression.
Other functions of importer module
- gen_map_from_locations_value(i, j, value, resolution=None, mapType='intra', workDir=None, logHandler=None)
  To generate CCMAP object from three lists – i, j, value
  Parameters:
  - i (list[int]) – List of first locations from each pair.
  - j (list[int]) – List of second locations from each pair.
  - resolution (str) – Resolution of Hi-C map.
  - mapType (str) – Hi-C map type: intra or inter chromosomal map.
  - value (list[float]) – List of values for respective locations.
  - workDir (str) – Directory where temporary files will be stored. If it is not provided, this value is taken from the configuration file.
  Returns: ccMapObj – A CCMAP object
Return type:
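To show how three such lists describe a contact map, the sketch below fills a small dense matrix from i, j and value. It does not build a CCMAP object and is not the library's algorithm: `dense_from_locations` is a hypothetical helper, the resolution is passed in base pairs, the matrix is offset by the smallest location to keep the example small, and mirroring each value across the diagonal assumes an intra-chromosomal map.

```python
def dense_from_locations(i, j, value, resolution_bp):
    """Fill a symmetric dense matrix (list of lists) from location pairs."""
    start = min(min(i), min(j))
    size = (max(max(i), max(j)) - start) // resolution_bp + 1
    matrix = [[0.0] * size for _ in range(size)]
    for first, second, val in zip(i, j, value):
        row = (first - start) // resolution_bp
        col = (second - start) // resolution_bp
        matrix[row][col] = val
        matrix[col][row] = val  # assumption: intra-chromosomal maps are symmetric
    return matrix


i = [20000000, 20000000, 20100000]
j = [20000000, 20100000, 20100000]
value = [2692.0, 885.0, 6493.0]
matrix = dense_from_locations(i, j, value, resolution_bp=100_000)
```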