Sra.py - Methods for dealing with Short Read Archive files
Utility functions for dealing with SRA-formatted files from the Short Read Archive.
Requirements:
* fastq-dump >= 2.1.7
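Checking the fastq-dump requirement needs a numeric comparison, because comparing version strings character by character misorders releases such as 2.9.x and 2.10.x. The sketch below is a hypothetical helper, not part of Sra.py. It assumes the version text looks like the string `fastq-dump --version` prints (for example "fastq-dump : 2.10.8"), and `parse_version`, `meets_requirement` and `MIN_FASTQ_DUMP` are illustrative names.

```python
import re

# Minimum fastq-dump version from the Requirements list.
MIN_FASTQ_DUMP = (2, 1, 7)


def parse_version(text):
    """Extract the first dotted version number (e.g. '2.10.8') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_requirement(version_text, minimum=MIN_FASTQ_DUMP):
    """Compare as integer tuples, so 2.10.x correctly ranks above 2.9.x."""
    return parse_version(version_text) >= minimum


print(meets_requirement("fastq-dump : 2.10.8"))  # → True
```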
Code
Sra.peek(sra, outdir=None)
Return the full file names of all files that will be extracted.
- Parameters
outdir (path) – Perform the extraction in outdir. If outdir is None, the extraction takes place in a temporary directory, which is deleted afterwards.
- Returns
files (list) – A list of the FASTQ-formatted files contained in the archive.
format (string) – The quality score format of the FASTQ files.
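The format return value implies that the quality encoding is inferred from the data. A common heuristic, which is not necessarily what Sra.peek does, inspects the range of ASCII codes in the quality lines. Characters below ';' only occur with a +33 offset, and characters well above 'J' suggest a +64 offset. The function name and the labels 'phred33', 'phred64' and 'solexa64' are hypothetical.

```python
def guess_quality_format(quality_strings):
    """Heuristically guess the FASTQ quality encoding from raw quality strings.

    Returns one of the hypothetical labels 'phred33', 'phred64' or 'solexa64'.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        raise ValueError("no quality characters to inspect")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' only occur with the +33 (Sanger) offset.
        return "phred33"
    if highest > 74:
        # Above 'J' is unusual for +33 data, so assume a +64 offset;
        # ';' to '?' (codes 59-63) are only valid in Solexa+64 scores.
        return "solexa64" if lowest < 64 else "phred64"
    # Ambiguous range: fall back to the modern +33 default.
    return "phred33"
```

Only a sample of reads is usually needed. The fallback in the ambiguous case is a judgement call: a high-quality +33 file and a low-quality +64 file can share the same character range.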
Sra.extract(sra, outdir, tool='fastq-dump')
Return the statement for extracting the SRA file into outdir. Possible tools are fastq-dump and abi-dump; use abi-dump for colorspace data.
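Because extract returns a statement rather than running it, the core job is building a safely quoted command line. This sketch is a hypothetical stand-in, not the module's code. It assumes both tools accept `-O` for the output directory, and it adds fastq-dump's `--split-files` so that paired reads land in separate `_1`/`_2` files. That flag is a choice made for this illustration.

```python
import shlex

# Tools named in the Sra.extract docstring; abi-dump is the colorspace option.
SUPPORTED_TOOLS = ("fastq-dump", "abi-dump")


def build_extract_statement(sra, outdir, tool="fastq-dump"):
    """Return a shell-quoted command string that dumps `sra` into `outdir`.

    Hypothetical helper: assumes both tools accept -O (output directory) and
    that one file per read mate (--split-files) is wanted for fastq-dump.
    """
    if tool not in SUPPORTED_TOOLS:
        raise ValueError(f"unsupported tool {tool!r}; expected one of {SUPPORTED_TOOLS}")
    args = [tool, "-O", outdir]
    if tool == "fastq-dump":
        args.append("--split-files")
    args.append(sra)
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)
```

For example, `build_extract_statement("SRR000001.sra", "out")` gives `fastq-dump -O out --split-files SRR000001.sra`. Returning a string lets the caller log, batch or submit the command to a scheduler.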
Sra.prefetch(sra)
Use prefetch from the SRA Toolkit to download the SRA file into the local cache.
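A minimal sketch of invoking the toolkit's `prefetch` from Python, assuming it is on PATH and takes the accession as a positional argument. `run_prefetch` and its `dry_run` switch are hypothetical and are not the signature of Sra.prefetch.

```python
import shutil
import subprocess


def run_prefetch(accession, dry_run=False):
    """Download `accession` into the SRA Toolkit's local cache via `prefetch`.

    Hypothetical helper: assumes `prefetch` is on PATH and takes the accession
    as a positional argument. With dry_run=True the command is returned unrun.
    """
    cmd = ["prefetch", accession]
    if dry_run:
        return cmd
    if shutil.which("prefetch") is None:
        raise FileNotFoundError("prefetch (SRA Toolkit) was not found on PATH")
    # check=True raises CalledProcessError if the download fails.
    subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list avoids going through a shell, so accessions never need quoting.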
Sra.clean_cache(sra)
Remove the specified SRA file from the cache.
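Cleaning the cache amounts to deleting one file, provided you know where the cache lives. The sketch assumes a flat cache directory holding one `<accession>.sra` file per run, as in the older `~/ncbi/public/sra` default. Newer toolkit versions may lay the cache out differently. `remove_from_cache` and its `cache_dir` parameter are hypothetical.

```python
from pathlib import Path


def remove_from_cache(accession, cache_dir):
    """Delete <cache_dir>/<accession>.sra; return True if a file was removed.

    Assumption: one <accession>.sra file per run in a flat cache directory,
    as in the older ~/ncbi/public/sra default. Newer layouts may differ.
    """
    path = Path(cache_dir).expanduser() / f"{accession}.sra"
    try:
        path.unlink()
    except FileNotFoundError:
        # Nothing cached for this accession; treat it as already clean.
        return False
    return True
```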
Sra.fetch_ENA(dl_path, outdir, protocol='ascp')
Fetch FASTQ files from ENA for the given accession.
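The protocol argument suggests two download routes: Aspera (`ascp`) or plain FTP. This sketch builds either command for one file. The `ascp` flags are modelled on ENA's published Aspera example, and `wget -P` sets the target directory. The layout of `dl_path` (a server path such as `vol1/fastq/SRR000/SRR000001/SRR000001_1.fastq.gz`), the `aspera_key` parameter and the function name are all assumptions.

```python
import shlex

ENA_FTP_HOST = "ftp.sra.ebi.ac.uk"
ENA_ASPERA_HOST = "era-fasp@fasp.sra.ebi.ac.uk"


def ena_download_statement(dl_path, outdir, protocol="ascp",
                           aspera_key="asperaweb_id_dsa.openssh"):
    """Build a download command for one ENA file (hypothetical helper).

    Assumes dl_path is a path on the ENA FTP server, optionally prefixed
    with the host name, and that aspera_key points at the Aspera private key.
    """
    prefix = ENA_FTP_HOST + "/"
    if dl_path.startswith(prefix):
        dl_path = dl_path[len(prefix):]
    dl_path = dl_path.lstrip("/")
    if protocol == "ascp":
        # -QT, a 300 Mbps rate cap and port 33001, as in ENA's Aspera example.
        args = ["ascp", "-QT", "-l", "300m", "-P33001", "-i", aspera_key,
                f"{ENA_ASPERA_HOST}:{dl_path}", outdir]
    elif protocol == "ftp":
        # wget -P sets the directory the file is saved into.
        args = ["wget", "-P", outdir, f"ftp://{ENA_FTP_HOST}/{dl_path}"]
    else:
        raise ValueError(f"unsupported protocol {protocol!r}")
    return shlex.join(args)
```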
Sra.fetch_ENA_files(accession)
Get the names of the files matching the ENA accession.
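One way to list the files behind an accession is ENA's Portal API file report. It returns tab-separated text in which the `fastq_ftp` column holds the paths, with the two mates of a paired run separated by ';'. This is a sketch of that approach, not necessarily how Sra.fetch_ENA_files works, and the helper names are hypothetical. Parsing is kept separate from the network call so it can be tested offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession):
    """URL of ENA's file report listing the FASTQ FTP paths for `accession`."""
    query = urlencode({"accession": accession, "result": "read_run",
                       "fields": "fastq_ftp"})
    return f"{ENA_FILEREPORT}?{query}"


def parse_fastq_ftp(report_text):
    """Collect fastq_ftp paths from the tab-separated report; ';' separates mates."""
    lines = report_text.strip().splitlines()
    if not lines:
        return []
    column = lines[0].split("\t").index("fastq_ftp")
    files = []
    for line in lines[1:]:
        fields = line.split("\t")
        if column < len(fields) and fields[column]:
            files.extend(fields[column].split(";"))
    return files


def ena_fastq_files(accession, timeout=30):
    """Fetch the report over the network and return the FASTQ paths."""
    with urlopen(filereport_url(accession), timeout=timeout) as response:
        return parse_fastq_ftp(response.read().decode("utf-8"))
```

The returned paths include the `ftp.sra.ebi.ac.uk/` host prefix.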
Sra.fetch_TCGA_fastq(acc, filename, token=None, outdir='.')
Get a FASTQ file from the TCGA repository. Because of the nature of the TCGA repository, it assumes that:
* the data is paired-end FASTQ;
* the files end in _1.fastq or _2.fastq.
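The two assumptions above can be checked explicitly before processing, so that a mis-named download fails early. `pair_fastq_files` is a hypothetical helper that enforces the _1.fastq / _2.fastq convention and returns the mates in order.

```python
def pair_fastq_files(filenames):
    """Return (read1, read2) for a paired-end run named *_1.fastq / *_2.fastq.

    Raises ValueError when the files do not follow that convention.
    """
    read1 = [name for name in filenames if name.endswith("_1.fastq")]
    read2 = [name for name in filenames if name.endswith("_2.fastq")]
    if len(read1) != 1 or len(read2) != 1:
        raise ValueError(f"expected one _1.fastq and one _2.fastq file, got {sorted(filenames)}")
    # Both mates should share a stem, e.g. 'sample' in sample_1.fastq / sample_2.fastq.
    if read1[0][:-len("_1.fastq")] != read2[0][:-len("_2.fastq")]:
        raise ValueError("mate files do not share a common prefix")
    return read1[0], read2[0]
```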
Sra.fetch_TCGA_BAM(acc, token, outdir='.', filter_bed=None)
Get a BAM file from the TCGA repository based on its UUID. Returns the statement and the path/filename of the downloaded file. A BED file may be provided as filter_bed to remove contigs that are not present in the reference genome.
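The filter_bed option removes contigs missing from the reference. The sketch below shows that idea on uncompressed SAM text, for illustration only: the contig names come from the first column of the BED file, and @SQ header lines and alignments on any other contig are dropped. A real BAM would first be decoded with samtools or pysam, and this is not necessarily how Sra.fetch_TCGA_BAM applies the filter. Both function names are hypothetical.

```python
def bed_contigs(bed_lines):
    """Return the set of contig names (first column) in a BED file."""
    contigs = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank, comment, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contigs.add(line.split()[0])
    return contigs


def filter_sam_lines(sam_lines, contigs):
    """Keep header and alignment lines whose contig is listed in `contigs`.

    @SQ lines are kept only if their SN: tag is in `contigs`; other header
    lines pass through. Alignment lines are kept only if RNAME (column 3)
    is in `contigs`, so unmapped reads (RNAME '*') are dropped too.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@SQ"):
            names = [tag[3:] for tag in line.rstrip("\n").split("\t") if tag.startswith("SN:")]
            if names and names[0] in contigs:
                kept.append(line)
        elif line.startswith("@"):
            kept.append(line)
        else:
            fields = line.split("\t")
            if len(fields) > 2 and fields[2] in contigs:
                kept.append(line)
    return kept
```

The sketch does not rewrite mate fields (RNEXT) that point at a removed contig. A production filter would need to handle those as well.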
Sra.process_remote_BAM(infile, token=None, outdir='.', filter_bed=None)
Generate a statement from a .remote file.