Glossary
File formats
- yaml
Language to serialize objects. Used in the CGAT testing framework. (YAML).
- bam
Compressed binary format for storing genomic alignments (BAM).
- bed
File containing genomic intervals. (BED).
- vcf
- gtf
General transfer format. Format to store genes and transcripts.
- gff
- bigwig
Compressed format for displaying numerical values across genomic ranges (BIGWIG).
- fasta
Sequence format.
- wiggle
Format for displaying numerical values across genomic ranges (Wiggle).
- psl
Genomic alignment format (PSL).
- sam
Format to store genomic alignments (SAM).
- gdl
gdl
- tsv
Tab-separated values. In these tables, records are separated by new-line characters and fields by tab characters. Comment lines start with the # character and are ignored. The first uncommented line should contain the column headers. For example:
    # This is a comment
    gene_id    length
    gene1      1000
    gene2      2000
    # Another comment
- svg
pass
- edge list
pass
- fastq
Sequence format containing quality scores (FASTQ).
- sra
sra
- axt
axt
- agp
- rdf
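The tsv conventions described above (comment lines starting with #, headers on the first uncommented line, tab-separated fields) can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `read_tsv` is hypothetical and is not part of the CGAT code base.

```python
import csv
import io


def read_tsv(handle):
    """Yield one dict per record from a tab-separated table.

    Hypothetical helper for illustration only, not a CGAT function.
    """
    # Drop blank lines and comment lines (those starting with '#').
    data_lines = (line for line in handle
                  if line.strip() and not line.startswith("#"))
    # DictReader treats the first remaining line as the column headers.
    yield from csv.DictReader(data_lines, delimiter="\t")


# The example table from the tsv entry, with real tab characters.
example = io.StringIO(
    "# This is a comment\n"
    "gene_id\tlength\n"
    "gene1\t1000\n"
    "gene2\t2000\n"
    "# Another comment\n"
)
records = list(read_tsv(example))
print(records)
```

All values are returned as strings; converting a column such as length to an integer is left to the caller.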
Other terms
- test directory
Directory that contains the test.yaml, input and reference files for testing scripts.
- experiment
experiment
- replicate
replicate
- graph
graph
- track
track
- submit host
pass
- execution host
pass
- task
pass
- sphinxreport
sphinxreport
- query
pass
- target
pass
- code directory
pass
- go
pass
- goslim
pass
- tss
Transcription start site
- production pipeline
A pipeline that performs common tasks on a certain type of data. The idea of a production pipeline is to provide common preprocessing of data and a first look. A project pipeline might then take data from one or more production pipelines to glean biological insight.
- project pipeline
A pipeline that is project specific. Usually code is developed first inside a project pipeline. When it becomes generally useful, it may be refactored into a production pipeline.
- stdin
Unix standard input. Most CGAT tools read data from stdin.
- stdout
Unix standard output. Most CGAT tools output data to stdout.
- stderr
Unix standard error. This is where errors go.
- loglevel
Verbosity of logging information. The logging level is set with the --verbose option. A level of 0 means no logging output, 1 outputs information messages only, and 2 also outputs debugging information.
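The stdin, stdout, stderr and loglevel entries above can be tied together in a small filter script. The sketch below is not CGAT's implementation: the function `main`, the logger name, the upper-casing step, and the choice to send log messages to stderr are all assumptions made for illustration. Only the --verbose option and the meaning of levels 0, 1 and 2 come from the glossary.

```python
import argparse
import io
import logging
import sys

# Assumed mapping from the --verbose value to Python logging levels:
# 0 = no output, 1 = information messages, 2 = information plus debugging.
LEVELS = {0: logging.CRITICAL + 1, 1: logging.INFO, 2: logging.DEBUG}


def main(argv, stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Read lines from stdin, write them upper-cased to stdout.

    Hypothetical example tool; log messages go to stderr here.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", type=int, choices=sorted(LEVELS),
                        default=1)
    args = parser.parse_args(argv)

    logger = logging.getLogger("glossary_example")
    logger.handlers.clear()          # avoid duplicate handlers on re-use
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stderr))
    logger.setLevel(LEVELS[args.verbose])

    count = 0
    for line in stdin:
        logger.debug("read line: %s", line.rstrip("\n"))
        stdout.write(line.upper())
        count += 1
    logger.info("processed %d lines", count)
    return count


# Demonstration with in-memory streams instead of the real stdin/stdout.
out, err = io.StringIO(), io.StringIO()
main(["--verbose", "1"], io.StringIO("gene1\ngene2\n"), out, err)
print(out.getvalue(), end="")
print(err.getvalue(), end="")
```

Passing the streams as arguments keeps the function testable; on the command line it would be called as `main(sys.argv[1:])` so that data flows from stdin to stdout in a Unix pipe.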