bed2fasta.py - get sequences from bed file

Tags

Genomics Intervals Sequences Conversion BED FASTA

Purpose

This script outputs the nucleotide sequence of each interval in a BED-formatted file, using a corresponding indexed genome file.

Usage

A required input to bed2fasta.py is a CGAT-indexed genome. To obtain an indexed human reference genome, we would type

Example::

cat hg19.fasta | index_fasta.py hg19 > hg19.log

The resulting index (hg19) would then serve as the --genome-file argument when we wish to extract sequences from a BED-formatted file.

For example, we could now type:

cat in.bed | python bed2fasta.py --genome-file hg19 > out.fasta

This takes a set of genomic intervals (e.g. from a human ChIP-seq experiment) and outputs their respective nucleotide sequences.
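To make the interval-to-sequence step concrete, the following is a minimal sketch in plain Python. It is not the code of bed2fasta.py: the function names parse_bed and bed_to_fasta, the in-memory genome dictionary (standing in for the indexed genome) and the FASTA header layout are all assumptions made for illustration. The only convention relied on is that BED coordinates are 0-based and half-open.

```python
def parse_bed(lines):
    """Yield (chrom, start, end, name) from BED lines.

    BED coordinates are 0-based and half-open, so they map directly
    onto Python slices. `name` is None when the fourth column is absent.
    """
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else None
        yield chrom, start, end, name


def bed_to_fasta(bed_lines, genome):
    """Return FASTA text with one record per BED interval.

    `genome` is a hypothetical dict mapping chromosome name to sequence;
    the real script reads from an indexed genome on disk instead.
    The header layout used here is illustrative only.
    """
    records = []
    for chrom, start, end, name in parse_bed(bed_lines):
        location = f"{chrom}:{start}-{end}"
        header = f"{name} {location}" if name else location
        sequence = genome[chrom][start:end]
        records.append(f">{header}\n{sequence}\n")
    return "".join(records)


if __name__ == "__main__":
    toy_genome = {"chr1": "ACGTACGTAC"}
    toy_bed = ["chr1\t2\t6\tpeak1"]
    print(bed_to_fasta(toy_bed, toy_genome), end="")
```

Run on the toy input, the sketch prints a single record whose sequence is the four bases at positions 2 to 5 of chr1. Holding the whole genome in a dictionary is only workable for small examples; an index avoids loading full chromosomes into memory.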

Type:

python bed2fasta.py --help

for command line help.

Command line options

usage: bed2fasta [-h] [-g GENOME_FILE] [-m {dust,dustmasker,softmask,none}]
                 [--output-mode {intervals,leftright,segments}]
                 [--min-sequence-length MIN_LENGTH]
                 [--max-sequence-length MAX_LENGTH]
                 [--extend-at {none,3,5,both,3only,5only}]
                 [--extend-by EXTEND_BY] [--use-strand] [--timeit TIMEIT_FILE]
                 [--timeit-name TIMEIT_NAME] [--timeit-header]
                 [--random-seed RANDOM_SEED] [-v LOGLEVEL]
                 [--log-config-filename LOG_CONFIG_FILENAME]
                 [--tracing {function}] [-? ?] [-I STDIN] [-L STDLOG]
                 [-E STDERR] [-S STDOUT]