Provided by: gmap_2021-12-17+ds-3_amd64
NAME
gmap_build - Tool for genome database creation for GMAP or GSNAP
SYNOPSIS
gmap_build [options...] -d <genome> [-c <transcriptome> -T <transcript_fasta>] <genome_fasta_files>
DESCRIPTION
gmap_build builds a GMAP database for a genome, for use by GMAP or GSNAP. It is part of the GMAP package, version 2021-12-17.

You are free to name <genome> and <transcriptome> as you wish. Use the same names when you subsequently perform alignments with GMAP or GSNAP.

Note: To add transcriptome information to an existing genome database, run gmap_build with the transcriptome options only; there is no need to specify the genome_fasta_files again.
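The two workflows described above can be sketched as follows. This is an illustration only: the database names mygenome and mytranscripts and the file names genome.fa and transcripts.fa are hypothetical placeholders, and the commands use only the flags shown in the SYNOPSIS. The script prints the commands as a dry run when gmap_build or the input files are not present.

```shell
#!/bin/sh
# Hypothetical names: substitute your own database names and FASTA paths.
GENOME=mygenome
TRANSCRIPTOME=mytranscripts
GENOME_FASTA=genome.fa
TRANSCRIPT_FASTA=transcripts.fa

# Step 1: build the genome database from the genome FASTA file.
BUILD_CMD="gmap_build -d $GENOME $GENOME_FASTA"

# Step 2: add a transcriptome to the existing genome database.
# No genome FASTA files are given in this case.
ADD_CMD="gmap_build -d $GENOME -c $TRANSCRIPTOME -T $TRANSCRIPT_FASTA"

if command -v gmap_build >/dev/null 2>&1 \
    && [ -f "$GENOME_FASTA" ] && [ -f "$TRANSCRIPT_FASTA" ]; then
    # Run the two steps in order; stop if the genome build fails.
    $BUILD_CMD && $ADD_CMD
else
    # Dry run: only show what would be executed.
    echo "$BUILD_CMD"
    echo "$ADD_CMD"
fi
```

The same names (here mygenome and mytranscripts) are the ones to pass to GMAP or GSNAP later.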
OPTIONS
    -D, --dir=STRING
        Destination directory for installation (defaults to gmapdb directory specified at configure time)

    -d, --genomedb=STRING
        Genome name (required)

    -n, --names=STRING
        Substitute names for contigs, provided in a file. The file can have two formats:

        1. A file with one column per line, with each line corresponding to a FASTA file, in the order given to gmap_build. The chromosome name for each FASTA file will be replaced with the desired chromosome name in the file. Every chromosome in the FASTA must have a corresponding line in the file. This is useful if you want to rename chromosomes with a systematic numbering pattern.

        2. A file with two columns per line, separated by white space. In each line, the original FASTA chromosome name should be in column 1 and the desired chromosome name will be in column 2.

        The meaning of file format 2 depends on whether --limit-to-names is specified. If so, the genome build will be limited to those chromosomes in this file. Otherwise, all chromosomes in the FASTA file will be included, but only those chromosomes in this file will be re-named, which provides an easy way to change just a few chromosome names.

        This file can be combined with the --sort=names option, in which the order of chromosomes is that given in the file. In this case, every chromosome must be listed in the file, and for chromosome names that should not be changed, column 2 can be blank (or the same as column 1). The option of a blank column 2 is allowed only when specifying --sort=names, because otherwise, the program cannot distinguish between a 1-column and 2-column names file.

    -L, --limit-to-names
        Determines whether to limit the genome build to the lines listed in the --names file. You can limit a genome build to certain chromosomes with this option, plus a --names file that either renames chromosomes, or lists the same names in both columns for the desired chromosomes.
    -k, --kmer=INT
        k-mer value for genomic index (allowed: 15 or less, default is 15)

    -q INT
        Sampling interval for genome (allowed: 1-3, default 3)

    -s, --sort=STRING
        Sort chromosomes using given method:
            none - use chromosomes as found in FASTA file(s) (default)
            alpha - sort chromosomes alphabetically (chr10 before chr1)
            numeric-alpha - chr1, chr1U, chr2, chrM, chrU, chrX, chrY
            chrom - chr1, chr2, chrM, chrX, chrY, chr1U, chrU
            names - sort chromosomes based on file provided to --names flag

    -g, --gunzip
        Files are gzipped, so need to gunzip each file first

    -E, --fasta-pipe=STRING
        Interpret argument as a command, instead of a list of FASTA files

    -Q, --fastq
        Files are in FASTQ format

    -R, --revcomp
        Reverse complement all contigs

    -w INT
        Wait (sleep) this many seconds after each step (default 2)

    -o, --circular=STRING
        Circular chromosomes (either a list of chromosomes separated by a comma, or a filename containing circular chromosomes, one per line). If you use the --names feature, then you should use the substitute name of the chromosome, not the original name, for this option. (NOTE: This behavior is different from previous versions, and starts with version 2020-10-20.)

    -2, --altscaffold=STRING
        File with alt scaffold info, listing alternate scaffolds, one per line, tab-delimited, with the following fields: (1) alt_scaf_acc, (2) parent_name, (3) orientation, (4) alt_scaf_start, (5) alt_scaf_stop, (6) parent_start, (7) parent_end.

    -e, --nmessages=INT
        Maximum number of messages (warnings, contig reports) to report (default 50)

Options for older genome formats:

    -M, --mdflag=STRING
        Use MD file from NCBI for mapping contigs to chromosomal coordinates

    -C, --contigs-are-mapped
        Find a chromosomal region in each FASTA header line. Useful for contigs that have been mapped to chromosomal coordinates. Ignored if the --mdflag is provided.
Options for transcriptome-guided alignment:

    -c, --transcriptomedb=STRING
        Transcriptome name

    -T, --transcripts=FILE
        FASTA file containing transcripts (required if specifying --transcriptomedb)

    -t, --nthreads=INT
        Number of threads for GMAP alignment of transcripts to genome (default 8)

Other tools of the GMAP suite are located in /usr/lib/gmap.
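EXAMPLES
The following is a minimal sketch of how the --names, --limit-to-names, and --circular options might be combined. The accession-to-name mapping, the names.txt file name, the mygenome database name, and the genome.fa input are all hypothetical placeholders chosen for illustration. The script writes a two-column names file and prints the command as a dry run when gmap_build or the input file is not present.

```shell
#!/bin/sh
# Hypothetical inputs: substitute your own names and paths.
NAMES_FILE=names.txt
GENOME_FASTA=genome.fa

# Two-column names file: original FASTA name, then the desired name.
# printf reuses the format for each pair of arguments, giving three lines.
printf '%s\t%s\n' \
    NC_000001.11 chr1 \
    NC_000002.12 chr2 \
    NC_012920.1 chrM > "$NAMES_FILE"

# Limit the build to the chromosomes listed in the names file, and mark
# the mitochondrial genome as circular using its substitute name (chrM),
# not its original name.
CMD="gmap_build -d mygenome --names=$NAMES_FILE --limit-to-names --circular=chrM $GENOME_FASTA"

if command -v gmap_build >/dev/null 2>&1 && [ -f "$GENOME_FASTA" ]; then
    $CMD
else
    # Dry run: only show what would be executed.
    echo "$CMD"
fi
```

Without --limit-to-names, the same names file would rename only the three listed chromosomes and include all others in the FASTA file unchanged.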