Ubuntu Manpage: gmap_build - Tool for genome database creation for GMAP or GSNAP

Provided by: gmap_2021-12-17+ds-3_amd64

NAME

       gmap_build - Tool for genome database creation for GMAP or GSNAP

SYNOPSIS

       gmap_build   [options...]   -d   <genome>   [-c   <transcriptome>  -T  <transcript_fasta>]
       <genome_fasta_files>

DESCRIPTION

       gmap_build builds a GMAP database for a genome, for use by GMAP or GSNAP.  Part of the
       GMAP package, version 2021-12-17.

       You  are  free  to  name  <genome> and <transcriptome> as you wish.  You will use the same
       names when performing alignments subsequently using GMAP or GSNAP.

       Note: When adding transcriptome information to an existing genome database, there is no
       need to specify the <genome_fasta_files>.
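
       As a minimal sketch of the two call patterns described above (building a genome, then
       adding a transcriptome to it), the Python helper below assembles an argument list using
       only the -D, -d, -c and -T options.  The helper name gmap_build_argv and all genome
       names and file paths are hypothetical, chosen for illustration.

```python
import shlex


def gmap_build_argv(genome, fasta_files, dest_dir=None,
                    transcriptome=None, transcripts=None):
    """Assemble a gmap_build argument list following the SYNOPSIS."""
    argv = ["gmap_build"]
    if dest_dir is not None:
        argv += ["-D", dest_dir]
    argv += ["-d", genome]
    if transcriptome is not None:
        # -T is required whenever -c is given.
        if transcripts is None:
            raise ValueError("-T <transcript_fasta> is required with -c")
        argv += ["-c", transcriptome, "-T", transcripts]
    argv += list(fasta_files)
    return argv


# Hypothetical genome name, directory and FASTA files.
build = gmap_build_argv("mygenome", ["chr1.fa", "chr2.fa"],
                        dest_dir="/data/gmapdb")

# Adding a transcriptome to the existing genome: no genome FASTA files.
add_tx = gmap_build_argv("mygenome", [], dest_dir="/data/gmapdb",
                         transcriptome="mytx", transcripts="transcripts.fa")

print(shlex.join(build))
print(shlex.join(add_tx))
```

       On a system where gmap_build is installed, either list could be passed to
       subprocess.run(argv, check=True).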

OPTIONS

       -D, --dir=STRING
              Destination directory for installation (defaults to gmapdb directory  specified  at
              configure time)

       -d, --genomedb=STRING
              Genome name (required)

       -n, --names=STRING
              Substitute names for contigs, provided in a file.

              The file can have two formats:

       1.     A  file  with one column per line, with each line corresponding to a FASTA file, in
              the order given to gmap_build.  The chromosome name for each  FASTA  file  will  be
              replaced  with  the  desired  chromosome name in the file.  Every chromosome in the
              FASTA must have a corresponding line in the file.  This is useful if  you  want  to
              rename chromosomes with a systematic numbering pattern.

       2.     A  file  with  two  columns  per line, separated by white space.  In each line, the
              original FASTA chromosome name should be in column 1  and  the  desired  chromosome
              name will be in column 2.

              The  meaning of file format 2 depends on whether --limit-to-names is specified.  If
              so, the genome build will be limited to those chromosomes in this file.  Otherwise,
              all  chromosomes  in the FASTA file will be included, but only those chromosomes in
              this file will be re-named, which provides  an  easy  way  to  change  just  a  few
              chromosome names.

              This file can be combined with the --sort=names option, in which case the order of
              chromosomes is that given in the file.  Every chromosome must then be listed in the
              file, and for chromosome names that should not be changed, column 2 can be blank (or
              the same as column 1).  A blank column 2 is allowed only with --sort=names, because
              otherwise the program cannot distinguish between a 1-column and a 2-column names
              file.
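
              The two file formats can be generated with a few lines of Python.  This is only a
              sketch: the function names are hypothetical, the accession-style name is an example,
              and a tab is assumed to be an acceptable white-space separator for format 2.

```python
def one_column_names(n_sequences, prefix="chr"):
    """Format 1: one substitute name per line, in input order (chr1, chr2, ...)."""
    return "".join(f"{prefix}{i}\n" for i in range(1, n_sequences + 1))


def two_column_names(renames):
    """Format 2: original name in column 1, desired name in column 2."""
    return "".join(f"{old}\t{new}\n" for old, new in renames.items())


# Hypothetical inputs: three sequences to number, and one rename.
format1_text = one_column_names(3)
format2_text = two_column_names({"NC_000001.11": "chr1"})

with open("names.txt", "w") as handle:
    handle.write(format2_text)
```

              The resulting names.txt would then be supplied with --names=names.txt.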

       -L, --limit-to-names
              Limit the genome build to the chromosomes listed in the --names file.  To restrict
              a build to certain chromosomes, combine this option with a --names file that either
              renames those chromosomes or lists the same name in both columns for each of them.
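
              The interaction between a two-column --names file and --limit-to-names can be
              modeled as below.  This is an illustrative model of the documented behavior, not
              code from gmap_build; the function name and the scaffold names are hypothetical, and
              the default --sort=none order is assumed.

```python
def resulting_names(fasta_names, renames, limit_to_names=False):
    """Return the chromosome names a build would contain, in FASTA order."""
    result = []
    for name in fasta_names:
        if name in renames:
            # Listed in the names file: use the name from column 2.
            result.append(renames[name])
        elif not limit_to_names:
            # Not listed: kept unchanged unless the build is limited.
            result.append(name)
    return result


fasta_names = ["scaf1", "scaf2", "scaf3"]
renames = {"scaf1": "chr1", "scaf3": "scaf3"}  # scaf3 keeps its name

without_limit = resulting_names(fasta_names, renames)
with_limit = resulting_names(fasta_names, renames, limit_to_names=True)
```

              Without the flag all three sequences are kept and only scaf1 is renamed; with the
              flag, scaf2 is dropped because it is not listed.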

       -k, --kmer=INT
              k-mer value for genomic index (allowed: 15 or less, default is 15)

       -q INT Sampling interval for genome (allowed: 1-3, default 3)

       -s, --sort=STRING
              Sort chromosomes using the given method:

              none           use chromosomes as found in FASTA file(s) (default)
              alpha          sort chromosomes alphabetically (chr10 before chr2)
              numeric-alpha  chr1, chr1U, chr2, chrM, chrU, chrX, chrY
              chrom          chr1, chr2, chrM, chrX, chrY, chr1U, chrU
              names          sort chromosomes based on file provided to --names flag
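
              To illustrate the alpha method, the Python snippet below sorts a few hypothetical
              chromosome names with plain string comparison, which places chr10 ahead of chr2.  It
              assumes that gmap_build compares names character by character in the same way; the
              other methods are not modeled here.

```python
# Hypothetical chromosome names, in the order they might appear in a FASTA file.
fasta_order = ["chr1", "chr2", "chr10", "chrM", "chrX"]

# Plain lexicographic comparison, as performed by sorted() on strings.
alpha_order = sorted(fasta_order)

print(alpha_order)  # ['chr1', 'chr10', 'chr2', 'chrM', 'chrX']
```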

       -g, --gunzip
              Files are gzipped, so each file needs to be gunzipped first

       -E, --fasta-pipe=STRING
              Interpret argument as a command, instead of a list of FASTA files

       -Q, --fastq
              Files are in FASTQ format

       -R, --revcomp
              Reverse complement all contigs

       -w INT Wait (sleep) this many seconds after each step (default 2)

       -o, --circular=STRING
              Circular  chromosomes  (either  a  list  of  chromosomes separated by a comma, or a
              filename containing circular chromosomes, one per line).  If you  use  the  --names
              feature,  then  you  should  use  the  substitute  name  of the chromosome, not the
              original name, for this option.  (NOTE: This behavior is  different  from  previous
              versions, and starts with version 2020-10-20.)
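
              Because --circular expects substitute names when --names is in use, a small helper
              can translate original names before building the comma-separated list.  This is a
              sketch only; the function name and the accession-style names are hypothetical.

```python
def circular_argument(circular_originals, renames):
    """Map original circular chromosome names to their substitute names
    and join them with commas, as accepted by --circular."""
    return ",".join(renames.get(name, name) for name in circular_originals)


# Hypothetical renames taken from a two-column --names file.
renames = {"NC_012920.1": "chrM", "NC_000001.11": "chr1"}

# NC_012920.1 is renamed to chrM; plasmidA has no substitute and is kept.
circular_value = circular_argument(["NC_012920.1", "plasmidA"], renames)
print(circular_value)  # chrM,plasmidA
```

              The resulting string would be passed as --circular=chrM,plasmidA.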

       -2, --altscaffold=STRING
              File   with   alt  scaffold  info,  listing  alternate  scaffolds,  one  per  line,
              tab-delimited, with the following fields: (1) alt_scaf_acc,  (2)  parent_name,  (3)
              orientation,   (4)   alt_scaf_start,   (5)  alt_scaf_stop,  (6)  parent_start,  (7)
              parent_end.
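
              A line in this file can be produced by joining the seven fields with tabs, as in the
              sketch below.  Every value shown is a made-up placeholder, and the use of "+" for the
              orientation field is an assumption, since this page does not specify how orientation
              is encoded.

```python
ALT_SCAFFOLD_FIELDS = (
    "alt_scaf_acc", "parent_name", "orientation",
    "alt_scaf_start", "alt_scaf_stop", "parent_start", "parent_end",
)


def alt_scaffold_line(record):
    """Format one alt scaffold record as a tab-delimited line."""
    missing = [field for field in ALT_SCAFFOLD_FIELDS if field not in record]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return "\t".join(str(record[field]) for field in ALT_SCAFFOLD_FIELDS) + "\n"


# Placeholder record; none of these values come from a real assembly.
example = {
    "alt_scaf_acc": "ALT_EXAMPLE_1",
    "parent_name": "chr1",
    "orientation": "+",  # assumed encoding
    "alt_scaf_start": 1,
    "alt_scaf_stop": 1000,
    "parent_start": 5001,
    "parent_end": 6000,
}

line = alt_scaffold_line(example)
```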

       -e, --nmessages=INT
              Maximum number of messages (warnings, contig reports) to report (default 50)

   Options for older genome formats:
       -M, --mdflag=STRING
              Use MD file from NCBI for mapping contigs to chromosomal coordinates

       -C, --contigs-are-mapped
              Find a chromosomal region in each FASTA header line.  Useful for contigs that  have
              been mapped to chromosomal coordinates.  Ignored if the --mdflag is provided.

   Options for transcriptome-guided alignment:
       -c, --transcriptomedb=STRING
              Transcriptome name

       -T, --transcripts=FILE
              FASTA file containing transcripts (required if specifying --transcriptomedb)

       -t, --nthreads=INT
              Number of threads for GMAP alignment of transcripts to genome (default 8)

       Other tools of the GMAP suite are located in /usr/lib/gmap