Provided by: gmap_2011-11-30-1_amd64


       gmap_setup - create a genome database for GMAP or GSNAP


       gmap_setup -d <genomename> [-D <destdir>] [-o <Makefile>] <fasta_file>...


       -d     genome name

       -D     destination  directory  for installation (defaults to gmapdb directory specified at
              configure time)

       -o     name of output Makefile (default is "Makefile.<genome>")

       -M     use coordinates from an .md file (e.g., file from NCBI)

       -C     try to parse chromosomal coordinates from each FASTA header

       -E     interpret argument as a command, instead of a list of FASTA files

       -O     order chromosomes in numeric/alphabetic order (0 = no, 1 = yes (default))

   Advanced options
       -W     write some output directly to file, instead of  using  RAM  (use  only  if  RAM  is
              limited)

       -q     GMAP indexing interval (default: 3 nt)

       -Q     PMAP indexing interval (default: 6 aa)


       If  you  want  to treat each FASTA entry as a separate chromosome (either because it is in
       fact  an  entire  chromosome  or  because  you  have  contigs  without   any   chromosomal
       information), you can simply call gmap_setup like this:

              gmap_setup -d <genome> <fasta_file>...

       The  accession of each FASTA header (the word following each ">") will be the name of each
       chromosome. GMAP can handle an unlimited number of "chromosomes",  with  arbitrarily  long
       names.  In  this  way,  GMAP  could  be used as a general search program for near-identity
       matches against a FASTA file.
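
       To preview the chromosome names that a set of FASTA files would produce, the headers can
       be inspected before running gmap_setup. The following is a minimal sketch, not part of
       gmap_setup itself; the function name fasta_accessions and the sample file contents are
       made up for illustration. It assumes the accession is the first whitespace-delimited
       word after ">".

```shell
# Print the first word after ">" on every FASTA header line.
# Reads the named files, or standard input when no file is given.
fasta_accessions() {
    awk '/^>/ { sub(/^>/, ""); print $1 }' "$@"
}

# Demonstration on a throwaway file (contents are invented for this example).
demo_fa=$(mktemp)
printf '>chr1 sample description\nACGT\n>contig_42\nGGCC\n' > "$demo_fa"
fasta_accessions "$demo_fa"
rm -f "$demo_fa"
```

       Piping the result through "sort | uniq -d" is one way to spot duplicate names before
       building the database.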

       -M and -C
              If your sequences represent contigs  that  have  mapping  information  to  specific
              chromosomal  regions,  then  you  can  have  gmap_setup  try to read each header to
              determine its chromosomal region (the -C flag) or read an .md  file  that  contains
              information  about  chromosomal  regions  (the  -M  flag).  The .md files are often
              provided in NCBI releases, but since the  formats  change  often,  gmap_setup  will
              prompt you to make sure it parses it correctly.

       -E     If  you  need  to  pre-process the FASTA files before using these programs, perhaps
              because they are compressed or because you need to insert  chromosomal  information
              in  the  header  lines,  you can specify a command instead of multiple fasta_files,
              like these examples:

               gmap_setup -d <genome> -E 'gunzip -c genomefiles.gz'
               gmap_setup -d <genome> -E 'cat *.fa | ./'
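
       Before handing a command to -E, it can help to confirm that the command really writes
       FASTA to standard output. The sketch below is an illustration only: the helper name
       emits_fasta is hypothetical, and it assumes (as the gunzip -c example suggests) that
       the command's standard output is what gmap_setup reads.

```shell
# Return success if the given command string prints a line starting
# with ">" as its first line of standard output.
emits_fasta() {
    first_line=$(sh -c "$1" 2>/dev/null | head -n 1)
    case $first_line in
        '>'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Demonstration with a throwaway gzipped file (contents are invented).
demo_dir=$(mktemp -d)
printf '>chr1\nACGT\n' | gzip -c > "$demo_dir/genome.fa.gz"
if emits_fasta "gunzip -c $demo_dir/genome.fa.gz"; then
    echo "command produces FASTA"
fi
rm -rf "$demo_dir"
```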

       -W     The gmap_setup process works best if you have a computer with enough  RAM  to  hold
              the entire genome (e.g., 3 gigabytes for a human- or mouse-sized genome). Since the
              resulting genome files work across all machine  architectures,  you  can  find  any
              machine  with  sufficient RAM to build the genome files and then transfer the files
              to another machine. (GMAP itself runs fine on machines with limited  RAM.)  If  you
              cannot find any machine with sufficient RAM for gmap_setup, you can run the program
              with the -W flag to write the files directly, but this can be very slow.
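
       A rough way to decide whether -W is likely to be needed is to compare the total size of
       the FASTA input with the machine's RAM. This is only a heuristic sketch: it assumes
       about one byte of RAM per byte of FASTA input (in line with the 3 gigabyte figure for a
       human-sized genome), the helper names are hypothetical, and linux_ram_bytes works only
       where /proc/meminfo exists.

```shell
# Total size in bytes of the given FASTA files.
total_bytes() {
    cat "$@" | wc -c | tr -d ' '
}

# Total RAM in bytes on Linux (MemTotal is reported in kB).
linux_ram_bytes() {
    awk '/^MemTotal:/ { printf "%.0f\n", $2 * 1024 }' /proc/meminfo
}

# Succeed when the input is larger than the available RAM,
# i.e. when running with -W may be worth considering.
needs_w_flag() {
    fasta_bytes=$1
    ram_bytes=$2
    [ "$fasta_bytes" -gt "$ram_bytes" ]
}

# Demonstration with fixed numbers: 3 GB of sequence, 2 GB of RAM.
if needs_w_flag 3000000000 2000000000; then
    echo "input exceeds RAM: consider -W or a larger machine"
fi
```

       On a real system the two arguments could come from total_bytes and linux_ram_bytes.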

       -q and -Q
              If you specify a smaller interval (for example, 3 for the GMAP interval),  you  can
              create  a  higher-resolution  database,  which  can  be  useful  for  mapping small
              oligomers (smaller than 18 nt). However, the corresponding genome index files  will
              be  larger  (twice  as big if you specify -q 3). These index files may exceed the 2
              gigabyte file offset limit on some computers, and will therefore fail  to  work  on
              those computers.
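
       After building a database with a small interval, the resulting files can be checked
       against the 2 gigabyte offset limit before copying them to another machine. This is a
       sketch under assumptions: the limit is taken to be the largest signed 32-bit offset
       (2147483647 bytes), the helper names are hypothetical, and the directory to scan is
       whatever location holds your genome files.

```shell
# Largest file offset representable in a signed 32-bit integer.
LIMIT_2GB=2147483647

# Succeed when the given byte count is over the limit.
exceeds_2gb() {
    [ "$1" -gt "$LIMIT_2GB" ]
}

# List regular files in a directory whose size is over the limit.
report_large_files() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f" | tr -d ' ')
        if exceeds_2gb "$size"; then
            echo "$f: $size bytes (over 2 GB limit)"
        fi
    done
}
```

       For example, "report_large_files /path/to/genome/files" prints nothing when every file
       is within the limit.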


       Thomas D. Wu and Colin K. Watanabe


       Report bugs to Thomas Wu <>.


       Copyright 2005 Genentech, Inc. All rights reserved.


       gmap(1), gsnap(1)