Provided by: ncbi-tools-bin_6.1.20120620-7_amd64

NAME

       fa2htgs - formatter for high throughput genome sequencing project submissions

SYNOPSIS

       fa2htgs  [-]  [-6 str]  [-7 str]  [-A filename]  [-C str] [-D] [-L filename] [-M str] [-N]
       [-O filename] [-P str] [-Q filename] [-S str] [-T filename] [-X] [-a str] [-b N]  [-c str]
       [-d str]  [-e filename]  [-f]  -g str [-h str] [-i filename] [-k str] [-l N] [-m] [-n str]
       [-o filename] [-p N] [-q] [-r str] -s str [-t filename] [-u] [-v] [-w] [-x str]

DESCRIPTION

       fa2htgs is a program used to generate Seq-submits (ASN.1 sequence submission files) for
       high throughput genome sequencing projects.

       fa2htgs will read a FASTA file (or an Ace Contig file with Phrap sequence quality values),
       a Sequin submission template file (to get contact and citation information for the
       submission), and a series of command line arguments (see below).  The program then
       combines this information to make a submission suitable for GenBank.  Once you have
       generated your submission file, you need to follow the submission protocol (see the README
       present on your FTP account or mailed out to your Center).

       fa2htgs is intended for scripted, automated bulk submission of unannotated genome
       sequence.  It can easily be extended from its current simple form to allow more
       complicated processing.  A submission prepared with fa2htgs can also be read into
       Psequin(1) and then annotated more extensively.
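
       As a rough sketch of the scripted use described above, the following Python helper
       assembles an fa2htgs command line from flags documented in OPTIONS.  The function
       name, the center tag, the sequence name and all file names are hypothetical; only
       the flags themselves come from this page.

```python
def build_fa2htgs_command(center, seq_name, fasta, template="template.sub",
                          output=None, phase=1, length=None):
    """Return an argv list for fa2htgs using flags documented on this page."""
    argv = [
        "fa2htgs",
        "-g", center,       # Genome Center tag (required)
        "-s", seq_name,     # sequence name, unique within the center (required)
        "-i", fasta,        # FASTA input file
        "-t", template,     # Seq-submit template
        "-p", str(phase),   # HTGS phase: 1, 2 or 3
    ]
    if length is not None:
        argv += ["-l", str(length)]   # length of sequence in bp
    if output is not None:
        argv += ["-o", output]        # ASN.1 output file (default is stdout)
    return argv


# Hypothetical values, for illustration only.
cmd = build_fa2htgs_command("mycenter", "clone42", "clone42.fa",
                            output="clone42.sqn", phase=1, length=150000)
print(" ".join(cmd))

# To run it for real (requires fa2htgs on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

       Building the argument list rather than a shell string avoids quoting problems when
       a script loops over many clones.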

       Questions and concerns about this processing protocol, or about how to use this tool,
       should be forwarded to <htgs@ncbi.nlm.nih.gov>.

OPTIONS

       A summary of options is included below.

       -      Print usage message

       -6 str SP6 clone (e.g., Contig1,left)

       -7 str T7 clone (e.g., Contig2,right)

       -A filename
              Filename for accession list input (mutually exclusive with -T and -i).   The  input
              file contains a tab-delimited table with three to five columns, which are accession
              number, start position, stop position, and  (optionally)  length  and  strand.   If
              start  >  stop,  the  minus  strand  on the referenced accession is used.  A gap is
              indicated by the word "gap" instead of an accession,  0  for  the  start  and  stop
              positions, and a number for the length.
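
              The table layout described above could be produced by a script along these
              lines.  This is only a sketch: the helper names and the coordinates are made
              up, the column order (length fourth, strand fifth) is inferred from the
              description, and the strand column is left out because its encoding is not
              specified here.

```python
def accession_row(accession, start, stop, length=None, strand=None):
    """Format one tab-delimited row: accession, start, stop, [length, [strand]]."""
    if strand is not None and length is None:
        # Assumption: strand is the fifth column, so the fourth must be present.
        raise ValueError("strand requires length to be given as well")
    fields = [accession, str(start), str(stop)]
    if length is not None:
        fields.append(str(length))
    if strand is not None:
        fields.append(strand)
    return "\t".join(fields)


def gap_row(length):
    """A gap: the word "gap", 0 for start and stop, and a number for the length."""
    return accession_row("gap", 0, 0, length)


def uses_minus_strand(row):
    """True when start > stop, i.e. the minus strand of the accession is used."""
    _, start, stop = row.split("\t")[:3]
    return int(start) > int(stop)


rows = [
    accession_row("U10000", 1, 5000),   # start < stop: plus strand
    gap_row(100),
    accession_row("L11000", 3000, 1),   # start > stop: minus strand
]
table = "\n".join(rows) + "\n"          # contents for the file passed to -A
```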

       -C str Clone library name (will appear as /clone-lib="str" on the source feature)

       -D     HTGS_DRAFT sequence

       -L filename
              Read  phrap  contig  order from filename.  This is a tab-delimited file that can be
              used to drive the  order  of  contigs  (normally  specified  by  -P),  as  well  as
              indicating  the  SP6 and T7 ends.  It can also be used when contigs are known to be
              in opposite orientation.  For example:

                  Contig2     +       1       SP6     left
                  Contig3     +       1
                  Contig1     -               T7      right

              The first column is the contig name, the second is the orientation,  the  third  is
              the  fragment_group,  the  fourth  indicates  the SP6 or T7 end, and the fifth says
              which side of SP6 or T7 end had vector removed.
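
              A script that generates or checks such a file might read it as follows.  This
              is a minimal sketch, assuming that an empty column (such as the missing
              fragment_group on the Contig1 line) is written as two consecutive tabs; the
              function and key names are hypothetical.

```python
def parse_contig_order(text):
    """Parse the tab-delimited contig order table into a list of dicts."""
    keys = ("contig", "orientation", "fragment_group", "end", "side")
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # Pad rows that omit trailing columns (e.g. no SP6/T7 information).
        fields += [""] * (len(keys) - len(fields))
        entries.append(dict(zip(keys, fields)))
    return entries


example = (
    "Contig2\t+\t1\tSP6\tleft\n"
    "Contig3\t+\t1\n"
    "Contig1\t-\t\tT7\tright\n"
)
order = parse_contig_order(example)
contigs = [entry["contig"] for entry in order]
```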

       -M str Map name (will appear as /map="str" on the source feature)

       -N     Annotate assembly_fragments

       -O filename
              Read comment from filename (100-character-per-line maximum; ~ is a linebreak and `~
              is a literal ~).  You can check the format with Psequin(1).
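
              The escaping rule can be illustrated with a small encoder.  This is a sketch
              under two assumptions: the function name is hypothetical, and the
              100-character limit is taken to apply to each line of the comment as
              displayed.

```python
MAX_LINE = 100  # per-line maximum stated for the -O comment file


def encode_comment(lines):
    """Join comment lines with ~ and write any literal ~ as `~."""
    for line in lines:
        if len(line) > MAX_LINE:
            raise ValueError("comment line exceeds %d characters" % MAX_LINE)
    # Escape literal tildes inside each line first, then join with the
    # line-break marker so the two uses of ~ stay distinguishable.
    return "~".join(line.replace("~", "`~") for line in lines)


encoded = encode_comment(["Draft assembly", "coverage ~8x"])
print(encoded)
```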

       -P str Contigs to use, separated by commas.  If -P is not given with the -T option, the
              fragments are added in the order in which they appear in the ace file (which is
              appropriate for a phase 1 record, but not for a phase 2 or 3).  To set the order
              of the segments of the ace file, use the -P flag, like this:
              -P "Contig1,Contig4,Contig3,Contig2,Contig5"

       -Q filename
              Read quality scores from filename

       -S str Strain name

       -T filename
              Filename for phrap input (mutually exclusive with -A and -i)

       -X     The  coordinates in the input file are on the resulting segmented sequence.  (Bases
              1 through n of each accession are used.)  Otherwise, the  coordinates  are  on  the
              individual accessions, which need not start at base 1 of the record.

       -a str GenBank accession; use if and only if updating a sequence.

       -b N   Gap length (default = 100; anything from 0 to 1000000000 is legal)

       -c str Clone name (will appear as /clone in the source feature; can be the same as -s)

       -d str Title for sequence (will appear in GenBank DEFINITION line)

       -e filename
              Log errors to filename

       -f     htgs_fulltop keyword

       -g str Genome Center tag (probably the same as your login name on the NCBI FTP server)

       -h str Chromosome (will appear as /chromosome in the source feature)

       -i filename
              Filename for fasta input (default is stdin; mutually exclusive with -A and -T)

       -k str Add the supplied string as a keyword.

       -l N   Length of sequence in bp (default = 0).  The length is checked against the actual
              number of bases received.  For phase 1 and 2 sequence it is also used to estimate
              gap lengths.  For phase 1 and 2 records, it is important to use a number GREATER
              than the amount of provided nucleotide; otherwise this will generate false `gaps'.
              It is assumed here that the putative full length of the BAC or cosmid will be
              used.  There should be at least 20 to 30 `n' between the segments (you can check
              for these in Sequin), as this ensures proper behavior when the sequence is used
              with BLAST.  Otherwise `artifactual' unrelated segment neighbors may be brought
              into proximity of each other.
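
              Before running fa2htgs on a phase 1 or 2 record, a script can check that the
              value chosen for -l really is greater than the nucleotide supplied.  The
              sketch below only does that arithmetic; the function name and the segment
              lengths are made up, and how fa2htgs distributes the surplus among gaps is
              not described here, so it is not modelled.

```python
def surplus_for_gaps(declared_length, segment_lengths):
    """Return how many bases the -l value exceeds the provided nucleotides by."""
    provided = sum(segment_lengths)
    surplus = declared_length - provided
    if surplus <= 0:
        raise ValueError(
            "-l (%d) must be greater than the %d bases provided"
            % (declared_length, provided)
        )
    return surplus


# Hypothetical clone: three contigs and an estimated insert size of 150 kb.
surplus = surplus_for_gaps(150000, [60000, 45000, 30000])
print(surplus)  # 150000 - 135000 = 15000
```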

       -m     Take comment from template

       -n str Organism name (default = Homo sapiens)

       -o filename
              Filename for asn.1 output (default = stdout)

       -p N   HTGS phase:
              1      A  collection  of  unordered contigs with gaps of unknown length.  A Phase 1
                     record must at the very least have two segments with one gap.  (default)
              2      A series of ordered contigs, possibly with known gap lengths.  This could be
                     a single sequence without gaps, if the sequence has ambiguities to resolve.
              3      A   single   contiguous  sequence.   This  sequence  is  finished,  but  not
                     necessarily annotated.
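
              The segment counts implied by these definitions can be checked before
              submission.  This is a sketch of one possible reading, not the program's own
              validation; the function name is hypothetical.

```python
def phase_allows(phase, n_segments):
    """Check a segment count against the phase definitions on this page."""
    if phase == 1:
        return n_segments >= 2   # at least two segments with one gap
    if phase == 2:
        return n_segments >= 1   # ordered contigs, or one sequence with ambiguities
    if phase == 3:
        return n_segments == 1   # a single contiguous sequence
    raise ValueError("HTGS phase must be 1, 2 or 3")


print(phase_allows(1, 3), phase_allows(3, 2))
```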

       -q     htgs_cancelled keyword

       -r str Remark for update (brief comment describing the nature of the update, such as  "new
              sequence", "new citation", or "updated features")

       -s str Sequence  name.   The  sequence  must  have a name that is unique within the genome
              center. We use the combination of the genome center  name  (-g  argument)  and  the
              sequence  name  (-s)  to track this sequence and to talk to you about it.  The name
              can have any form you like but must be unique within your center.

       -t filename
              Filename for Seq-submit template (default = template.sub)

       -u     Take biosource from template

       -v     htgs_activefin keyword

       -w     Whole Genome Shotgun flag

       -x str Secondary accession numbers, separated by commas, e.g., U10000,L11000.

              In some cases a large segment will supersede another accession number or a group
              of other accession numbers (records).  Records that are no longer wanted in
              GenBank should be made secondary.  Using the -x argument you can list the
              accession numbers you want to make secondary.  This will instruct us to remove
              the accession number(s) from GenBank, and they will no longer be part of the
              GenBank release.  They will nonetheless be available from Entrez.

              GREAT  CARE  should be taken when using this argument!!!  Improper use of accession
              numbers here will result in the inappropriate withdrawal of  GenBank  records  from
              GenBank,  EMBL  and DDBJ.  We provide this parameter as a convenience to submitting
              centers, but this may need to be removed if it is not used carefully.

AUTHOR

       The National Center for Biotechnology Information.

SEE ALSO

       Psequin(1), /usr/share/doc/ncbi-tools-bin/README.fa2htgs.gz