Provided by: biosquid_1.9g+cvs20050121-10_amd64

NAME

       sreformat - convert sequence file to different format

SYNOPSIS

       sreformat [options] format seqfile

DESCRIPTION

       sreformat reads the sequence file seqfile in any supported format, reformats it into a new
       format specified by format, then prints the reformatted text.

       Supported input formats include (but are not limited  to)  the  unaligned  formats  FASTA,
       Genbank,  EMBL,  SWISS-PROT, PIR, and GCG, and the aligned formats Stockholm, Clustal, GCG
       MSF, and Phylip.

       Available  unaligned  output  file  format  codes  include  fasta  (FASTA  format);   embl
       (EMBL/SWISSPROT  format);  genbank  (Genbank  format);  gcg  (GCG single sequence format);
       gcgdata (GCG flatfile  database  format);  pir  (PIR/CODATA  flatfile  format);  raw  (raw
       sequence, no other information).

       The available aligned output file format codes include stockholm (PFAM/Stockholm format);
       msf (GCG MSF format); a2m (an aligned FASTA format); phylip (Felsenstein's PHYLIP format);
       clustal (Clustal V/W/X format); and selex (the old SELEX/HMMER/Pfam annotated alignment
       format).

       All these codes are interpreted case-insensitively (e.g. MSF, Msf, or msf all work).

       Unaligned format files cannot be reformatted to aligned formats.  However, aligned formats
       can be reformatted to unaligned formats -- gap characters are simply stripped out.
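
       The gap-stripping step described above can be sketched in Python as follows. This is an
       illustration only, not sreformat's own code; the set of gap symbols in GAP_CHARS and the
       helper name strip_gaps are assumptions made for the example.

```python
# Assumption: these symbols count as gaps. The man page does not list them.
GAP_CHARS = "-._~"


def strip_gaps(aligned_row: str, gap_chars: str = GAP_CHARS) -> str:
    """Return one aligned row with every gap character removed."""
    return "".join(ch for ch in aligned_row if ch not in gap_chars)


# A toy two-sequence alignment (hypothetical data).
alignment = {"seq1": "ACG--UAC", "seq2": "A.GGGU-C"}

# Converting aligned -> unaligned only drops gaps; residues are untouched.
unaligned = {name: strip_gaps(row) for name, row in alignment.items()}
print(unaligned)
```

       The reverse direction has no such shortcut: nothing in an unaligned file says where the
       gaps would go, which is why unaligned input cannot be written to an aligned format.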

       This  program  was  originally named reformat, but that name clashes with a GCG program of
       the same name.

OPTIONS

       -d     DNA; convert U's to T's, to make sure a nucleic acid sequence is shown as  DNA  not
              RNA. See -r.

       -h     Print  brief  help;  includes  version number and summary of all options, including
              expert options.

       -l     Lowercase; convert all sequence residues to lower case.  See -u.

       -n     For DNA/RNA sequences, convert any character that is not an unambiguous DNA/RNA
              residue (that is, anything other than ACGTU/acgtu) to an N. Used to convert IUPAC
              ambiguity codes to N's, for software that can't handle all IUPAC codes (some public
              RNA folding codes, for example). If the file is an alignment, gap characters are
              left unchanged. If the sequences are not nucleic acid sequences, this option will
              corrupt the data in a predictable fashion.
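
              A minimal Python sketch of the behavior described for -n, under two stated
              assumptions: the gap symbols in GAP_CHARS, and that the replacement is always an
              uppercase N. The function name mask_ambiguous is hypothetical; this is not the
              program's implementation.

```python
UNAMBIGUOUS = set("ACGTUacgtu")
# Assumption: these symbols count as gaps and are passed through untouched.
GAP_CHARS = set("-._~")


def mask_ambiguous(seq: str) -> str:
    """Replace anything that is not ACGTU/acgtu (or a gap) with 'N'."""
    return "".join(
        ch if ch in UNAMBIGUOUS or ch in GAP_CHARS else "N" for ch in seq
    )


# IUPAC ambiguity codes (R, Y, k, m) are masked; gaps and plain residues stay.
masked_rna = mask_ambiguous("ACGRYT-acgkm.U")
print(masked_rna)

# No check is made that the input is nucleic acid, so protein gets mangled.
masked_protein = mask_ambiguous("MKVLAT")
print(masked_protein)
```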

       -r     RNA;  convert  T's to U's, to make sure a nucleic acid sequence is shown as RNA not
              DNA. See -d.

       -u     Uppercase; convert all sequence residues to upper case.  See -l.

       -x     For DNA sequences, convert non-IUPAC characters (such as X's) to N's.  This is  for
              compatibility  with  benighted  people  who  insist on using X instead of the IUPAC
              ambiguity character N. (X denotes ambiguity in an amino acid residue.)

              Warning: like the -n option, the code doesn't check that you are actually giving it
              DNA. It simply converts non-IUPAC DNA symbols to N. So if you accidentally give it
              protein sequence, it will happily convert almost every amino acid residue to an N.
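
              The -x conversion can be pictured with the Python sketch below. It assumes the
              standard IUPAC nucleotide alphabet (ACGTU plus the ambiguity codes RYSWKMBDHVN, in
              either case) and assumes gap symbols are left alone; neither detail is spelled out
              above, and to_iupac_dna is a hypothetical name.

```python
# Assumption: the standard IUPAC nucleotide symbols, upper and lower case.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN") | set("acgturyswkmbdhvn")
# Assumption: these symbols count as gaps and are passed through untouched.
GAP_CHARS = set("-._~")


def to_iupac_dna(seq: str) -> str:
    """Replace every symbol outside the IUPAC nucleotide alphabet with 'N'."""
    return "".join(
        ch if ch in IUPAC_NUCLEOTIDES or ch in GAP_CHARS else "N" for ch in seq
    )


# X and x are not IUPAC nucleotide codes, so they become N.
# Legitimate ambiguity codes such as R and Y are kept, unlike with -n.
cleaned = to_iupac_dna("ACGTXXRYNacgx-")
print(cleaned)
```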

EXPERT OPTIONS

       --gapsym <c>
              Convert  all  gap  characters to <c>.  Used to prepare alignment files for programs
              with strict requirements for gap symbols. Only makes sense if the input seqfile  is
              an alignment.

       --informat <s>
              Specify  that  the sequence file is in format <s>, rather than allowing the program
              to autodetect the file format. Common examples include  Genbank,  EMBL,  GCG,  PIR,
              Stockholm,  Clustal,  MSF,  or PHYLIP; see the printed documentation for a complete
              list of accepted format names.

       --mingap
              If seqfile is an alignment, remove any columns that contain  100%  gap  characters,
              minimizing  the overall length of the alignment.  (Often useful if you've extracted
              a subset of aligned sequences from a larger alignment.)
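
              A rough Python sketch of what --mingap does to an alignment, treating it as a
              list of equal-length rows. The gap symbols in GAP_CHARS and the function name are
              assumptions for illustration; this is not taken from the sreformat source.

```python
# Assumption: these symbols count as gaps.
GAP_CHARS = set("-._~")


def remove_all_gap_columns(rows: list[str]) -> list[str]:
    """Drop every column in which all rows have a gap character."""
    keep = [
        i
        for i, column in enumerate(zip(*rows))
        if not all(ch in GAP_CHARS for ch in column)
    ]
    return ["".join(row[i] for i in keep) for row in rows]


# Columns 2 and 3 (0-based) contain only gaps, e.g. after extracting a subset
# of sequences from a larger alignment.
subset = ["AC--GU-A", "A---GUCA", "AC-.G--A"]
minimized = remove_all_gap_columns(subset)
print(minimized)
```

              Columns that are only partly gapped are kept, so the relative alignment of the
              remaining residues is unchanged.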

       --nogap
              Remove any aligned columns that contain any gap symbols at all. Useful as a prelude
              to  phylogenetic  analyses,  where you only want to analyze columns containing 100%
              residues, so you want to strip out any columns with gaps in them.  Only makes sense
              if the file is an alignment file.
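
              For comparison with --mingap, a Python sketch of the stricter --nogap filter. As
              before, the gap symbols in GAP_CHARS and the function name are assumptions, and
              the code illustrates the rule rather than reproducing the program.

```python
# Assumption: these symbols count as gaps.
GAP_CHARS = set("-._~")


def remove_gapped_columns(rows: list[str]) -> list[str]:
    """Keep only the columns in which no row has a gap character."""
    keep = [
        i
        for i, column in enumerate(zip(*rows))
        if not any(ch in GAP_CHARS for ch in column)
    ]
    return ["".join(row[i] for i in keep) for row in rows]


# Columns 2 and 3 (0-based) each contain at least one gap and are dropped.
rows = ["ACG-UA", "AC-GUA", "UCGGUA"]
ungapped = remove_gapped_columns(rows)
print(ungapped)
```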

       --pfam For  SELEX  alignment  output  format  only,  put the entire alignment in one block
              (don't wrap into multiple blocks).  This is close to the format used internally  by
              Pfam in Stockholm and Cambridge.

       --sam  Try to convert gap characters to UC Santa Cruz SAM style, where a .  means a gap in
              an insert column, and a - means a deletion in a consensus/match column.  This  only
              works  for  converting  aligned  file  formats,  and  only if the alignment already
              adheres to the SAM  convention  of  upper  case  for  residues  in  consensus/match
              columns, and lower case for residues in insert columns. This is true, for instance,
              of all alignments produced by old versions of HMMER.  (HMMER2  produces  alignments
              that  adhere  to  SAM's conventions even in gap character choice.)  This option was
              added to allow Pfam alignments to be reformatted into something more  suitable  for
              profile HMM construction using the UCSC SAM software.

       --samfrac <x>
              Try  to convert the alignment gap characters and residue cases to UC Santa Cruz SAM
              style, where a .  means a gap in an insert column and a - means  a  deletion  in  a
              consensus/match  column,  and  upper  case means match/consensus residues and lower
              case means inserted residues. This will  only  work  for  converting  aligned  file
              formats,  but  unlike the --sam option, it will work regardless of whether the file
              adheres to the upper/lower case residue convention. Instead, any column  containing
              more  than a fraction <x> of gap characters is interpreted as an insert column, and
              all other columns are interpreted as match columns.  This option was added to allow
              Pfam  alignments  to  be  reformatted  into something more suitable for profile HMM
              construction using the UCSC SAM software.
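
              The column rule described for --samfrac can be sketched in Python as below. The
              gap symbols in GAP_CHARS, the function name sam_style, and the parameter name
              max_gap_frac (standing in for <x>) are assumptions; the sketch follows the
              description above, not the program's source.

```python
# Assumption: these symbols count as gaps in the input.
GAP_CHARS = set("-._~")


def sam_style(rows: list[str], max_gap_frac: float) -> list[str]:
    """Rewrite gaps and residue case column by column, SAM style."""
    nseq = len(rows)
    out = [[] for _ in rows]
    for column in zip(*rows):
        gap_frac = sum(ch in GAP_CHARS for ch in column) / nseq
        # More than max_gap_frac gaps: insert column. Otherwise: match column.
        is_insert = gap_frac > max_gap_frac
        for cells, ch in zip(out, column):
            if ch in GAP_CHARS:
                cells.append("." if is_insert else "-")
            else:
                cells.append(ch.lower() if is_insert else ch.upper())
    return ["".join(cells) for cells in out]


# Input case is ignored: columns 2 and 3 are 75% gaps, so they become insert
# columns; every other column becomes a match column.
rows = ["ACg-UA", "AC--UA", "A-.GUa", "AC--UA"]
converted = sam_style(rows, max_gap_frac=0.5)
print(converted)
```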

       --wussify
              Convert RNA secondary structure annotation strings (both consensus and  individual)
              from old "KHS" format, ><, to the new WUSS notation, <>. If the notation is already
              in WUSS format, this option will screw it  up,  without  warning.  Only  SELEX  and
              Stockholm format files have secondary structure markup at present.

       --dewuss
              Convert  RNA secondary structure annotation strings from the new WUSS notation, <>,
              back to the old KHS format, ><. If the annotation is already in  KHS,  this  option
              will  corrupt  it,  without  warning.   Only  SELEX and Stockholm format files have
              secondary structure markup.
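
              The warnings on --wussify and --dewuss follow from the bracket swap itself, as
              this Python sketch suggests. It models only the exchange of < and > described
              above; whether the program changes any other annotation symbols is not covered
              here, and swap_brackets is a hypothetical name.

```python
# Exchange the two bracket characters; every other symbol is left as-is.
SWAP = str.maketrans("<>", "><")


def swap_brackets(structure: str) -> str:
    """Swap '<' and '>' in a secondary structure annotation string."""
    return structure.translate(SWAP)


khs = ">>>..<<<"              # old KHS-style base pairs
wuss = swap_brackets(khs)     # the swap in the --wussify direction
print(wuss)

# The swap cannot tell which notation it was given. Applying it to a string
# that is already WUSS silently flips it back to KHS orientation.
flipped_back = swap_brackets(wuss)
print(flipped_back)
```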

SEE ALSO

       afetch(1), alistat(1), compalign(1), compstruct(1), revcomp(1),  seqsplit(1),  seqstat(1),
       sfetch(1), shuffle(1), sindex(1), stranslate(1), weight(1).

AUTHOR

       Biosquid  and  its  documentation  are  Copyright (C) 1992-2003 HHMI/Washington University
       School of Medicine. Freely distributed under the GNU General Public License (GPL). See
       COPYING in the source code distribution for more details, or contact me.

       Sean Eddy
       HHMI/Department of Genetics
       Washington University School of Medicine
       4444 Forest Park Blvd., Box 8510
       St Louis, MO 63108 USA
       Phone: 1-314-362-7666
       FAX  : 1-314-362-2157
       Email: eddy@genetics.wustl.edu