Provided by: spaln_2.4.0+dfsg-2build1_amd64

NAME

       sortgrcd - postprocess the output of spaln run with the -O12 option, Version 2

SYNOPSIS

       sortgrcd [options] xxx1.grd(.gz) [xxx2.grd(.gz) ...]

DESCRIPTION

       sortgrcd is used to recover the output of spaln run with the -O12 option, to apply some filtering, and to
       rearrange the outputs of multiple Spaln runs.

OPTIONS

       -C#    Minimum cover rate = % nucleotides in predicted exons / length of query (x 3 if query is  protein)
              (0-100)

       -F#    Filter level #=0: no; #=1: mild; #=2: medium; #=3: stringent (0)

       -H#    Minimum alignment score

       -M#    Maximum total number of mismatches near boundaries

       -N#    Maximum number of non-canonical boundaries

       -O#    Output format. 0: Gff3, 4: Native, 5: Intron, 15: unique intron

       -P#    Minimum overall % sequence identity (0-100)

       -S[a|b|c|r]
              Sort order of chromosomes/contigs. a: alphabetical, b: abundance, c: input order, r: reverse for
              minus strand

       -U#    Maximum total number of unpaired bases in gaps

       -V#    Maximum internal memory size used for core sort. The suffix k (or K) or m (or M) may be attached to
              specify kilobytes or megabytes.

       -m#    Maximum number of mismatches within 10bp from the nearest exon-intron boundary

       -n#    Allow non-canonical (other than GT..AG, GC..AG, AT..AC) intron ends (0: no)

       -u#    Maximum number of unpaired (gap) sites within 10bp from the nearest exon-intron boundary
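
       The cover rate compared against the -C threshold can be written out as a short calculation. The sketch
       below only restates the definition given for -C; the function name and the example numbers are
       hypothetical and are not taken from the sortgrcd source.

```python
def cover_rate(exon_lengths, query_length, query_is_protein=False):
    """Percent of the query covered by predicted exons, per the -C definition.

    exon_lengths: nucleotide lengths of the predicted exons.
    query_length: query length in nucleotides, or in residues for a protein.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    # One protein residue corresponds to three nucleotides, hence the x 3.
    denominator = query_length * 3 if query_is_protein else query_length
    return 100.0 * sum(exon_lengths) / denominator


# Hypothetical example: exons totalling 270 nt against a 100-residue protein.
print(f"{cover_rate([120, 90, 60], 100, query_is_protein=True):.1f}")  # prints 90.0
```

       An alignment would pass -C80 in this example, since 90 is not below the minimum of 80.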

COMMENTS

       The output format of spaln -O12 has changed as of version 2: in addition to the *.grd and *.erd files,
       a *.qrd file is generated. This change removes the limitations on the lengths of the identifiers of
       both target (genomic) and query sequences. The database files that were specified with the -d option of
       spaln must not be changed before running sortgrcd.

       By default, none of the filters listed above is applied.

       When the output of Spaln is split across several files, the combined results are sorted together.
       Although only *.grd(.gz) files are given as arguments, the corresponding *.erd(.gz) and *.qrd(.gz)
       files must be present in the same directory.
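
       Before running sortgrcd on many files, it can help to confirm that the companion files are in place.
       The following is a minimal sketch, not part of spaln; it assumes that each companion shares the base
       name of its *.grd file and that either the plain or the gzipped form is acceptable. The helper name is
       hypothetical.

```python
from pathlib import Path


def missing_companions(grd_path):
    """Return the companion base names (.erd, .qrd) not found next to a .grd file."""
    path = Path(grd_path)
    # Strip an optional .gz so that xxx.grd and xxx.grd.gz are treated alike.
    name = path.name[:-3] if path.name.endswith(".gz") else path.name
    if not name.endswith(".grd"):
        raise ValueError(f"not a .grd file: {grd_path}")
    stem = name[: -len(".grd")]

    missing = []
    for ext in (".erd", ".qrd"):
        # Assumption: a companion may be either plain or gzipped.
        candidates = [path.with_name(stem + ext), path.with_name(stem + ext + ".gz")]
        if not any(c.exists() for c in candidates):
            missing.append(stem + ext)
    return missing
```

       An empty list means both companions were found in the same directory as the *.grd file.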

       In the default output format, the gene structure corresponding to each transcript is delimited by a  line
       starting  with  `@',  whereas  each  gene locus is delimited by a line starting with `!'. Two transcripts
       belong to the same locus if their corresponding genomic regions overlap by at least one nucleotide on the
       same strand.
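
       The locus rule above can be illustrated with a short sketch. It is not the sortgrcd implementation: it
       assumes 1-based inclusive coordinates and treats the overlap relation as transitive, so a chain of
       pairwise-overlapping transcripts ends up in one locus. The tuple layout and all names are hypothetical.

```python
from collections import defaultdict


def group_into_loci(transcripts):
    """Group (name, chrom, strand, start, end) tuples into loci.

    Two transcripts share a locus when their regions overlap by at least one
    nucleotide on the same chromosome and strand (inclusive coordinates).
    """
    by_strand = defaultdict(list)
    for t in transcripts:
        by_strand[(t[1], t[2])].append(t)

    loci = []
    for key in sorted(by_strand):
        current, current_end = [], None
        # Sweep left to right, extending the open locus while regions overlap.
        for name, _chrom, _strand, start, end in sorted(
            by_strand[key], key=lambda t: (t[3], t[4])
        ):
            if current and start <= current_end:
                current.append(name)
                current_end = max(current_end, end)
            else:
                if current:
                    loci.append(current)
                current, current_end = [name], end
        if current:
            loci.append(current)
    return loci


example = [
    ("t1", "chr1", "+", 100, 500),
    ("t2", "chr1", "+", 500, 900),   # shares nucleotide 500 with t1
    ("t3", "chr1", "-", 200, 400),   # opposite strand, so a separate locus
    ("t4", "chr1", "+", 901, 1200),  # adjacent to t2 but not overlapping
]
print(group_into_loci(example))  # prints [['t1', 't2'], ['t4'], ['t3']]
```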

       The -O0, -O3, -O4, -O5, -O6, and -O7 options work in the same manner as those of spaln.

       In particular, with the -O0 option, the outputs follow the Gff3 gene format
       (http://www.sequenceontology.org/gff3.shtml), where a gene locus is defined as described above.

       With -O4 (default) and -O5 options, the  outputs  follow  the  exon-oriented  and  intron-oriented  spaln
       formats, respectively.

       With the -O15 option, introns are made unique, i.e., introns inferred from different transcripts with
       the same 5' and 3' boundaries are output only once.
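
       The idea behind unique-intron output can be sketched as follows. This is an illustration only; it
       assumes that the chromosome and strand are part of an intron's identity along with its 5' and 3'
       boundaries, and the record layout is hypothetical rather than the sortgrcd format.

```python
def unique_introns(introns):
    """Keep the first occurrence of each intron, ignoring the source transcript.

    introns: iterable of (chrom, strand, five_prime, three_prime, transcript).
    """
    seen = set()
    unique = []
    for chrom, strand, five_prime, three_prime, _transcript in introns:
        key = (chrom, strand, five_prime, three_prime)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


records = [
    ("chr1", "+", 501, 899, "t1"),
    ("chr1", "+", 501, 899, "t2"),  # same boundaries as t1's intron
    ("chr1", "+", 501, 950, "t2"),  # same 5' end, different 3' end
]
print(len(unique_introns(records)))  # prints 2
```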

REFERENCES

       (1) "A Space-Efficient and  Accurate  Method  for  Mapping  and  Aligning  cDNA  Sequences  onto  Genomic
       Sequence", O. Gotoh, Nucleic Acids Res., 36 (8), 2630-2638 (2008).
       (2)  "Direct Mapping and Alignment of Protein Sequences onto Genomic Sequence", O. Gotoh, Bioinformatics,
       24 (21) 2438-2444 (2008).

AUTHOR

       Osamu Gotoh <o.gotoh@aist.go.jp>

                                                   2018-09-06                                        sortgrcd(1)