Provided by: spaln_2.4.13f+dfsg-1_amd64

NAME

       sortgrcd - Postprocess the output of spaln run with the -O12 option, Version 2

SYNOPSIS

       sortgrcd [options] xxx1.grd(.gz) [xxx2.grd(.gz) ...]

DESCRIPTION

       sortgrcd recovers the output of spaln run with the -O12 option, optionally filters it,
       and rearranges the outputs of multiple Spaln runs.

OPTIONS

       -C#    Minimum cover rate = % nucleotides in predicted exons / length of  query  (x  3  if
              query is protein) (0-100)

       -F#    Filter level #=0: no; #=1: mild; #=2: medium; #=3: stringent (0)

       -H#    Minimum alignment score

       -M#    Maximum total number of mismatches near boundaries

       -N#    Maximum number of non-canonical boundaries

       -O#    Output format. 0: Gff3, 4: native, 5: intron, 15: unique intron (4)

       -P#    Minimum overall % sequence identity (0-100)

       -S[a|b|c|r]
              Sort order of chromosomes/contigs. a: alphabetical, b: abundance, c: input order,
              r: reverse for minus strand

       -U#    Maximum total number of unpaired bases in gaps

       -V#    Maximum internal memory size used for core sort. Suffix k (or K) or m (or M) may be
              attached to specify kilo or mega bytes.

       -m#    Maximum number of mismatches within 10bp from the nearest exon-intron boundary

       -n#    Allow non-canonical (other than GT..AG, GC..AG, AT..AC) intron ends (0: no)

       -u#    Maximum  number  of  unpaired  (gap) sites within 10bp from the nearest exon-intron
              boundary
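
       As an illustration of the -C definition above, the cover rate can be computed as in
       the following minimal sketch. The function name cover_rate and its parameters are
       hypothetical and are not part of sortgrcd; the sketch only restates the formula
       given for -C.

```python
def cover_rate(exon_lengths, query_length, query_is_protein=False):
    """Percent of the query covered by predicted exons (0-100 scale)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    # A protein residue corresponds to three nucleotides, so the query
    # length is multiplied by 3 when the query is a protein.
    denominator = query_length * 3 if query_is_protein else query_length
    return 100.0 * sum(exon_lengths) / denominator


# cDNA query of 1200 nt with 1080 nt in predicted exons
print(cover_rate([300, 480, 300], 1200))
# Protein query of 400 residues aligned to the same exons
print(cover_rate([300, 480, 300], 400, query_is_protein=True))
```

       Both calls give 90.0, so an alignment like this would pass -C90 but not -C95.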

COMMENTS

       The output format of spaln -O12 changed in version 2; in addition to the *.grd and
       *.erd files, a *.qrd file is generated. This change removed the limitations on the
       lengths of the identifiers of both target (genomic) and query sequences. The database
       files that were specified with the -d option of spaln must not be changed before
       running sortgrcd.

       By default, none of the filters listed above is applied.

       When the output of Spaln is split across several files, the combined results are sorted
       together. Although only the *.grd(.gz) files are given as arguments, the corresponding
       *.erd(.gz) and *.qrd(.gz) files must be present in the same directory.
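
       A pre-flight check for the companion files could look like the following sketch. The
       helper missing_companions is hypothetical. It assumes that the companion files share
       the base name of the *.grd file, and that a plain or a gzip-compressed companion is
       equally acceptable; neither assumption is stated by this manual.

```python
from pathlib import Path


def missing_companions(grd_path):
    """Return the .erd/.qrd base names that cannot be found next to grd_path."""
    grd = Path(grd_path)
    # Drop an optional .gz suffix before looking at the .grd extension.
    name = grd.name[:-3] if grd.name.endswith(".gz") else grd.name
    if not name.endswith(".grd"):
        raise ValueError(f"not a .grd file: {grd_path}")
    stem = name[: -len(".grd")]
    missing = []
    for ext in (".erd", ".qrd"):
        # Assumption: either a plain or a gzip-compressed companion will do.
        candidates = [grd.with_name(stem + ext), grd.with_name(stem + ext + ".gz")]
        if not any(c.exists() for c in candidates):
            missing.append(stem + ext)
    return missing
```

       An empty list means that both companions were found in the directory of the *.grd
       file.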

       In the default output format, the gene  structure  corresponding  to  each  transcript  is
       delimited  by  a  line  starting  with `@', whereas each gene locus is delimited by a line
       starting with `!'. Two transcripts belong to the same locus if their corresponding genomic
       regions overlap by at least one nucleotide on the same strand.
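
       The locus rule can be sketched as an interval sweep. This is an illustration only,
       not the code used by sortgrcd. It assumes 1-based inclusive coordinates, that
       transcripts on different chromosomes never share a locus, and that overlap is applied
       transitively (a chain of overlapping transcripts forms one locus). The function name
       group_into_loci and the record layout are hypothetical.

```python
from itertools import groupby


def group_into_loci(transcripts):
    """Group (chrom, strand, start, end, name) records into loci.

    Coordinates are 1-based and inclusive, so two regions share at least
    one nucleotide when the later start is <= the earlier end.
    """
    loci = []
    ordered = sorted(transcripts, key=lambda t: (t[0], t[1], t[2], t[3]))
    # Sweep each chromosome/strand combination separately.
    for _, same_strand in groupby(ordered, key=lambda t: (t[0], t[1])):
        current, current_end = [], None
        for t in same_strand:
            if current and t[2] <= current_end:
                # Overlaps the running locus: extend it.
                current.append(t[4])
                current_end = max(current_end, t[3])
            else:
                # No overlap: close the running locus and start a new one.
                if current:
                    loci.append(current)
                current, current_end = [t[4]], t[3]
        loci.append(current)
    return loci


example = [
    ("chr1", "+", 100, 500, "a"),
    ("chr1", "+", 500, 900, "b"),   # shares position 500 with "a"
    ("chr1", "-", 200, 400, "c"),   # opposite strand, separate locus
    ("chr1", "+", 901, 1000, "d"),  # adjacent to "b" but not overlapping
]
print(group_into_loci(example))
```

       Here "a" and "b" form one locus because they share a single nucleotide, whereas "d"
       merely abuts "b" and therefore starts a new locus.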

       The -O0, -O3, -O4, -O5, -O6, and -O7 options work in the same manner as those of spaln.

       In particular, with the -O0 option, the outputs follow the Gff3 gene format
       (http://www.sequenceontology.org/gff3.shtml), where a gene locus is defined as
       described above.

       With  -O4  (default)  and  -O5  options,  the outputs follow the exon-oriented and intron-
       oriented spaln formats, respectively.

       With the -O15 option, introns are deduplicated, i.e., introns inferred from different
       transcripts that share the same 5' and 3' boundaries are output only once.
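
       The effect can be pictured with the following sketch. It is not the sortgrcd
       implementation. It assumes that each intron is represented as a (chromosome, strand,
       5' boundary, 3' boundary) tuple and that the first occurrence is the one kept; the
       function name unique_introns is hypothetical.

```python
def unique_introns(introns):
    """Drop repeated introns, keeping the first occurrence of each.

    Each intron is a hashable (chrom, strand, five_prime, three_prime) tuple.
    dict.fromkeys preserves insertion order, so input order is retained.
    """
    return list(dict.fromkeys(introns))


introns = [
    ("chr1", "+", 501, 899),  # from transcript 1
    ("chr1", "+", 501, 899),  # same boundaries, from transcript 2
    ("chr1", "+", 501, 950),  # same 5' boundary, different 3' boundary
]
print(unique_introns(introns))
```

       Only the exact repeat is removed; the third intron is kept because its 3' boundary
       differs.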

REFERENCES

       (1) "A Space-Efficient and Accurate Method for Mapping and Aligning cDNA Sequences onto
       Genomic Sequence", O. Gotoh, Nucleic Acids Res., 36 (8), 2630-2638 (2008).
       (2) "Direct Mapping and Alignment of Protein Sequences onto Genomic Sequence", O. Gotoh,
       Bioinformatics, 24 (21), 2438-2444 (2008).

AUTHOR

       Osamu Gotoh <o.gotoh@aist.go.jp>

                                            2018-09-06                                sortgrcd(1)