Ubuntu Manpage: ngsfilter - description of ngsfilter

Provided by: obitools_1.2.12+dfsg-2_amd64

NAME

       ngsfilter - description of ngsfilter

       To distinguish between sequences from different PCR products pooled in the same sequencing
       library, pairs of small DNA sequences (called tags; see the oligoTag command and its
       associated paper for more information on the design of such tags) can be concatenated to
       the PCR primers.

       ngsfilter takes as input sequence record files and a file describing the DNA tags and
       primer sequences used for each PCR sample. ngsfilter demultiplexes the sequence records
       by identifying these DNA tags and primers.

       ngsfilter requires a sample description file containing the description of the primers and
       tags associated with each sample (specified by option -t). The sample description file is
       a text file where each line describes one sample. Columns are separated by space or tab
       characters. Lines beginning with the ‘#’ character are treated as comments and are
       ignored by ngsfilter.

       Here is an example of a sample description file:

          #exp   sample     tags                   forward_primer       reverse_primer              extra_information
          gh     01_11a     cacgcagtc:cacgcatcg    GGGCAATCCTGAGCCAA    CCATTGAGTCTCTGCACCTATC    F @ community=Festuca; bucket=1; extraction=1;
          gh     01_12a     cacgcatcg:cacgcagtc    GGGCAATCCTGAGCCAA    CCATTGAGTCTCTGCACCTATC    F @ community=Festuca; bucket=1; extraction=2;
          gh     01_21a     cacgcgcat:cacgctact    GGGCAATCCTGAGCCAA    CCATTGAGTCTCTGCACCTATC    F @ community=Festuca; bucket=2; extraction=1;
          gh     01_22a     cacgctact:cacgcgcat    GGGCAATCCTGAGCCAA    CCATTGAGTCTCTGCACCTATC    F @ community=Festuca; bucket=2; extraction=2;
          gh     02_11a     cacgctgag:cacgtacga    GGGCAATCCTGAGCCAA    CCATTGAGTCTCTGCACCTATC    F @ community=Festuca; bucket=1; extraction=1;
          gh     02_12a     cacgtacga:cacgctgag    GGGCAATCCTGAGCCAA    CCATTGAGTCTCTGCACCTATC    F @ community=Festuca; bucket=1; extraction=2;
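
       The layout above can be read with a few lines of Python. This is only a sketch of the
       format as described here, not the parser used by ngsfilter. The function name and the
       dictionary keys are hypothetical, and reading the tag pair as forward:reverse is an
       assumption. Everything after the fifth column is kept as an uninterpreted string.

```python
def parse_sample_description(text):
    """Parse sample description lines into a list of dictionaries.

    Assumptions (not confirmed by the manual): the tag pair is written
    as forward:reverse, and leading whitespace before '#' is tolerated.
    """
    samples = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines starting with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        # Columns are separated by spaces or tabs; keep the remainder whole.
        fields = stripped.split(None, 5)
        if len(fields) < 5:
            raise ValueError("expected at least 5 columns: %r" % line)
        experiment, sample, tags, forward_primer, reverse_primer = fields[:5]
        first_tag, _, second_tag = tags.partition(":")
        samples.append({
            "experiment": experiment,
            "sample": sample,
            "forward_tag": first_tag,    # assumed order
            "reverse_tag": second_tag,   # assumed order
            "forward_primer": forward_primer,
            "reverse_primer": reverse_primer,
            "extra": fields[5] if len(fields) > 5 else "",
        })
    return samples
```

       Calling parse_sample_description() on the example file yields one dictionary per sample
       line, with the header line skipped because it starts with ‘#’.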

       The results consist of sequence records, printed on the standard output, with their
       sequence trimmed of the primers and tags and annotated with the corresponding experiment
       and sample (and possibly some extra information). Sequences for which the tags and
       primers could not be identified, and which are thus unassigned to any sample, are
       stored in a file if option -u is specified and are tagged as erroneous sequences (error
       attribute) by ngsfilter.
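
       The extra information shown in the example sample description file follows a
       "key=value;" pattern after an ‘@’ character. The sketch below turns such a string into
       a dictionary. It assumes that every entry has this shape, which is inferred from the
       example only. The function name is hypothetical and values are left as strings.

```python
def parse_extra_information(extra):
    """Split 'F @ key=value; key=value;' into a dict of string values.

    Assumption: annotations follow the '@' and are separated by ';',
    as in the example sample description file.
    """
    # Keep only the part after '@'; use the whole string if '@' is absent.
    _, at_sign, annotations = extra.partition("@")
    if not at_sign:
        annotations = extra
    result = {}
    for entry in annotations.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        key, equals, value = entry.partition("=")
        if not equals:
            raise ValueError("entry without '=': %r" % entry)
        result[key.strip()] = value.strip()
    return result
```

       For the first sample of the example file, this gives the keys community, bucket and
       extraction.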

NGSFILTER SPECIFIC OPTIONS

       -t, --tag-list
              Used to specify the file containing the sample descriptions (tags, primers,
              sample names, …)

       -u, --unidentified
              Filename used to store the sequences unassigned to any sample

       -e, --error
              Used to specify the number of errors allowed for matching primers [default = 2]
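
       The three options above can be combined on one command line. The sketch below only
       builds the argument list and does not run anything. The function name and the file
       names are hypothetical, and passing the input file as a trailing positional argument
       is an assumption, since this page only states that ngsfilter takes sequence record
       files as input.

```python
def build_ngsfilter_command(tag_list, input_file, unidentified=None, errors=None):
    """Return an argv list for ngsfilter using the -t, -u and -e options.

    Assumption: the input sequence file is given as the last argument.
    """
    argv = ["ngsfilter", "-t", tag_list]
    if unidentified is not None:
        # File receiving the sequences unassigned to any sample.
        argv += ["-u", unidentified]
    if errors is not None:
        # Number of errors allowed when matching primers.
        argv += ["-e", str(errors)]
    argv.append(input_file)
    return argv
```

       Because the results are printed on the standard output, the returned list would
       typically be passed to subprocess.run() with stdout redirected to an output file.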

OPTIONS TO SPECIFY INPUT FORMAT

   Restrict the analysis to a sub-part of the input file
       --skip <N>
              The first N sequence records of the file are discarded from the analysis and not
              reported to the output file

       --only <N>
              Only the next N sequence records of the file are analyzed. The following sequences
              in the file are neither analyzed nor reported to the output file. This option
              can be used together with the --skip option.
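
       The combined effect of --skip and --only can be pictured as a slice over the stream of
       records. The sketch below illustrates that reading of the two descriptions above. It
       is not taken from the ngsfilter source, and the function name is hypothetical.

```python
from itertools import islice


def select_records(records, skip=0, only=None):
    """Yield the records kept after discarding `skip` and keeping `only`.

    `records` may be any iterable; `only=None` means no upper limit.
    """
    stop = None if only is None else skip + only
    return islice(records, skip, stop)
```

       With skip=2 and only=3, the third, fourth and fifth records of the input are kept and
       all others are dropped.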

   Sequence annotated format
       --genbank
              Input file is in genbank format.

       --embl Input file is in embl format.

   fasta related format
       --fasta
              Input file is in fasta format (including OBITools fasta extensions).

   fastq related format
       --sanger
              Input  file  is  in  Sanger  fastq  format  (standard  fastq  used  by  HiSeq/MiSeq
              sequencers).

       --solexa
              Input file is in fastq format produced by Solexa (Ga IIx) sequencers.

   ecoPCR related format
       --ecopcr
              Input file is in ecoPCR format.

       --ecopcrdb
              Input is an ecoPCR database.

   Specifying the sequence type
       --nuc  Input file contains nucleic sequences.

       --prot Input file contains protein sequences.

OPTIONS TO SPECIFY OUTPUT FORMAT

   Standard output format
       --fasta-output
              Output sequences in OBITools fasta format

       --fastq-output
              Output sequences in Sanger fastq format

   Generating an ecoPCR database
       --ecopcrdb-output=<PREFIX_FILENAME>
              Creates an ecoPCR database from sequence records results

   Miscellaneous option
       --uppercase
              Print sequences in upper case (default is lower case)

COMMON OPTIONS

       -h, --help
              Shows this help message and exits.

       --DEBUG
              Sets logging in debug mode.

NGSFILTER ADDED SEQUENCE ATTRIBUTES

            · avg_quality

            · complemented

            · cut

            · direction

            · error

            · experiment

            · forward_match

            · forward_primer

            · forward_score

            · forward_tag

            · head_quality

            · mid_quality

            · partial

            · reverse_match

            · reverse_primer

            · reverse_score

            · reverse_tag

            · sample

            · seq_length

            · seq_length_ori

            · status

            · tail_quality

AUTHOR

       The OBITools Development Team - LECA

COPYRIGHT

       2015 - 2019, OBITools Development Team

 1.02 12                                   Jan 28, 2019                              NGSFILTER(1)