Ubuntu Manpage: sniffles - structural variation caller using third-generation sequencing

NAME

       sniffles - structural variation caller using third-generation sequencing

DESCRIPTION

       usage:  sniffles  --input SORTED_INPUT.bam [--vcf OUTPUT.vcf] [--snf MERGEABLE_OUTPUT.snf]
       [--threads 4] [--non-germline]

       Sniffles2: A fast structural variant (SV) caller for long-read sequencing data

              Version 2.0.2 Contact: moritz.g.smolka@gmail.com

              Usage example A - Call SVs for a single sample:

              sniffles --input sorted_indexed_alignments.bam --vcf output.vcf

              ... OR, with CRAM input and bgzipped+tabix indexed VCF output:

              sniffles --input sample.cram --vcf output.vcf.gz

              ... OR, producing only a  SNF  file  with  SV  candidates  for  later  multi-sample
              calling:

              sniffles --input sample1.bam --snf sample1.snf

              ...  OR,  simultaneously  producing  a  single-sample  VCF  and  SNF file for later
              multi-sample calling:

              sniffles --input sample1.bam --vcf sample1.vcf.gz --snf sample1.snf

              ... OR, with additional options to specify tandem repeat annotations (for  improved
              call  accuracy),  reference (for DEL sequences) and non-germline mode for detecting
              rare SVs:

              sniffles    --input    sample1.bam    --vcf     sample1.vcf.gz     --tandem-repeats
              tandem_repeats.bed --reference genome.fa --non-germline

              Usage example B - Multi-sample calling:

              Step 1. Create .snf for each sample:

              sniffles --input sample1.bam --snf sample1.snf

              Step 2. Combined calling:

              sniffles --input sample1.snf sample2.snf ... sampleN.snf --vcf multisample.vcf

              ... OR, using a .tsv file containing a list of .snf files, and custom sample ids in
              an optional second column (one sample per line):

              Step 2. Combined calling:

              sniffles --input snf_files_list.tsv --vcf multisample.vcf
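
              The two-step workflow above can be scripted. The following is a minimal sketch that
              only builds the argument lists and the .tsv list; it does not run sniffles. The
              function names are hypothetical, and the tab delimiter is an assumption based on
              the .tsv extension.

```python
import csv


def snf_command(bam, snf):
    """Step 1: argument list that writes one .snf file for one sample."""
    return ["sniffles", "--input", str(bam), "--snf", str(snf)]


def write_snf_list(tsv_path, samples):
    """Write one line per sample: .snf path, then an optional sample id.

    `samples` is a list of (snf_path, sample_id) pairs; sample_id may be None.
    Assumption: columns are separated by a tab character.
    """
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for snf, sample_id in samples:
            writer.writerow([snf] if sample_id is None else [snf, sample_id])


def combine_command(tsv_path, vcf):
    """Step 2: argument list for combined calling from the .tsv list."""
    return ["sniffles", "--input", str(tsv_path), "--vcf", str(vcf)]
```

              Each returned list can be passed to subprocess.run() once the inputs exist.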

              Usage example C - Determine genotypes for a set of known SVs (force calling):

              sniffles    --input    sample.bam    --genotype-vcf    input_known_svs.vcf    --vcf
              output_genotypes.vcf

              Use --help for full parameter/usage information

   optional arguments:
       -h, --help
              show this help message and exit

       --version
              show program's version number and exit

   Common parameters:
       -i IN [IN ...], --input IN [IN ...]
              For single-sample calling: A coordinate-sorted  and  indexed  .bam/.cram  (BAM/CRAM
              format)  file  containing  aligned reads. - OR - For multi-sample calling: Multiple
              .snf files (generated before by  running  Sniffles2  for  individual  samples  with
              --snf) (default: None)

       -v OUT.vcf, --vcf OUT.vcf
              VCF  output  filename to write the called and refined SVs to. If the given filename
              ends with .gz, the VCF file will be automatically bgzipped and a .tbi  index  built
              for it. (default: None)

       --snf OUT.snf
              Sniffles2  file  (.snf)  output filename to store candidates for later multi-sample
              calling (default: None)

       --reference reference.fasta
              (Optional) Reference sequence the reads were aligned against. To enable  output  of
              deletion SV sequences, this parameter must be set. (default: None)

       --tandem-repeats IN.bed
              (Optional)  Input  .bed file containing tandem repeat annotations for the reference
              genome. (default: None)

       --non-germline
              Call non-germline SVs (rare, somatic or mosaic SVs) (default: False)

       --phase
              Determine phase for SV calls (requires the input alignments to be phased) (default:
              False)

       -t N, --threads N
              Number of parallel threads to use (speed-up for multi-core CPUs) (default: 4)

   SV Filtering parameters:
       --minsupport auto
              Minimum number of supporting reads for a SV to be reported. With "auto", the value
              is chosen automatically based on coverage. (default: auto)

       --minsupport-auto-mult 0.1/0.025
              Coverage based minimum support multiplier for germline/non-germline modes (only for
              auto minsupport) (default: None)

       --minsvlen N
              Minimum SV length (in bp) (default: 35)

       --minsvlen-screen-ratio N
              Minimum length for SV candidates (as fraction of --minsvlen) (default: 0.95)
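
              As a worked example of how the ratio relates to --minsvlen, the sketch below
              multiplies the two values. Whether Sniffles2 rounds the result is not stated here,
              so the function (a hypothetical helper) returns the unrounded product.

```python
def screening_length(minsvlen=35, screen_ratio=0.95):
    """Candidate screening length in bp, as a fraction of --minsvlen.

    Assumption: the threshold is the plain product of the two parameters.
    """
    return minsvlen * screen_ratio
```

              With the defaults this gives a screening length of roughly 33 bp.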

       --mapq N
              Alignments with mapping quality lower than this value will be ignored (default: 25)

       --no-qc
              Output all SV candidates, disregarding quality control steps. (default: False)

       --qc-stdev True
              Apply  filtering based on SV start position and length standard deviation (default:
              True)

       --qc-stdev-abs-max N
              Maximum standard deviation for SV length and size (in bp) (default: 500)

       --qc-strand False
              Apply filtering based on strand support of SV calls (default: False)

       --qc-coverage N
              Minimum surrounding region coverage of SV calls (default: 1)

       --long-ins-length 2500
              Insertion SVs longer than this value are considered as hard to detect based on  the
              aligner and read length and subjected to more sensitive filtering. (default: 2500)

       --long-del-length 50000
              Deletion  SVs  longer  than this value are subjected to central coverage drop-based
              filtering (Not applicable for --non-germline) (default: 50000)

       --long-del-coverage 0.66
              Long deletions with central coverage (in relation to upstream/downstream  coverage)
              higher  than  this  value  will  be  filtered  (Not  applicable for --non-germline)
              (default: 0.66)

       --long-dup-length 50000
              Duplication  SVs  longer  than  this  value  are  subjected  to  central   coverage
              increase-based filtering (Not applicable for --non-germline) (default: 50000)

       --long-dup-coverage 1.33
              Long  duplications  with  central  coverage  (in  relation  to  upstream/downstream
              coverage)  lower  than  this  value  will   be   filtered   (Not   applicable   for
              --non-germline) (default: 1.33)
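
              The two coverage thresholds above can be read as a ratio test. The sketch below is
              an illustration only: the function names are hypothetical, and averaging the
              upstream and downstream coverage is an assumption, since the exact way Sniffles2
              combines the flanks is not described here. It applies only to SVs longer than
              --long-del-length or --long-dup-length, outside --non-germline mode.

```python
def central_coverage_ratio(central, upstream, downstream):
    """Central coverage relative to the flanking coverage."""
    flank = (upstream + downstream) / 2  # assumption: flanks are averaged
    if flank <= 0:
        raise ValueError("flanking coverage must be positive")
    return central / flank


def long_del_filtered(central, upstream, downstream, max_ratio=0.66):
    """A long deletion is filtered if coverage does not drop enough."""
    return central_coverage_ratio(central, upstream, downstream) > max_ratio


def long_dup_filtered(central, upstream, downstream, min_ratio=1.33):
    """A long duplication is filtered if coverage does not rise enough."""
    return central_coverage_ratio(central, upstream, downstream) < min_ratio
```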

       --max-splits-kb N
              Additional  number  of  splits  per kilobase read sequence allowed before reads are
              ignored (default: 0.1)

       --max-splits-base N
              Base  number  of  splits  allowed  before  reads  are  ignored  (in   addition   to
              --max-splits-kb) (default: 3)
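
              One way to read --max-splits-base together with --max-splits-kb is as a split
              allowance that grows with read length. The helper below is a hypothetical sketch of
              that reading, not the Sniffles2 implementation.

```python
def max_allowed_splits(read_length_bp, base=3, per_kb=0.1):
    """Split allowance for a read: base value plus a per-kilobase share.

    Assumption: the two parameters are simply added together.
    """
    return base + per_kb * (read_length_bp / 1000)
```

              With the defaults, a 20 kb read would be allowed about 5 splits before being
              ignored.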

       --min-alignment-length N
              Reads  with  alignments  shorter than this length (in bp) will be ignored (default:
              1000)

       --phase-conflict-threshold F
              Maximum fraction of conflicting reads permitted for  SV  phase  information  to  be
              labelled as PASS (only for --phase) (default: 0.1)
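
              The threshold can be illustrated with a small check. This is a sketch under the
              assumption that the fraction is taken over all phased reads supporting the SV; the
              function name and that denominator are not taken from Sniffles2.

```python
def phase_passes(conflicting_reads, phased_reads, threshold=0.1):
    """True if the fraction of conflicting reads is within the threshold."""
    if phased_reads <= 0:
        return False  # assumption: no phased reads means no PASS label
    return conflicting_reads / phased_reads <= threshold
```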

       --detect-large-ins True
              Infer  insertions  that are longer than most reads and therefore are spanned by few
              alignments only. (default: True)

   SV Clustering parameters:
       --cluster-binsize N
              Initial screening bin size in bp (default: 100)

       --cluster-r R
              Multiplier for SV start position standard deviation criterion  in  cluster  merging
              (default: 2.5)

       --cluster-repeat-h H
              Multiplier for mean SV length criterion for tandem repeat cluster merging (default:
              1.5)

       --cluster-repeat-h-max N
              Max. merging distance based on  SV  length  criterion  for  tandem  repeat  cluster
              merging (default: 1000)

       --cluster-merge-pos N
              Max.  merging distance for insertions and deletions on the same read and cluster in
              non-repeat regions (default: 150)

       --cluster-merge-len F
              Max. size difference for merging SVs as fraction of SV length (default: 0.33)

       --cluster-merge-bnd N
              Max. merging distance for breakend SV candidates. (default: 1500)

   SV Genotyping parameters:
       --genotype-ploidy N
              Sample ploidy (currently fixed at value 2) (default: 2)

       --genotype-error N
              Estimated false positive rate for leads (relating to total coverage) (default: 0.05)

       --sample-id SAMPLE_ID
              Custom ID for this sample, used for later multi-sample  calling  (stored  in  .snf)
              (default: None)

       --genotype-vcf IN.vcf
              Determine  the genotypes for all SVs in the given input .vcf file (forced calling).
              Re-genotyped .vcf will  be  written  to  the  output  file  specified  with  --vcf.
              (default: None)

   Multi-Sample Calling / Combine parameters:
       --combine-high-confidence F
              Minimum  fraction of samples in which a SV needs to have individually passed QC for
              it to be reported in combined output (a value of zero will report all SVs that pass
              QC in at least one of the input samples) (default: 0.0)

       --combine-low-confidence F
              Minimum fraction of samples in which a SV needs to be present (failed QC) for it to
              be reported in combined output (default: 0.2)

       --combine-low-confidence-abs N
              Minimum absolute number of samples in which a SV needs to be  present  (failed  QC)
              for it to be reported in combined output (default: 3)

       --combine-null-min-coverage N
              Minimum coverage for a sample genotype to be reported as 0/0 (sample genotypes with
              coverage below this threshold at the SV location will be output as  ./.)  (default:
              5)
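
              For a sample without support for an SV, the rule above amounts to the following
              check. The function name is hypothetical, and the sketch covers only the choice
              between 0/0 and ./. described for this option.

```python
def reference_genotype(coverage, null_min_coverage=5):
    """Genotype string for a sample that shows no support for the SV."""
    # Coverage below the threshold is reported as missing, not as 0/0.
    return "0/0" if coverage >= null_min_coverage else "./."
```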

       --combine-match N
              Maximum deviation of the start/end positions of multiple SVs for them to be
              combined across samples. Given by max_dev=M*sqrt(min(SV_length_a,SV_length_b)),
              where M is this parameter. (default: 500)
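
              The formula can be evaluated directly. The sketch below uses a hypothetical
              function name and assumes both SV lengths are given as non-negative values in bp.

```python
import math


def max_deviation(sv_length_a, sv_length_b, m=500):
    """max_dev = M * sqrt(min(SV_length_a, SV_length_b))."""
    return m * math.sqrt(min(sv_length_a, sv_length_b))
```

              With the default M of 500, two SVs of 100 bp and 400 bp may deviate by up to about
              5000 bp, since the shorter SV determines the limit.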

       --combine-consensus
              Output the consensus genotype of all samples (default: False)

       --combine-separate-intra
              Disable combination of SVs within the same sample (default: False)

       --combine-output-filtered
              Include  low-confidence  /  putative  non-germline  SVs  in multi-calling (default:
              False)

   SV Postprocessing, QC and output parameters:
       --output-rnames
              Output names of all supporting reads for each SV in the RNAMEs info field (default:
              False)

       --no-consensus
              Disable   consensus  sequence  generation  for  insertion  SV  calls  (may  improve
              performance) (default: False)

       --no-sort
              Do not sort output VCF by genomic coordinates (may  slightly  improve  performance)
              (default: False)

       --no-progress
              Disable progress display (default: False)

       --quiet
              Disable all logging, except errors (default: False)

       --max-del-seq-len N
              Maximum  deletion sequence length to be output. Deletion SVs longer than this value
              will be written to the output as symbolic SVs. (default: 50000)

       --symbolic
              Output all  SVs  as  symbolic,  including  insertions  and  deletions,  instead  of
              reporting nucleotide sequences.  (default: False)

AUTHOR

        This manpage was written by Andreas Tille for the Debian distribution and
        can be used for any other usage of the program.