Ubuntu Manpage: pychopper - package documentation

Provided by: python3-pychopper_2.5.0-1_all

NAME

       pychopper - package documentation

COMMAND LINE TOOLS

   cdna_classifier
       Tool to identify, orient and rescue full-length cDNA reads.

          usage: cdna_classifier [-h] [-b primers] [-g phmm_file] [-c config_file]
                                 [-k kit] [-q cutoff] [-Q min_qual] [-z min_len]
                                 [-r report_pdf] [-u unclass_output]
                                 [-l len_fail_output] [-w rescue_output]
                                 [-S stats_output] [-K qc_fail_output] [-Y autotune_nr]
                                 [-L autotune_samples] [-A scores_output] [-m method]
                                 [-x rescue] [-p] [-t threads] [-B batch_size]
                                 [-D read stats]
                                 input_fastx output_fastx

   Positional Arguments
       input_fastx
              Input file.

       output_fastx
              Output file.

   Named Arguments
       -b     Primers FASTA file.

       -g     File with custom profile HMMs (None).

       -c     File to specify primer configurations for each direction (None).

       -k     Use primer sequences from this kit (PCS109).

              Default: "PCS109"

       -q     Cutoff parameter (autotuned).

       -Q     Minimum mean base quality (7.0).

              Default: 7.0

       -z     Minimum segment length (50).

              Default: 50

       -r     Report PDF (cdna_classifier_report.pdf).

              Default: "cdna_classifier_report.pdf"

       -u     Write unclassified reads to this file.

       -l     Write fragments failing the length filter to this file.

       -w     Write rescued reads to this file.

       -S     Write statistics to this file.

              Default: "cdna_classifier_report.tsv"

       -K     Write reads failing the mean quality filter to this file.

       -Y     Approximate number of reads used for tuning the cutoff parameter (10000).

              Default: 10000

       -L     Number of samples taken when tuning the cutoff parameter (30).

              Default: 30

       -A     Write alignment scores to this BED file.

       -m     Detection method: phmm or edlib (phmm).

              Default: "phmm"

       -x     Protocol-specific read rescue: DCS109 (None).

       -p     Keep primers, but trim the rest.

              Default: False

       -t     Number of threads to use (8).

              Default: 8

       -B     Maximum number of reads processed in each batch (1000000).

              Default: 1000000

       -D     Tab-separated file with per-read stats (None).
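
              The options combine in the usual argparse way, with the two positional files last. As an
              illustration only, the Python sketch below assembles such a command line; build_command and the
              file names are hypothetical, the defaults simply mirror the documented ones, and nothing here
              runs the tool.

```python
import shlex
from typing import List, Optional


def build_command(input_fastx: str, output_fastx: str, kit: str = "PCS109",
                  threads: int = 8, report_pdf: Optional[str] = None,
                  unclass_output: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble a cdna_classifier argument list.

    Only options documented in this man page are used; optional outputs
    are added only when a path is given.
    """
    cmd = ["cdna_classifier", "-k", kit, "-t", str(threads)]
    if report_pdf is not None:
        cmd += ["-r", report_pdf]
    if unclass_output is not None:
        cmd += ["-u", unclass_output]
    # Positional arguments come last: the input file, then the output file.
    cmd += [input_fastx, output_fastx]
    return cmd


if __name__ == "__main__":
    cmd = build_command("reads.fq", "full_length.fq", threads=4,
                        unclass_output="unclassified.fq")
    # shlex.join quotes each argument for display in a POSIX shell.
    print(shlex.join(cmd))
```

              On a system where cdna_classifier is installed, the resulting list can be passed to
              subprocess.run(cmd, check=True).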

FULL API REFERENCE

   pychopper
   pychopper package
   Subpackages
   pychopper.phmm_data package
   Module contents
   pychopper.primer_data package
   Module contents
   pychopper.tests package
   Submodules
   pychopper.tests.test_detector module
       class pychopper.tests.test_detector.TestDetector(methodName='runTest')
              Bases: unittest.case.TestCase

              Create an instance of the class that will use the named test method when executed. Raises a
              ValueError if the instance does not have a method with the specified name.

              testPairAlign()

              testScoreCutoff()

   pychopper.tests.test_regression_simple module
       class pychopper.tests.test_regression_simple.TestIntegration(methodName='runTest')
              Bases: unittest.case.TestCase

              Create an instance of the class that will use the named test method when executed. Raises a
              ValueError if the instance does not have a method with the specified name.

              testIntegration()
                     Integration test.

   Module contents
   Submodules
   pychopper.alignment_hits module
       pychopper.alignment_hits.process_hits(hits, max_score)
              Process alignment hits by removing overlaps

   pychopper.chopper module
       pychopper.chopper.analyse_hits(hits, config)
              Segment reads based on alignment hits using dynamic programming. The algorithm is based on the
              rule that each primer alignment hit can be used only once. Because consecutive segments share the
              hit between them, if a segment is included, the next one has to be excluded.
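
              The rule that each hit may be used only once turns segment selection into picking non-adjacent
              items from a chain, a classic dynamic program. The Python sketch below illustrates only that
              idea: select_segments and its per-segment scores are hypothetical, and the real analyse_hits
              also receives a primer configuration, which this sketch ignores.

```python
from typing import List, Sequence, Tuple


def select_segments(scores: Sequence[float]) -> Tuple[float, List[int]]:
    """Pick non-adjacent segments with the largest total score.

    scores[i] is a hypothetical score for the segment between hit i and
    hit i + 1. Neighbouring segments share a hit, so they cannot both be
    chosen.
    """
    n = len(scores)
    if n == 0:
        return 0.0, []
    # best[i]: best total achievable using segments 0..i only.
    best = [0.0] * n
    best[0] = max(scores[0], 0.0)
    for i in range(1, n):
        take = scores[i] + (best[i - 2] if i >= 2 else 0.0)  # use segment i
        skip = best[i - 1]                                    # leave it out
        best[i] = max(take, skip)
    # Trace back: segment i is taken when it improved on best[i - 1].
    chosen = []
    i = n - 1
    while i >= 0:
        previous = best[i - 1] if i >= 1 else 0.0
        if best[i] != previous:
            chosen.append(i)
            i -= 2  # its neighbour i - 1 shares a hit, so skip it
        else:
            i -= 1
    return best[-1], chosen[::-1]
```

              For scores [1, 5, 1, 1, 5] the sketch keeps segments 1 and 4 (total 10): taking segment 1
              rules out segments 0 and 2.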

       pychopper.chopper.chopper_edlib(reads, primers, config, max_ed, cutoff, pool, min_batch)
              Segment using the edlib/parasail backend

       pychopper.chopper.chopper_phmm(reads, phmm_file, config, cutoff, threads, pool, min_batch)
              Segment using the profile HMM backend

       pychopper.chopper.segments_to_reads(read, segments, keep_primers)
              Convert segments to output reads with annotation

   pychopper.common_structures module
       class pychopper.common_structures.Hit(Ref, RefStart, RefEnd, Query, QueryStart, QueryEnd, Score)
              Bases: tuple

              Create new instance of Hit(Ref, RefStart, RefEnd, Query, QueryStart, QueryEnd, Score)

              Query  Alias for field number 3

              QueryEnd
                     Alias for field number 5

              QueryStart
                     Alias for field number 4

              Ref    Alias for field number 0

              RefEnd Alias for field number 2

              RefStart
                     Alias for field number 1

              Score  Alias for field number 6

       class pychopper.common_structures.Segment(Left, Start, End, Right, Strand, Len)
              Bases: tuple

              Create new instance of Segment(Left, Start, End, Right, Strand, Len)

              End    Alias for field number 2

              Left   Alias for field number 0

              Len    Alias for field number 5

              Right  Alias for field number 3

              Start  Alias for field number 1

              Strand Alias for field number 4

       class pychopper.common_structures.Seq(Id, Name, Seq, Qual)
              Bases: tuple

              Create new instance of Seq(Id, Name, Seq, Qual)

              Id     Alias for field number 0

              Name   Alias for field number 1

              Qual   Alias for field number 3

              Seq    Alias for field number 2
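
              The "Bases: tuple", "Create new instance of ..." and "Alias for field number N" entries are the
              docstrings that collections.namedtuple generates, so these three classes appear to be plain named
              tuples. Assuming so, equivalent definitions and typical use look like this (the example values
              are made up):

```python
from collections import namedtuple

# Field order follows the "Alias for field number N" entries above.
Hit = namedtuple("Hit", ["Ref", "RefStart", "RefEnd", "Query",
                         "QueryStart", "QueryEnd", "Score"])
Segment = namedtuple("Segment", ["Left", "Start", "End", "Right", "Strand", "Len"])
Seq = namedtuple("Seq", ["Id", "Name", "Seq", "Qual"])

# Made-up values, for illustration only.
hit = Hit("ref_1", 10, 40, "query_1", 0, 30, 25.0)
print(hit.RefEnd, hit[2])  # named and positional access agree: prints 40 40

# Named tuples are immutable; _replace returns a modified copy.
rescored = hit._replace(Score=12.5)
print(rescored.Score, hit.Score)  # prints 12.5 25.0
```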

   pychopper.edlib_backend module
       pychopper.edlib_backend.find_locations(reads, all_primers, max_ed, pool, min_batch)
              Find alignment hits of all primers in all reads using the edlib/parasail backend

   pychopper.hmmer_backend module
       pychopper.hmmer_backend.find_locations(reads, phmm_file, E, pool, min_batch)
              Find alignment hits of all primers in all reads using the pHMM/nhmmscan backend

   pychopper.parasail_backend module
       pychopper.parasail_backend.first_cigar(cigar)
              Extract details of the first operation in a cigar string.
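
              A minimal sketch of reading the first operation, assuming a standard CIGAR string such as
              5=1X3= (a run length followed by an operation code). The name first_cigar_op and the
              (length, operation) return value are assumptions; pychopper's first_cigar may return its
              details in a different shape.

```python
import re
from typing import Tuple

# One CIGAR operation: a run length followed by an operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def first_cigar_op(cigar: str) -> Tuple[int, str]:
    """Hypothetical sketch: return (length, operation) of the first CIGAR op."""
    match = _CIGAR_OP.match(cigar)  # match() anchors at the start of the string
    if match is None:
        raise ValueError(f"not a CIGAR string: {cigar!r}")
    return int(match.group(1)), match.group(2)
```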

       pychopper.parasail_backend.pair_align(reference, query, query_name, subs_mat, params)
              Perform pairwise local alignment using parasail-python.

       pychopper.parasail_backend.process_alignment(aln, query, query_name, aln_params)
              Process an alignment, extracting score, start and end.

       pychopper.parasail_backend.refine_locations(read, all_primers, locations, aln_params={'gap_extend': 1,
       'gap_open': 1, 'match': 1, 'mismatch': -2}, subs_mat=<parasail.bindings_v2.Matrix object>)
              Refine alignment edges based on local alignment
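
              pair_align and refine_locations rely on parasail for local alignment. As a self-contained
              illustration of what the default aln_params mean, here is a plain-Python Smith-Waterman scorer
              with match = 1, mismatch = -2 and a cost of 1 per gap position, assuming that is what
              gap_open = gap_extend = 1 amounts to. It is a quadratic-time teaching sketch, not a substitute
              for parasail's vectorised routines, and local_align is a hypothetical name.

```python
from typing import Tuple

# Scores mirror the documented default aln_params of refine_locations.
# Assumption: gap_open = gap_extend = 1 means each gap position costs 1.
MATCH, MISMATCH, GAP = 1, -2, -1


def local_align(reference: str, query: str) -> Tuple[int, int, int]:
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, ref_end, query_end); the end positions are
    exclusive indices into reference and query. Start positions would
    need a traceback, which this sketch omits.
    """
    rows, cols = len(reference) + 1, len(query) + 1
    h = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if reference[i - 1] == query[j - 1] else MISMATCH
            # Local alignment: scores never drop below zero.
            h[i][j] = max(0, h[i - 1][j - 1] + sub,
                          h[i - 1][j] + GAP, h[i][j - 1] + GAP)
            if h[i][j] > best[0]:
                best = (h[i][j], i, j)
    return best
```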

   pychopper.report module
       class pychopper.report.Report(pdf)
              Bases: object

              Class for plotting utilities on top of matplotlib. Plots are saved to the specified file through
              the PDF backend.

              Parameters

                     • self -- object.

                     • pdf -- Output pdf.

              Returns
                     The report object.

              Return type
                     Report

              close()
                     Close the PDF backend. Call this at the end of your script; otherwise the output file will
                     be corrupted.

                     Parameters
                            self -- object

                     Returns
                            None

                     Return type
                            object

              plot_arrays(data_map, title='', xlab='', ylab='', marker='.', legend_loc='best', legend=True,
              vlines=None, vlcolor='green', vlwitdh=0.5)
                     Plot multiple pairs of data arrays.

                     Parameters

                            • self -- object.

                             • data_map -- A dictionary with labels as keys and tuples of data arrays (x, y) as
                               values.

                            • title -- Figure title.

                            • xlab -- X axis label.

                            • ylab -- Y axis label.

                            • marker -- Marker passed to the plot function.

                            • legend_loc -- Location of legend.

                            • legend -- Plot legend if True

                            • vlines -- Dictionary with labels and positions of vertical lines to draw.

                            • vlcolor -- Color of vertical lines drawn.

                             • vlwitdh -- Width of vertical lines drawn. The keyword argument is spelled vlwitdh,
                               as in the signature.

                     Returns
                            None

                     Return type
                            object

              plot_bars_simple(data_map, title='', xlab='', ylab='', alpha=0.6, xticks_rotation=0,
              auto_limit=False)
                     Plot simple bar chart from input dictionary.

                     Parameters

                            • self -- object.

                            • data_map -- A dictionary with labels as keys and data as values.

                            • title -- Figure title.

                            • xlab -- X axis label.

                            • ylab -- Y axis label.

                            • alpha -- Alpha value.

                            • xticks_rotation -- Rotation value for x tick labels.

                            • auto_limit -- Set y axis limits automatically.

                     Returns
                            None

                     Return type
                            object

              plot_histograms(data_map, title='', xlab='', ylab='', bins=50, alpha=0.7, legend_loc='best',
              legend=True, vlines=None)
                     Plot histograms of multiple data arrays.

                     Parameters

                            • self -- object.

                            • data_map -- A dictionary with labels as keys and data arrays as values.

                            • title -- Figure title.

                            • xlab -- X axis label.

                            • ylab -- Y axis label.

                            • bins -- Number of bins.

                            • alpha -- Transparency value for histograms.

                            • legend_loc -- Location of legend.

                            • legend -- Plot legend if True.

                            • vlines -- Dictionary with labels and positions of vertical lines to draw.

                     Returns
                            None

                     Return type
                            object

              save_close()
                     Utility method to save and close figure.

   pychopper.seq_utils module
       pychopper.seq_utils.base_complement(k)
              Return complement of base.

              Performs the substitutions A<=>T, C<=>G and X=>X for both upper- and lower-case bases. Any other
              value is returned unchanged.

              Parameters
                     k -- A base.

              Returns
                     Complement of base.

              Return type
                     str
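
              The documented behaviour amounts to a lookup table with a pass-through default. A minimal
              Python sketch of that contract follows; it is not necessarily how pychopper implements it.

```python
# Complement pairs from the docstring: A<=>T, C<=>G, X=>X, upper and lower case.
_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C", "X": "X",
    "a": "t", "t": "a", "c": "g", "g": "c", "x": "x",
}


def base_complement(k: str) -> str:
    """Return the complement of base k; any other value is returned unchanged."""
    return _COMPLEMENT.get(k, k)
```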

       pychopper.seq_utils.errs_tab(n)
              Generate a list of Phred error probabilities for quality values less than or equal to n.

       pychopper.seq_utils.get_primers(primers)
              Load primers from fasta file

       pychopper.seq_utils.get_runid(desc)
              Parse out runid from sequence description.

       pychopper.seq_utils.mean_qual(quals, qround=False, tab=[1.0, 0.7943282347242815, 0.6309573444801932, ...,
       1.584893192461111e-13])
              The default tab holds the Phred error probabilities 10^(-q/10) for q = 0, 1, ..., 128 (129 values).

              Calculate the average basecall quality of a read. The function receives the ASCII quality scores
              of a read and returns the average quality for that read: it first converts the Phred scores to
              error probabilities, calculates the average error probability and converts that average back to
              the Phred scale.
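
              To make the three steps concrete, here is a self-contained Python sketch. It assumes Phred+33
              ASCII encoding (standard Sanger FASTQ) and that errs_tab(n) covers q = 0..n inclusive, which
              matches the 129-entry default above. The qround option is left out, and both functions are
              illustrations rather than pychopper's code.

```python
import math
from typing import List


def errs_tab(n: int) -> List[float]:
    """Phred error probabilities 10**(-q/10) for q = 0, 1, ..., n."""
    return [10 ** (-q / 10) for q in range(n + 1)]


_TAB = errs_tab(128)


def mean_qual(quals: str, tab: List[float] = _TAB) -> float:
    """Mean quality of a read, given its ASCII (assumed Phred+33) quality string."""
    if not quals:
        raise ValueError("empty quality string")
    # Step 1: ASCII character -> Phred score -> error probability.
    probs = [tab[ord(c) - 33] for c in quals]
    # Step 2: average the error probabilities, not the Phred scores.
    mean_err = sum(probs) / len(probs)
    # Step 3: convert the average error probability back to the Phred scale.
    return -10 * math.log10(mean_err)


if __name__ == "__main__":
    print(round(mean_qual("+5"), 2))  # Phred 10 and 20 -> prints 12.6
```

              For the quality string "+5" (Phred 10 and 20) the sketch returns about 12.6 rather than the
              arithmetic mean of 15, because the averaging happens on the error probabilities 0.1 and 0.01.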

       pychopper.seq_utils.random(size=None)
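              Return random floats in the half-open interval [0.0, 1.0). Alias for random_sample to ease
              forward-porting to the new random API. This is NumPy's numpy.random.random, which appears here
              because seq_utils imports it into its namespace.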

       pychopper.seq_utils.readfq(fp, sample=None, min_qual=None, rfq_sup={})
              Function taken from https://github.com/lh3/readfq/blob/master/readfq.py. It parses large files
              much faster than Biopython.
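
              The core of readfq-style parsing is a generator that reads records lazily instead of building
              objects for the whole file. The sketch below handles only the simple four-lines-per-record FASTQ
              layout: unlike lh3's readfq it does not parse FASTA or wrapped sequences, and it omits the sample
              and min_qual options. read_fastq is a hypothetical name, and splitting the header into Id (first
              word) and Name (full header) is an assumption.

```python
import io
from collections import namedtuple
from typing import Iterator, TextIO

# Assumed record layout, mirroring pychopper.common_structures.Seq.
Seq = namedtuple("Seq", ["Id", "Name", "Seq", "Qual"])


def read_fastq(fp: TextIO) -> Iterator[Seq]:
    """Lazily yield records from a FASTQ stream with four lines per record."""
    while True:
        header = fp.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = fp.readline().rstrip("\r\n")
        plus = fp.readline()
        qual = fp.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        name = header[1:].rstrip("\r\n")
        # Assumption: Id is the first whitespace-delimited word of the header.
        read_id = (name.split() or [""])[0]
        yield Seq(Id=read_id, Name=name, Seq=seq, Qual=qual)


if __name__ == "__main__":
    demo = io.StringIO("@r1 runid=abc\nACGT\n+\nIIII\n")
    for rec in read_fastq(demo):
        print(rec.Id, rec.Seq)  # prints: r1 ACGT
```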

       pychopper.seq_utils.record_size(read, in_format='fastq')
              Calculate record size.

       pychopper.seq_utils.revcomp_seq(seq)
              Reverse complement sequence record

       pychopper.seq_utils.reverse_complement(seq)
              Return reverse complement of a string (base) sequence.

              Parameters
                     seq -- Input sequence.

              Returns
                     Reverse complement of input sequence.

              Return type
                     str
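
              Both reverse-complement helpers follow from base_complement: complement every base, then reverse.
              The sketch below uses str.translate, a swapped-in technique that is equivalent for the documented
              A/T, C/G and X mapping because unlisted characters pass through unchanged. The record-level
              revcomp_record is hypothetical and assumes that reversing a read also reverses its quality string.

```python
from collections import namedtuple

# Assumed record layout, mirroring pychopper.common_structures.Seq.
Seq = namedtuple("Seq", ["Id", "Name", "Seq", "Qual"])

# Translation table for the documented pairs A<=>T, C<=>G, X=>X in both cases;
# str.translate leaves any unlisted character unchanged.
_COMP = str.maketrans("ACGTXacgtx", "TGCAXtgcax")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMP)[::-1]


def revcomp_record(rec: Seq) -> Seq:
    """Hypothetical record-level variant: reverse-complement the bases and
    reverse (not complement) the qualities so each value stays with its base."""
    qual = rec.Qual[::-1] if rec.Qual is not None else None
    return rec._replace(Seq=reverse_complement(rec.Seq), Qual=qual)


if __name__ == "__main__":
    print(reverse_complement("AACGN"))  # prints NCGTT
```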

       pychopper.seq_utils.writefq(r, fh)
              Write read to fastq file

   pychopper.utils module
       pychopper.utils.batch(iterable, size)

       pychopper.utils.check_command(cmd)

       pychopper.utils.check_min_hmmer_version(major, minor)

       pychopper.utils.count_fastq_records(fname, size=128000000)

       pychopper.utils.hit2bed(hit, read)

       pychopper.utils.parse_config_string(s)
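
              These helpers carry no docstrings. Going only by its name and signature, batch(iterable, size)
              presumably splits an iterable into chunks of at most size items, the kind of chunking the -B
              option describes. The sketch below is a guess at that contract using itertools.islice, not
              pychopper's code; whether the real helper yields lists is unknown.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(iterable)
    while True:
        # islice consumes up to `size` items from the shared iterator.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```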

   Module contents

AUTHOR

       ONT Applications Group

COPYRIGHT

       2020, Oxford Nanopore Technologies Ltd.

2.5.0                                             Oct 26, 2020                                      PYCHOPPER(1)