Provided by: rsem_1.2.31+dfsg-1_amd64

NAME

       rsem-generate-ngvector

PURPOSE

       Create Ng vector for EBSeq based only on transcript sequences.

SYNOPSIS

       rsem-generate-ngvector [options] input_fasta_file output_name

ARGUMENTS

       input_fasta_file
           The FASTA file containing all reference transcripts. The transcripts must be in the
           same order as those in the expression value files. Thus, the
           'reference_name.transcripts.fa' file generated by 'rsem-prepare-reference' should be
           used.

       output_name
           The name of all output files. The Ng vector will be stored as 'output_name.ngvec'.
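
       The ordering requirement on input_fasta_file can be checked before running the program.
       The following is a minimal Python sketch, not part of RSEM. It assumes that a transcript
       name is the first whitespace-delimited token of each FASTA header line and that the
       caller has already extracted the transcript IDs from an expression value file into a
       list. The names 'fasta_names' and 'first_order_mismatch' are hypothetical.

```python
def fasta_names(lines):
    """Yield the transcript name from each '>' header line.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            yield header[0] if header else ""


def first_order_mismatch(fasta_lines, expression_ids):
    """Return the index of the first position where the orders differ.

    Returns None when both sequences of names are identical.
    """
    names = list(fasta_names(fasta_lines))
    for index, (fasta_name, expr_name) in enumerate(zip(names, expression_ids)):
        if fasta_name != expr_name:
            return index
    # Same common prefix but different lengths: report where the shorter one ends.
    if len(names) != len(expression_ids):
        return min(len(names), len(expression_ids))
    return None
```

       A return value of None means the two orders agree. Any integer is the zero-based
       position of the first disagreement.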

OPTIONS

       -k <int>
           k-mer length. See the DESCRIPTION section. (Default: 25)

       -h/--help
           Show help information.

DESCRIPTION

       This program generates the Ng vector required by EBSeq for isoform-level differential
       expression analysis, based on reference sequences only. EBSeq can take variance due to
       read mapping ambiguity into consideration by grouping isoforms according to the number of
       isoforms of their parent gene. However, for a de novo assembled transcriptome, it is hard
       to obtain an accurate gene-isoform relationship. Instead, this program groups isoforms by
       measuring read mapping ambiguity directly. First, it calculates the 'unmappability' of
       each transcript. The 'unmappability' of a transcript is the ratio between the number of
       its k-mers with at least one perfect match to other transcripts and its total number of
       k-mers, where k is a parameter (see -k). Then, the Ng vector is generated by applying the
       k-means algorithm to the 'unmappability' values with the number of clusters set to 3.
       'rsem-generate-ngvector' makes sure the mean 'unmappability' scores of the clusters are
       in ascending order. All transcripts shorter than k are assigned to cluster 3.
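
       The procedure described above can be illustrated with a small Python sketch. It is not
       the program's actual implementation. It assumes that only forward-strand k-mers are
       compared, that k-means is a plain Lloyd iteration with evenly spaced starting centers,
       and that transcripts shorter than k are left out of the clustering. The function names
       are hypothetical.

```python
from collections import defaultdict


def unmappability(transcripts, k=25):
    """Return {name: score}; the score is None for transcripts shorter than k."""
    # Map every k-mer to the set of transcripts that contain it.
    owners = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    scores = {}
    for name, seq in transcripts.items():
        total = len(seq) - k + 1
        if total <= 0:
            scores[name] = None
            continue
        # A k-mer is shared when at least one other transcript also contains it.
        shared = sum(1 for i in range(total) if len(owners[seq[i:i + k]]) > 1)
        scores[name] = shared / total
    return scores


def kmeans_1d(values, n_clusters=3, n_iter=100):
    """Lloyd's algorithm on scalars; returns the centers in ascending order."""
    lo, hi = min(values), max(values)
    # Assumption: evenly spaced starting centers across the observed range.
    centers = [lo + (hi - lo) * (j + 0.5) / n_clusters for j in range(n_clusters)]
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(n_clusters), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def ng_vector(transcripts, k=25):
    """Return one cluster label (1, 2 or 3) per transcript, in input order."""
    scores = unmappability(transcripts, k)
    valid = [s for s in scores.values() if s is not None]
    centers = kmeans_1d(valid) if valid else []
    labels = []
    for name in transcripts:
        score = scores[name]
        if score is None:
            labels.append(3)  # shorter than k: assigned to cluster 3
        else:
            nearest = min(range(len(centers)), key=lambda j: abs(score - centers[j]))
            labels.append(nearest + 1)  # centers are sorted, so labels ascend with score
    return labels
```

       Because the centers are sorted before labels are assigned, cluster 1 holds the lowest
       'unmappability' scores and cluster 3 the highest, matching the ascending order stated
       above.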

       If your reference is a de novo assembled transcript set, you should run
       'rsem-generate-ngvector' first. Then load the resulting 'output_name.ngvec' into R. For
       example, you can use:

        NgVec <- scan(file="output_name.ngvec", what=0, sep="\n")

       After that, replace 'IsoNgTrun' with 'NgVec' in the second line of section 3.2.5 (Page
       10) of EBSeq's vignette:

        IsoEBres=EBTest(Data=IsoMat, NgVector=NgVec, ...)

       This program only needs to run once per RSEM reference.

OUTPUT

       output_name.ump
           The 'unmappability' score of each transcript. This file contains two columns: the
           first column is the transcript name and the second column is its 'unmappability'
           score.

       output_name.ngvec
           Ng vector generated by this program.
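
       For inspecting the two output files outside R, the following Python sketch may be
       useful. It assumes that the columns of 'output_name.ump' are separated by whitespace,
       that the file has no header line, and that 'output_name.ngvec' holds one integer per
       line (consistent with the R 'scan' call shown in the DESCRIPTION section). None of these
       layout details are stated here, so verify them against your own files. The function
       names are hypothetical.

```python
from collections import Counter


def parse_ump(lines):
    """Parse lines of an .ump file into (transcript_name, score) pairs.

    Assumptions: whitespace-separated columns and no header line.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append((fields[0], float(fields[1])))
    return rows


def parse_ngvec(lines):
    """Parse lines of an .ngvec file into a list of integer cluster labels.

    Assumption: one integer per line.
    """
    return [int(line) for line in lines if line.strip()]


def cluster_sizes(ngvec):
    """Count how many transcripts fall into each cluster label."""
    return dict(Counter(ngvec))
```

       Both parsers accept any iterable of lines, so an open file handle can be passed
       directly, for example 'parse_ngvec(open("output_name.ngvec"))'.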

EXAMPLES

       Suppose the reference sequence file is '/ref/mouse_125/mouse_125.transcripts.fa' and we
       set output_name to 'mouse_125':

        rsem-generate-ngvector /ref/mouse_125/mouse_125.transcripts.fa mouse_125