Provided by: percona-toolkit_2.2.16-1_all

NAME

       pt-query-digest - Analyze MySQL queries from logs, processlist, and tcpdump.

SYNOPSIS

       Usage: pt-query-digest [OPTIONS] [FILES] [DSN]

       pt-query-digest analyzes MySQL queries from slow, general, and binary log files.  It can also analyze
       queries from "SHOW PROCESSLIST" and MySQL protocol data from tcpdump.  By default, queries are grouped by
       fingerprint and reported in descending order of query time (i.e. the slowest queries first).  If no
       "FILES" are given, the tool reads "STDIN".  The optional "DSN" is used for certain options like "--since"
       and "--until".

       Report the slowest queries from "slow.log":

          pt-query-digest slow.log

       Report the slowest queries from the processlist on host1:

          pt-query-digest --processlist h=host1

       Capture MySQL protocol data with tcpdump, then report the slowest queries:

          tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt

          pt-query-digest --type tcpdump mysql.tcp.txt

       Save query data from "slow.log" to host2 for later review and trend analysis:

          pt-query-digest --review h=host2 --no-report slow.log

RISKS

       Percona Toolkit is mature, proven in the real world, and well tested, but all database tools can pose a
       risk to the system and the database server.  Before using this tool, please:

       •   Read the tool's documentation

       •   Review the tool's known "BUGS"

       •   Test the tool on a non-production server

       •   Backup your production server and verify the backups

DESCRIPTION

       pt-query-digest  is  a  sophisticated  but  easy to use tool for analyzing MySQL queries.  It can analyze
       queries from MySQL slow, general, and binary logs. (Binary logs must first  be  converted  to  text,  see
       "--type").   It  can  also  use "SHOW PROCESSLIST" and MySQL protocol data from tcpdump.  By default, the
       tool reports which queries are the slowest, and therefore the most important to optimize.   More  complex
       and  custom-tailored  reports  can  be  created  by  using  options  like  "--group-by",  "--filter", and
       "--embedded-attributes".

       Query analysis is a best-practice that should be done frequently.  To make this  easier,  pt-query-digest
       has  two features: query review ("--review") and query history ("--history").  When the "--review" option
       is used, all unique queries are saved to a database.  When the tool is run again with "--review", queries
       marked as reviewed in the database are not printed in the report.  This highlights new queries that  need
       to  be  reviewed.   When  the "--history" option is used, query metrics (query time, lock time, etc.) for
       each unique query are saved to the database.  Each time the tool is run with "--history", more historical
       data is saved, which can be used to trend and analyze query performance over time.

ATTRIBUTES

       pt-query-digest works on events, which are a collection of key-value  pairs  called  attributes.   You'll
       recognize  most of the attributes right away: "Query_time", "Lock_time", and so on.  You can just look at
       a slow log and see them.  However, there are some that don't exist in the slow log,  and  slow  logs  may
       actually  include  different  kinds  of  attributes  (for example, you may have a server with the Percona
       patches).

       See "ATTRIBUTES REFERENCE" near the end of this documentation for a list of common and "--type"  specific
       attributes.    A   familiarity   with   these  attributes  is  necessary  for  working  with  "--filter",
       "--ignore-attributes", and other attribute-related options.

       With creative use of "--filter", you can create new attributes derived  from  existing  attributes.   For
       example,  to  create  an  attribute  called  "Row_ratio"  for  examining  the  ratio  of  "Rows_sent"  to
       "Rows_examined", specify a filter like:

         --filter '($event->{Row_ratio} = $event->{Rows_sent} / ($event->{Rows_examined})) && 1'

       The "&& 1" trick is needed to create a valid one-line syntax that is always true, even if the  assignment
       happens to evaluate false.  The new attribute will automatically appear in the output:

         # Row ratio        1.00    0.00      1    0.50      1    0.71    0.50

       Attributes created this way can be specified for "--order-by" or any option that requires an attribute.
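
       For example, a minimal sketch that creates "Row_ratio" and orders the report by its largest value
       (assuming every event has a non-zero "Rows_examined"):

         pt-query-digest slow.log \
           --filter '($event->{Row_ratio} = $event->{Rows_sent} / ($event->{Rows_examined})) && 1' \
           --order-by Row_ratio:max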

OUTPUT

       The  default  "--output"  is  a query analysis report.  The "--[no]report" option controls whether or not
       this report is printed.  Sometimes you may want to parse all the queries but  suppress  the  report,  for
       example when using "--review" or "--history".

       There  is  one  paragraph for each class of query analyzed.  A "class" of queries all have the same value
       for the "--group-by" attribute which is "fingerprint" by default.  (See "ATTRIBUTES".)  A fingerprint  is
       an  abstracted  version of the query text with literals removed, whitespace collapsed, and so forth.  The
       report is formatted so it's easy to paste into emails without wrapping, and  all  non-query  lines  begin
       with  a  comment, so you can save it to a .sql file and open it in your favorite syntax-highlighting text
       editor.  There is a response-time profile at the beginning.

       The output described here is controlled by "--report-format".  That option allows you to specify what  to
       print and in what order.  The default output in the default order is described here.

       The report, by default, begins with a paragraph about the entire analysis run.  The information is very
       similar to what you'll see for each class of queries in the log, but it  doesn't  have  some  information
       that  would  be  too  expensive to keep globally for the analysis.  It also has some statistics about the
       code's execution itself, such as the CPU and memory usage, the local date and time of the run, and a list
       of input files read/parsed.

       Following this is the response-time profile over the events.  This is a highly  summarized  view  of  the
       unique events in the detailed query report that follows.  It contains the following columns:

        Column        Meaning
        ============  ==========================================================
        Rank          The query's rank within the entire set of queries analyzed
        Query ID      The query's fingerprint
        Response time The total response time, and percentage of overall total
        Calls         The number of times this query was executed
        R/Call        The mean response time per execution
 V/M           The variance-to-mean ratio of response time
        Item          The distilled query

       A  final  line  whose  rank  is  shown as MISC contains aggregate statistics on the queries that were not
       included in the report, due to options such as "--limit" and "--outliers".  For details on the  variance-
       to-mean ratio, please see http://en.wikipedia.org/wiki/Index_of_dispersion.
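
       An illustrative profile line, echoing the sample query discussed below (actual column widths vary):

        # Rank Query ID           Response time  Calls R/Call   V/M   Item
        # ==== ================== ============== ===== ======== ===== =============
        #    2 0xFDEA8D2993C9CAF3 1105.0000 13.0%    2 552.5000  0.00 SELECT table1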

       Next,  the  detailed  query  report  is  printed.   Each query appears in a paragraph.  Here is a sample,
       slightly reformatted so 'perldoc' will not wrap lines in a terminal.   The  following  will  all  be  one
       paragraph, but we'll break it up for commentary.

        # Query 2: 0.01 QPS, 0.02x conc, ID 0xFDEA8D2993C9CAF3 at byte 160665

       This  line  identifies  the  sequential  number of the query in the sort order specified by "--order-by".
       Then there's the queries per second, and the approximate concurrency for  this  query  (calculated  as  a
       function  of  the  timespan and total Query_time).  Next there's a query ID.  This ID is a hex version of
       the query's checksum in the database, if you're using "--review".  You can select  the  reviewed  query's
       details from the database with a query like "SELECT .... WHERE checksum=0xFDEA8D2993C9CAF3".

       If  you  are  investigating the report and want to print out every sample of a particular query, then the
       following "--filter" may be helpful:

          pt-query-digest slow.log           \
             --no-report                     \
             --output slowlog                \
             --filter '$event->{fingerprint} \
                  && make_checksum($event->{fingerprint}) eq "FDEA8D2993C9CAF3"'

       Notice that you must remove the "0x" prefix from the checksum.

       Finally, in case you want to find a sample of the query in the log file, there's the  byte  offset  where
       you  can  look.   (This  is  not  always accurate, due to some anomalies in the slow log format, but it's
       usually right.)  The position refers to the worst sample, which we'll see more about below.

       Next is the table of metrics about this class of queries.

        #           pct   total    min    max     avg     95%  stddev  median
        # Count       0       2
        # Exec time  13   1105s   552s   554s    553s    554s      2s    553s
        # Lock time   0   216us   99us  117us   108us   117us    12us   108us
        # Rows sent  20   6.26M  3.13M  3.13M   3.13M   3.13M   12.73   3.13M
        # Rows exam   0   6.26M  3.13M  3.13M   3.13M   3.13M   12.73   3.13M

       The first line is column headers for the table.  The percentage is the percent of the total for the whole
       analysis run, and the total is the actual value of the specified metric.  For example, in  this  case  we
       can  see  that  the query executed 2 times, which is 13% of the total number of queries in the file.  The
       min, max and avg columns are self-explanatory.  The 95% column shows the  95th  percentile;  95%  of  the
       values  are  less  than or equal to this value.  The standard deviation shows you how tightly grouped the
       values are.  The standard deviation and median are both calculated from the 95th  percentile,  discarding
       the extremely large values.

       The  stddev,  median  and  95th  percentile statistics are approximate.  Exact statistics require keeping
       every value seen, sorting, and doing some calculations on them.  This uses a lot  of  memory.   To  avoid
       this, we keep 1000 buckets, each of them 5% bigger than the one before, ranging from .000001 up to a very
       big  number.  When we see a value we increment the bucket into which it falls.  Thus we have fixed memory
       per class of queries.  The drawback is the imprecision, which typically falls in the 5 percent range.

       Next we have statistics on the users, databases and time range for the query.

        # Users       1   user1
        # Databases   2     db1(1), db2(1)
        # Time range 2008-11-26 04:55:18 to 2008-11-27 00:15:15

       The users and databases are shown as a count of distinct values, followed by the values.  If there's only
       one, it's shown alone; if there are many, we show each of the most frequent ones, followed by the  number
       of times it appears.

        # Query_time distribution
        #   1us
        #  10us
        # 100us
        #   1ms
        #  10ms  #####
        # 100ms  ####################
        #    1s  ##########
        #  10s+

       The  execution  times  show  a  logarithmic  chart  of  time clustering.  Each query goes into one of the
       "buckets" and is counted up.  The buckets are powers of ten.  The first  bucket  is  all  values  in  the
       "single  microsecond  range"  -- that is, less than 10us.  The second is "tens of microseconds," which is
       from 10us up to (but not including) 100us; and so on.  The charted attribute can be changed by specifying
       "--report-histogram" but is limited to time-based attributes.

        # Tables
        #    SHOW TABLE STATUS LIKE 'table1'\G
        #    SHOW CREATE TABLE `table1`\G
        # EXPLAIN
        SELECT * FROM table1\G

       This section is a convenience: if you're trying to optimize the queries you see  in  the  slow  log,  you
       probably  want  to  examine  the table structure and size.  These are copy-and-paste-ready commands to do
       that.

       Finally, we see a sample of the queries in this class of query.  This is not a random sample.  It is  the
       query that performed the worst, according to the sort order given by "--order-by".  You will normally see
       a commented "# EXPLAIN" line just before it, so you can copy-paste the query to examine its EXPLAIN plan.
       But  for  non-SELECT  queries  that isn't possible to do, so the tool tries to transform the query into a
       roughly equivalent SELECT query, and adds that below.

       If you want to find this sample event in the log, use the offset mentioned above, and something like  the
       following:

         tail -c +<offset> /path/to/file | head
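
       For example, with the byte offset 160665 shown in the sample header above:

         tail -c +160665 slow.log | head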

       See also "--report-format".

QUERY REVIEW

       A  query  "--review"  is  the  process  of storing all the query fingerprints analyzed.  This has several
       benefits:

       •   You can add metadata to classes of queries, such as marking  them  for  follow-up,  adding  notes  to
           queries, or marking them with an issue ID for your issue tracking system.

       •   You  can  refer  to  the  stored values on subsequent runs so you'll know whether you've seen a query
           before.  This can help you cut down on duplicated work.

       •   You can store historical data such as the row count, query times, and generally anything you can  see
           in the report.

       To  use this feature, you run pt-query-digest with the "--review" option.  It will store the fingerprints
       and other information into the table you specify.  Next time you run it with the same option, it will  do
       the following:

       •   It  won't  show you queries you've already reviewed.  A query is considered to be already reviewed if
           you've set a value for the "reviewed_by"  column.   (If  you  want  to  see  queries  you've  already
           reviewed, use the "--report-all" option.)

       •   Queries  that  you've  reviewed,  and don't appear in the output, will cause gaps in the query number
           sequence in the first line of each paragraph.  And the value  you've  specified  for  "--limit"  will
           still  be  honored.   So if you've reviewed all queries in the top 10 and you ask for the top 10, you
           won't see anything in the output.

       •   If you want to see the queries you've already reviewed, you can specify "--report-all".  Then  you'll
           see the normal analysis output, but you'll also see the information from the review table, just below
           the execution time graph.  For example,

             # Review information
             #      comments: really bad IN() subquery, fix soon!
             #    first_seen: 2008-12-01 11:48:57
             #   jira_ticket: 1933
             #     last_seen: 2008-12-18 11:49:07
             #      priority: high
             #   reviewed_by: xaprb
             #   reviewed_on: 2008-12-18 15:03:11

           This  metadata is useful because, as you analyze your queries, you get your comments integrated right
           into the report.
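
       To mark a class of queries as reviewed so it stops appearing in future reports, set its
       "reviewed_by" column.  A minimal sketch, assuming the default review table and the checksum from
       the earlier example (the reviewer name is hypothetical):

         mysql -e "UPDATE percona_schema.query_review
                   SET reviewed_by='alice', reviewed_on=NOW()
                   WHERE checksum=0xFDEA8D2993C9CAF3"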

FINGERPRINTS

       A query fingerprint is the abstracted form of a query, which makes it possible to group  similar  queries
       together.   Abstracting  a  query removes literal values, normalizes whitespace, and so on.  For example,
       consider these two queries:

         SELECT name, password FROM user WHERE id='12823';
         select name,   password from user
            where id=5;

       Both of those queries will fingerprint to

         select name, password from user where id=?

       Once the query's fingerprint is known, we can then talk about a query as though it represents all similar
       queries.

       What "pt-query-digest" does is analogous to a GROUP BY  statement  in  SQL.   (But  note  that  "multiple
       columns" doesn't define a multi-column grouping; it defines multiple reports!) If your command-line looks
       like this,

         pt-query-digest               \
             --group-by fingerprint    \
             --order-by Query_time:sum \
             --limit 10                \
             slow.log

       The corresponding pseudo-SQL looks like this:

         SELECT WORST(query BY Query_time), SUM(Query_time), ...
         FROM /path/to/slow.log
         GROUP BY FINGERPRINT(query)
         ORDER BY SUM(Query_time) DESC
         LIMIT 10

       You can also use the value "distill", which is a kind of super-fingerprint.  See "--group-by" for more.

       Query fingerprinting accommodates many special cases, which have proven necessary in the real world.  For
       example,  an  "IN" list with 5 literals is really equivalent to one with 4 literals, so lists of literals
       are collapsed to a single one.  If you find something that is not fingerprinted properly, please submit a
       bug report with a reproducible test case.

       Here is a list of transformations during fingerprinting, which might not be exhaustive:

       •   Group all SELECT queries from mysqldump together, even if they are  against  different  tables.   The
           same applies to all queries from pt-table-checksum.

       •   Shorten multi-value INSERT statements to a single VALUES() list.

       •   Strip comments.

       •   Abstract the databases in USE statements, so all USE statements are grouped together.

       •   Replace all literals, such as quoted strings.  For efficiency, the code that replaces literal numbers
           is  somewhat  non-selective,  and  might  replace  some  things  as numbers when they really are not.
           Hexadecimal literals are also  replaced.   NULL  is  treated  as  a  literal.   Numbers  embedded  in
           identifiers  are  also  replaced,  so tables named similarly will be fingerprinted to the same values
           (e.g. users_2009 and users_2010 will fingerprint identically).

       •   Collapse all whitespace into a single space.

       •   Lowercase the entire query.

       •   Replace all literals inside of IN() and VALUES() lists  with  a  single  placeholder,  regardless  of
           cardinality.

       •   Collapse multiple identical UNION queries into a single one.
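
       As a short illustration of these rules (the exact placeholder syntax may vary between versions),
       both of these statements:

         SELECT * FROM users_2009 WHERE id IN (1, 2, 3) -- login check
         select * from users_2010 where ID in (4,5)

       should produce one fingerprint along the lines of:

         select * from users_? where id in(?+)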

OPTIONS

       This  tool  accepts additional command-line arguments.  Refer to the "SYNOPSIS" and usage information for
       details.

       --ask-pass
           Prompt for a password when connecting to MySQL.

       --attribute-aliases
           type: array; default: db|Schema

           List of attribute|alias,etc.

           Certain attributes have multiple names, like db and Schema.  If an event does not  have  the  primary
           attribute,  pt-query-digest  looks  for  an  alias  attribute.   If it finds an alias, it creates the
           primary attribute with the alias attribute's value and removes the alias attribute.

           If the event has the primary attribute, all alias attributes are deleted.

           This helps simplify event attributes so that, for example, there will not be report lines for both db
           and Schema.

       --attribute-value-limit
           type: int; default: 4294967296

           A sanity limit for attribute values.

            This option deals with bugs in slow logging functionality that cause large values for attributes.
           If  the  attribute's  value  is bigger than this, the last-seen value for that class of query is used
           instead.

       --charset
           short form: -A; type: string

           Default character set.  If the value is utf8, sets Perl's binmode  on  STDOUT  to  utf8,  passes  the
           mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL.  Any other
           value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.

       --config
           type: Array

           Read  this  comma-separated  list of config files; if specified, this must be the first option on the
           command line.

       --[no]continue-on-error
           default: yes

           Continue parsing even if there is an error.  The tool will not continue forever: it  stops  once  any
           process causes 100 errors, in which case there is probably a bug in the tool or the input is invalid.

       --[no]create-history-table
           default: yes

           Create the "--history" table if it does not exist.

           This  option causes the table specified by "--history" to be created with the default structure shown
           in the documentation for "--history".

       --[no]create-review-table
           default: yes

           Create the "--review" table if it does not exist.

           This option causes the table specified by "--review" to be created with the default  structure  shown
           in the documentation for "--review".

       --daemonize
           Fork to the background and detach from the shell.  POSIX operating systems only.

       --database
           short form: -D; type: string

           Connect to this database.

       --defaults-file
           short form: -F; type: string

           Only read mysql options from the given file.  You must give an absolute pathname.

       --embedded-attributes
           type: array

           Two Perl regex patterns to capture pseudo-attributes embedded in queries.

           Embedded attributes might be special attribute-value pairs that you've hidden in comments.  The first
           regex  should  match  the  entire  set  of attributes (in case there are multiple).  The second regex
           should match and capture attribute-value pairs from the first regex.

           For example, suppose your query looks like the following:

             SELECT * from users -- file: /login.php, line: 493;

           You might run pt-query-digest with the following option:

             pt-query-digest --embedded-attributes ' -- .*','(\w+): ([^\,]+)'

           The first regular expression captures the whole comment:

             " -- file: /login.php, line: 493;"

           The second one splits it into attribute-value pairs and adds them to the event:

              ATTRIBUTE  VALUE
              =========  ==========
              file       /login.php
              line       493

            NOTE: All commas in the regex patterns must be escaped with a backslash, otherwise the pattern will break.

       --expected-range
           type: array; default: 5,10

           Explain items when there are more or fewer than expected.

           Defines the number of items expected to be seen in the report given by "--[no]report", as  controlled
           by  "--limit"  and  "--outliers".   If  there   are  more or fewer items in the report, each one will
           explain why it was included.

       --explain
           type: DSN

           Run EXPLAIN for the sample query with this DSN and print results.

           This works only when "--group-by" includes fingerprint.  It causes pt-query-digest to run EXPLAIN and
           include the output into the report.  For safety, queries that appear to have a subquery that  EXPLAIN
           will execute won't be EXPLAINed.  Those are typically "derived table" queries of the form

             select ... from ( select .... ) der;

           The  EXPLAIN  results are printed as a full vertical format in the event report, which appears at the
           end of each event report in vertical style ("\G") just like MySQL prints it.
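
            For example, to attach EXPLAIN results for each sample query using a connection to host1:

              pt-query-digest --explain h=host1 slow.log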

       --filter
           type: string

           Discard events for which this Perl code doesn't return true.

           This option is a string of Perl code or a file  containing  Perl  code  that  gets  compiled  into  a
           subroutine  with  one  argument:  $event.  This is a hashref.  If the given value is a readable file,
           then pt-query-digest reads the entire file and uses its contents as the code.  The  file  should  not
           contain a shebang (#!/usr/bin/perl) line.

           If  the code returns true, the chain of callbacks continues; otherwise it ends.  The code is the last
           statement in the subroutine other than "return $event".  The subroutine template is:

             sub { $event = shift; filter && return $event; }

            Filters given on the command line are wrapped inside parentheses like "( filter )".  For
           complex,  multi-line  filters,  you  must put the code inside a file so it will not be wrapped inside
           parentheses.  Either way, the filter must produce syntactically valid code given the  template.   For
           example, an if-else branch given on the command line would not be valid:

             --filter 'if () { } else { }'  # WRONG

           Since it's given on the command line, the if-else branch would be wrapped inside parentheses which is
           not syntactically valid.  So to accomplish something more complex like this would require putting the
           code in a file, for example filter.txt:

             my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok

           Then specify "--filter filter.txt" to read the code from filter.txt.

           If  the  filter  code won't compile, pt-query-digest will die with an error.  If the filter code does
           compile, an error may still occur at runtime if the code tries to do something  wrong  (like  pattern
           match an undefined value).  pt-query-digest does not provide any safeguards so code carefully!

           An example filter that discards everything but SELECT statements:

             --filter '$event->{arg} =~ m/^select/i'

           This is compiled into a subroutine like the following:

             sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }

           It is permissible for the code to have side effects (to alter $event).

           See "ATTRIBUTES REFERENCE" for a list of common and "--type" specific attributes.

           Here are more examples of filter code:

           Host/IP matches domain.com
               --filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/'

               Sometimes MySQL logs the host where the IP is expected.  Therefore, we check both.

           User matches john
               --filter '($event->{user} || "") =~ m/john/'

           More than 1 warning
               --filter '($event->{Warning_count} || 0) > 1'

           Query does full table scan or full join
               --filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")'

           Query was not served from query cache
               --filter '($event->{QC_Hit} || "") eq "No"'

           Query is 1 MB or larger
               --filter '$event->{bytes} >= 1_048_576'

           Since  "--filter"  allows  you  to  alter  $event, you can use it to do other things, like create new
           attributes.  See "ATTRIBUTES" for an example.

       --group-by
           type: Array; default: fingerprint

           Which attribute of the events to group by.

           In general, you can group queries into classes based on any attribute of the query, such as "user" or
           "db", which will by default show you which users and which databases get the most "Query_time".   The
           default  attribute, "fingerprint", groups similar, abstracted queries into classes; see below and see
           also "FINGERPRINTS".

           A report is printed  for  each  "--group-by"  value  (unless  "--no-report"  is  given).   Therefore,
           "--group-by  user,db" means "report on queries with the same user and report on queries with the same
           db"; it does not mean "report on queries with the same user and db."  See also "OUTPUT".

           Every value must have a corresponding value in the same position in  "--order-by".   However,  adding
           values to "--group-by" will automatically add values to "--order-by", for your convenience.

           There  are  several  magical  values  that cause some extra data mining to happen before the grouping
           takes place:

           fingerprint
               This causes events to be fingerprinted to abstract queries into a canonical form, which  is  then
               used to group events together into a class.  See "FINGERPRINTS" for more about fingerprinting.

           tables
               This  causes  events  to  be inspected for what appear to be tables, and then aggregated by that.
               Note that a query that contains two or more tables will be counted as many  times  as  there  are
               tables; so a join against two tables will count the Query_time against both tables.

           distill
               This  is  a  sort of super-fingerprint that collapses queries down into a suggestion of what they
               do, such as "INSERT SELECT table1 table2".

       --help
           Show help and exit.

       --history
           type: DSN

           Save metrics for each query class in the given table.  pt-query-digest  saves  query  metrics  (query
           time, lock time, etc.) to this table so you can see how query classes change over time.

           The  default table is "percona_schema.query_history".  Specify database (D) and table (t) DSN options
           to  override  the   default.    The   database   and   table   are   automatically   created   unless
           "--no-create-history-table" is specified (see "--[no]create-history-table").

           pt-query-digest  inspects  the  columns  in  the  table.   The table must have at least the following
           columns:

             CREATE TABLE query_review_history (
               checksum     BIGINT UNSIGNED NOT NULL,
               sample       TEXT NOT NULL
             );

           Any columns not mentioned above are inspected to see if they follow a certain naming convention.  The
           column is special if the name ends with an underscore followed by any of these values:

             pct|avg|cnt|sum|min|max|pct_95|stddev|median|rank

           If the column ends with one of those values, then the prefix is interpreted as the event attribute to
           store in that column, and the suffix is interpreted as the metric  to  be  stored.   For  example,  a
           column named "Query_time_min" will be used to store the minimum "Query_time" for the class of events.

           The  table  should also have a primary key, but that is up to you, depending on how you want to store
           the historical data.  We suggest adding ts_min and ts_max columns and making them part of the primary
           key along with the checksum.  But you could also just add a ts_min column and make it a DATE type, so
           you'd get one row per class of queries per day.
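
            Once populated, the table can be queried to see how a query class changes over time.  A
            minimal sketch, assuming the default table with the suggested ts_min column (the checksum
            value is illustrative):

              mysql -e "SELECT ts_min, ts_max, Query_time_pct_95, Query_time_median
                        FROM percona_schema.query_history
                        WHERE checksum=0xFDEA8D2993C9CAF3
                        ORDER BY ts_min"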

           The following table definition is used for "--[no]create-history-table":

            CREATE TABLE IF NOT EXISTS query_history (
              checksum             BIGINT UNSIGNED NOT NULL,
              sample               TEXT NOT NULL,
              ts_min               DATETIME,
              ts_max               DATETIME,
              ts_cnt               FLOAT,
              Query_time_sum       FLOAT,
              Query_time_min       FLOAT,
              Query_time_max       FLOAT,
              Query_time_pct_95    FLOAT,
              Query_time_stddev    FLOAT,
              Query_time_median    FLOAT,
              Lock_time_sum        FLOAT,
              Lock_time_min        FLOAT,
              Lock_time_max        FLOAT,
              Lock_time_pct_95     FLOAT,
              Lock_time_stddev     FLOAT,
              Lock_time_median     FLOAT,
              Rows_sent_sum        FLOAT,
              Rows_sent_min        FLOAT,
              Rows_sent_max        FLOAT,
              Rows_sent_pct_95     FLOAT,
              Rows_sent_stddev     FLOAT,
              Rows_sent_median     FLOAT,
              Rows_examined_sum    FLOAT,
              Rows_examined_min    FLOAT,
              Rows_examined_max    FLOAT,
              Rows_examined_pct_95 FLOAT,
              Rows_examined_stddev FLOAT,
              Rows_examined_median FLOAT,
              -- Percona extended slowlog attributes
              -- http://www.percona.com/docs/wiki/patches:slow_extended
              Rows_affected_sum             FLOAT,
              Rows_affected_min             FLOAT,
              Rows_affected_max             FLOAT,
              Rows_affected_pct_95          FLOAT,
              Rows_affected_stddev          FLOAT,
              Rows_affected_median          FLOAT,
              Rows_read_sum                 FLOAT,
              Rows_read_min                 FLOAT,
              Rows_read_max                 FLOAT,
              Rows_read_pct_95              FLOAT,
              Rows_read_stddev              FLOAT,
              Rows_read_median              FLOAT,
              Merge_passes_sum              FLOAT,
              Merge_passes_min              FLOAT,
              Merge_passes_max              FLOAT,
              Merge_passes_pct_95           FLOAT,
              Merge_passes_stddev           FLOAT,
              Merge_passes_median           FLOAT,
              InnoDB_IO_r_ops_min           FLOAT,
              InnoDB_IO_r_ops_max           FLOAT,
              InnoDB_IO_r_ops_pct_95        FLOAT,
              InnoDB_IO_r_ops_stddev        FLOAT,
              InnoDB_IO_r_ops_median        FLOAT,
              InnoDB_IO_r_bytes_min         FLOAT,
              InnoDB_IO_r_bytes_max         FLOAT,
              InnoDB_IO_r_bytes_pct_95      FLOAT,
              InnoDB_IO_r_bytes_stddev      FLOAT,
              InnoDB_IO_r_bytes_median      FLOAT,
              InnoDB_IO_r_wait_min          FLOAT,
              InnoDB_IO_r_wait_max          FLOAT,
              InnoDB_IO_r_wait_pct_95       FLOAT,
              InnoDB_IO_r_wait_stddev       FLOAT,
              InnoDB_IO_r_wait_median       FLOAT,
              InnoDB_rec_lock_wait_min      FLOAT,
              InnoDB_rec_lock_wait_max      FLOAT,
              InnoDB_rec_lock_wait_pct_95   FLOAT,
              InnoDB_rec_lock_wait_stddev   FLOAT,
              InnoDB_rec_lock_wait_median   FLOAT,
              InnoDB_queue_wait_min         FLOAT,
              InnoDB_queue_wait_max         FLOAT,
              InnoDB_queue_wait_pct_95      FLOAT,
              InnoDB_queue_wait_stddev      FLOAT,
              InnoDB_queue_wait_median      FLOAT,
              InnoDB_pages_distinct_min     FLOAT,
              InnoDB_pages_distinct_max     FLOAT,
              InnoDB_pages_distinct_pct_95  FLOAT,
              InnoDB_pages_distinct_stddev  FLOAT,
              InnoDB_pages_distinct_median  FLOAT,
              -- Boolean (Yes/No) attributes.  Only the cnt and sum are needed
              -- for these.  cnt is how many times the attribute was recorded,
              -- and sum is how many of those times the value was Yes.  So
              -- sum/cnt * 100 equals the percentage of recorded times that
              -- the value was Yes.
              QC_Hit_cnt          FLOAT,
              QC_Hit_sum          FLOAT,
              Full_scan_cnt       FLOAT,
              Full_scan_sum       FLOAT,
              Full_join_cnt       FLOAT,
              Full_join_sum       FLOAT,
              Tmp_table_cnt       FLOAT,
              Tmp_table_sum       FLOAT,
              Tmp_table_on_disk_cnt FLOAT,
              Tmp_table_on_disk_sum FLOAT,
              Filesort_cnt          FLOAT,
              Filesort_sum          FLOAT,
              Filesort_on_disk_cnt  FLOAT,
              Filesort_on_disk_sum  FLOAT,
              PRIMARY KEY(checksum, ts_min, ts_max)
            );

            Note that we store the count (cnt) for the ts attribute only; it would be redundant to store this for
           other attributes.

       --host
           short form: -h; type: string

           Connect to host.

       --ignore-attributes
           type:  array; default: arg, cmd, insert_id, ip, port, Thread_id, timestamp, exptime, flags, key, res,
           val, server_id, offset, end_log_pos, Xid

           Do not aggregate these attributes.  Some attributes are not query metrics but metadata which  doesn't
           need to be (or can't be) aggregated.

       --inherit-attributes
           type: array; default: db,ts

           If missing, inherit these attributes from the last event that had them.

           This  option sets which attributes are inherited or carried forward to events which do not have them.
           For example, if one event has the db attribute equal to "foo", but the next event doesn't have the db
           attribute, then it inherits "foo" for its db attribute.

       --interval
           type: float; default: .1

           How frequently to poll the processlist, in seconds.

       --iterations
           type: int; default: 1

           How many times to iterate through the collect-and-report cycle.  If 0,  iterate  to  infinity.   Each
           iteration  runs  for "--run-time" amount of time.  An iteration is usually determined by an amount of
           time and a report is printed when that amount of time elapses.  With "--run-time-mode" "interval", an
           interval is instead determined by the interval time you specify with "--run-time".  See  "--run-time"
           and "--run-time-mode" for more information.

       --limit
           type: Array; default: 95%:20

           Limit output to the given percentage or count.

           If  the  argument  is an integer, report only the top N worst queries.  If the argument is an integer
           followed by the "%" sign, report that percentage of the worst queries.  If the percentage is followed
           by a colon and another integer, report the top percentage or the number specified  by  that  integer,
           whichever comes first.
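
            For example, following the syntax above:

              --limit 10      # top 10 worst queries
              --limit 50%     # the worst 50% of queries
              --limit 95%:20  # top 95% or 20 queries, whichever comes first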

           The  value  is actually a comma-separated array of values, one for each item in "--group-by".  If you
           don't specify a value for any of those items, the default is the top 95%.

           See also "--outliers".

       --log
           type: string

           Print all output to this file when daemonized.

       --order-by
           type: Array; default: Query_time:sum

           Sort events by this attribute and aggregate function.

           This is a comma-separated list of order-by expressions, one for  each  "--group-by"  attribute.   The
           default  "Query_time:sum"  is  used for "--group-by" attributes without explicitly given "--order-by"
           attributes (that is, if you specify more  "--group-by"  attributes  than  corresponding  "--order-by"
           attributes).   The  syntax  is  "attribute:aggregate".  See "ATTRIBUTES" for valid attributes.  Valid
           aggregates are:

              Aggregate Meaning
              ========= ============================
              sum       Sum/total attribute value
              min       Minimum attribute value
              max       Maximum attribute value
              cnt       Frequency/count of the query

           For example, the default "Query_time:sum" means that queries in the query  analysis  report  will  be
           ordered  (sorted)  by  their  total  query execution time ("Exec time").  "Query_time:max" orders the
           queries by their maximum query execution time, so the query with the single largest "Query_time" will
            be listed first.  "cnt" refers more to the frequency of the query as a whole, how often it appears;
           "Count"  is  its  corresponding line in the query analysis report.  So any attribute and "cnt" should
           yield the same report wherein queries are sorted by the number of times they appear.
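
            For example, to sort the report by the single slowest execution of each query class:

              pt-query-digest --order-by Query_time:max slow.log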

           When parsing general logs ("--type" "genlog"), the  default  "--order-by"  becomes  "Query_time:cnt".
           General  logs  do  not  report  query times so only the "cnt" aggregate makes sense because all query
           times are zero.

           If you specify an attribute that doesn't exist in the events, then pt-query-digest falls back to  the
           default  "Query_time:sum"  and  prints  a notice at the beginning of the report for each query class.
           You can create attributes with "--filter" and order by them; see "ATTRIBUTES" for an example.

       --outliers
           type: array; default: Query_time:1:10

           Report outliers by attribute:percentile:count.

           The syntax of this option is a comma-separated list of colon-delimited strings.  The first  field  is
           the  attribute  by  which  an  outlier  is  defined.   The second is a number that is compared to the
           attribute's 95th percentile.  The  third  is  optional,  and  is  compared  to  the  attribute's  cnt
           aggregate.   Queries  that  pass this specification are added to the report, regardless of any limits
           you specified in "--limit".

           For example, to report queries whose 95th percentile Query_time is at least 60 seconds and which  are
           seen at least 5 times, use the following argument:

             --outliers Query_time:60:5

           You can specify an --outliers option for each value in "--group-by".

       --output
           type: string; default: report

           How to format and print the query analysis results.  Accepted values are:

              VALUE      FORMAT
              =======    ==============================
              report     Standard query analysis report
              slowlog    MySQL slow log
               json       JSON, one array per query class
              json-anon  JSON without example queries

           The  entire "report" output can be disabled by specifying "--no-report" (see "--[no]report"), and its
           sections can be disabled or rearranged by specifying "--report-format".

           "json" output was introduced in 2.2.1 and is still in development, so the data structure  may  change
           in future versions.
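
            For example, to save the analysis as JSON for post-processing (output file name is
            hypothetical):

              pt-query-digest --output json slow.log > digest.json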

       --password
           short form: -p; type: string

            Password to use when connecting.  If the password contains commas, they must be escaped with a backslash:
           "exam\,ple"

       --pid
           type: string

           Create the given PID file.  The tool won't start if the PID  file  already  exists  and  the  PID  it
           contains  is different than the current PID.  However, if the PID file exists and the PID it contains
           is no longer running, the tool will overwrite the PID file with the current PID.   The  PID  file  is
           removed automatically when the tool exits.

       --port
           short form: -P; type: int

           Port number to use for connection.

       --processlist
           type: DSN

           Poll this DSN's processlist for queries, with "--interval" sleep between.

           If the connection fails, pt-query-digest tries to reopen it once per second.

       --progress
           type: array; default: time,30

           Print  progress  reports  to  STDERR.  The value is a comma-separated list with two parts.  The first
           part can be percentage, time, or iterations; the second part specifies how often an update should  be
           printed, in percentage, seconds, or number of iterations.

       --read-timeout
           type: time; default: 0

           Wait this long for an event from the input; 0 to wait forever.

           This  option  sets  the maximum time to wait for an event from the input.  It applies to all types of
           input except "--processlist".  If an event is not received after the specified time, the script stops
           reading the input and prints its reports.  If "--iterations"  is  0  or  greater  than  1,  the  next
           iteration will begin, else the script will exit.

           This option requires the Perl POSIX module.

       --[no]report
           default: yes

           Print query analysis reports for each "--group-by" attribute.  This is the standard slow log analysis
           functionality.  See "OUTPUT" for the description of what this does and what the results look like.

           If you don't need a report (for example, when using "--review" or "--history"), it is best to specify
           "--no-report" because this allows the tool to skip some expensive operations.

       --report-all
           Report  all  queries,  even  ones that have been reviewed.  This only affects the "report" "--output"
           when using "--review".  Otherwise, all queries are always printed.

       --report-format
           type: Array; default: rusage,date,hostname,files,header,profile,query_report,prepared

           Print these sections of the query analysis report.

             SECTION      PRINTS
             ============ ======================================================
             rusage       CPU times and memory usage reported by ps
             date         Current local date and time
             hostname     Hostname of machine on which pt-query-digest was run
              files        Input files read/parsed
             header       Summary of the entire analysis run
             profile      Compact table of queries for an overview of the report
             query_report Detailed information about each unique query
             prepared     Prepared statements

           The sections are printed in the order specified.  The rusage, date, files  and  header  sections  are
           grouped together if specified together; other sections are separated by blank lines.

           See "OUTPUT" for more information on the various parts of the query report.

       --report-histogram
           type: string; default: Query_time

           Chart the distribution of this attribute's values.

           The distribution chart is limited to time-based attributes, so charting "Rows_examined", for example,
           will produce a useless chart.  Charts look like:

             # Query_time distribution
             #   1us
             #  10us
             # 100us
             #   1ms
             #  10ms  ###########################
             # 100ms  ########################################################
             #    1s  ########
             #  10s+

           See "OUTPUT" for more information.

       --resume
           type: string

            If specified, the tool writes the last file offset, if there is one, to the given filename.  When run
           again with the same value for this option, the tool reads the last file offset from the  file,  seeks
           to that position in the log, and resumes parsing events from that point onward.
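
            A minimal sketch, with a hypothetical offset file:

              pt-query-digest --resume /tmp/slow.offset slow.log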

       --review
           type: DSN

           Save query classes for later review, and don't report already reviewed classes.

           The  default  table is "percona_schema.query_review".  Specify database (D) and table (t) DSN options
           to  override  the   default.    The   database   and   table   are   automatically   created   unless
           "--no-create-review-table" is specified (see "--[no]create-review-table").

           If  the  table  was  created manually, it must have at least the following columns.  You can add more
           columns for your own special purposes, but they won't be used by pt-query-digest.

             CREATE TABLE IF NOT EXISTS query_review (
                checksum     BIGINT UNSIGNED NOT NULL PRIMARY KEY,
                fingerprint  TEXT NOT NULL,
                sample       TEXT NOT NULL,
                first_seen   DATETIME,
                last_seen    DATETIME,
                reviewed_by  VARCHAR(20),
                reviewed_on  DATETIME,
                comments     TEXT
             )

           The columns are:

             COLUMN       MEANING
             ===========  ====================================================
             checksum     A 64-bit checksum of the query fingerprint
             fingerprint  The abstracted version of the query; its primary key
             sample       The query text of a sample of the class of queries
             first_seen   The smallest timestamp of this class of queries
             last_seen    The largest timestamp of this class of queries
             reviewed_by  Initially NULL; if set, query is skipped thereafter
             reviewed_on  Initially NULL; not assigned any special meaning
             comments     Initially NULL; not assigned any special meaning

           Note that the "fingerprint" column is the true primary key for a class of queries.  The "checksum" is
           just a cryptographic hash of this value, which provides a shorter value that is very likely  to  also
           be unique.

           After  parsing  and  aggregating  events, your table should contain a row for each fingerprint.  This
           option depends on "--group-by fingerprint" (which is the default).  It will not work otherwise.
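
            For example, to save query classes to a review table on host2 without printing a report
            (database and table names are hypothetical):

              pt-query-digest --review h=host2,D=myschema,t=my_query_review --no-report slow.log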

       --run-time
           type: time

           How long to run for each "--iterations".  The default is to run forever (you can interrupt with CTRL-
           C).  Because "--iterations" defaults to 1, if you only specify "--run-time", pt-query-digest runs for
           that amount of time and then exits.  The two options are specified together to do  collect-and-report
           cycles.   For  example,  specifying "--iterations" 4 "--run-time" "15m" with a continuous input (like
           STDIN or "--processlist") will cause pt-query-digest to run for 1 hour (15 minutes  x  4),  reporting
           four times, once at each 15 minute interval.
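
            For example, the collect-and-report cycle described above:

              pt-query-digest --processlist h=host1 --iterations 4 --run-time 15m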

       --run-time-mode
           type: string; default: clock

           Set what the value of "--run-time" operates on.  Following are the possible values for this option:

           clock
               "--run-time"  specifies  an  amount  of real clock time during which the tool should run for each
               "--iterations".

           event
               "--run-time" specifies an amount of log time.  Log time is determined by timestamps in  the  log.
               The first timestamp seen is remembered, and each timestamp after that is compared to the first to
               determine  how  much log time has passed.  For example, if the first timestamp seen is "12:00:00"
               and the next is "12:01:30", that is 1 minute and 30 seconds of log  time.   The  tool  will  read
               events until the log time is greater than or equal to the specified "--run-time" value.

               Since  timestamps  in  logs  are  not always printed, or not always printed frequently, this mode
               varies in accuracy.

           interval
               "--run-time" specifies interval boundaries of log time into which events are divided and  reports
               are  generated.   This  mode  is different from the others because it doesn't specify how long to
               run.  The value of "--run-time" must be an interval that divides evenly into  minutes,  hours  or
                days.  For example, "5m" divides evenly into hours (60/5=12, so twelve 5-minute intervals per hour)
               but "7m" does not (60/7=8.6).

               Specifying "--run-time-mode interval --run-time 30m --iterations  0"  is  similar  to  specifying
               "--run-time-mode  clock --run-time 30m --iterations 0".  In the latter case, pt-query-digest will
               run forever, producing reports every 30 minutes, but this only works effectively with  continuous
               inputs like STDIN and the processlist.  For fixed inputs, like  log  files,  the  former  example
                produces multiple reports by dividing the log into 30-minute intervals based on timestamps.

               Intervals are calculated from the zeroth second/minute/hour in which a timestamp occurs, not from
               whatever time it specifies.  For example, with 30 minute intervals and a timestamp of "12:10:30",
               the  interval  is  not "12:10:30" to "12:40:30", it is "12:00:00" to "12:29:59".  Or, with 1 hour
               intervals, it is "12:00:00" to "12:59:59".  When a new timestamp exceeds the interval,  a  report
               is printed, and the next interval is recalculated based on the new timestamp.

                Since "--iterations" is 1 by default, you probably want to specify a new value, else pt-query-
               digest will only get and report on the  first  interval  from  the  log  since  1  interval  =  1
               iteration.  If you want to get and report every interval in a log, specify "--iterations" 0.
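
                For example, to report on every 30-minute interval in a fixed log file:

                   pt-query-digest slow.log --run-time-mode interval --run-time 30m --iterations 0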

       --sample
           type: int

           Filter  out  all  but  the  first N occurrences of each query.  The queries are filtered on the first
           value in "--group-by", so by default, this will filter by query fingerprint.  For example,  "--sample
           2"  will  permit  two  sample  queries  for  each  fingerprint.  Useful in conjunction with "--output
           slowlog" to print the queries.  You probably want to set  "--no-report"  to  avoid  the  overhead  of
           aggregating  and  reporting  if  you're  just using this to print out samples of queries.  A complete
           example:

             pt-query-digest --sample 2 --no-report --output slowlog slow.log

       --set-vars
           type: Array

           Set the MySQL variables in this comma-separated list of "variable=value" pairs.

           By default, the tool sets:

              wait_timeout=10000

           Variables specified on the command line override these defaults.  For example, specifying "--set-vars
            wait_timeout=500" overrides the default value of 10000.

           The tool prints a warning and continues if a variable cannot be set.

       --show-all
           type: Hash

           Show all values for these attributes.

            By default pt-query-digest only shows as many of an attribute's values as fit on a single line.
           This  option  allows  you  to  specify  attributes  for which all values will be shown (line width is
           ignored).  This only works for attributes with string values like  user,  host,  db,  etc.   Multiple
           attributes can be specified, comma-separated.

       --since
           type: string

           Parse only queries newer than this value (parse queries since this date).

           This  option  allows  you  to  ignore queries older than a certain value and parse only those queries
           which are more recent than the value.  The value can be several types:

             * Simple time value N with optional suffix: N[shmd], where
               s=seconds, h=hours, m=minutes, d=days (default s if no suffix
               given); this is like saying "since N[shmd] ago"
             * Full date with optional hours:minutes:seconds:
                YYYY-MM-DD [HH:MM:SS]
             * Short, MySQL-style date:
               YYMMDD [HH:MM:SS]
             * Any time expression evaluated by MySQL:
               CURRENT_DATE - INTERVAL 7 DAY
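
            For example, "--since 12h" parses only queries from the last 12 hours, and a fixed date
            works too:

              pt-query-digest --since '2009-04-12 09:00:00' slow.log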

           If you give a MySQL time expression,  and  you  have  not  also  specified  a  DSN  for  "--explain",
           "--processlist",  or  "--review",  then  you must specify a DSN on the command line so that pt-query-
           digest can connect to MySQL to evaluate the expression.

           The MySQL time expression is wrapped inside a query like "SELECT UNIX_TIMESTAMP(<expression>)", so be
           sure that the expression is valid inside this  query.   For  example,  do  not  use  UNIX_TIMESTAMP()
           because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.

            Events are assumed to be in chronological order: older events at the beginning of the log and newer events
           at the end of the log.  "--since" is strict: it ignores all queries until one is found  that  is  new
           enough.   Therefore,  if the query events are not consistently timestamped, some may be ignored which
           are actually new enough.

           See also "--until".

       --socket
           short form: -S; type: string

           Socket file to use for connection.

       --timeline
           Show a timeline of events.

            This option makes pt-query-digest print another kind of report: a timeline of the events.  Each query
            is still grouped and aggregated into classes according to "--group-by", but then the classes are
            printed in chronological order.  The timeline report prints the timestamp, interval, count, and
            value of each class.

           If  all  you  want  is  the timeline report, then specify "--no-report" to suppress the default query
           analysis report.  Otherwise, the timeline report will be printed at the end before the  response-time
           profile (see "--report-format" and "OUTPUT").

           For example, this:

             pt-query-digest /path/to/log --group-by distill --timeline

           will print something like:

             # ########################################################
             # distill report
             # ########################################################
             # 2009-07-25 11:19:27 1+00:00:01   2 SELECT foo
             # 2009-07-27 11:19:30      00:01   2 SELECT bar
             # 2009-07-27 11:30:00 1+06:30:00   2 SELECT foo

       --type
           type: Array; default: slowlog

           The type of input to parse.  The permitted types are

           binlog
               Parse a binary log file that has first been converted to text using mysqlbinlog.

               For example:

                  mysqlbinlog mysql-bin.000441 > mysql-bin.000441.txt

                  pt-query-digest --type binlog mysql-bin.000441.txt

           genlog
               Parse  a  MySQL general log file.  General logs lack a lot of "ATTRIBUTES", notably "Query_time".
               The default "--order-by" for general logs changes to "Query_time:cnt".
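
                For example (the filename is illustrative):

                   pt-query-digest --type genlog general.log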

           slowlog
               Parse a log file in any variation of MySQL slow log format.

           tcpdump
               Inspect network packets and decode the MySQL client protocol, extracting  queries  and  responses
               from it.

               pt-query-digest does not actually watch the network (i.e. it does NOT "sniff packets").  Instead,
               it's  just  parsing  the  output of tcpdump.  You are responsible for generating this output; pt-
               query-digest does not do it for you.  Then you send this to pt-query-digest as you would any  log
               file: as files on the command line or to STDIN.

               The  parser  expects the input to be formatted with the following options: "-x -n -q -tttt".  For
               example, if you want to capture output from your local machine, you can  do  something  like  the
               following (the port must come last on FreeBSD):

                 tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 \
                   > mysql.tcp.txt
                 pt-query-digest --type tcpdump mysql.tcp.txt

               The  other  tcpdump parameters, such as -s, -c, and -i, are up to you.  Just make sure the output
               looks like this (there is a line break in the first line to avoid man-page problems):

                 2009-04-12 09:50:16.804849 IP 127.0.0.1.42167
                        > 127.0.0.1.3306: tcp 37
                     0x0000:  4508 0059 6eb2 4000 4006 cde2 7f00 0001
                     0x0010:  ....

               Remember tcpdump has a handy -c option to stop after it captures some number of packets!   That's
               very  useful for testing your tcpdump command.  Note that tcpdump can't capture traffic on a Unix
               socket.  Read <http://bugs.mysql.com/bug.php?id=31577> if you're confused about this.

               Devananda Van Der Veen explained on the MySQL Performance Blog how  to  capture  traffic  without
               dropping  packets on busy servers.  Dropped packets cause pt-query-digest to miss the response to
               a request, then see the response to a later request and assign the wrong execution  time  to  the
               query.  You can change the filter to something like the following to help capture a subset of the
               queries.  (See <http://www.mysqlperformanceblog.com/?p=6092> for details.)

                 tcpdump -i any -s 65535 -x -n -q -tttt \
                    'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'

               All  MySQL  servers  running  on  port  3306  are  automatically  detected in the tcpdump output.
                Therefore, if the tcpdump output contains packets from multiple servers on port 3306 (for example,
               10.0.0.1:3306,  10.0.0.2:3306, etc.), all packets/queries from all these servers will be analyzed
               together as if they were one server.

               If you're analyzing  traffic  for  a  MySQL  server  that  is  not  running  on  port  3306,  see
               "--watch-server".

               Also  note  that pt-query-digest may fail to report the database for queries when parsing tcpdump
               output.  The database is discovered only in the initial connect events for a new client  or  when
               <USE  db>  is  executed.   If  the tcpdump output contains neither of these, then pt-query-digest
               cannot discover the database.

               Server-side prepared statements are supported.  SSL-encrypted traffic  cannot  be  inspected  and
               decoded.

           rawlog
               Raw logs are not MySQL logs but simple text files with one SQL statement per line, like:

                 SELECT c FROM t WHERE id=1
                 /* Hello, world! */ SELECT * FROM t2 LIMIT 1
                 INSERT INTO t (a, b) VALUES ('foo', 'bar')
                 INSERT INTO t SELECT * FROM monkeys

               Since  raw logs do not have any metrics, many options and features of pt-query-digest do not work
               with them.

               One use case for raw logs is ranking queries by count when the only information  available  is  a
               list of queries, from polling "SHOW PROCESSLIST" for example.
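
                For example, to rank such queries by count (a sketch; the filename is illustrative, and
                "--order-by Query_time:cnt" mirrors the general-log default):

                  pt-query-digest --type rawlog --order-by Query_time:cnt queries.txt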

       --until
           type: string

           Parse only queries older than this value (parse queries until this date).

           This  option  allows  you  to  ignore queries newer than a certain value and parse only those queries
           which are older than the value.  The value can be one of the same types listed for "--since".

           Unlike "--since", "--until" is not strict: all queries are parsed until one has a timestamp  that  is
           equal to or greater than "--until".  Then all subsequent queries are ignored.
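
            For example, to parse only events from a single day (the dates and filename are illustrative):

              pt-query-digest --since '2015-11-01' --until '2015-11-02' slow.log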

       --user
           short form: -u; type: string

           User for login if not current user.

       --variations
           type: Array

           Report the number of variations in these attributes' values.

           Variations  show  how many distinct values an attribute had within a class.  The usual value for this
           option is "arg" which shows how many distinct queries were in the  class.   This  can  be  useful  to
           determine a query's cacheability.

           Distinct  values  are  determined  by CRC32 checksums of the attributes' values.  These checksums are
           reported in the query report for attributes specified by this option, like:

             # arg crc      109 (1/25%), 144 (1/25%)... 2 more

            In that class there were 4 distinct queries.  The checksums of the first two variations are shown,
            and each one occurred once (i.e., 25% of the time).

            The count of distinct variations is approximate because only 1,000 variations are saved: the full
            CRC32 checksum modulo 1,000 is saved, so some distinct checksums are treated as equal.
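
            For example, to see how many distinct query texts each class contains (the filename is
            illustrative):

              pt-query-digest --variations arg slow.log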

       --version
           Show version and exit.

       --[no]version-check
           default: yes

           Check for the latest version of Percona Toolkit, MySQL, and other programs.

           This is a standard "check for updates automatically" feature, with two additional  features.   First,
           the  tool  checks  the  version of other programs on the local system in addition to its own version.
           For example, it checks the version of every MySQL server it connects to, Perl, and  the  Perl  module
           DBD::mysql.   Second, it checks for and warns about versions with known problems.  For example, MySQL
           5.5.25 had a critical bug and was re-released as 5.5.25a.

           Any updates or known problems are printed to STDOUT before the tool's normal  output.   This  feature
           should never interfere with the normal operation of the tool.

           For more information, visit <https://www.percona.com/version-check>.

       --watch-server
           type: string

           This  option  tells  pt-query-digest which server IP address and port (like "10.0.0.1:3306") to watch
           when parsing tcpdump (for "--type" tcpdump); all other servers are ignored.  If you don't specify it,
           pt-query-digest watches all servers by looking for any IP address using port  3306  or  "mysql".   If
           you're  watching  a  server  with  a  non-standard  port, this won't work, so you must specify the IP
           address and port to watch.

           If you want to watch a mix of servers, some running on standard port 3306 and some  running  on  non-
           standard  ports,  you  need  to create separate tcpdump outputs for the non-standard port servers and
           then specify this option for each.  At present pt-query-digest cannot  auto-detect  servers  on  port
           3306 and also be told to watch a server on a non-standard port.
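
            For example, to analyze only traffic to a server on a non-standard port (the address and port
            are illustrative):

              pt-query-digest --type tcpdump --watch-server 10.0.0.1:3307 mysql.tcp.txt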

DSN OPTIONS

        These DSN options are used to create a DSN.  Each option is given like "option=value".  The options are
        case-sensitive, so P and p are not the same option.  There cannot be whitespace before or after the "="
        and if the value contains whitespace it must be quoted.  DSN options are comma-separated.  See the
        percona-toolkit manpage for full details; a complete example follows the option list below.

       •   A

           dsn: charset; copy: yes

           Default character set.

       •   D

           dsn: database; copy: yes

           Default database to use when connecting to MySQL.

       •   F

           dsn: mysql_read_default_file; copy: yes

           Only read default options from the given file.

       •   h

           dsn: host; copy: yes

           Connect to host.

       •   p

           dsn: password; copy: yes

           Password to use when connecting.  If password contains commas they must be escaped with a  backslash:
           "exam\,ple"

       •   P

           dsn: port; copy: yes

           Port number to use for connection.

       •   S

           dsn: mysql_socket; copy: yes

           Socket file to use for connection.

       •   t

           The "--review" or "--history" table.

       •   u

           dsn: user; copy: yes

           User for login if not current user.
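
        Putting these together, a DSN for "--review" might look like the following (the host, database,
        table, and user names are illustrative):

           pt-query-digest --review h=host2,D=percona,t=query_review,u=bob --no-report slow.log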

ENVIRONMENT

       The  environment  variable "PTDEBUG" enables verbose debugging output to STDERR.  To enable debugging and
       capture all output to a file, run the tool like:

          PTDEBUG=1 pt-query-digest ... > FILE 2>&1

       Be careful: debugging output is voluminous and can generate several megabytes of output.

SYSTEM REQUIREMENTS

       You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any  reasonably  new
       version of Perl.

BUGS

       For a list of known bugs, see <http://www.percona.com/bugs/pt-query-digest>.

       Please report bugs at <https://bugs.launchpad.net/percona-toolkit>.  Include the following information in
       your bug report:

       •   Complete command-line used to run the tool

       •   Tool "--version"

       •   MySQL version of all servers involved

       •   Output from the tool including STDERR

       •   Input files (log/dump/config files, etc.)

       If possible, include debugging output by running the tool with "PTDEBUG"; see "ENVIRONMENT".

DOWNLOADING

       Visit  <http://www.percona.com/software/percona-toolkit/>  to  download  the  latest  release  of Percona
       Toolkit.  Or, get the latest release from the command line:

          wget percona.com/get/percona-toolkit.tar.gz

          wget percona.com/get/percona-toolkit.rpm

          wget percona.com/get/percona-toolkit.deb

       You can also get individual tools from the latest release:

          wget percona.com/get/TOOL

       Replace "TOOL" with the name of any tool.

ATTRIBUTES REFERENCE

       Events may have the following attributes.  If writing a "--filter", be sure to check that an attribute is
       defined in each event before using it,  else  the  filter  code  may  crash  the  tool  with  a  "use  of
       uninitialized value" error.
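
        For example, a filter that checks that an attribute is defined before comparing it (a sketch; the
        attribute and threshold are illustrative):

          $ pt-query-digest slow.log \
              --filter 'defined $event->{Rows_examined} && $event->{Rows_examined} > 10000'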

       You can dump event attributes for any input like:

         $ pt-query-digest                  \
             slow.log                       \
             --filter 'print Dumper $event' \
             --no-report                    \
             --sample 1

       That will produce a lot of output with "attribute => value" pairs like:

          $VAR1 = {
            Query_time => '0.033384',
            Rows_examined => '0',
            Rows_sent => '0',
            Thread_id => '10',
            Tmp_table => 'No',
            Tmp_table_on_disk => 'No',
            arg => 'SELECT col FROM tbl WHERE id=5',
            bytes => 103,
            cmd => 'Query',
            db => 'db1',
            fingerprint => 'select col from tbl where id=?',
            host => '',
            pos_in_log => 1334,
            ts => '071218 11:48:27',
            user => '[SQL_SLAVE]'
          };

   COMMON
        These attributes are common to all input "--type" values and to "--processlist", except where noted.

       arg The query text, or the command for admin commands like "Ping".

       bytes
           The byte length of the "arg".

       cmd "Query" or "Admin".

       db  The  current  database.   The  value  comes from USE database statements.  By default, "Schema" is an
           alias which is automatically changed to "db"; see "--attribute-aliases".

       fingerprint
           An abstracted form of the query.  See "FINGERPRINTS".

       host
           Client host which executed the query.

       pos_in_log
           The byte offset of the event in the log or tcpdump, except for "--processlist".

       Query_time
           The total time the query took, including lock time.

       ts  The timestamp of when the query ended.

   SLOW, GENERAL, AND BINARY LOGS
       Events have all available attributes from the log file.  Therefore, you only need to look at the log file
       to see which events are available, but remember: not all events have the same attributes.

       Percona      Server      adds      many      attributes      to       the       slow       log;       see
       http://www.percona.com/docs/wiki/patches:slow_extended for more information.

   TCPDUMP
       These attributes are available when parsing "--type" tcpdump.

       Error_no
           The MySQL error number if the query caused an error.

       ip  The client's IP address.  Certain log files may also contain this attribute.

       No_good_index_used
           Yes or No if no good index existed for the query (flag set by server).

       No_index_used
           Yes or No if the query did not use any index (flag set by server).

       port
           The client's port number.

       Warning_count
           The number of warnings, as otherwise shown by "SHOW WARNINGS".

   PROCESSLIST
       If  using  "--processlist",  an "id" attribute is available for the process ID, in addition to the common
       attributes.

AUTHORS

       Baron Schwartz, Daniel Nichter, and Brian Fraser

ABOUT PERCONA TOOLKIT

       This tool is part of Percona Toolkit, a collection of advanced command-line tools for MySQL developed  by
       Percona.   Percona  Toolkit  was  forked  from  two  projects  in June, 2011: Maatkit and Aspersa.  Those
       projects were created by Baron Schwartz and  primarily  developed  by  him  and  Daniel  Nichter.   Visit
       <http://www.percona.com/software/> to learn about other free, open-source software from Percona.

COPYRIGHT, LICENSE, AND WARRANTY

       This program is copyright 2008-2015 Percona LLC and/or its affiliates.

       THIS  PROGRAM  IS  PROVIDED  "AS  IS"  AND  WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT
       LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       This program is free software; you can redistribute it and/or modify  it  under  the  terms  of  the  GNU
       General  Public  License  as  published  by the Free Software Foundation, version 2; OR the Perl Artistic
       License.  On UNIX and similar systems, you can issue `man perlgpl' or `man perlartistic'  to  read  these
       licenses.

       You  should have received a copy of the GNU General Public License along with this program; if not, write
       to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA.

VERSION

       pt-query-digest 2.2.16

perl v5.20.2                                       2015-11-06                                PT-QUERY-DIGEST(1p)