lunar (1) pgreplay.1.gz

Provided by: pgreplay_1.2.0-2ubuntu2_amd64 bug

NAME

       pgreplay - PostgreSQL log file replayer for performance tests

SYNOPSIS

       pgreplay [parse options] [replay options] [-d level] [infile]
       pgreplay -f [parse options] [-o outfile] [-d level] [infile]
       pgreplay -r [replay options] [-d level] [infile]

DESCRIPTION

       pgreplay  reads  a  PostgreSQL  log file (not a WAL file), extracts the SQL statements and
       executes them in the same order and relative time against a PostgreSQL  database  cluster.
       A final report gives you a useful statistical analysis of your workload and its execution.

       In the first form, the log file infile is replayed at the time it is read.

       With  the -f option, pgreplay will not execute the statements, but write them to a ‘replay
       file’ outfile that can be replayed with the third form.

       With the -r option, pgreplay will execute the statements in the replay  file  infile  that
       was created by the second form.

       If  the  execution  of  statements  gets behind schedule, warning messages are issued that
       indicate that the server cannot handle the load in a  timely  fashion.   The  idea  is  to
       replay a real-world database workload as exactly as possible.

       To  create  a  log  file  that  can  be  parsed by pgreplay, you need to set the following
       parameters in postgresql.conf:

              log_min_messages=error (or more)
              log_min_error_statement=log (or more)
              log_connections=on
              log_disconnections=on
              log_line_prefix='%m|%u|%d|%c|' (if you don't use CSV logging)
              log_statement='all'
              lc_messages must be set to English (encoding does not matter)
              bytea_output=escape (from version 9.0 on, only if you want to replay the log on 8.4
              or earlier)

       The  database  cluster  against which you replay the SQL statements must be a clone of the
       database cluster that generated the logs from the time immediately before  the  logs  were
       generated.

       pgreplay is useful for performance tests, particularly in the following situations:

       *   You  want  to  compare  the  performance  of  your PostgreSQL application on different
           hardware or different operating systems.

       *   You want to upgrade your database and want to make sure that the new database  version
           does not suffer from performance regressions that affect you.

       Moreover,  pgreplay  can  give  you some feeling as to how your application might scale by
       allowing you to try to replay the workload at a higher speed.  Be warned, though, that 500
       users  working  at  double  speed  is  not really the same as 1000 users working at normal
       speed.

OPTIONS

   Parse options:
       -c     Specifies that the log file is in 'csvlog' format (highly recommended) and  not  in
              'stderr' format.

       -b timestamp
              Only  log entries greater or equal to that timestamp will be parsed.  The format is
              YYYY-MM-DD HH:MM:SS.FFF like in the log file.  An optional time zone part  will  be
              ignored.

       -e timestamp
              Only  log  entries  less  or equal to that timestamp will be parsed.  The format is
              YYYY-MM-DD HH:MM:SS.FFF like in the log file.  An optional time zone part  will  be
              ignored.

       -q     Specifies  that  a  backslash  in a simple string literal will escape the following
              single    quote.     This     depends     on     configuration     options     like
              standard_conforming_strings and is the default for server version 9.0 and less.

   Replay options:
       -h hostname
              Host name where the target database cluster is running (or directory where the UNIX
              socket can be found). Defaults to local connections.
              This works just like the -h option of psql.

       -p port
              TCP port where the target database cluster can be reached.

       -W password
              By default, pgreplay assumes that the target database  cluster  is  configured  for
              trust  authentication.  With  the -W option you can specify a password that will be
              used for all users in the cluster.

       -s factor
              Speed factor for replay, by default 1. This can  be  any  valid  positive  floating
              point number. A factor less than 1 will replay the workload in ‘slow motion’, while
              a factor greater than 1 means ‘fast forward’.

       -E encoding
              Specifies the encoding of the log file, which  will  be  used  as  client  encoding
              during replay. If it is omitted, your default client encoding will be used.

       -j     If  all  connections  are idle, jump ahead to the next request instead of sleeping.
              This will speed up replay. Execution delays will still be reported  correctly,  but
              replay statistics will not contain the idle time.

   Output options:
       -o outfile
              specifies the replay file where the statements will be written for later replay.

   Debug options:
       -d level
              Specifies  the  trace  level (between 1 and 3). Increasing levels will produce more
              detailed information about what pgreplay is doing.

       -v     Prints the program version and exits.

ENVIRONMENT

       PGHOST Specifies the default value for the -h option.

       PGPORT Specifies the default value for the -p option.

       PGCLIENTENCODING
              Specifies the default value for the -E option.

LIMITATIONS

       pgreplay can only replay what is logged by PostgreSQL.  This leads to some limitations:

       *   COPY statements will not be replayed, because the copy data are not logged.

       *   Fast-path API function calls are not logged and will not be replayed.   Unfortunately,
           this includes the Large Object API.

       *   Since the log file is always in the server encoding (which you can specify with the -E
           switch of pgreplay), all SET client_encoding statements will be ignored.

       *   Since  the  preparation  time  of  prepared   statements   is   not   logged   (unless
           log_min_messages  is  debug2  or  more), these statements will be prepared immediately
           before they are first executed during replay.

       *   Because the log file contains only text,  query  parameters  and  return  values  will
           always be in text and never in binary format. If you use binary mode to, say, transfer
           large binary data, pgreplay can cause significantly  more  network  traffic  than  the
           original run.

       *   Sometimes,  if  a  connection  takes  longer  to complete, the session ID unexpectedly
           changes in the PostgreSQL log file. This causes pgreplay to treat the session  as  two
           different  ones,  resulting  in  an  additional  connection. This is arguably a bug in
           PostgreSQL.

AUTHOR

       Written by Laurenz Albe <laurenz.albe@wien.gv.at>.