lunar (1) pt-table-sync.1p.gz

Provided by: percona-toolkit_3.2.1-1_all bug

NAME

       pt-table-sync - Synchronize MySQL table data efficiently.

SYNOPSIS

       Usage: pt-table-sync [OPTIONS] DSN [DSN]

       pt-table-sync synchronizes data efficiently between MySQL tables.

       This tool changes data, so for maximum safety, you should back up your data before using
       it.  When synchronizing a server that is a replication slave with the "--replicate" or
       "--sync-to-master" methods, it always makes the changes on the replication master, never
       the replication slave directly.  This is in general the only safe way to bring a replica
       back in sync with its master; changes to the replica are usually the source of the
       problems in the first place.  However, the changes it makes on the master should be no-op
       changes that set the data to their current values, and actually affect only the replica.

       Sync db.tbl on host1 to host2:

         pt-table-sync --execute h=host1,D=db,t=tbl h=host2

       Sync all tables on host1 to host2 and host3:

         pt-table-sync --execute host1 host2 host3

       Make slave1 have the same data as its replication master:

         pt-table-sync --execute --sync-to-master slave1

       Resolve differences that pt-table-checksum found on all slaves of master1:

         pt-table-sync --execute --replicate test.checksum master1

       Same as above but only resolve differences on slave1:

         pt-table-sync --execute --replicate test.checksum \
           --sync-to-master slave1

       Sync master2 in a master-master replication configuration, where master2's copy of db.tbl
       is known or suspected to be incorrect:

         pt-table-sync --execute --sync-to-master h=master2,D=db,t=tbl

       Note that in the master-master configuration, the following will NOT do what you want,
       because it will make changes directly on master2, which will then flow through replication
       and change master1's data:

         # Don't do this in a master-master setup!
         pt-table-sync --execute h=master1,D=db,t=tbl master2

RISKS

       WARNING: pt-table-sync changes data!  Before using this tool, please:

       •   Read the tool's documentation

       •   Review the tool's known "BUGS"

       •   Test the tool on a non-production server

       •   Backup your production server and verify the backups

           pt-table-sync is mature, proven in the real world, and well tested, but if used
           improperly it can have adverse consequences.  Always test syncing first with
           "--dry-run" and "--print".

DESCRIPTION

       pt-table-sync does one-way and bidirectional synchronization of table data.  It does not
       synchronize table structures, indexes, or any other schema objects.  The following
       describes one-way synchronization.  "BIDIRECTIONAL SYNCING" is described later.

       This tool is complex and functions in several different ways.  To use it safely and
       effectively, you should understand three things: the purpose of "--replicate", finding
       differences, and specifying hosts.  These three concepts are closely related and determine
       how the tool will run.  The following is the abbreviated logic:

          if DSN has a t part, sync only that table:
             if 1 DSN:
                if --sync-to-master:
                   The DSN is a slave.  Connect to its master and sync.
             if more than 1 DSN:
                The first DSN is the source.  Sync each DSN in turn.
          else if --replicate:
             if --sync-to-master:
                The DSN is a slave.  Connect to its master, find records
                of differences, and fix.
             else:
                The DSN is the master.  Find slaves and connect to each,
                find records of differences, and fix.
          else:
             if only 1 DSN and --sync-to-master:
                The DSN is a slave.  Connect to its master, find tables and
                filter with --databases etc, and sync each table to the master.
             else:
                find tables, filtering with --databases etc, and sync each
                DSN to the first.

       pt-table-sync can run in one of two ways: with "--replicate" or without.  The default is
       to run without "--replicate" which causes pt-table-sync to automatically find differences
       efficiently with one of several algorithms (see "ALGORITHMS").  Alternatively, the value
       of "--replicate", if specified, causes pt-table-sync to use the differences already found
       by having previously ran pt-table-checksum with its own "--replicate" option.  Strictly
       speaking, you don't need to use "--replicate" because pt-table-sync can find differences,
       but many people use "--replicate" if, for example, they checksum regularly using pt-table-
       checksum then fix differences as needed with pt-table-sync.  If you're unsure, read each
       tool's documentation carefully and decide for yourself, or consult with an expert.

       Regardless of whether "--replicate" is used or not, you need to specify which hosts to
       sync.  There are two ways: with "--sync-to-master" or without.  Specifying
       "--sync-to-master" makes pt-table-sync expect one and only slave DSN on the command line.
       The tool will automatically discover the slave's master and sync it so that its data is
       the same as its master.  This is accomplished by making changes on the master which then
       flow through replication and update the slave to resolve its differences.  Be careful
       though: although this option specifies and syncs a single slave, if there are other slaves
       on the same master, they will receive via replication the changes intended for the slave
       that you're trying to sync.

       Alternatively, if you do not specify "--sync-to-master", the first DSN given on the
       command line is the source host.  There is only ever one source host.  If you do not also
       specify "--replicate", then you must specify at least one other DSN as the destination
       host.  There can be one or more destination hosts.  Source and destination hosts must be
       independent; they cannot be in the same replication topology.  pt-table-sync will die with
       an error if it detects that a destination host is a slave because changes are written
       directly to destination hosts (and it's not safe to write directly to slaves).  Or, if you
       specify "--replicate" (but not "--sync-to-master") then pt-table-sync expects one and only
       one master DSN on the command line.  The tool will automatically discover all the master's
       slaves and sync them to the master.  This is the only way to sync several (all) slaves at
       once (because "--sync-to-master" only specifies one slave).

       Each host on the command line is specified as a DSN.  The first DSN (or only DSN for cases
       like "--sync-to-master") provides default values for other DSNs, whether those other DSNs
       are specified on the command line or auto-discovered by the tool.  So in this example,

         pt-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2

       the host2 DSN inherits the "u" and "p" DSN parts from the host1 DSN.  Use the
       "--explain-hosts" option to see how pt-table-sync will interpret the DSNs given on the
       command line.

LIMITATIONS

       Replicas using row-based replication
           pt-table-sync requires statement-based replication when used with the
           "--sync-to-master" or "--replicate" option.  Therefore it will set
           "binlog_format=STATEMENT" on the master for its session if required.  To do this user
           must have "SUPER" privilege.

OUTPUT

       If you specify the "--verbose" option, you'll see information about the differences
       between the tables.  There is one row per table.  Each server is printed separately.  For
       example,

         # Syncing h=host1,D=test,t=test1
         # DELETE REPLACE INSERT UPDATE ALGORITHM START    END      EXIT DATABASE.TABLE
         #      0       0      3      0 Chunk     13:00:00 13:00:17 2    test.test1

       Table test.test1 on host1 required 3 "INSERT" statements to synchronize and it used the
       Chunk algorithm (see "ALGORITHMS").  The sync operation for this table started at 13:00:00
       and ended 17 seconds later (times taken from "NOW()" on the source host).  Because
       differences were found, its "EXIT STATUS" was 2.

       If you specify the "--print" option, you'll see the actual SQL statements that the script
       uses to synchronize the table if "--execute" is also specified.

       If you want to see the SQL statements that pt-table-sync is using to select chunks,
       nibbles, rows, etc., then specify "--print" once and "--verbose" twice.  Be careful
       though: this can print a lot of SQL statements.

       There are cases where no combination of "INSERT", "UPDATE" or "DELETE" statements can
       resolve differences without violating some unique key.  For example, suppose there's a
       primary key on column a and a unique key on column b.  Then there is no way to sync these
       two tables with straightforward UPDATE statements:

        +---+---+  +---+---+
        | a | b |  | a | b |
        +---+---+  +---+---+
        | 1 | 2 |  | 1 | 1 |
        | 2 | 1 |  | 2 | 2 |
        +---+---+  +---+---+

       The tool rewrites queries to "DELETE" and "REPLACE" in this case.  This is automatically
       handled after the first index violation, so you don't have to worry about it.

       Be careful when using pt-table-sync in any master-master setup.  Master-master replication
       is inherently tricky, and it's easy to make mistakes.  You need to be sure you're using
       the tool correctly for master-master replication.  See the "SYNOPSIS" for the overview of
       the correct usage.

       Also be careful with tables that have foreign key constraints with "ON DELETE" or "ON
       UPDATE" definitions because these might cause unintended changes on the child tables.  See
       "--[no]check-child-tables".

       In general, this tool is best suited when your tables have a primary key or unique index.
       Although it can synchronize data in tables lacking a primary key or unique index, it might
       be best to synchronize that data by another means.

REPLICATION SAFETY

       Synchronizing a replication master and slave safely is a non-trivial problem, in general.
       There are all sorts of issues to think about, such as other processes changing data,
       trying to change data on the slave, whether the destination and source are a master-master
       pair, and much more.

       In general, the safe way to do it is to change the data on the master, and let the changes
       flow through replication to the slave like any other changes.  However, this works only if
       it's possible to REPLACE into the table on the master.  REPLACE works only if there's a
       unique index on the table (otherwise it just acts like an ordinary INSERT).

       If your table has unique keys, you should use the "--sync-to-master" and/or "--replicate"
       options to sync a slave to its master.  This will generally do the right thing.  When
       there is no unique key on the table, there is no choice but to change the data on the
       slave, and pt-table-sync will detect that you're trying to do so.  It will complain and
       die unless you specify "--no-check-slave" (see "--[no]check-slave").

       If you're syncing a table without a primary or unique key on a master-master pair, you
       must change the data on the destination server.  Therefore, you need to specify
       "--no-bin-log" for safety (see "--[no]bin-log").  If you don't, the changes you make on
       the destination server will replicate back to the source server and change the data there!

       The generally safe thing to do on a master-master pair is to use the "--sync-to-master"
       option so you don't change the data on the destination server.  You will also need to
       specify "--no-check-slave" to keep pt-table-sync from complaining that it is changing data
       on a slave.

ALGORITHMS

       pt-table-sync has a generic data-syncing framework which uses different algorithms to find
       differences.  The tool automatically chooses the best algorithm for each table based on
       indexes, column types, and the algorithm preferences specified by "--algorithms".  The
       following algorithms are available, listed in their default order of preference:

       Chunk
           Finds an index whose first column is numeric (including date and time types), and
           divides the column's range of values into chunks of approximately "--chunk-size" rows.
           Syncs a chunk at a time by checksumming the entire chunk.  If the chunk differs on the
           source and destination, checksums each chunk's rows individually to find the rows that
           differ.

           It is efficient when the column has sufficient cardinality to make the chunks end up
           about the right size.

           The initial per-chunk checksum is quite small and results in minimal network traffic
           and memory consumption.  If a chunk's rows must be examined, only the primary key
           columns and a checksum are sent over the network, not the entire row.  If a row is
           found to be different, the entire row will be fetched, but not before.

           Note that this algorithm will not work if chunking a char column where all the values
           start with the same character. In that case, the tool will exit and suggest picking a
           different algorithm.

       Nibble
           Finds an index and ascends the index in fixed-size nibbles of "--chunk-size" rows,
           using a non-backtracking algorithm (see pt-archiver for more on this algorithm).  It
           is very similar to "Chunk", but instead of pre-calculating the boundaries of each
           piece of the table based on index cardinality, it uses "LIMIT" to define each nibble's
           upper limit, and the previous nibble's upper limit to define the lower limit.

           It works in steps: one query finds the row that will define the next nibble's upper
           boundary, and the next query checksums the entire nibble.  If the nibble differs
           between the source and destination, it examines the nibble row-by-row, just as "Chunk"
           does.

       GroupBy
           Selects the entire table grouped by all columns, with a COUNT(*) column added.
           Compares all columns, and if they're the same, compares the COUNT(*) column's value to
           determine how many rows to insert or delete into the destination.  Works on tables
           with no primary key or unique index.

       Stream
           Selects the entire table in one big stream and compares all columns.  Selects all
           columns.  Much less efficient than the other algorithms, but works when there is no
           suitable index for them to use.

       Future Plans
           Possibilities for future algorithms are TempTable (what I originally called bottom-up
           in earlier versions of this tool), DrillDown (what I originally called top-down), and
           GroupByPrefix (similar to how SqlYOG Job Agent works).  Each algorithm has strengths
           and weaknesses.  If you'd like to implement your favorite technique for finding
           differences between two sources of data on possibly different servers, I'm willing to
           help.  The algorithms adhere to a simple interface that makes it pretty easy to write
           your own.

BIDIRECTIONAL SYNCING

       Bidirectional syncing is a new, experimental feature.  To make it work reliably there are
       a number of strict limitations:

         * only works when syncing one server to other independent servers
         * does not work in any way with replication
         * requires that the table(s) are chunkable with the Chunk algorithm
         * is not N-way, only bidirectional between two servers at a time
         * does not handle DELETE changes

       For example, suppose we have three servers: c1, r1, r2.  c1 is the central server, a
       pseudo-master to the other servers (viz. r1 and r2 are not slaves to c1).  r1 and r2 are
       remote servers.  Rows in table foo are updated and inserted on all three servers and we
       want to synchronize all the changes between all the servers.  Table foo has columns:

         id    int PRIMARY KEY
         ts    timestamp auto updated
         name  varchar

       Auto-increment offsets are used so that new rows from any server do not create conflicting
       primary key (id) values.  In general, newer rows, as determined by the ts column, take
       precedence when a same but differing row is found during the bidirectional sync.  "Same
       but differing" means that two rows have the same primary key (id) value but different
       values for some other column, like the name column in this example.  Same but differing
       conflicts are resolved by a "conflict".  A conflict compares some column of the competing
       rows to determine a "winner".  The winning row becomes the source and its values are used
       to update the other row.

       There are subtle differences between three columns used to achieve bidirectional syncing
       that you should be familiar with: chunk column ("--chunk-column"), comparison column(s)
       ("--columns"), and conflict column ("--conflict-column").  The chunk column is only used
       to chunk the table; e.g. "WHERE id >= 5 AND id < 10".  Chunks are checksummed and when
       chunk checksums reveal a difference, the tool selects the rows in that chunk and checksums
       the "--columns" for each row.  If a column checksum differs, the rows have one or more
       conflicting column values.  In a traditional unidirectional sync, the conflict is a moot
       point because it can be resolved simply by updating the entire destination row with the
       source row's values.  In a bidirectional sync, however, the "--conflict-column" (in
       accordance with other "--conflict-*" options list below) is compared to determine which
       row is "correct" or "authoritative"; this row becomes the "source".

       To sync all three servers completely, two runs of pt-table-sync are required.  The first
       run syncs c1 and r1, then syncs c1 and r2 including any changes from r1.  At this point c1
       and r2 are completely in sync, but r1 is missing any changes from r2 because c1 didn't
       have these changes when it and r1 were synced.  So a second run is needed which syncs the
       servers in the same order, but this time when c1 and r1 are synced r1 gets r2's changes.

       The tool does not sync N-ways, only bidirectionally between the first DSN given on the
       command line and each subsequent DSN in turn.  So the tool in this example would be ran
       twice like:

         pt-table-sync --bidirectional h=c1 h=r1 h=r2

       The "--bidirectional" option enables this feature and causes various sanity checks to be
       performed.  You must specify other options that tell pt-table-sync how to resolve
       conflicts for same but differing rows.  These options are:

         * --conflict-column
         * --conflict-comparison
         * --conflict-value
         * --conflict-threshold
         * --conflict-error">  (optional)

       Use "--print" to test this option before "--execute".  The printed SQL statements will
       have comments saying on which host the statement would be executed if you used
       "--execute".

       Technical side note: the first DSN is always the "left" server and the other DSNs are
       always the "right" server.  Since either server can become the source or destination it's
       confusing to think of them as "src" and "dst".  Therefore, they're generically referred to
       as left and right.  It's easy to remember this because the first DSN is always to the left
       of the other server DSNs on the command line.

EXIT STATUS

       The following are the exit statuses (also called return values, or return codes) when pt-
       table-sync finishes and exits.

          STATUS  MEANING
          ======  =======================================================
          0       Success.
          1       Internal error.
          2       At least one table differed on the destination.
          3       Combination of 1 and 2.

OPTIONS

       Specify at least one of "--print", "--execute", or "--dry-run".

       "--where" and "--replicate" are mutually exclusive.

       This tool accepts additional command-line arguments.  Refer to the "SYNOPSIS" and usage
       information for details.

       --algorithms
           type: string; default: Chunk,Nibble,GroupBy,Stream

           Algorithm to use when comparing the tables, in order of preference.

           For each table, pt-table-sync will check if the table can be synced with the given
           algorithms in the order that they're given.  The first algorithm that can sync the
           table is used.  See "ALGORITHMS".

       --ask-pass
           Prompt for a password when connecting to MySQL.

       --bidirectional
           Enable bidirectional sync between first and subsequent hosts.

           See "BIDIRECTIONAL SYNCING" for more information.

       --[no]bin-log
           default: yes

           Log to the binary log ("SET SQL_LOG_BIN=1").

           Specifying "--no-bin-log" will "SET SQL_LOG_BIN=0".

       --buffer-in-mysql
           Instruct MySQL to buffer queries in its memory.

           This option adds the "SQL_BUFFER_RESULT" option to the comparison queries.  This
           causes MySQL to execute the queries and place them in a temporary table internally
           before sending the results back to pt-table-sync.  The advantage of this strategy is
           that pt-table-sync can fetch rows as desired without using a lot of memory inside the
           Perl process, while releasing locks on the MySQL table (to reduce contention with
           other queries).  The disadvantage is that it uses more memory on the MySQL server
           instead.

           You probably want to leave "--[no]buffer-to-client" enabled too, because buffering
           into a temp table and then fetching it all into Perl's memory is probably a silly
           thing to do.  This option is most useful for the GroupBy and Stream algorithms, which
           may fetch a lot of data from the server.

       --[no]buffer-to-client
           default: yes

           Fetch rows one-by-one from MySQL while comparing.

           This option enables "mysql_use_result" which causes MySQL to hold the selected rows on
           the server until the tool fetches them.  This allows the tool to use less memory but
           may keep the rows locked on the server longer.

           If this option is disabled by specifying "--no-buffer-to-client" then
           "mysql_store_result" is used which causes MySQL to send all selected rows to the tool
           at once.  This may result in the results "cursor" being held open for a shorter time
           on the server, but if the tables are large, it could take a long time anyway, and use
           all your memory.

           For most non-trivial data sizes, you want to leave this option enabled.

           This option is disabled when "--bidirectional" is used.

       --channel
           type: string

           Channel name used when connected to a server using replication channels.  Suppose you
           have two masters, master_a at port 12345, master_b at port 1236 and a slave connected
           to both masters using channels chan_master_a and chan_master_b.  If you want to run
           pt-table-sync to synchronize the slave against master_a, pt-table-sync won't be able
           to determine what's the correct master since SHOW SLAVE STATUS will return 2 rows. In
           this case, you can use --channel=chan_master_a to specify the channel name to use in
           the SHOW SLAVE STATUS command.

       --charset
           short form: -A; type: string

           Default character set.  If the value is utf8, sets Perl's binmode on STDOUT to utf8,
           passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after
           connecting to MySQL.  Any other value sets binmode on STDOUT without the utf8 layer,
           and runs SET NAMES after connecting to MySQL.

       --[no]check-child-tables
           default: yes

           Check if "--execute" will adversely affect child tables.  When "--replace",
           "--replicate", or "--sync-to-master" is specified, the tool may sync tables using
           "REPLACE" statements.  If a table being synced has child tables with "ON DELETE
           CASCADE", "ON UPDATE CASCADE", or "ON UPDATE SET NULL", the tool prints an error and
           skips the table because "REPLACE" becomes "DELETE" then "INSERT", so the "DELETE" will
           cascade to the child table and delete its rows.  In the worst case, this can delete
           all rows in child tables!

           Specify "--no-check-child-tables" to disable this check.  To completely avoid
           affecting child tables, also specify "--no-foreign-key-checks" so MySQL will not
           cascade any operations from the parent to child tables.

           This check is only preformed if "--execute" and one of "--replace", "--replicate", or
           "--sync-to-master" is specified.  "--print" does not check child tables.

           The error message only prints the first child table found with an "ON DELETE CASCADE",
           "ON UPDATE CASCADE", or "ON UPDATE SET NULL" foreign key constraint.  There could be
           other affected child tables.

       --[no]check-master
           default: yes

           With "--sync-to-master", try to verify that the detected master is the real master.

       --[no]check-slave
           default: yes

           Check whether the destination server is a slave.

           If the destination server is a slave, it's generally unsafe to make changes on it.
           However, sometimes you have to; "--replace" won't work unless there's a unique index,
           for example, so you can't make changes on the master in that scenario.  By default pt-
           table-sync will complain if you try to change data on a slave.  Specify
           "--no-check-slave" to disable this check.  Use it at your own risk.

       --[no]check-triggers
           default: yes

           Check that no triggers are defined on the destination table.

           Triggers were introduced in MySQL v5.0.2, so for older versions this option has no
           effect because triggers will not be checked.

       --chunk-column
           type: string

           Chunk the table on this column.

       --chunk-index
           type: string

           Chunk the table using this index.

       --chunk-size
           type: string; default: 1000

           Number of rows or data size per chunk.

           The size of each chunk of rows for the "Chunk" and "Nibble" algorithms.  The size can
           be either a number of rows, or a data size.  Data sizes are specified with a suffix of
           k=kibibytes, M=mebibytes, G=gibibytes.  Data sizes are converted to a number of rows
           by dividing by the average row length.

       --columns
           short form: -c; type: array

           Compare this comma-separated list of columns.

       --config
           type: Array

           Read this comma-separated list of config files; if specified, this must be the first
           option on the command line.

       --conflict-column
           type: string

           Compare this column when rows conflict during a "--bidirectional" sync.

           When a same but differing row is found the value of this column from each row is
           compared according to "--conflict-comparison", "--conflict-value" and
           "--conflict-threshold" to determine which row has the correct data and becomes the
           source.  The column can be any type for which there is an appropriate
           "--conflict-comparison" (this is almost all types except, for example, blobs).

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more
           information.

       --conflict-comparison
           type: string

           Choose the "--conflict-column" with this property as the source.

           The option affects how the "--conflict-column" values from the conflicting rows are
           compared.  Possible comparisons are one of these MAGIC_comparisons:

             newest|oldest|greatest|least|equals|matches

             COMPARISON  CHOOSES ROW WITH
             ==========  =========================================================
             newest      Newest temporal --conflict-column value
             oldest      Oldest temporal --conflict-column value
             greatest    Greatest numerical "--conflict-column value
             least       Least numerical --conflict-column value
             equals      --conflict-column value equal to --conflict-value
             matches     --conflict-column value matching Perl regex pattern
                         --conflict-value

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more
           information.

       --conflict-error
           type: string; default: warn

           How to report unresolvable conflicts and conflict errors

           This option changes how the user is notified when a conflict cannot be resolved or
           causes some kind of error.  Possible values are:

             * warn: Print a warning to STDERR about the unresolvable conflict
             * die:  Die, stop syncing, and print a warning to STDERR

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more
           information.

       --conflict-threshold
           type: string

           Amount by which one "--conflict-column" must exceed the other.

           The "--conflict-threshold" prevents a conflict from being resolved if the absolute
           difference between the two "--conflict-column" values is less than this amount.  For
           example, if two "--conflict-column" have timestamp values "2009-12-01 12:00:00" and
           "2009-12-01 12:05:00" the difference is 5 minutes.  If "--conflict-threshold" is set
           to "5m" the conflict will be resolved, but if "--conflict-threshold" is set to "6m"
           the conflict will fail to resolve because the difference is not greater than or equal
           to 6 minutes.  In this latter case, "--conflict-error" will report the failure.

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more
           information.

       --conflict-value
           type: string

           Use this value for certain "--conflict-comparison".

           This option gives the value for "equals" and "matches" "--conflict-comparison".

           This option only works with "--bidirectional".  See "BIDIRECTIONAL SYNCING" for more
           information.

       --databases
           short form: -d; type: hash

           Sync only this comma-separated list of databases.

           A common request is to sync tables from one database with tables from another database
           on the same or different server.  This is not yet possible.  "--databases" will not do
           it, and you can't do it with the D part of the DSN either because in the absence of a
           table name it assumes the whole server should be synced and the D part controls only
           the connection's default database.

       --defaults-file
           short form: -F; type: string

           Only read mysql options from the given file.  You must give an absolute pathname.

       --dry-run
           Analyze, decide the sync algorithm to use, print and exit.

           Implies "--verbose" so you can see the results.  The results are in the same output
           format that you'll see from actually running the tool, but there will be zeros for
           rows affected.  This is because the tool actually executes, but stops before it
           compares any data and just returns zeros.  The zeros do not mean there are no changes
           to be made.

       --engines
           short form: -e; type: hash

           Sync only this comma-separated list of storage engines.

       --execute
           Execute queries to make the tables have identical data.

           This option makes pt-table-sync actually sync table data by executing all the queries
           that it created to resolve table differences.  Therefore, the tables will be changed!
           And unless you also specify "--verbose", the changes will be made silently.  If this
           is not what you want, see "--print" or "--dry-run".

       --explain-hosts
           Print connection information and exit.

           Print out a list of hosts to which pt-table-sync will connect, with all the various
           connection options, and exit.

       --float-precision
           type: int

           Precision for "FLOAT" and "DOUBLE" number-to-string conversion.  Causes FLOAT and
           DOUBLE values to be rounded to the specified number of digits after the decimal point,
           with the ROUND() function in MySQL.  This can help avoid checksum mismatches due to
           different floating-point representations of the same values on different MySQL
           versions and hardware.  The default is no rounding; the values are converted to
           strings by the CONCAT() function, and MySQL chooses the string representation.  If you
           specify a value of 2, for example, then the values 1.008 and 1.009 will be rounded to
           1.01, and will checksum as equal.

       --[no]foreign-key-checks
           default: yes

           Enable foreign key checks ("SET FOREIGN_KEY_CHECKS=1").

           Specifying "--no-foreign-key-checks" will "SET FOREIGN_KEY_CHECKS=0".

       --function
           type: string

           Which hash function you'd like to use for checksums.

           The default is "CRC32".  Other good choices include "MD5" and "SHA1".  If you have
           installed the "FNV_64" user-defined function, "pt-table-sync" will detect it and
           prefer to use it, because it is much faster than the built-ins.  You can also use
           MURMUR_HASH if you've installed that user-defined function.  Both of these are
           distributed with Percona Server.  See pt-table-checksum for more information and
           benchmarks.

       --help
           Show help and exit.

       --[no]hex-blob
           default: yes

           "HEX()" "BLOB", "TEXT" and "BINARY" columns.

           When row data from the source is fetched to create queries to sync the data (i.e. the
           queries seen with "--print" and executed by "--execute"), binary columns are wrapped
           in HEX() so the binary data does not produce an invalid SQL statement.  You can
           disable this option but you probably shouldn't.

       --host
           short form: -h; type: string

           Connect to host.

       --ignore-columns
           type: Hash

           Ignore this comma-separated list of column names in comparisons.

           This option causes columns not to be compared.  However, if a row is determined to
           differ between tables, all columns in that row will be synced, regardless.  (It is not
           currently possible to exclude columns from the sync process itself, only from the
           comparison.)

       --ignore-databases
           type: Hash

           Ignore this comma-separated list of databases.

           (system databases such as information_schema and performance_schema are ignored by
           default)

       --ignore-engines
           type: Hash; default: FEDERATED,MRG_MyISAM

           Ignore this comma-separated list of storage engines.

       --ignore-tables
           type: Hash

           Ignore this comma-separated list of tables.

           Table names may be qualified with the database name.

       --ignore-tables-regex
           type: string; group: Filter

           Ignore tables whose names match the Perl regex.

       --[no]index-hint
           default: yes

           Add FORCE/USE INDEX hints to the chunk and row queries.

           By default "pt-table-sync" adds a FORCE/USE INDEX hint to each SQL statement to coerce
           MySQL into using the index chosen by the sync algorithm or specified by
           "--chunk-index".  This is usually a good thing, but in rare cases the index may not be
           the best for the query so you can suppress the index hint by specifying
           "--no-index-hint" and let MySQL choose the index.

           This does not affect the queries printed by "--print"; it only affects the chunk and
           row queries that "pt-table-sync" uses to select and compare rows.

       --lock
           type: int

           Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.

           This uses "LOCK TABLES".  This can help prevent tables being changed while you're
           examining them.  The possible values are as follows:

             VALUE  MEANING
             =====  =======================================================
             0      Never lock tables.
             1      Lock and unlock one time per sync cycle (as implemented
                    by the syncing algorithm).  This is the most granular
                    level of locking available.  For example, the Chunk
                    algorithm will lock each chunk of C<N> rows, and then
                    unlock them if they are the same on the source and the
                    destination, before moving on to the next chunk.
             2      Lock and unlock before and after each table.
             3      Lock and unlock once for every server (DSN) synced, with
                    C<FLUSH TABLES WITH READ LOCK>.

           A replication slave is never locked if "--replicate" or "--sync-to-master" is
           specified, since in theory locking the table on the master should prevent any changes
           from taking place.  (You are not changing data on your slave, right?)  If "--wait" is
           given, the master (source) is locked and then the tool waits for the slave to catch up
           to the master before continuing.

           If "--transaction" is specified, "LOCK TABLES" is not used.  Instead, lock and unlock
           are implemented by beginning and committing transactions.  The exception is if
           "--lock" is 3.

           If "--no-transaction" is specified, then "LOCK TABLES" is used for any value of
           "--lock". See "--[no]transaction".

       --lock-and-rename
           Lock the source and destination table, sync, then swap names.  This is useful as a
           less-blocking ALTER TABLE, once the tables are reasonably in sync with each other
           (which you may choose to accomplish via any number of means, including dump and reload
           or even something like pt-archiver).  It requires exactly two DSNs and assumes they
           are on the same server, so it does no waiting for replication or the like.  Tables are
           locked with LOCK TABLES.

       --password
           short form: -p; type: string

           Password to use when connecting.  If password contains commas they must be escaped
           with a backslash: "exam\,ple"

       --pid
           type: string

           Create the given PID file.  The tool won't start if the PID file already exists and
           the PID it contains is different than the current PID.  However, if the PID file
           exists and the PID it contains is no longer running, the tool will overwrite the PID
           file with the current PID.  The PID file is removed automatically when the tool exits.

       --port
           short form: -P; type: int

           Port number to use for connection.

       --print
           Print queries that will resolve differences.

           If you don't trust "pt-table-sync", or just want to see what it will do, this is a
           good way to be safe.  These queries are valid SQL and you can run them yourself if you
           want to sync the tables manually.

       --recursion-method
           type: array; default: processlist,hosts

           Preferred recursion method used to find slaves.

           Possible methods are:

             METHOD       USES
             ===========  ==================
             processlist  SHOW PROCESSLIST
             hosts        SHOW SLAVE HOSTS
             none         Do not find slaves

           The processlist method is preferred because SHOW SLAVE HOSTS is not reliable.
           However, the hosts method is required if the server uses a non-standard port (not
           3306).  Usually pt-table-sync does the right thing and finds the slaves, but you may
           give a preferred method and it will be used first.  If it doesn't find any slaves, the
           other methods will be tried.

       --replace
           Write all "INSERT" and "UPDATE" statements as "REPLACE".

           This is automatically switched on as needed when there are unique index violations.

       --replicate
           type: string

           Sync tables listed as different in this table.

           Specifies that "pt-table-sync" should examine the specified table to find data that
           differs.  The table is exactly the same as the argument of the same name to pt-table-
           checksum.  That is, it contains records of which tables (and ranges of values) differ
           between the master and slave.

           For each table and range of values that shows differences between the master and
           slave, "pt-table-checksum" will sync that table, with the appropriate "WHERE" clause,
           to its master.

           This automatically sets "--wait" to 60 and causes changes to be made on the master
           instead of the slave.

           If "--sync-to-master" is specified, the tool will assume the server you specified is
           the slave, and connect to the master as usual to sync.

           Otherwise, it will try to use "SHOW PROCESSLIST" to find slaves of the server you
           specified.  If it is unable to find any slaves via "SHOW PROCESSLIST", it will inspect
           "SHOW SLAVE HOSTS" instead.  You must configure each slave's "report-host",
           "report-port" and other options for this to work right.  After finding slaves, it will
           inspect the specified table on each slave to find data that needs to be synced, and
           sync it.

           The tool examines the master's copy of the table first, assuming that the master is
           potentially a slave as well.  Any table that shows differences there will NOT be
           synced on the slave(s).  For example, suppose your replication is set up as A->B,
           B->C, B->D.  Suppose you use this argument and specify server B.  The tool will
           examine server B's copy of the table.  If it looks like server B's data in table
           "test.tbl1" is different from server A's copy, the tool will not sync that table on
           servers C and D.

       --slave-user
           type: string

           Sets the user to be used to connect to the slaves.  This parameter allows you to have
           a different user with less privileges on the slaves but that user must exist on all
           slaves.

       --slave-password
           type: string

           Sets the password to be used to connect to the slaves.  It can be used with
           --slave-user and the password for the user must be the same on all slaves.

       --set-vars
           type: Array

           Set the MySQL variables in this comma-separated list of "variable=value" pairs.

           By default, the tool sets:

              wait_timeout=10000

           Variables specified on the command line override these defaults.  For example,
           specifying "--set-vars wait_timeout=500" overrides the defaultvalue of 10000.

           The tool prints a warning and continues if a variable cannot be set.

       --socket
           short form: -S; type: string

           Socket file to use for connection.

       --sync-to-master
           Treat the DSN as a slave and sync it to its master.

           Treat the server you specified as a slave.  Inspect "SHOW SLAVE STATUS", connect to
           the server's master, and treat the master as the source and the slave as the
           destination.  Causes changes to be made on the master.  Sets "--wait" to 60 by
           default, sets "--lock" to 1 by default, and disables "--[no]transaction" by default.
           See also "--replicate", which changes this option's behavior.

       --tables
           short form: -t; type: hash

           Sync only this comma-separated list of tables.

           Table names may be qualified with the database name.

       --timeout-ok
           Keep going if "--wait" fails.

           If you specify "--wait" and the slave doesn't catch up to the master's position before
           the wait times out, the default behavior is to abort.  This option makes the tool keep
           going anyway.  Warning: if you are trying to get a consistent comparison between the
           two servers, you probably don't want to keep going after a timeout.

       --[no]transaction
           Use transactions instead of "LOCK TABLES".

           The granularity of beginning and committing transactions is controlled by "--lock".
           This is enabled by default, but since "--lock" is disabled by default, it has no
           effect.

           Most options that enable locking also disable transactions by default, so if you want
           to use transactional locking (via "LOCK IN SHARE MODE" and "FOR UPDATE", you must
           specify "--transaction" explicitly.

           If you don't specify "--transaction" explicitly "pt-table-sync" will decide on a per-
           table basis whether to use transactions or table locks.  It currently uses
           transactions on InnoDB tables, and table locks on all others.

           If "--no-transaction" is specified, then "pt-table-sync" will not use transactions at
           all (not even for InnoDB tables) and locking is controlled by "--lock".

           When enabled, either explicitly or implicitly, the transaction isolation level is set
           "REPEATABLE READ" and transactions are started "WITH CONSISTENT SNAPSHOT".

       --trim
           "TRIM()" "VARCHAR" columns in "BIT_XOR" and "ACCUM" modes.  Helps when comparing MySQL
           4.1 to >= 5.0.

           This is useful when you don't care about the trailing space differences between MySQL
           versions which vary in their handling of trailing spaces. MySQL 5.0 and later all
           retain trailing spaces in "VARCHAR", while previous versions would remove them.

       --[no]unique-checks
           default: yes

           Enable unique key checks ("SET UNIQUE_CHECKS=1").

           Specifying "--no-unique-checks" will "SET UNIQUE_CHECKS=0".

       --user
           short form: -u; type: string

           User for login if not current user.

       --verbose
           short form: -v; cumulative: yes

           Print results of sync operations.

           See "OUTPUT" for more details about the output.

       --version
           Show version and exit.

       --[no]version-check
           default: yes

           Check for the latest version of Percona Toolkit, MySQL, and other programs.

           This is a standard "check for updates automatically" feature, with two additional
           features.  First, the tool checks its own version and also the versions of the
           following software: operating system, Percona Monitoring and Management (PMM), MySQL,
           Perl, MySQL driver for Perl (DBD::mysql), and Percona Toolkit. Second, it checks for
           and warns about versions with known problems. For example, MySQL 5.5.25 had a critical
           bug and was re-released as 5.5.25a.

           A secure connection to Percona’s Version Check database server is done to perform
           these checks. Each request is logged by the server, including software version numbers
           and unique ID of the checked system. The ID is generated by the Percona Toolkit
           installation script or when the Version Check database call is done for the first
           time.

           Any updates or known problems are printed to STDOUT before the tool's normal output.
           This feature should never interfere with the normal operation of the tool.

           For more information, visit
           <https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>.

       --wait
           short form: -w; type: time

           How long to wait for slaves to catch up to their master.

           Make the master wait for the slave to catch up in replication before comparing the
           tables.  The value is the number of seconds to wait before timing out (see also
           "--timeout-ok").  Sets "--lock" to 1 and "--[no]transaction" to 0 by default.  If you
           see an error such as the following,

             MASTER_POS_WAIT returned -1

           It means the timeout was exceeded and you need to increase it.

           The default value of this option is influenced by other options.  To see what value is
           in effect, run with "--help".

           To disable waiting entirely (except for locks), specify "--wait" 0.  This helps when
           the slave is lagging on tables that are not being synced.

       --where
           type: string

           "WHERE" clause to restrict syncing to part of the table.

       --[no]zero-chunk
           default: yes

           Add a chunk for rows with zero or zero-equivalent values.  The only has an effect when
           "--chunk-size" is specified.  The purpose of the zero chunk is to capture a
           potentially large number of zero values that would imbalance the size of the first
           chunk.  For example, if a lot of negative numbers were inserted into an unsigned
           integer column causing them to be stored as zeros, then these zero values are captured
           by the zero chunk instead of the first chunk and all its non-zero values.

DSN OPTIONS

       These DSN options are used to create a DSN.  Each option is given like "option=value".
       The options are case-sensitive, so P and p are not the same option.  There cannot be
       whitespace before or after the "=" and if the value contains whitespace it must be quoted.
       DSN options are comma-separated.  See the percona-toolkit manpage for full details.

       •   A

           dsn: charset; copy: yes

           Default character set.

       •   D

           dsn: database; copy: yes

           Database containing the table to be synced.

       •   F

           dsn: mysql_read_default_file; copy: yes

           Only read default options from the given file

       •   h

           dsn: host; copy: yes

           Connect to host.

       •   p

           dsn: password; copy: yes

           Password to use when connecting.  If password contains commas they must be escaped
           with a backslash: "exam\,ple"

       •   P

           dsn: port; copy: yes

           Port number to use for connection.

       •   S

           dsn: mysql_socket; copy: yes

           Socket file to use for connection.

       •   t

           copy: yes

           Table to be synced.

       •   u

           dsn: user; copy: yes

           User for login if not current user.

ENVIRONMENT

       The environment variable "PTDEBUG" enables verbose debugging output to STDERR.  To enable
       debugging and capture all output to a file, run the tool like:

          PTDEBUG=1 pt-table-sync ... > FILE 2>&1

       Be careful: debugging output is voluminous and can generate several megabytes of output.

SYSTEM REQUIREMENTS

       You need Perl, DBI, DBD::mysql, and some core packages that ought to be installed in any
       reasonably new version of Perl.

BUGS

       For a list of known bugs, see <http://www.percona.com/bugs/pt-table-sync>.

       Please report bugs at <https://jira.percona.com/projects/PT>.  Include the following
       information in your bug report:

       •   Complete command-line used to run the tool

       •   Tool "--version"

       •   MySQL version of all servers involved

       •   Output from the tool including STDERR

       •   Input files (log/dump/config files, etc.)

       If possible, include debugging output by running the tool with "PTDEBUG"; see
       "ENVIRONMENT".

DOWNLOADING

       Visit <http://www.percona.com/software/percona-toolkit/> to download the latest release of
       Percona Toolkit.  Or, get the latest release from the command line:

          wget percona.com/get/percona-toolkit.tar.gz

          wget percona.com/get/percona-toolkit.rpm

          wget percona.com/get/percona-toolkit.deb

       You can also get individual tools from the latest release:

          wget percona.com/get/TOOL

       Replace "TOOL" with the name of any tool.

AUTHORS

       Baron Schwartz

ACKNOWLEDGMENTS

       My work is based in part on Giuseppe Maxia's work on distributed databases,
       <http://www.sysadminmag.com/articles/2004/0408/> and code derived from that article.
       There is more explanation, and a link to the code, at
       <http://www.perlmonks.org/?node_id=381053>.

       Another programmer extended Maxia's work even further.  Fabien Coelho changed and
       generalized Maxia's technique, introducing symmetry and avoiding some problems that might
       have caused too-frequent checksum collisions.  This work grew into pg_comparator,
       <http://www.coelho.net/pg_comparator/>.  Coelho also explained the technique further in a
       paper titled "Remote Comparison of Database Tables"
       (<http://cri.ensmp.fr/classement/doc/A-375.pdf>).

       This existing literature mostly addressed how to find the differences between the tables,
       not how to resolve them once found.  I needed a tool that would not only find them
       efficiently, but would then resolve them.  I first began thinking about how to improve the
       technique further with my article <http://tinyurl.com/mysql-data-diff-algorithm>, where I
       discussed a number of problems with the Maxia/Coelho "bottom-up" algorithm.  After writing
       that article, I began to write this tool.  I wanted to actually implement their algorithm
       with some improvements so I was sure I understood it completely.  I discovered it is not
       what I thought it was, and is considerably more complex than it appeared to me at first.
       Fabien Coelho was kind enough to address some questions over email.

       The first versions of this tool implemented a version of the Coelho/Maxia algorithm, which
       I called "bottom-up", and my own, which I called "top-down."  Those algorithms are
       considerably more complex than the current algorithms and I have removed them from this
       tool, and may add them back later.  The improvements to the bottom-up algorithm are my
       original work, as is the top-down algorithm.  The techniques to actually resolve the
       differences are also my own work.

       Another tool that can synchronize tables is the SQLyog Job Agent from webyog.  Thanks to
       Rohit Nadhani, SJA's author, for the conversations about the general techniques.  There is
       a comparison of pt-table-sync and SJA at <http://tinyurl.com/maatkit-vs-sqlyog>

       Thanks to the following people and organizations for helping in many ways:

       The Rimm-Kaufman Group <http://www.rimmkaufman.com/>, MySQL AB <http://www.mysql.com/>,
       Blue Ridge InternetWorks <http://www.briworks.com/>, Percona <http://www.percona.com/>,
       Fabien Coelho, Giuseppe Maxia and others at MySQL AB, Kristian Koehntopp (MySQL AB), Rohit
       Nadhani (WebYog), The helpful monks at Perlmonks, And others too numerous to mention.

ABOUT PERCONA TOOLKIT

       This tool is part of Percona Toolkit, a collection of advanced command-line tools for
       MySQL developed by Percona.  Percona Toolkit was forked from two projects in June, 2011:
       Maatkit and Aspersa.  Those projects were created by Baron Schwartz and primarily
       developed by him and Daniel Nichter.  Visit <http://www.percona.com/software/> to learn
       about other free, open-source software from Percona.

COPYRIGHT, LICENSE, AND WARRANTY

       This program is copyright 2011-2018 Percona LLC and/or its affiliates, 2007-2011 Baron
       Schwartz.

       THIS PROGRAM IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
       WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
       PURPOSE.

       This program is free software; you can redistribute it and/or modify it under the terms of
       the GNU General Public License as published by the Free Software Foundation, version 2; OR
       the Perl Artistic License.  On UNIX and similar systems, you can issue `man perlgpl' or
       `man perlartistic' to read these licenses.

       You should have received a copy of the GNU General Public License along with this program;
       if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
       MA  02111-1307  USA.

VERSION

       pt-table-sync 3.2.1

POD ERRORS

       Hey! The above document had some coding errors, which are explained below:

       Around line 12823:
           Non-ASCII character seen before =encoding in 'Percona’s'. Assuming UTF-8