Provided by: reposurgeon_4.38-2_amd64

NAME

       repocutter - surgical and filtering operations on Subversion dump files

SYNOPSIS

       repocutter [-q] [-d n] [-i 'filename'] [-r 'selection'] 'subcommand'

DESCRIPTION

       This program does surgical and filtering operations on Subversion dump files. While it is
       not as flexible as reposurgeon(1), it can perform Subversion-specific transformations
       that reposurgeon cannot, and can be useful for processing Subversion repositories into a
       form suitable for conversion. Also, it supports the version 3 dumpfile format, which
       reposurgeon does not.

       In most commands, the -r (or --range) option limits the selection of revisions over which
       an operation will be performed. Usually other revisions will be passed through unaltered,
       except in the select and deselect commands for which the option controls which revisions
       will be passed through. A selection consists of one or more comma-separated ranges. A
       range may consist of an integer revision number or the special name HEAD for the head
       revision. Or it may be a colon-separated pair of integers, or an integer followed by a
       colon followed by HEAD.
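
       The selection grammar above can be modelled in a few lines. This is a minimal sketch, not
       repocutter's own parser: the names parse_selection and selected are hypothetical, ranges
       are assumed to be inclusive at both ends, and only the forms described above are handled.

```python
def parse_selection(spec, head):
    """Expand a selection such as '2:4,7,9:HEAD' into (low, high) pairs."""
    def endpoint(token):
        # HEAD stands for the head revision; anything else must be an integer.
        return head if token == "HEAD" else int(token)

    ranges = []
    for part in spec.split(","):
        if ":" in part:
            low, high = part.split(":", 1)
            ranges.append((endpoint(low), endpoint(high)))
        else:
            # A bare revision number is a one-element range.
            ranges.append((endpoint(part), endpoint(part)))
    return ranges


def selected(revision, ranges):
    """True if the revision falls inside any range (assumed inclusive)."""
    return any(low <= revision <= high for low, high in ranges)
```

       With a head revision of 12, parse_selection("2:4,7,9:HEAD", 12) yields the pairs (2, 4),
       (7, 7) and (9, 12).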

       If the output stream contains copyfrom references to missing revisions, repocutter
       silently patches each copy source by stepping it backwards to the most recent previous
       revision that exists.
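
       The stepping-back rule can be sketched as a search over the revisions that survive in the
       output. The function name patch_copyfrom and the sorted list argument are assumptions made
       for illustration; repocutter's internal bookkeeping may differ.

```python
import bisect


def patch_copyfrom(copyfrom_rev, present):
    """Return copyfrom_rev if it is present, else the nearest earlier revision.

    `present` is a sorted list of the revision numbers in the output stream.
    Returns None when no earlier revision exists (a case this sketch does
    not try to resolve).
    """
    i = bisect.bisect_right(present, copyfrom_rev)
    return present[i - 1] if i > 0 else None
```

       For a stream holding revisions 0, 1, 2, 5 and 6, a copy source of revision 4 would be
       stepped back to revision 2, while a copy source of revision 5 is left alone.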

       (Older versions of this tool, before 4.30, treated -r as an implied selection filter
       rather than passing through unselected revisions unaltered. If you have old scripts using
       repocutter they may need modification.)

       Normally, each subcommand produces a progress spinner on standard error; each turn means
       another revision has been filtered. The -q (or --quiet) option suppresses this. Quiet mode
       is set when output is redirected to a file or pipe.

       The -d option enables debug messages on standard error. It takes an integer debug level.
       These messages are probably only of interest to repocutter developers.

       The -i option sets the input source to a specified filename. This is primarily useful when
       running the program under a debugger. When this option is not present the program expects
       to read a stream from standard input.

       Generally, if you need to use this program at all, you will find that you need to pipe
       your dump file through multiple instances of it doing one kind of operation each. This is
       not as expensive as it sounds; with the exception of the reduce subcommand, the working
       set of this program is bounded by the size of the largest single blob plus its
       metadata. It does not need to hold the entire repo metadata in memory.

       The -f/-fixed option disables regexp compilation of PATTERN arguments, treating them as
       literal strings.

       The -t option sets a tag to be included in error and warning messages. This will be useful
       for determining which stage of a multistage repocutter pipeline failed.

       There are a few other command-specific options described under individual commands.

       In the command descriptions, PATTERN arguments are regular expressions to match pathnames,
       constrained so that each match must be a path segment or a sequence of path segments; that
       is, the left end must be either at the start of path or immediately following a /, and the
       right end must precede a / or be at end of string. With a leading ^ the match is
       constrained to be a leading sequence of the pathname; with a trailing $, a trailing one.
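
       The segment constraint can be illustrated by wrapping a pattern in boundary assertions.
       This is a sketch in Python's re syntax, which differs in places from Go's regular
       expressions; the helper names segment_regex and matches are hypothetical, and the wrapper
       models only the rule as worded above.

```python
import re


def segment_regex(pattern):
    """Wrap PATTERN so a match must start and end on path-segment boundaries."""
    # Left end: start of path or just after a '/'.
    # Right end: just before a '/' or at end of string.
    return re.compile(r"(?:^|(?<=/))(?:" + pattern + r")(?=/|$)")


def matches(pattern, path):
    """True if PATTERN matches one or more whole segments of path."""
    return segment_regex(pattern).search(path) is not None
```

       Under this model "trunk" matches project/trunk/src but "run" does not, because "run" is
       only part of a segment. "^project" matches, and "^trunk" does not.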

       The following subcommands are available:

       select
           The 'select' subcommand selects a range and permits only revisions and nodes in that
           range to pass to standard output. A range beginning with 0 includes the dumpfile
           header. Mergeinfo properties in all revisions are updated so they no longer refer to
           omitted revisions.

           Warning: this subcommand is not guaranteed to produce a valid dump that can be read by
           reposurgeon. In particular, it may delete a revision that is referenced in a later
           copy-from operation, which will crash reposurgeon.

       deselect
           The 'deselect' subcommand selects a range and permits only revisions and nodes NOT in
           that range to pass to standard output. Any mergeinfo properties in other revisions are
           updated so they no longer refer to dropped revisions.

           Warning: this subcommand is not guaranteed to produce a valid dump that can be read by
           reposurgeon. In particular, it may delete a revision that is referenced in a later
           copy-from operation, which will crash reposurgeon.

       see
           Render a very condensed report on the repository node structure, mainly useful for
           examining strange and pathological repositories. File content is ignored. You get one
           line per repository operation, reporting the revision, operation type, file path, and
           the copy source (if any). Directory paths are distinguished by a trailing slash. The
           'copy' operation is really an 'add' with a directory source and target; the display
           name is changed to make them easier to see. This report can be restricted by a
           selection set.

       renumber
           Renumber all revisions, patching Node-copyfrom headers as required. Any selection
           option is ignored. Takes no arguments. The -b option can be used to set the base to
           renumber from, defaulting to 0.

       count
           The 'count' subcommand lists the last revision number in the input stream. This is
           normally the revision count, but may not be if the stream has omitted revisions.

       log
           Generate a log report to standard output, in the same format as the output of svn log
           on a repository.

       setlog
           Replace the log entries in the input dumpfile with the corresponding entries in the
           LOGFILE, which should be in the format of an svn log output. Replacements may be
           restricted to a specified range.

       propdel
           Delete the property PROPNAME. May be restricted by a revision selection. You may
           specify multiple properties to be deleted.

       proprename
           Rename the property OLDNAME to NEWNAME. May be restricted by a revision selection. You
           may specify multiple properties to be renamed.

       propset
           Set the property PROPNAME to PROPVAL.

           May be restricted by a revision selection. Note that specifying only a revision will
           cause the property to be set on the revision properties and on all nodes in the
           revision; you’ll probably want to specify a node index.

           You may specify multiple property settings.

       propclean
           Every path with a suffix matching one of SUFFIXES gets a property turned off. The
           default property is svn::. Another property may be set with the -p option.

       expunge
           Delete all operations with Node-path or Node-copyfrom-path headers matching specified
           Golang regular expressions (opposite of 'sift'). Any revision left with no Node
           records after this filtering has its Revision record dropped as well. Mergeinfo
           properties in all revisions are updated so they no longer refer to dropped revisions.

           Warning: this subcommand is not guaranteed to produce a valid dump that can be read by
           reposurgeon. In particular, it may delete a revision that is referenced in a later
           copy-from operation, which will crash reposurgeon.

       sift
           Delete all operations with either Node-path or Node-copyfrom-path headers not matching
           specified Golang regular expressions (opposite of 'expunge'). Any revision left with
           no Node records after this filtering has its Revision record removed as well.
           Mergeinfo properties in all revisions are updated so they no longer refer to dropped
           revisions.

           This transform can be restricted by a selection set.

           Warning: this subcommand is not guaranteed to produce a valid dump that can be read by
           reposurgeon. In particular, it may delete a revision that is referenced in a later
           copy-from operation, which will crash reposurgeon.
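
           The relationship between expunge and sift can be sketched on a toy model of a dump.
           Everything here is an assumption made for illustration: revisions are a dict from
           revision number to a list of node dicts keyed by header name, sift is treated as the
           exact complement of expunge, plain re.search stands in for the segment-constrained
           matching described earlier, and mergeinfo updating is left out.

```python
import re


def _hit(patterns, node):
    """True if any pattern matches the node's path or its copyfrom path."""
    paths = [node["Node-path"]]
    if "Node-copyfrom-path" in node:
        paths.append(node["Node-copyfrom-path"])
    return any(re.search(p, path) for p in patterns for path in paths)


def _filter(revisions, keep):
    """Keep nodes for which keep(node) is true; drop revisions left empty."""
    kept = {rev: [n for n in nodes if keep(n)] for rev, nodes in revisions.items()}
    return {rev: nodes for rev, nodes in kept.items() if nodes}


def expunge(revisions, patterns):
    """Delete matching nodes."""
    return _filter(revisions, lambda n: not _hit(patterns, n))


def sift(revisions, patterns):
    """Delete non-matching nodes (assumed to be the complement of expunge)."""
    return _filter(revisions, lambda n: _hit(patterns, n))
```

           In this model, expunging '^Directory1' from a stream whose revision 1 contains only
           Directory1/a removes revision 1 entirely. Any surviving copy that names revision 1 as
           its source would then dangle, which is the hazard the warnings describe.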

       closure
           The 'closure' subcommand computes the transitive closure of a path set under the
           relation 'copies from' - that is, with the smallest set of additional paths such that
           every copy-from source is in the set.
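
           A transitive closure of this kind can be sketched as a worklist algorithm. The
           function below is an illustration under simplifying assumptions: copies are given as
           hypothetical (source, target) path pairs, and paths are compared for exact equality
           rather than by directory containment.

```python
def closure(paths, copies):
    """Smallest superset of `paths` that contains every copy source of its members.

    `copies` is an iterable of (source_path, target_path) pairs.
    """
    sources = {}
    for src, dst in copies:
        sources.setdefault(dst, set()).add(src)

    result = set(paths)
    pending = list(paths)
    while pending:
        # Pull in the sources of each path, then the sources of those sources.
        for src in sources.get(pending.pop(), ()):
            if src not in result:
                result.add(src)
                pending.append(src)
    return result
```

           If tags/1.0 was copied from branches/a, which was itself copied from trunk, the
           closure of {tags/1.0} in this model is {tags/1.0, branches/a, trunk}.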

       pathlist
           List all distinct node-paths in the stream, once each, in the order first encountered.

       pathrename
           Modify Node-path headers, Node-copyfrom-path headers, and svn:mergeinfo properties
           matching the specified Golang regular expression FROM; replace with TO. TO may contain
           Golang-style backreferences (${1}, ${2} etc - curly brackets not optional) to
           parenthesized portions of FROM.

           Matches are constrained so that each match must be a path segment or a sequence of
           path segments; that is, the left end must be either at the start of path or
           immediately following a /, and the right end must precede a / or be at end of string.
           With a leading ^ the match is constrained to be a leading sequence of the pathname;
           with a trailing $, a trailing one.

           Multiple FROM/TO pairs may be specified and are applied in order. This transform can
           be restricted by a selection set.

           All mergeinfo properties are updated in accordance with the path renames.
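
           The ordered application of FROM/TO pairs, with ${1}-style backreferences, can be
           sketched as follows. This is a Python model, not repocutter's Go code: the function
           name pathrename is reused only for readability, the ${n} references are translated to
           Python's \g<n> form, and a literal backslash in TO is not handled.

```python
import re


def pathrename(path, pairs):
    """Apply each (FROM, TO) pair, in order, to a single path."""
    for frm, to in pairs:
        # Matches must begin and end on path-segment boundaries.
        regex = re.compile(r"(?:^|(?<=/))(?:" + frm + r")(?=/|$)")
        # Go writes backreferences as ${1}; Python's re.sub expects \g<1>.
        template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", to)
        path = regex.sub(template, path)
    return path
```

           For example, the pair ('branches/([^/]+)/app', 'branches/${1}') turns
           branches/v1/app/src into branches/v1/src, and a later pair sees the result of the
           earlier ones.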

       setpath
           In the specified revisions, replace the Node-path with the specified PATH. Does not
           alter mergeinfo properties as a side effect.

       setcopyfrom
           In the specified revisions, replace the Node-copyfrom-path with the specified PATH.
           Does not alter mergeinfo properties as a side effect. Terminates with error if any
           selected node is not a copy.

       pop
           Pop initial segment off each path matching PATTERN - by default, all paths.

           May be useful after a sift command to turn a dump from a subproject stripped from a
           dump for a multiple-project repository into the normal form with trunk/tags/branches
           at the top level.

           This transform cannot be restricted by a selection set, as it is not possible to
           guarantee that copyfrom paths and mergeinfo properties will be modified consistently in
           the presence of that kind of restriction.

           Mergeinfo properties in all revisions are updated, as well as path and copyfrom parts.

       push
           Push an initial segment onto each matching path. Normally used to add a "trunk" prefix
           to every path in a flat repository. The -s option can be used to set a different
           initial segment.

           This transform cannot be restricted by a selection set, as it is not possible to
           guarantee that copyfrom paths and mergeinfo properties will be modified consistently in
           the presence of that kind of restriction.

           Mergeinfo properties in all revisions are updated to refer to the new pathnames.
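
           The effect of pop and push on a single path can be sketched as plain string
           operations. The two functions below are illustrations only; they ignore PATTERN
           matching, and leaving a single-segment path unchanged in pop is an assumption of this
           sketch rather than documented behavior.

```python
def pop(path):
    """Remove the initial segment: 'proj/trunk/a' becomes 'trunk/a'."""
    head, sep, rest = path.partition("/")
    # Assumption: a path with only one segment is returned unchanged.
    return rest if sep else path


def push(path, segment="trunk"):
    """Prepend an initial segment: 'src/a.c' becomes 'trunk/src/a.c'."""
    return segment + "/" + path
```

           In this model push undoes nothing that pop did not do: pop(push(p)) returns p for any
           path p.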

       filecopy
           For each node in the revision range, stash the current version of the node-path’s
           content. For each later file copy operation with that source, replace the file copy
           with an explicit add/change using the stashed content.

           You can use this operation to sever links from obsolete branches or non-conformable
           directories in a multiproject repository so the unwanted content can be expunged
           without changing the content of later revisions.

           If a PATTERN argument is provided, only replace copies with an explicit add/change
           when the source node path matches PATTERN.

           With the -n flag, only the basename is required to match PATTERN if it is provided.
           Otherwise, with -n and no PATTERN, require a match of source to target on basename
           only rather than the full path. This may be required in order to extract filecopies
           from branches.

           Restricting the range holds down the memory requirement of this tool, which in the
           worst (and default) 1:$ case will keep a copy of every blob in the repository until
           it’s done processing the stream.

       skipcopy
           Replace the source revision and path of a copy at the upper end of the selection with
           the source revision and path of a copy at the lower end. Fails unless both revisions
           are copies. Used to remove an unwanted intermediate copy or copies, cleaning up the
           history.

       swap
           Swap the top two elements of each pathname in every revision in the selection set.
           Useful following a sift operation for straightening out a common form of multi-project
           repository. If a PATTERN argument is given, only paths matching it are swapped.

       swapsvn
           Like swap, but is aware of Subversion structure. Used for transforming multiproject
           repositories into a standard layout with trunk, tags, and branches at the top level.

           Fires when the second component of a matching path is "trunk", "branches", or "tags",
           or the path consists of a single segment that is a top-level project directory; passes
           through unaltered all paths for which this is not so.

           Top-level project directories with properties or comments make this command die
           (return status 1) with an error message on stderr; otherwise these directories are
           silently discarded.

           Otherwise, swaps "trunk" and the top-level (project) directory straight up. For tags
           and branches, the following two components are swapped to the top. Thus,
           "foo/branches/release23" becomes "branches/release23/foo", putting the project
           directory beneath the branch.
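
           The difference between swap and the basic path rule of swapsvn can be sketched on
           single paths. Both functions are hypothetical models of the descriptions above; they
           leave out single-segment project directories, whole-project copies, copy coalescing
           and mergeinfo, and return unchanged any path they do not model.

```python
def swap_path(path):
    """swap: exchange the top two elements of the path."""
    parts = path.split("/")
    if len(parts) >= 2:
        parts[0], parts[1] = parts[1], parts[0]
    return "/".join(parts)


def swapsvn_path(path):
    """swapsvn: move trunk, or branches/NAME or tags/NAME, above the project."""
    parts = path.split("/")
    if len(parts) < 2:
        raise ValueError("single-segment paths are not modelled in this sketch")
    project, kind = parts[0], parts[1]
    if kind == "trunk":
        return "/".join([kind, project] + parts[2:])
    if kind in ("branches", "tags") and len(parts) >= 3:
        # The branch or tag name travels with its parent directory.
        return "/".join([kind, parts[2], project] + parts[3:])
    return path  # not modelled here; returned unchanged
```

           For foo/branches/release23, swap_path gives branches/foo/release23 while swapsvn_path
           gives branches/release23/foo.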

           Also fires when an entire project directory is copied; this is transformed into a copy
           of trunk and copies of each subbranch and tag that exists.

           After the swap, there are attempts to recognize spans of copies into branch
           directories, and copies into tag subdirectories that are parallel in all top-level
           (project) directories. These are coalesced into single copies in the inverted
           structure. No attempt is made to coalesce deletes; the user must manually trim
           unneeded branches.

           Accordingly, copies with three-segment sources and three-segment targets are
           transformed; for tags/ and branches/ paths the last segment (the subdirectory below
           the branch name) is dropped. Following copies are skipped.

           This has two minor negative consequences. One is that metadata belonging to all
           deletes or copies after the first one in a coalesced span is lost. The other is that
           branches and tags local to individual project directories are promoted to global
           branches and tags across the entire transformed repository; no content is lost this
           way.

           Parallel rename sequences are also coalesced.

           If a PATTERN argument is given, only paths matching the pattern are swapped.

           Note that the result of swapping does not have initial trunk/branches/tags directory
           creations and can thus not be fed directly to svnload. reposurgeon copes with this,
           but Subversion will not.

           Mergeinfo properties are updated to use the swapped path names.

           This transform can be restricted by a selection set.

       swapcheck
           List directory prefixes of anomalous paths that would confuse swapsvn. This includes
           any single-segment path other than trunk/tags/branches or a project copy operation,
           any path with two or more segments in which the second is not trunk/tags/branches, and
           any path in which trunk/tags/branches occurs more than one segment down from the root.

           Each report line has two fields; the first is the earliest revision containing a path
           with the prefix given, and the second is the prefix. Once a particular path prefix has
           been recognized and reported as anomalous, later paths with that prefix are not
           reported.

           If feeding a Subversion dump to this subcommand doesn’t produce an empty report, you
           can expect swapsvn to produce an invalid dump that will confuse and possibly crash
           reposurgeon. The remedy for this is a set of pathrenames and/or deselections that
           yields paths conformable to being swapped into a regular Subversion structure.

       replace
           Perform a regular expression search/replace on blob content. The first character of
           the argument (normally /) is treated as the end delimiter for the regular-expression
           and replacement parts. This transform can be restricted by a selection set.
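
           The delimiter convention can be sketched as a small parser. This is an illustration
           that assumes the argument has the shape /REGEX/REPLACEMENT/ with a trailing
           delimiter; the names parse_replace and replace_blob are hypothetical, and Python's re
           syntax stands in for Go's.

```python
import re


def parse_replace(arg):
    """Split 'dREGEXdREPLACEMENTd' on its first character d."""
    delim = arg[0]
    fields = arg[1:].split(delim)
    # Assumed shape: exactly two parts, each ended by the delimiter.
    if len(fields) != 3 or fields[2] != "":
        raise ValueError("expected %sREGEX%sREPLACEMENT%s" % (delim, delim, delim))
    return fields[0], fields[1]


def replace_blob(content, arg):
    """Apply the search/replace described by arg to blob content (bytes)."""
    pattern, replacement = parse_replace(arg)
    return re.sub(pattern.encode(), replacement.encode(), content)
```

           Choosing a different first character, as in '#a/b#c/d#', lets the regular expression
           and the replacement contain slashes.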

       strip
           Replace content with unique generated cookies on all node paths matching the specified
           regular expressions; if no expressions are given, match all paths.

           This command is useful for reducing the bulk of a stream without touching its
           metadata, so you can do test conversions more quickly.

       hash
           Replace content with hash on all node paths matching the specified regular
           expressions; if no expressions are given, match all paths.

       obscure
           Replace path segments and committer IDs with arbitrary but consistent names in order
           to obscure them. The replacement algorithm is tuned to make the replacements readily
           distinguishable by eyeball. This transform can be restricted by a selection set.

       reduce
           Strip revisions out of a dump so the only parts left are those likely to be relevant to a
           conversion problem. This is done by dropping every node that consists of a change on a
           file and has no property settings. Mergeinfo properties in all revisions are updated
           so they no longer refer to dropped revisions.
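
           The dropping rule can be sketched as a filter over node records. The representation
           is an assumption: each node is a dict using the dump header names Node-kind and
           Node-action, plus a hypothetical "props" entry holding any property settings.

```python
def reduce_nodes(nodes):
    """Keep every node except plain file changes without property settings."""
    def droppable(node):
        return (node.get("Node-kind") == "file"
                and node.get("Node-action") == "change"
                and not node.get("props"))

    return [n for n in nodes if not droppable(n)]
```

           Adds, deletes, directory operations, and file changes that carry properties all
           survive this filter; only content-only file changes are removed.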

       testify
           Replace commit timestamps with a monotonically increasing clock tick starting at the
           Unix epoch and advancing by 10 seconds per commit. Replace all attributions with
           'fred'. Discard the repository UUID. Use this to neutralize procedurally-generated
           streams so they can be compared. This transform can be restricted by a selection set.

       count
           Set the debug level to the specified value on the selected revisions. Setting
           debugging enables diagnostics to standard error, and suppresses the progress baton for
           the entire run in order not to step on any diagnostics that might be emitted.

           For the meaning of the debug levels, see the source code. This option is probably only
           of interest to repocutter developers.

       version
           Report major and minor repocutter version.

HISTORY

       Under the name "svncutter", an ancestor of this program traveled in the 'contrib/'
       directory of the Subversion distribution. It had functional overlap with reposurgeon(1)
       because it was directly ancestral to that code. It was moved to the reposurgeon(1)
       distribution in January 2016. This program was ported from Python to Go in August 2018, at
       which time the obsolete "squash" command was retired. The syntax of regular expressions in
       the pathrename command changed at that time.

       The reason for the partial functional overlap between repocutter and reposurgeon is that
       repocutter was first written earlier and became a testbed for some of the design concepts
       in reposurgeon. After reposurgeon was written, the author learned that it could not
       naturally support some useful operations very specific to Subversion, and enhanced
       repocutter to do those.

RETURN VALUES

       Normally 0. Can be 1 if repocutter sees an ill-formed dump, or if the output stream
       contains any copyfrom references to missing revisions.

BUGS

       There is one regression since the Python version: repocutter no longer recognizes
       Macintosh-style line endings consisting of a carriage return only. This may be addressed
       in a future version.

SEE ALSO

       reposurgeon(1).

EXAMPLE

       Suppose you have a Subversion repository with the following semi-pathological structure:

           Directory1/ (with unrelated content)
           Directory2/ (with unrelated content)
           TheDirIWantToMigrate/
                           branches/
                                          crazy-feature/
                                                          UnrelatedApp1/
                                                          TheAppIWantToMigrate/
                           tags/
                                          v1.001/
                                                          UnrelatedApp1/
                                                          UnrelatedApp2/
                                                          TheAppIWantToMigrate/
                           trunk/
                                          UnrelatedApp1/
                                          UnrelatedApp2/
                                          TheAppIWantToMigrate/

       You want to transform the dump file so that TheAppIWantToMigrate can be subject to a
       regular branchy lift. A way to dissect out the code of interest would be with the
       following series of filters applied:

           repocutter expunge '^Directory1' '^Directory2'
           repocutter pathrename '^TheDirIWantToMigrate/' ''
           repocutter expunge '^branches/crazy-feature/UnrelatedApp1/'
           repocutter pathrename 'branches/crazy-feature/TheAppIWantToMigrate/' 'branches/crazy-feature/'
           repocutter expunge '^tags/v1.001/UnrelatedApp1/'
           repocutter expunge '^tags/v1.001/UnrelatedApp2/'
           repocutter pathrename '^tags/v1.001/TheAppIWantToMigrate/' 'tags/v1.001/'
           repocutter expunge '^trunk/UnrelatedApp1/'
           repocutter expunge '^trunk/UnrelatedApp2/'
           repocutter pathrename '^trunk/TheAppIWantToMigrate/' 'trunk/'

LIMITATIONS

       The sift and expunge operations can produce output dumps that are invalid. The problem is
       copyfrom operations (Subversion branch and tag creations). If an included revision
       includes a copyfrom reference to an excluded one, the reference target won’t be in the
       emitted dump; it won’t load correctly in Subversion, and while reposurgeon has fallback
       logic that backs down to the latest existing revision before the missing one, this
       expedient is fragile. The revision number in a copyfrom header pointing to a missing
       revision will be zero. Attempts to be clever about this won’t work; the problem is
       inherent in the data model of Subversion.

AUTHOR

       Eric S. Raymond <esr@thyrsus.com>. This tool is distributed with reposurgeon; see the
       project page <http://www.catb.org/~esr/reposurgeon>.

                                            2024-11-05                              REPOCUTTER(1)