Provided by: makepp_2.0.98.5-2.1_all

NAME

       makepp_signatures -- How makepp knows when files have changed

DESCRIPTION

       C: "C", "c_compilation_md5",  M: "md5",  P: "plain",  S: "shared_object",  X: "xml",
         "xml_space"

       Each file is associated with a signature, which is a string that changes if the file has
       changed.  Makepp compares signatures to see whether it needs to rebuild anything.  The
       default signature for files is a concatenation of the file's modification time and its
       size, unless you're executing a C/C++ compilation command, in which case the default
       signature is a cryptographic checksum on the file's contents, ignoring comments and
       whitespace.  If you want, you can switch to a different method, or you can define your own
       signature functions.

       How the signature is actually used is controlled by the build check method (see
       makepp_build_check).  Normally, if a file's signature changes, the file itself is
       considered to have changed, and makepp forces a rebuild.

       If makepp is building a file, and you don't think it should be, you might want to check
       the build log (see makepplog).  Makepp writes an explanation of what it thought each file
       depended on, and why it chose to rebuild.

       There are several signature methods included in makepp.  Makepp usually picks the most
       appropriate standard one automatically.  However, you can change the signature method for
       an individual rule by using the ":signature" modifier on the rule which depends on the
       files you want to check, or for all rules in a makefile by using the "signature"
       statement, or for all makefiles at once using the "-m" or "--signature-method" command
       line option.

   Mpp::Signature methods included in the distribution
       plain (actually nameless)
           The plain signature method is the file's modification time and the file's size,
           concatenated.  These values are quickly obtainable from the operating system and
           almost always change when the file changes.  For symlinks it uses the values of the
           linkee.  If there is no linkee, i.e. it's a dangling symlink, then it uses its own
           values, but prepends a 0 to mark the fact.
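
           As a rough illustration of the rule above, the following Python sketch builds such a
           signature.  It is not makepp's code (makepp is written in Perl); the function name
           "plain_signature" and the exact string layout (whole-second time, a comma, then the
           size) are assumptions made for this example.

```python
import os


def plain_signature(path):
    """Return the time and size of the link target, concatenated.

    For a dangling symlink the link's own values are used, with a
    leading "0" to mark the fact.
    """
    try:
        st = os.stat(path)       # follows symlinks, so this describes the linkee
        prefix = ""
    except FileNotFoundError:
        st = os.lstat(path)      # no linkee: fall back to the symlink itself
        prefix = "0"
    # The "time,size" layout is an assumption; only the ingredients are documented.
    return f"{prefix}{int(st.st_mtime)},{st.st_size}"
```

           Both values come from a single "stat" call, which is why this method is cheap
           compared with reading the file.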

           Makepp used to look only at the file's modification time, but if you run makepp
           several times within a second (e.g., in a script that's building several small
           things), sometimes modification times won't change.  Then, hopefully the file's size
           will change.

           If running makepp several times within a second is a problem for you, you may find
           that using the "md5" method is somewhat more reliable.  If makepp builds a file, it
           flushes its cached MD5 signatures even if the file's date hasn't changed.

           For efficiency's sake, makepp won't reread the file and recompute the complex
           signatures below if this plain signature hasn't changed since the last time it
           computed it.  This can theoretically cause a problem, since it's possible to change
           the file's contents without changing its date and size.  In practice, this is quite
           hard to do, so it's not a serious danger.  In the future, as more filesystems switch
           to sub-second timestamps, hopefully Perl will give us access to this information,
           making this failsafe.
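
           The shortcut described above can be pictured with the following Python sketch.  It
           is only a model of the idea, not makepp's implementation: the in-memory dictionary
           and the names "_cache" and "cached_md5" are assumptions of this example.

```python
import hashlib
import os

_cache = {}  # path -> (plain signature, content checksum)


def _plain(path):
    st = os.stat(path)
    return (int(st.st_mtime), st.st_size)


def cached_md5(path):
    """Recompute the checksum only when date or size has changed."""
    plain = _plain(path)
    hit = _cache.get(path)
    if hit is not None and hit[0] == plain:
        return hit[1]            # date and size unchanged: the file is not reread
    with open(path, "rb") as fh:
        digest = hashlib.md5(fh.read()).hexdigest()
    _cache[path] = (plain, digest)
    return digest
```

           The sketch also shows the theoretical weakness: a file rewritten with the same
           length and the same whole-second timestamp keeps its old checksum.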

       C
       c_compilation_md5
           This is the method for input files to C-like compilers.  It checks whether a file's
           name looks like C or C++ source code, including things like Corba IDL.  If it does,
           this method applies.  If it doesn't, it falls back to plain signatures for binary
           files (determined by name or else by content), and to "md5" for everything else.

           The idea is to be independent of formatting changes.  This is done by pulling
           everything up as far as possible, and by eliminating insignificant spaces.  Words are
           exempt from pulling up, since they might be macros containing "__LINE__", so they
           remain on the line where they were.

               // ignored comment

               #ifdef XYZ
                   #include <xyz.h>
               #endif

               int a = 1;

               #line 20
               void f
               (
                   int b
               )
               {
                   a += b + ++c;
               }

                   /* more ignored comment */

           is treated as though it were

               #ifdef XYZ
               #include<xyz.h>
               #endif

               int a=1;
               #line 20
               void f(

               int b){

               a+=b+ ++c;}

           That way you can reindent your code or add or change comments without triggering a
           rebuild, so long as you don't change the line numbers.  (This signature method
           recompiles if line numbers have changed because that causes uses of "__LINE__" and
           most debugging information to change.)  It also ignores whitespace and comments after
           the last token.  This is useful for preventing a useless rebuild if your version
           control system adds lines at a "$Log$" tag when checking in.

           This method is particularly useful for the following situations:

           •   You want to make changes to the comments in a commonly included header file, or
               you want to reformat or reindent part of it.  For one project that I worked on a
               long time ago, we were very unwilling to correct inaccurate comments in a common
               header file, even when they were seriously misleading, because doing so would
               trigger several hours of rebuilds.  With this signature method, this is no longer
               a problem.

           •   You like to save your files often, and your editor (unlike emacs) will happily
               write a new copy out even if nothing has changed.

           •   You have C/C++ source files which are generated automatically by other build
               commands (e.g., yacc or some other preprocessor).  For one system I work with, we
               have a preprocessor which (like yacc) produces two output files, a ".cxx" and a
               ".h" file:

                   %.h %.cxx: %.qtdlg $(HLIB)/Qt/qt_dialog_generator
                       $(HLIB)/Qt/qt_dialog_generator $(input)

               Every time the input file changed, the resulting .h file also was rewritten, and
               ordinarily this would trigger a rebuild of everything that included it.  However,
               most of the time the contents of the .h file didn't actually change (except for a
               comment about the build time written by the preprocessor), so a recompilation was
               not actually necessary.

           In practice this saves fewer recompiles than you'd hope for, because mere comment
           changes often add lines.  For logging with "__LINE__" or the debugger to match your
           source, a change in line numbers requires recompilation.  So this signature is
           especially useless for the "tangle" family of tools from literate programming, where
           your code resides in some bigger file and even changes to a documentation section
           irrelevant to the code will be reflected in the extracted source via a "#line"
           directive.

           If you can live with wrong line numbers during development, you can set the variable
           "makepp_signature_C_flat" (with an uppercase C) to some true value (like 1).  Then,
           whereas the compiler still sees the real file, the above example will be flattened for
           signing as:

               #ifdef XYZ
               #include<xyz.h>
               #endif
               int a=1;void f(int b){a+=b+ ++c;}
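
           To make the flattening idea concrete, here is a simplified Python sketch.  It is
           not makepp's algorithm: the names "flatten_c" and "flat_c_signature" are
           hypothetical, preprocessor lines are kept as written instead of being compressed,
           backslash line continuations are not treated specially, and the list of operator
           pairs that must stay separated is a guess that goes beyond the four cases this
           page names.

```python
import hashlib
import re

# Comments and literals are matched together so that comment markers inside
# string or character literals are left alone.
_COMMENT_OR_LITERAL = re.compile(
    r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\\n])*"|\'(?:\\.|[^\'\\\n])*\'', re.S)
# A token is a literal, a word, a run of adjacent punctuation, or a stray character.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\\n])*"|\'(?:\\.|[^\'\\\n])*\'|\w+|[^\w\s"\']+|\S')
# Pairs that would turn into a different token if written without a space.
_TWO_CHAR_OPS = {"++", "--", "<<", ">>", "&&", "||", "==", "!=", "<=", ">=",
                 "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "->", "::",
                 "/*", "//", "##"}


def _needs_space(left, right):
    def is_word(ch):
        return ch.isalnum() or ch == "_"
    return (is_word(left) and is_word(right)) or left + right in _TWO_CHAR_OPS


def flatten_c(source):
    """Drop comments and insignificant whitespace, ignoring line numbers."""
    no_comments = _COMMENT_OR_LITERAL.sub(
        lambda m: " " if m.group().startswith("/") else m.group(), source)
    lines, code = [], ""
    for line in no_comments.splitlines():
        if line.lstrip().startswith("#"):
            if code:
                lines.append(code)
                code = ""
            lines.append(line.strip())      # directives stay on their own line
        else:
            for tok in _TOKEN.findall(line):
                if code and _needs_space(code[-1], tok[0]):
                    code += " "
                code += tok
    if code:
        lines.append(code)
    return "\n".join(lines)


def flat_c_signature(source):
    return hashlib.md5(flatten_c(source).encode("utf-8")).hexdigest()
```

           Two sources that differ only in comments, indentation or line breaks between
           tokens give the same value from "flat_c_signature", while changing a token
           changes it.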

           Note that signatures are only recalculated when files change.  So you can build for
           everyone in a repository without this option, and those who want the option can set it
           when building in their sandbox.  When they first locally change a file, even only
           trivially, that will cause a recompilation, because with this option a totally
           different signature is calculated.  But then they can reformat the file as much as
           they want without further recompilation.

           The opposite is also true: Just omitting this option after it was set and recompiling
           will not fix your line numbers.  So, if line numbers matter, don't do a production
           build in the same sandbox without cleaning first.

       md5 This is the default method, for files not recognized by the "C" method.  Computes an
           MD5 checksum of the file's contents, rather than looking at the file's date or size.
           This means that if you change the date on the file but don't change its contents,
           makepp won't try to rebuild anything that depends on it.

           This is particularly useful if you have some file which is often regenerated during
           the build process that other files depend on, but which usually doesn't actually
           change.  If you use the "md5" signature checking method, makepp will realize that the
           file's contents haven't changed even if the file's date has changed.  (Of course, this
           won't help if the files have a timestamp written inside of them, as archive files do
           for example.)
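
           The effect can be sketched in a few lines of Python.  This is an illustration of a
           content checksum in general, not makepp's own code; the function name
           "md5_signature" and the chunk size are choices made for this example.

```python
import hashlib


def md5_signature(path, chunk_size=65536):
    """Return the MD5 checksum of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

           Changing only the file's date (for example with "touch") leaves this value
           untouched, whereas a signature built from date and size would change.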

       shared_object
           This method only works if you have the utility "nm" in your path, and it accepts the
           "-P" option to output Posix format.  In that case only the names and types of symbols
           in dynamically loaded libraries become part of their signature.  The result is that
           you can change the coding of functions without having to relink the programs that use
           them.

           In the following command the parser will detect an implicit dependency on
           $(LIBDIR)/libmylib.so, and build it if necessary.  However the link command will only
           be reperformed whenever the library exports a different set of symbols:

               myprog: $(OBJECTS) :signature shared_object
                   $(LD) -L$(LIBDIR) -lmylib $(inputs) -o $(output)

           This works as long as the functions' interfaces don't change.  If an interface does
           change, you would change the declaration as well, so you'd also need to change the
           callers.

           Note that this method only applies to files whose name looks like a shared library.
           For all other files it falls back to "c_compilation_md5", which may in turn fall back
           to others.
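
           The idea of signing only names and types can be sketched in Python as follows.
           The function name "symbol_signature" is hypothetical, and the sketch assumes that
           its argument is text in the Posix "nm -P" layout (name, type, then value and size
           columns); how makepp itself invokes "nm" and hashes the result is not shown here.

```python
import hashlib


def symbol_signature(nm_output):
    """Hash only the name and type columns of `nm -P` style output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            symbols.add((fields[0], fields[1]))   # drop the value and size columns
    text = "\n".join(f"{name} {kind}" for name, kind in sorted(symbols))
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```

           Because addresses and sizes are dropped and the symbols are sorted, recompiling a
           function body leaves the result unchanged; adding or removing an exported symbol
           changes it.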

       xml
       xml_space
           These are two similar methods which treat xml canonically and differ only in their
           handling of whitespace.  The first completely ignores it around tags and considers it
           like a single space elsewhere, making the signature immune to formatting changes.  The
           second respects any whitespace in the xml, which is necessary even if just a small
           part requires that, like a "<pre>" section in an xhtml document.

           Common to both methods is that they sign the essence of each xml document.  The
           presence or absence of a BOM or "<?xml?>" header is ignored.  Comments are ignored,
           as is whether text is protected as "CDATA" or with entities.  The order and quoting
           style of attributes don't matter, nor does how you render empty tags.
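
           A minimal Python sketch of such a canonical signature follows.  It uses Python's
           standard "xml.etree.ElementTree" parser rather than the Perl parsers makepp relies
           on, and the names "xml_signature" and "keep_space" are inventions of this example;
           "keep_space=True" stands in for the "xml_space" behaviour.

```python
import hashlib
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def _canonical(elem, keep_space):
    def norm(text):
        text = text or ""
        # Without keep_space: drop whitespace next to tags, collapse the rest.
        return text if keep_space else " ".join(text.split())
    attrs = "".join(f" {name}={quoteattr(value)}"
                    for name, value in sorted(elem.attrib.items()))
    body = escape(norm(elem.text)) + "".join(
        _canonical(child, keep_space) + escape(norm(child.tail)) for child in elem)
    return f"<{elem.tag}{attrs}>{body}</{elem.tag}>"


def xml_signature(data, keep_space=False):
    """Sign the parsed content of an XML document given as bytes."""
    try:
        # The parser drops comments and the header, and merges CDATA and entities.
        root = ET.fromstring(data)
    except ET.ParseError:
        return hashlib.md5(data).hexdigest()   # not valid XML: plain content checksum
    return hashlib.md5(_canonical(root, keep_space).encode("utf-8")).hexdigest()
```

           Sorting the attributes and re-serializing every element the same way is what makes
           attribute order, quoting style and the spelling of empty tags irrelevant.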

           For any file which is not valid xml, or if neither the Expat based "XML::Parser" nor
           the "XML::LibXML" parser is installed, this falls back to method md5.  If you switch
           your Perl installation from one of the parsers to the other, makepp will think the
           files are different as soon as their timestamp changes.  This is because the results
           of the two parsers are logically equivalent, but they produce different signatures.
           In the unlikely case that this is a problem, you can force use of only "XML::LibXML"
           by setting in Perl:

               $Mpp::Signature::xml::libxml = 1;

   Extending applicability
       The "C" or "c_compilation_md5" method has a built in list of suffixes it recognizes as
       being C or C-like.  If it gets applied to other files it falls back to simpler signature
       methods.  But many file types are syntactically close enough to C++ for this method to be
       useful.  Close enough means that they use C++ comment and string syntax, and that
       whitespace is meaningless except for one space between words (and in C++'s problem cases
       "- -", "+ +", "/ *" and "< <").

       It (and its subclasses) can easily be extended to other suffixes.  Anyplace you can
       specify a signature you can tack on one of these syntaxes to make the method accept
       additional filenames:

       C.suffix1,suffix2,suffix3
           One or more comma-separated suffixes can be appended to the method name after a dot.
           For example "C.ipp,tpp" means that besides the built in suffixes it will also apply
           to files ending in .ipp or .tpp, which you might be using for the inline and template
           part of C++ headers.

       C.(suffix-regexp)
           This is like the previous, but instead of enumerating suffixes, you give a Perl
           regular expression to match the ones you want.  The previous example would be
           "C.(ipp|tpp)" or "C.([it]pp)" in this syntax.

       C(regexp)
           Without a dot the Perl regular expression can match anywhere in the file name.  If it
           includes a slash, it will be tried against the fully qualified filename, otherwise
           only against the last part, without any directory.  So if you have C++ style
           suffixless headers in a directory named include, use "C(include/)" as your signature
           method.  However, the above suffix example would be quite nasty this way:
           "C(\.(?:ipp|tpp)$$)" or "C(\.[it]pp$$)", because "$" is the expansion character in
           makefiles and must be doubled.
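
       The matching rules of the three syntaxes can be summarized with a small Python sketch.
       It models only which additional filenames a specification accepts; the function name
       "extra_files_matcher" is hypothetical, Python regular expressions stand in for Perl
       ones, and the specification is written with a single "$" because no makefile expansion
       is involved here.

```python
import os
import re


def extra_files_matcher(spec):
    """Return a predicate telling whether a spec's extension accepts a path."""
    m = re.fullmatch(r"\w+\.\((.*)\)", spec)          # C.(suffix-regexp)
    if m:
        pattern, use_full_path = re.compile(r"\.(?:" + m.group(1) + r")$"), False
    else:
        m = re.fullmatch(r"\w+\((.*)\)", spec)        # C(regexp)
        if m:
            # A slash in the expression means: try it against the whole path.
            pattern, use_full_path = re.compile(m.group(1)), "/" in m.group(1)
        else:
            m = re.fullmatch(r"\w+\.(.+)", spec)      # C.suffix1,suffix2,suffix3
            if not m:
                return lambda path: False             # plain method name: no extras
            alternatives = "|".join(re.escape(s) for s in m.group(1).split(","))
            pattern, use_full_path = re.compile(r"\.(?:" + alternatives + r")$"), False

    def matches(path):
        target = path if use_full_path else os.path.basename(path)
        return bool(pattern.search(target))
    return matches
```

       For instance, "extra_files_matcher("C.ipp,tpp")" accepts "src/foo.ipp" but not
       "src/foo.cpp", and "extra_files_matcher("C(include/)")" accepts any path that contains
       an "include/" directory component.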

   Shortcomings
       Signature methods apply to all files of a rule.  So if you have a compiler that takes a
       C-like source file and an XML configuration file, you either need a combined signature
       method that smartly handles both file types, or you must choose an existing method, which
       will not know whether a change in the other file is significant.

       In the future, signature method configuration may be changed to be based on filename
       patterns, optionally per command.

   Custom methods
       You can, if you want, define your own methods for calculating file signatures and
       comparing them.  You will need to write a Perl module to do this.  Have a look at the
       comments in "Mpp/Signature.pm" in the distribution, and also at the existing signature
       algorithms in "Mpp/Signature/*.pm" for details.

       Here are some cases where you might want a custom signature method:

       •   When you want all changes in a file to be ignored.  Say you always want dateStamp.o to
           be a dependency (to force a rebuild), but you don't want to rebuild if only
           dateStamp.o has changed.  You could define a signature method that inherits from
           "c_compilation_md5" that recognizes the dateStamp.o file by its name, and always
           returns a constant value for that file.

       •   When you want to ignore part of a file.  Suppose that you have a program that
           generates a file that has a date stamp in it, but you don't want to recompile if only
           the date stamp has changed.  Just define a signature method similar to
           "c_compilation_md5" that understands your file format and skips the parts you don't
           want to take into account.
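
       The logic of both cases can be sketched in Python as shown below.  A real custom method
       has to be a Perl module as described above, so this only illustrates the decisions such
       a module would make; the function name "custom_signature" and the date stamp format (a
       whole line starting with "// Generated: ") are assumptions of this example.

```python
import hashlib
import os
import re

# Hypothetical stamp format: a whole line such as "// Generated: 2024-01-31 12:00:00".
STAMP_LINE = re.compile(rb"^// Generated: .*\n?", re.M)


def custom_signature(path):
    """Ignore dateStamp.o entirely, and date stamp lines in every other file."""
    if os.path.basename(path) == "dateStamp.o":
        return "constant"                  # every change to this file is ignored
    with open(path, "rb") as fh:
        data = fh.read()
    # Sign the contents with the stamp lines removed.
    return hashlib.md5(STAMP_LINE.sub(b"", data)).hexdigest()
```

       Two generated files that differ only in their stamp line then share a signature, so a
       regeneration that changes nothing else would not count as a change.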