Provided by: libpcre2-dev_10.42-1_amd64

NAME

       PCRE2 - Perl-compatible regular expressions (revised API)

SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS


       int32_t pcre2_serialize_decode(pcre2_code **codes,
         int32_t number_of_codes, const uint8_t *bytes,
         pcre2_general_context *gcontext);

       int32_t pcre2_serialize_encode(const pcre2_code **codes,
         int32_t number_of_codes, uint8_t **serialized_bytes,
         PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);

       void pcre2_serialize_free(uint8_t *bytes);

       int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);

       If you are running an application that uses a large number of regular expression patterns,
       it may be useful to store them in a precompiled form instead of  having  to  compile  them
       every time the application is run. However, if you are using the just-in-time optimization
       feature, it is not possible to save and reload the  JIT  data,  because  it  is  position-
       dependent. The host on which the patterns are reloaded must be running the same version of
       PCRE2, with the same code unit width, and must also  have  the  same  endianness,  pointer
       width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using PCRE2's
       16-bit library cannot be reloaded on a 64-bit system, nor can they be reloaded  using  the
       8-bit library.

       Note  that  "serialization"  in  PCRE2  does  not convert compiled patterns to an abstract
       format like Java or .NET serialization. The serialized output is really  just  a  bytecode
       dump, which is why it can only be reloaded in the same environment as the one that created
       it. Hence the restrictions mentioned above.  Applications that are not  statically  linked
       with  a  fixed version of PCRE2 must be prepared to recompile patterns from their sources,
       in order to be immune to PCRE2 upgrades.
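
       One way an application might respect these restrictions is to store a small environment
       tag next to the serialized data and to recompile from source whenever the tag differs.
       The following is only a sketch: struct env_tag, env_tag_current() and env_tag_matches()
       are hypothetical names that are not part of PCRE2, and the code assumes that PCRE2_SIZE
       is size_t (its default definition). The version string is passed in by the caller; a real
       program could obtain it from pcre2_config() with PCRE2_CONFIG_VERSION.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag describing the environment that produced a byte stream. */
struct env_tag {
    uint8_t little_endian;   /* 1 on little-endian hosts, 0 otherwise */
    uint8_t pointer_bytes;   /* sizeof(void *) */
    uint8_t size_bytes;      /* sizeof(size_t); assumed to equal PCRE2_SIZE */
    uint8_t code_unit_bits;  /* 8, 16 or 32, supplied by the caller */
    char    version[32];     /* PCRE2 version string, supplied by the caller */
};

/* Fill in a tag for the running process. */
static struct env_tag env_tag_current(unsigned code_unit_bits,
                                      const char *pcre2_version)
{
    struct env_tag tag;
    const uint16_t probe = 1;

    memset(&tag, 0, sizeof tag);
    tag.little_endian = *(const uint8_t *)&probe;  /* is the low byte first? */
    tag.pointer_bytes = (uint8_t)sizeof(void *);
    tag.size_bytes = (uint8_t)sizeof(size_t);
    tag.code_unit_bits = (uint8_t)code_unit_bits;
    strncpy(tag.version, pcre2_version, sizeof tag.version - 1);
    return tag;
}

/* Non-zero when a stored tag matches the current environment. */
static int env_tag_matches(const struct env_tag *stored,
                           const struct env_tag *current)
{
    return memcmp(stored, current, sizeof *stored) == 0;
}
```

       A mismatch would be treated as a cache miss: the application discards the saved byte
       stream and compiles its patterns again.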

SECURITY CONCERNS


       The facility for saving and  restoring  compiled  patterns  is  intended  for  use  within
       individual  applications.  As  such,  the  data  supplied  to  pcre2_serialize_decode() is
       expected to be trusted data, not data from arbitrary external sources. There is only  some
       simple consistency checking, not complete validation of what is being re-loaded. Corrupted
       data may cause undefined results. For example, if the length field of  a  pattern  in  the
       serialized  data  is corrupted, the deserializing code may read beyond the end of the byte
       stream that is passed to it.
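
       An application that wants to catch accidental corruption of its own saved data could
       record a checksum when saving and compare it before decoding. The sketch below uses a
       64-bit FNV-1a hash; blob_checksum() and blob_is_intact() are hypothetical helper names,
       not PCRE2 functions. A check of this kind does not make untrusted data safe to decode,
       because anyone who can alter the data can also recompute the checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over a byte buffer (hypothetical helper, not PCRE2). */
static uint64_t blob_checksum(const uint8_t *bytes, size_t length)
{
    uint64_t hash = UINT64_C(14695981039346656037);  /* FNV offset basis */
    for (size_t i = 0; i < length; i++) {
        hash ^= bytes[i];
        hash *= UINT64_C(1099511628211);             /* FNV prime */
    }
    return hash;
}

/* Non-zero when the buffer still hashes to the value recorded at save time. */
static int blob_is_intact(const uint8_t *bytes, size_t length,
                          uint64_t expected)
{
    return blob_checksum(bytes, length) == expected;
}
```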

SAVING COMPILED PATTERNS


       Before compiled patterns can be saved they  must  be  serialized,  which  in  PCRE2  means
       converting  the  pattern to a stream of bytes. A single byte stream may contain any number
       of compiled patterns, but they must all use the same character tables. A  single  copy  of
       the  tables  is  included in the byte stream (its size is 1088 bytes). For more details of
       character tables, see the section on locale support in the pcre2api documentation.

       The function pcre2_serialize_encode() creates a serialized byte  stream  from  a  list  of
       compiled  patterns.  Its first two arguments specify the list, being a pointer to a vector
       of pointers to compiled patterns, and the length of  the  vector.  The  third  and  fourth
       arguments  point  to  variables  which are set to point to the created byte stream and its
       length, respectively. The final argument is a pointer to a general context, which  can  be
       used to specify custom memory management functions. If this argument is NULL, malloc() is
       used to obtain memory for the byte stream. The yield of the  function  is  the  number  of
       serialized patterns, or one of the following negative error codes:

         PCRE2_ERROR_BADDATA      the number of patterns is zero or less
         PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
         PCRE2_ERROR_NOMEMORY     memory allocation failed
         PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
         PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL

       PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or that a slot
       in the vector does not point to a compiled pattern.

       Once a set of patterns has been serialized you  can  save  the  data  in  any  appropriate
       manner.  Here  is  sample  code  that  compiles two patterns and writes them to a file. It
       assumes that the variable fd refers to a file that is open for output. The error  checking
       that should be present in a real application has been omitted for simplicity.

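         int errorcode;
         uint8_t *bytes;
         size_t written;
         PCRE2_SIZE erroroffset;
         PCRE2_SIZE bytescount;
         pcre2_code *list_of_codes[2];

         /* Compile the two patterns (the 8-bit library is assumed). */
         list_of_codes[0] = pcre2_compile((PCRE2_SPTR)"first pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
         list_of_codes[1] = pcre2_compile((PCRE2_SPTR)"second pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);

         /* Serialize both patterns into a single byte stream. */
         errorcode = pcre2_serialize_encode(
           (const pcre2_code **)list_of_codes, 2, &bytes, &bytescount, NULL);

         /* fwrite() yields the number of bytes actually written. */
         written = fwrite(bytes, 1, bytescount, fd);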

       Note that the serialized data is binary data that may contain any of the 256 possible byte
       values. On systems that make a distinction between binary and  non-binary  data,  be  sure
       that the file is opened for binary output.
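
       A minimal sketch of writing the byte stream in binary mode follows. The helper name
       save_blob() and its return convention are hypothetical; only standard C I/O is used.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a serialized byte stream to a file.
   Returns 0 on success and -1 on any I/O failure. */
static int save_blob(const char *path, const uint8_t *bytes, size_t length)
{
    FILE *fp = fopen(path, "wb");   /* "b" requests binary output */
    if (fp == NULL)
        return -1;

    size_t written = fwrite(bytes, 1, length, fp);
    int close_failed = (fclose(fp) != 0);

    return (written == length && !close_failed) ? 0 : -1;
}
```

       The result of fclose() is checked as well, because buffered data may not reach the file
       until the stream is closed.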

       Serializing a set of patterns leaves the original compiled patterns untouched, so they can
       still be used for matching. Their memory must eventually be freed in the usual way by
       calling pcre2_code_free(). When you have finished with the byte stream, it too must be
       freed by calling pcre2_serialize_free(). If this function is called with a NULL argument,
       it returns immediately without doing anything.

RE-USING PRECOMPILED PATTERNS


       In order to re-use a set of saved patterns you must first make the serialized byte stream
       available in main memory (for example, by reading from a file). The management of this
       memory block is up to the application. You can use the
       pcre2_serialize_get_number_of_codes() function to find out how many compiled patterns are
       in the serialized data without actually decoding the patterns:

         uint8_t *bytes = <serialized data>;
         int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
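
       The <serialized data> placeholder could be filled by reading a trusted file into memory,
       as in the sketch below. The helper name load_blob() and the initial 4096-byte buffer size
       are hypothetical choices; only standard C I/O is used, and the caller owns the returned
       buffer.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc()ed buffer.
   Returns NULL on failure; on success the caller must free() the result. */
static uint8_t *load_blob(const char *path, size_t *length)
{
    FILE *fp = fopen(path, "rb");   /* "b" requests binary input */
    if (fp == NULL)
        return NULL;

    uint8_t *buffer = NULL;
    size_t used = 0, capacity = 0;
    for (;;) {
        if (used == capacity) {     /* buffer full: double its size */
            size_t grown = capacity ? capacity * 2 : 4096;
            uint8_t *tmp = realloc(buffer, grown);
            if (tmp == NULL) { free(buffer); fclose(fp); return NULL; }
            buffer = tmp;
            capacity = grown;
        }
        size_t got = fread(buffer + used, 1, capacity - used, fp);
        used += got;
        if (got == 0)
            break;                  /* end of file or read error */
    }
    int failed = ferror(fp);
    fclose(fp);
    if (failed) { free(buffer); return NULL; }
    *length = used;
    return buffer;
}
```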

       The  pcre2_serialize_decode()  function  reads  a  byte  stream and recreates the compiled
       patterns in new memory blocks, setting pointers  to  them  in  a  vector.  The  first  two
       arguments are a pointer to a suitable vector and its length, and the third argument points
       to a byte stream. The final argument is a pointer to a general context, which can be  used
       to specify custom memory management functions for the decoded patterns. If this argument
       is NULL, malloc() and free() are used. After deserialization, the byte stream is no longer
       needed and can be discarded.

         pcre2_code *list_of_codes[2];
         uint8_t *bytes = <serialized data>;
         int32_t number_of_codes =
           pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);

       If  the  vector  is not large enough for all the patterns in the byte stream, it is filled
       with those that fit, and the remainder are ignored. The  yield  of  the  function  is  the
       number of decoded patterns, or one of the following negative error codes:

         PCRE2_ERROR_BADDATA            second argument is zero or less
         PCRE2_ERROR_BADMAGIC           mismatch of id bytes in the data
         PCRE2_ERROR_BADMODE            mismatch of code unit size or PCRE2 version
         PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
         PCRE2_ERROR_NOMEMORY           memory allocation failed
         PCRE2_ERROR_NULL               first or third argument is NULL

       PCRE2_ERROR_BADMAGIC  may  mean  that  the  data  is corrupt, or that it was compiled on a
       system with different endianness.

       Decoded patterns can be used for matching in the usual way, and must be freed  by  calling
       pcre2_code_free(). However, be aware that there is a potential race issue if you are using
       multiple patterns that  were  decoded  from  a  single  byte  stream  in  a  multithreaded
       application. A single copy of the character tables is used by all the decoded patterns and
       a reference count is used to arrange for its memory to be  automatically  freed  when  the
       last  pattern is freed, but there is no locking on this reference count. Therefore, if you
       want to call pcre2_code_free() for these patterns in different threads, you  must  arrange
       your own locking, and ensure that pcre2_code_free() cannot be called by two threads at the
       same time.

       If a pattern was processed by pcre2_jit_compile() before being serialized, the JIT data is
       discarded  and  so  is  no  longer available after a save/restore cycle. You can, however,
       process a restored pattern with pcre2_jit_compile() if you wish.

AUTHOR


       Philip Hazel
       University Computing Service
       Cambridge, England.

REVISION


       Last updated: 27 June 2018
       Copyright (c) 1997-2018 University of Cambridge.