Provided by: libperl-critic-perl_1.152-1_all

NAME

       Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes - Split long regexps into smaller "qr//"
       chunks.

AFFILIATION

       This Policy is part of the core Perl::Critic distribution.

DESCRIPTION

       Big regexps are hard to read, perhaps even the hardest part of Perl.  A good practice is to write
       digestible chunks of regexp and put them together.  This policy flags any regexp that is longer than "N"
       characters, where "N" is a configurable value that defaults to 60.  If the regexp uses the "x" flag, then
       the length is computed after stripping out any comments and whitespace.

       Unfortunately, descriptive (and therefore longish) variable names can push a regexp over the limit, so
       each interpolated variable is counted as 4 characters no matter how long its name actually is.
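
       As a rough illustration of the practice described above, the sketch below writes one pattern twice: once
       as a single long literal, and once assembled from short "qr//" pieces.  The timestamp pattern and the
       variable names ($flagged_re, $date_re, $time_re, $offset_re, $stamp_re) are invented for this example and
       do not come from Perl::Critic.  The single literal is 84 characters long, so it would be expected to
       exceed the default limit of 60, while each piece of the second form is well under it.

```perl
use strict;
use warnings;

# One long literal (84 characters between the delimiters): the kind of
# regexp this policy is meant to flag under the default limit of 60.
my $flagged_re = qr/\A[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(?:Z|[+-][0-9]{2}:[0-9]{2})\z/;

# The same pattern assembled from short, named pieces.
my $date_re   = qr/[0-9]{4}-[0-9]{2}-[0-9]{2}/;        # e.g. 2024-01-31
my $time_re   = qr/[0-9]{2}:[0-9]{2}:[0-9]{2}/;        # e.g. 12:00:00
my $offset_re = qr/(?:Z|[+-][0-9]{2}:[0-9]{2})/;       # e.g. Z or +09:00

# ${date_re} needs braces here so the following "T" is not read as part
# of the variable name.
my $stamp_re  = qr/\A${date_re}T$time_re$offset_re\z/;

# Both forms should accept and reject the same inputs.
for my $candidate ('2024-01-31T12:00:00Z', '2024-01-31 12:00:00Z') {
    my $long_form  = $candidate =~ $flagged_re ? 'match' : 'no match';
    my $split_form = $candidate =~ $stamp_re   ? 'match' : 'no match';
    print "$candidate: long form $long_form, split form $split_form\n";
}
```

       Because each interpolated variable is counted as 4 characters, the final $stamp_re stays short however
       descriptive the names of its pieces are.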

CASE STUDY

       As an example, look at the regexp used to match email addresses in Email::Valid::Loose (tweaked lightly
       to wrap for POD):

           (?x-ism:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]
           \000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015
           "]*)*")(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[
           \]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n
           \015"]*)*")|\.)*\@(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,
           ;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]
           )(?:\.(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000
           -\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*)

       which is constructed from the following code:

           my $esc         = '\\\\';
           my $period      = '\.';
           my $space       = '\040';
           my $open_br     = '\[';
           my $close_br    = '\]';
           my $nonASCII    = '\x80-\xff';
           my $ctrl        = '\000-\037';
           my $cr_list     = '\n\015';
           my $qtext       = qq/[^$esc$nonASCII$cr_list\"]/; # "
           my $dtext       = qq/[^$esc$nonASCII$cr_list$open_br$close_br]/;
           my $quoted_pair = qq<$esc>.qq<[^$nonASCII]>;
           my $atom_char   = qq/[^($space)<>\@,;:\".$esc$open_br$close_br$ctrl$nonASCII]/;# "
           my $atom        = qq<$atom_char+(?!$atom_char)>;
           my $quoted_str  = qq<\"$qtext*(?:$quoted_pair$qtext*)*\">; # "
           my $word        = qq<(?:$atom|$quoted_str)>;
           my $domain_ref  = $atom;
           my $domain_lit  = qq<$open_br(?:$dtext|$quoted_pair)*$close_br>;
           my $sub_domain  = qq<(?:$domain_ref|$domain_lit)>;
           my $domain      = qq<$sub_domain(?:$period$sub_domain)*>;
           my $local_part  = qq<$word(?:$word|$period)*>; # This part is modified
           $Addr_spec_re   = qr<$local_part\@$domain>;

       If you read the code from bottom to top, it is quite readable.  You can even see the one violation of
       RFC 822 that Tatsuhiko Miyagawa deliberately put into Email::Valid::Loose to allow periods.  Look for
       the "|\." in the upper regexp to see that same deviation.

       One could certainly argue that the top regexp could be re-written more legibly with "m//x" and comments.
       But the bottom version is self-documenting and, for example, doesn't repeat "\x80-\xff" 18 times.
       Furthermore, it's much easier to compare the second version against the source BNF grammar in RFC 822 to
       judge whether the implementation is sound even before running tests.

CONFIGURATION

       This policy allows regexps up to "N" characters long, where "N" defaults to 60.  You can change the limit
       with the "max_characters" setting.  To do this, put an entry like the following in a .perlcriticrc file:

           [RegularExpressions::ProhibitComplexRegexes]
           max_characters = 40

CREDITS

       Initial development of this policy was supported by a grant from the Perl Foundation.

AUTHOR

       Chris Dolan <cdolan@cpan.org>

       Copyright (c) 2007-2023 Chris Dolan

       This program is free software; you can redistribute it and/or modify it under the same terms as Perl
       itself.  The full text of this license can be found in the LICENSE file included with this module.

perl v5.36.0                               Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes(3pm)