Provided by: spamassassin_4.0.0-7ubuntu1_all

NAME

       Mail::SpamAssassin::PerMsgStatus - per-message status (spam or not-spam)

SYNOPSIS

         my $spamtest = Mail::SpamAssassin->new({
           'rules_filename'      => '/etc/spamassassin.rules',
           'userprefs_filename'  => $ENV{HOME}.'/.spamassassin/user_prefs'
         });
         my $mail = $spamtest->parse();

         my $status = $spamtest->check ($mail);

         my $rewritten_mail;
         if ($status->is_spam()) {
           $rewritten_mail = $status->rewrite_mail ();
         }
         ...

DESCRIPTION

       The Mail::SpamAssassin "check()" method returns an object of this class.  This object
       encapsulates all the per-message state.

METHODS

       $status->check ()
           Runs the SpamAssassin rules against the message pointed to by the object.

       $status->learn()
           After a mail message has been checked, this method can be called.  If the score is
           outside a certain range around the threshold, i.e. if the message is judged more or
           less definitely spam or definitely non-spam, it will be fed into SpamAssassin's
           learning systems (currently the naive Bayesian classifier), so that future similar
           mails will be caught.

       $score = $status->get_autolearn_points()
           Return the message's score as computed for auto-learning.  Certain tests are ignored:

             - rules with tflags set to 'learn' (the Bayesian rules)

             - rules with tflags set to 'userconf' (user welcome/block-listing rules, etc)

             - rules with tflags set to 'noautolearn'

           Also note that auto-learning occurs using scores from either scoreset 0 or 1,
           depending on what scoreset is used during message check.  It is likely that the
           message check and auto-learn scores will be different.
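
           The exclusion rule above can be pictured with a small stand-alone sketch.  It is not
           the module's implementation: the rule names, scores and tflags in %hits are invented
           for illustration, and autolearn_points_sketch() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical data: rule name => [ score, tflags string ].
# These names and numbers are made up; they are not real SpamAssassin output.
my %hits = (
    EXAMPLE_BAYES    => [ 3.5,  'learn' ],
    EXAMPLE_USERLIST => [ -100, 'userconf nice' ],
    EXAMPLE_HEADER   => [ 1.4,  '' ],
    EXAMPLE_NOLEARN  => [ 0.1,  'noautolearn' ],
    EXAMPLE_BODY     => [ 2.0,  '' ],
);

# Sum the scores, skipping rules whose tflags include an ignored flag.
sub autolearn_points_sketch {
    my ($hits) = @_;
    my %ignored = map { $_ => 1 } qw(learn userconf noautolearn);
    my $total = 0;
    for my $rule (sort keys %$hits) {
        my ($score, $tflags) = @{ $hits->{$rule} };
        next if grep { $ignored{$_} } split ' ', $tflags;
        $total += $score;
    }
    return $total;
}

my $points = autolearn_points_sketch(\%hits);
printf "autolearn points: %.1f\n", $points;
```

           Only EXAMPLE_HEADER and EXAMPLE_BODY contribute here, which is why the auto-learn
           score can differ from the score used for the spam decision.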

       $score = $status->get_head_only_points()
           Return the message's score as computed for auto-learning, ignoring all rules except
           for header-based ones.

       $score = $status->get_learned_points()
           Return the message's score as computed for auto-learning, ignoring all rules except
           for learning-based ones.

       $score = $status->get_body_only_points()
           Return the message's score as computed for auto-learning, ignoring all rules except
           for body-based ones.

       $score = $status->get_autolearn_force_status()
           Return whether a message's score included any rules that are flagged as
           autolearn_force.

       $rule_names = $status->get_autolearn_force_names()
           Return a comma-separated list of rule names if a message's score included any rules
           that are flagged as autolearn_force.

       $isspam = $status->is_spam ()
           After a mail message has been checked, this method can be called.  It will return 1
           for mail determined likely to be spam, 0 if it does not seem spam-like.

       $list = $status->get_names_of_tests_hit ()
           After a mail message has been checked, this method can be called. It will return a
           comma-separated string, listing all the symbolic test names of the tests which were
           triggered by the mail.

       $list = $status->get_names_of_tests_hit_with_scores_hash ()
           After a mail message has been checked, this method can be called. It will return a
           reference to a hash of rule and score pairs, covering all the symbolic test names and
           individual scores of the tests which were triggered by the mail.

       $list = $status->get_names_of_tests_hit_with_scores ()
           After a mail message has been checked, this method can be called. It will return a
           comma-separated string of rule=score pairs for all the symbolic test names and
           individual scores of the tests which were triggered by the mail.
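
           When only the comma-separated string is available (for example from a log), it can be
           split back into a hash.  The following is a minimal sketch; parse_rule_scores() is a
           hypothetical helper and the rule names in the sample string are made up.  In code
           that holds a $status object, get_names_of_tests_hit_with_scores_hash() already
           provides the hash form.

```perl
use strict;
use warnings;

# Turn "RULE_A=1.5,RULE_B=-0.1" into a hash of rule name => numeric score.
sub parse_rule_scores {
    my ($list) = @_;
    my %score_of;
    for my $pair (split /,/, $list // '') {
        my ($rule, $score) = split /=/, $pair, 2;
        next unless defined $score && length $rule;   # skip malformed entries
        $score_of{$rule} = $score + 0;                # force numeric value
    }
    return \%score_of;
}

# Sample string in the documented rule=score format; the names are invented.
my $sample = 'EXAMPLE_RULE_A=1.5,EXAMPLE_RULE_B=-0.1,EXAMPLE_RULE_C=2';
my $scores = parse_rule_scores($sample);
print "$_ => $scores->{$_}\n" for sort keys %$scores;
```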

       $list = $status->get_names_of_subtests_hit ()
           After a mail message has been checked, this method can be called.  It will return a
           comma-separated string, listing all the symbolic test names of the meta-rule sub-tests
           which were triggered by the mail.  Sub-tests are the normally-hidden rules, which
           score 0 and have names beginning with two underscores, used in meta rules.

           If a parameter of collapsed or dbg is passed, the output will be a condensed array of
           sub-tests with multiple hits reduced to one entry.

           If the parameter of dbg is passed, the output will be a condensed string of sub-tests
           with multiple hits reduced to one entry with the number of hits in parentheses. Some
           information is also added at the end regarding the multiple hits.
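
           The idea of reducing multiple hits to one entry can be sketched as follows.  This is
           an illustration only: collapse_hits() is a hypothetical helper, the sub-test names
           are invented, and the exact output format produced by the real method is an
           assumption here.

```perl
use strict;
use warnings;

# Reduce repeated names to one entry each, keeping first-seen order and
# appending the hit count in parentheses when a name occurs more than once.
sub collapse_hits {
    my (@names) = @_;
    my (%count, @order);
    for my $name (@names) {
        push @order, $name unless $count{$name}++;
    }
    return join ',', map { $count{$_} > 1 ? "$_($count{$_})" : $_ } @order;
}

my @subtests  = qw(__EXAMPLE_A __EXAMPLE_B __EXAMPLE_A __EXAMPLE_A);
my $collapsed = collapse_hits(@subtests);
print "$collapsed\n";
```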

       $num = $status->get_score ()
           After a mail message has been checked, this method can be called.  It will return the
           message's score.

       $num = $status->get_required_score ()
           After a mail message has been checked, this method can be called.  It will return the
           score required for a mail to be considered spam.

       $num = $status->get_autolearn_status ()
           After a mail message has been checked, this method can be called.  It will return one
           of the following strings depending on whether the mail was auto-learned or not: "ham",
           "no", "spam", "disabled", "failed", "unavailable".

           If the message is flagged with autolearn_force, the returned string will also include
           that status and the rules hit.  For example: "autolearn_force=yes (AUTOLEARNTEST_BODY)"

       $report = $status->get_report ()
           Deliver a "spam report" on the checked mail message.  This contains details of how
           many spam detection rules it triggered.

           The report is returned as a multi-line string, with the lines separated by "\n"
           characters.

       $preview = $status->get_content_preview ()
           Give a "preview" of the content.

           This is returned as a multi-line string, with the lines separated by "\n" characters,
           containing a fully-decoded, safe, plain-text sample of the first few lines of the
           message body.

       $msg = $status->get_message()
           Return the object representing the message being scanned.

       $status->rewrite_mail ()
           Rewrite the mail message.  This will at minimum add headers, and at maximum MIME-
           encapsulate the message text, to reflect its spam or not-spam status.  The function
           will return a scalar of the rewritten message.

           The actual modifications depend on the configuration (see "Mail::SpamAssassin::Conf"
           for more information).

           The possible modifications are as follows:

           To:, From: and Subject: modification on spam mails
               Depending on the configuration, the To: and From: lines can have a user-defined
               RFC 2822 comment appended for spam mail. The subject line may have a user-defined
               string prepended to it for spam mail.

           X-Spam-* headers for all mails
               Depending on the configuration, zero or more headers with names beginning with
               "X-Spam-" will be added to mail depending on whether it is spam or ham.

           spam message with report_safe
               If report_safe is set to true (1), then spam messages are encapsulated into their
               own message/rfc822 MIME attachment without any modifications being made.

               If report_safe is set to false (0), then the message will only have the above
               headers added/modified.

       $status->action_depends_on_tags($tags, $code, @args)
           Enqueue the supplied subroutine reference $code, to become runnable when all the
           specified tags become available. The $tags may be a simple scalar - a tag name, or a
           listref of tag names. The subroutine &$code, when called, will be passed a
           PerMsgStatus object as its first argument, followed by the supplied (optional)
           list @args.

       $status->set_tag($tagname, $value)
           Set a template tag, as used in "add_header", report templates, etc.  This API is
           intended for use by plugins.  Tag names will be converted to an all-uppercase
           representation internally.  Tag names must consist only of [A-Z0-9_] characters and
           must not contain consecutive underscores.  Also the name must not start or end in an
           underscore, as that is the template tagging format.
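
           A plugin may want to check a candidate name against these rules before calling
           set_tag.  The sketch below is one way to do that; normalize_tag_name() is a
           hypothetical helper, and upper-casing before validating is an assumption about the
           order of operations.

```perl
use strict;
use warnings;

# Return the upper-cased tag name if it satisfies the naming rules, else undef.
sub normalize_tag_name {
    my ($name) = @_;
    return undef unless defined $name;
    my $tag = uc $name;
    return undef unless $tag =~ /\A[A-Z0-9_]+\z/;   # allowed characters only
    return undef if $tag =~ /__/;                   # no consecutive underscores
    return undef if $tag =~ /\A_|_\z/;              # no leading or trailing underscore
    return $tag;
}

for my $candidate (qw(foo MY_TAG _BAD BAD_ A__B has-dash)) {
    my $tag = normalize_tag_name($candidate);
    printf "%-9s => %s\n", $candidate, defined $tag ? $tag : 'rejected';
}
```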

           $value can be a simple scalar (string or number), or a reference to an array, in which
           case the public method get_tag will join array elements using a space as a separator,
           returning a single string for backward compatibility.

           $value can also be a subroutine reference, which will be evaluated each time the
           template is expanded. The first argument passed by get_tag to a called subroutine will
           be a PerMsgStatus object (this module's object), followed by optional arguments
           provided by a caller to get_tag.

           Note that Perl supports closures, which means that variables set in the caller's scope
           can be accessed inside this "sub". For example:

               my $text = "hello world!";
               $status->set_tag("FOO", sub {
                         my $pms = shift;
                         return $text;
                       });

           See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" and "CAPTURING TAGS USING REGEX NAMED
           CAPTURE GROUPS" sections for more details on how template tags are used.

       $string = $status->get_tag($tagname)
           Get the current value of a template tag, as used in "add_header", report templates,
           etc. This API is intended for use by plugins.  Tag names will be converted to an all-
           uppercase representation internally.

           See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" and "CAPTURING TAGS USING REGEX NAMED
           CAPTURE GROUPS" sections for more details on how template tags are used.

           "undef" will be returned if a tag by that name has not been defined.

       $string = $status->get_tag_raw($tagname, @args)
           Similar to "get_tag", but keeps a tag name unchanged (does not uppercase it), and does
           not convert arrayref tag values into a single string.
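
           The three kinds of tag value described for set_tag (scalar, array reference,
           subroutine reference) and the way get_tag flattens them can be pictured with the
           stand-alone sketch below.  expand_tag_value() and $pms_stub are hypothetical
           stand-ins, not the module's code; in particular, joining an array returned by a
           subroutine is an assumption.

```perl
use strict;
use warnings;

# Stand-in for the documented get_tag behaviour.
sub expand_tag_value {
    my ($pms, $value, @args) = @_;
    $value = $value->($pms, @args) if ref $value eq 'CODE';    # evaluated on every expansion
    return join(' ', @$value)      if ref $value eq 'ARRAY';   # joined with a single space
    return $value;
}

my $pms_stub = {};    # placeholder for a PerMsgStatus object
my $calls    = 0;
my %tags = (
    SIMPLE  => 'hello',
    LIST    => [ 'a.example', 'b.example' ],
    DYNAMIC => sub { my ($pms, @extra) = @_; return 'call ' . ++$calls; },
);

print expand_tag_value($pms_stub, $tags{SIMPLE}),  "\n";
print expand_tag_value($pms_stub, $tags{LIST}),    "\n";
print expand_tag_value($pms_stub, $tags{DYNAMIC}), "\n";
print expand_tag_value($pms_stub, $tags{DYNAMIC}), "\n";
```

           A caller that needs the individual array elements rather than the joined string
           would use get_tag_raw instead.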

       $status->set_spamd_result_item($subref)
           Set an entry for the spamd result log line.  $subref should be a code reference for a
           subroutine which will return a string in 'name=VALUE' format, similar to the other
           entries in the spamd result line:

             Jul 17 14:10:47 radish spamd[16670]: spamd: result: Y 22 - ALL_NATURAL,
             DATE_IN_FUTURE_03_06,DIET_1,DRUGS_ERECTILE,DRUGS_PAIN,
             TEST_FORGED_YAHOO_RCVD,TEST_INVALID_DATE,TEST_NOREALNAME,
             TEST_NORMAL_HTTP_TO_IP,UNDISC_RECIPS scantime=0.4,size=3138,user=jm,
             uid=1000,required_score=5.0,rhost=localhost,raddr=127.0.0.1,
             rport=33153,mid=<9PS291LhupY>,autolearn=spam

           "name" and "VALUE" must not contain "=" or "," characters, as it is important that
           these log lines are easy to parse.
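
           One way to keep an entry parseable is to sanitize it inside the code reference.  The
           sketch below assumes a hypothetical helper make_result_item() and an invented entry
           name "example_lang"; replacing the forbidden characters with underscores is a design
           choice of this example, not something the module does for you.

```perl
use strict;
use warnings;

# Build a code reference that returns "name=VALUE", replacing the characters
# that would break log parsing ("=" and ",") with underscores.
sub make_result_item {
    my ($name, $value_cb) = @_;
    return sub {
        my $value = $value_cb->();
        $value = '' unless defined $value;
        s/[=,]/_/g for $name, $value;
        return "$name=$value";
    };
}

my $subref = make_result_item('example_lang', sub { 'en,fr' });
print $subref->(), "\n";

# In a plugin, the reference would then be registered with:
#   $status->set_spamd_result_item($subref);
```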

           The code reference will be called by spamd after the message has been scanned, and the
           "PerMsgStatus::check()" method has returned.

       $status->finish ()
           Indicate that this $status object is finished with, and can be destroyed.

           If you are using SpamAssassin in a persistent environment, or checking many mail
           messages from one "Mail::SpamAssassin" factory, this method should be called to ensure
           Perl's garbage collection will clean up old status objects.

       $name = $status->get_current_eval_rule_name()
           Return the name of the currently-running eval rule.  "undef" is returned if no eval
           rule is currently being run.  Useful for plugins to determine the current rule name
           while inside an eval test function call.

       $status->get_decoded_body_text_array ()
           Returns the message body, with base64 or quoted-printable encodings decoded, and non-
           text parts or non-inline attachments stripped.

           This is the same result text as used in 'rawbody' rules.

           It is returned as an array of strings, with each string being a 2-4kB chunk of the
           body, split from boundaries if possible.

       $status->get_decoded_stripped_body_text_array ()
           Returns the message body, decoded (as described in get_decoded_body_text_array()),
           with HTML rendered, and with whitespace normalized.

           This is the same result text as used in 'body' rules.

           It will always render text/html.

           It is returned as an array of strings, with each string representing one 'paragraph'.
           Paragraphs, in plain-text mails, are double-newline-separated blocks of multi-line
           text.

       $status->get (header_name [, default_value])
           Returns a message header, pseudo-header or a real name, email-address or some other
           parsed value set by modifiers.  "header_name" is the name of a mail header, such as
           'Subject', 'To', etc.

           Since 4.0, this method should be called in list context.  It returns a list of header
           contents, or other values when modifiers are used.

           If "default_value" is given, it will be used if the requested "header_name" does not
           exist.  This is mainly useful when called in scalar context to set 'undef' instead of
           legacy '' return value when header does not exist.

           Appending ":raw" modifier to the header name will inhibit decoding of quoted-printable
           or base-64 encoded strings.

           Appending ":addr" modifier to the header name will return all email-addresses found in
           the header.  It is mainly applicable to header fields 'From', 'Sender', 'To', 'Cc'
           along with their 'Resent-*' counterparts, and the 'Return-Path'.  For example, all of
           the following will result in "example@foo" (and "example@bar"):

           example@foo
           example@foo (Foo Blah), <example@bar>
           example@foo, example@bar
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>

           Appending ":name" modifier to the header name will return all "display names" from the
           header field.  As with ":addr", it is mainly applicable to header fields 'From',
           'Sender', 'To', 'Cc' along with their 'Resent-*' counterparts, and the 'Return-Path'.
           For example, all of the following will result in "Foo Blah" (and "Bar Baz").  One
           level of single quotes is stripped too, as it is often seen.

           example@foo (Foo Blah)
           example@foo (Foo Blah), "Bar Baz" <example@bar>
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>

           Appending ":host" to the header name will return the first hostname-looking string
           that ends with a valid TLD.  First it tries to find a match after @ character
           (possible email), then from any part of the header.  Normal use of this would be for
           example 'From:addr:host' to return the hostname portion of a From-address.

           Appending ":domain" to the header name implies ":host", but will return only domain
           part of the hostname, as returned by RegistryBoundaries::trim_domain().

           Appending ":ip" to the header name will return the first IPv4 or IPv6 address string
           found.  Could be used for example as 'X-Originating-IP:ip'.

           Appending ":revip" to the header name implies ":ip", but will return the found IP in
           reverse (usually for DNSBL usage).

           Appending ":first" modifier to the header name will return only the first (topmost)
           header, in case there are multiple ones.  Similarly ":last" will select the last one.
           These affect only the physical header line selection.  If selected header is parsed
           further with ":addr" or similar, it may return multiple results, if the selected
           header contains multiple addresses.
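
           The following sketch shows the list-context calling style with modifiers.  So that
           it runs without SpamAssassin installed, it uses StubPMS, a hypothetical stand-in
           class that returns canned values; in a real plugin the object would be the
           PerMsgStatus instance passed to the eval function.  summarize_addresses() is also a
           made-up helper.

```perl
use strict;
use warnings;

# StubPMS is a hypothetical stand-in for a PerMsgStatus object.  It only
# returns canned lists for the header names requested below.
package StubPMS {
    my %canned = (
        'From:addr' => [ 'example@foo' ],
        'From:name' => [ 'Foo Blah' ],
        'To:addr'   => [ 'example@foo', 'example@bar' ],
    );
    sub new { return bless {}, shift }
    sub get {
        my ($self, $header) = @_;
        return @{ $canned{$header} || [] };
    }
}

package main;

# Collect sender and recipient details, calling get() in list context.
sub summarize_addresses {
    my ($pms) = @_;
    my @from  = $pms->get('From:addr');
    my @names = $pms->get('From:name');
    my @to    = $pms->get('To:addr');
    return { from => \@from, names => \@names, to => \@to };
}

my $summary = summarize_addresses(StubPMS->new);
print "from: @{ $summary->{from} }\n";
print "to:   @{ $summary->{to} }\n";
```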

           There are several special pseudo-headers that can be specified:

           "ALL" can be used to mean the text of all the message's headers. Each header is
           decoded and unfolded to single line, unless called with :raw.

           "ALL-TRUSTED" can be used to mean the text of all the message's headers that could
           only have been added by trusted relays.

           "ALL-INTERNAL" can be used to mean the text of all the message's headers that could
           only have been added by internal relays.

           "ALL-UNTRUSTED" can be used to mean the text of all the message's headers that may
           have been added by untrusted relays.  To make this pseudo-header more useful for
           header rules the 'Received' header that was added by the last trusted relay is
           included, even though it can be trusted.

           "ALL-EXTERNAL" can be used to mean the text of all the message's headers that may have
           been added by external relays.  Like "ALL-UNTRUSTED" the 'Received' header added by
           the last internal relay is included.

           "ToCc" can be used to mean the contents of both the 'To' and 'Cc' headers.

           "EnvelopeFrom" is the address used in the 'MAIL FROM:' phase of the SMTP transaction
           that delivered this message, if this data has been made available by the SMTP server.

           "MESSAGEID" is a symbol meaning all Message-Id's found in the message; some mailing
           list software moves the real 'Message-Id' to 'Resent-Message-Id' or 'X-Message-Id',
           then uses its own one in the 'Message-Id' header.  The value returned for this symbol
           is the text from all 3 headers, separated by newlines.

           "X-Spam-Relays-Untrusted" is the generated metadata of untrusted relays the message
           has passed through.

           "X-Spam-Relays-Trusted" is the generated metadata of trusted relays the message has
           passed through.

           "X-Spam-Relays-External" is the generated metadata of external relays the message has
           passed through.

           "X-Spam-Relays-Internal" is the generated metadata of internal relays the message has
           passed through.

       $status->get_uri_list ()
           Returns an array of all unique URIs found in the message.  It takes a combination of
           the URIs found in the rendered (decoded and HTML stripped) body and the URIs found
           when parsing the HTML in the message.  Will also set $status->{uri_list} (the array as
           returned by this function).

           The returned array will include the "raw" URI as well as "slightly cooked" versions.
           For example, the single URI 'http://%77&#00119;%77.example.com/' will get turned into:
           ( 'http://%77&#00119;%77.example.com/', 'http://www.example.com/' )
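
           How a "slightly cooked" version can arise from the raw URI is sketched below.  This
           is a deliberate simplification that only decodes numeric HTML entities and %XX
           escapes; cook_uri() is a hypothetical helper and does not reproduce SpamAssassin's
           own canonicalization.

```perl
use strict;
use warnings;

# Decode numeric HTML entities and %XX escapes in a URI string.
sub cook_uri {
    my ($uri) = @_;
    $uri =~ s/&#0*(\d+);/chr($1)/ge;                # numeric entity, e.g. code 119 is "w"
    $uri =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # percent escape, e.g. hex 77 is "w"
    return $uri;
}

my $raw      = 'http://%77&#00119;%77.example.com/';
my @uri_list = ($raw);
my $cooked   = cook_uri($raw);
push @uri_list, $cooked if $cooked ne $raw;         # keep both raw and cooked forms
print "$_\n" for @uri_list;
```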

       $status->get_uri_detail_list ()
           Returns a hash reference of all unique URIs found in the message and various data
           about where the URIs were found in the message.  It takes a combination of the URIs
           found in the rendered (decoded and HTML stripped) body and the URIs found when parsing
           the HTML in the message.  Will also set $status->{uri_detail_list} (the hash reference
           as returned by this function).

           The hash format looks something like this:

             raw_uri => {
               types => { a => 1, img => 1, parsed => 1, domainkeys => 1,
                          unlinked => 1, schemeless => 1 },
               cleaned => [ canonicalized_uri ],
               anchor_text => [ "click here", "no click here" ],
               domains => { domain1 => 1, domain2 => 1 },
               hosts => { host1 => domain1, host2 => domain2 },
             }

           "raw_uri" is whatever the URI was in the message itself
           (http://spamassassin.apache%2Eorg/).  URIs parsed from text will be prefixed with a
           scheme if missing (http://, mailto: etc).  HTML URIs are as found.

           "types" is a hash of the HTML tags (lowercase) which referenced the raw_uri.  parsed
           is a faked type which specifies that the raw_uri was seen in the rendered text.
           domainkeys is defined when raw_uri was found from DK/DKIM d= field.  unlinked is
           defined when it's assumed that the MUA will not linkify the URI (found in body without
           scheme or www. prefix).  schemeless is always added for URIs without a scheme,
           regardless of linkifying (e.g. an email address found in body without mailto:).

           "cleaned" is an array of the raw and canonicalized version of the raw_uri
           (http://spamassassin.apache%2Eorg/, https://spamassassin.apache.org/).

           "anchor_text" is an array of the anchor text (text between <a> and </a>), if any,
           which linked to the URI.

           "domains" is a hash of the domains found in the canonicalized URIs.

           "hosts" is a hash of unstripped hostnames found in the canonicalized URIs as hash
           keys, with their domain part stored as a value of each hash entry.
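
           Walking this structure is mostly a matter of nested hash access.  The sketch below
           operates on sample data laid out in the format shown above; the URIs and hosts are
           invented, and domains_for_type() is a hypothetical helper.  In a plugin, the hash
           reference would come from $status->get_uri_detail_list().

```perl
use strict;
use warnings;

# Sample data in the documented shape; the URIs and hosts are invented.
my $uri_detail = {
    'http://www.example.com/offer' => {
        types       => { a => 1 },
        cleaned     => [ 'http://www.example.com/offer' ],
        anchor_text => [ 'click here' ],
        domains     => { 'example.com' => 1 },
        hosts       => { 'www.example.com' => 'example.com' },
    },
    'http://img.example.net/x.png' => {
        types   => { img => 1 },
        cleaned => [ 'http://img.example.net/x.png' ],
        domains => { 'example.net' => 1 },
        hosts   => { 'img.example.net' => 'example.net' },
    },
};

# Return the sorted, de-duplicated domains of URIs seen with the given type.
sub domains_for_type {
    my ($details, $type) = @_;
    my %seen;
    for my $info (values %$details) {
        next unless $info->{types} && $info->{types}{$type};
        $seen{$_} = 1 for keys %{ $info->{domains} || {} };
    }
    my @domains = sort keys %seen;
    return @domains;
}

print join(',', domains_for_type($uri_detail, 'a')), "\n";
```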

       $status->add_uri_detail_list ($raw_uri, $types, $source, $valid_domain)
           Adds values to the internal uri_detail_list.  When used from plugins, it is recommended
           to call this from parsed_metadata (along with register_method_priority, -10) so that
           other plugins calling get_uri_detail_list() will see it.

           "raw_uri" is the URI to be added. The only required parameter.

           "types" is an optional hash reference, contents are added to uri_detail_list->{types}
           (see get_uri_detail_list for known keys).  parsed is the default if no hash is given.
           nocanon does not run uri_list_canonicalize (no redirector, uri fixing).  noclean skips
           adding uri_detail_list->{cleaned}, so it would not be used in "uri" rule checks, but
           domain/hosts would still be used for URIBL/RBL purposes.

           "source" is an optional simple string, only used for debug logging purposes to
           identify where uri originates from (default: "parsed").

           "valid_domain" is an optional boolean (0/1).  If true, uri will not be added unless
           hostname/domain is in valid format and contains a valid TLD.  (default: 0)

       $status->clear_test_state()
           DEPRECATED, UNNEEDED SINCE 4.0

       $status->got_hit ($rulename, $desc_prepend [, name => value, ...])
           Register a hit against a rule in the ruleset.

           There are two mandatory arguments. These are $rulename, the name of the rule that
           fired, and $desc_prepend, which is a short string that will be prepended to the rule's
           "describe" string in output reports.

           In addition, callers can supplement that with the following optional data:

           score => $num
               Optional: the score to use for the rule hit.  If unspecified, the value from the
               "Mail::SpamAssassin::Conf" object's "{scores}" hash will be used (a configured
               score), and in its absence the "defscore" option value.

           defscore => $num
               Optional: the score to use for the rule hit if neither the option "score" is
               provided, nor a configured score value is provided.

           value => $num
                Optional: the value to assign to the rule; the default value is 1.  Rules with
                the "multiple" tflag use values greater than 1 to indicate multiple hits.  This
                value is accessible to meta rules.

           ruletype => $type
               Optional, but recommended: the rule type string.  This is used in the "hit_rule"
               plugin call, called by this method.  If unset, 'unknown' is used.

           tflags => $string
               Optional: a string, i.e. a space-separated list of additional tflags to be
               appended to an existing list of flags in $self->{conf}->{tflags}, such as: "nice
               noautolearn multiple". No syntax checks are performed.

           description => $string
               Optional: a custom rule description string.  This is used in the "hit_rule" plugin
               call, called by this method. If unset, the static description is used.

           Backward compatibility: the two mandatory arguments have been part of this API since
           SpamAssassin 2.x.  The optional name => value pairs, however, are a new addition in
           SpamAssassin 3.2.0.
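
           The precedence between the "score" option, a configured score and the "defscore"
           option can be sketched as a simple fallback chain.  resolve_score() and the
           %configured hash are hypothetical stand-ins for illustration; what the real method
           does when none of the three is available is not covered here.

```perl
use strict;
use warnings;

# Pick the score for a hit: the explicit "score" option first, then the
# configured score, then the "defscore" option.  Returns undef if none is set.
sub resolve_score {
    my ($rulename, $configured_scores, %opts) = @_;
    return $opts{score}                    if defined $opts{score};
    return $configured_scores->{$rulename} if defined $configured_scores->{$rulename};
    return $opts{defscore};
}

# Stands in for the Conf object's {scores} hash; the rule name is invented.
my %configured = ( EXAMPLE_CONFIGURED => 2.5 );

print resolve_score('EXAMPLE_CONFIGURED', \%configured, score    => 4), "\n";
print resolve_score('EXAMPLE_CONFIGURED', \%configured, defscore => 1), "\n";
print resolve_score('EXAMPLE_NEW_RULE',   \%configured, defscore => 1), "\n";
```

           For a rule that ships without a configured score, a plugin would typically pass
           defscore so that an administrator's configured score still takes precedence.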

       $status->rule_ready ($rulename [, $no_async])
           Mark an asynchronous rule ready, so it can be considered for meta rule evaluation.  An
           asynchronous rule is a rule whose eval function returns undef, indicating that it is
           not ready yet and that results are expected later.  $status->rule_ready() must be
           called later to mark it ready; alternatively, $status->got_hit() also does this.  If
           neither is called, then any meta rule that depends on this rule might not evaluate.

           Optional boolean $no_async skips checking if there are pending async DNS lookups for
           the rule.

       $status->test_log ($text [, $rulename])
           Add $text log entry for a hit rule in final message REPORT/SUMMARY.

           Usually called just before got_hit(), to describe for example what URI the rule
           matched on.  The optional $rulename argument is recommended to make sure the log is
           written to the correct rule.  If $rulename is not provided,
           get_current_eval_rule_name() is used as a fallback.

           Can be called multiple times per rule for additional entries.

       $status->create_fulltext_tmpfile (fulltext_ref)
           This function creates a temporary file containing the passed scalar reference data.
           If no scalar is passed, full/pristine message text is assumed.  This is typically used
           by external programs like pyzor and dccproc, to avoid hangs due to buffering issues.

           All tempfiles are automatically cleaned up by PerMsgStatus destructor.

       $status->delete_fulltext_tmpfile (tmpfile)
           Cleans up after a $status->create_fulltext_tmpfile() call.  Deletes the temporary
           file and uncaches the filename.  Generally there is no need to call this, as the
           PerMsgStatus destructor cleans up all tmpfiles.
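
           The general pattern of handing message text to an external program through a
           temporary file can be sketched with the core File::Temp module.  This is not how the
           module itself is implemented; write_text_tmpfile() and the "sa_example" file name
           template are made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the referenced text to a fresh temporary file and return its path.
# UNLINK => 1 asks File::Temp to remove the file when the program exits.
sub write_text_tmpfile {
    my ($text_ref) = @_;
    my ($fh, $path) = tempfile('sa_example.XXXXXX', TMPDIR => 1, UNLINK => 1);
    binmode $fh;
    print {$fh} ${$text_ref} or die "write failed: $!";
    close $fh or die "close failed: $!";    # closing flushes buffered data to disk
    return $path;
}

my $message = "Subject: test\n\nbody text\n";
my $path    = write_text_tmpfile(\$message);
print "wrote ", -s $path, " bytes to a temporary file\n";
```

           Closing the handle before the external program reads the file avoids the buffering
           problems mentioned above.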

       all_from_addrs_domains
           This function returns all the various from addresses in a message using
           all_from_addrs() and then returns only the domain names.

SEE ALSO

       Mail::SpamAssassin(3) spamassassin(1)