NAME

       PDL::BadValues - Discussion of bad value support in PDL

DESCRIPTION

   What are bad values and why should I bother with them?
       Sometimes it's useful to be able to mark a certain value as 'bad' or 'missing'. For
       example, CCDs used in astronomy produce 2D images which are not perfect, since certain
       areas contain invalid data due to imperfections in the detector.  Whilst PDL's powerful
       index routines, together with dataflow, slices, etc., mean that these regions can be
       ignored in processing, doing so is awkward. It would be much easier to be able to say
       "$c = $x + $y" and leave all the hassle to the computer.

       If you're not interested in this, then you may (rightly) be concerned with how this
       affects the speed of PDL, since the overhead of checking for a bad value at each operation
       can be large.  Because of this, the code has been written to be as fast as possible,
       particularly when operating on ndarrays which do not contain bad values: for those you
       should notice essentially no speed difference.

       You may also ask 'well, my computer supports IEEE NaN, so I already have this'.  They are
       different things; a bad value signifies "leave this out of processing", whereas NaN is the
       result of a mathematically-invalid operation.

       Many routines, such as "y=sin(x)", will propagate NaNs without the user having to code
       differently, but routines such as "qsort", or finding the median of an array, need to be
       re-coded to handle bad values.  For floating-point datatypes, "NaN" and "Inf" can be used
       to flag bad values, but by default special values are used (see "Default bad values").

       There is one default bad value for each datatype, but as of PDL 2.040, you can have
       different bad values for separate ndarrays of the same type.

       You can use "NaN" as the bad value for any floating-point type, including complex.
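
       As a minimal sketch of the difference (the variable names are illustrative, and it
       assumes the "setnantobad" and "nbad" routines from PDL::Bad), a NaN produced by an
       invalid operation does not set the bad flag, whereas the same element converted to a
       bad value is left out of a sum:

          use strict;
          use warnings;
          use PDL;

          # sqrt of a negative real number yields NaN, not a bad value
          my $x = sqrt(pdl(-1, 4, 9));
          print "badflag: ", $x->badflag, "\n";          # NaN does not set the bad flag

          # convert the NaN into a bad value; this returns a new ndarray
          my $y = $x->setnantobad;
          print "bad elements: ", $y->nbad, "\n";        # 1
          print "sum of good elements: ", $y->sum, "\n"; # 2 + 3 = 5

       Summing $x itself would give NaN, because NaN takes part in the arithmetic instead of
       being skipped.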

   A quick overview
        pdl> $x = sequence(4,3);
        pdl> p $x
        [
         [ 0  1  2  3]
         [ 4  5  6  7]
         [ 8  9 10 11]
        ]
        pdl> $x = $x->setbadif( $x % 3 == 2 )
        pdl> p $x
        [
         [  0   1 BAD   3]
         [  4 BAD   6   7]
         [BAD   9  10 BAD]
        ]
        pdl> $x *= 3
        pdl> p $x
        [
         [  0   3 BAD   9]
         [ 12 BAD  18  21]
         [BAD  27  30 BAD]
        ]
        pdl> p $x->sum
        120

       "demo bad" within perldl or pdl2 gives a demonstration of some of the things possible with
       bad values.  These are also available on PDL's web-site, at http://pdl.perl.org/demos/.
       See PDL::Bad for useful routines for working with bad values and t/bad.t to see them in
       action.
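
       The same steps can be written as a script. The following is a sketch only, assuming the
       "nbad", "ngood", "isbad" and "setbadtoval" routines from PDL::Bad; the fill value of -1
       is an arbitrary choice for illustration:

          use strict;
          use warnings;
          use PDL;

          my $x = sequence(4, 3);
          $x = $x->setbadif($x % 3 == 2);      # same data as the session above

          print "bad: ", $x->nbad, "  good: ", $x->ngood, "\n";   # 4 bad, 8 good
          print "mask of bad elements: ", $x->isbad, "\n";

          # replace every bad element by -1; the result carries no bad values
          my $filled = $x->setbadtoval(-1);
          print "badflag after filling: ", $filled->badflag, "\n";
          print "sum after filling: ", $filled->sum, "\n";        # 40 - 4 = 36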

       To find out if a routine supports bad values, use the "badinfo" command in perldl or pdl2
       or the "-b" option to pdldoc.

       Each ndarray contains a flag - accessible via "$pdl->badflag" - to say whether there's any
       bad data present:

       •   If false/0, which means there's no bad data here, the code supplied by the "Code"
           option to "pp_def()" is executed.

       •   If true/1, there MAY be bad data in the ndarray, so the code in the "BadCode" option
           is used instead (assuming that the "pp_def()" for this routine has been updated to
           have a BadCode key).  You get all the advantages of broadcasting, as with the "Code"
           option, but it will run slower since it has to handle the presence of bad values.

       If you create an ndarray, it will have its bad-value flag set to 0. To change this, use
       "$pdl->badflag($new_bad_status)", where $new_bad_status can be 0 or 1.  When a routine
       creates an ndarray, its bad-value flag will depend on the input ndarrays: unless
       overridden (see the "CopyBadStatusCode" option to "pp_def"), the bad-value flag will be
       set true if any of the input ndarrays has its bad-value flag set.  To check that an
       ndarray really contains bad data, use the "check_badflag" method.
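
       A short sketch of how the flag behaves, using only the "badflag", "check_badflag" and
       "setbadat" methods from PDL::Bad (the variable names are illustrative):

          use strict;
          use warnings;
          use PDL;

          my $x = sequence(5);
          print $x->badflag, "\n";           # a new ndarray: flag is 0

          $x->badflag(1);                    # claim that bad values MAY be present
          print $x->badflag, "\n";           # flag is now 1, although no element is bad
          print $x->check_badflag, "\n";     # scans the data, finds none, clears the flag

          my $y = sequence(5)->setbadat(2);  # really mark element 2 as bad
          my $z = $y + $x;                   # output inherits the flag from its inputs
          print $z->badflag, "\n";           # 1

       Note that "check_badflag" has to look at every element, so it is slower than simply
       reading the flag.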

       NOTE: propagation of the badflag

       If you change the badflag of an ndarray, this change is propagated to all the children of
       that ndarray, so

          pdl> $x = zeroes(20,30);
          pdl> $y = $x->slice('0:10,0:10');
          pdl> $c = $y->slice(',(2)');
          pdl> print ">>c: ", $c->badflag, "\n";
          >>c: 0
          pdl> $x->badflag(1);
          pdl> print ">>c: ", $c->badflag, "\n";
          >>c: 1

       This is also propagated to the parents of an ndarray, so

          pdl> print ">>x: ", $x->badflag, "\n";
          >>x: 1
          pdl> $c->badflag(0);
          pdl> print ">>x: ", $x->badflag, "\n";
          >>x: 0

       There's also the issue of what happens if you change the badvalue of an ndarray: should
       the change propagate to children/parents (yes), or should you only be able to change the
       badvalue at the 'top' level, i.e. for those ndarrays which do not have parents?

       The "orig_badvalue()" method returns the compile-time value for a given datatype. It works
       on ndarrays, PDL::Type objects, and numbers - eg

         $pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).

       To get the current bad value, use the "badvalue()" method - it has the same syntax as
       "orig_badvalue()".

       To change the current bad value, supply the new number to badvalue - eg

         $pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).

       Note: the value is silently converted to the correct C type, and returned - e.g.
       "byte->badvalue(-26)" returns 230 on my Linux machine.

       Note that changes to the bad value are NOT propagated to previously-created ndarrays -
       they will still have the bad flag set, but the elements that were bad will suddenly
       become 'good', containing the old bad value.  See discussion below.
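
       As a sketch of reading and setting bad values (assuming PDL 2.040 or later, where each
       ndarray can carry its own bad value; the sentinel -999 and the variable names are
       arbitrary choices for illustration):

          use strict;
          use warnings;
          use PDL;

          # compile-time default for an unsigned 8-bit type: the C type's maximum
          print "default bad value for byte: ", byte->orig_badvalue, "\n";

          my $x = pdl(1, -999, 3);   # -999 marks a missing measurement
          $x->badvalue(-999);        # bad value for this ndarray only
          $x->badflag(1);            # tell PDL to start checking for it
          print $x, "\n";
          print "good sum: ", $x->sum, "\n";   # 1 + 3 = 4

       Setting the value on $x, not on the "double" type, leaves other ndarrays of the same
       type untouched.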

   Bad values and boolean operators
       For the boolean operators in PDL::Ops, evaluation on a bad value returns the bad value,
       so the following:

        $mask = $img > $thresh;

       correctly propagates bad values. The following omits any bad values, but returns a bad
       value if there are no good ones:

        $bool = any( $img > $thresh );

       As of 2.077, a bad value used as a boolean will throw an exception.

       When using one of the 'projection' functions in PDL::Ufunc - such as orover - bad values
       are skipped over (see the documentation of these functions for the current handling of the
       case when all elements are bad).
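
       A small sketch of both behaviours, with made-up data and an arbitrary threshold (the
       variable names follow the fragments above):

          use strict;
          use warnings;
          use PDL;

          my $img    = pdl(1, 5, 9)->setbadat(1);   # middle pixel is bad
          my $thresh = 4;                           # arbitrary threshold

          my $mask = $img > $thresh;                # bad input gives bad output
          print "mask: $mask\n";
          print "bad elements in mask: ", $mask->nbad, "\n";   # 1

          # any() skips the bad element and looks only at the good ones
          print "some pixel exceeds the threshold\n" if any($mask);

       Testing the result of "any" is safe here because at least one good element remains;
       had every element been bad, the result itself would be bad.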

IMPLEMENTATION DETAILS

       A new flag has been added to the state of an ndarray - "PDL_BADVAL". If unset, then the
       ndarray does not contain bad values, and so all the support code can be ignored. If set,
       it does not guarantee that bad values are present, just that they should be checked for.

       The "pdl_trans" structure has been extended to include an integer value, "bvalflag", which
       acts as a switch to tell the code whether to handle bad values or not. This value is set
       if any of the input ndarrays have their "PDL_BADVAL" flag set (although this code can be
       replaced by setting "FindBadStateCode" in pp_def).

   Default bad values
       The default bad values are now stored in a structure within the Core PDL structure -
       "PDL.bvals" (eg Basic/Core/pdlcore.h.PL); see also "typedef badvals" in
       Basic/Core/pdl.h.PL and the BOOT code of Basic/Core/Core.xs.PL where the values are
       initialised to (hopefully) sensible values.  See "badvalue" in PDL::Bad and
       "orig_badvalue" in PDL::Bad for read/write routines to the values.

       The default/original bad values are set to the C type's maximum (unsigned integers) or the
       minimum (floating-point and signed integers).

   How do I change a routine to handle bad values?
       See "BadCode" in PDL::PP and "HandleBad" in PDL::PP.

       If you have a routine that you want to be usable in-place, look at the routines in
       bad.pd (or ops.pd) which use the "in-place" option to see how the bad flag is propagated
       to children using the "xxxBadStatusCode" options.  I decided not to automate this as the
       rules would be a little complex, since not every in-place op will need to propagate the
       badflag (eg unary functions).

       This all means that you can change

          Code => '$a() = $b() + $c();'

       to

          BadCode => 'if ( $ISBAD(b()) || $ISBAD(c()) ) {
                        $SETBAD(a());
                      } else {
                        $a() = $b() + $c();
                      }'

       leaving Code as it is. PP::PDLCode will then create code something like

          if ( __trans->bvalflag ) {
               broadcastloop over BadCode
          } else {
               broadcastloop over Code
          }
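
       For context, the two options sit side by side in one "pp_def()" call. The following is
       a sketch of how that might look in a .pd file; the function name "add_bad" is
       hypothetical, and "HandleBad => 1" is the PDL::PP option that declares the routine able
       to deal with bad values:

          # Fragment of a .pd file: it is processed by PDL::PP when the module is
          # built, not run directly with perl.  "add_bad" is a made-up name.
          pp_def('add_bad',
              Pars      => 'b(); c(); [o] a();',
              HandleBad => 1,
              Code      => '$a() = $b() + $c();',
              BadCode   => 'if ( $ISBAD(b()) || $ISBAD(c()) ) {
                              $SETBAD(a());
                            } else {
                              $a() = $b() + $c();
                            }',
          );

          pp_done();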

WHAT ABOUT DOCUMENTATION?

       One of the strengths of PDL is its on-line documentation. The aim is to use this system to
       provide information on how/if a routine supports bad values: in many cases "pp_def()"
       contains all the information anyway, so the function-writer doesn't need to do anything at
       all! For the cases when this is not sufficient, there's the "BadDoc" option. For code
       written at the Perl level - i.e. in a .pm file - use the "=for bad" pod directive.

       This information will be available via man/pod2man/html documentation. It's also
       accessible from the "perldl" or "pdl2" shells - using the "badinfo" command - and the
       "pdldoc" shell command - using the "-b" option.

AUTHOR

       Copyright (C) Doug Burke (djburke@cpan.org), 2000, 2006.

       The per-ndarray bad value support is by Heiko Klein (2006).