Provided by: libpmemobj-dev_1.13.1-1.1ubuntu2_amd64

NAME

       pmemobj_tx_stage(),

       pmemobj_tx_begin(),   pmemobj_tx_lock(),   pmemobj_tx_xlock(),  pmemobj_tx_abort(),  pmemobj_tx_commit(),
       pmemobj_tx_end(), pmemobj_tx_errno(), pmemobj_tx_process(),

       TX_BEGIN_PARAM(), TX_BEGIN_CB(), TX_BEGIN(), TX_ONABORT, TX_ONCOMMIT, TX_FINALLY, TX_END,

       pmemobj_tx_log_append_buffer(),       pmemobj_tx_xlog_append_buffer(),       pmemobj_tx_log_auto_alloc(),
       pmemobj_tx_log_snapshots_max_size(), pmemobj_tx_log_intents_max_size(),

       pmemobj_tx_set_user_data(), pmemobj_tx_get_user_data(),

       pmemobj_tx_set_failure_behavior(), pmemobj_tx_get_failure_behavior() - transactional object manipulation

SYNOPSIS

              #include <libpmemobj.h>

              enum pobj_tx_stage pmemobj_tx_stage(void);

              int pmemobj_tx_begin(PMEMobjpool *pop, jmp_buf *env, enum pobj_tx_param, ...);
              int pmemobj_tx_lock(enum tx_lock lock_type, void *lockp);
              int pmemobj_tx_xlock(enum tx_lock lock_type, void *lockp, uint64_t flags);
              void pmemobj_tx_abort(int errnum);
              void pmemobj_tx_commit(void);
              int pmemobj_tx_end(void);
              int pmemobj_tx_errno(void);
              void pmemobj_tx_process(void);

              TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
              TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
              TX_BEGIN(PMEMobjpool *pop)
              TX_ONABORT
              TX_ONCOMMIT
              TX_FINALLY
              TX_END

              int pmemobj_tx_log_append_buffer(enum pobj_log_type type, void *addr, size_t size);
              int pmemobj_tx_xlog_append_buffer(enum pobj_log_type type, void *addr, size_t size, uint64_t flags);
              int pmemobj_tx_log_auto_alloc(enum pobj_log_type type, int on_off);
              size_t pmemobj_tx_log_snapshots_max_size(size_t *sizes, size_t nsizes);
              size_t pmemobj_tx_log_intents_max_size(size_t nintents);

              void pmemobj_tx_set_user_data(void *data);
              void *pmemobj_tx_get_user_data(void);

              void pmemobj_tx_set_failure_behavior(enum pobj_tx_failure_behavior behavior);
              enum pobj_tx_failure_behavior pmemobj_tx_get_failure_behavior(void);

DESCRIPTION

       The  non-transactional  functions  and  macros  described in pmemobj_alloc(3), pmemobj_list_insert(3) and
       POBJ_LIST_HEAD(3) only guarantee the atomicity of a single operation on  an  object.   In  case  of  more
       complex  changes  involving  multiple operations on an object, or allocation and modification of multiple
       objects, data consistency and fail-safety may be provided only by using atomic transactions.

       A transaction is defined as a series of operations on persistent memory objects that either all occur, or
       nothing  occurs.  In particular, if the execution of a transaction is interrupted by a power failure or a
       system crash, it is guaranteed that after system  restart,  all  the  changes  made  as  a  part  of  the
       uncompleted  transaction  will be rolled back, restoring the consistent state of the memory pool from the
       moment when the transaction was started.

       Note that transactions do not provide atomicity with respect to other  threads.   All  the  modifications
       performed  within  the  transactions  are  immediately  visible  to  other  threads.  Therefore it is the
       responsibility of the application to implement a proper thread synchronization mechanism.

       Each thread may have only one transaction open at a time, but that transaction  may  be  nested.   Nested
       transactions  are  flattened.   Committing  the nested transaction does not commit the outer transaction;
       however, errors in the nested transaction are propagated up to the  outermost  level,  resulting  in  the
       interruption of the entire transaction.

       Each  transaction  is  visible only for the thread that started it.  No other threads can add operations,
       commit or abort the transaction initiated by another thread.  Multiple threads may have transactions open
       on a given memory pool at the same time.

       Please see the CAVEATS section below for known limitations of the transactional API.

       The  pmemobj_tx_stage()  function returns the current transaction stage for a thread.  Stages are changed
       only by the pmemobj_tx_*() functions.  Transaction stages are defined as follows:

       • TX_STAGE_NONE - no open transaction in this thread

       • TX_STAGE_WORK - transaction in progress

       • TX_STAGE_ONCOMMIT - successfully committed

       • TX_STAGE_ONABORT - starting the transaction failed or transaction aborted

       • TX_STAGE_FINALLY - ready for clean up

       The pmemobj_tx_begin() function starts a new transaction in the current thread.  If called within an open
       transaction, it starts a nested transaction.  The caller may use the env argument to provide a pointer to
       a calling environment to be restored in case of transaction abort.  This information must be provided  by
       the caller using the setjmp(3) macro.

       A  new  transaction  may  be  started  only  if  the current stage is TX_STAGE_NONE or TX_STAGE_WORK.  If
       successful, the transaction  stage  changes  to  TX_STAGE_WORK.   Otherwise,  the  stage  is  changed  to
       TX_STAGE_ONABORT.

       Optionally,  a list of parameters for the transaction may be provided.  Each parameter consists of a type
       followed by a type-specific number of values.  Currently there are 4 types:

       • TX_PARAM_NONE, used as a termination marker.  No following value.

       • TX_PARAM_MUTEX, followed by one value, a pmem-resident PMEMmutex

       • TX_PARAM_RWLOCK, followed by one value, a pmem-resident PMEMrwlock

       • TX_PARAM_CB, followed by two values: a callback  function  of  type  pmemobj_tx_callback,  and  a  void
         pointer

       Using  TX_PARAM_MUTEX or TX_PARAM_RWLOCK causes the specified lock to be acquired at the beginning of the
       transaction.  TX_PARAM_RWLOCK acquires the lock for writing.  It is  guaranteed  that  pmemobj_tx_begin()
       will  acquire all locks prior to successful completion, and they will be held by the current thread until
       the outermost transaction is finished.  Locks are taken in order from left to right.  To avoid deadlocks,
       the user is responsible for proper lock ordering.

       TX_PARAM_CB  registers  the  specified  callback  function to be executed at each transaction stage.  For
       TX_STAGE_WORK, the callback is executed prior to commit.  For all other stages, the callback is  executed
       as the first operation after a stage change.  It will also be called after each transaction; in this case
       the stage parameter will be set to TX_STAGE_NONE.  pmemobj_tx_callback must be compatible with:

              void func(PMEMobjpool *pop, enum pobj_tx_stage stage, void *arg)

       pop is the pool identifier used in pmemobj_tx_begin(), stage is the current transaction stage and arg is the
       second  parameter  of  TX_PARAM_CB.   Without  considering  transaction  nesting,  this  mechanism can be
       considered an alternative method for executing code between stages (instead of  TX_ONCOMMIT,  TX_ONABORT,
       etc).  However, there are 2 significant differences when nested transactions are used:

       • The  registered  function is executed only in the outermost transaction, even if registered in an inner
         transaction.

       • There can be only one callback in the entire transaction, that is, the callback cannot be changed in an
         inner transaction.

       Note  that  TX_PARAM_CB  does  not  replace  the TX_ONCOMMIT, TX_ONABORT, etc.  macros.  They can be used
       together: the callback will be executed before a TX_ONCOMMIT, TX_ONABORT, etc.  section.

       TX_PARAM_CB can be used when the code dealing with transaction stage changes is shared between multiple
       users or when it must be executed only in the outer transaction.  For example, it is useful when the
       application must keep persistent and transient state synchronized.

       The pmemobj_tx_lock() function acquires the lock lockp of type lock_type  and  adds  it  to  the  current
       transaction.   lock_type  may  be  TX_LOCK_MUTEX  or  TX_LOCK_RWLOCK;  lockp must be of type PMEMmutex or
       PMEMrwlock, respectively.  If lock_type is TX_LOCK_RWLOCK the lock is acquired for writing.  If the  lock
       is  not successfully acquired, the function returns an error number.  This function must be called during
       TX_STAGE_WORK.

       The pmemobj_tx_xlock() function behaves exactly the same as pmemobj_tx_lock() when flags equals
       POBJ_XLOCK_NO_ABORT.  When flags equals 0 and the lock is not successfully acquired, the transaction is
       aborted.  flags is a bitmask of the following values:

       • POBJ_XLOCK_NO_ABORT - if the function does not end successfully, do not abort the transaction.

       pmemobj_tx_abort() aborts the current transaction and causes a transition to TX_STAGE_ONABORT.  If errnum
       is  equal  to  0,  the  transaction error code is set to ECANCELED; otherwise, it is set to errnum.  This
       function must be called during TX_STAGE_WORK.

       The pmemobj_tx_commit() function commits  the  current  open  transaction  and  causes  a  transition  to
       TX_STAGE_ONCOMMIT.   If  called  in  the  context  of  the  outermost transaction, all the changes may be
       considered as  durably  written  upon  successful  completion.   This  function  must  be  called  during
       TX_STAGE_WORK.

       The pmemobj_tx_end() function performs a cleanup of the current transaction.  If called in the context of
       the outermost transaction, it releases all the locks acquired by pmemobj_tx_begin() for outer and  nested
       transactions.   If  called in the context of a nested transaction, it returns to the context of the outer
       transaction in TX_STAGE_WORK, without releasing any locks.  The pmemobj_tx_end() function can  be  called
       during  TX_STAGE_NONE  if  transitioned  to  this  stage  using  pmemobj_tx_process().  If not already in
       TX_STAGE_NONE, it causes the transition to TX_STAGE_NONE.  pmemobj_tx_end() must always be called for each
       pmemobj_tx_begin(),  even  if  starting  the transaction failed.  This function must not be called during
       TX_STAGE_WORK.

       The pmemobj_tx_errno() function returns the error code of the last transaction.

       The pmemobj_tx_process() function  performs  the  actions  associated  with  the  current  stage  of  the
       transaction,  and  makes  the  transition  to  the  next stage.  It must be called in a transaction.  The
       current stage must always be obtained by a call to pmemobj_tx_stage().  pmemobj_tx_process() performs the
       following transitions in the transaction stage flow:

       • TX_STAGE_WORK -> TX_STAGE_ONCOMMIT

       • TX_STAGE_ONABORT -> TX_STAGE_FINALLY

       • TX_STAGE_ONCOMMIT -> TX_STAGE_FINALLY

       • TX_STAGE_FINALLY -> TX_STAGE_NONE

       • TX_STAGE_NONE -> TX_STAGE_NONE

       pmemobj_tx_process() must not be called after calling pmemobj_tx_end() for the outermost transaction.
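
       The transitions listed above can be read as a small state table.  The sketch below is a standalone
       model, not libpmemobj code: enum model_stage, model_process() and model_steps_to_none() are hypothetical
       names that only mirror the documented behavior of pmemobj_tx_process(), so the flow can be traced
       without opening a pool.

```c
/* Hypothetical mirror of enum pobj_tx_stage; not the library's own type. */
enum model_stage {
    MODEL_NONE,
    MODEL_WORK,
    MODEL_ONCOMMIT,
    MODEL_ONABORT,
    MODEL_FINALLY
};

/* Returns the stage that follows `s` according to the documented list. */
enum model_stage
model_process(enum model_stage s)
{
    switch (s) {
    case MODEL_WORK:     return MODEL_ONCOMMIT; /* processing WORK commits */
    case MODEL_ONCOMMIT: return MODEL_FINALLY;
    case MODEL_ONABORT:  return MODEL_FINALLY;
    case MODEL_FINALLY:  return MODEL_NONE;
    case MODEL_NONE:     return MODEL_NONE;     /* stays in NONE */
    }
    return MODEL_NONE;
}

/* Counts how many process steps are needed to reach MODEL_NONE from `s`. */
int
model_steps_to_none(enum model_stage s)
{
    int steps = 0;

    while (s != MODEL_NONE) {
        s = model_process(s);
        steps++;
    }
    return steps;
}
```

       In this model a transaction in the work stage needs three steps to finish, while an aborted one needs
       two.  In real code the stage is read with pmemobj_tx_stage() before each call to pmemobj_tx_process().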

       In addition to the above API, libpmemobj(7) offers a more intuitive method of building transactions using
       the set of macros described below.  When using these macros, the complete  transaction  flow  looks  like
       this:

              TX_BEGIN(Pop) {
                  /* the actual transaction code goes here... */
              } TX_ONCOMMIT {
                  /*
                   * optional - executed only if the above block
                   * successfully completes
                   */
              } TX_ONABORT {
                  /*
                   * optional - executed only if starting the transaction fails,
                   * or if transaction is aborted by an error or a call to
                   * pmemobj_tx_abort()
                   */
              } TX_FINALLY {
                  /*
                   * optional - if exists, it is executed after
                   * TX_ONCOMMIT or TX_ONABORT block
                   */
              } TX_END /* mandatory */

              TX_BEGIN_PARAM(PMEMobjpool *pop, ...)
              TX_BEGIN_CB(PMEMobjpool *pop, cb, arg, ...)
              TX_BEGIN(PMEMobjpool *pop)

       The  TX_BEGIN_PARAM(),  TX_BEGIN_CB()  and  TX_BEGIN()  macros start a new transaction in the same way as
       pmemobj_tx_begin(), except that instead of the environment buffer provided by a caller, they set  up  the
       local  jmp_buf  buffer  and  use  it  to  catch  the  transaction  abort.   The TX_BEGIN() macro starts a
       transaction without any options.  TX_BEGIN_PARAM may be used when there is a need to acquire locks  prior
       to  starting a transaction (such as for a multi-threaded program) or set up a transaction stage callback.
       TX_BEGIN_CB is just a  wrapper  around  TX_BEGIN_PARAM  that  validates  the  callback  signature.   (For
       compatibility  there is also a TX_BEGIN_LOCK macro, which is an alias for TX_BEGIN_PARAM).  Each of these
       macros must be followed by a block of code with all the operations that are to be performed atomically.

       The TX_ONABORT macro starts a block of code that will be executed only if starting the transaction  fails
       due  to an error in pmemobj_tx_begin(), or if the transaction is aborted.  This block is optional, but in
       practice it should not be omitted.  If it is desirable to crash the application when a transaction aborts
       and  there  is  no  TX_ONABORT  section, the application can define the POBJ_TX_CRASH_ON_NO_ONABORT macro
       before inclusion of <libpmemobj.h>.   This  provides  a  default  TX_ONABORT  section  which  just  calls
       abort(3).

       The  TX_ONCOMMIT  macro  starts  a  block  of  code  that  will  be  executed  only if the transaction is
       successfully committed, which means that the execution of code in  the  TX_BEGIN()  block  has  not  been
       interrupted by an error or by a call to pmemobj_tx_abort().  This block is optional.

       The  TX_FINALLY  macro starts a block of code that will be executed regardless of whether the transaction
       is committed or aborted.  This block is optional.

       The TX_END macro cleans up and closes the transaction started by  the  TX_BEGIN()  /  TX_BEGIN_PARAM()  /
       TX_BEGIN_CB() macros.  It is mandatory to terminate each transaction with this macro.  If the transaction
       was aborted, errno is set appropriately.

   TRANSACTION LOG TUNING
       From the libpmemobj implementation perspective, there are two types of operations in a transaction:

       • snapshots, where the action must be persisted immediately,

       • intents, where the action can be persisted at the transaction commit phase

       pmemobj_tx_add_range(3) and all its variants belong to the snapshots group.

       pmemobj_tx_alloc(3) (with its variants), pmemobj_tx_free(3), pmemobj_tx_realloc(3)  (with  its  variants)
       and  pmemobj_tx_publish(3)  belong to the intents group.  Even though pmemobj_tx_alloc() allocates memory
       immediately, it modifies only the runtime state and postpones  persistent  memory  modifications  to  the
       commit  phase.   pmemobj_tx_free(3)  cannot  free the object immediately, because of possible transaction
       rollback, so it postpones both the action and  persistent  memory  modifications  to  the  commit  phase.
       pmemobj_tx_realloc(3)  is  just a combination of those two.  pmemobj_tx_publish(3) postpones reservations
       and deferred frees to the commit phase.

       Both types of operations (snapshots and intents) require libpmemobj to build a persistent log of
       operations.  The intent log (also known as a “redo log”) is applied on commit and the snapshot log (also
       known as an “undo log”) is applied on abort.

       When a libpmemobj transaction starts, it is not possible to predict how much persistent memory space
       will be needed for those logs.  This means that libpmemobj must internally allocate this space whenever
       it is needed.  This has two downsides:

       • when a transaction snapshots a lot of memory or performs many allocations, libpmemobj may need to do
         many internal allocations, which must be freed when the transaction ends, adding time overhead when
         big transactions are frequent,

       • transactions can start to fail because there is not enough space for the logs - this can be especially
         problematic for transactions that want to deallocate objects, as those might also fail

       To solve both of these problems libpmemobj exposes the following functions:

       • pmemobj_tx_log_append_buffer(),

       • pmemobj_tx_xlog_append_buffer(),

       • pmemobj_tx_log_auto_alloc()

       pmemobj_tx_log_append_buffer() appends a given range of memory [addr, addr + size) to the log of the
       given type in the current transaction.  type can be one of two values (with meanings described above):

       • TX_LOG_TYPE_SNAPSHOT,

       • TX_LOG_TYPE_INTENT

       The range of memory must belong to the same pool the transaction is on and must not be used by more  than
       one  thread at the same time.  The latter condition can be verified with tx.debug.verify_user_buffers ctl
       (see pmemobj_ctl_get(3)).

       The pmemobj_tx_xlog_append_buffer() function behaves exactly the same  as  pmemobj_tx_log_append_buffer()
       when flags equals zero.  flags is a bitmask of the following values:

       • POBJ_XLOG_APPEND_BUFFER_NO_ABORT  -  if  the  function  does  not  end  successfully,  do not abort the
         transaction.

       pmemobj_tx_log_snapshots_max_size() calculates the maximum size of a buffer able to hold nsizes
       snapshots, each of size sizes[i].  The application should not expect this function to return the same
       value between restarts.  In future versions of libpmemobj this function may return a smaller value
       (because of better accuracy or space optimizations) or a larger value (because of the higher alignment
       required for better performance).  This function is independent of the transaction stage and can be
       called both inside and outside of a transaction.  If the returned value S is greater than
       PMEMOBJ_MAX_ALLOC_SIZE, the buffer should be split into N chunks of size PMEMOBJ_MAX_ALLOC_SIZE, where N
       is equal to (S / PMEMOBJ_MAX_ALLOC_SIZE) rounded down, plus a last chunk of size
       (S - (N * PMEMOBJ_MAX_ALLOC_SIZE)).
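
       The chunk arithmetic can be sketched as a small helper.  The names struct log_chunks and
       split_log_buffer() are hypothetical, and the max_chunk parameter stands in for PMEMOBJ_MAX_ALLOC_SIZE so
       that the sketch compiles without <libpmemobj.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Result of splitting a log buffer of S bytes (hypothetical helper type). */
struct log_chunks {
    size_t full_chunks; /* N: number of chunks of exactly max_chunk bytes */
    size_t last_chunk;  /* size of the trailing chunk; 0 if S divides evenly */
};

/*
 * Splits `total` bytes into chunks no larger than `max_chunk`.
 * `max_chunk` plays the role of PMEMOBJ_MAX_ALLOC_SIZE.
 */
struct log_chunks
split_log_buffer(size_t total, size_t max_chunk)
{
    struct log_chunks c;

    assert(max_chunk > 0);
    c.full_chunks = total / max_chunk;                /* integer division rounds down */
    c.last_chunk = total - c.full_chunks * max_chunk; /* S - (N * max_chunk) */
    return c;
}
```

       For example, a total of 2500 bytes with a 1000-byte limit gives two full chunks and a last chunk of 500
       bytes.  When the total is an exact multiple of the limit the last chunk has size 0 and can be skipped.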

       pmemobj_tx_log_intents_max_size() calculates the maximum size of a buffer able to hold nintents intents.
       Just like with pmemobj_tx_log_snapshots_max_size(), the application should not expect this function to
       return the same value between restarts, for the same reasons.  This function is independent of the
       transaction stage and can be called both inside and outside of a transaction.

       pmemobj_tx_log_auto_alloc() disables (on_off set to 0) or enables (on_off set to 1) automatic allocation
       of internal logs of the given type.  It can be used to verify that the buffer set with
       pmemobj_tx_log_append_buffer() is big enough to hold the log, without running into an out-of-space
       scenario.

       The pmemobj_tx_set_user_data() function associates custom volatile state, represented by the pointer
       data, with the current transaction.  This state can later be retrieved using the
       pmemobj_tx_get_user_data() function.  If pmemobj_tx_set_user_data() was not called for the current
       transaction, pmemobj_tx_get_user_data() will return NULL.  These functions must be called during
       TX_STAGE_WORK, TX_STAGE_ONABORT, TX_STAGE_ONCOMMIT or TX_STAGE_FINALLY.

       pmemobj_tx_set_failure_behavior() specifies what should happen in case of an error within the
       transaction.  It only affects functions which take a NO_ABORT flag.  If pmemobj_tx_set_failure_behavior()
       is called with POBJ_TX_FAILURE_RETURN, a NO_ABORT flag is implicitly passed to all functions which accept
       this flag.  If called with POBJ_TX_FAILURE_ABORT, all functions abort the transaction (unless a NO_ABORT
       flag is passed explicitly).  This setting is inherited by inner transactions.  It does not affect any of
       the outer transactions.  Aborting on failure is the default behavior.  pmemobj_tx_get_failure_behavior()
       returns the failure behavior for the current transaction.  Both pmemobj_tx_set_failure_behavior() and
       pmemobj_tx_get_failure_behavior() must be called during TX_STAGE_WORK.

RETURN VALUE

       The pmemobj_tx_stage() function returns the current transaction stage for the calling thread.

       On success, pmemobj_tx_begin() returns 0.  Otherwise, an error number is returned.

       The pmemobj_tx_lock() function returns zero if lockp is successfully added to the transaction.
       Otherwise, an error number is returned.

       The pmemobj_tx_xlock() function returns zero if lockp is successfully added to the transaction.
       Otherwise, the error number is returned, errno is set and, when flags does not contain
       POBJ_XLOCK_NO_ABORT, the transaction is aborted.

       The pmemobj_tx_abort() and pmemobj_tx_commit() functions return no value.

       The  pmemobj_tx_end()  function  returns  0  if the transaction was successful.  Otherwise it returns the
       error code set by pmemobj_tx_abort().  Note that pmemobj_tx_abort()  can  be  called  internally  by  the
       library.

       The pmemobj_tx_errno() function returns the error code of the last transaction.

       The pmemobj_tx_process() function returns no value.

       On success, pmemobj_tx_log_append_buffer() returns 0.  Otherwise, the stage is changed to
       TX_STAGE_ONABORT, errno is set appropriately and the transaction is aborted.

       On success, pmemobj_tx_xlog_append_buffer() returns 0.  Otherwise, the error number is returned, errno is
       set and, when flags does not contain POBJ_XLOG_APPEND_BUFFER_NO_ABORT, the transaction is aborted.

       On  success,  pmemobj_tx_log_auto_alloc()  returns 0.  Otherwise, the transaction is aborted and an error
       number is returned.

       On success, pmemobj_tx_log_snapshots_max_size() returns the size of the buffer.  On failure it returns
       SIZE_MAX and sets errno appropriately.

       On success, pmemobj_tx_log_intents_max_size() returns the size of the buffer.  On failure it returns
       SIZE_MAX and sets errno appropriately.

CAVEATS

       Transaction flow control is governed by the setjmp(3) and longjmp(3) macros, and they are  used  in  both
       the  macro and function flavors of the API.  The transaction will longjmp on transaction abort.  This has
       one major drawback, which is described in the ISO C standard  subsection  7.13.2.1.   It  says  that  the
       values  of  objects  of  automatic  storage duration that are local to the function containing the setjmp
       invocation that do not have volatile-qualified type and have been changed between the  setjmp  invocation
       and longjmp call are indeterminate.

       The following example illustrates the issue described above.

              int *bad_example_1 = (int *)0xBAADF00D;
              int *bad_example_2 = (int *)0xBAADF00D;
              int *bad_example_3 = (int *)0xBAADF00D;
              int * volatile good_example = (int *)0xBAADF00D;

              TX_BEGIN(pop) {
                  bad_example_1 = malloc(sizeof(int));
                  bad_example_2 = malloc(sizeof(int));
                  bad_example_3 = malloc(sizeof(int));
                  good_example = malloc(sizeof(int));

                  /* manual or library abort called here */
                  pmemobj_tx_abort(EINVAL);
              } TX_ONCOMMIT {
                  /*
                   * This section is longjmp-safe
                   */
              } TX_ONABORT {
                  /*
                   * This section is not longjmp-safe
                   */
                  free(good_example); /* OK */
                  free(bad_example_1); /* undefined behavior */
              } TX_FINALLY {
                  /*
                   * This section is not longjmp-safe on transaction abort only
                   */
                  free(bad_example_2); /* undefined behavior */
              } TX_END

              free(bad_example_3); /* undefined behavior */

       Objects which are not volatile-qualified, are of automatic storage duration and have been changed between
       the invocations of setjmp(3) and longjmp(3) (that also means within the work section of  the  transaction
       after TX_BEGIN()) should not be used after a transaction abort, or should be used with utmost care.  This
       also includes code after the TX_END macro.
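
       The effect of the volatile qualifier can be reproduced with the standard library alone.  The sketch
       below is a minimal model in which fake_abort() is a hypothetical stand-in for a transaction abort; it
       does not use libpmemobj.  Only the volatile-qualified local is guaranteed to hold the value stored
       between setjmp(3) and longjmp(3).

```c
#include <setjmp.h>

static jmp_buf abort_env;

/* Stands in for a transaction abort: jumps back to the setjmp() site. */
static void
fake_abort(void)
{
    longjmp(abort_env, 1);
}

/*
 * Returns the value observed after the jump.  Because `seen` is
 * volatile-qualified, the store made before longjmp() is guaranteed
 * to be visible; without volatile the value would be indeterminate.
 */
int
value_after_abort(void)
{
    volatile int seen = 0;

    if (setjmp(abort_env) == 0) {
        seen = 42;      /* modified between setjmp() and longjmp() */
        fake_abort();   /* does not return */
    }
    return seen;
}
```

       The same reasoning applies to locals assigned inside a TX_BEGIN() block and read in TX_ONABORT,
       TX_FINALLY or after TX_END.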

       libpmemobj(7) is not cancellation-safe.  The pool will never be corrupted because of a  canceled  thread,
       but  other  threads  may  stall  waiting  on locks taken by that thread.  If the application wants to use
       pthread_cancel(3),  it  must  disable  cancellation  before   calling   any   libpmemobj(7)   APIs   (see
       pthread_setcancelstate(3)   with   PTHREAD_CANCEL_DISABLE),   and  re-enable  it  afterwards.   Deferring
       cancellation  (pthread_setcanceltype(3)  with  PTHREAD_CANCEL_DEFERRED)  is  not  safe  enough,   because
       libpmemobj(7) internally may call functions that are specified as cancellation points in POSIX.

       libpmemobj(7)  relies  on the library destructor being called from the main thread.  For this reason, all
       functions that might trigger destruction (e.g. dlclose(3)) should be called in the main thread.
       Otherwise some of the resources associated with that thread might not be cleaned up properly.

SEE ALSO

       dlclose(3),    longjmp(3),   pmemobj_tx_add_range(3),   pmemobj_tx_alloc(3),   pthread_setcancelstate(3),
       pthread_setcanceltype(3), setjmp(3), libpmemobj(7) and <https://pmem.io>