Ubuntu Manpage: elektra-algorithm - core algorithm of elektra

name
introduction
kdbget
kdbset}

NAME

       elektra-algorithm - core algorithm of elektra

       You   might   want  to  read  about  architecture  elektra-architecture.md.  and  data  structures  first
       elektra-data-structures.md.

Introduction

In this section, we will explain the heart of Elektra. kdbOpen() is responsible for the setup and the
construction of the data structures needed later. kdbGet() does, together with the plugins, all actions
necessary to read in the configuration. kdbSet() orchestrates the plugins to write out the configuration
correctly. kdbClose() finally frees all previously allocated data structures.

kdbOpen
kdbOpen() retrieves the \empha{mount point configuration} with kdbGet() using the \empha{default
backend}. During this process, the function sets up the data structures which are needed for later
invocations of kdbGet() or kdbSet(). All backends are opened and mounted in the appropriate parts of the
key hierarchy. The resulting backends are added both to the Split and the Trie object. kdbOpen() finally
returns a KDB object that contains all this information as shown in Figure~\ref{fig:architecture}.

The reading of the mount point configuration and the consequential self configuring of the system is
called \intro{bootstrapping}. Elektra builds itself up from the simple variant with a default backend
only to the sophisticated configuration system presented in this thesis.

kdbOpen() creates a Split object. It adds all backend handles and parentKeys during bootstrapping. So the
buildup of the Split object takes place once. The resulting object is then used for both kdbGet() and
kdbSet(). This approach is much better testable because the Split object is first initialised using the
mount point configuration -- separated from the filtering of the backends for every specific kdbGet() and
kdbSet() request.

Afterwards the key hierarchy is static. Every application using Elektra will build up the same key
database. Application specific mount points are prohibited because changes of mount points would destroy
the global key database. Elektra could not guarantee that every application retrieves the same
configuration with the same key names any longer.

In kdbOpen(), nearly no checks are done regarding the expected behaviour of the backend. The contract
checker guarantees that only appropriate mount points are written into the mount point configuration.
kdbOpen() checks only if the opening of plugin was successful. If not, the backend enclosing the plugin
is not mounted at all.

Removing Keys
In Elektra version $0.6$, removing keys was an explicit request. Only a single Key object could be
removed from the database. For configuration files this method is inapplicable. For filesys, however, it
was easy to implement.

In Elektra version $0.7$, the behaviour changed. Removing keys was integrated into kdbSet(). The user
tagged keys that should be removed. After the next kdbSet(), these keys were removed from the key
database. On the one hand, backends writing configuration files simply ignored the keys marked for
removal. On the other hand, filesys needed that information to remove the files. To make this approach
work for filesys, the marked keys were located at the very end of the KeySet and sorted in reverse. With
this trick, recursive removing worked well. But this approach had major defects in the usage of KeySet.
Because marking a key to be removed changed the sort order of the key set \method{ksLookupByName()} did
not find this key anymore.

So in the present version removing keys is consistent again. A KeySet describes the current
configuration. The user can reduce the KeySet object by \empha[pop]{popping} keys out. The kdbSet()
function applies exactly this configuration as specified by the key set to the key database. Contrary to
the previous versions, the popped keys of the key set will be permanently removed.

The new circumstance yields idempotent properties for kdbSet(). The same KeySet can be applied multiple
times, but after the first time, the key database will not be changed anymore. (Note that kdbSet())
actually detects that there are no changes and will do nothing. To actually show the idempotent behaviour
the KeySet has to be regenerated or the key database needs to be reopened.

It is, however, not known if keys should be removed permanently only by investigating the KeySet. But
only if this knowledge is present, the core can decide if the key set needs to be written out or if the
configuration is unchanged. So we decided to track how many keys are delivered in kdbGet(). If the size
of the KeySet is lower than this number determined at the previous kdbGet(), Elektra´s core knows that
some keys were popped. Hence, the next kdbSet() invocation needs to change the concerned key database.

The situation is now much clearer. The semantics of popping a key will result in removing the key from
the key database. And the intuitive idea that a KeySet will be applied to the key database is correct
again.

kdbGet

       It is critical for application startup time to retrieve the configuration as fast as possible. Hence, the
       design  goal  of  the  kdbGet() algorithm is to be efficient while still enabling plugins to have relaxed
       postconditions. To achieve this, the sequence of \intro[syscall]{syscalls} must be optimal. On the  other
       hand,  it  is  not  tolerable to waste time or memory inside Elektra´s core, especially during an initial
       request or when no update is available.

       The synopsis of the function is:

           int kdbGet(KDB *handle, KeySet *returned, Key * parentKey);

       The user passes a key set, called returned. If the user invokes kdbGet() the first time, he or  she  will
       usually  pass  an  empty  key  set. If the user wants to update the application´s settings, returned will
       typically contain the configuration of the previous kdbGet() request. The parentKey holds the information
       below which key the configuration should be retrieved. The handle contains the data structures needed for
       the algorithm, like the Split and the Trie objects, as shown in Figure~\ref{fig:architecture}.

       kdbGet() does a  rather  easy  job,  because  kdbSet()  already  guarantees  that  only  well  formatted,
       non-corrupted  and  well-typed configuration is written out in the key database. The task is to query all
       backends in question for their configuration and then merge everything.

   Responsibility
       A backend may yield keys that it is not responsible for. It is not possible for a backend  to  know  that
       another backend has been mounted below and the other backend is now responsible for some of the keys that
       are still in the storage. Additionally, plugins are not able to determine if they are responsible  for  a
       key or not. Consequently, it can happen that more than one backend delivers a key with the same name.

       kdbGet()  ensures  that  a  key is uniquely identified by its name. Elektra´s core will pop keys that are
       outside of the backend´s responsibility. Hence, these keys will not be passed to the user and we get  the
       desired behaviour: The nearest mounted backend to the key is responsible.

       For  example,  a  generator  plugin  in  the  backend  A always emits following keys\footnote{(A) and (B)
       indicate from which backend the key comes  from.}:  \begin{lstlisting}[language=]  user/sw/generator/akey
       (A)  user/sw/generator/dir  (A)  user/sw/generator/dir/outside1  (A)  user/sw/generator/dir/outside2  (A)
       \end{lstlisting} It will still return these keys even if the plugin is not responsible for some  of  them
       anymore.  This  can  happen  if  another  backend B is mounted to \keyname{user/sw/generator/dir}. In the
       example  it  yields  the  following   keys:   \begin{lstlisting}[language=]   user/sw/generator/dir   (B)
       user/sw/generator/dir/new    (B)   user/sw/generator/dir/outside1   (B)   user/sw/generator/outside   (B)
       \end{lstlisting} In this situation kdbGet()  is  responsible  to  pop  all  three  keys  at,  and  below,
       \keyname{user/sw/generator/dir}  of  backend A and the key \keyname{user/sw/generator/outside} of backend
       B. The user will get the resulting  key  set:  \begin{lstlisting}[language=]  user/sw/generator/akey  (A)
       user/sw/generator/dir    (B)    user/sw/generator/dir/new    (B)    user/sw/generator/dir/outside1    (B)
       \end{lstlisting} Note that the key exactly  at  the  mount  point  comes  from  the  backend  mounted  at
       \keyname{user/sw/generator/dir}.

   Sequence
       kdbOpen() already creates a Split object for the whole configuration tree. In this object, kdbOpen() will
       append a list of all backends available. A specific kdbGet() request usually includes only a part of  the
       configuration. For example, the user is only interested in keys below \keyname{user/sw/apps/userapp}. All
       backends that cannot contribute to configuration below \keyname{user/sw/apps/userapp} will be omitted for
       that  request.  To  achieve this, parts of the Split object are filtered out. After this step we know the
       list of backends involved. The Split object allocates a key set for each of these backends.

       Afterwards the first plugin of each backend is called to determine if an update is needed. If  no  update
       is needed, the algorithm has finished and returns zero.

       Now  we  know  which  backends do not need an update. For these backends, the previous configuration from
       returned is appointed from to the key  sets  of  the  Split  object.  The  algorithm  will  not  set  the
       \empha{syncbits}  of  the  Split  object  for  these backends because the storage of the backends already
       contains up-to-date configuration.

       The other backends will be requested to \emph{retrieve} their configuration.  The  initial  empty  KeySet
       from  the  Split  object  and  the  relevant  file  name in the key value of parentKey are passed to each
       remaining plugin. The plugins extend, validate and process the key set. When an error has  occurred,  the
       algorithm can stop immediately because the user´s KeySet returned is not changed at this point. When this
       part finishes, the Split object contains the whole requested configuration separated in various key sets.

       Subsequently the freshly received keys need some \emph{post-processing}:

       •   Newly allocated keys in Elektra always have the \empha{sync flag} set. Because the  plugins  allocate
           and modify keys with the same functions as the user, the returned keys will also have their sync flag
           set. But from the user´s point of view the configuration is unmodified. So some code needs to  remove
           this sync flag. To relax the post conditions of the plugins, kdbGet() removes it.

       •   To  detect  removed keys in subsequent kdbSet() calls, kdbGet() needs to store the number of received
           keys of each backend.

       •   Additionally, for every key it is checked if it belongs to this backend. This makes sure  that  every
           key  comes  from  a single source only as designated by the Trie. In this process, all duplicated and
           overlapping keys will be  popped  in  favour  of  the  responsible  backend  as  described  below  in
           responsibility.

       The last step is to \emph{merge} all these key sets together. This step changes the configuration visible
       to the user. After some cleanup the algorithm finally finishes.

   Updating Configuration
       The user can call kdbGet() often even if the configuration or parts of it are already up  to  date.  This
       can happen when applications reread configuration in some events. Examples are signals\footnote{SIGHUP is
       the signal used for that on UNIX systems. It is sent when the program´s controlling terminal  is  closed.
       Daemons do not have a terminal so the signal is reused for reloading configuration.}, notifications, user
       requests and in the worst case periodical attempts to reread configuration.

       The given goal is to keep the sequence of needed syscalls low. If no update is needed, it  is  sufficient
       to  request  the  timestamp\footnote{On  POSIX systems using \lstinline{stat()}.} of every file. No other
       syscall is needed. Elektra´s core alone cannot check that because getting  a  timestamp  is  not  defined
       within the standard C99. So instead the resolver plugin handles this problem. The resolver plugin returns
       0 if nothing has changed.

       This decision yields some advantages. Both the storage plugins and Elektra´s core  can  conform  to  C99.
       Because  the  resolver plugin is the very first in the chain of plugins, it is guaranteed that no useless
       work is done.

   Initial kdbGet Problem
       Because Elektra provides  self-contained  configuration,  kdbOpen()  has  to  retrieve  settings  in  the
       \empha{bootstrapping}  process  below  \keyname{system/elektra}  as  explained in \secref{bootstrapping}.
       Because of the new way to keep track of removed keys, the internally executed kdbGet() creates a problem.
       Without   countermeasures  even  the  first  kdbGet()  of  a  user  requesting  the  configuration  below
       \keyname{system/elektra} fails because the resolver finds out that the configuration  is  already  up  to
       date.  The  configuration  delivered  by  the  user  is  empty  at  this  point.  As  a result, the empty
       configuration will be appointed and returned to the user.

       A simple way to resolve this issue is to reload the default backend after the internal configuration  was
       fetched. Reloading resets the timestamps and kdbGet() works as expected.

kdbSet}

       Not  the  performance,  but  robust  and reliable behaviour is the most important issue for kdbSet(). The
       design was chosen so that some additional in-memory comparisons are preferred to a suboptimal sequence of
       \empha[syscall]{syscalls}.  The  algorithm  makes  sure that keys are written out only if it is necessary
       because applications can call kdbSet() with an unchanged KeySet. For the code to decide this, performance
       is important.

   Properties
       kdbSet() \empha[guarantee]{guarantees} the following properties:

       \begin{enumerate}

       \item Modifications to permanent storage are only made when the configuration was changed.

       \item  When  errors  occur,  every  plugin  gets  a  chance  to  rollback  its  changes  as  described in
       \secref{exception safety}.

       \item If every plugin does this correctly, the whole KeySet is propagated to permanent storage. Otherwise
       nothing is changed in the key database. Plugins developed during the thesis meet this requirement.

       \end{enumerate}

       The  synopsis  of  the  function  is:  \begin{lstlisting}  int  kdbSet(KDB  handle, KeySetreturned, Key *
       parentKey); \end{lstlisting}

       The user passes the configuration using the KeySet returned. The key set will not be changed by kdbSet().
       The  parentKey  provides  a way to limit which part of the configuration is written out. For example, the
       parentKey \keyname{user/sw/apps/myapp} will induce kdbSet()  to  only  modify  the  key  databases  below
       \keyname{user/sw/apps/myapp}  even if the KeySet returned also contains more configuration. Note that all
       backends with no keys in returned but that are  below  parentKey  will  completely  wipe  out  their  key
       database.    The    KDB    handle    contains    the    necessary    data    structures   as   shown   in
       Figure~\ref{fig:architecture}.

       \newpage

   Search for Changes
       \begin{wrapfigure}{r}{0.5\textwidth} \vspace{-40pt} \begin{center} \includegraphics[trim  =  0  65  0  0,
       clip=true,  width=6cm]{resolver_set} \end{center} \vspace{-20pt} \caption{\lstinline{kdbSet()} Algorithm}
       \label{fig:resolver_set} \vspace{-10pt} \end{wrapfigure}

       As a first step, kdbSet() \emph{divides} the configuration passed in by the user to the key sets  in  the
       Split  object. kdbSet() searches for every key if the \empha{sync flag} is checked. Then kdbSet() decides
       if a key was removed from a backend by comparing the actual size of the key set with the size stored from
       the last kdbGet() call. We see that it is necessary to call kdbGet() first before invocations of kdbSet()
       are allowed.

       We know that data of a backend has to be written out if at least one key was changed or  removed.  If  no
       backend  has any changes, the algorithm will terminate at this point. The careful reader notices that the
       process involves no file operations.

   Duplicated Key Sets
       If some backends need synchronisation, the algorithm continues by filtering out all backends in the Split
       object  that  do  not  have  changes.  At  this point, the Split object has a list of backends with their
       respective key sets.

       \label{deep duplicate}

       Plugins in kdbSet() can change values. Other than in kdbGet(),  the  user  is  not  interested  in  these
       changes.  Instead,  the  values  are  transformed  to  be suitable for the storage. To make sure that the
       changed values are not passed to the user, the algorithm continues with a \empha{deep duplication} of all
       key sets in the Split object.

   Resolver
       All plugins of each included backend are executed one by one up to the resolver plugin. If this succeeds,
       the  resolver  plugin  is  responsible  for  committing  these  changes.  After  the  successful  commit,
       \empha[error  code]{error  codes}  of  plugins  are  ignored.  Only  logging and notification plugins are
       affected.

   Atomic Replacement
       For this thesis only file-based storage with atomic properties were developed. The replacement of a  file
       with another file that has not yet been written is not trivial. The straightforward way is to lock a file
       and start writing to it. But this approach can result in broken or partially  finished  files  in  events
       like ´´out of disc space´´, signals or other asynchronous aborts of the program.

       A  temporary  file  solves this problem, because in problematic events the original file stays untouched.
       When the temporary file is written out properly, it is renamed and the  original  configuration  file  is
       overwritten.  But  another  concurrent invocation of kdbSet() can try to do the same with the result that
       one of the newly written files is lost.

       To avoid this problem, locks are needed again. It is not possible to lock the configuration  file  itself
       because  it  will  be unlinked when the temporary file is renamed. So a third file for locking is needed.
       The resolver currently implements this approach.

       An alternative to this approach without locks is  to  completely  rely  on  the  modification  time.  The
       modification  time  typically  has  only a resolution of one second. So any changes within that time slot
       will not be recognised. For this approach, however, the name of  every  temporary  file  must  be  unique
       because  concurrent kdbSet() invocations each try to create one. The temporary file must also be unlinked
       in case of a rollback. The opened temporary file can be passed to the storage plugins using a  file  name
       in  the  directory  \keyname{/dev/fd}. This approach may be more practical than the currently implemented
       way because it does not need the additional lock file\footnote{Nevertheless, the other way was chosen  to
       test if the algorithm is exception safe as described in \secref{exception safety}.}.

   Errors
       The  plugins  within  kdbSet()  can fail for a variety of reasons. \intro[conflict]{Conflicts} occur most
       frequently. A conflict means that during executions of kdbGet() and kdbSet() another program has  changed
       the  key  database.  In  order  not  to lose any data, kdbSet() fails without doing anything. In conflict
       situations Elektra leaves the programmer no choice. The programmer  has  to  retrieve  the  configuration
       using  kdbGet()  again  to be up to date with the key database. Afterwards it is up to the application to
       decide which configuration to use. In this situation it is the best to ask the user, by showing  him  the
       description and reason of the error, how to continue:

       \begin{enumerate}

       \label{conflicts}

       \item Save the configuration again. The changes of the other program will be lost in this case.

       \item The key database can also be left unchanged as the other program wrote it. After using kdbGet() the
       application is already up to date with the new configuration. All configuration  changes  the  user  made
       before will be lost.

       \item  The application can try to merge the key sets to get the best result. If no key is changed on both
       sides the result is clear, otherwise the application has to decide if the own or the other  configuration
       should be favoured. The result of the merged key sets has to be written out with kdbSet().

       Merging  the  key  sets can be done with ksAppend(). The source parameter is the preferred configuration.
       Note that the downside of the third option is that a  merged  configuration  can  be  an  not  validating
       configuration.

       \end{enumerate}

       Sometimes  a  concrete key causes the problem that the whole key set cannot be stored. That can happen on
       validation or because of type errors. Such errors are usually caused by a mistake made by  the  user.  So
       the  user  is  responsible  for  changing  the  settings  to make it valid again. In such situations, the
       \empha{internal cursor} of the KeySet returned will point to the problematic key.

       A completely different approach is to export the configuration when kdbSet() returned an error code.  The
       user  can  then  edit, change or merge this configuration with more powerful tools. Finally, the user can
       import the configuration into the global  key  database.  The  export  and  import  mechanism  is  called
       ´´streaming´´ and will be explained in \secref{streaming}.

                                                  November 2015                             ELEKTRA-ALGORITHM(7)