Provided by: manpages_2.77-1_all

NAME

       epoll - I/O event notification facility

SYNOPSIS

       #include <sys/epoll.h>

DESCRIPTION

       epoll  is  a  variant  of  poll(2)  that can be used either as an edge-
       triggered or a level-triggered  interface  and  scales  well  to  large
       numbers  of  watched file descriptors.  Three system calls are provided
       to set up and control  an  epoll  set:  epoll_create(2),  epoll_ctl(2),
       epoll_wait(2).

       An   epoll   set   is   connected  to  a  file  descriptor  created  by
       epoll_create(2).   Interest  for  certain  file  descriptors  is   then
       registered  via  epoll_ctl(2).   Finally, the actual wait is started by
       epoll_wait(2).
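
       The three calls can be sketched end to end as follows.  This is a
       minimal illustration rather than a reference implementation: the
       helper name wait_for_pipe_data(), the use of a pipe as the watched
       descriptor, and the 1000 ms timeout are assumptions made for the
       example.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of ready descriptors reported
 * by epoll_wait(2) for a pipe holding one byte, or -1 on error. */
int wait_for_pipe_data(void)
{
    int pipefd[2], epfd, n = -1;
    struct epoll_event ev, ready[1];

    if (pipe(pipefd) < 0)
        return -1;

    /* 1. Create the epoll set; the size argument is only a hint. */
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_pipe;

    /* 2. Register interest in readability of the pipe's read end. */
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_epfd;

    /* Make the read end ready by writing one byte. */
    if (write(pipefd[1], "x", 1) != 1)
        goto out_epfd;

    /* 3. Wait (at most 1000 ms) for events on the registered descriptors. */
    n = epoll_wait(epfd, ready, 1, 1000);
    if (n == 1 && ready[0].data.fd != pipefd[0])
        n = -1;                 /* unexpected descriptor reported */

out_epfd:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

       Called from a test program, wait_for_pipe_data() is expected to
       return 1, because the single registered descriptor is readable.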

   Level-Triggered and Edge-Triggered
       The epoll event distribution interface can behave as either edge-
       triggered (ET) or level-triggered (LT).  The difference between the
       two mechanisms can be described as follows.  Suppose the following
       scenario occurs:

       1. The file descriptor that represents the read side of a pipe (rfd) is
          registered with the epoll set.

       2. A pipe writer writes 2 kB of data to the write side of the pipe.

       3. A call to epoll_wait(2) is done that will return rfd as a ready file
          descriptor.

       4. The pipe reader reads 1 kB of data from rfd.

       5. A call to epoll_wait(2) is done.

       If the rfd file descriptor has been added to the epoll interface using
       the EPOLLET flag, the call to epoll_wait(2) in step 5 will probably
       hang even though data is still available in the file input buffer;
       meanwhile, the remote peer might be expecting a response based on the
       data it has already sent.  The reason is that edge-triggered mode
       delivers events only when changes occur on the monitored file
       descriptor.  In the above example, an event on rfd is generated by the
       write in step 2 and consumed in step 3.  Since the read in step 4 does
       not consume all of the buffered data, the call to epoll_wait(2) in
       step 5 might block indefinitely.
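
       The five steps can be replayed on a real pipe.  The sketch below is
       illustrative only: the helper name second_wait_result() is
       hypothetical, and a 100 ms timeout stands in for the indefinite
       block so that the function always returns.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: replays steps 1-5 on a fresh pipe.  Pass EPOLLET
 * for edge-triggered or 0 for level-triggered registration.  Returns the
 * result of the epoll_wait(2) call in step 5, or -1 on error. */
int second_wait_result(unsigned int extra_flags)
{
    int pipefd[2], epfd, n = -1;
    char buf[2048] = {0};
    struct epoll_event ev, ready[1];

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_pipe;

    /* Step 1: register the read side (rfd) with the epoll set. */
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_epfd;

    /* Step 2: the writer writes 2 kB. */
    if (write(pipefd[1], buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        goto out_epfd;

    /* Step 3: rfd is reported as ready. */
    if (epoll_wait(epfd, ready, 1, 100) != 1)
        goto out_epfd;

    /* Step 4: the reader consumes only 1 kB. */
    if (read(pipefd[0], buf, 1024) != 1024)
        goto out_epfd;

    /* Step 5: wait again while 1 kB is still buffered. */
    n = epoll_wait(epfd, ready, 1, 100);

out_epfd:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

       With EPOLLET the step 5 call is expected to time out and return 0,
       since no new change has occurred on rfd.  With extra_flags set to 0
       (level-triggered), the same call is expected to return 1 because
       data is still available.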

       An application that employs the EPOLLET  flag  (edge-triggered)  should
       use  non-blocking  file  descriptors to avoid having a blocking read or
       write starve a task that is handling multiple  file  descriptors.   The
       suggested  way to use epoll as an edge-triggered (EPOLLET) interface is
       as follows:

              i      with non-blocking file descriptors

              ii     by waiting for an event only after read(2) or write(2)
                     returns EAGAIN.
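
       Points i and ii can be sketched as a pair of helpers.  Both names
       (set_nonblocking() and drain_fd()) are hypothetical, and the 512-byte
       buffer is an arbitrary choice; a real application would process the
       data instead of discarding it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (point i): put fd into non-blocking mode.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper (point ii): read from a non-blocking fd until
 * read(2) fails with EAGAIN or reports end-of-file.  Only then is it
 * safe to go back to epoll_wait(2).  Returns the number of bytes
 * consumed, or -1 on any other error. */
long drain_fd(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;         /* data would be processed here */
            continue;
        }
        if (n == 0)
            return total;       /* end-of-file */
        if (errno == EINTR)
            continue;           /* interrupted by a signal; retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;       /* input exhausted; wait for next event */
        return -1;
    }
}
```

       For a non-blocking pipe holding 2048 bytes, drain_fd() is expected
       to return 2048, and an immediate second call to return 0.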

       By  contrast, when used as a level-triggered interface, epoll is simply
       a faster poll(2), and can be used wherever the latter is used since  it
       shares the same semantics.

       Since even edge-triggered epoll can generate multiple events upon
       receipt of multiple chunks of data, the caller has the option to
       specify the EPOLLONESHOT flag, which tells epoll to disable the
       associated file descriptor after an event for it has been received
       with epoll_wait(2).  When the EPOLLONESHOT flag is specified, it is
       the caller’s responsibility to rearm the file descriptor using
       epoll_ctl(2) with EPOLL_CTL_MOD.
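
       The disable-and-rearm cycle might look like the following sketch.
       The helper names rearm_oneshot() and oneshot_demo() are
       hypothetical, and the 100 ms timeouts exist only so the
       demonstration terminates.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: re-enable a descriptor that EPOLLONESHOT has
 * disabled.  Returns the result of epoll_ctl(2). */
int rearm_oneshot(int epfd, int fd, unsigned int events)
{
    struct epoll_event ev;

    ev.events = events | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Hypothetical demo: returns 0 if a one-shot descriptor is reported
 * once, stays silent while disabled, and is reported again after being
 * rearmed; returns -1 otherwise. */
int oneshot_demo(void)
{
    int pipefd[2], epfd, first, disarmed, rearmed, ok = -1;
    struct epoll_event ev, ready[1];

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_pipe;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_epfd;
    if (write(pipefd[1], "x", 1) != 1)
        goto out_epfd;

    /* The event is delivered once; the descriptor is then disabled. */
    first = epoll_wait(epfd, ready, 1, 100);
    /* The byte is still unread, but nothing is reported while disabled. */
    disarmed = epoll_wait(epfd, ready, 1, 100);

    if (rearm_oneshot(epfd, pipefd[0], EPOLLIN) < 0)
        goto out_epfd;
    /* After rearming, the unread byte is reported again. */
    rearmed = epoll_wait(epfd, ready, 1, 100);

    if (first == 1 && disarmed == 0 && rearmed == 1)
        ok = 0;

out_epfd:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return ok;
}
```

       In a multi-threaded server, the thread that received the event would
       typically call rearm_oneshot() only after it has finished handling
       the descriptor, so that no other thread is woken for it in the
       meantime.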

   Example for Suggested Usage
       While the usage of epoll as a level-triggered interface has the same
       semantics as poll(2), edge-triggered usage requires more clarification
       to avoid stalls in the application event loop.  In this example,
       listener is a non-blocking socket on which
       listen(2) has been called.  The function do_use_fd() uses the new ready
       file descriptor until EAGAIN is returned by either read(2) or write(2).
       An event-driven state machine application should, after having received
       EAGAIN,  record  its  current  state  so  that  at  the  next  call  to
       do_use_fd()  it  will  continue  to  read(2)  or write(2) from where it
       stopped before.

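       #define MAX_EVENTS 10
       struct epoll_event ev, events[MAX_EVENTS];
       struct sockaddr_storage local;
       socklen_t addrlen;
       int client, nfds, n;

       /* kdpfd is the epoll file descriptor; listener is assumed to have
          been added to it already with EPOLLIN. */
       for (;;) {
           nfds = epoll_wait(kdpfd, events, MAX_EVENTS, -1);
           if (nfds < 0) {
               perror("epoll_wait");
               return -1;
           }

           for (n = 0; n < nfds; ++n) {
               if (events[n].data.fd == listener) {
                   /* addrlen is a value-result argument: reset it
                      before every accept(2) call. */
                   addrlen = sizeof(local);
                   client = accept(listener, (struct sockaddr *) &local,
                                   &addrlen);
                   if (client < 0) {
                       perror("accept");
                       continue;
                   }
                   setnonblocking(client);
                   ev.events = EPOLLIN | EPOLLET;
                   ev.data.fd = client;
                   if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                       fprintf(stderr, "epoll set insertion error: fd=%d\n",
                               client);
                       close(client);   /* do not leak the new socket */
                       return -1;
                   }
               } else {
                   do_use_fd(events[n].data.fd);
               }
           }
       }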

       When used as an edge-triggered interface, for performance reasons, it
       is possible to add the file descriptor to the epoll interface
       (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT).  This avoids
       having to switch continuously between EPOLLIN and EPOLLOUT with
       repeated epoll_ctl(2) EPOLL_CTL_MOD calls.
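
       A single registration for both directions might be sketched as
       follows.  The helper names add_edge_rw() and
       events_after_peer_write() are hypothetical, and a UNIX socket pair
       is used only to keep the demonstration self-contained.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: register fd once for both input and output
 * readiness in edge-triggered mode. */
int add_edge_rw(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical demo: returns the event mask reported for one end of a
 * socket pair after its peer writes a byte, or 0 on error. */
unsigned int events_after_peer_write(void)
{
    int sv[2], epfd;
    unsigned int mask = 0;
    struct epoll_event ready[1];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_sock;
    if (add_edge_rw(epfd, sv[0]) < 0)
        goto out_epfd;

    /* A fresh socket is writable, so the first wait reports EPOLLOUT. */
    if (epoll_wait(epfd, ready, 1, 100) != 1)
        goto out_epfd;

    /* Incoming data is a new change on the descriptor, so it is
     * reported without any EPOLL_CTL_MOD call in between. */
    if (write(sv[1], "x", 1) != 1)
        goto out_epfd;
    if (epoll_wait(epfd, ready, 1, 100) == 1)
        mask = ready[0].events;

out_epfd:
    close(epfd);
out_sock:
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```

       The returned mask is expected to include EPOLLIN.  The application
       then tracks readability and writability itself, treating each
       direction as ready until the corresponding call returns EAGAIN.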

   Questions and Answers
       Q1     What happens if you add the same file descriptor to an epoll set
              twice?

       A1     You  will probably get EEXIST.  However, it is possible that two
              threads may add the same  file  descriptor  twice.   This  is  a
              harmless condition.

       Q2     Can  two  epoll  sets wait for the same file descriptor?  If so,
              are events reported to both epoll file descriptors?

       A2     Yes, and events would be reported to both.  However, it  is  not
              recommended.

       Q3     Is the epoll file descriptor itself poll/epoll/selectable?

       A3     Yes.

       Q4     What  happens  if  the epoll file descriptor is put into its own
              file descriptor set?

       A4     It will fail.  However, you can add  an  epoll  file  descriptor
              inside another epoll file descriptor set.

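       Q5     Can I send the epoll file descriptor over a UNIX domain socket
              to another process?

       A5     Yes, but it does not make sense to do this, since the receiving
              process would not have copies of the file descriptors in the
              epoll set.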

       Q6     Will closing a file descriptor cause it to be removed  from  all
              epoll sets automatically?

       A6     Yes.

       Q7     If  more  than one event occurs between epoll_wait(2) calls, are
              they combined or reported separately?

       A7     They will be combined.

       Q8     Does an operation  on  a  file  descriptor  affect  the  already
              collected but not yet reported events?

       A8     You  can  do  two  operations  on  an  existing file descriptor.
              Remove would be meaningless for this case.  Modify will  re-read
              available I/O.

       Q9     Do I need to continuously read/write a file descriptor until
              EAGAIN when using the EPOLLET flag (edge-triggered behavior)?

       A9     No, you don’t.  Receiving an event from epoll_wait(2) tells
              you that the file descriptor is ready for the requested I/O
              operation.  You simply have to consider it ready until the
              next read or write returns EAGAIN.  When and how you use the
              file descriptor is entirely up to you.  Also, the condition
              that the read/write I/O space is exhausted can be detected by
              checking the amount of data read from or written to the
              target file descriptor.  For example, if you call read(2)
              asking for a certain amount of data and read(2) returns fewer
              bytes, you can be sure that you have exhausted the read I/O
              space for that file descriptor.  The same is true when
              writing using write(2).
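
       Answers A1 and A4 can be checked with a small probe.  This is a
       sketch under the assumption that the caller supplies valid
       descriptors; the helper name try_add() is hypothetical.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical probe: try to add fd to the epoll set epfd.  Returns 0
 * on success, or the errno value set by epoll_ctl(2) on failure. */
int try_add(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return errno;
    return 0;
}
```

       Given two epoll file descriptors outer and inner and an ordinary
       descriptor fd, the expected results are: try_add(outer, fd)
       returns 0 the first time and EEXIST the second time (A1);
       try_add(outer, outer) returns EINVAL, while try_add(outer, inner)
       returns 0 (A4).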

   Possible Pitfalls and Ways to Avoid Them
       o Starvation (edge-triggered)

       If there is a large amount of I/O space, it is possible that by trying
       to drain it the other files will not get processed, causing
       starvation.  (This problem is not specific to epoll.)

       The  solution  is to maintain a ready list and mark the file descriptor
       as ready  in  its  associated  data  structure,  thereby  allowing  the
       application  to  remember  which  files  need to be processed but still
       round robin amongst all the ready files.  This also  supports  ignoring
       subsequent  events  you  receive  for file descriptors that are already
       ready.
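
       One possible shape for such a ready list is sketched below.  The
       struct conn type, the helper names, and the 256-byte CHUNK limit
       are all assumptions made for illustration; descriptors are assumed
       to be non-blocking in real use.

```c
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

#define CHUNK 256   /* hypothetical per-turn read limit */

/* Hypothetical per-descriptor state kept by the application. */
struct conn {
    int  fd;
    bool ready;     /* set on an epoll event; cleared on EAGAIN/EOF/error */
    long consumed;  /* bytes read so far */
};

/* Record an event from epoll_wait(2).  Marking an already-ready
 * connection again is harmless, so repeated events can be ignored. */
void mark_ready(struct conn *c)
{
    c->ready = true;
}

/* Give every ready connection one bounded read, round-robin, instead of
 * draining one descriptor completely.  Returns how many connections
 * are still marked ready after this pass. */
int service_ready(struct conn *conns, int nconns)
{
    char buf[CHUNK];
    int still_ready = 0;

    for (int i = 0; i < nconns; i++) {
        struct conn *c = &conns[i];
        ssize_t n;

        if (!c->ready)
            continue;
        n = read(c->fd, buf, sizeof(buf));
        if (n > 0) {
            c->consumed += n;       /* data would be processed here */
            still_ready++;
        } else if (n < 0 && errno == EINTR) {
            still_ready++;          /* interrupted; retry next pass */
        } else {
            c->ready = false;       /* EAGAIN, end-of-file, or error */
        }
    }
    return still_ready;
}
```

       The event loop would call service_ready() repeatedly and return to
       epoll_wait(2) with a blocking timeout only when it returns 0.  With
       one connection holding 1024 bytes and another holding 100, a single
       pass consumes 256 bytes from the first and all 100 from the second,
       so the busy connection does not starve the quiet one.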

       o If using an event cache...

       If you use an event cache or store all the file descriptors returned
       from epoll_wait(2), then make sure to provide a way to mark a
       descriptor’s closure dynamically (i.e., caused by a previous event’s
       processing).  Suppose you receive 100 events from epoll_wait(2), and
       in event #47 a condition causes the file descriptor of event #13 to
       be closed.  If you remove the structure and close(2) the file
       descriptor for event #13, then your event cache might still say there
       are events waiting for that file descriptor, causing confusion.

       One  solution  for  this is to call, during the processing of event 47,
       epoll_ctl(EPOLL_CTL_DEL) to delete file  descriptor  13  and  close(2),
       then  mark  its  associated  data structure as removed and link it to a
       cleanup list.  If you find another event for file descriptor 13 in your
       batch  processing,  you  will  discover  the  file  descriptor had been
       previously removed and there will be no confusion.
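
       The removal scheme can be sketched as follows.  Everything here is
       illustrative: struct conn, its peer field (which models "handling
       one event closes another descriptor"), and the helper names are
       assumptions, and events are expected to carry a struct conn pointer
       in data.ptr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor structure referenced by data.ptr. */
struct conn {
    int fd;
    bool removed;           /* set once fd has been deleted and closed */
    struct conn *peer;      /* connection to close when this one fires */
    struct conn *next_dead; /* link in the cleanup list */
};

/* Delete c from the epoll set, close it, mark it as removed, and link
 * it to the cleanup list instead of freeing it immediately. */
void conn_remove(int epfd, struct conn *c, struct conn **cleanup)
{
    struct epoll_event dummy = {0}; /* kernels before 2.6.9 need non-NULL */

    if (c->removed)
        return;
    /* Errors are ignored in this sketch. */
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, &dummy);
    close(c->fd);
    c->fd = -1;
    c->removed = true;
    c->next_dead = *cleanup;
    *cleanup = c;
}

/* Process one batch returned by epoll_wait(2).  Events whose structure
 * was removed earlier in the same batch are skipped.  Returns the
 * number of events actually handled. */
int process_batch(int epfd, const struct epoll_event *events, int n,
                  struct conn **cleanup)
{
    int handled = 0;

    for (int i = 0; i < n; i++) {
        struct conn *c = events[i].data.ptr;

        if (c->removed)
            continue;               /* stale event: already closed */
        if (c->peer != NULL)
            conn_remove(epfd, c->peer, cleanup);
        handled++;
    }
    return handled;
}
```

       Structures on the cleanup list would be released only after the
       whole batch has been processed, so that stale events never
       dereference freed memory.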

VERSIONS

       The epoll API was introduced in Linux  kernel  2.5.44.   Its  interface
       should be finalized in Linux kernel 2.5.66.

CONFORMING TO

       The  epoll  API  is Linux-specific.  Some other systems provide similar
       mechanisms, for example, FreeBSD has kqueue, and Solaris has /dev/poll.

SEE ALSO

       epoll_create(2), epoll_ctl(2), epoll_wait(2)

COLOPHON

       This  page  is  part of release 2.77 of the Linux man-pages project.  A
       description of the project, and information about reporting  bugs,  can
       be found at http://www.kernel.org/doc/man-pages/.