Provided by: manpages_2.62-1_all
 

NAME

        epoll - I/O event notification facility
 

SYNOPSIS

        #include <sys/epoll.h>
 

DESCRIPTION

        epoll  is a variant of poll(2) that can be used either as an edge-trig‐
        gered or a level-triggered interface and scales well to  large  numbers
        of watched file descriptors.  Three system calls are provided to set up
        and control an epoll set: epoll_create(2), epoll_ctl(2), epoll_wait(2).
 
        An  epoll  set  is  connected  to  a  file  descriptor  created  by
        epoll_create(2).  Interest for certain file descriptors is then
        registered via epoll_ctl(2).  Finally, the actual wait is started by
        epoll_wait(2).
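 
        The three calls can be combined as in the following minimal sketch.
        The helper name wait_readable() and its return convention are
        assumptions made for illustration; they are not part of the epoll
        API.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = { 0 }, out;
    int epfd, n;

    epfd = epoll_create(1);          /* create the epoll set */
    if (epfd < 0)
        return -1;

    ev.events = EPOLLIN;             /* register interest in input */
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    n = epoll_wait(epfd, &out, 1, timeout_ms);   /* the actual wait */
    close(epfd);
    if (n < 0)
        return -1;
    return (n > 0 && (out.events & EPOLLIN)) ? 1 : 0;
}
```

        A long-running program would normally create the epoll set once and
        reuse it, rather than creating one per call as this sketch does.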
 
    Level-Triggered and Edge-Triggered
        The  epoll event distribution interface is able to behave both as edge-
        triggered (ET) and level-triggered (LT).  The  difference  between  the
        two mechanisms can be described as follows.  Suppose that this scenario
        happens:
 
        1. The file descriptor that represents the read side of a pipe (rfd) is
           added inside the epoll device.
 
        2. A pipe writer writes 2 kB of data on the write side of the pipe.
 
        3. A call to epoll_wait(2) is done that will return rfd as a ready file
           descriptor.
 
        4. The pipe reader reads 1 kB of data from rfd.
 
        5. A call to epoll_wait(2) is done.
 
        If the rfd file descriptor has been added to the epoll interface  using
        the  EPOLLET flag, the call to epoll_wait(2) done in step 5 will proba‐
        bly hang despite the available data still present  in  the  file  input
        buffer;  meanwhile  the remote peer might be expecting a response based
        on the data it already sent.  The reason for this  is  that  edge-trig‐
        gered  mode  only  delivers  events when changes occur on the monitored
        file descriptor.  So, in step 5 the caller might  end  up  waiting  for
        some  data  that  is  already  present inside the input buffer.  In the
        above example, an event on rfd will be generated because of  the  write
        done  in  2  and  the event is consumed in 3.  Since the read operation
        done in 4  does  not  consume  the  whole  buffer  data,  the  call  to
        epoll_wait(2) done in step 5 might block indefinitely.
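 
        The scenario can be reproduced with the sketch below.  The function
        name second_wait_result() is hypothetical, and the wait in step 5 is
        given a 200 ms timeout (an assumption made so that the demonstration
        terminates instead of hanging).

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Runs steps 1-5 on a fresh pipe.  flags is OR-ed into the event mask
 * (EPOLLET for edge-triggered, 0 for level-triggered).  Returns the
 * result of the second epoll_wait(): the number of ready descriptors,
 * 0 if it timed out, or -1 on error. */
int second_wait_result(unsigned int flags)
{
    struct epoll_event ev = { 0 }, out;
    char buf[2048];
    int p[2], epfd, n = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create(1);
    if (epfd < 0)
        goto out_pipe;

    ev.events = EPOLLIN | flags;                    /* step 1 */
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto out_all;

    memset(buf, 'x', sizeof(buf));
    if (write(p[1], buf, 2048) != 2048)             /* step 2 */
        goto out_all;
    if (epoll_wait(epfd, &out, 1, 1000) != 1)       /* step 3 */
        goto out_all;
    if (read(p[0], buf, 1024) != 1024)              /* step 4 */
        goto out_all;
    n = epoll_wait(epfd, &out, 1, 200);             /* step 5, bounded */

out_all:
    close(epfd);
out_pipe:
    close(p[0]);
    close(p[1]);
    return n;
}
```

        With EPOLLET the second wait is expected to time out even though
        1 kB is still buffered; with flags set to 0 (level-triggered) the
        same call is expected to report rfd again.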
 
        An  application  that  employs the EPOLLET flag (edge-triggered) should
        use non-blocking file descriptors to avoid having a  blocking  read  or
        write  starve  a  task that is handling multiple file descriptors.  The
        suggested way to use epoll as an edge-triggered (EPOLLET) interface  is
        as follows:
 
               i      with non-blocking file descriptors
 
               ii     by  waiting  for  an event only after read(2) or write(2)
                      return EAGAIN.
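 
        A minimal sketch of both points follows.  The helper names
        set_nonblocking() and drain_fd() are hypothetical; drain_fd() simply
        discards the data, where a real application would process it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Point i: put the descriptor into non-blocking mode. */
int set_nonblocking(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);

    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Point ii: read until EAGAIN; only then go back to epoll_wait(2).
 * Returns the number of bytes drained, or -1 on error.
 * *eof is set to 1 if end-of-file was reached. */
long drain_fd(int fd, int *eof)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    *eof = 0;
    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;              /* a real program would use buf here */
            continue;
        }
        if (n == 0) {
            *eof = 1;
            return total;
        }
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;            /* drained: safe to wait again */
        return -1;
    }
}
```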
 
        By contrast, when used as a level-triggered interface, epoll is  simply
        a  faster poll(2), and can be used wherever the latter is used since it
        shares the same semantics.
 
        Since even with the edge-triggered epoll multiple events can be  gener‐
        ated upon receipt of multiple chunks of data, the caller has the option
        to specify the EPOLLONESHOT flag, to tell epoll to disable the  associ‐
        ated  file descriptor after the receipt of an event with epoll_wait(2).
        When the EPOLLONESHOT flag is specified, it is the  caller’s  responsi‐
        bility   to   rearm   the   file  descriptor  using  epoll_ctl(2)  with
        EPOLL_CTL_MOD.
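 
        The rearm step might look like the sketch below.  Both function
        names are hypothetical, and the 100 ms timeouts are assumptions
        chosen to keep the demonstration short.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: re-enable a descriptor that EPOLLONESHOT disabled. */
int rearm_oneshot(int epfd, int fd, unsigned int events)
{
    struct epoll_event ev = { 0 };

    ev.events = events | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Returns 1 if a one-shot registration on a pipe behaves as described,
 * 0 if it does not, and -1 on setup failure. */
int oneshot_demo(void)
{
    struct epoll_event ev = { 0 }, out;
    int p[2], epfd, first, second, third = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0 ||
        write(p[1], "x", 1) != 1) {
        if (epfd >= 0)
            close(epfd);
        close(p[0]);
        close(p[1]);
        return -1;
    }

    first  = epoll_wait(epfd, &out, 1, 100);   /* event delivered once */
    second = epoll_wait(epfd, &out, 1, 100);   /* fd now disabled: timeout */
    if (rearm_oneshot(epfd, p[0], EPOLLIN) == 0)
        third = epoll_wait(epfd, &out, 1, 100); /* unread byte seen again */

    close(epfd);
    close(p[0]);
    close(p[1]);
    return first == 1 && second == 0 && third == 1;
}
```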
 
    Example for Suggested Usage
        While the usage of epoll when employed as a  level-triggered  interface
        does  have  the  same  semantics  as  poll(2), the edge-triggered usage
        requires more clarification to avoid stalls in  the  application  event
        loop.  In this example, listener is a non-blocking socket on which
        listen(2) has been called.  The function do_use_fd() uses the new ready
        file descriptor until EAGAIN is returned by either read(2) or write(2).
        An event-driven state machine application should, after having received
        EAGAIN,  record  its  current  state  so  that  at  the  next  call  to
        do_use_fd() it will continue to  read(2)  or  write(2)  from  where  it
        stopped before.
 
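        struct epoll_event ev, *events;

        for (;;) {
            /* Block until at least one registered descriptor is ready. */
            nfds = epoll_wait(kdpfd, events, maxevents, -1);

            for (n = 0; n < nfds; ++n) {
                if (events[n].data.fd == listener) {
                    /* addrlen is a value-result argument: reset it
                       before every accept(2) call. */
                    addrlen = sizeof(local);
                    client = accept(listener, (struct sockaddr *) &local,
                                    &addrlen);
                    if (client < 0) {
                        perror("accept");
                        continue;
                    }
                    /* Edge-triggered descriptors should be non-blocking. */
                    setnonblocking(client);
                    ev.events = EPOLLIN | EPOLLET;
                    ev.data.fd = client;
                    if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                        fprintf(stderr, "epoll set insertion error: fd=%d\n",
                                client);
                        return -1;
                    }
                } else {
                    /* Existing connection: do I/O until EAGAIN. */
                    do_use_fd(events[n].data.fd);
                }
            }
        }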
 
        When  used  as an edge-triggered interface, for performance reasons, it
        is possible to add the  file  descriptor  inside  the  epoll  interface
        (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT).  This allows you
        to avoid continuously switching between EPOLLIN  and  EPOLLOUT  calling
        epoll_ctl(2) with EPOLL_CTL_MOD.
 
    Questions and Answers
        Q1     What happens if you add the same file descriptor to an epoll set
               twice?
 
        A1     You will probably get EEXIST.  However, it is possible that  two
               threads may add the same file descriptor twice.  This is a harm‐
               less condition.
 
        Q2     Can two epoll sets wait for the same file  descriptor?   If  so,
               are events reported to both epoll file descriptors?
 
        A2     Yes,  and  events would be reported to both.  However, it is not
               recommended.
 
        Q3     Is the epoll file descriptor itself poll/epoll/selectable?
 
        A3     Yes.
 
        Q4     What happens if the epoll file descriptor is put  into  its  own
               file descriptor set?
 
        A4     It  will  fail.   However,  you can add an epoll file descriptor
               inside another epoll file descriptor set.
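 
               The difference can be observed with a small helper such as
               the one sketched here.  The name try_add() is hypothetical.
               Adding an epoll file descriptor to itself is expected to
               fail with EINVAL, while adding it to a different epoll set
               is expected to succeed.

```c
#include <errno.h>
#include <sys/epoll.h>

/* Hypothetical helper: register fd for EPOLLIN in epfd.
 * Returns 0 on success, or the errno value on failure. */
int try_add(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : errno;
}
```

               For two epoll sets outer and inner, try_add(outer, outer)
               fails and try_add(outer, inner) succeeds; outer then reports
               inner as readable whenever inner has events pending.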
 
        Q5     Can I send the epoll  file  descriptor  over  a  unix-socket  to
               another process?
 
        A5     No.
 
        Q6     Will  closing  a file descriptor cause it to be removed from all
               epoll sets automatically?
 
        A6     Yes.
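 
               The sketch below illustrates this for the simple case in
               which no duplicate of the descriptor (for example from
               dup(2) or fork(2)) remains open; that restriction is an
               assumption of the example.  The function name
               events_after_close() is hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of events reported after the only descriptor for
 * a readable pipe end has been closed, or -1 on setup failure. */
int events_after_close(void)
{
    struct epoll_event ev = { 0 }, out;
    int p[2], epfd, n = -1;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create(1);
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1 &&
        epoll_wait(epfd, &out, 1, 100) == 1) {   /* readable while open */
        close(p[0]);                             /* drops the registration */
        p[0] = -1;
        n = epoll_wait(epfd, &out, 1, 100);      /* nothing left to report */
    }
    if (epfd >= 0)
        close(epfd);
    if (p[0] >= 0)
        close(p[0]);
    close(p[1]);
    return n;
}
```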
 
        Q7     If more than one event occurs between epoll_wait(2)  calls,  are
               they combined or reported separately?
 
        A7     They will be combined.
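 
               As an illustration, the sketch below registers one end of a
               socket pair for both EPOLLIN and EPOLLOUT and then makes it
               readable as well as writable.  The function name
               combined_events() is hypothetical.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the event mask reported for a socket that is both readable
 * and writable; *count receives the number of entries returned by
 * epoll_wait(), or -1 on setup failure. */
unsigned int combined_events(int *count)
{
    struct epoll_event ev = { 0 }, out[4];
    unsigned int mask = 0;
    int sv[2], epfd;

    *count = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;
    epfd = epoll_create(1);
    ev.events = EPOLLIN | EPOLLOUT;
    ev.data.fd = sv[0];
    if (epfd >= 0 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        write(sv[1], "x", 1) == 1) {         /* makes sv[0] readable too */
        *count = epoll_wait(epfd, out, 4, 100);
        if (*count > 0)
            mask = out[0].events;
    }
    if (epfd >= 0)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```

               A single entry is expected, with both EPOLLIN and EPOLLOUT
               set in its events field.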
 
        Q8     Does  an  operation on a file descriptor affect the already col‐
               lected but not yet reported events?
 
        A8     You can do  two  operations  on  an  existing  file  descriptor.
               Remove  would be meaningless for this case.  Modify will re-read
               available I/O.
 
        Q9     Do I need to continuously read/write  a  file  descriptor  until
                EAGAIN when using the EPOLLET flag (edge-triggered behavior)?
 
        A9     No, you don’t.  An event from epoll_wait(2) indicates that the
               file descriptor is ready for the requested I/O operation.  You
               can simply consider it ready until you receive the next
               EAGAIN.  When and how you use the file descriptor is entirely
               up to you.  Also, the condition that the read/write I/O space
               is exhausted can be detected by checking the amount of data
               read from or written to the target file descriptor.  For
               example, if you call read(2) asking for a certain amount of
               data and read(2) returns fewer bytes, you can be sure of
               having exhausted the read I/O space for that file descriptor.
               The same is true when writing with write(2).
 
    Possible Pitfalls and Ways to Avoid Them
        o Starvation (edge-triggered)
 
        If there is a large amount of I/O space, it is possible that by  trying
        to drain it the other files will not get processed, causing starvation.
        (This problem is not specific to epoll.)
 
        The solution is to maintain a ready list and mark the  file  descriptor
        as  ready in its associated data structure, thereby allowing the appli‐
        cation to remember which files need to be  processed  but  still  round
        robin  amongst all the ready files.  This also supports ignoring subse‐
        quent events you receive for file descriptors that are already ready.
 
        o If using an event cache...
 
        If you use an event cache or store all the  file  descriptors  returned
        from epoll_wait(2), then make sure to provide a way to mark its closure
        dynamically (i.e., caused by a previous event’s  processing).   Suppose
        you receive 100 events from epoll_wait(2), and in event #47 a condition
        causes event #13 to  be  closed.   If  you  remove  the  structure  and
        close(2) the file descriptor for event #13, then your event cache might
        still say there are events waiting for  that  file  descriptor  causing
        confusion.
 
        One  solution  for  this is to call, during the processing of event 47,
        epoll_ctl(EPOLL_CTL_DEL) to delete file  descriptor  13  and  close(2),
        then  mark  its  associated  data structure as removed and link it to a
        cleanup list.  If you find another event for file descriptor 13 in your
        batch processing, you will discover the file descriptor had been previ‐
        ously removed and there will be no confusion.
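 
        This approach might be sketched as follows.  struct conn, its
        fields, conn_remove() and process_batch() are hypothetical names;
        the peer field exists only so that handling one event closes
        another connection, standing in for real event handling.  The
        sketch assumes each registration stores a struct conn pointer in
        data.ptr.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

struct conn {                    /* hypothetical per-descriptor state */
    int fd;
    int removed;                 /* set once fd was deleted and closed */
    struct conn *peer;           /* demo only: connection this one closes */
    struct conn *next_cleanup;
};

static struct conn *cleanup_list;

/* Delete c from the epoll set, close it, and queue it for cleanup. */
void conn_remove(int epfd, struct conn *c)
{
    struct epoll_event unused = { 0 };  /* kernels before 2.6.9 want non-NULL */

    if (c->removed)
        return;
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, &unused);
    close(c->fd);
    c->removed = 1;
    c->next_cleanup = cleanup_list;
    cleanup_list = c;
}

/* Handles one batch from epoll_wait(2).  Returns how many events were
 * actually processed; stale events for removed connections are skipped. */
int process_batch(int epfd, struct epoll_event *events, int nfds)
{
    int n, handled = 0;

    for (n = 0; n < nfds; n++) {
        struct conn *c = events[n].data.ptr;

        if (c->removed)
            continue;                    /* closed earlier in this batch */
        if (c->peer)
            conn_remove(epfd, c->peer);  /* stands in for real handling */
        handled++;
    }
    while (cleanup_list) {               /* batch done: release structures */
        struct conn *c = cleanup_list;

        cleanup_list = c->next_cleanup;
        c->next_cleanup = NULL;          /* a real program might free(c) */
    }
    return handled;
}
```

        Deferring the release of each structure until the whole batch has
        been processed is what keeps the pointers stored in later events
        valid.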
 

VERSIONS

        epoll(7) is a new API introduced in Linux kernel 2.5.44.  Its interface
        should be finalized in Linux kernel 2.5.66.
 

CONFORMING TO

        The  epoll  API  is Linux-specific.  Some other systems provide similar
        mechanisms, for example, FreeBSD has kqueue, and Solaris has /dev/poll.
 

SEE ALSO

        epoll_create(2), epoll_ctl(2), epoll_wait(2)