Bug 873815 - abrtd and inotify interaction can deadlock abrtd daemon
Summary: abrtd and inotify interaction can deadlock abrtd daemon
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: abrt
Version: 6.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Denys Vlasenko
QA Contact: Miroslav Hradílek
URL:
Whiteboard:
Depends On:
Blocks: 895654
 
Reported: 2012-11-06 19:00 UTC by greg.2.harris
Modified: 2013-02-21 07:54 UTC
7 users

Fixed In Version: abrt-2.0.8-11.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-21 07:54:28 UTC
Target Upstream Version:


Attachments
Tool to deadlock the abrtd daemon (717 bytes, text/x-python)
2012-11-06 19:00 UTC, greg.2.harris
Proposed fix (1.73 KB, patch)
2012-12-11 12:18 UTC, Denys Vlasenko


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0290 normal SHIPPED_LIVE abrt, libreport and btparser bug fix and enhancement update 2013-02-20 20:55:47 UTC

Description greg.2.harris 2012-11-06 19:00:30 UTC
Created attachment 639505 [details]
Tool to deadlock the abrtd daemon

Description of problem:

It appears that the handle_inotify_cb handler can sometimes fire even when there are no inotify events ready to be read by abrtd.  When this occurs, the abrtd daemon deadlocks.

Version-Release number of selected component (if applicable):

Observed in CentOS 6.3; however, it appears to still be a problem in the upstream Fedora sources.

How reproducible:

On a machine with abrt and abrt-addon-python installed, run the attached 'deadlock_abrtd.py' Python script and after a few moments hit Control-C.  Then run the attached 'broken_script.py' Python script.

Steps to Reproduce:
1. Run the attached 'deadlock_abrtd.py' script for a few seconds and then Control-C it.  This program generates a large number of inotify events in the watched directories which eventually triggers the bug in abrtd.
2. Run strace against the abrtd daemon.  It will be stuck on a read() call on the inotify file descriptor.
  
Actual results:

Any non-interactive Python script that produces a traceback will hang while trying to read a response from the abrtd daemon.

Expected results:

The abrtd daemon should process the events.

Additional info:

It appears that under some conditions handle_inotify_cb can be triggered even when there is *no* data available to be read on the inotify file descriptor.  Within handle_inotify_cb, the following logic is used to determine how much data to read:

"""
    /* Default size: 128 simultaneous actions (about 1/2 meg) */
#define INOTIFY_BUF_SIZE ((sizeof(struct inotify_event) + FILENAME_MAX)*128)
    /* Determine how much to read (it usually is much smaller) */
    /* NB: this variable _must_ be int-sized, ioctl expects that! */
    int inotify_bytes = INOTIFY_BUF_SIZE;
    if (ioctl(g_io_channel_unix_get_fd(gio), FIONREAD, &inotify_bytes) != 0
     || inotify_bytes < sizeof(struct inotify_event)
     || inotify_bytes > INOTIFY_BUF_SIZE
    ) {
        inotify_bytes = INOTIFY_BUF_SIZE;
    }
"""

In our case there are no bytes available to be read, which triggers the inotify_bytes < sizeof(struct inotify_event) check, so the buffer is set to the default - ~512 KB of data.

Under abrtd 2.0.4 this was OK, as the reads were done by g_io_channel_read, which is non-blocking and would simply return 0 bytes.  Sometime between 2.0.4 and 2.0.8 (the version taken into RHEL 6.3) this call was changed to g_io_channel_read_chars, which is *blocking*, so the call will not return until more than 512 KB of inotify events have occurred.

To make matters worse, the abrtd daemon is stuck inside the handle_inotify_cb function, which means it cannot spawn new abrt-server processes.  These processes are normally responsible for generating the directories abrtd watches via inotify.  Since abrt-server cannot start until abrtd returns from handle_inotify_cb, the process deadlocks.

Here is the output of strace on the abrtd process during a run of deadlock_abrtd.py:
"""
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=5, revents=POLLIN}])
gettimeofday({1352227936, 672351}, NULL) = 0
ioctl(5, FIONREAD, [96])                = 0
read(5, "\1\0\0\0\0\1\0\0\0\0\0\0 \0\0\0fooey.16766.3166"..., 1024) = 96
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout)
gettimeofday({1352227936, 672789}, NULL) = 0
ioctl(5, FIONREAD, [0])                 = 0
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f575985e000
read(5, 
"""

The first poll() returns, indicating there is new information ready to be read from inotify; handle_inotify_cb is called, and ioctl returns a valid size to read.  The second time poll() is called in the trace above, it returns a timeout *but* handle_inotify_cb is still executed.

When handle_inotify_cb calls ioctl, it reports 0 bytes available to be read, and abrtd falls back to its default buffer size, which is large enough for 128 events with maximum-length file names.  With normal file names, I'd estimate the read will likely block until closer to 5000+ events have accumulated.

I'm not sure what the fix for this should be...  The defaults are clearly wrong now that we are using a blocking g_io_* read, but it's also clear there is a bug somewhere else that is causing handle_inotify_cb to be executed when there is no inotify data ready to be read.

Comment 1 greg.2.harris 2012-11-06 19:01:46 UTC
I believe this is the root cause of the hanging Python processes described in bugs 837333 and 808562.

Comment 3 Denys Vlasenko 2012-12-11 12:16:51 UTC
(In reply to comment #0)
> It appears that under some conditions the handle_inotify_cb can be triggered
> even when there is *no* data available to be read on the inotify file
> descriptor.  Within the handle_inotify_db the following logic is used to
> determine how much data can be read:
> 
> """
>     /* Default size: 128 simultaneous actions (about 1/2 meg) */
> #define INOTIFY_BUF_SIZE ((sizeof(struct inotify_event) + FILENAME_MAX)*128)
>     /* Determine how much to read (it usually is much smaller) */
>     /* NB: this variable _must_ be int-sized, ioctl expects that! */
>     int inotify_bytes = INOTIFY_BUF_SIZE;
>     if (ioctl(g_io_channel_unix_get_fd(gio), FIONREAD, &inotify_bytes) != 0
>      || inotify_bytes < sizeof(struct inotify_event)
>      || inotify_bytes > INOTIFY_BUF_SIZE
>     ) {
>         inotify_bytes = INOTIFY_BUF_SIZE;
>     }
> """
> 
> In our case there are no bytes available to be read which triggers the
> inotify_bytes <= sizeof(struct inotify_event) and the buffer is set to the
> default - ~512KB of data.
> 
> Under abrtd 2.0.4 this was OK as the reads were done by g_io_channel_read
> which is non-blocking and would just return 0 bytes.  Sometime between 2.0.4
> and 2.0.8 (taken in RHEL 6.3) this call was changed to
> g_io_channel_read_char which is *blocking* and thus the call will not return
> until more than 512Kb of inotify events have occured.

Excellent analysis, thanks!

I can easily switch the fd into non-blocking mode myself, before hooking it up to the glib main loop.
I just tested it, and it seems to work well (g_io_channel_read_chars _doesn't_ treat it as an error, it just returns a short read, good!).
Also, I can drop the "inotify_bytes < sizeof(struct inotify_event)" check, as you just demonstrated that it actually can happen.

> I'm not sure what the fix for this should be...  The defaults are clearly
> wrong now that we are using a blocking g_io_* read but it's also clear there
> is a bug somewhere else that is causing handle_inotify_cb to be executed
> when there is no inotify data ready to be read.

In fact it isn't a bug, but documented Unix behavior of select/poll: they don't provide an iron-clad guarantee that if they say "there is data available to read", there _will be_ more than zero bytes of that data. The recommendation in this situation is to always switch fds into non-blocking mode (IOW: to use non-blocking reads).

Thanks again!

Comment 4 Denys Vlasenko 2012-12-11 12:18:19 UTC
Created attachment 661386 [details]
Proposed fix

Comment 5 Denys Vlasenko 2012-12-11 14:25:55 UTC
Fixed in git:

commit 87bb7d2b5452054bce605d01ecc72d8ed318a8e8
Author: Denys Vlasenko <vda.linux@googlemail.com>
Date:   Tue Dec 11 15:23:38 2012 +0100

    abrtd: set inotify fd to non-blocking mode; ignore 0-sized inotify reads

Comment 13 Denys Vlasenko 2013-01-28 12:02:18 UTC
Fix has been committed to abrt's rhel6 branch:

commit 29a9a7d009b501a8ef9e6ef023f336bd0a6732cc
Author: Denys Vlasenko <vda.linux@googlemail.com>
Date:   Mon Jan 28 09:56:40 2013 +0100

    abrtd: disable glib's buffering on inotify reads

Comment 19 errata-xmlrpc 2013-02-21 07:54:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0290.html

