Bug 873815 - abrtd and inotify interaction can deadlock abrtd daemon
Summary: abrtd and inotify interaction can deadlock abrtd daemon
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: abrt
Version: 6.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Denys Vlasenko
QA Contact: Miroslav Hradílek
URL:
Whiteboard:
Depends On:
Blocks: 895654
 
Reported: 2012-11-06 19:00 UTC by greg.2.harris
Modified: 2013-02-21 07:54 UTC
7 users

Fixed In Version: abrt-2.0.8-11.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-21 07:54:28 UTC
Target Upstream Version:


Attachments
Tool to deadlock the abrtd daemon (717 bytes, text/x-python)
2012-11-06 19:00 UTC, greg.2.harris
Proposed fix (1.73 KB, patch)
2012-12-11 12:18 UTC, Denys Vlasenko


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2013:0290 normal SHIPPED_LIVE abrt, libreport and btparser bug fix and enhancement update 2013-02-20 20:55:47 UTC

Description greg.2.harris 2012-11-06 19:00:30 UTC
Created attachment 639505 [details]
Tool to deadlock the abrtd daemon

Description of problem:

It appears that the handle_inotify_cb handler can sometimes fire even when there are no inotify events ready to be read by abrtd.  When this occurs, the abrtd daemon deadlocks.

Version-Release number of selected component (if applicable):

Observed in CentOS 6.3; however, it appears to still be a problem in the upstream Fedora sources.

How reproducible:

On a machine with abrt and abrt-addon-python installed, run the attached 'deadlock_abrtd.py' Python script and after a few moments hit Control-C.  Then run the attached 'broken_script.py' Python script.

Steps to Reproduce:
1. Run the attached 'deadlock_abrtd.py' script for a few seconds and then Control-C it.  This program generates a large number of inotify events in the watched directories which eventually triggers the bug in abrtd.
2. Run strace against the abrtd daemon.  It will be stuck on a read() call on the inotify file descriptor.
  
Actual results:

Any non-interactive Python script that produces a traceback will hang while trying to read a response from the abrtd daemon.

Expected results:

The abrtd daemon should process the events.

Additional info:

It appears that under some conditions handle_inotify_cb can be triggered even when there is *no* data available to be read on the inotify file descriptor.  Within handle_inotify_cb, the following logic is used to determine how much data to read:

"""
    /* Default size: 128 simultaneous actions (about 1/2 meg) */
#define INOTIFY_BUF_SIZE ((sizeof(struct inotify_event) + FILENAME_MAX)*128)
    /* Determine how much to read (it usually is much smaller) */
    /* NB: this variable _must_ be int-sized, ioctl expects that! */
    int inotify_bytes = INOTIFY_BUF_SIZE;
    if (ioctl(g_io_channel_unix_get_fd(gio), FIONREAD, &inotify_bytes) != 0
     || inotify_bytes < sizeof(struct inotify_event)
     || inotify_bytes > INOTIFY_BUF_SIZE
    ) {
        inotify_bytes = INOTIFY_BUF_SIZE;
    }
"""

In our case there are no bytes available to be read, which triggers the inotify_bytes < sizeof(struct inotify_event) check, so the buffer is set to the default - ~512 KB of data.

Under abrtd 2.0.4 this was OK, as the reads were done by g_io_channel_read, which is non-blocking and would simply return 0 bytes.  Sometime between 2.0.4 and 2.0.8 (the version taken into RHEL 6.3) this call was changed to g_io_channel_read_chars, which is *blocking*, so the call will not return until more than 512 KB of inotify events have occurred.

To make matters worse, the abrtd daemon is stuck inside the handle_inotify_cb function, which means it cannot spawn new abrt-server processes.  These processes are normally responsible for generating the directories abrtd watches via inotify.  Since abrt-server cannot start until abrtd returns from handle_inotify_cb, the process deadlocks.

Here is the output of strace on the abrtd process during a run of deadlock_abrtd.py:
"""
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN}], 3, 0) = 1 ([{fd=5, revents=POLLIN}])
gettimeofday({1352227936, 672351}, NULL) = 0
ioctl(5, FIONREAD, [96])                = 0
read(5, "\1\0\0\0\0\1\0\0\0\0\0\0 \0\0\0fooey.16766.3166"..., 1024) = 96
poll([{fd=3, events=POLLIN}, {fd=7, events=POLLIN|POLLPRI}, {fd=5, events=POLLIN}], 3, 0) = 0 (Timeout)
gettimeofday({1352227936, 672789}, NULL) = 0
ioctl(5, FIONREAD, [0])                 = 0
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f575985e000
read(5, 
"""

The first poll() returns, indicating there is new information ready to be read from inotify; handle_inotify_cb is called, and ioctl returns a valid size to read.  The second time poll() is called in the trace above, it returns a timeout *but* handle_inotify_cb is still executed.

When handle_inotify_cb calls ioctl, it reports 0 bytes available to be read, and abrtd falls back to its default buffer size, which is large enough for 128 events with maximum-length file names.  With normal file names, I'd estimate the read will likely block until closer to 5000+ events have accumulated.

I'm not sure what the fix for this should be...  The defaults are clearly wrong now that we are using a blocking g_io_* read, but it's also clear there is a bug somewhere else that is causing handle_inotify_cb to be executed when there is no inotify data ready to be read.

Comment 1 greg.2.harris 2012-11-06 19:01:46 UTC
I believe this is the root cause of the hanging Python processes described in bugs 837333 and 808562.

Comment 3 Denys Vlasenko 2012-12-11 12:16:51 UTC
(In reply to comment #0)
> It appears that under some conditions the handle_inotify_cb can be triggered
> even when there is *no* data available to be read on the inotify file
> descriptor.  Within the handle_inotify_db the following logic is used to
> determine how much data can be read:
> 
> """
>     /* Default size: 128 simultaneous actions (about 1/2 meg) */
> #define INOTIFY_BUF_SIZE ((sizeof(struct inotify_event) + FILENAME_MAX)*128)
>     /* Determine how much to read (it usually is much smaller) */
>     /* NB: this variable _must_ be int-sized, ioctl expects that! */
>     int inotify_bytes = INOTIFY_BUF_SIZE;
>     if (ioctl(g_io_channel_unix_get_fd(gio), FIONREAD, &inotify_bytes) != 0
>      || inotify_bytes < sizeof(struct inotify_event)
>      || inotify_bytes > INOTIFY_BUF_SIZE
>     ) {
>         inotify_bytes = INOTIFY_BUF_SIZE;
>     }
> """
> 
> In our case there are no bytes available to be read which triggers the
> inotify_bytes <= sizeof(struct inotify_event) and the buffer is set to the
> default - ~512KB of data.
> 
> Under abrtd 2.0.4 this was OK as the reads were done by g_io_channel_read
> which is non-blocking and would just return 0 bytes.  Sometime between 2.0.4
> and 2.0.8 (taken in RHEL 6.3) this call was changed to
> g_io_channel_read_char which is *blocking* and thus the call will not return
> until more than 512Kb of inotify events have occured.

Excellent analysis, thanks!

I can easily switch the fd into non-blocking mode myself, before hooking it up to the glib main loop.
I just tested it, and it seems to work well (g_io_channel_read_chars _doesn't_ treat it as an error, it just returns a short read, good!).
Also, I can drop the "inotify_bytes < sizeof(struct inotify_event)" check, as you just demonstrated that it actually can happen.

> I'm not sure what the fix for this should be...  The defaults are clearly
> wrong now that we are using a blocking g_io_* read but it's also clear there
> is a bug somewhere else that is causing handle_inotify_cb to be executed
> when there is no inotify data ready to be read.

In fact it isn't a bug, but documented Unix behavior of select/poll: they don't provide an iron-clad guarantee that if they say "there is data available to read", there _will be_ more than zero bytes of that data. The recommendation in this situation is to always switch fds into non-blocking mode (IOW: to use non-blocking reads).

Thanks again!

Comment 4 Denys Vlasenko 2012-12-11 12:18:19 UTC
Created attachment 661386 [details]
Proposed fix

Comment 5 Denys Vlasenko 2012-12-11 14:25:55 UTC
Fixed in git:

commit 87bb7d2b5452054bce605d01ecc72d8ed318a8e8
Author: Denys Vlasenko <vda.linux@googlemail.com>
Date:   Tue Dec 11 15:23:38 2012 +0100

    abrtd: set inotify fd to non-blocking mode; ignore 0-sized inotify reads

Comment 13 Denys Vlasenko 2013-01-28 12:02:18 UTC
Fix has been committed to abrt's rhel6 branch:

commit 29a9a7d009b501a8ef9e6ef023f336bd0a6732cc
Author: Denys Vlasenko <vda.linux@googlemail.com>
Date:   Mon Jan 28 09:56:40 2013 +0100

    abrtd: disable glib's buffering on inotify reads

Comment 19 errata-xmlrpc 2013-02-21 07:54:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0290.html

