Bug 1540630

Summary: An abort in libnih causes init to be killed and kernel panic : "Kernel panic - not syncing: Attempted to kill init!"
Product: Red Hat Enterprise Linux 6 Reporter: Welterlen Benoit <bwelterl>
Component: libnih Assignee: Lukáš Nykrýn <lnykryn>
Status: CLOSED ERRATA QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium Docs Contact:
Priority: urgent    
Version: 6.9 CC: bwelterl, fkrska, fsumsal, toneata
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libnih-1.0.1-8.el6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1544405 (view as bug list) Environment:
Last Closed: 2018-06-19 05:20:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1544405    
Attachments:
Description Flags
experimental SRPM patched none

Description Welterlen Benoit 2018-01-31 14:47:10 UTC
Created attachment 1389021
experimental SRPM patched

Description of problem:
The system crashes with "Kernel panic - not syncing: Attempted to kill init!",
caused by an abort in libnih after an IN_Q_OVERFLOW event is received.
The kernel generates events faster than they can be consumed and reports this with an overflow event whose inotify_event object contains an invalid watch descriptor (-1). Passing that descriptor to nih_watch_handle_by_wd() results in an assertion failure.

Version-Release number of selected component (if applicable):
libnih-1.0.1-7

How reproducible:

Steps to Reproduce:
1. Generate inotify events faster than init can consume them, so that the inotify event queue overflows.

Actual results:
=========== Backtrace :
#0  0x00007f23f9438750 in __sigprocmask (how=2, set=0x7ffcded9a210, oset=0x0) at ../sysdeps/unix/sysv/linux/ia64/sigprocmask.c:43
#1  0x00007f23fa6624aa in crash_handler (signum=6) at main.c:404
#2  <signal handler called>
#3  0x00007f23f9438495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#4  0x00007f23f9439c75 in abort () at abort.c:92
#5  0x00007f23fa22ccb9 in nih_watch_handle_by_wd (watch=<value optimized out>, wd=<value optimized out>) at watch.c:202
#6  0x00007f23fa22d25d in nih_watch_reader (watch=0x7f23fb383060, io=<value optimized out>, buf=0x7f23fa524040 "\377\377\377\377", len=9936) at watch.c:428
#7  0x00007f23fa22b835 in nih_io_watcher (io=0x7f23fb383510, watch=0x7f23fb383790, events=NIH_IO_READ) at io.c:943
#8  0x00007f23fa22b93b in nih_io_handle_fds (readfds=0x7ffcded9ac60, writefds=0x7ffcded9abe0, exceptfds=0x7ffcded9ab60) at io.c:233
#9  0x00007f23fa22da1e in nih_main_loop () at main.c:586
#10 0x00007f23fa661fb7 in main (argc=<value optimized out>, argv=<value optimized out>) at main.c:345

struct inotify_event
{
  int wd;               /* Watch descriptor.  */
  uint32_t mask;        /* Watch mask.  */
  uint32_t cookie;      /* Cookie to synchronize two events.  */
  uint32_t len;         /* Length (including NULs) of name.  */
  char name __flexarr;  /* Name.  */
};

p/x  event->mask
$19 = 0x4000
IN_Q_OVERFLOW    0x00004000 
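
The two values above can be cross-checked outside gdb. The following is a minimal sketch, not libnih code: the helper names wd_from_raw() and is_queue_overflow() are hypothetical and exist only for illustration. It assumes a Linux system where int is 4 bytes, and shows that the leading "\377\377\377\377" bytes of buf in frame #6 decode to a watch descriptor of -1, and that a mask of 0x4000 has the IN_Q_OVERFLOW bit set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

/* Hypothetical helper (not part of libnih): read the watch descriptor
 * from the start of a raw inotify read buffer. The wd field is the
 * first member of struct inotify_event. */
int wd_from_raw(const char *buf)
{
    int wd;

    assert(buf != NULL);
    memcpy(&wd, buf, sizeof wd);  /* memcpy avoids unaligned access */
    return wd;
}

/* Hypothetical helper: non-zero when the mask reports a queue overflow. */
int is_queue_overflow(uint32_t mask)
{
    return (mask & IN_Q_OVERFLOW) != 0;
}
```

Calling wd_from_raw("\377\377\377\377") and is_queue_overflow(0x4000) reproduces the situation in the backtrace: an overflow event that carries no valid watch descriptor.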

Expected results:
No crash

Additional info:
Suggested patch (from James Hunt):
--- 1/nih/watch.c       2011-01-06 09:28:39 +0000
+++ 2/nih/watch.c       2011-04-26 15:44:45 +0000
@@ -424,6 +424,13 @@
                if (len < sz)
                        goto finish;
 
+               /* Handle situation where kernel is generating events faster
+                * than can be consumed. Inotify will (somewhat ironically)
+                * generate an event to signify this, so ignore it.
+                */
+               if (event->mask & IN_Q_OVERFLOW)
+                       goto consume;
+
                /* Find the handle for this watch */
                handle = nih_watch_handle_by_wd (watch, event->wd);
                if (handle)
@@ -437,6 +444,7 @@
                if (caught_free)
                        return;
 
+consume:
                /* Remove the event from the front of the buffer, and
                 * decrease our own length counter.
                 */
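
The idea behind the patch can be illustrated without libnih. The following is a rough sketch under stated assumptions, not the code from watch.c: append_event() and scan_events() are hypothetical names, and the buffer is filled with synthetic events rather than read from an inotify descriptor. It walks a buffer of inotify_event records, skips the watch lookup for any record with IN_Q_OVERFLOW set, and still consumes that record so the loop advances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>

/* Hypothetical helper: append one synthetic event to buf at offset off
 * and return the new offset. name may be NULL for events without a name. */
size_t append_event(char *buf, size_t cap, size_t off,
                    int wd, uint32_t mask, const char *name)
{
    struct inotify_event ev;
    size_t name_len = (name != NULL && name[0] != '\0') ? strlen(name) + 1 : 0;

    assert(off + sizeof ev + name_len <= cap);
    memset(&ev, 0, sizeof ev);
    ev.wd = wd;
    ev.mask = mask;
    ev.len = (uint32_t)name_len;
    memcpy(buf + off, &ev, sizeof ev);
    if (name_len > 0)
        memcpy(buf + off + sizeof ev, name, name_len);
    return off + sizeof ev + name_len;
}

/* Hypothetical stand-in for the reader loop: returns how many events
 * would be passed on to a watch-handle lookup, and stores the number
 * of overflow events (which are consumed but never looked up). */
size_t scan_events(const char *buf, size_t len, size_t *overflows)
{
    size_t off = 0, dispatched = 0;

    *overflows = 0;
    while (len - off >= sizeof(struct inotify_event)) {
        struct inotify_event ev;

        memcpy(&ev, buf + off, sizeof ev);
        if (len - off - sizeof ev < ev.len)
            break;                    /* partial event, wait for more data */

        if (ev.mask & IN_Q_OVERFLOW)
            (*overflows)++;           /* wd is -1 here: skip the lookup */
        else
            dispatched++;             /* safe to look up ev.wd */

        off += sizeof ev + ev.len;    /* consume the event either way */
    }
    return dispatched;
}
```

With a buffer holding one overflow event (wd -1) followed by one IN_CREATE event, scan_events() reports one dispatchable event and one overflow, which mirrors the effect of jumping to the consume label in the patch.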

An experimental patched SRPM is attached.

Comment 11 errata-xmlrpc 2018-06-19 05:20:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1904