Description of problem:
=======================
EVENT_POSIX_HEALTH_CHECK_FAILED event is not seen when the file system underlying a brick crashes.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5.

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a setup to capture the events
2. Create a simple volume using 2 or 3 bricks
3. Crash the file system underlying a brick
4. Check for the event - EVENT_POSIX_HEALTH_CHECK_FAILED

Actual results:
===============
Event EVENT_POSIX_HEALTH_CHECK_FAILED is not seen when the brick's underlying FS crashes.

Expected results:
=================
The event should be generated.

Additional info:
================
When I crashed the brick's FS, core files were generated.
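Step 1 above needs something that receives the events. A minimal sketch of such a listener follows, assuming events arrive as JSON bodies in HTTP POST requests to a webhook registered with the Gluster events API. The port (9000), the names `RECEIVED`, `record_event` and `ListenHandler`, and the choice to accept POSTs on any path are illustrative assumptions, not part of the reporter's setup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Events seen so far, in arrival order (hypothetical in-memory store).
RECEIVED = []


def record_event(body: bytes) -> dict:
    """Decode one JSON event body and remember it."""
    event = json.loads(body.decode("utf-8"))
    RECEIVED.append(event)
    return event


class ListenHandler(BaseHTTPRequestHandler):
    """Accepts POSTed events on any path and answers 200."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = record_event(self.rfile.read(length))
        print(event)
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Port 9000 is an assumption; use whatever the webhook URL points at.
    HTTPServer(("0.0.0.0", 9000), ListenHandler).serve_forever()
```

With the listener running, step 4 amounts to checking whether a POSIX_HEALTH_CHECK_FAILED entry shows up in its output after the brick's file system is crashed.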
BT of core file:

(gdb) bt
#0  0x00007f8b58c44694 in vfprintf () from /lib64/libc.so.6
#1  0x00007f8b58d088d5 in __vsnprintf_chk () from /lib64/libc.so.6
#2  0x00007f8b5a573a18 in vsnprintf (__ap=0x7f8b3cef8a70, __fmt=<optimized out>, __n=0, __s=0x0) at /usr/include/bits/stdio2.h:77
#3  gf_vasprintf (string_ptr=string_ptr@entry=0x7f8b3cef8b78, format=format@entry=0x7f8b4c94c110 "op=%s;path=%s;error=%s;brick=%s:%s", arg=arg@entry=0x7f8b3cef8b90) at mem-pool.c:219
#4  0x00007f8b5a5c288a in gf_event (event=event@entry=EVENT_POSIX_HEALTH_CHECK_FAILED, fmt=fmt@entry=0x7f8b4c94c110 "op=%s;path=%s;error=%s;brick=%s:%s") at events.c:84
#5  0x00007f8b4c9435f0 in posix_fs_health_check (this=this@entry=0x7f8b48006c50) at posix-helpers.c:1779
#6  0x00007f8b4c943774 in posix_health_check_thread_proc (data=0x7f8b48006c50) at posix-helpers.c:1817
#7  0x00007f8b593aedc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f8b58cf373d in clone () from /lib64/libc.so.6
(gdb)
This has been fixed in BZ 1385606 *** This bug has been marked as a duplicate of bug 1385606 ***
Reopening the BZ, since this BZ tracks the event failure notification whereas the other BZ tracks the functionality issue. From a use case/test case point of view, this BZ is also different. I agree the root cause might be the same, but QE verification is required for both BZs. The qa_test_case coverage flag would be handy in such cases, but that will only come into effect in the regression cycle and/or a future release, with the priorities attached to each case.
The downstream patch https://code.engineering.redhat.com/gerrit/90550 is already in the rhgs-3.2.0 codebase.
Verified this issue using the build glusterfs-3.8.4-6. The event is now generated for the scenario in this BZ's title.

{u'message': {u'path': u'/bricks/brick0/t0/.glusterfs/health_check', u'brick': u'dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0', u'op': u'open', u'error': u'Input/output error'}, u'event': u'POSIX_HEALTH_CHECK_FAILED', u'ts': 1480575528, u'nodeid': u'7c96741d-d5c1-470a-904d-1928af82454f'}
10.70.41.198 - - [01/Dec/2016 12:23:30] "POST /listen HTTP/1.1" 200 -

Moving to verified state.
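The verification above was done by reading the listener output. A minimal sketch of the same check in code follows, assuming the payload has already been decoded into a dict shaped like the one captured above. The function name `is_health_check_failure` and the `sample` variable are hypothetical; the field names and values are copied from the captured event.

```python
def is_health_check_failure(event: dict, brick: str) -> bool:
    """Return True if `event` reports a failed POSIX health check on `brick`."""
    message = event.get("message", {})
    return (
        event.get("event") == "POSIX_HEALTH_CHECK_FAILED"
        and message.get("brick") == brick
    )


# Payload copied from the verification comment in this report.
sample = {
    "message": {
        "path": "/bricks/brick0/t0/.glusterfs/health_check",
        "brick": "dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0",
        "op": "open",
        "error": "Input/output error",
    },
    "event": "POSIX_HEALTH_CHECK_FAILED",
    "ts": 1480575528,
    "nodeid": "7c96741d-d5c1-470a-904d-1928af82454f",
}

matched = is_health_check_failure(
    sample, "dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0"
)
```

Matching on the brick as well as the event name keeps the check from passing on a health check failure reported for some other brick in the volume.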
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html