Bug 1397681

Summary: [Eventing]: EVENT_POSIX_HEALTH_CHECK_FAILED event not seen when brick underlying filesystem crashed
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Byreddy <bsrirama>
Component: glusterfs Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: Byreddy <bsrirama>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: amukherj, rhinduja, vbellur
Target Milestone: --- Keywords: Reopened
Target Release: RHGS 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-6 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-03-23 06:21:19 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351528    

Description Byreddy 2016-11-23 07:22:01 UTC
Description of problem:
=======================
The EVENT_POSIX_HEALTH_CHECK_FAILED event is not seen when the filesystem underlying a brick crashes.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5.


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Set up a listener to capture the events.
2. Create a simple volume using 2 or 3 bricks.
3. Crash the filesystem underlying a brick.
4. Check for the EVENT_POSIX_HEALTH_CHECK_FAILED event.
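
For step 1, a minimal sketch of a listener that could capture the events, assuming they are delivered as JSON bodies in HTTP POST requests to a /listen endpoint. The port number and the names decode_event, ListenHandler and serve are hypothetical choices for this sketch, not part of glusterfs.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Event name as it appears in the delivered payload (assumption: the
# EVENT_ prefix is dropped on the wire).
WATCHED_EVENT = "POSIX_HEALTH_CHECK_FAILED"


def decode_event(body):
    """Decode one POST body (JSON bytes) into a dict."""
    return json.loads(body.decode("utf-8"))


class ListenHandler(BaseHTTPRequestHandler):
    """Accepts POSTs on /listen and prints every event received."""

    def do_POST(self):
        if self.path != "/listen":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        event = decode_event(self.rfile.read(length))
        # Flag the event this test case is waiting for.
        marker = "  <-- watched" if event.get("event") == WATCHED_EVENT else ""
        print(json.dumps(event, sort_keys=True) + marker)
        self.send_response(200)
        self.end_headers()


def serve(port=9000):
    """Run the listener until interrupted; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), ListenHandler).serve_forever()
```

Calling serve() on a host reachable from the storage nodes, and registering that URL as the events destination, would print each event as it arrives.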

Actual results:
===============
The EVENT_POSIX_HEALTH_CHECK_FAILED event is not seen when the filesystem underlying a brick crashes.


Expected results:
=================
The event should be generated.


Additional info:
================
When I crashed the brick's filesystem, core files were generated.

Comment 2 Byreddy 2016-11-23 07:25:46 UTC
Backtrace of the core file:

(gdb) bt
#0  0x00007f8b58c44694 in vfprintf () from /lib64/libc.so.6
#1  0x00007f8b58d088d5 in __vsnprintf_chk () from /lib64/libc.so.6
#2  0x00007f8b5a573a18 in vsnprintf (__ap=0x7f8b3cef8a70, __fmt=<optimized out>, __n=0, __s=0x0) at /usr/include/bits/stdio2.h:77
#3  gf_vasprintf (string_ptr=string_ptr@entry=0x7f8b3cef8b78, format=format@entry=0x7f8b4c94c110 "op=%s;path=%s;error=%s;brick=%s:%s", arg=arg@entry=0x7f8b3cef8b90) at mem-pool.c:219
#4  0x00007f8b5a5c288a in gf_event (event=event@entry=EVENT_POSIX_HEALTH_CHECK_FAILED, fmt=fmt@entry=0x7f8b4c94c110 "op=%s;path=%s;error=%s;brick=%s:%s") at events.c:84
#5  0x00007f8b4c9435f0 in posix_fs_health_check (this=this@entry=0x7f8b48006c50) at posix-helpers.c:1779
#6  0x00007f8b4c943774 in posix_health_check_thread_proc (data=0x7f8b48006c50) at posix-helpers.c:1817
#7  0x00007f8b593aedc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f8b58cf373d in clone () from /lib64/libc.so.6
(gdb)
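
The format string in frames #3 and #4, "op=%s;path=%s;error=%s;brick=%s:%s", shows that the event payload is a flat string of key=value pairs separated by semicolons. A minimal sketch of how a consumer might split such a string into fields follows. The function name parse_event_payload and the host name host1 are hypothetical, and this is not the glusterfs implementation.

```python
def parse_event_payload(payload):
    """Split 'k1=v1;k2=v2' into a dict, splitting each pair on the first '=' only."""
    fields = {}
    for pair in payload.split(";"):
        if not pair:
            continue  # tolerate a trailing or doubled ';'
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


# Sample string following the format from the backtrace; values are illustrative.
sample = ("op=open;path=/bricks/brick0/t0/.glusterfs/health_check;"
          "error=Input/output error;brick=host1:/bricks/brick0/t0")
fields = parse_event_payload(sample)
```

Splitting on the first '=' only keeps the host:path value of brick intact. A value that itself contains ';' would be split incorrectly by this sketch.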

Comment 3 Atin Mukherjee 2016-11-23 07:46:49 UTC
This has been fixed in BZ 1385606

*** This bug has been marked as a duplicate of bug 1385606 ***

Comment 4 Rahul Hinduja 2016-11-23 09:28:04 UTC
Reopening this BZ, since it tracks the missing event notification whereas the other BZ tracks the functionality issue. From a use-case/test-case point of view, this BZ is also different.

I agree the root cause might be the same, but QE verification is required for both BZs. The qa_test_case coverage flag would be handy in such cases, but it will only come into effect in the regression cycle and/or a future release, with the priorities attached to each case.

Comment 5 Atin Mukherjee 2016-11-23 09:32:03 UTC
The downstream patch https://code.engineering.redhat.com/gerrit/90550 is already in the rhgs-3.2.0 codebase.

Comment 9 Byreddy 2016-12-01 07:03:30 UTC
Verified this issue using the build glusterfs-3.8.4-6.

The event is now generated for the scenario described in the bug title.

{u'message': {u'path': u'/bricks/brick0/t0/.glusterfs/health_check', u'brick': u'dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0', u'op': u'open', u'error': u'Input/output error'}, u'event': u'POSIX_HEALTH_CHECK_FAILED', u'ts': 1480575528, u'nodeid': u'7c96741d-d5c1-470a-904d-1928af82454f'}
10.70.41.198 - - [01/Dec/2016 12:23:30] "POST /listen HTTP/1.1" 200 -
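
A check like the one done by eye above could be automated. The following is a minimal sketch, assuming the captured event is available as a Python dict with the same keys as the output shown; the function name is_health_check_failure is hypothetical.

```python
def is_health_check_failure(event, brick):
    """Return True if event is POSIX_HEALTH_CHECK_FAILED for the given brick."""
    message = event.get("message", {})
    return (event.get("event") == "POSIX_HEALTH_CHECK_FAILED"
            and message.get("brick") == brick)


# The event captured during verification, copied from the output above.
received = {
    "message": {
        "path": "/bricks/brick0/t0/.glusterfs/health_check",
        "brick": "dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0",
        "op": "open",
        "error": "Input/output error",
    },
    "event": "POSIX_HEALTH_CHECK_FAILED",
    "ts": 1480575528,
    "nodeid": "7c96741d-d5c1-470a-904d-1928af82454f",
}

matched = is_health_check_failure(
    received, "dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0")
```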


Moving to verified state.

Comment 11 errata-xmlrpc 2017-03-23 06:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html