Bug 1397681 - [Eventing]: EVENT_POSIX_HEALTH_CHECK_FAILED event not seen when brick underlying filesystem crashed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Pranith Kumar K
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks: 1351528
 
Reported: 2016-11-23 07:22 UTC by Byreddy
Modified: 2017-03-23 06:21 UTC (History)

Fixed In Version: glusterfs-3.8.4-6
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-23 06:21:19 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1385606 | 0 | unspecified | CLOSED | 4 of 8 bricks (2 dht subvols) crashed on systemic setup | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHSA-2017:0486 | 0 | normal | SHIPPED_LIVE | Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update | 2017-03-23 09:18:45 UTC

Internal Links: 1385606

Description Byreddy 2016-11-23 07:22:01 UTC
Description of problem:
=======================
The EVENT_POSIX_HEALTH_CHECK_FAILED event is not seen when the filesystem underlying a brick crashes.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-5.


How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Set up a listener to capture the events.
2. Create a simple volume using 2 or 3 bricks.
3. Crash the filesystem underlying a brick.
4. Check for the EVENT_POSIX_HEALTH_CHECK_FAILED event.
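
For step 1, something has to be listening for events. The following is a minimal sketch of such a receiver, not the setup used for this report. It assumes events arrive as a JSON object in the body of an HTTP POST (the verification log in this bug shows a "POST /listen" request) and that the event name is carried in an "event" field. The port number and the names record_event, ListenHandler, and serve are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Event name as it appears in the received payload (assumption: no EVENT_ prefix).
WANTED = "POSIX_HEALTH_CHECK_FAILED"

# Every decoded event is kept here so a test can inspect what arrived.
seen = []


def record_event(body):
    """Decode one JSON event body, remember it, and report whether it is WANTED."""
    event = json.loads(body.decode("utf-8"))
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    seen.append(event)
    return event.get("event") == WANTED


class ListenHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            hit = record_event(body)
        except ValueError:
            # Covers malformed JSON, bad encoding, and non-object payloads.
            self.send_response(400)
            self.end_headers()
            return
        if hit:
            print("health check failure received:", seen[-1])
        self.send_response(200)
        self.end_headers()


def serve(port=9000):
    """Block forever, accepting event POSTs on the given (hypothetical) port."""
    HTTPServer(("0.0.0.0", port), ListenHandler).serve_forever()
```

Calling serve() starts the listener. With it running, step 4 amounts to checking whether the health-check line is printed after the filesystem is crashed.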

Actual results:
===============
The EVENT_POSIX_HEALTH_CHECK_FAILED event is not seen when the filesystem underlying a brick crashes.


Expected results:
=================
The event should be generated.


Additional info:
================
When I crashed the brick's filesystem, core files were generated.

Comment 2 Byreddy 2016-11-23 07:25:46 UTC
BT of core file:

(gdb) bt
#0  0x00007f8b58c44694 in vfprintf () from /lib64/libc.so.6
#1  0x00007f8b58d088d5 in __vsnprintf_chk () from /lib64/libc.so.6
#2  0x00007f8b5a573a18 in vsnprintf (__ap=0x7f8b3cef8a70, __fmt=<optimized out>, __n=0, __s=0x0) at /usr/include/bits/stdio2.h:77
#3  gf_vasprintf (string_ptr=string_ptr@entry=0x7f8b3cef8b78, format=format@entry=0x7f8b4c94c110 "op=%s;path=%s;error=%s;brick=%s:%s", arg=arg@entry=0x7f8b3cef8b90) at mem-pool.c:219
#4  0x00007f8b5a5c288a in gf_event (event=event@entry=EVENT_POSIX_HEALTH_CHECK_FAILED, fmt=fmt@entry=0x7f8b4c94c110 "op=%s;path=%s;error=%s;brick=%s:%s") at events.c:84
#5  0x00007f8b4c9435f0 in posix_fs_health_check (this=this@entry=0x7f8b48006c50) at posix-helpers.c:1779
#6  0x00007f8b4c943774 in posix_health_check_thread_proc (data=0x7f8b48006c50) at posix-helpers.c:1817
#7  0x00007f8b593aedc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f8b58cf373d in clone () from /lib64/libc.so.6
(gdb)

Comment 3 Atin Mukherjee 2016-11-23 07:46:49 UTC
This has been fixed in BZ 1385606

*** This bug has been marked as a duplicate of bug 1385606 ***

Comment 4 Rahul Hinduja 2016-11-23 09:28:04 UTC
Reopening this BZ, since it tracks the event failure notification whereas the other BZ tracks the functionality issue. From a use case/test case point of view, this BZ is also different.

I agree the root cause might be the same, but QE verification is required for both of these BZs. The qa_test_case coverage flag will be handy in such cases, but it will come into effect only in the regression cycle and/or in future releases, with the priorities attached to each case.

Comment 5 Atin Mukherjee 2016-11-23 09:32:03 UTC
The downstream patch https://code.engineering.redhat.com/gerrit/90550 is already in the rhgs-3.2.0 codebase.

Comment 9 Byreddy 2016-12-01 07:03:30 UTC
Verified this issue using the build glusterfs-3.8.4-6.

The event is now generated for the scenario in this bug's title.

{u'message': {u'path': u'/bricks/brick0/t0/.glusterfs/health_check', u'brick': u'dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0', u'op': u'open', u'error': u'Input/output error'}, u'event': u'POSIX_HEALTH_CHECK_FAILED', u'ts': 1480575528, u'nodeid': u'7c96741d-d5c1-470a-904d-1928af82454f'}
10.70.41.198 - - [01/Dec/2016 12:23:30] "POST /listen HTTP/1.1" 200 -


Moving to verified state.
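
The format string in the backtrace from comment 2, "op=%s;path=%s;error=%s;brick=%s:%s", lines up with the fields of the message received above. The sketch below shows how a formatted string of that shape could be split back into those fields. It only illustrates the mapping and is not the eventing daemon's actual parser. The function names are hypothetical, and it assumes no value contains a ";" and that the brick host contains no ":".

```python
def parse_event_message(raw):
    """Split 'key=value;key=value' text into a dict, ignoring malformed parts."""
    fields = {}
    for part in raw.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip any fragment that has no '='
            fields[key] = value
    return fields


def split_brick(brick):
    """Split 'host:/path' into (host, path) at the first colon."""
    host, _, path = brick.partition(":")
    return host, path


raw = (
    "op=open;"
    "path=/bricks/brick0/t0/.glusterfs/health_check;"
    "error=Input/output error;"
    "brick=dhcp41-198.lab.eng.blr.redhat.com:/bricks/brick0/t0"
)
fields = parse_event_message(raw)
host, brick_path = split_brick(fields["brick"])
```

Here fields carries the same four keys as the received message (op, path, error, brick), and split_brick recovers the two values that the final "%s:%s" pair of the format string joins.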

Comment 11 errata-xmlrpc 2017-03-23 06:21:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

