Bug 1436197

Summary: glusterd crashes when disk hosting a brick is removed from system
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Raghavendra Talur <rtalur>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Byreddy <bsrirama>
Severity: unspecified
Priority: unspecified
Version: rhgs-3.3
CC: rhs-bugs, rtalur, sasundar, sbairagy, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-07-26 10:49:50 UTC
Type: Bug
Bug Blocks: 1435613    

Description Raghavendra Talur 2017-03-27 12:29:31 UTC
Description of problem:

When the disk is removed from the VM, glusterfsd (the brick process) logs the following:

```
Broadcast message from systemd-journald@node1 (Mon 2017-03-27 12:14:29 UTC):

var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]: [2017-03-27 12:14:29.751829] M [MSGID: 113075] [posix-helpers.c:1841:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: health-check failed, going down


Message from syslogd@localhost at Mar 27 12:14:29 ...
 var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]:[2017-03-27 12:14:29.751829] M [MSGID: 113075] [posix-helpers.c:1841:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: health-check failed, going down
```

When the kill signal is sent to the same brick process as part of replace-brick, we get:

```
Broadcast message from systemd-journald@node1 (Mon 2017-03-27 12:14:59 UTC):

var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]: [2017-03-27 12:14:59.752367] M [MSGID: 113075] [posix-helpers.c:1847:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: still alive! -> SIGTERM


Message from syslogd@localhost at Mar 27 12:14:59 ...
 var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]:[2017-03-27 12:14:59.752367] M [MSGID: 113075] [posix-helpers.c:1847:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: still alive! -> SIGTERM
Shared connection to 192.168.21.14 closed.
```
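For anyone comparing the two excerpts above, the gap between the "health-check failed, going down" message and the "still alive! -> SIGTERM" message can be computed from the bracketed timestamps. The following is only a rough sketch: the helper names (`parse_event`, `seconds_between`) and the regular expression are made up for illustration and assume the log line layout shown in this report.

```python
import re
from datetime import datetime

# Assumed layout (taken from the excerpts in this report):
#   ...[pid]: [YYYY-MM-DD HH:MM:SS.ffffff] M [MSGID: n] [file:line:func] xlator: message
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] \w "
    r"\[MSGID: (?P<msgid>\d+)\] \[[^\]]+\] \S+: (?P<msg>.*)$"
)


def parse_event(line):
    """Return (timestamp, message) for a brick log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, m.group("msg").strip()


def seconds_between(first_line, second_line):
    """Seconds elapsed between two matching log lines (hypothetical helper)."""
    first = parse_event(first_line)
    second = parse_event(second_line)
    if first is None or second is None:
        raise ValueError("line does not match the assumed log layout")
    return (second[0] - first[0]).total_seconds()


# The two lines below are copied from the excerpts in this report.
FAILED = (
    "[2017-03-27 12:14:29.751829] M [MSGID: 113075] "
    "[posix-helpers.c:1841:posix_health_check_thread_proc] "
    "0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: health-check failed, going down"
)
SIGTERM = (
    "[2017-03-27 12:14:59.752367] M [MSGID: 113075] "
    "[posix-helpers.c:1847:posix_health_check_thread_proc] "
    "0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: still alive! -> SIGTERM"
)

if __name__ == "__main__":
    print(f"{seconds_between(FAILED, SIGTERM):.3f} seconds between the two messages")
```

With the two lines from this report, the messages are roughly 30 seconds apart, and both come from `posix_health_check_thread_proc` in the same brick process (pid 14090).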

glusterd is found to have crashed on the system, and the system goes into emergency mode.


This might not be the best way to test. If a better test shows that gluster is resilient to disk crashes, this bug can be closed.

Comment 5 Samikshan Bairagya 2017-07-26 10:49:50 UTC
Closing this bug since the needinfo hasn't been addressed. Please reopen this bug if the issue persists.