Bug 1436197 - glusterd crashes when disk hosting a brick is removed from system
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact: Byreddy
URL:
Whiteboard:
Depends On:
Blocks: 1435613
Reported: 2017-03-27 12:29 UTC by Raghavendra Talur
Modified: 2019-10-28 22:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-07-26 10:49:50 UTC
Embargoed:



Description Raghavendra Talur 2017-03-27 12:29:31 UTC
Description of problem:

When the disk is removed from the VM, we get these logs from glusterfsd (the brick process):

```
Broadcast message from systemd-journald@node1 (Mon 2017-03-27 12:14:29 UTC):

var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]: [2017-03-27 12:14:29.751829] M [MSGID: 113075] [posix-helpers.c:1841:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: health-check failed, going down


Message from syslogd@localhost at Mar 27 12:14:29 ...
 var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]:[2017-03-27 12:14:29.751829] M [MSGID: 113075] [posix-helpers.c:1841:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: health-check failed, going down
```

When the kill signal is sent to the same brick process as part of replace-brick, we get:

```
Broadcast message from systemd-journald@node1 (Mon 2017-03-27 12:14:59 UTC):

var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]: [2017-03-27 12:14:59.752367] M [MSGID: 113075] [posix-helpers.c:1847:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: still alive! -> SIGTERM


Message from syslogd@localhost at Mar 27 12:14:59 ...
 var-lib-heketi-mounts-vg_3dc59eba73cb2070a96c0fec5a2b5e82-brick_455ecb9f475cf6bd7dd201457799736c-brick[14090]:[2017-03-27 12:14:59.752367] M [MSGID: 113075] [posix-helpers.c:1847:posix_health_check_thread_proc] 0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: still alive! -> SIGTERM
Shared connection to 192.168.21.14 closed.
```
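For triage, the two excerpts above can be compared programmatically. The following is a minimal Python sketch that pulls the timestamp, source location, and message out of each line and reports the time between them. It assumes the bracketed field layout seen in these two messages; the regular expression and the `parse_line` helper are illustrative and not part of gluster.

```python
import re
from datetime import datetime

# Field layout assumed from the excerpts in this report:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[MSGID: (?P<msgid>\d+)\]\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+): (?P<msg>.*)"
)


def parse_line(text):
    """Return the parsed fields of one log line, or None if it does not match."""
    m = LOG_RE.search(text)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


# The two messages quoted in this report (process-name prefix omitted).
failed = parse_line(
    "[2017-03-27 12:14:29.751829] M [MSGID: 113075] "
    "[posix-helpers.c:1841:posix_health_check_thread_proc] "
    "0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: health-check failed, going down"
)
sigterm = parse_line(
    "[2017-03-27 12:14:59.752367] M [MSGID: 113075] "
    "[posix-helpers.c:1847:posix_health_check_thread_proc] "
    "0-vol_71bda80b7a159f08ad795e4f4f244bd4-posix: still alive! -> SIGTERM"
)

# Seconds elapsed between the two messages.
gap = (sigterm["ts"] - failed["ts"]).total_seconds()
print(f"{failed['func']} -> {sigterm['func']}: {gap:.3f} s apart")
```

Both messages parse to the same function name, `posix_health_check_thread_proc`, and the second message follows the first by roughly 30 seconds.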

After this, glusterd is found to have crashed on the system, and the system goes into emergency mode.


This might not be the best way to test. If a better way proves that gluster is resilient to disk crashes, this bug might be closed.

Comment 5 Samikshan Bairagya 2017-07-26 10:49:50 UTC
Closing this bug since the needinfo request hasn't been addressed. Please reopen this bug if the issue persists.

