Bug 1100204 - brick failure detection does not work for ext4 filesystems
Summary: brick failure detection does not work for ext4 filesystems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: posix
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Lalatendu Mohanty
QA Contact:
URL:
Whiteboard:
Duplicates: 1150244
Depends On: 1130242
Blocks: glusterfs-3.5.3
 
Reported: 2014-05-22 09:05 UTC by Niels de Vos
Modified: 2014-11-21 16:20 UTC
CC: 4 users

Fixed In Version: glusterfs-3.5.3beta2
Clone Of:
: 1130242
Environment:
Last Closed: 2014-11-21 16:20:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Niels de Vos 2014-05-22 09:05:31 UTC
Description of problem:
The "Brick Failure Detection" (http://www.gluster.org/community/documentation/index.php/Features/Brick_Failure_Detection) does not work on ext4 filesystems.

Version-Release number of selected component (if applicable):
<any>

How reproducible:
100%

Steps to Reproduce:
1. see https://forge.gluster.org/glusterfs-core/glusterfs/blobs/release-3.5/doc/features/brick-failure-detection.md
2. make sure to format the brick(s) as ext4
3. disconnect the disk holding the brick

Actual results:
If there is no activity on the volume, brick failure detection does not trigger a shutdown of the brick process.

Expected results:
The brick process should notice that the filesystem went read-only and exit.

Additional info:
It seems that stat() on XFS includes a check for the filesystem status, but ext4 does not. Replacing the stat() with a write() should be sufficient.

Comment 1 Niels de Vos 2014-05-30 07:21:40 UTC
Brick Failure Detection has a thread in the POSIX xlator that calls
stat() on a file on the brick in a loop. The stat() returns an error on
XFS when the filesystem has aborted (e.g. after pulling the disk or a
RAID-card failure). Unfortunately, ext4 does not behave like that, and
the stat() happily succeeds. Eric Sandeen and Lukas Czerner don't think
that modifying ext4 is the right path, and expect that any patches to do
this would be rejected. So, we'll need to fix it in Gluster.

The change that needs to be made is in posix_health_check_thread_proc()
in the xlators/storage/posix/src/posix-helpers.c file. Instead of the
stat(), it should write (and read?) something to a new file under the
"priv->base_path + /.glusterfs" directory.

Comment 2 Lalatendu Mohanty 2014-06-05 07:18:32 UTC
I tried to reproduce the issue using ext4 brick partitions on the master branch (the bug is filed against 3.5, but I was checking whether it exists on master too):
1. Created the volume using ext4; the bricks are directories on an ext4 partition.
2. Started the volume and mounted it.
3. Deleted one of the bricks using "rm -rf <brick>".
4. Saw the messages below in /var/log/messages, and the brick process got killed.

Jun  5 02:05:03 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:03.712181] M [posix-helpers.c:1413:posix_health_check_thread_proc] 0-test-vol-posix: health-check failed, going down
Jun  5 02:05:33 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:33.713652] M [posix-helpers.c:1418:posix_health_check_thread_proc] 0-test-vol-posix: still alive! -> SIGTERM


The details (commands and output) are below:

[root@dhcp159-54 ~]# mount | grep '/d'

/dev/mapper/fedora_dhcp159--54-home on /d type ext4 (rw,relatime,data=ordered)


[root@dhcp159-54]# gluster v status
Status of volume: test-vol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.16.159.54:/d/testvol-1				49152	Y	16906
Brick 10.16.159.54:/d/testvol-2				49153	Y	16917
NFS Server on localhost					2049	Y	16929
 
Task Status of Volume test-vol
------------------------------------------------------------------------------
There are no active volume tasks


[root@dhcp159-54]# ps aux | grep glusterfsd
root     16906  0.0  1.0 596320 21228 ?        Ssl  02:02   0:00 /usr/local/sbin/glusterfsd -s 10.16.159.54 --volfile-id test-vol.10.16.159.54.d-testvol-1 -p /var/lib/glusterd/vols/test-vol/run/10.16.159.54-d-testvol-1.pid -S /var/run/ee63b3ac874f970ffd0f47685eaaf718.socket --brick-name /d/testvol-1 -l /usr/local/var/log/glusterfs/bricks/d-testvol-1.log --xlator-option *-posix.glusterd-uuid=df767b01-d8a1-4bba-b125-404931be1cc8 --brick-port 49152 --xlator-option test-vol-server.listen-port=49152
root     16917  0.0  0.9 596320 19124 ?        Ssl  02:02   0:00 /usr/local/sbin/glusterfsd -s 10.16.159.54 --volfile-id test-vol.10.16.159.54.d-testvol-2 -p /var/lib/glusterd/vols/test-vol/run/10.16.159.54-d-testvol-2.pid -S /var/run/c009c5864d1d438b4e085b9af5fc2416.socket --brick-name /d/testvol-2 -l /usr/local/var/log/glusterfs/bricks/d-testvol-2.log --xlator-option *-posix.glusterd-uuid=df767b01-d8a1-4bba-b125-404931be1cc8 --brick-port 49153 --xlator-option test-vol-server.listen-port=49153
root     16951  0.0  0.0 112640   936 pts/0    S+   02:02   0:00 grep --color=auto glusterfsd



[root@dhcp159-54]# rm -rf /d/testvol-1

In /var/log/messages

Jun  5 02:05:03 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:03.712181] M [posix-helpers.c:1413:posix_health_check_thread_proc] 0-test-vol-posix: health-check failed, going down
Jun  5 02:05:33 dhcp159-54 d-testvol-1[16906]: [2014-06-05 06:05:33.713652] M [posix-helpers.c:1418:posix_health_check_thread_proc] 0-test-vol-posix: still alive! -> SIGTERM


[root@dhcp159-54]# ps aux | grep glusterfsd
root     16917  0.0  0.9 596320 19124 ?        Ssl  02:02   0:00 /usr/local/sbin/glusterfsd -s 10.16.159.54 --volfile-id test-vol.10.16.159.54.d-testvol-2 -p /var/lib/glusterd/vols/test-vol/run/10.16.159.54-d-testvol-2.pid -S /var/run/c009c5864d1d438b4e085b9af5fc2416.socket --brick-name /d/testvol-2 -l /usr/local/var/log/glusterfs/bricks/d-testvol-2.log --xlator-option *-posix.glusterd-uuid=df767b01-d8a1-4bba-b125-404931be1cc8 --brick-port 49153 --xlator-option test-vol-server.listen-port=49153
root     17056  0.0  0.0 112640   940 pts/0    S+   02:19   0:00 grep --color=auto glusterfsd

[root@dhcp159-54]# gluster v status
Status of volume: test-vol
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.16.159.54:/d/testvol-1				N/A	N	N/A
Brick 10.16.159.54:/d/testvol-2				49153	Y	16917
NFS Server on localhost					2049	Y	16929
 
Task Status of Volume test-vol
------------------------------------------------------------------------------
There are no active volume tasks

Comment 3 Niels de Vos 2014-06-05 07:38:15 UTC
(In reply to Lalatendu Mohanty from comment #2)
> I tried to reproduce the issue using ext4 brick partitions on the master
> branch (the bug is filed against 3.5, but I was checking whether it exists on master too):
> 1. Created the volume using ext4; the bricks are directories on an ext4 partition.
> 2. Started the volume and mounted it.
> 3. Deleted one of the bricks using "rm -rf <brick>".
> 4. Saw the messages below in /var/log/messages, and the brick process got killed.

Yes, removing the directory that holds the brick does get detected, but that is not really the same as simulating a disk failure. You can use device-mapper to forcefully remove the device, or load an error target that will trigger a filesystem abort.

Alternatively, unplugging a device can be simulated like this:

  # echo offline > /sys/block/sdb/device/state
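
For the error-target approach, a sketch along these lines can be used when the brick sits on a device-mapper (e.g. LVM) device; the device name brickdev is only an example. It replaces the device's table with the "error" target so that every further I/O to the brick fails and the filesystem aborts:

  # dmsetup suspend brickdev
  # dmsetup load brickdev --table "0 $(blockdev --getsz /dev/mapper/brickdev) error"
  # dmsetup resume brickdev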

Comment 4 Lalatendu Mohanty 2014-06-09 13:35:41 UTC
Thanks, Niels. I could reproduce the bug with "echo offline > /sys/block/sda/device/state" on a VM where sda is an IDE disk.

Comment 5 Anand Avati 2014-07-01 12:58:19 UTC
REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for ext4 filesystem) posted (#1) for review on master by Lalatendu Mohanty (lmohanty)

Comment 6 Anand Avati 2014-07-01 14:44:21 UTC
REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for ext4 filesystem) posted (#2) for review on master by Lalatendu Mohanty (lmohanty)

Comment 7 Anand Avati 2014-07-01 15:02:46 UTC
REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for ext4 filesystem) posted (#3) for review on master by Lalatendu Mohanty (lmohanty)

Comment 8 Anand Avati 2014-07-05 18:38:39 UTC
REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for ext4 filesystem) posted (#4) for review on master by Lalatendu Mohanty (lmohanty)

Comment 9 Anand Avati 2014-08-14 15:18:50 UTC
REVIEW: http://review.gluster.org/8213 (Posix: Brick failure detection fix for ext4 filesystem) posted (#5) for review on master by Lalatendu Mohanty (lmohanty)

Comment 10 Lalatendu Mohanty 2014-10-28 11:40:58 UTC
*** Bug 1150244 has been marked as a duplicate of this bug. ***

Comment 11 Anand Avati 2014-10-28 11:54:45 UTC
REVIEW: http://review.gluster.org/8988 (Posix: Brick failure detection fix for ext4 filesystem) posted (#1) for review on release-3.6 by Lalatendu Mohanty (lmohanty)

Comment 12 Niels de Vos 2014-10-28 16:38:16 UTC
http://review.gluster.org/8989 has been merged for 3.5.

Comment 13 Niels de Vos 2014-11-05 09:24:11 UTC
The second beta for GlusterFS 3.5.3 has been released [1]. Please verify whether this release resolves the issue for you. If the glusterfs-3.5.3beta2 release does not resolve this issue, leave a comment on this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions have been made available on [2] to make testing easier.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019359.html
[2] http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.5.3beta2/

Comment 14 Niels de Vos 2014-11-21 16:20:32 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.5.3, please reopen this bug report.

glusterfs-3.5.3 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/announce/2014-November/000042.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

