Description of problem:

When a process attempts to access certain files or directories through the GlusterFS native FUSE client, that process hangs: it cannot be interrupted, and killing it with signal 9 (SIGKILL) may leave it in <defunct> state and make it a direct child of the 'init' process. This "zombie" process then keeps the target file or directory open, so the corresponding mount point cannot be unmounted because it is "busy"; the only way to restore everything back to normal is to reboot the client.

The hang happens, for example (and specifically), when running a 'find' command over the entire GlusterFS file system in order to trigger the canonical self-healing on a troublesome volume: the 'find' command hangs, and if a new 'find' command is issued, it too hangs in the exact same spot. This indicates that this is not some random timing or deadlock problem; there really is something "special" about certain target files and directories that causes the problem. If all copies of a troublesome file or directory are manually removed from every GlusterFS brick, the 'find' command advances past that point, only to hang later at some other troublesome file or directory.

The 'ps axl' status of a hung 'find' process iterating through GlusterFS volume '/mnt/volume' looks like this (with irrelevant processes removed from the list):

F   UID   PID  PPID PRI  NI    VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
4     0  3771     1  20   0 120272 9180 wait_a S    ?          0:18 find /mnt/volume

The corresponding 'lsof -p 3771' output is (with the actual hanging directory path replaced with a/b/c/d/e for clarity):

COMMAND  PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
find    3771 root cwd   DIR   0,18    98304 18446744065793965768 /mnt/volume/a/b/c/d/e
find    3771 root rtd   DIR    9,0     4096 2 /
find    3771 root txt   REG    9,0   234512 58195979 /bin/find
find    3771 root mem   REG    9,0 99158752 8652401 /usr/lib/locale/locale-archive
find    3771 root mem   REG    9,0    19536 39321619 /lib64/libdl-2.12.so
find    3771 root mem   REG    9,0   142424 39321637 /lib64/libpthread-2.12.so
find    3771 root mem   REG    9,0  1832712 39321613 /lib64/libc-2.12.so
find    3771 root mem   REG    9,0   122008 39321674 /lib64/libselinux.so.1
find    3771 root mem   REG    9,0   595816 39321621 /lib64/libm-2.12.so
find    3771 root mem   REG    9,0    43840 39321641 /lib64/librt-2.12.so
find    3771 root mem   REG    9,0   148504 39321922 /lib64/ld-2.12.so
find    3771 root  0u   CHR  136,1      0t0 4 /dev/pts/1 (deleted)
find    3771 root  1w   REG    9,0 30834688 46661667 /root/find-2012030403.log
find    3771 root  2u   CHR  136,1      0t0 4 /dev/pts/1 (deleted)
find    3771 root  3r   DIR    9,0     4096 46661633 /root
find    3771 root  4r   DIR    9,0     4096 46661633 /root

Access to the parent directory of the hung directory works normally, but any attempt to access files or directories inside the troublesome directory hangs as well. For example, the following 'ls' commands produce the following results:

[root@hostname ~]# ls /mnt/volume/a/b/c/d
e
(terminates normally, with expected output)

[root@hostname ~]# ls /mnt/volume/a/b/c/d/e
(hangs, no output, can be killed with 'kill -9')

[root@hostname ~]# ls /mnt/volume/a/b/c/d/e/f
(hangs, no output, cannot be killed with 'kill -9')

The corresponding 'ps axl' output for the hung processes looks like this:

26292 pts/3    D+     0:00 ls /mnt/volume/a/b/c/d/e/f
26524 pts/4    D+     0:00 ls /mnt/volume/a/b/c/d/e

After sending signal 9 (SIGKILL) to both processes:

kill -9 26292 26524

the command that was accessing 'e' dies normally, but the one that was accessing 'f' does not.
The corresponding 'ps axl' output is:

26292 pts/3    D+     0:00 ls /mnt/volume/a/b/c/d/e/f

There are also a number of warnings and errors in the corresponding GlusterFS volume logs on the client that was accessing the volume:

/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.336764] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-3: path /a/b/c/d/e on subvolume volume-client-6 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.336961] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-1: path /a/b/c/d/e on subvolume volume-client-2 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.337715] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-17: path /a/b/c/d/e on subvolume volume-client-34 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.339047] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-20: path /a/b/c/d/e on subvolume volume-client-40 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.342469] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-0: path /a/b/c/d/e on subvolume volume-client-0 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.343970] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-14: path /a/b/c/d/e on subvolume volume-client-28 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.346431] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-2: path /a/b/c/d/e on subvolume volume-client-4 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.356663] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-23: path /a/b/c/d/e on subvolume volume-client-46 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.373125] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-11: path /a/b/c/d/e on subvolume volume-client-22 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.399428] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-16: path /a/b/c/d/e on subvolume volume-client-33 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.399532] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-19: path /a/b/c/d/e on subvolume volume-client-39 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.399753] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-22: path /a/b/c/d/e on subvolume volume-client-45 => -1 (No such file or directory)
/var/log/glusterfs/mnt-volume.log:[2012-03-04 22:50:52.407050] E [afr-self-heal-common.c:1054:afr_sh_common_lookup_resp_handler] 7-volume-replicate-13: path /a/b/c/d/e on subvolume volume-client-27 => -1 (No such file or directory)

Here is the GlusterFS volume info (with confidential site/volume/brick names altered and most of the bricks removed):

[root@hostname ~]# gluster volume info volume

Volume Name: volume
Type: Distributed-Replicate
Status: Started
Number of Bricks: 24 x 2 = 48
Transport-type: tcp
Bricks:
Brick1: r1s8.cluster.site.com:/mnt/data2/gfs/volume
Brick2: r1s9.cluster.site.com:/mnt/data2/gfs/volume
:
Brick47: r1s4.cluster.site.com:/mnt/data9/gfs/volume
Brick48: r1s5.cluster.site.com:/mnt/data9/gfs/volume
Options Reconfigured:
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
performance.quick-read: off

Please note that 'performance.quick-read' has been turned off in an attempt to avoid the hanging issues described in:

https://bugzilla.redhat.com/show_bug.cgi?id=764743

When we inspect the troublesome files and directories through the bricks' native file systems, we do not see anything that distinguishes them from nearby directories and files that work correctly. Here is the 'stat' output from two of the bricks (the directory is present on all 48 bricks, which we assume is normal):

File: `/mnt/data8/gfs/volume/a/b/c/d/e'
Size: 4096   Blocks: 8   IO Block: 4096   directory
Device: 821h/2081d   Inode: 63833568   Links: 8
Access: (0775/drwxrwxr-x)  Uid: (   91/  tomcat)   Gid: (   91/  tomcat)
Access: 2012-03-05 10:28:32.516782891 +0200
Modify: 2012-03-04 22:50:53.618780064 +0200
Change: 2012-03-04 22:50:53.618780064 +0200

File: `/mnt/data9/gfs/volume/a/b/c/d/e'
Size: 4096   Blocks: 16   IO Block: 4096   directory
Device: 831h/2097d   Inode: 16778400   Links: 8
Access: (0775/drwxrwxr-x)  Uid: (   91/  tomcat)   Gid: (   91/  tomcat)
Access: 2012-03-05 10:28:32.516782891 +0200
Modify: 2012-03-04 22:50:52.000000000 +0200
Change: 2012-03-04 23:53:51.838991487 +0200

This particular case involves a troublesome directory, but we have experienced exactly similar situations with individual troublesome files, too.

We are running the GlusterFS volume on servers with the 'CentOS release 6.2' operating system, kernel '2.6.32-220.4.2.el6.x86_64', an 'Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz' CPU, and 16 GB of memory. The GlusterFS version is 3.2.5, release 2.el6, for architecture 'x86_64'. GlusterFS has been installed from Red Hat RPMs. The corresponding 'glusterfs-core-3.2.5-2.el6.x86_64' RPM info header is:

Name        : glusterfs-core                Relocations: (not relocatable)
Version     : 3.2.5                         Vendor: Red Hat, Inc.
Release     : 2.el6                         Build Date: Tue 15 Nov 2011 03:43:32 PM EET
Install Date: Fri 27 Jan 2012 05:13:50 PM EET      Build Host: x86-004.build.bos.redhat.com
Group       : System Environment/Libraries  Source RPM: glusterfs-3.2.5-2.el6.src.rpm
Size        : 7146188                       License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://www.gluster.org/docs/index.php/GlusterFS

Before filing this bug report we carefully researched earlier bug reports, and found at least the following reports that seem to describe situations similar to what we are now facing:

http://gluster.org/pipermail/gluster-users/2011-May/007580.html
https://bugzilla.redhat.com/show_bug.cgi?id=764743

How reproducible:

The problem occurs every time we try to 'self-heal' (or 'rebalance') our volume, which was damaged earlier in a massive hardware failure. The operation always hangs when it hits the first "troublesome" file or directory.

Steps to Reproduce:

We do not know, of course, exactly how the hardware failure that broke the GlusterFS volume did what it did, or how to reproduce similar damage. We are, however, willing to assist you in any way we can by running any tests and providing any diagnostic information you wish on our damaged GlusterFS volume.
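For reference, the sweep we use to trigger self-heal is the usual full-volume crawl from a client mount point, essentially along these lines (a minimal sketch; our actual run simply redirects the output of a plain 'find /mnt/volume' to a log file, and the exact flags may differ from the canonical form):

# Crawl the whole volume from a client mount so that every entry is looked up
# (and therefore self-healed); the mount point path is our own.
find /mnt/volume -noleaf -print0 | xargs --null stat > /dev/null

Every troublesome entry this crawl touches hangs it in the way described above.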
Actual results:

'find' and any other command that tries to access a specific file or directory in GlusterFS hangs, and the corresponding operation, and anything depending on it, then fails.

Expected results:

Commands should not hang, but simply access and read the file or directory and any related metadata.

Additional info:

These problems appeared after the corresponding cluster suffered a massive hardware failure, which broke several disks and also introduced a 'split-brain' scenario. The faulty hardware was replaced, and the GFID problems caused by the 'split-brain' situation were fixed with the manual procedure described in:

http://gluster.org/pipermail/gluster-users/2011-July/008215.html
https://github.com/vikasgorur/gfid

After these issues were addressed, all attempts to self-heal the GlusterFS volume have failed because the corresponding 'find' command hangs. All alternative commands that iterate the volume directory tree, such as 'ls -R' or 'tree', hang too.

It is probably worth mentioning that the GlusterFS rebalance operation fails too: in the 'fix-layout' phase the operation starts normally, and the counter reported by the rebalance 'status' command grows steadily for a while, but then the index value stops growing and the rebalance never completes.

There are currently about 300,000 directories and 1,000,000 files in the affected volume, and about 0.02% of the files and directories seem to hang any command that tries to access them. The rest of the files and directories work normally. All access is done with the GlusterFS native FUSE client: NFS is not involved in any way.

The affected GlusterFS volume was constructed recently, using the latest stable GlusterFS 3.2.5 version. In other words, the volume has never been upgraded from an older version. The volume currently uses only 1% of its maximum capacity: we could recover from the current situation by simply copying all data away from the bricks, clearing the whole volume, and then copying the data back. However, we are unwilling to do this, because we are also evaluating GlusterFS, and we need to be sure that GlusterFS volumes can be properly recovered without a 'start-everything-from-the-beginning' approach.
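For completeness, the gist of the manual GFID fix referenced above, as we understood and applied it, was roughly the following. This is only a sketch with an illustrative brick path; the linked posts describe the exact steps:

# Run directly on the bricks as root: dump the GFID of the affected path and
# compare the values across the bricks of the replica pair.
getfattr -n trusted.gfid -e hex /mnt/data8/gfs/volume/a/b/c/d/e

# On copies whose GFID disagrees with the rest of the replica set, remove the
# xattr so that it can be reassigned on the next lookup from a client.
setfattr -x trusted.gfid /mnt/data8/gfs/volume/a/b/c/d/e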
Thanks for the detailed report on the issue. A possible reason why this happens is GFID mismatches. We have seen a similar issue in our testing too, and debugging it is in progress. Will update you soon on the situation. -Amar
Addressing this post 3.3.0.
Rami Hänninen, I realize this is a very late request, but I was just wondering if you could provide the 'getfattr -d -m . -e hex' and 'stat' output for both the file and its parent directory on the bricks where the file is present. We know of hangs when files have missing xattrs on the backends (https://bugzilla.redhat.com/show_bug.cgi?id=798874, https://bugzilla.redhat.com/show_bug.cgi?id=765587), and I would like to verify whether this bug is related to those. Pranith.
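Concretely, something along these lines, run directly on each brick that stores a copy (the paths below are placeholders for your actual brick path and file name), would give us what we need:

# Dump all xattrs of the troublesome entry and of its parent directory.
getfattr -d -m . -e hex /mnt/dataN/gfs/volume/a/b/c/d/e/FILE
getfattr -d -m . -e hex /mnt/dataN/gfs/volume/a/b/c/d/e

# And the corresponding stat output for both.
stat /mnt/dataN/gfs/volume/a/b/c/d/e/FILE
stat /mnt/dataN/gfs/volume/a/b/c/d/e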
Closing as requested information has not been provided.