Bug 864401
Summary: | [glusterfs-3.3.1qa3]: glusterfs client asserted | | |
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Raghavendra Bhat <rabhat> |
Component: | posix | Assignee: | Raghavendra Bhat <rabhat> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Sachidananda Urs <surs> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | pre-release | CC: | esandeen, fharshav, gluster-bugs, jahernan, sdharane, shaines |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | glusterfs-3.4.0 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 895841 | Environment: | |
Last Closed: | 2013-07-24 17:30:44 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 895841 | | |
Description
Raghavendra Bhat
2012-10-09 10:57:59 UTC
The reason for the crash is a problem in the backend XFS filesystem: lstat() was returning 5 instead of 0 or -1. As a result, the posix xlator did not create the dictionary that holds the xattr information for the lookup, and returned a NULL dictionary.

This is the output of dmesg:

XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.
XFS (dm-8): xfs_log_force: error 5 returned.
XFS (dm-7): xfs_log_force: error 5 returned.

CHANGE: http://review.gluster.org/4056 (storage/posix: return -1 if lstat call returns non zero value apart from -1) merged in master by Anand Avati (avati)

CHANGE: http://review.gluster.org/4054 (storage/posix: return -1 if lstat call returns non zero value apart from -1) merged in release-3.3 by Anand Avati (avati)

(In reply to comment #1)
> The reason for the crash is because of some problem in the backend xfs
> filesystem. upon lstat() was returning 5 instead of 0 or -1. Thus posix
> xlator did not create the dictionary which contains the xattrs information
> of lookup and thus returned NULL dictionary.
>
>
> This is the o/p of dmesg:
>
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.
> XFS (dm-8): xfs_log_force: error 5 returned.
> XFS (dm-7): xfs_log_force: error 5 returned.

Do you know how to reproduce this issue? Did you see XFS produce messages like "xfs_iunlink_remove: xfs_inotobp() returned error 22." and then shut itself down? If you know how to reproduce it, could you provide the testbed / reproducer details, or the set of operations that were being performed on the XFS filesystem when this occurred?

Thanks
-Harsha

Both machines used were VMs. I had run some tests and then left the setup as it was for a few days (maybe a week). I did not run any commands on those machines during the idle time, and there were no errors when I ran the tests. But when I logged into the machine again after a week, the fuse client crashed due to the above error after I entered a command on the fuse mount. I don't know how the XFS issue came about. But in other testing, where I unplugged the cable of the disk or (when the mount was via losetup) removed the actual file used for the loopback, I got the above errors again.

"upon lstat() was returning 5 instead of 0 or -1. "

Which lstat() is this - glibc, or something in the gluster code?
5 is EIO, which is the errno XFS returns from lstat after it shuts down, but whatever the lstat implementation is, it should have been putting that into errno and returning -1. Let's not lose track of *that* bug . . . checking for other values returned from a syscall seems like papering over the root cause of the bug, I think.

IOW, this is what glibc's lstat does in the face of a shut-down xfs filesystem:

# mount /dev/sdb1 /mnt/test
# touch /mnt/test/file
# /mnt/test2/git/xfstests/src/godown /mnt/test
# strace -e lstat stat /mnt/test/file
lstat("/mnt/test/file", 0x7fffc9ae2480) = -1 EIO (Input/output error)
stat: cannot stat `/mnt/test/file': Input/output error
# dmesg | tail -n 2
[359908.052905] XFS (sdb1): xfs_log_force: error 5 returned.
[359938.103901] XFS (sdb1): xfs_log_force: error 5 returned.

-Eric

And looking at the glusterfs code, it does seem to be doing the right thing in the lstat wrappers. I don't see anything in this bug that indicates lstat *did* return 5. Are you sure of this?

Thanks,
-Eric

Earlier, glusterfs (the storage/posix xlator, which talks to the backend) checked the return value of the lstat() syscall and treated it as a failure only if the return value was -1. When this bug was found, lstat() was returning 5 instead of -1 (dmesg also reported an XFS error, as mentioned in the above comments). Since glusterfs treated only a -1 return value as a failure of the syscall, it assumed the lookup call was successful and returned 0 instead of -1. Other components of glusterfs (the replicate xlator) saw the return value of 0 and started accessing structures that did not contain valid information, and thus segfaulted.

(In reply to comment #9)
> Earlier glusterfs (storage/posix xlator which talks to the backend) was
> checking the return value of lstat () syscall and was treating it as a
> failure if lstat return value is -1. When this bug was found, lstat ()
> returning 5 instead of -1

How was it determined that lstat returned 5?
> (dmesg also said some xfs error, as mentioned in
> the above comments).

dmesg may say "error 5 returned" but that does *not* mean that's what lstat returned. That's what an internal kernel function returned, and it should have been translated into an *errno* of 5, i.e. EIO; see comment #7 above.

Do you have traces of lstat actually returning 5? If so that is a glibc bug which must be addressed, but I do not see it in my testing.

Ok actually RHEL6 does have this bug. I wish gluster folks had notified us: xfs_vn_getattr returns positive not negative errno:

    if (XFS_FORCED_SHUTDOWN(mp))
            return XFS_ERROR(EIO);

see also bug #604089

Ok. The XFS bug was fixed in bug #859242

commit 2e8ecba563cfeff97584b578043b2e7ea7a31498
Author: Dave Chinner <dchinner>
Date: Mon Sep 24 14:21:30 2012 -0400

    [fs] xfs: Return -EIO when xfs_vn_getattr() failed

    Upstream commit: ed32201e65e15f3e6955cb84cbb544b08f81e5a5

    An attribute of inode can be fetched via xfs_vn_getattr() in XFS.
    Currently it returns EIO, not negative value, when it failed. As a
    result, the system call returns not negative value even though an
    error occured. The stat(2), ls and mv commands cannot handle this
    error and do not work correctly.

    This patch fixes this bug, and returns -EIO, not EIO when an error
    is detected in xfs_vn_getattr().

    Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu>
    Reviewed-by: Christoph Hellwig <hch>
    Signed-off-by: Alex Elder <aelder>
    Signed-off-by: Jarod Wilson <jarod>

Sorry for all the noise, but when gluster finds fs problems we really need to know about them, even if you also work around it in the short term.

Thanks,
-Eric
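
For readers following the thread, the workaround named in the CHANGE entries (treat any non-zero lstat() return as a failure, not only -1) can be sketched roughly as below. This is not the merged glusterfs patch; the helper names normalize_ret() and checked_lstat() are hypothetical, and copying a positive return value into errno is an assumption based on the kernel behaviour described above, where the positive value was a leaked errno code.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: collapse any non-zero return value into the usual
 * "-1 with errno set" convention, so callers can never mistake it for
 * success.
 */
static int normalize_ret(int ret)
{
    if (ret == 0)
        return 0;          /* genuine success */

    if (ret > 0)
        errno = ret;       /* assumption: positive value is a leaked errno (e.g. 5 == EIO) */

    /* ret == -1 (or any other negative value): errno is left as the callee set it */
    return -1;
}

/* Hypothetical wrapper around lstat() that applies the normalization. */
static int checked_lstat(const char *path, struct stat *buf)
{
    return normalize_ret(lstat(path, buf));
}
```

A caller would then only fill in lookup results when checked_lstat() returns 0, and propagate -1 with errno otherwise. As Eric notes above, this only works around the symptom; the underlying defect was the positive return from xfs_vn_getattr().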