Bug 1006809
Summary: | DHT: Kernel untar fails on the mount point after add-brick | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | shylesh <shmohan> |
Component: | glusterfs | Assignee: | Krutika Dhananjay <kdhananj> |
Status: | CLOSED ERRATA | QA Contact: | shylesh <shmohan> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 2.1 | CC: | kdhananj, nsathyan, rgowdapp, ssamanta, vagarwal, vbellur |
Target Milestone: | --- | ||
Target Release: | RHGS 3.0.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.6.0.22-1 | Doc Type: | Bug Fix |
Doc Text: |
Previously, a mkdir that failed because the parent directory was absent returned ENOENT. DHT self-heal treats a brick that returned ENOENT on lookup as part of the layout, on the assumption that the lookup might be racing with a mkdir. As a result, a newly added brick was considered for layout assignment. However, the directory creation itself might have failed on the new brick because the parents were not present there; when a file created later hashed to the new brick, the create failed with ENOENT, which was propagated back to the application.
The fix treats an absent parent on a sub-volume (in this case because the directory hierarchy has not yet been constructed on the newly added brick) as an ESTALE error rather than ENOENT. As a result, the newly added brick is not considered for layout assignment, which fixes the 'No such file or directory' failures on the mount point during subsequent directory entry creations.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2014-09-22 19:28:57 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Embargoed: |
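The behaviour change described in the Doc Text above can be illustrated with a small toy model. This is only a sketch and not GlusterFS code: the function names, the brick names, and the dictionary-based "bricks" are hypothetical, and the only facts taken from this report are that a brick answering ENOENT on lookup is kept in the layout (possible mkdir race) and that a missing parent is now reported as ESTALE instead of ENOENT.

```python
import errno


def brick_lookup_errno(brick_dirs, path, parent_missing_errno):
    """Return the errno a (toy) brick reports when looking up directory `path`.

    brick_dirs: set of directory paths that exist on this brick.
    parent_missing_errno: errno reported when the parent of `path` is absent.
    ENOENT models the old behaviour, ESTALE the behaviour after the fix.
    """
    if path in brick_dirs:
        return 0
    parent = path.rsplit("/", 1)[0] or "/"
    if parent not in brick_dirs:
        return parent_missing_errno
    # Parent exists and only the entry itself is missing.
    return errno.ENOENT


def layout_candidates(bricks, path, parent_missing_errno):
    """Keep bricks where the directory exists or lookup returned ENOENT."""
    eligible = []
    for name, dirs in bricks.items():
        err = brick_lookup_errno(dirs, path, parent_missing_errno)
        if err in (0, errno.ENOENT):
            eligible.append(name)
    return eligible


# Hypothetical volume: brick "t3" was just added and holds only the root.
bricks = {
    "t0": {"/", "/linux", "/linux/arch"},
    "t1": {"/", "/linux", "/linux/arch"},
    "t3": {"/"},
}

old = layout_candidates(bricks, "/linux/arch", errno.ENOENT)
new = layout_candidates(bricks, "/linux/arch", errno.ESTALE)
print(old)  # ['t0', 't1', 't3']  new brick wrongly kept in the layout
print(new)  # ['t0', 't1']        new brick excluded until its ancestors exist
```

In the old case the new brick is eligible even though `/linux` does not exist on it, so an entry hashing there cannot be created. In the fixed case the brick drops out of the candidate list until the directory hierarchy has been built on it.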
Description
shylesh
2013-09-11 10:26:42 UTC
Targeting for 3.0.0 (Denali) release.

Dev ack to 3.0 RHS BZs

Upon adding a brick while I/O is going on, I could see "Stale file handle" messages in the logs.

from mount logs
-------------
[2014-06-20 06:04:21.010211] D [MSGID: 0] [dht-inode-write.c:852:dht_setattr_cbk] 5-test-dht: subvolume test-client-3 returned -1 (Stale file handle )
[2014-06-20 06:04:21.010889] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.011538] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.013935] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c/include/plat
[2014-06-20 06:04:21.014776] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.170889] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.171642] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.172404] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.173156] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.173899] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.174638] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.204200] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.204970] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.205716] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.208210] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx
[2014-06-20 06:04:21.209135] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.265725] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx/include
[2014-06-20 06:04:21.266616] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.269226] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx/include/mach
[2014-06-20 06:04:21.270158] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.274812] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.275561] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.276337] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.278774] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx/include/plat
[2014-06-20 06:04:21.279700] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle

As a consequence, removing the directory tree with rm fails with "Stale file handle" for some of the directories.

on the mount point
================
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-sa1100/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-stmp37xx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-u300/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-mxc/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-orion/include/plat': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-s3c24xx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-s3c24xx/include/plat': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-s5pc1xx/include/plat': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-stmp3xxx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/tools': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/common': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/configs': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-aaec2000/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/csp/chipc': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/csp/dmac': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/csp/tmr': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/include/csp': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/include/mach/csp': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-gemini/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-ixp2000/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-ixp23xx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-kirkwood/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-ks8695/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-l7200/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-lh7a40x/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-loki/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-msm/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-mv78xx0/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-mx1': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-mx2': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-omap1': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-orion5x/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-s3c6400/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-versatile/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/nwfpe': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-omap/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-pxa/include/plat': Stale file handle

from mount logs during rm
=========================
[2014-06-20 06:06:54.254635] W [fuse-bridge.c:1298:fuse_unlink_cbk] 0-glusterfs-fuse: 1005235: RMDIR() /linux-2.6.32.63/arch/arm/plat-pxa/include/plat => -1 (Stale file handle)
[2014-06-20 06:06:54.255240] W [client-rpc-fops.c:2674:client3_3_opendir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-pxa/include/plat (ae555539-f0a8-4d7f-b504-8e92228b8660)
[2014-06-20 06:06:54.255293] W [fuse-bridge.c:1298:fuse_unlink_cbk] 0-glusterfs-fuse: 1005236: RMDIR() /linux-2.6.32.63/arch/arm/plat-pxa/include/plat => -1 (Stale file handle)

Cluster info
============
rhs-client4.lab.eng.blr.redhat.com
rhs-client39

mount point
=========
rhs-client4:/test

volume info
===========
Volume Name: test
Type: Distribute
Volume ID: 5f95f04d-eaf7-4ad2-a551-b80820918e0c
Status: Started
Snap Volume: no
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/t0
Brick2: rhs-client39.lab.eng.blr.redhat.com:/home/t1
Brick3: rhs-client4.lab.eng.blr.redhat.com:/home/t2
Brick4: rhs-client4.lab.eng.blr.redhat.com:/home/t3
Options Reconfigured:
diagnostics.client-log-level: INFO
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

<RCA from upstream patch>
Till we separated the scenario of a file/directory not existing from parent not existing [1], we used to include a subvolume in the layout of a directory even if it is not present on that subvolume. This was done to allow a lookup racing with mkdir to create correct layout. However, there are other scenarios as well where a directory is not present. One such situation is trying to create a directory after an add-brick. Since there is no guarantee that all the ancestors are created after an add-brick (and hence directory cannot be created), the newly added brick should not be part of the layout. However, we used to consider newly added brick as part of layout (even before we do fix-layout of all the ancestors) and this was the root cause of [2]. With [1], this issue got fixed and hence [2] got fixed too. However, [1] is not complete in the sense we didn't modify rmdir codepath appropriately. This patch fixes that gap.

[1] http://review.gluster.org/6322
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1006809
</RCA>

Verified on 3.6.0.27-1.el6_5.x86_64; however, I/O fails on NFS. Raising a separate bug.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html