Description of problem:
While untarring the kernel, if we do an add-brick, the untar fails.

Version-Release number of selected component (if applicable):
[root@rhs1-gold myscripts]# rpm -qa | grep gluster
glusterfs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.33rhs-1.el6rhs.x86_64
samba-glusterfs-3.6.9-160.3.el6rhs.x86_64
gluster-swift-container-1.8.0-6.11.el6rhs.noarch
glusterfs-libs-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.33rhs-1.el6rhs.x86_64
gluster-swift-proxy-1.8.0-6.11.el6rhs.noarch
gluster-swift-account-1.8.0-6.11.el6rhs.noarch
gluster-swift-plugin-1.8.0-6.el6rhs.noarch
vdsm-gluster-4.10.2-23.0.1.el6rhs.noarch
glusterfs-geo-replication-3.4.0.33rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.33rhs-1.el6rhs.x86_64
gluster-swift-1.8.0-6.11.el6rhs.noarch
gluster-swift-object-1.8.0-6.11.el6rhs.noarch

How reproducible:
Always

Steps to Reproduce:
1. Created a 3-brick distribute volume.
2. Mounted the volume and started a kernel untar on the mount point.
3. While the untar is in progress, added a brick to the volume.

Actual results:
After some time, the untar fails.

Additional info:
---------------
Volume Name: dist
Type: Distribute
Volume ID: 22d41a1d-8ecc-4226-bf79-0668a6daa150
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.37.113:/brick1/dist0
Brick2: 10.70.37.133:/brick1/dist1
Brick3: 10.70.37.134:/brick1/dist2
Brick4: 10.70.37.134:/brick1/dist3 --> newly added brick

[2013-09-11 08:56:26.462990] I [dht-layout.c:633:dht_layout_normalize] 1-dist-dht: found anomalies in /linux-2.6.32.61/Documentation. holes=1 overlaps=0 missing=1 down=0 misc=0
[2013-09-11 08:56:26.641158] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 1-dist-client-3: remote operation failed: No such file or directory. Path: /linux-2.6.32.61/Documentation/filesystems/pohmelfs
[2013-09-11 08:56:26.648318] W [client-rpc-fops.c:2058:client3_3_create_cbk] 1-dist-client-3: remote operation failed: No such file or directory. Path: /linux-2.6.32.61/Documentation/filesystems/pohmelfs/design_notes.txt
[2013-09-11 08:56:26.648352] W [fuse-bridge.c:2398:fuse_create_cbk] 0-glusterfs-fuse: 8792: /linux-2.6.32.61/Documentation/filesystems/pohmelfs/design_notes.txt => -1 (No such file or directory)

brick0
======
[root@rhs1-gold myscripts]# getfattr -d -m . -e hex /brick1/dist0/linux-2.6.32.61/Documentation/filesystems/pohmelfs
getfattr: Removing leading '/' from absolute path names
# file: brick1/dist0/linux-2.6.32.61/Documentation/filesystems/pohmelfs
trusted.gfid=0x10926f520c264d9084558ad95dbfd882
trusted.glusterfs.dht=0x00000001000000007ffffffebffffffc

brick1
======
[root@rhs2-bb dist1]# getfattr -d -m . -e hex linux-2.6.32.61/Documentation/filesystems/pohmelfs
# file: linux-2.6.32.61/Documentation/filesystems/pohmelfs
trusted.gfid=0x10926f520c264d9084558ad95dbfd882
trusted.glusterfs.dht=0x0000000100000000bffffffdffffffff
[root@rhs2-bb dist1]# pwd
/brick1/dist1

brick2
======
[root@rhs3-bb dist2]# getfattr -d -m . -e hex linux-2.6.32.61/Documentation/filesystems/pohmelfs
# file: linux-2.6.32.61/Documentation/filesystems/pohmelfs
trusted.gfid=0x10926f520c264d9084558ad95dbfd882
trusted.glusterfs.dht=0x0000000100000000000000003ffffffe
[root@rhs3-bb dist2]# pwd
/brick1/dist2

brick3
======
[root@rhs3-bb dist3]# getfattr -d -m . -e hex linux-2.6.32.61/Documentation/filesystems/pohmelfs
getfattr: linux-2.6.32.61/Documentation/filesystems/pohmelfs: No such file or directory
[root@rhs3-bb dist3]# pwd
/brick1/dist3
[root@rhs3-bb dist3]# getfattr -d -m . -e hex .
# file: .
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.volume-id=0x22d41a1d8ecc4226bf790668a6daa150
[root@rhs3-bb dist3]# pwd
/brick1/dist3

Cluster info
------------
rhs nodes
---------
10.70.37.113
10.70.37.133
10.70.37.134
10.70.37.59

mount point
-----------
/mount

Attaching the sosreport.
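The trusted.glusterfs.dht values above can be decoded to see which part of the hash space the three original bricks cover for pohmelfs. The sketch below assumes the 16-byte value is four big-endian 32-bit fields (count, hash type, range start, range stop); that field order is our reading of the DHT on-disk layout, not something stated in this report. decode_layout() and find_holes() are illustrative helper names, not GlusterFS functions.

```python
import struct

# trusted.glusterfs.dht values copied from the getfattr output for
# linux-2.6.32.61/Documentation/filesystems/pohmelfs on the original bricks.
XATTRS = {
    "dist0": "0x00000001000000007ffffffebffffffc",
    "dist1": "0x0000000100000000bffffffdffffffff",
    "dist2": "0x0000000100000000000000003ffffffe",
}

HASH_MAX = 0xFFFFFFFF  # DHT hashes names into a 32-bit space


def decode_layout(hex_value):
    """Split the xattr into four big-endian uint32 fields.

    Assumed field order: count, hash type, range start, range stop.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    count, hash_type, start, stop = struct.unpack(">4I", bytes.fromhex(digits))
    return count, hash_type, start, stop


def find_holes(ranges):
    """Return the sub-ranges of [0, HASH_MAX] not covered by any (start, stop) pair."""
    holes = []
    expected = 0  # first hash value not yet known to be covered
    for start, stop in sorted(ranges):
        if start > expected:
            holes.append((expected, start - 1))
        expected = max(expected, stop + 1)
    if expected <= HASH_MAX:
        holes.append((expected, HASH_MAX))
    return holes


if __name__ == "__main__":
    ranges = []
    for brick, value in XATTRS.items():
        _count, _type, start, stop = decode_layout(value)
        ranges.append((start, stop))
        print(f"{brick}: 0x{start:08x} - 0x{stop:08x}")
    for low, high in find_holes(ranges):
        print(f"hole: 0x{low:08x} - 0x{high:08x}")
```

On these values the only uncovered range is 0x3fffffff - 0x7ffffffd, about one quarter of the hash space. Presumably that is the range the 4-brick layout handed to the newly added brick (dist3), where the directory does not exist, which would fit the ENOENT failures from dist-client-3 in the mkdir/create logs above.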
Targeting for 3.0.0 (Denali) release.
Dev ack to 3.0 RHS BZs
Upon adding a brick while I/O is in progress, I could see "Stale file handle" messages in the logs.

from mount logs
-------------
[2014-06-20 06:04:21.010211] D [MSGID: 0] [dht-inode-write.c:852:dht_setattr_cbk] 5-test-dht: subvolume test-client-3 returned -1 (Stale file handle )
[2014-06-20 06:04:21.010889] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.011538] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.013935] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c/include/plat
[2014-06-20 06:04:21.014776] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.170889] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.171642] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.172404] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.173156] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.173899] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.174638] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.204200] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.204970] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.205716] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.208210] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx
[2014-06-20 06:04:21.209135] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.265725] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx/include
[2014-06-20 06:04:21.266616] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.269226] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx/include/mach
[2014-06-20 06:04:21.270158] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.274812] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.275561] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.276337] W [client-rpc-fops.c:2134:client3_3_setattr_cbk] 5-test-client-3: remote operation failed: Stale file handle
[2014-06-20 06:04:21.278774] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-s3c24xx/include/plat
[2014-06-20 06:04:21.279700] W [client-rpc-fops.c:1021:client3_3_setxattr_cbk] 5-test-client-3: remote operation failed: Stale file handle

As a consequence, when we try to remove the directories, rm fails with "Stale file handle" for some of the directories on the mount point:
================
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-sa1100/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-stmp37xx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-u300/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-mxc/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-orion/include/plat': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-s3c24xx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-s3c24xx/include/plat': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-s5pc1xx/include/plat': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-stmp3xxx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/tools': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/common': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/configs': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-aaec2000/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/csp/chipc': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/csp/dmac': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/csp/tmr': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/include/csp': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-bcmring/include/mach/csp': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-gemini/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-ixp2000/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-ixp23xx/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-kirkwood/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-ks8695/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-l7200/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-lh7a40x/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-loki/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-msm/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-mv78xx0/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-mx1': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-mx2': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-omap1': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-orion5x/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-s3c6400/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/mach-versatile/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/nwfpe': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-omap/include/mach': Stale file handle
rm: cannot remove `linux-2.6.32.63/arch/arm/plat-pxa/include/plat': Stale file handle

from mount logs during rm
=========================
[2014-06-20 06:06:54.254635] W [fuse-bridge.c:1298:fuse_unlink_cbk] 0-glusterfs-fuse: 1005235: RMDIR() /linux-2.6.32.63/arch/arm/plat-pxa/include/plat => -1 (Stale file handle)
[2014-06-20 06:06:54.255240] W [client-rpc-fops.c:2674:client3_3_opendir_cbk] 5-test-client-3: remote operation failed: Stale file handle. Path: /linux-2.6.32.63/arch/arm/plat-pxa/include/plat (ae555539-f0a8-4d7f-b504-8e92228b8660)
[2014-06-20 06:06:54.255293] W [fuse-bridge.c:1298:fuse_unlink_cbk] 0-glusterfs-fuse: 1005236: RMDIR() /linux-2.6.32.63/arch/arm/plat-pxa/include/plat => -1 (Stale file handle)

Cluster info
============
rhs-client4.lab.eng.blr.redhat.com
rhs-client39

mount point
===========
rhs-client4:/test

volume info
===========
Volume Name: test
Type: Distribute
Volume ID: 5f95f04d-eaf7-4ad2-a551-b80820918e0c
Status: Started
Snap Volume: no
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhs-client4.lab.eng.blr.redhat.com:/home/t0
Brick2: rhs-client39.lab.eng.blr.redhat.com:/home/t1
Brick3: rhs-client4.lab.eng.blr.redhat.com:/home/t2
Brick4: rhs-client4.lab.eng.blr.redhat.com:/home/t3
Options Reconfigured:
diagnostics.client-log-level: INFO
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable
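When re-running this reproducer, a per-errno list of failing paths is easier to compare across runs than raw rm output. The following is a hypothetical helper (remove_tree is not part of GlusterFS or of this report): it removes a tree bottom-up like `rm -rf`, but records every failed unlink/rmdir under its errno name instead of stopping, so the ESTALE directories can be listed directly.

```python
import errno
import os


def remove_tree(root):
    """Remove `root` bottom-up and return {errno name: [failed paths]}.

    An empty dict means everything was removed.
    """
    failures = {}

    def record(path, err):
        name = errno.errorcode.get(err.errno, str(err.errno))
        failures.setdefault(name, []).append(path)

    # topdown=False yields children before their parents, so each directory
    # is empty (unless something below it failed) by the time rmdir runs.
    # onerror captures directories that could not even be listed.
    for dirpath, dirnames, filenames in os.walk(
        root, topdown=False, onerror=lambda err: record(err.filename, err)
    ):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.unlink(path)
            except OSError as err:
                record(path, err)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # os.walk lists symlinks to directories here without descending
            # into them, so they must be unlinked rather than rmdir'ed.
            if os.path.islink(path):
                try:
                    os.unlink(path)
                except OSError as err:
                    record(path, err)
        try:
            os.rmdir(dirpath)
        except OSError as err:
            record(dirpath, err)
    return failures
```

For example, `remove_tree("/path/to/mount/linux-2.6.32.63").get("ESTALE", [])` (mount path hypothetical) would return the directories that failed with "Stale file handle". Directories above a failed one will also show up under ENOTEMPTY, since their rmdir cannot succeed.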
<RCA from upstream patch>
Until we separated the scenario of a file/directory not existing from that of its parent not existing [1], we used to include a subvolume in the layout of a directory even if the directory was not present on that subvolume. This was done to allow a lookup racing with mkdir to create the correct layout.

However, there are other scenarios where a directory is not present. One such situation is trying to create a directory after an add-brick. Since there is no guarantee that all the ancestors exist on the new brick after an add-brick (and hence the directory cannot be created there), the newly added brick should not be part of the layout. However, we used to consider the newly added brick part of the layout (even before fix-layout had been done on all the ancestors), and this was the root cause of [2]. With [1], this issue got fixed, and hence [2] got fixed too.

However, [1] is not complete, in the sense that we didn't modify the rmdir codepath appropriately. This patch fixes that gap.

[1] http://review.gluster.org/6322
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1006809
</RCA>
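The layout rule described in the RCA can be illustrated with a toy model. This is not GlusterFS code: the Lookup categories and layout_members() are hypothetical names, and the model only captures which subvolumes receive a hash range for a directory, not how the ranges are sized or ordered.

```python
from enum import Enum, auto


class Lookup(Enum):
    """Hypothetical per-subvolume outcome of looking up a directory."""
    PRESENT = auto()         # the directory exists on this subvolume
    ENTRY_MISSING = auto()   # directory absent but parent present, so mkdir can still succeed
    PARENT_MISSING = auto()  # an ancestor is absent, e.g. a brick added before fix-layout


def layout_members(results, distinguish_parent):
    """Return the subvolumes that would get a hash range for the directory.

    results: mapping of subvolume name -> Lookup.
    distinguish_parent=False models the old behaviour: a missing directory and a
    missing parent looked the same, so every subvolume was included.
    distinguish_parent=True models the rule from the RCA: a subvolume whose
    ancestors are missing is left out, because a mkdir there would fail.
    """
    members = []
    for subvol, result in results.items():
        if distinguish_parent and result is Lookup.PARENT_MISSING:
            continue
        members.append(subvol)
    return members


if __name__ == "__main__":
    # Names follow the client translators in the logs; outcomes are illustrative.
    lookups = {
        "dist-client-0": Lookup.PRESENT,
        "dist-client-1": Lookup.PRESENT,
        "dist-client-2": Lookup.ENTRY_MISSING,   # e.g. a lookup racing with mkdir
        "dist-client-3": Lookup.PARENT_MISSING,  # the newly added brick
    }
    print("old:", layout_members(lookups, distinguish_parent=False))
    print("new:", layout_members(lookups, distinguish_parent=True))
```

Under the old rule the new brick (dist-client-3) receives a range, so names hashing there are sent to a brick where mkdir/create fails; under the RCA's rule it is excluded, while a subvolume that only lacks the directory itself still keeps its range so a racing mkdir can complete the layout.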
patch at: https://code.engineering.redhat.com/gerrit/#/c/27535
Verified on 3.6.0.27-1.el6_5.x86_64. However, I/O fails on NFS; raising a separate bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html