Description of problem: ============= While performing rm -rf getting unknown error 527 on tiered volume Version-Release number of selected component (if applicable): ============ glusterfs-server-3.7.5-12.el7rhgs.x86_64 How reproducible: Steps to Reproduce: ============= 1. Create 3x2 volume then attach 1x2 hot tier and mount it on client using NFS 2. Create a script to untar linux kernel and then remove all files and while removing files some times getting unknown error 527 Additional info: ============= [root@tettnang ~]# gluster vol info system_rajesh Volume Name: system_rajesh Type: Tier Volume ID: fac2f70a-8a9e-4d09-8a0a-1b73be4cc137 Status: Started Number of Bricks: 8 Transport-type: tcp Hot Tier : Hot Tier Type : Replicate Number of Bricks: 1 x 2 = 2 Brick1: tettnang.lab.eng.blr.redhat.com:/rhs/brick2/hot2 Brick2: tettnang.lab.eng.blr.redhat.com:/rhs/brick2/hot1 Cold Tier: Cold Tier Type : Distributed-Replicate Number of Bricks: 3 x 2 = 6 Brick3: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh Brick4: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh Brick5: vertigo.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh Brick6: ninja.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh Brick7: transformers.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh Brick8: interstellar.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh Options Reconfigured: performance.readdir-ahead: on features.ctr-enabled: on cluster.tier-mode: cache snippet of the NFS server log ================ [2015-12-21 10:04:25.018978] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-system_rajesh-client-1: Server and Client lk-version numbers are not same, reopening the fds [2015-12-21 10:04:25.019288] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-system_rajesh-client-1: Server lk version = 1 [2015-12-21 10:04:25.024976] I [MSGID: 108031] [afr-common.c:1872:afr_local_discovery_cbk] 0-system_rajesh-replicate-2: selecting local read_child system_rajesh-client-4 [2015-12-21 10:12:01.642135] W [graph.c:357:_log_if_unknown_option] 0-system_rajesh-tier-dht: option 'rule' is not recognized [2015-12-21 10:12:01.727829] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.730073] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.735341] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.737728] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.765497] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.781583] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.800313] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.814846] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.825091] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.840992] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.842511] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.853348] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.858960] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.865087] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.875971] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh [2015-12-21 10:12:01.881231] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh
[root@tettnang ~]# mount | grep rajesh transformers.lab.eng.blr.redhat.com:/system_rajesh/linux on /mnt/system_rajesh/linux type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.70.34.44,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=10.70.34.44) [root@tettnang ~]# sosreport available @ /home/repo/sosreports/bug.1293798 on rhsqe-repo.lab.eng.blr.redhat.com
Is this always reproducible or is it a one time happening? Do you have a test-script (shell or distaf) that I can use to reproduce this?
We observed this issue on two setups and we are not hitting the issue consistently I was running fallowing script for i in {1..1000} do tar -xvf linux-3.19.tar.gz sleep 180 rm -rf linux-3.19 done
Created attachment 1116320 [details] Script to setup the environment and reproduce the problem I'm using this to configure three systems and run the test. Its been running for a long time already, but there have not been any errors yet. I'm checking for the errors with: tail -f /var/log/glusterfs/nfs.log | grep ' E ' I'll keep this running over night, 1000 iterations are a lot and it seems pretty time consuming. The snippet from your log shows that the problem happened 8 minutes after the NFS-server was started. - Did the tarball extraction/removal start immediately after that? - There was a message "option 'rule' is not recognized", do you know where that came from? Could you check if my script should do the same actions as what you did on your environment?
It seems that I have reproduced the error: rm: cannot remove ‘linux-3.19/arch/arm/boot/dts/include/dt-bindings’: Unknown error 527 Rather awkward, the whole linux-3.19/ directory is gone, so the 'rm' did finish completely. However, "Volume is disabled" does not occur in the nfs.log. Instead, I have messages like these: W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 0-bz1293798-client-5: remote operation failed [Bad file descriptor] A "Bad file descriptor" should get mapped to ESTALE. It is not acceptable that error 527 is passed back to the NFS-client. Rajesh, please check if you have the same "Bad file descriptor" messages in your nfs.log.
No errors in the <volume>-tier.log. In the bricks however: E [MSGID: 113005] [posix.c:1700:posix_unlink] 0-bz1293798-posix: post operation fstat failed on fd=-1 [Bad file descriptor] E [MSGID: 115062] [server-rpc-fops.c:1097:server_unlink_cbk] 0-bz1293798-server: 375880: UNLINK /linux-3.19/tools/testing/selftests/powerpc/copyloops/memcpy_power7.S (4eb32857-ce11-482b-a83d-2a0fd0639eb1/memcpy_power7.S) ==> (Bad file descriptor) [Bad file descriptor] Checking when this was last touched, it seems that upstream has a fix for it: http://review.gluster.org/13034 The chain of bugs that fixes this in a newer build of RHGS: bug 1293256 (Gluster mainline) + bug 1295347 (Gluster 3.7) + bug 1293237 (RHGS, glusterfs-3.7.5-13) This bug was opened while tests were done with an older release (-12). I expect that this problem can not be reproduced with -13 or newer. *** This bug has been marked as a duplicate of bug 1293237 ***
May i know On which build you reproduced the problem
To reproduce, I used the version you reported this against in comment #0: glusterfs-server-3.7.5-12.el7rhgs.x86_64 With -13 I did not get the "unknown error 527" messages anymore.