Bug 1293798 - While performing rm -rf getting unknown error 527 on tiered volume
Status: CLOSED DUPLICATE of bug 1293237
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: gluster-nfs
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assigned To: Niels de Vos
QA Contact: storage-qa-internal@redhat.com
Depends On:
Blocks:
Reported: 2015-12-23 01:28 EST by RajeshReddy
Modified: 2016-07-13 18:34 EDT
CC: 3 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-20 04:19:53 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Script to setup the environment and reproduce the problem (2.84 KB, text/plain)
2016-01-19 14:00 EST, Niels de Vos

Description RajeshReddy 2015-12-23 01:28:19 EST
Description of problem:
=============
While performing rm -rf on a tiered volume, rm sometimes fails with unknown error 527.

Version-Release number of selected component (if applicable):
============
glusterfs-server-3.7.5-12.el7rhgs.x86_64

How reproducible:
=============
Intermittent; observed on two setups but not consistently (see comment 4).

Steps to Reproduce:
=============
1. Create a 3x2 distributed-replicate volume, attach a 1x2 replicated hot tier, and mount the volume on a client over NFS
2. In a loop, untar the Linux kernel and then remove the tree with rm -rf; while removing files, rm sometimes fails with unknown error 527 (see the CLI sketch below)
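
A rough sketch of the setup as gluster CLI commands (hostnames, brick paths, and the mount point are placeholders rather than the exact ones from this report; the attach-tier syntax is the glusterfs 3.7 form):

  # cold tier: 3x2 distributed-replicate volume
  gluster volume create testvol replica 2 \
      server{1..6}:/rhs/brick1/testvol
  gluster volume start testvol
  # hot tier: attach a 1x2 replicated tier
  gluster volume attach-tier testvol replica 2 \
      server1:/rhs/brick2/hot1 server1:/rhs/brick2/hot2
  # mount over gluster-nfs (NFSv3) on the client
  mount -t nfs -o vers=3 server1:/testvol /mnt/testvol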


Additional info:
=============
[root@tettnang ~]# gluster vol info system_rajesh
 
Volume Name: system_rajesh
Type: Tier
Volume ID: fac2f70a-8a9e-4d09-8a0a-1b73be4cc137
Status: Started
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: tettnang.lab.eng.blr.redhat.com:/rhs/brick2/hot2
Brick2: tettnang.lab.eng.blr.redhat.com:/rhs/brick2/hot1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick3: rhs-client19.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh
Brick4: rhs-client18.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh
Brick5: vertigo.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh
Brick6: ninja.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh
Brick7: transformers.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh
Brick8: interstellar.lab.eng.blr.redhat.com:/rhs/brick1/system_rajesh
Options Reconfigured:
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache


snippet of the NFS server log
================
[2015-12-21 10:04:25.018978] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-system_rajesh-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-12-21 10:04:25.019288] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-system_rajesh-client-1: Server lk version = 1
[2015-12-21 10:04:25.024976] I [MSGID: 108031] [afr-common.c:1872:afr_local_discovery_cbk] 0-system_rajesh-replicate-2: selecting local read_child system_rajesh-client-4
[2015-12-21 10:12:01.642135] W [graph.c:357:_log_if_unknown_option] 0-system_rajesh-tier-dht: option 'rule' is not recognized
[2015-12-21 10:12:01.727829] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh
[2015-12-21 10:12:01.730073] E [MSGID: 112070] [nfs3.c:2304:nfs3_write] 0-nfs-nfsv3: Volume is disabled: system_rajesh
[... the same "Volume is disabled" nfs3_write error repeated 14 more times within the same second ...]
Comment 2 RajeshReddy 2015-12-24 06:27:32 EST
[root@tettnang ~]# mount | grep rajesh
transformers.lab.eng.blr.redhat.com:/system_rajesh/linux on /mnt/system_rajesh/linux type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.70.34.44,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=10.70.34.44)
[root@tettnang ~]# 

sosreport available @ /home/repo/sosreports/bug.1293798 on rhsqe-repo.lab.eng.blr.redhat.com
Comment 3 Niels de Vos 2016-01-18 10:40:41 EST
Is this always reproducible, or was it a one-time occurrence?

Do you have a test-script (shell or distaf) that I can use to reproduce this?
Comment 4 RajeshReddy 2016-01-19 03:07:50 EST
We observed this issue on two setups, but we are not hitting it consistently.

I was running the following script:

#!/bin/bash
# Repeatedly extract and delete the kernel tree on the NFS mount;
# the rm occasionally fails with "Unknown error 527".
for i in {1..1000}
do
    tar -xvf linux-3.19.tar.gz
    sleep 180
    rm -rf linux-3.19
done
Comment 5 Niels de Vos 2016-01-19 14:00 EST
Created attachment 1116320 [details]
Script to setup the environment and reproduce the problem

I'm using this to configure three systems and run the test. It's been running for a long time already, but there have not been any errors yet. I'm checking for the errors with:

  tail -f /var/log/glusterfs/nfs.log | grep ' E '

I'll keep this running overnight; 1000 iterations are a lot and it seems pretty time-consuming.

The snippet from your log shows that the problem happened 8 minutes after the NFS-server was started.
- Did the tarball extraction/removal start immediately after that?
- There was a message "option 'rule' is not recognized", do you know where that came from?

Could you check whether my script performs the same actions as you did in your environment?
Comment 6 Niels de Vos 2016-01-20 03:56:05 EST
It seems that I have reproduced the error:

  rm: cannot remove ‘linux-3.19/arch/arm/boot/dts/include/dt-bindings’: Unknown error 527

Oddly enough, the whole linux-3.19/ directory is gone, so the 'rm' did finish completely.

However, "Volume is disabled" does not occur in the nfs.log. Instead, I have messages like these:

  W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 0-bz1293798-client-5: remote operation failed [Bad file descriptor]

A "Bad file descriptor" should get mapped to ESTALE. It is not acceptable that error 527 is passed back to the NFS-client.

Rajesh, please check if you have the same "Bad file descriptor" messages in your nfs.log.
Comment 7 Niels de Vos 2016-01-20 04:19:53 EST
No errors in the <volume>-tier.log.

In the bricks however:

  E [MSGID: 113005] [posix.c:1700:posix_unlink] 0-bz1293798-posix: post operation fstat failed on fd=-1 [Bad file descriptor]
  E [MSGID: 115062] [server-rpc-fops.c:1097:server_unlink_cbk] 0-bz1293798-server: 375880: UNLINK /linux-3.19/tools/testing/selftests/powerpc/copyloops/memcpy_power7.S (4eb32857-ce11-482b-a83d-2a0fd0639eb1/memcpy_power7.S) ==> (Bad file descriptor) [Bad file descriptor]

Checking when this was last touched, it seems that upstream has a fix for it:

  http://review.gluster.org/13034

The chain of bugs that fixes this in a newer build of RHGS:

  bug 1293256 (Gluster mainline)
   + bug 1295347 (Gluster 3.7)
      + bug 1293237 (RHGS, glusterfs-3.7.5-13)


This bug was opened while tests were done with an older release (-12). I expect that this problem cannot be reproduced with -13 or newer.
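
A quick way to confirm the installed build before re-testing (a sketch; the package name follows the version string from comment #0):

  rpm -q glusterfs-server
  # expect glusterfs-server-3.7.5-13.el7rhgs.x86_64 or newer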

*** This bug has been marked as a duplicate of bug 1293237 ***
Comment 8 RajeshReddy 2016-01-20 09:41:01 EST
May I know on which build you reproduced the problem?
Comment 9 Niels de Vos 2016-01-21 05:12:05 EST
To reproduce, I used the version you reported this against in comment #0:

  glusterfs-server-3.7.5-12.el7rhgs.x86_64

With -13 I did not get the "unknown error 527" messages anymore.
