Description of problem:
I have a 4-node cluster with 4 volumes, and started iozone against all 4 volumes from NFS mounts of those volumes. While iozone is running (it has already finished on one or two volumes), I get a series of log messages saying "unlock failed on 1 unlock". I cannot understand these messages: does it mean an unlock is being unlocked? They give me no useful information. Also, immediately before each of these messages there is a "remote operation failed" message logged as a Warning. If a remote operation has failed, why is it a Warning and not an Error?

Version-Release number of selected component (if applicable):
[root@nfs2 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta6-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta6-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:
Keeps happening for the test mentioned above.

Steps to Reproduce:
Setup: 4 RHS nodes [1, 2, 3, 4], 2 clients [c1, c2].
1. Create four 6x2 volumes and start them.
2. Mount the volumes over NFS on clients c1 and c2:
   volumes v1, v2 on client c1
   volumes v3, v4 on client c2
3. Start iozone on all the mount points at the same time.
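For concreteness, a minimal command sketch of the steps above. The hostnames (rhs1-rhs4), brick paths, and iozone options are illustrative placeholders, not from the original report; with replica 2, consecutive brick pairs form the six replica sets of each 6x2 volume.

# On any RHS node: create and start one of the four 6x2
# (distribute-replicate) volumes; repeat for v2, v3, v4 with
# their own bricks.
gluster volume create v1 replica 2 \
    rhs1:/bricks/v1_1 rhs2:/bricks/v1_2 rhs3:/bricks/v1_3 rhs4:/bricks/v1_4 \
    rhs1:/bricks/v1_5 rhs2:/bricks/v1_6 rhs3:/bricks/v1_7 rhs4:/bricks/v1_8 \
    rhs1:/bricks/v1_9 rhs2:/bricks/v1_10 rhs3:/bricks/v1_11 rhs4:/bricks/v1_12
gluster volume start v1

# On client c1 (gluster NFS speaks NFSv3 only); likewise v3, v4 on c2:
mount -t nfs -o vers=3 rhs1:/v1 /mnt/v1
mount -t nfs -o vers=3 rhs1:/v2 /mnt/v2

# On both clients, start iozone on each mount point at the same time:
(cd /mnt/v1 && iozone -a) &
(cd /mnt/v2 && iozone -a) &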
Actual results:
By this time iozone has finished on volumes v2 and v3 and is still running on v1 and v4, but nfs.log fills with many messages of the following kind, which I cannot make any sense of:

[2013-07-28 19:09:54.952089] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.956236] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.956281] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by 703fe703577f0000
[2013-07-28 19:09:54.956521] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-2: remote operation failed: Invalid argument
[2013-07-28 19:09:54.956558] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 0 unlock by 703fe703577f0000
[2013-07-28 19:09:54.964305] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.968031] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.968066] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by 90fde603577f0000
[2013-07-28 19:09:54.968119] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-2: remote operation failed: Invalid argument
[2013-07-28 19:09:54.968135] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 0 unlock by 90fde603577f0000
[2013-07-28 19:09:54.981280] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.985082] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.985156] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by cc14e903577f0000

This keeps happening repeatedly.

Expected results:
Logs that report the information clearly.

Additional info:
1. Saurabh and I agreed that the message format "unlock failed on subvolume <subvol name> with lock owner <lock-owner>" is appropriate for this log message.
2. Regarding "before this log I get a message saying 'remote operation failed' as Warning. If the remote operation has failed, why is it a Warning and not an Error?": it is okay for the log message in question to be a warning. Unless the problem is major/fatal, the message is logged as a warning; otherwise it is logged as an error.
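For illustration, a sketch of how afr_unlock_inodelk_cbk in afr-lk-common.c could emit the agreed format. This is not the actual patch (see the Gerrit reviews below); it assumes the usual AFR convention of passing the child index as the frame cookie, and uses lkowner_utoa() to render the lock owner:

/* Illustrative sketch only: log an unlock failure naming the
 * subvolume and lock owner instead of the bare child index. */
int32_t
afr_unlock_inodelk_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                        int32_t op_ret, int32_t op_errno, dict_t *xdata)
{
        afr_private_t *priv        = this->private;
        int            child_index = (long) cookie;

        if (op_ret < 0) {
                gf_log (this->name, GF_LOG_INFO,
                        "unlock failed on subvolume %s with lock owner %s",
                        priv->children[child_index]->name,
                        lkowner_utoa (&frame->root->lk_owner));
        }

        /* ... resume unlock accounting as before ... */
        return 0;
}

With this, the message in the nfs.log above would read, for example, "unlock failed on subvolume dist-rep3-client-3 with lock owner 703fe703577f0000" instead of "unlock failed on 1 unlock by 703fe703577f0000".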
Patch posted for review on branch rhs-2.1: https://code.engineering.redhat.com/gerrit/#/c/13671/
Patch posted for review on branch rhs-2.1-u1: https://code.engineering.redhat.com/gerrit/#/c/13670/
I tried to reproduce the logs quite a few times using the case mentioned in the Description section; this time the issue does not occur. Hence moving the BZ to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-0208.html