Bug 989362

Summary: nfs+afr+logging: "unlock failed on 1 unlock" seems meaningless
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Saurabh <saujain>
Component: glusterd Assignee: Anuradha <atalur>
Status: CLOSED ERRATA QA Contact: Saurabh <saujain>
Severity: high Docs Contact:
Priority: medium    
Version: 2.1CC: atalur, mzywusko, nsathyan, pkarampu, psriniva, rhs-bugs, smohan, vagarwal, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 2.1.2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0.50rhs Doc Type: Bug Fix
Doc Text:
Previously, the log message for NFS did not have all the required information. With this fix, relevant information about the subvolume name is added to the log message.
Story Points: ---
Clone Of:
: 1012945 1012947 (view as bug list) Environment:
Last Closed: 2014-02-25 07:33:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1012945, 1012947    

Description Saurabh 2013-07-29 06:58:46 UTC
Description of problem:
I have a setup of a 4-node cluster
with 4 volumes,
and started iozone for all 4 volumes from NFS mounts of these volumes.

While iozone is running (and has finished on one or two volumes), I am getting a series of logs saying "unlock failed on 1 unlock".

I can't understand these logs. Does it mean that you are unlocking an unlock?

The message does not give me any useful information.

Also, before this log I get a message saying "remote operation failed",
logged as a Warning.
If the remote operation has failed, why is it a Warning and not an Error?

Version-Release number of selected component (if applicable):
[root@nfs2 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta6-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta6-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:
Happens consistently with the test mentioned above.

Steps to Reproduce:
Setup: 4 RHS nodes [1, 2, 3, 4]
and 2 clients [c1, c2]
1. create 4 volumes of type 6x2 (distribute-replicate) and start them
2. mount the volumes over NFS on clients c1 and c2:
   volumes v1, v2 on client c1
   volumes v3, v4 on client c2
3. start iozone on all the mount points at the same time

Actual results:

By this time, iozone on volumes v2 and v3 has finished,
while v1 and v4 are still running.

But in nfs.log I get many log messages of the following type,
which I cannot make any sense of:

[2013-07-28 19:09:54.952089] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.956236] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.956281] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by 703fe703577f0000
[2013-07-28 19:09:54.956521] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-2: remote operation failed: Invalid argument
[2013-07-28 19:09:54.956558] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 0 unlock by 703fe703577f0000
[2013-07-28 19:09:54.964305] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.968031] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.968066] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by 90fde603577f0000
[2013-07-28 19:09:54.968119] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-2: remote operation failed: Invalid argument
[2013-07-28 19:09:54.968135] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 0 unlock by 90fde603577f0000
[2013-07-28 19:09:54.981280] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.985082] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.985156] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by cc14e903577f0000


This keeps repeating.

Expected results:
The logs should report this information clearly.

Additional info:

Comment 2 Anuradha 2013-09-27 12:51:50 UTC
1. Saurabh and I agreed that the log message "unlock failed on subvolume <subvol name> with lock owner <lock-owner>" is appropriate here.
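A minimal sketch of how the agreed-on message could be assembled. The helper name and call shape below are illustrative assumptions for this bug report, not the actual patch; the real change lives in afr_unlock_inodelk_cbk() in afr-lk-common.c:

```c
#include <stdio.h>

/* Format the clearer unlock-failure message agreed on above.
 * Old message: "unlock failed on 1 unlock by <owner>" -- the bare child
 * index told the reader nothing.  Naming the subvolume makes the failing
 * brick identifiable from the log line alone. */
static char *
format_unlock_failure(char *buf, size_t len,
                      const char *subvol_name, const char *lk_owner)
{
        snprintf(buf, len,
                 "unlock failed on subvolume %s with lock owner %s",
                 subvol_name, lk_owner);
        return buf;
}
```

For the first log line quoted in the description, this would yield "unlock failed on subvolume dist-rep3-client-3 with lock owner 703fe703577f0000" instead of "unlock failed on 1 unlock by 703fe703577f0000".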

2. Regarding: "Also, before this log I get a message saying "remote operation failed"
as Warning.
If remote operation has failed then why it has to be warning not an "Error""

It is okay for the log message in question to be a warning. A message is logged as an error only when the problem is major or fatal; otherwise it is logged as a warning.

Comment 3 Anuradha 2013-10-07 07:16:41 UTC
Patch posted for review on branch rhs-2.1 : https://code.engineering.redhat.com/gerrit/#/c/13671/

Patch posted for review on branch rhs-2.1-u1 : 
https://code.engineering.redhat.com/gerrit/#/c/13670/

Comment 4 Saurabh 2014-01-17 12:18:36 UTC
I have run the case mentioned in the description section quite a few times to collect the logs; this time the issue does not occur.

Hence moving the BZ to verified.

Comment 6 errata-xmlrpc 2014-02-25 07:33:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html