Bug 989362 - nfs+afr+logging: "unlock failed on 1 unlock" seems meaningless
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: RHGS 2.1.2
Assignee: Anuradha
QA Contact: Saurabh
URL:
Whiteboard:
Depends On:
Blocks: 1012945 1012947
 
Reported: 2013-07-29 06:58 UTC by Saurabh
Modified: 2016-09-20 02:00 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.4.0.50rhs
Doc Type: Bug Fix
Doc Text:
Previously, the log message for NFS did not have all the required information. With this fix, the relevant information about the sub-volume name is added to the log message.
Clone Of:
: 1012945 1012947
Environment:
Last Closed: 2014-02-25 07:33:56 UTC
Embargoed:




Links
System ID: Red Hat Product Errata RHEA-2014:0208
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Storage 2.1 enhancement and bug fix update #2
Last Updated: 2014-02-25 12:20:30 UTC

Description Saurabh 2013-07-29 06:58:46 UTC
Description of problem:
I have a 4-node cluster with 4 volumes, and I started iozone on all 4 volumes from their NFS mounts.

While iozone is still running (it has already finished on one or two of the volumes), I am getting a series of log messages that say "unlock failed on 1 unlock".

I can't make sense of these messages. Does it mean that an unlock is being unlocked? The message does not give me any useful information.

Also, just before this message I get a "remote operation failed" message logged as a Warning. If a remote operation has failed, why is it logged as a Warning and not as an Error?

Version-Release number of selected component (if applicable):
[root@nfs2 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta6-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta6-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta6-1.el6rhs.x86_64

How reproducible:
Happens consistently with the test mentioned above.

Steps to Reproduce:
Setup: 4 RHS nodes [1, 2, 3, 4] and 2 clients [c1, c2].
1. Create four 6x2 distribute-replicate volumes and start them.
2. Mount the volumes over NFS on clients c1 and c2:
   volumes v1 and v2 on client c1,
   volumes v3 and v4 on client c2.
3. Start iozone on all the mount points at the same time.

Actual results:

By this time, iozone has finished on volumes v2 and v3 and is still going on v1 and v2.

However, nfs.log contains a large number of similar messages that I cannot make any sense of:

[2013-07-28 19:09:54.952089] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.956236] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.956281] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by 703fe703577f0000
[2013-07-28 19:09:54.956521] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-2: remote operation failed: Invalid argument
[2013-07-28 19:09:54.956558] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 0 unlock by 703fe703577f0000
[2013-07-28 19:09:54.964305] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.968031] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.968066] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by 90fde603577f0000
[2013-07-28 19:09:54.968119] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-2: remote operation failed: Invalid argument
[2013-07-28 19:09:54.968135] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 0 unlock by 90fde603577f0000
[2013-07-28 19:09:54.981280] I [afr-self-heal-data.c:759:afr_sh_data_fix] 0-dist-rep3-replicate-1: no active sinks for performing self-heal on file <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>
[2013-07-28 19:09:54.985082] W [client-rpc-fops.c:1529:client3_3_inodelk_cbk] 0-dist-rep3-client-3: remote operation failed: Invalid argument
[2013-07-28 19:09:54.985156] I [afr-lk-common.c:669:afr_unlock_inodelk_cbk] 0-dist-rep3-replicate-1: <gfid:7a75a447-5e4f-455d-839c-3b19482a2e3d>: unlock failed on 1 unlock by cc14e903577f0000


These messages keep repeating.

Expected results:
The log messages should report the information clearly.

Additional info:

Comment 2 Anuradha 2013-09-27 12:51:50 UTC
1. Saurabh and I agreed that the log message "unlock failed on subvolume <subvol name> with lock owner <lock-owner>" is appropriate here.

2. "Also, before this log I get a message saying "remote operation failed"
as Warning.
If remote operation has failed then why it has to be warning not an "Error""

It is okay for the log message in question to be a warning. Unless the problem is major/fatal, the log message is taken as a warning. Else it is an error.
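
For illustration only, and not the actual patch (see the Gerrit links in comment 3): a minimal, self-contained C sketch of the difference between the old index-based message and the agreed format above. The subvolume names, lock-owner value, and helper function names are hypothetical stand-ins; in GlusterFS itself the message is emitted from afr_unlock_inodelk_cbk() in afr-lk-common.c (as seen in the nfs.log excerpt) through its logging framework.

#include <stdio.h>

/* Hypothetical stand-ins for the AFR child (subvolume) names; in a real
 * volume these come from the volume graph, e.g. "dist-rep3-client-3". */
static const char *subvol_names[] = { "dist-rep3-client-2",
                                      "dist-rep3-client-3" };

/* Old style: only the numeric child index is printed, which produced the
 * confusing "unlock failed on 1 unlock by <lock-owner>" text in nfs.log. */
static void log_unlock_failure_old (int child_index, const char *lk_owner)
{
        printf ("unlock failed on %d unlock by %s\n", child_index, lk_owner);
}

/* Agreed format from point 1 above: name the subvolume and the lock owner
 * so it is clear which brick connection the unlock failed on. */
static void log_unlock_failure_new (int child_index, const char *lk_owner)
{
        printf ("unlock failed on subvolume %s with lock owner %s\n",
                subvol_names[child_index], lk_owner);
}

int main (void)
{
        /* Lock-owner value taken from the nfs.log excerpt in the description. */
        const char *owner = "703fe703577f0000";

        log_unlock_failure_old (1, owner);
        log_unlock_failure_new (1, owner);
        return 0;
}

With the agreed format, the reader can map the failure directly to a brick connection instead of guessing what the bare child index "1" refers to.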

Comment 3 Anuradha 2013-10-07 07:16:41 UTC
Patch posted for review on branch rhs-2.1 : https://code.engineering.redhat.com/gerrit/#/c/13671/

Patch posted for review on branch rhs-2.1-u1 : 
https://code.engineering.redhat.com/gerrit/#/c/13670/

Comment 4 Saurabh 2014-01-17 12:18:36 UTC
I tried quite a few times to get these logs using the case mentioned in the description section; this time the issue does not occur.

Hence, moving the BZ to verified.

Comment 6 errata-xmlrpc 2014-02-25 07:33:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-0208.html

