183565 – fcntl() locks on NFS filesystem are not released upon abnormal process termination (netapp filer)

Bug 183565 - fcntl() locks on NFS filesystem are not released upon abnormal process termination (netapp filer)

Summary: fcntl() locks on NFS filesystem are not released upon abnormal process termin...

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Jeff Layton
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2006-03-02 00:34 UTC by Ian McLeod
Modified:	2007-11-30 22:07 UTC (History)
CC List:	7 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-03-19 17:23:54 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Source for duplicator (1.58 KB, application/octet-stream) 2006-03-02 00:34 UTC, Ian McLeod	no flags	Details
purposed patch (1.01 KB, patch) 2006-03-16 18:46 UTC, Steve Dickson	no flags	Details \| Diff
updated ldisp1 program -- builds with fewer warnings with -Wall now (1.80 KB, text/plain) 2006-06-05 14:11 UTC, Jeff Layton	no flags	Details
debug patch 1 (3.31 KB, patch) 2006-06-16 02:06 UTC, Amit Gud	no flags	Details \| Diff
patch to test theory in comment #42 (1.34 KB, patch) 2006-06-18 13:17 UTC, Jeff Layton	no flags	Details \| Diff
debug patch 2 (1.26 KB, patch) 2006-06-19 02:27 UTC, Amit Gud	no flags	Details \| Diff
debug patch 3 (2.41 KB, patch) 2006-06-20 13:53 UTC, Amit Gud	no flags	Details \| Diff
fix 1 (743 bytes, patch) 2006-06-21 13:35 UTC, Amit Gud	no flags	Details \| Diff
debug messages (2.90 KB, text/plain) 2006-06-21 20:09 UTC, Amit Gud	no flags	Details
proposed patch (1003 bytes, patch) 2006-06-26 11:03 UTC, Steve Dickson	no flags	Details \| Diff
View All

Description Ian McLeod 2006-03-02 00:34:56 UTC

Created attachment 125508 [details]
Source for duplicator

Comment 1 Ian McLeod 2006-03-02 00:34:56 UTC

Description of problem:

Processes takes fcntl() locks on an NFS mounted filesystem.  Process is killed
or terminates abnormally.  Lock is not released and remains on the NFS server,
associated with the now-dead PID.


Version-Release number of selected component (if applicable):

RHEL3 U6  - 2.4.21-37.ELsmp


How reproducible:

With attached code it can be reliably reproduced in our environment with 1-2 tries.

Steps to Reproduce:
1. Compile ldisp1.c and lockit.c from attached tarball, assemble compiled
binaries and locktest.ksh in an NFS mounted directory.  In our case, the NFS
server is a netapp filer.

Mount options are:

rw,nosuid,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,noacl,noac


2. Run locktest.ksh in same directory.  Let it run for a few seconds, then
interrupt with ctrl-c

3. Run ldisp on the resulting LTEST file.  
  
Actual results:

Over half the time, ldisp will report that a lock is still held on the file,
associated with a dead PID.

Expected results:

It should not be possible for a process to die without releasing locks that
Linux is aware of.

Additional info:

Additional data in IT 89052

Comment 2 Steve Russell 2006-03-02 17:25:00 UTC

Minor correction, mount option is 'nointr' rather than 'intr'

Comment 3 Steve Dickson 2006-03-16 18:46:07 UTC

Created attachment 126239 [details]
purposed patch

It appears the	mechanism which enqueues interrupted lock requests
fails to *always* que up a unlock request. This patch should ensure that
all interrupted lock requests are immediately followed by an async
unlock request.

Comment 4 Ian McLeod 2006-03-23 14:21:50 UTC

We've tested the attached patch in our environment and can still generate stale
locks with our duplication script.  Tcpdump and process output can be found in
the associated issue tracker ticket.

Comment 5 Ernie Petrides 2006-04-07 03:12:36 UTC

A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.6.EL).

Comment 9 Jeff Layton 2006-06-05 14:08:08 UTC

Some small issues with the reproducer. First "v3" isn't a valid option, so my
option string looks like this:

rw,nosuid,nfsvers=3,rsize=8192,wsize=8192,hard,nointr,tcp,lock,noacl,noac

Also, there were a number of errors in the ldisp1 reproducer program. I'll
attach an updated version here in a bit.

Comment 10 Jeff Layton 2006-06-05 14:11:02 UTC

Created attachment 130502 [details]
updated ldisp1 program -- builds with fewer warnings with -Wall now

Comment 11 Jeff Layton 2006-06-05 14:33:16 UTC

The other catch is that simply hitting ^c while running the locktest.ksh script
does not seem to be sufficient. From looking over the code, I presume what they
really want is this:

1) run locktest.ksh
2) wait until the ldisp1 output actually starts
3) pkill lockit2

That should kill off all of the lockit2 processes and ldisp should start just
printing out lines with '...' with nothing between them.

I was able to reproduce the issue after a few attempts with the above method:

...
        Lock:1 - Offset 100, Len 1, Pid 1438
...

# ps -ef | grep 1438
root      1452  1031  0 10:41 pts/0    00:00:00 grep 1438

# uname -a
Linux tatooine.rdu.redhat.com 2.4.21-43.ELsmp #1 SMP Sun May 14 17:25:07 EDT
2006 i686 i686 i386 GNU/Linux

Comment 12 Jeff Layton 2006-06-05 15:23:29 UTC

Building a kernel out of a recent git pull now to see if the problem stille
exists there.

Comment 13 Jeff Layton 2006-06-05 16:39:52 UTC

Upstream kernel's still building, but in looking over the code, I see that
locks_unlock_delete calls f_op->lock, but doesn't check the return code. Not
sure if this is significant however, but I may stick some debug code in there to
print something if it's not 0, and try to reproduce the problem.

Comment 14 Jeff Layton 2006-06-05 17:26:11 UTC

This bug exists upstream as well -- at least, as late as of 2.4.32. I've not
tested later. Going to see now about building some test kernels to see if we can
trap what's actually happening.

Comment 15 Jeff Layton 2006-06-05 18:00:33 UTC

I instrumented that piece and it apparently still returns zero there, so that's
not it. Going to drill down farther to the rpc layer...

Comment 16 Jeff Layton 2006-06-06 12:05:45 UTC

Did some network captures yesterday. It's pretty clear that at some point we're
locking the file, but never unlocking it. So definitely a client-side problem
(but we were already pretty sure of that).

I then modified the nfs_lock dprintk to show the cmd and PID, kicked off the
test and then did this in another shell:

# echo 32767 > /proc/sys/sunrpc/nfs_debug ; pkill lockit2; sleep 1; echo 0 > 
/proc/sys/sunrpc/nfs_debug

I then saw this lock being held:

...
        Lock:1 - Offset 200, Len 1, Pid 3026
...

confirmed on the server:

[root@barsoom ~]# cat /proc/locks
1: POSIX  ADVISORY  WRITE 3026 fd:04:507952 200 200

Log messages for this PID:

# grep 3026 /mnt/barsoom/89052/nfslock1.log
Jun  6 08:07:31 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3026)
Jun  6 08:07:31 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3026)
Jun  6 08:07:32 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3026)
Jun  6 08:07:32 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3026)
Jun  6 08:07:32 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3026)
Jun  6 08:07:32 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3026)
Jun  6 08:07:32 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3026)
Jun  6 08:07:32 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3026)
Jun  6 08:07:33 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3026)
Jun  6 08:07:33 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3026)
Jun  6 08:07:33 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3026)
Jun  6 08:07:33 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3026)
Jun  6 08:07:33 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3026)
Jun  6 08:07:33 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3026)
Jun  6 08:07:33 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3026)
Jun  6 08:07:34 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3026)
Jun  6 08:07:34 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3026)
Jun  6 08:07:34 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3026)
Jun  6 08:07:34 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3026)
Jun  6 08:07:34 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3026)
Jun  6 08:07:34 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3026)
Jun  6 08:07:34 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3026)
Jun  6 08:07:35 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3026)
Jun  6 08:07:35 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=2, fl=1,
r=100:100, cmd=6, pid=3026)

Comment 17 Jeff Layton 2006-06-06 12:15:36 UTC

For reference:

cmd:
F_GETLK == 5
F_SETLK == 6

fl == fl->flags:
FL_POSIX == 1

t (type):
F_WRLCK == 1
F_UNLCK == 2

So it looks like the last log message in here was an unlock command for the
range that's still locked. The nfs_lock function is getting called, but the
operation isn't making it out to the wire for some reason.

Comment 18 Jeff Layton 2006-06-06 13:02:21 UTC

Actually, I may have been wrong before. I did another test and got 2 stale locks:

1: POSIX  ADVISORY  WRITE 3479 fd:04:507952 100 100
2: POSIX  ADVISORY  WRITE 3472 fd:04:507952 200 200

I also added some sleeps in before and after the pkill command to allow the logs
to catch up:

Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=2, fl=1,
r=200:200, cmd=6, pid=3472)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3472)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3472)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3472)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3472)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3472)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3472)
-------------------
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=2, fl=1,
r=100:100, cmd=6, pid=3479)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3479)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=2, fl=1,
r=200:200, cmd=6, pid=3479)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3479)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=5, pid=3479)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=6, pid=3479)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=200:200, cmd=5, pid=3479)
Jun  6 09:03:10 tatooine kernel: NFS: nfs_lock(f=   9/507952, t=1, fl=1,
r=100:100, cmd=6, pid=3479)
----------------------

So the last F_SETLK commands here for the ranges with the stale locks are lock
commands. In this case, it looks like we attempted to lock the file but the
unlock attempt never occured.

Unfortunately, I don't have info here about whether the lock attempts were
successful. Going to see about instrumenting that now...

Comment 19 Jeff Layton 2006-06-06 13:29:37 UTC

Modified nfs_lock to so that the dprintk shows the return code:

Here we get a successful lock:

Jun  6 09:33:36 tatooine kernel: NFS: nfs_lock(f=   9/507955, t=1, fl=1,
r=200:200, cmd=6, pid=5866) = 0

...and a successful unlock:

Jun  6 09:33:36 tatooine kernel: NFS: nfs_lock(f=   9/507955, t=2, fl=1,
r=200:200, cmd=6, pid=5866) = 0

lock attempt with an EAGAIN:
Jun  6 09:33:36 tatooine kernel: NFS: nfs_lock(f=   9/507955, t=1, fl=1,
r=100:100, cmd=6, pid=5866) = -11

GETLK which shows it as clear:

Jun  6 09:33:37 tatooine kernel: NFS: nfs_lock(f=   9/507955, t=1, fl=1,
r=100:100, cmd=5, pid=5866) = 0

successful lock attempt:
Jun  6 09:33:37 tatooine kernel: NFS: nfs_lock(f=   9/507955, t=1, fl=1,
r=200:200, cmd=6, pid=5866) = 0

Then this strange beast, this is a GETLK, but type=2 (F_UNLK). Does this mean
it's attempting to see if it can successfully unlock the file?

Jun  6 09:33:37 tatooine kernel: NFS: nfs_lock(f=   9/507955, t=2, fl=1,
r=200:200, cmd=5, pid=5866) = 0

That's the last command for this PID.

Comment 20 Jeff Layton 2006-06-06 16:13:10 UTC

Ok, instrumented locks_unlock_delete to show when it was being called and by
what PID. After pkilling all lockit2's I got this repeated:

        Lock:1 - Offset 200, Len 1, Pid 2023

but the debug message is from a different PID. So again, for some reason we're
not even getting this far when freeing all of a process' locks:

        locks_unlock_delete: 2018 releasing    9/507952, r=100:100

Comment 21 Jeff Layton 2006-06-06 17:34:08 UTC

Added some debug code to locks_remove_posix. This time, PID 1184 had the stale lock:

locks_remove_posix: pid=1172 fl_flags=1, fl_owner=d16df920, owner=d16df920
locks_remove_posix: pid=1185 fl_flags=1, fl_owner=d38e68c0, owner=d1120e20
locks_remove_posix: pid=1185 fl_flags=1, fl_owner=d16df5e0, owner=d1120e20
locks_remove_posix: pid=1187 fl_flags=1, fl_owner=d38e68c0, owner=d16dfac0
locks_remove_posix: pid=1187 fl_flags=1, fl_owner=d16df5e0, owner=d16dfac0
locks_remove_posix: pid=1184 fl_flags=1, fl_owner=d38e68c0, owner=d16df100
locks_remove_posix: pid=1184 fl_flags=1, fl_owner=d16df5e0, owner=d16df100
locks_remove_posix: pid=1178 fl_flags=1, fl_owner=d38e68c0, owner=d16df5e0
locks_remove_posix: pid=1181 fl_flags=1, fl_owner=d38e68c0, owner=d16df780
locks_remove_posix: pid=1190 fl_flags=1, fl_owner=d38e68c0, owner=d1120ae0
locks_remove_posix: pid=1180 fl_flags=1, fl_owner=d38e68c0, owner=d16df2a0
locks_remove_posix: pid=1186 fl_flags=1, fl_owner=d38e68c0, owner=d1120c80
locks_remove_posix: pid=1188 fl_flags=1, fl_owner=d38e68c0, owner=d16dfc60

The issue here seems to be that fl->fl_owner and owner do not match. This is
preventing locks_unlock_delete from getting called.

I'm not clear yet on where these values ultimately come from.

Comment 22 Jeff Layton 2006-06-06 18:39:57 UTC

This time, the stale lock was this one:

1: POSIX  ADVISORY  WRITE 1224 fd:04:507952 200 200

Here's the debug messages from that period (some others precede it). So we don't
match this particular lock here:

locks_remove_posix: pid=1224 fl_flags=1, fl_owner=d3ffb520, owner=d3ffb6c0
lock_remove_posix: non-matching lock 1224    9/507952, r=100:100
locks_remove_posix: pid=1215 fl_flags=1, fl_owner=d3ffb520, owner=d359e6e0
lock_remove_posix: non-matching lock 1215    9/507952, r=100:100
locks_remove_posix: pid=1223 fl_flags=1, fl_owner=d3ffb520, owner=d1265600
lock_remove_posix: non-matching lock 1223    9/507952, r=100:100

But a little later, the fl_owner for this lock range changes? Maybe it was
dropped and reallocated?

locks_remove_posix: pid=1216 fl_flags=1, fl_owner=d1592440, owner=d1265c80
lock_remove_posix: non-matching lock 1216    9/507952, r=100:100
locks_remove_posix: pid=1219 fl_flags=1, fl_owner=d1592440, owner=d1592440
locks_unlock_delete: 1219 releasing    9/507952, r=100:100

Perhaps we're hitting a race where we allocate and get granted a lock just after
the lock lists for the process have been cleaned out?

Comment 23 Jeff Layton 2006-06-06 20:33:20 UTC

Even more strange, it almost looks like the nfs locking is allowing collisions
here. Here are a number of nfs_lock commands and their results for PIDs 1314 and
1318:

# dmesg -s 131072 | grep 200:200 | grep 'cmd=6' | egrep '(1314|1318)'
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1314) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=2, fl=1, r=200:200, cmd=6, pid=1314) = 0

Early on, a couple of other processes are also contending for the locks, but
toward the end, neither of these processes are ever denied a SETLK. 1314 gets
the lock, but then 1318 is also granted it, and 1318 never releases it.

My understanding is that the locks in these programs are exclusive, so it
doesn't seem like this should be allowed...

Comment 24 Ernie Petrides 2006-06-06 20:55:31 UTC

SteveD, should I put this bug into FAILS_QA state or bounce it back
into ASSIGNED state?  It looks like the problem isn't fully resolved.

Comment 29 Jeff Layton 2006-06-14 16:56:04 UTC

I've not, however, been able to reproduce this on FC5 as of yet, but the only
FC5 box I have at the moment is UP, so that could also be a factor.

Going to see if I can reproduce this on RHEL3/RHEL4 with UP kernels.

Comment 30 Jeff Layton 2006-06-14 17:48:16 UTC

I was still able to reproduce this on RHEL3 UP kernel, going to check FC5 SMP in
a bit.

Comment 31 Jeff Layton 2006-06-14 18:05:41 UTC

after many tries, I was also able to reproduce this on FC5 SMP:

Linux dantu.rdu.redhat.com 2.6.16-1.2122_FC5 #1 SMP Sun May 21 15:01:10 EDT 2006
x86_64 x86_64 x86_64 GNU/Linux

My guess is that it may also be reproducable on FC5 UP, but it might just take a
bit longer (I'll try again to see if I can make it happen there too).

So this bug also probably exists upstream, at least as of fairly recently.

Comment 32 Amit Gud 2006-06-14 22:37:23 UTC

I tried tracing calls to nlmclnt_proc from do_setlk.

The test shows:

Lock:1 - Offset 100, Len 1, Pid 28066
...

Modified fs/nfs/file.c:nfs_lock()

grep "nfs_lock" /var/log/messages | grep 28066
Jun 14 17:42:55 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=5, pid=28066)
Jun 14 17:42:56 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=200:200, cmd=6, pid=28066)
Jun 14 17:42:57 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=200:200, cmd=5, pid=28066)
Jun 14 17:42:59 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=6, pid=28066)
Jun 14 17:43:00 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=5, pid=28066)
Jun 14 17:43:02 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=200:200, cmd=6, pid=28066)
Jun 14 17:43:03 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=2, fl=1,
r=200:200, cmd=6, pid=28066)
Jun 14 17:43:04 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=6, pid=28066)
Jun 14 17:43:05 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=5, pid=28066)
Jun 14 17:43:06 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=200:200, cmd=6, pid=28066)
Jun 14 17:43:07 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=200:200, cmd=5, pid=28066)

And modified fs/lockd/clntproc.c:nlmclnt_proc()
grep "clnt proc returns" /var/log/messages | grep 28066
Jun 14 17:42:55 pro1 kernel: lockd: clnt proc returns -11, cmd=6, pid=28066
Jun 14 17:42:57 pro1 kernel: lockd: clnt proc returns -11, cmd=6, pid=28066
Jun 14 17:43:00 pro1 kernel: lockd: clnt proc returns -11, cmd=6, pid=28066
Jun 14 17:43:03 pro1 kernel: lockd: clnt proc returns 0, cmd=6, pid=28066
Jun 14 17:43:04 pro1 kernel: lockd: clnt proc returns 0, cmd=6, pid=28066
Jun 14 17:43:05 pro1 kernel: lockd: clnt proc returns 0, cmd=6, pid=28066
Jun 14 17:43:06 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:06 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:06 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:07 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:07 pro1 kernel: lockd: clnt proc returns -11, cmd=6, pid=28066
Jun 14 17:43:08 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:08 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:08 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:08 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:08 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:09 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066
Jun 14 17:43:10 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=28066


Here, after process 28066 gets lock on 100:100, it revokes the lock. Later it
tries to get lock on 200:200. But theres no unlock attempts whatsoever. I'm
putting some debug statements right at locks_remove_posix to see when it
actually initiates the unlocking, and also trying to trace where the values in
owner field come from.

Comment 33 Ernie Petrides 2006-06-14 22:55:39 UTC

Redoing lost reassignment from SteveD to Amit on
Mon, 12 Jun 2006 10:44:24 -0400.

Comment 34 Ernie Petrides 2006-06-14 22:57:34 UTC

Reapplying lost comment from jlayton on 2006-06-13 10:02 EST:


[note that attachment has been lost]

For this patch and run:

        Lock:1 - Offset 100, Len 1, Pid 1588


The relevant logs messages are here:

close_files: closing all files for pid=1588 files_struct=db9e5280
locks_remove_posix: pid=-610396224 *before=00000000
locks_remove_posix: pid=1588 fl_owner=db6a5d00 owner=db9e5280 range=100:100
nlmclnt_proc: pid=1592 type=2 r=100:100 status=0
close_files: closing all files for pid=1592 files_struct=db6a5d00
nlmclnt_proc: pid=1598 type=1 r=200:200 status=0
close_files: closing all files for pid=1598 files_struct=dc333780
locks_remove_posix: pid=-610394304 *before=00000000
locks_remove_posix: pid=1598 fl_owner=dc333780 owner=dc333780 range=200:200
nlmclnt_proc: pid=1598 type=2 r=200:200 status=0
close_files: closing all files for pid=1590 files_struct=db9e5080
close_files: closing all files for pid=1597 files_struct=db6a5900
close_files: closing all files for pid=1606 files_struct=db6a5900

So 1592 released the lock around the same time that locks_remove_posix was
being called for 1588. The interesting bit is that I did a network capture
during this run as well. This packet comes in right near the end, and no more
LOCK/UNLOCK calls are made after this one:

387  21.229495    10.10.1.9 -> 172.16.57.196 NLM V4 LOCK Call FH:0xa7c09108
svid:1598 pos:0-0  ; V4 UNLOCK Call FH:0xa7c09108 svid:1592 pos:0-0  ; V4 LOCK
Call FH:0xa7c09108 svid:1588 pos:0-0

So we have 3 different calls in this one packet. A LOCK call for PID 1598
(which is for l_offset 200), and then an UNLOCK call for l_offset 100 from pid
1592, and then a LOCK call for l_offset 100 from 1588.

So the server is granting this last lock request, but I think that by the time
the response gets back to the client, the process has exited. I wonder if the
issue is just timing with the polling loop for locks. Perhaps we need to
disable the polling loop if the process is PF_EXITING?

Comment 35 Ernie Petrides 2006-06-14 22:59:08 UTC

Reapplying lost comment from jlayton on 2006-06-13 12:21 EST:


Actually, I think that by the time the process is PF_EXITING, we don't hit the
polling loop again anyway.

I think that what's happening is that the RPC lock call in the polling loop gets
issued just before the process gets hit with the signal. The call goes out on
the wire before the process can cancel it, but by the time the time the response
gets back, the process has already exited.

That said, it looks like (from Steve's patch above) that we should be able to
handle this situation, so I'm not clear on where it's falling down.

Comment 37 Jeff Layton 2006-06-15 11:47:16 UTC

Actually, I have a (somewhat vague) hunch about what this problem might be.
While bugzilla was down yesterday, I had a similar situation to one I had posted
above when I saw this:

# dmesg -s 131072 | grep 200:200 | grep 'cmd=6' | egrep '(1314|1318)'
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1314) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=1, fl=1, r=200:200, cmd=6, pid=1318) = 0
NFS: nfs_lock(f=   9/507952, t=2, fl=1, r=200:200, cmd=6, pid=1314) = 0

After seeing something like the above, when locks_remove_posix was called, it
seemed to have no knowledge of a lock on 200:200.

The issue here is that my debug patches at this point were being done after the
call into the NLM client had returned. This info makes it look like the server
is granting the lock to 1318 while 1314 still holds it.

But network captures show that this isn't the case. What's happening here is
something like this:

1314 requests the lock and gets it
1314 sends an UNLOCK to the server and then goes to sleep
server sends a successful unlock reply to the client, but while in transit...
1318 sends a lock request
1318 gets the lock
1314 then gets the unlock reply

Now all of this is legit from the standpoint of the server, but what I'm not
clear on is whether the out-of-order set of operations here from the standpoint
of the client processes is maybe causing the file_lock lists to look wrong.

i.e., from the standpoint of the two processes above, the last operation to
happen was the unlock by 1314. Might this make the kernel think that the above
file was actually unlocked at the time that the processes are all exiting?

Again, this is just speculation, but this has me thinking that the problem is
not in the client/server nlm piece, but in how the file_lock lists are being
handled.

Comment 38 Amit Gud 2006-06-16 02:06:21 UTC

Created attachment 131020 [details]
debug patch 1

Comment 39 Amit Gud 2006-06-16 02:07:49 UTC

I tried to see if last unlock is really called. With the attached debug patch,
it seems like the unlock for the buggy process's lock is never called by the
fs/locks.c:locks_unlock_delete()

The test output shows:

...
        Lock:1 - Offset 100, Len 1, Pid 5026
...

# grep 5026 messages-06-15 | tail -n 20

Jun 15 21:02:06 pro1 kernel: lockd: clnt proc returns 0, cmd=6, pid=5026
Jun 15 21:02:06 pro1 kernel: NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=5026) = 0
Jun 15 21:02:06 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=2, fl=1,
r=200:200, cmd=6, pid=5026)
Jun 15 21:02:06 pro1 kernel: NFS: do_unlk(f=   a/10506580, t=2, fl=1, r=200:200,
cmd=6, pid=5026)
Jun 15 21:02:06 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=5026
Jun 15 21:02:06 pro1 kernel: NFS: END nfs_lock(t=1, r=200:200, cmd=5, pid=5026) = 0
Jun 15 21:02:07 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=5026
Jun 15 21:02:07 pro1 kernel: NFS: END nfs_lock(t=1, r=200:200, cmd=5, pid=5026) = 0
Jun 15 21:02:07 pro1 kernel: lockd: clnt proc returns 0, cmd=6, pid=5026
Jun 15 21:02:07 pro1 kernel: NFS: END nfs_lock(t=2, r=200:200, cmd=6, pid=5026) = 0
Jun 15 21:02:07 pro1 kernel: NFS: posix_lock_file: caller(5026, dc97a900,
200:200, 2) before(5026, dc97a900, 200:200, 1)
Jun 15 21:02:07 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=6, pid=5026)
Jun 15 21:02:08 pro1 kernel: lockd: clnt proc returns 0, cmd=6, pid=5026
Jun 15 21:02:08 pro1 kernel: NFS: END nfs_lock(t=1, r=100:100, cmd=6, pid=5026) = 0
Jun 15 21:02:08 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=5, pid=5026)
Jun 15 21:02:08 pro1 kernel: NFS: do_getlk(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=5, pid=5026)
Jun 15 21:02:08 pro1 kernel: lockd: clnt proc returns 0, cmd=5, pid=5026
Jun 15 21:02:08 pro1 kernel: NFS: END nfs_lock(t=1, r=100:100, cmd=5, pid=5026) = 0
Jun 15 21:02:08 pro1 kernel: lockd: clnt proc returns -4, cmd=5, pid=5026
Jun 15 21:02:08 pro1 kernel: NFS: END nfs_lock(t=1, r=100:100, cmd=5, pid=5026) = -4

Here 5026 never send unlock for the range 100:100. I think the file_lock
updation occurs after the process is terminated, don't know if its possible in
atleast in certain scenario. TCP dump  confirms that there was no unlock call.

# tethereal -r tcpdump1 | grep V4 | egrep '(NLM|NFS)' | tail -n 20

88231  18.659381     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5027 pos:0-0
88232  18.659418     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5029 pos:0-0
88233  18.659444     10.0.0.4 -> 10.0.0.1     NLM V4 UNLOCK Call FH:0x1d98c2f6
svid:5028 pos:0-0
88234  18.659518     10.0.0.4 -> 10.0.0.1     NLM V4 TEST Call FH:0x1d98c2f6
svid:5032 pos:0-0
88235  18.659534     10.0.0.1 -> 10.0.0.4     NLM V4 TEST Reply (Call In 88228)
88236  18.659553     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5030 pos:0-0
88238  18.659697     10.0.0.1 -> 10.0.0.4     NLM V4 LOCK Reply (Call In 88231)
 ; V4 LOCK Reply (Call In 88232) NLM_DENIED  ; V4 UNLOCK Reply (Call In 88233)
88240  18.659864     10.0.0.4 -> 10.0.0.1     NLM V4 UNLOCK Call FH:0x1d98c2f6
svid:5027 pos:0-0
88241  18.659930     10.0.0.4 -> 10.0.0.1     NLM V4 TEST Call FH:0x1d98c2f6
svid:5029 pos:0-0
88242  18.659977     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5028 pos:0-0
88243  18.660000     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5026 pos:0-0
88244  18.660016     10.0.0.1 -> 10.0.0.4     NLM V4 TEST Reply (Call In 88234);
 V4 LOCK Reply (Call In 88236)
88246  18.660151     10.0.0.4 -> 10.0.0.1     NLM V4 UNLOCK Call FH:0x1d98c2f6
svid:5030 pos:0-0
88247  18.660172     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5024 pos:0-0
88249  18.660307     10.0.0.1 -> 10.0.0.4     NLM V4 UNLOCK Reply (Call In
88240)  ; V4 TEST Reply (Call In 88241)  ; V4 LOCK Reply (Call In 88242)  ; V4
LOCK Reply (Call In 88243) NLM_DENIED
88250  18.660342     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5022 pos:0-0
88252  18.660433     10.0.0.4 -> 10.0.0.1     NLM V4 LOCK Call FH:0x1d98c2f6
svid:5027 pos:0-0
88253  18.660483     10.0.0.1 -> 10.0.0.4     NLM V4 UNLOCK Reply (Call In
88246)  ; V4 LOCK Reply (Call In 88247) NLM_DENIED
88254  18.660491     10.0.0.4 -> 10.0.0.1     NLM V4 UNLOCK Call FH:0x1d98c2f6
svid:5028 pos:0-0
88255  18.660592     10.0.0.4 -> 10.0.0.1     NLM V4 TEST Call FH:0x1d98c2f6
svid:5026 pos:0-0

I'm now trying to get the debug output from the place where the file_lock is
updated.

Comment 40 Amit Gud 2006-06-16 15:56:33 UTC

Strange thing I found in yesterday's syslog, is:

Jun 15 21:02:08 pro1 kernel: NFS: nfs_lock(f=   a/10506580, t=1, fl=1,
r=100:100, cmd=6, pid=5032)
Jun 15 21:02:09 pro1 kernel: lockd: clnt proc returns -4, cmd=6, pid=5032
Jun 15 21:02:09 pro1 kernel: lockd: clnt proc returns 0, cmd=6, pid=5032
Jun 15 21:02:09 pro1 kernel: NFS: END nfs_lock(t=2, r=100:100, cmd=6, pid=5032) = 0

5032 is the last one to unlock 200:200, but its type 't', apparently changed
within the function, from 1 to 2.

Comment 41 Amit Gud 2006-06-16 20:15:26 UTC

With another debug patch for fs/locks.c, its very clear that the victim process
' lock is not registered in file_lock before the locks are cleaned up.

...
        Lock:1 - Offset 200, Len 1, Pid 1680
...

$ grep "cmd=6" messages-06-16-1 | grep 1680 | grep "t=1" | grep END

NFS: END nfs_lock(t=1, r=100:100, cmd=6, pid=1680) = -11
NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=1680) = 0
NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=1680) = -11
NFS: END nfs_lock(t=1, r=100:100, cmd=6, pid=1680) = -11
NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=1680) = 0

$ grep locks_insert_lock messages-06-16-1 | grep 1680

NFS: locks_insert_lock: fl(1680, df82e680, 200:200, 1)

1680 takes 2 locks on 200:200, but only 1 is registered, thats the first one.
Second one is not! Whereas, for any other process 1677 e.g. :

$ grep locks_insert_lock messages-06-16-1 | grep 1677

NFS: locks_insert_lock: fl(1677, dd48ed80, 200:200, 1)
NFS: locks_insert_lock: fl(1677, dd48ed80, 100:100, 1)
NFS: locks_insert_lock: fl(1677, dd48ed80, 200:200, 1)

$ grep "cmd=6" messages-06-16-1 | grep 1677 | grep "t=1" | grep END

NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=1677) = -11
NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=1677) = -11
NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=1677) = 0
NFS: END nfs_lock(t=1, r=100:100, cmd=6, pid=1677) = 0
NFS: END nfs_lock(t=1, r=200:200, cmd=6, pid=1677) = 0
NFS: END nfs_lock(t=1, r=100:100, cmd=6, pid=1677) = -11

3 = 3!

Now since the lock is seen after the process dies (with ldisp), I guess lock
gets registered in file_lock right when process is deing but after the locks are
cleaned up. But now the confusing part is when exactly that happens, and if it
really happens. ldisp shows that file_lock has the lock entry for the process,
by my logs don't have anything when exactly that entry is made.

Comment 42 Jeff Layton 2006-06-18 02:28:40 UTC

It's not clear to me that ldisp displaying this lock actually means that there
is a file_lock entry on the client. From what I can tell, that info comes from
the server.

I've given this some more thought...

Another possible scenario for this situation might be this:

Process requests lock and server grants it, but, before the reply makes it back
to the client, the process is killed. So, it never calls locks_insert_lock to
put it in the list, but the server considers the lock as granted.

If this is the case, some possible fixes might be:

1) have lockd send an NLM_DENIED when it receives a NLM_GRANTED message for a
process that is PF_EXITING.

...or, perhaps a more sure way...

2) have the process add the lock to the file_lock list before the request ever
goes out on the wire. If it gets denied, then the process could just remove it.
With this, that should help ensure that locks_remove_posix will know about the
lock and queue up an UNLOCK request for it.

I *think* this would be OK, though we might end up with some spurious UNLOCK
requests if (for instance) the process was killed while waiting for a lock
response from the server was going to be a DENIED.

The only catch here is that I'm not sure how we could do this without affecting
the code for locks on local filesystems.

Comment 43 Amit Gud 2006-06-18 03:06:26 UTC

>> Process requests lock and server grants it, but, before the reply makes it back
>> to the client, the process is killed. So, it never calls locks_insert_lock to
>> put it in the list, but the server considers the lock as granted.

ldisp showing the lock, surely means that its hanging on the inode->i_flock. But
I'm not sure if its really added to file_lock_list at any point.

Jeff, I'm thinking of similar work-around as your (2) above. With a variation
that, all locks will be added to inode->i_flock and to file_lock_list as soon as
they are issued, but before they are accepted. They will be marked NOT_GRANTED.
Whenever they are actually granted, the flag will be cleared. While cleaning up,
UNLOCK would be initiated for these locks even if the flag is set. With this I
think even locks with other filesystems would work ok.

But other thing I want to look for next is that if the lock entry is really
added to file_lock_list, and at what point it is added to inode->i_flock.

Comment 44 Jeff Layton 2006-06-18 13:17:08 UTC

Created attachment 131118 [details]
patch to test theory in comment #42

Here's a small patch to test the theory in comment #42. I added a new PF_ flag
that gets set just before entering the loop to request the lock. We then clear
it as we exit nlmclnt_lock function. We then throw a printk if the flag is set
near the end of locks_remove_posix.

I'm not certain that there won't be false positives from this, or if we might
end up with locks_remove_posix getting called from lockd somehow, but if there
is a dangling lock due to the situation in comment #42, then I think this will
catch it.

Building a kernel with this now -- I'll plan to test it either later today or
tomorrow...

Comment 45 Jeff Layton 2006-06-18 15:23:52 UTC

Well, I reproduced it twice on RHEL4, and didn't see the message I thought I
would, but I did see this -- once for each time that I reproduced it:

nlmclnt_lock: VFS is out of sync with lock manager!

Comment 46 Amit Gud 2006-06-19 02:27:55 UTC

Created attachment 131125 [details]
debug patch 2

With the attached patch, I think its clear now that on client side the lock is
not recorded on file_lock_list. So the problem could possibly be resolved by
comments #42 or #43.

Comment 47 Amit Gud 2006-06-20 13:53:23 UTC

Created attachment 131196 [details]
debug patch 3

I was trying to send UNLOCK requests to the process' request which are hanging
off fl_block of the file_lock and are PF_EXITING, when clearing the locks
(locks_remove_posix).

But fl_block is empty (and rightly so, because we are not giving F_SETLKW, but
either F_SETLK or F_UNLCK, realized later).

Comment 48 Amit Gud 2006-06-21 13:35:41 UTC

Created attachment 131281 [details]
fix 1

Super Quick Fix. I'm sure there can be some better way to fix this.

The problem is the fcntl/fclose race, which is handled, but doesn't work. After
returning from posix_lock_file in fcntl_setlk, the error due to lock conflicts
detected in posix_lock_file did not cause the lock recently acquired to be
revoked. I changed the code and I haven't seen the bug for many (> 10) runs
now. This is on RHEL3 on pro1 mounting gfiler.

Comment 49 Peter Staubach 2006-06-21 14:01:03 UTC

"but doesn't work"?  How doesn't it work and why?

Changing the code to release a lock which just not acquired does not
seem to make sense.  That's what this "Super Quick Fix" seems to do.
Also, does it reintroduce the fcntl/close race?

Comment 50 Amit Gud 2006-06-21 14:38:38 UTC

My understanding is that, for the case at hand, lock is granted by the server
and nfs_lock returns successfully. But posix_lock_file detects a conflict and
returns -EAGAIN and does not record the lock granted by the server (in
inode->i_flock). This lock goes unrecorded. So my intention was, if
posix_lock_file fails for some reason, release the lock which was just acquired
(posix_lock_file was executed only because nfs_lock granted the lock).

Current code doesn't handle this case. The error returned by the posix_lock_file
is not conveyed to the server, which is OK for F_UNLCK, but not OK otherwise.

Even if we check in posix_lock_conflict to see if the conflicting process is
PF_EXITING, I'm not sure how are we going to record the acquired lock without
letting filp_close clean up the inode->i_flock.

Comment 51 Peter Staubach 2006-06-21 14:51:26 UTC

Why would the server grant a lock, which apparently did not conflict with
other locks, but then the client reject the lock due to a conflict?

Comment 52 Amit Gud 2006-06-21 20:09:34 UTC

Created attachment 131319 [details]
debug messages

Bug resolution.

Cause: Race between two processes, P1 and P2, for updating the i_flock field
(I'd call it unlock/lock race and not neccessarily fcntl/close race). This
sequence is the reason of the bug:
---
P1 requests server for the lock
Server grants the lock
P1 sends release request for the lock
Server grants the release request..

...But before this release response reaches P1 and i_flock is updated, P2 gets
going..

P2 request server for the same lock
Server grants the lock
P2 tries to update the i_flock field, but find conflicts
BUG: error goes undetected
P1 runs and updates i_flock
---

The bug is that, conflict is detected but server is not sent the unlock
request, and server thinks it has granted the lock (thats the reason why ldisp
shows the lock being held by a non-existing process).

Resolution:
Handle the error returned by posix_file_lock and send unlock if posix_file_lock
returns an error, then be it for any cause...(no memory for lock structures,
locks not POSIX at all or conflicts)

Now the question is, this is fairly VFS bug, so why this bug is not seen on
other filesystems?

My answer to this would be, most filesystems DON'T implement their own lock
f_ops operation, and so it doesn't matter if the file-specific lock is not
called if lock conflicts occur. Primary reason for the bug is:
- call fs->lock(LOCK)
- check for conflicts
- AND _don't_ call fs->lock(UNLOCK) if conflicts are present


I'd like to test for this bug in any filesystem which implement its own lock
f_ops operation. For now, the fix works for RHEL3, will need to see if this
works on other kernels.


AG

Comment 53 Peter Staubach 2006-06-21 20:15:10 UTC

This seems like a serialization sort of error and one that system should
handle transparently.

Sending an unlock to the server when an error is received seems like a
brute force way of handling this and that there should be a better way.
I don't think that the server should be sent the unlock request because
the lock should exist.  The conflict is only in the client's accounting
of the data structures.

Comment 54 Amit Gud 2006-06-21 22:41:19 UTC

Well, that is what the comment and the code (which I removed) intend to do -
send the unlock on conflict. At most we can put the acquired lock in the
fl_block list for later to be woken up when the i_flock is cleaned up. I'm
trying to find a catch for this and why this was not done in the first place.

Also I'm looking how this could be done in 2.6 kernel, where things have changed
a bit.

Comment 55 Amit Gud 2006-06-22 00:57:43 UTC

I think the fix is apt and appropriate, for blocking calls the code hangs the
process off fl_block and for non-blocking ones we don't have any option but to
return the error code BUT after releasing the lock that was just acquired.

Comment 56 Peter Staubach 2006-06-22 12:40:56 UTC

Without some more information, I don't agree.  The lock was granted by
the server, so clearly there is no conflict, despite whatever the client
is showing in this situation.  Releasing a lock which was granted and
returning an error to the application just seems wrong.

I'd like to see the real problem addressed and not a symptom.

Comment 57 Amit Gud 2006-06-23 13:54:33 UTC

> Releasing a lock which was granted and returning an error to the application 
> just seems wrong.

The lock has to be released for following reasons:

- its a non-blocking lock
- we cannot record the lock anywhere else
- the lock we acquired has conflicts and resolving those conflicts may take
cycles and the process doesn't want to wait
- we cannot wait until i_flock is cleared (which may well be because of a bug in
other part of the system and not neccessarily a lock/unlock race)
- and since we cannot wait, server has to be informed

If we are not to send the unlock request, the question is how are we going to
resolve the conflict (update the i_flock) without blocking the process?

Comment 58 Peter Staubach 2006-06-23 14:04:53 UTC

I might suggest checking for the conflict ahead of time in order to avoid
this situation?

Once again, this feels to me to be addressing a symptom of a synchronization
problem on the client.  It would be better to correct the checking and
synchronization in order to avoid this situation in the first place.

Comment 59 Steve Dickson 2006-06-26 11:03:22 UTC

Created attachment 131521 [details]
proposed patch

The following patch I think more directly address the problem
without effecting other race conditions.  Whenever the filesystem
allocates the lock but it can not be recored (due to any reason)
the lock has to be cleaned up.

Now better synchronization probably would help in this case and should
be pursed in the RHEL4 world (assuming this bug exists), but
whenever a lock is allocated and not recored is a very bad thing
regardless if its a synchronization issue or not...  So it s not clear
that fixing the synchronization would be a "catch all" as I hope this
patch is...

Comment 60 Ian McLeod 2006-06-29 14:56:29 UTC

I've complied a 2.4.21-37.EL SMP kernel with the patch from comment #59 and
re-run our tests.  I'm afraid I am still able to consistently reproduce stale locks.

Comment 61 Amit Gud 2006-06-29 16:24:10 UTC

I tried hard on 2.4.21-44.EL-ag SMP + the steved's fix again, but I could not
reproduce the bug. I'm going to try on 2.4.21-37.EL SMP.

Comment 65 Ian McLeod 2006-07-06 14:26:09 UTC

On further review, I did not apply the patch correctly for the tests referenced
in Comment 60.  I've since recompiled and retested with the latest patch and
cannot generate a stale lock.  Will be doing further testing in a real
application context.

Note You need to log in before you can comment on or make changes to this bug.