Bug 453094

Summary: deadlock when lockd tries to take f_sema that it already has
Product: Red Hat Enterprise Linux 5
Reporter: Janne Blomqvist <blomqvist.janne>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: urgent
Version: 5.2
CC: aca21, amyagi, bjorn.sund, bogdan.costescu, Colin.Simpson, craigwhite, dhoward, gasi, georgios, gilboad, goeran, jan.iven, jpirko, jplans, j.s.peatfield, kas, k.georgiou, klaus.steinberger, matt, mishu, mmcgrath, pasteur, pere, rdieter, redhat, riek, rwheeler, selimok, sgf, staubach, steved, tao, t.h.amundsen, voetelink
Target Milestone: rc
Keywords: Regression, ZStream
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-20 19:38:13 UTC
Attachments:
  dmesg output
  proposed patch
  fault injector -- always do a gc pass in nlm_lookup_host
  proposed patch 2
  patch -- lockd: don't call nlmsvc_lookup_host with the f_sema held

Description Janne Blomqvist 2008-06-27 08:43:08 UTC
Description of problem:

After upgrading the kernel of our RHEL5 NFS server to the then current
2.6.18-92.1.1.el5PAE version, lockd died after about one day. Clients reported
no errors other than

lockd: server 111.222.333.444 not responding, timed out

and anything requiring locks over NFS (a lot, since we have our homes on NFS)
either stopped working completely or became very sluggish.

On the server side, restarting the "nfslock" service had no effect. There was
nothing suspicious in syslog. However, in the process list there were
several (3) kernel processes named "[lockd]" instead of the usual one.

For the moment we have solved the problem by reverting to 2.6.18-53.1.14.el5PAE,
which has worked fine for months.

Version-Release number of selected component (if applicable):

kernel-PAE 2.6.18-92.1.1.el5

How reproducible:

Happened only once, after which we reverted to 2.6.18-53.1.14.el5PAE.

Additional info:

This seems similar to the following bug reported against the ubuntu 2.6.22 kernel:

https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/181996

Comment 1 Colin.Simpson 2008-06-27 12:26:07 UTC
We have this bug too. It's not just the PAE kernel; I see it on the x86_64
kernel 2.6.18-92.1.6.el5 as well. So it appears to affect "All" hardware.

The symptoms are the same, with the client machines constantly
generating:

Jun 26 12:48:21 brek kernel: lockd: server 10.100.16.20 not responding, timed out
Jun 26 13:06:21 brek kernel: lockd: server 10.100.16.20 not responding, timed out
Jun 26 13:08:04 brek kernel: lockd: server 10.100.16.20 not responding, timed out

Also, on first boot while lockd still works, you can do the following:
% rpcinfo -u srvu17ux01 100021
program 100021 version 1 ready and waiting
rpcinfo: RPC: Program/version mismatch; low version = 1, high version = 4
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting

But after a while (a day), it stops responding to this:
rpcinfo -u srvu17ux01 100021
^C^C^C
...hangs.

Or on TCP,
rpcinfo -t srv17ux01 100021
rpcinfo: RPC: Timed out
program 100021 version 0 is not available

We have the same kernel on client machines that have a lighter NFS load, and they
don't seem to see it. It's only on a file server (so far), so it may be a load thing.

Very annoying, and there is little you can do about it now that lockd is kernel
based and has so many dependencies that it can't easily be reloaded without
rebooting the file server.

This should be urgent as it's a regression.

Comment 2 Jeff Layton 2008-06-30 17:13:08 UTC
We've not seen these sorts of hangs in testing here, so it may be dependent on a
particular usage pattern...

> On the server side, restarting the "nfslock" service had no effect. There was
> nothing in suspicious in syslog. However, in the process list there where
> several (3) kernel processes named "[lockd]" instead of the usual one.

Ugh...sounds like lockd was just plain stuck and wouldn't come down.
lockd_down/up will just give up after a while and start a new one.

One thing that would be helpful is if you could gather some sysrq-t info when
this occurs.

# echo t > /proc/sysrq-trigger

...and then gather the output of dmesg:

# dmesg -s 131072 > /tmp/sysrq-t.out

...then attach sysrq-t.out here. That should hopefully give me a stack trace for
lockd and give us some idea of where it's stuck.

An even better idea might be to get a coredump, but sysrq-t info is easier to
gather and pass around, so maybe we should start there...


Comment 3 Colin.Simpson 2008-07-01 00:56:33 UTC
Will try to obtain this. The system we are seeing this on is one that uses
locking fairly heavily, as it hosts a number of homedirs all with
firefox/thunderbird. I presently have it booted into a downgraded kernel and the
problem hasn't reoccurred. I will reboot into the latest kernel to get the debug info.

Comment 4 Bogdan Costescu 2008-07-04 09:39:51 UTC
Same behaviour observed here, using the 2.6.18-92.1.6.el5 kernel from CentOS on
an x86 (32-bit) server; the same hardware worked fine for months with the -53.1.x kernels.
The setup is similar: one server with many clients mounting /home.
The -92.1.6 kernel was installed and the computer rebooted 2 days ago, so it seems
that it doesn't take too long to reproduce it.

Found this bug report only after the fact, so no debugging data to add.

Comment 5 Bogdan Costescu 2008-07-06 18:07:22 UTC
Created attachment 311104 [details]
dmesg output

Comment 6 Bogdan Costescu 2008-07-06 18:16:31 UTC
I have attached the dmesg output after the problem occurred again. I can see
in the output that there is an nfsd4 kernel thread running and that there are
nlm4svc_* calls in the lockd trace, but there is no NFSv4 setup on this server.
All the clients mount the FS via NFSv3 over TCP. The /etc/sysconfig/nfs file
doesn't contain any active definitions, all lines are commented out; there are
no NFS related modules mentioned in /etc/modprobe.conf.

I have to reboot the server but I'm quite confident that I will be able to
reproduce this easily ;-) Please let me know if I can provide more data.

Comment 7 Jeff Layton 2008-07-07 11:59:00 UTC
Thanks. This one has the lockd trace:

lockd         D 000035B2  2488  2908      1          2911  2860 (L-TLB)
       f7ab8eac 00000046 397cd1d1 000035b2 00000000 00000000 00000000 00000007 
       f7418aa0 c20f0550 397d35bc 000035b2 000063eb 00000001 f7418bac c2013cc4 
       00000000 f20c8a70 00000000 00000000 00000000 ffffffff 00000000 00000000 
Call Trace:
 [<c06096e0>] __down+0xa9/0xbb
 [<c042027b>] default_wake_function+0x0/0xc
 [<c06076cf>] __down_failed+0x7/0xc
 [<f8af7ed0>] .text.lock.svclock+0xb/0x9f [lockd]
 [<f8af8c98>] nlm_traverse_files+0x69/0x10f [lockd]
 [<f8af62c2>] nlm_gc_hosts+0x38/0x12a [lockd]
 [<f8af652c>] nlm_lookup_host+0x5e/0x240 [lockd]
 [<f8af9083>] nlm_lookup_file+0x1e4/0x1f1 [lockd]
 [<f8af6727>] nlmsvc_lookup_host+0x19/0x1b [lockd]
 [<f8af7cb6>] nlmsvc_lock+0xa2/0x2b1 [lockd]
 [<f8afaad7>] nlm4svc_retrieve_args+0x63/0xb6 [lockd]
 [<f8afad9a>] nlm4svc_proc_lock+0x80/0xc6 [lockd]
 [<f8b0f4d1>] svc_process+0x350/0x633 [sunrpc]
 [<f8af6e18>] lockd+0x14a/0x222 [lockd]
 [<f8af6cce>] lockd+0x0/0x222 [lockd]
 [<c0405c3b>] kernel_thread_helper+0x7/0x10

...it looks like it's stuck trying to down the nlm_file_mutex, and I suspect
that it's already done that here and so it ended up deadlocking. I have a hunch
that this is related to bug 280311, but I'll need to see if that makes sense here.

It sounds like it's probably a regression since 5.1, but nothing is jumping out
at me so far. This patch (for bug 196318) is a possible culprit:

    [fs] nfs: byte-range locking support for cfs

...it may also be that this race predates 5.2 and something changed the timing
to make it more likely. In any case, I'll give the nlm_file_mutex some scrutiny
and see if I can figure out what happened.


Comment 8 Jeff Layton 2008-07-07 13:20:34 UTC
Actually, the stack trace isn't as clear as I had thought...

 [<f8af7ed0>] .text.lock.svclock+0xb/0x9f [lockd]

...the return address here seems to be going back to nlmsvc_traverse_blocks,
which doesn't appear in the stack. Then again, this is a centos kernel, so it's
possible there are compiler/build differences that make the addresses line up
differently than in our kernels.

This does make a bit more sense though -- this looks like a call to down()
and not to mutex_lock(). If that is correct then we're stuck trying to do this here:

        down(&file->f_sema);

...in either nlmsvc_act_mark or nlmsvc_act_unlock. This also makes sense --
earlier in the stack there is this:

 [<f8af7cb6>] nlmsvc_lock+0xa2/0x2b1 [lockd]

...and it runs pretty much entirely under the f_sema.
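
In other words, lockd would be sleeping on a semaphore it already holds. As a minimal userspace analogy of that failure mode, the sketch below uses a POSIX semaphore as a stand-in for the kernel's f_sema; this is illustrative only, not lockd code:

#include <semaphore.h>
#include <stdio.h>

/* userspace semaphore standing in for the kernel's f_sema */
static sem_t f_sema;

int main(void)
{
        sem_init(&f_sema, 0, 1);        /* binary semaphore, like down()/up() */

        /* the lock path takes f_sema... */
        sem_wait(&f_sema);

        /* ...and the gc pass it indirectly triggers tries to take the
         * same semaphore again; the thread now sleeps forever, which is
         * the userspace equivalent of lockd stuck in state D */
        printf("second down() -- this will hang\n");
        sem_wait(&f_sema);

        printf("never reached\n");
        return 0;
}

(Build with gcc -pthread; the program prints the first message and then hangs, much like lockd does.)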


Comment 9 Jeff Layton 2008-07-07 21:55:18 UTC
Created attachment 311206 [details]
proposed patch

Here's an (untested) patch that I think will probably fix this.

I don't think this is a regression (at least not as far as I can tell), but
it's possible that something has changed in 5.2 to make this race more likely.
Either way, if anyone suffering from this has a non-critical place to test this
patch, that would be helpful.

I'll plan to give it a spin in the next few days too to see if it causes any
problems.

Comment 10 RHEL Program Management 2008-07-07 22:04:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 11 Jeff Layton 2008-07-08 11:18:32 UTC
I did some cursory testing of this patch this morning and don't see any obvious
breakage from it. If anyone here can independently confirm that it fixes this
problem, then that would be very helpful.

Setting to NEEDINFO for Bogdan since he said he could easily reproduce this...


Comment 12 Jeff Layton 2008-07-08 22:07:54 UTC
I went ahead and added this patch to my test kernels, so if you'd rather, you're
welcome to use them to test this:

http://people.redhat.com/jlayton/

Comment 13 Bogdan Costescu 2008-07-09 10:29:18 UTC
I have downloaded the patch, applied it to -92.1.6 and built a kernel RPM
myself. However, I wanted to wait until the problem occurred again and only then
boot the server with the patched kernel. But obviously, now that I'm waiting for
it, it doesn't deadlock anymore ;-) I don't know exactly what caused it in the
first 2 occurrences, but both happened within 48h of a reboot, so I hoped
it would happen again soon; yet after almost 72h it still hasn't
happened. If there is any idea about situations which increase the likelihood,
please let me know...

So no real info yet, the bug should probably remain in NEEDINFO state.

Comment 14 Jeff Layton 2008-07-09 10:50:17 UTC
Thanks for giving it a look. I'll leave it in ASSIGNED for now. I'm fairly
confident that I understand the problem and that this patch should fix it. The
stack trace was pretty clear. Confirmation that it makes the problem go away is
a nice thing to have, but not always practical (particularly with races like this).

As far as what triggers it...

The problem seems to occur when we end up in nlm_gc_hosts() while trying to
establish a lock. nlm_gc_hosts is what cleans up unused nlm_host entries and it
is called whenever we try to look up or create a new nlm_host. It essentially
cleans up entries in these lists for NFS clients or servers that have not had
recent lock activity.

Basically, you would need to time it so that you have some clients that need to
be garbage collected, and then issue a lock request at that time. That should
make this occur, but trying to arrange for that to happen is likely to be tricky.


Comment 16 Jeff Layton 2008-07-11 12:04:30 UTC
Actually, now that I look closer at the code, it may be easier to trigger than I
had originally thought. This might do it:

- Have 2 nfs clients on a RHEL5 server (need to check if RHEL4 is vulnerable to this)
- have one grab a lock on a file on the NFS mount and hold it
- wait for 2 mins (the max NLM_HOST_COLLECT period)
- have the second client try to get a lock on the same file

...I think at that point we'll take the f_mutex for the file and then do an
nlm_lookup_host for the client. This will trigger a GC pass, and we'll end up
trying to mark the same file that we're trying to lock. This should deadlock.

We can probably even make this a little easier to trigger if we modify
nlm_lookup_host to do a gc pass on every run through it. I'll play with it some
today and see if I can get a reproducer out of this...


Comment 17 Bogdan Costescu 2008-07-11 13:38:02 UTC
I still haven't seen the problem appear again, so I kept quiet...

Your idea in comment 16 doesn't seem to work here. I tried it with the following code:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        struct flock fl;
        int fd, err;

        fd = open(argv[1], O_RDWR);     /* file on the NFS mount */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;                   /* 0 = lock the whole file */

        fl.l_type = F_WRLCK;
        err = fcntl(fd, F_SETLK, &fl);  /* non-blocking lock request */
        if (err < 0) {
                perror("fcntl");
                return 1;
        }
        sleep(120);
        printf("2 minutes after locking.\n");
        sleep(120);
        return 0;
}

Note that the lock is not the waiting version. After starting one copy on one
client and waiting for the '2 minutes after locking' message to appear, I try to
run it on a different client, but there I immediately get the error:

fcntl: Resource temporarily unavailable

and the lockd on the NFS server does not get into the D state. Have I
misunderstood the explanation? Or is my code not actually doing what it's
supposed to do?

Comment 18 Jeff Layton 2008-07-11 13:45:59 UTC
It's more likely that I'm missing something. I'll need to play with it a bit and
see if I can figure out what's happening when we try that sequence of events.


Comment 19 Jeff Layton 2008-07-11 14:10:38 UTC
Ahh...I see. Here's some debug output from a lock request:

lockd: request from 0a0be7e5
lockd: LOCK          called
lockd: nlm_lookup_host(0a0be7e5, p=6, v=4)                       <<<< first call
lockd: host garbage collection
lockd: nlmsvc_mark_resources
nlm_gc_hosts skipping 10.11.231.224 (cnt 0 use 1 exp 4295030894)
lockd: delete host 10.11.231.229
lockd: nsm_unmonitor(10.11.231.229)
lockd: creating host entry
lockd: nsm_monitor(10.11.231.229)
nsm: xdr_decode_stat_res status 0 state 463
lockd: nlm_file_lookup (01010001 00000000 001acd69 2b6cced4 00000000 00000000
00000000 00000000)
lockd: found file ffff81001ef6b330 (count 0)
lockd: nlmsvc_lock(dm-0/1756521, ty=1, pi=1, 0-9223372036854775807, bl=1)
lockd: nlm_lookup_host(0a0be7e5, p=6, v=4)                      <<<< second call
lockd: get host 10.11.231.229

...the deadlock only occurs if we end up doing a garbage collection pass during
the second call into nlm_lookup_host. So it should only occur if there is a
timer tick (or more than one) in between the two calls that carries it over the
time for the next_gc pass.

In most cases, we'll do a gc pass on the first one and the second lookup won't
incur one. This is probably going to be too hard to time in such a way that
gives us a reliable reproducer. We can likely artificially cause it by making
nlm_lookup_host do a gc pass on every call into it, however (i.e. hack the
kernel to make this more likely).
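
To make the timing argument concrete, here is a small userspace model of that window; NLM_HOST_COLLECT, next_gc and the lookup function are simplified stand-ins for the kernel code (which uses jiffies), not lockd itself:

#include <stdio.h>
#include <time.h>

#define NLM_HOST_COLLECT 120            /* the 2 minute GC interval */

static time_t next_gc;

/* simplified stand-in for nlm_lookup_host(): a GC pass runs only when
 * the collection deadline has expired */
static void lookup_host(const char *caller)
{
        time_t now = time(NULL);

        if (now >= next_gc) {
                printf("%s: gc pass runs here\n", caller);
                next_gc = now + NLM_HOST_COLLECT;
        } else {
                printf("%s: no gc pass\n", caller);
        }
}

int main(void)
{
        next_gc = time(NULL);   /* pretend the deadline has just expired */

        /* the first lookup (from the proc_lock handler) normally
         * absorbs the pending gc pass... */
        lookup_host("first lookup");

        /* ...so the second lookup (from nlmsvc_lock, with f_sema held)
         * only triggers gc if the deadline expires between the two
         * calls -- the rare window that deadlocks the real lockd */
        lookup_host("second lookup, under f_sema");
        return 0;
}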

I'll play with that in a little while...


Comment 20 Jeff Layton 2008-07-11 14:27:49 UTC
Created attachment 311582 [details]
fault injector -- always do a gc pass in nlm_lookup_host

With this patch on the server, I didn't even need a second host:

lockd	      D 0000000000000000     0	2401	  1	     2402  2380 (L-TLB)

 ffff81000e7d7c80 0000000000000046 ffffffff800243d4 ffff81000fa44060
 ffff81001f8ed678 000000000000000a ffff81000fe04b80 ffff8100175d2e40
 0000002ed4fc5349 00000000000093aa ffff81000fe04d68 0000000000000001
Call Trace:
 [<ffffffff800243d4>] file_move+0x1d/0x49
 [<ffffffff80068201>] __down+0xc3/0xd8
 [<ffffffff8008f41e>] default_wake_function+0x0/0xe
 [<ffffffff80067e4e>] __down_failed+0x35/0x3a
 [<ffffffff883f0610>] :lockd:.text.lock.svclock+0x5/0x71
 [<ffffffff883f158c>] :lockd:nlm_traverse_files+0x75/0x134
 [<ffffffff883ee540>] :lockd:nlm_gc_hosts+0x4e/0x166
 [<ffffffff883ee829>] :lockd:nlm_lookup_host+0x61/0x2b1
 [<ffffffff883f03c5>] :lockd:nlmsvc_lock+0xd7/0x31d
 [<ffffffff883f3a0e>] :lockd:nlm4svc_proc_lock+0x8f/0xda
 [<ffffffff88313711>] :sunrpc:svc_process+0x3da/0x71b
 [<ffffffff883ef127>] :lockd:lockd+0x0/0x271
 [<ffffffff883ef2ae>] :lockd:lockd+0x187/0x271
 [<ffffffff80061079>] child_rip+0xa/0x11
 [<ffffffff800606a8>] restore_args+0x0/0x30
 [<ffffffff800d05c7>] zone_statistics+0x3e/0x6d
 [<ffffffff883ef127>] :lockd:lockd+0x0/0x271
 [<ffffffff8006106f>] child_rip+0x0/0x11

...all this patch does is make it so that we always do a gc pass when calling
into this function. Since the timing of those passes is based on jiffies, this
could happen at any time. Now I'll verify whether the proposed patch fixes it.

Comment 21 Jeff Layton 2008-07-11 14:36:07 UTC
Proposed patch seems to fix this. That said, I wonder whether we should change
this code around so that it doesn't require so many calls into nlm_lookup_host.
It seems like it might be more efficient to just pass a host pointer to
nlmsvc_lock since we've done a lookup in nlm4svc_proc_lock already.


Comment 22 Jeff Layton 2008-07-11 14:52:20 UTC
There is still a bit of investigation to do with this patch:

TODO:

1) check RHEL4 for vulnerability
2) I don't completely understand why we need to take an extra nlm_host reference
in nlmsvc_lock but not in nlmsvc_testlock. Determine why that is
3) possibly send patch upstream to eliminate the extra nlm_lookup_host calls
here. We ought to be able to just pass a host pointer from the callers of both
nlmsvc_lock/testlock since they do a lookup anyway


Comment 23 Jeff Layton 2008-07-11 15:05:38 UTC
RHEL4 does not seem to be vulnerable. Also, I take back what I said before. This
is a regression caused by the patch for 196318. Before that patch, we did not
call nlmsvc_create_block with the f_sema held.

Also, I think I understand the difference in reference count handling. In the
case of setting a lock we're going to have a persistent host staying on the list
so we can't allow it to be gc'ed until after the unlock.

When testing a lock, once the call exits the host is eligible for garbage
collection, so we don't need to take an extra reference for the nlm_host.
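
A rough sketch of that reference-count reasoning; the helpers below are hypothetical stand-ins, not the kernel's real nlm_host API:

#include <stdio.h>

struct host {
        int refcount;                   /* gc may reap the host only at 0 */
};

static void host_get(struct host *h) { h->refcount++; }
static void host_put(struct host *h) { h->refcount--; }

/* setting a lock: the granted lock keeps referring to the host, so it
 * must stay pinned until the matching unlock drops the reference */
static void set_lock(struct host *h)   { host_get(h); }
static void do_unlock(struct host *h)  { host_put(h); }

/* testing a lock: nothing persistent records the host, so no extra
 * reference is needed beyond the lookup itself */
static void test_lock(struct host *h)  { (void)h; }

int main(void)
{
        struct host h = { 1 };          /* 1 = reference from the lookup */

        set_lock(&h);
        printf("after lock:   refcount=%d (gc must not reap)\n", h.refcount);
        do_unlock(&h);
        printf("after unlock: refcount=%d\n", h.refcount);

        test_lock(&h);
        printf("after test:   refcount=%d (gc-eligible once the lookup ref drops)\n", h.refcount);
        return 0;
}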

The only thing left is to consider an upstream patch for efficiency...


Comment 24 RHEL Program Management 2008-07-11 15:23:50 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being marked as a blocker for this release.  

Please resolve ASAP.

Comment 25 Jeff Layton 2008-07-11 16:10:26 UTC
Created attachment 311599 [details]
proposed patch 2

Here's a second proposed patch that eliminates these duplicate calls into
nlmsvc_lookup_host. It's untested (even for compilation) and the delta from the
original patch will need to go upstream before we can take it for RHEL, but it
should work.

I'll plan to test it out and push the upstream patch if it looks good.

Comment 26 Jeff Layton 2008-07-13 22:57:09 UTC
Patch to eliminate the duplicate calls to nlmsvc_lookup_host pushed upstream.
Awaiting comment.


Comment 27 Bogdan Costescu 2008-07-14 12:09:57 UTC
Last weekend the unpatched kernel misbehaved again; I have obtained a lockd
trace identical to the one in comment #7. The kernel including the patch from
comment #9 is now running, I will report if anything goes wrong but I have high
hopes that it won't ;-)

Thanks for the effort and I hope to see the patch in a released kernel as soon
as possible.

Comment 28 Jeff Layton 2008-07-15 19:15:15 UTC
Thanks Bogdan,
   Looks like Bruce Fields accepted the upstream patch in his tree, so the final
patch for 5.3 will probably be closer to the one in comment #25. I plan to give
it a bit more testing in the next day or so and will probably post it for review
internally later this week.


Comment 29 Jeff Layton 2008-07-16 13:38:04 UTC
Created attachment 311951 [details]
patch -- lockd: don't call nlmsvc_lookup_host with the f_sema held

Final patch proposed internally
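
Since the attachment itself is not shown here, below is a rough, compilable model of the general shape described in comments 8, 21 and 25 -- look the host up before taking f_sema and pass the host pointer in. All names and types are simplified stand-ins, not the actual RHEL patch:

#include <stdio.h>

struct nlm_host { int id; };
struct nlm_file { int sema_held; };     /* stands in for f_sema */

static void down(struct nlm_file *f) { f->sema_held = 1; }
static void up(struct nlm_file *f)   { f->sema_held = 0; }

/* stand-in for nlmsvc_lookup_host(): in the real code this can run
 * nlm_gc_hosts(), which walks every nlm_file and takes its f_sema */
static struct nlm_host *lookup_host(struct nlm_file *f)
{
        static struct nlm_host h = { 1 };

        if (f->sema_held)
                printf("gc pass would block on f_sema -> deadlock\n");
        return &h;
}

/* before the fix: the lookup (and thus a possible gc pass) happens
 * while the lock routine already holds f_sema */
static void lock_before(struct nlm_file *f)
{
        down(f);
        lookup_host(f);                 /* the deadlock window */
        up(f);
}

/* after the fix: the caller has already looked the host up, so nothing
 * running under f_sema can re-enter the gc */
static void lock_after(struct nlm_file *f, struct nlm_host *host)
{
        down(f);
        (void)host;                     /* grant the lock using this host */
        up(f);
}

int main(void)
{
        struct nlm_file f = { 0 };
        struct nlm_host *host;

        lock_before(&f);                /* prints the deadlock warning */

        host = lookup_host(&f);         /* lookup first, outside f_sema */
        lock_after(&f, host);           /* then lock -- safe ordering */
        return 0;
}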

Comment 31 Don Zickus 2008-07-23 18:56:01 UTC
in kernel-2.6.18-99.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 32 Selim Ok 2008-08-08 22:41:31 UTC
(In reply to comment #31)
> in kernel-2.6.18-99.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5

The -99 kernel is no longer available there. Do the -102 or -103 kernels include the same patch?

Comment 33 Jeff Layton 2008-08-09 00:19:44 UTC
Yes, this patch should be in any -99.el5 kernel or later (including 102, 103, etc)

Comment 34 Bogdan Costescu 2008-08-12 09:42:38 UTC
It's rather sad that Red Hat doesn't consider this regression serious enough to make it part of released updates, like the recent -92.1.10. It makes the situation very similar to that during RHEL 5.1, when sysadmins had to choose between running an older (5.0) kernel without NFS problems or the latest one with all the security updates.

Comment 36 Matthew Kent 2008-08-13 16:25:33 UTC
+1 for including this in an update release :)

After applying the patch to the latest 5.2 kernel, our issue was solved and the server has been running for 6 days without problems.

Comment 38 George B. Magklaras 2008-08-14 12:42:06 UTC
I am also seeing the issue, running the NFS server with:

Linux name 2.6.18-92.1.10.el5xen #1 SMP Wed Jul 23 04:11:52 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

kernel, and I am seeing the lockd daemon go into the D state:
ps auxwww | grep lockd
root        27  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/0]
root        28  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/1]
root        29  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/2]
root        30  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/3]
root      5183  0.0  0.0      0     0 ?        D    Aug07   0:00 [lockd] 

after that, various RHEL4 (2.6.9-78.ELsmp #1 SMP Wed Jul 9 15:39:47 EDT 2008 i686 i686 i386 GNU/Linux) and RHEL5 clients go bananas, with dmesg reporting

lockd: server 192.168.8.2 not responding, still trying

This is really BAD. Sorry for shouting, but please expedite the process of getting a patched kernel released.

Best regards,
GM

Comment 39 Herbert Gasiorowski 2008-08-20 13:20:28 UTC
Me too ...
only every 10 days or so, but I would really appreciate a kernel update

Comment 40 George B. Magklaras 2008-08-29 08:13:30 UTC
Just to report that I have been running the:

Linux dias.uio.no 2.6.18-104.el5xen #1 SMP Tue Aug 12 17:52:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

patched kernel for two weeks now, with moderate NFS loads (30k write/requests per second from 20-40 clients) and I have not seen the issue again.

GM

Comment 41 Jonathan Peatfield 2008-09-01 17:14:47 UTC
Does Comment #24 imply that a kernel with the patch will be released for 5.2 or will people need to wait until 5.3 (or 5.4) for this fix to be included?  I'm just not clear what the 'this release' refers to.

I'm not sure if I can find any way to claim that this is a security issue but I suppose it does allow a trivial DOS against the lockd.

Comment 42 Jeff Layton 2008-09-01 18:51:38 UTC
"This release" means 5.3 in this instance. My understanding though is that this bug has been proposed for 5.2.z inclusion in bug 459083, but I'm not clear on when it'll be released.

As far as this being a security issue...

I don't really see it as such. It would be *very* hard to predictably time this in such a way to make it reliably happen, which is why I had to hack lockd to make it occur when I was testing it.

Comment 43 Jonathan Peatfield 2008-09-01 19:39:22 UTC
Both #453094 and #459083 seem to show 'Version: 5.2' and both have 'Keywords: Regression, ZStream', so it isn't obvious to me that one is actually about a proposed fix for the next update release. Anyway, I hadn't spotted #459083 before, so thanks for pointing me at it.

Yes, sorry, I misread comment #4 as implying that it was pretty easy to trigger (or at least that it happened fairly often).

However, it does seem to happen (in normal use with a few dozen clients) about once a week or so, and that is without anything deliberately trying to trigger it.

 -- Jon

Comment 44 Janne Blomqvist 2008-09-01 19:59:07 UTC
Well, for us it's a security issue in the sense that due to this bug, our NFS server is still stuck on 2.6.18-53.xxx.

Comment 45 Sam Fulcomer 2008-09-03 12:36:05 UTC
This is a much bigger problem on systems that support heavy lock traffic. Our case in point is a fileserver containing home directories and mail stores that is mounted by a mailserver that is both an SMTP hub and IMAP server, and which runs procmail and spamassassin for incoming messages (~300 users). On the fileserver, the frequency of lockd hangs is correlated with the influx of spam e-mail. What we've been seeing is 3-4 lockd hangs per day during high spam levels and 1 hang every ~2 days during lulls.

We can't downgrade because of protocol negotiation problems with legacy clients that don't know about NFSv4. I've tried a test kernel as mentioned above, but it crashes during boot in one of the HP management hardware drivers supplied by HP.

My unsatisfactory workaround is to run a script via cron that checks every 5 minutes for a hung lockd and reboots if found. This is unsatisfactory because users notice and rightly perceive the hangs as system instability.

The fileserver->mailserver configuration is certainly not optimal architecture, but we're stuck with it for the moment.

Comment 50 Jan "Yenya" Kasprzak 2008-09-23 12:20:53 UTC
Just a "me too" - my RHEL5/x86_64 server with kernel-2.6.18-92.1.10.el5 had the same problem after the recent kernel upgrade. I was not able to get the trace, because the dmesg buffer was too small. But the symptoms were the same - lockd stuck in the "D" state, clients timing out on flock() operations. Now I have installed the kernel from http://people.redhat.com/dzickus/el5 (kernel-2.6.18-116.el5.x86_64.rpm), and we will see whether it helps.

Comment 51 Jan "Yenya" Kasprzak 2008-09-24 12:44:40 UTC
After a day of using Don Zickus' kernel-2.6.18-116.el5.x86_64.rpm, I have not seen a lockd lockup yet. However, with this kernel, the system feels _very_ sluggish. In iostat, I frequently see "await" times of several seconds, and our MRTG graphs show the average CPU usage for the last day being 100-120% (we have 4 CPUs, so it means more than one CPU has been in use full time), most of which is accounted to system (not user) time.

I am not sure where to report this, as this kernel is not an official release.

I have now downloaded a src.rpm of 2.6.18-92.1.10.el5, and I am rebuilding it with just linux-2.6-fs-lockd-nlmsvc_lookup_host-called-with-f_sema-held.patch added. I will try to boot it later today.

Comment 52 Robert Buffington 2008-09-26 21:02:48 UTC
Jeff, this appears to have been fixed in http://rhn.redhat.com/errata/RHSA-2008-0885.html (kernel 2.6.18-92.1.13.el5). I was nailed by this one (dovecot index locking and NFS-mounted home directories) too, but fortunately I only had one occurrence of it before this update came out.

Comment 53 Jeff Layton 2008-09-29 11:18:45 UTC
There have been some problems with recent development kernels (unrelated to this bug), and that's probably what Jan is seeing here. These problems should get worked out before release.

I'd probably recommend that anyone seeing this issue use a zstream kernel -92.1.13 or later and not one of the current development builds.

Comment 54 Bogdan Costescu 2008-09-29 12:17:29 UTC
Just some feedback: I have used 2.6.18-92.1.6 with the initial patch and the testing kernel 2.6.18-99 for 78 and 61 days respectively on busy NFS servers affected earlier by this bug; neither of them has shown the symptoms anymore.

As 2.6.18-92.1.13 came out with a fix, I will change those 2 NFS servers to run it as soon as they can be taken down. Thanks to Jeff Layton for cooking up the patch and to Red Hat for listening to us and releasing it.

Comment 59 Akemi Yagi 2008-11-29 21:37:39 UTC
I can see in the changelog that the patch correcting the issue reported here indeed appeared in:

* Thu Sep 04 2008 Jiri Pirko <jpirko> [2.6.18-92.1.13.el5]
- [fs] lockd: nlmsvc_lookup_host called with f_sema held (Jeff Layton ) [459083 453094]

Could someone please close this bug report for completeness?  The status is left as ON_QA at this moment.

Comment 62 errata-xmlrpc 2009-01-20 19:38:13 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

Comment 63 Jan "Yenya" Kasprzak 2009-01-22 20:40:13 UTC
As for the slowdown I reported in comment #51, which according to comment #53 was supposed to be fixed before the release: yesterday I upgraded my system to 2.6.18-128.el5 (from 2.6.18-92.1.22.el5), and I am experiencing the same slowdown I reported in comment #51 for the development build 2.6.18-116.el5.

Tonight I will reboot back to kernel-2.6.18-92.1.22.el5, and if that makes the problem disappear, I will open a new bug.