Bug 453094
Summary: deadlock when lockd tries to take f_sema that it already has
Product: Red Hat Enterprise Linux 5
Reporter: Janne Blomqvist <blomqvist.janne>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Priority: urgent
Version: 5.2
CC: aca21, amyagi, bjorn.sund, bogdan.costescu, Colin.Simpson, craigwhite, dhoward, gasi, georgios, gilboad, goeran, jan.iven, jpirko, jplans, j.s.peatfield, kas, k.georgiou, klaus.steinberger, matt, mishu, mmcgrath, pasteur, pere, rdieter, redhat, riek, rwheeler, selimok, sgf, staubach, steved, tao, t.h.amundsen, voetelink
Target Milestone: rc
Keywords: Regression, ZStream
Target Release: ---
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-01-20 19:38:13 UTC
Description
Janne Blomqvist
2008-06-27 08:43:08 UTC
We have this bug too. It's not just the PAE kernel; I have this bug on an x86_64 kernel, 2.6.18-92.1.6.el5, so it appears to affect "All" hardware. The symptoms are the same, with the client machine constantly generating:

Jun 26 12:48:21 brek kernel: lockd: server 10.100.16.20 not responding, timed out
Jun 26 13:06:21 brek kernel: lockd: server 10.100.16.20 not responding, timed out
Jun 26 13:08:04 brek kernel: lockd: server 10.100.16.20 not responding, timed out

Also, on first boot while lockd still works you can do the following:

% rpcinfo -u srvu17ux01 100021
program 100021 version 1 ready and waiting
rpcinfo: RPC: Program/version mismatch; low version = 1, high version = 4
program 100021 version 2 is not available
program 100021 version 3 ready and waiting
program 100021 version 4 ready and waiting

But after a while (a day), it stops responding to this:

rpcinfo -u srvu17ux01 100021
^C^C^C

...hangs. Or over TCP:

rpcinfo -t srv17ux01 100021
rpcinfo: RPC: Timed out
program 100021 version 0 is not available

We have the same kernel on client machines that have a lighter NFS load, and they don't seem to see it. It's only on a file server (so far), so it may be a load thing. Very annoying, and there is little you can do about it now that lockd is kernel based and has so many dependencies that it can't easily be reloaded without rebooting the file server. This should be urgent as it's a regression.

We've not seen these sorts of hangs in testing here, so it may be dependent on a
particular usage pattern...
> On the server side, restarting the "nfslock" service had no effect. There was
> nothing suspicious in syslog. However, in the process list there were
> several (3) kernel processes named "[lockd]" instead of the usual one.
Ugh...sounds like lockd was just plain stuck and wouldn't come down.
lockd_down/up will just give up after a while and start a new one.
One thing that would be helpful is if you could gather some sysrq-t info when
this occurs.
# echo t > /proc/sysrq-trigger
...and then gather the output of dmesg:
# dmesg -s 131072 > /tmp/sysrq-t.out
...then attach sysrq-t.out here. That should hopefully give me a stack trace for
lockd and give us some idea of where it's stuck.
An even better idea might be to get a coredump, but sysrq-t info is easier to
gather and pass around, so maybe we should start there...
Will try to obtain this. The system we are seeing this on is one that uses locking fairly heavily, as it hosts a number of homedirs, all with firefox/thunderbird. I presently have it booted into a downgraded kernel and the problem hasn't reoccurred. Will reboot into the latest kernel to get the debug info.

Same behaviour observed here, using the 2.6.18-92.1.6.el5 kernel from CentOS on an x86 (32-bit) server; the same hardware worked fine for months with the -53.1.x kernels. The setup is similar: one server with many clients mounting /home. The -92.1.6 kernel was installed and the computer rebooted 2 days ago, so it seems that it doesn't take too long to reproduce. Found this bug report only after the fact, so no debugging data to add.

Created attachment 311104 [details]
dmesg output
I have attached the dmesg output after the problem occurred again. I can see in the output that there is an nfsd4 kernel thread running and that there are nlm4svc_* calls in the lockd trace, but there is no NFSv4 setup on this server. All the clients mount the FS via NFSv3 over TCP. The /etc/sysconfig/nfs file doesn't contain any active definitions, all lines are commented out; there are no NFS-related modules mentioned in /etc/modprobe.conf. I have to reboot the server but I'm quite confident that I will be able to reproduce this easily ;-) Please let me know if I can provide more data. Thanks.

This one has the lockd trace:

lockd         D 000035B2  2488  2908      1          2911  2860 (L-TLB)
       f7ab8eac 00000046 397cd1d1 000035b2 00000000 00000000 00000000 00000007
       f7418aa0 c20f0550 397d35bc 000035b2 000063eb 00000001 f7418bac c2013cc4
       00000000 f20c8a70 00000000 00000000 00000000 ffffffff 00000000 00000000
Call Trace:
 [<c06096e0>] __down+0xa9/0xbb
 [<c042027b>] default_wake_function+0x0/0xc
 [<c06076cf>] __down_failed+0x7/0xc
 [<f8af7ed0>] .text.lock.svclock+0xb/0x9f [lockd]
 [<f8af8c98>] nlm_traverse_files+0x69/0x10f [lockd]
 [<f8af62c2>] nlm_gc_hosts+0x38/0x12a [lockd]
 [<f8af652c>] nlm_lookup_host+0x5e/0x240 [lockd]
 [<f8af9083>] nlm_lookup_file+0x1e4/0x1f1 [lockd]
 [<f8af6727>] nlmsvc_lookup_host+0x19/0x1b [lockd]
 [<f8af7cb6>] nlmsvc_lock+0xa2/0x2b1 [lockd]
 [<f8afaad7>] nlm4svc_retrieve_args+0x63/0xb6 [lockd]
 [<f8afad9a>] nlm4svc_proc_lock+0x80/0xc6 [lockd]
 [<f8b0f4d1>] svc_process+0x350/0x633 [sunrpc]
 [<f8af6e18>] lockd+0x14a/0x222 [lockd]
 [<f8af6cce>] lockd+0x0/0x222 [lockd]
 [<c0405c3b>] kernel_thread_helper+0x7/0x10

...it looks like it's stuck trying to down the nlm_file_mutex, and I suspect that it's already done that here, and so it ended up deadlocking. I have a hunch that this is related to bug 280311, but I'll need to see if that makes sense here. It sounds like it's probably a regression since 5.1, but nothing is jumping out at me so far.
This patch (for bug 196318) might be a possible culprit: [fs] nfs: byte-range locking support for cfs

...it may also be that this race preexisted 5.2 and something changed the timing to make it more likely. In any case, I'll give the nlm_file_mutex some scrutiny and see if I can figure out what happened.

Actually, the stack trace isn't as clear as I had thought...

 [<f8af7ed0>] .text.lock.svclock+0xb/0x9f [lockd]

...the return address here seems to be going back to nlmsvc_traverse_blocks, which doesn't appear in the stack. Then again, this is a CentOS kernel, so it's possible there are compiler/build differences that make the addresses line up differently than in our kernels. This does make a bit more sense though -- this looks like a call to down() and not to mutex_lock(). If that is correct then we're stuck trying to do this:

    down(&file->f_sema);

...in either nlmsvc_act_mark or nlmsvc_act_unlock. This also makes sense -- earlier in the stack there is this:

 [<f8af7cb6>] nlmsvc_lock+0xa2/0x2b1 [lockd]

...and it runs pretty much entirely under the f_sema.

Created attachment 311206 [details]
proposed patch
Here's an (untested) patch that I think will probably fix this.
I don't think this is a regression (at least not as far as I can tell), but
it's possible that something has changed in 5.2 to make this race more likely.
Either way, if anyone suffering from this has a non-critical place to test this
patch, that would be helpful.
I'll plan to give it a spin in the next few days too to see if it causes any
problems.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

I did some cursory testing of this patch this morning and don't see any obvious breakage from it. If anyone here can independently confirm that it fixes this problem, that would be very helpful. Setting to NEEDINFO for Bogdan since he said he could easily reproduce this... I went ahead and added this patch to my test kernels, so if you'd rather, you're welcome to use them to test this: http://people.redhat.com/jlayton/

I have downloaded the patch, applied it to -92.1.6 and built a kernel RPM myself. However, I wanted to wait until the problem occurs again and only then boot the server with the patched kernel. But obviously, now that I'm waiting for it, it doesn't deadlock anymore ;-) I don't know exactly what caused it in the first 2 occurrences, but both happened within 48h after a reboot, so I hoped that it would also happen again soon; but now, after almost 72h, it still hasn't happened. If there is any idea about situations which increase the likelihood, please let me know... So no real info yet; the bug should probably remain in NEEDINFO state.

Thanks for giving it a look. I'll leave it in ASSIGNED for now. I'm fairly confident that I understand the problem and that this patch should fix it. The stack trace was pretty clear. Confirmation that it makes the problem go away is a nice thing to have, but not always practical (particularly with races like this). As far as what triggers it... The problem seems to occur when we end up in nlm_gc_hosts() while trying to establish a lock.
nlm_gc_hosts is what cleans up unused nlm_host entries, and it is called whenever we try to look up or create a new nlm_host. It essentially cleans up entries in these lists for NFS clients or servers that have not had recent lock activity. Basically, you would need to time it so that you have some clients that need to be garbage collected, and then issue a lock request at that time. That should make this occur, but trying to arrange for that to happen is likely to be tricky.

Actually, now that I look closer at the code, it may be easier to trigger than I had originally thought. This might do it:

1) Have 2 NFS clients on a RHEL5 server (need to check whether RHEL4 is vulnerable to this)
2) Have one grab a lock on a file on the NFS mount and hold it
3) Wait for 2 mins (the max NLM_HOST_COLLECT period)
4) Have the second client try to get a lock on the same file

...I think that at that point we'll take the f_mutex for the file and then do a nlm_lookup_host for the client. This will trigger a GC pass, and we'll end up trying to mark the same file that we're trying to lock. This should deadlock. We can probably even make this a little easier to trigger if we modify nlm_lookup_host to do a gc pass on every run through it. I'll play with it some today and see if I can get a reproducer out of this...

I still haven't seen the problem appearing again, so I kept quiet... Your idea in comment 16 doesn't seem to work here. I tried with the following code:

    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    fl.l_type = F_WRLCK;
    err = fcntl(fd, F_SETLK, &fl);
    if (err < 0) {
        perror("fcntl");
        return 1;
    }
    sleep(120);
    printf("2 minutes after locking.\n");
    sleep(120);

Note that the lock is not the waiting version. After starting one copy on one client and waiting for the '2 minutes after locking' message to appear, I try to run it on a different client, but there I immediately get the error:

    fcntl: Resource temporarily unavailable

and the lockd on the NFS server does not get into the D state.
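For anyone who wants to try the same test, here is a self-contained version of the snippet above. The flock setup is taken verbatim from the comment; the file-open step, the helper name try_wrlock, and the choice of path are my additions for illustration. In a real reproducer attempt the file must of course live on the NFS mount so the lock request actually goes through lockd.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Try to take a non-blocking whole-file write lock, as in the snippet
 * above.  F_SETLK is the non-waiting variant: a conflicting lock from
 * another process makes it fail immediately with EAGAIN ("Resource
 * temporarily unavailable") instead of blocking.  Returns the open fd
 * on success, or -1 with errno set on failure. */
int try_wrlock(const char *path)
{
    struct flock fl;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;

    memset(&fl, 0, sizeof(fl));
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to EOF": lock the whole file */
    fl.l_type = F_WRLCK;

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;          /* EAGAIN if another process holds it */
        return -1;
    }
    return fd;
}
```

The original test then sleeps 120 seconds while holding the lock, so that the second client's attempt lands after an NLM_HOST_COLLECT period has elapsed. To reproduce the hang (rather than the immediate EAGAIN above), the second client would need F_SETLKW, the waiting variant, so that its request actually sits in lockd.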
Have I misunderstood the explanation? Or is my code not actually doing what it's supposed to do?

It's more likely that I'm missing something. I'll need to play with it a bit and see if I can figure out what's happening when we try that sequence of events.

Ahh...I see. Here's some debug output from a lock request:

    lockd: request from 0a0be7e5
    lockd: LOCK called
    lockd: nlm_lookup_host(0a0be7e5, p=6, v=4)    <<<< first call
    lockd: host garbage collection
    lockd: nlmsvc_mark_resources
    nlm_gc_hosts skipping 10.11.231.224 (cnt 0 use 1 exp 4295030894)
    lockd: delete host 10.11.231.229
    lockd: nsm_unmonitor(10.11.231.229)
    lockd: creating host entry
    lockd: nsm_monitor(10.11.231.229)
    nsm: xdr_decode_stat_res status 0 state 463
    lockd: nlm_file_lookup (01010001 00000000 001acd69 2b6cced4 00000000 00000000 00000000 00000000)
    lockd: found file ffff81001ef6b330 (count 0)
    lockd: nlmsvc_lock(dm-0/1756521, ty=1, pi=1, 0-9223372036854775807, bl=1)
    lockd: nlm_lookup_host(0a0be7e5, p=6, v=4)    <<<< second call
    lockd: get host 10.11.231.229

...the deadlock only occurs if we end up doing a garbage collection pass during the second call into nlm_lookup_host. So it should only occur if there is a timer tick (or more than one) in between the two calls that carries it over the time for the next gc pass. In most cases, we'll do a gc pass on the first one and the second lookup won't incur one.

This is probably going to be too hard to time in such a way that gives us a reliable reproducer. We can likely cause it artificially by making nlm_lookup_host do a gc pass on every call into it, however (i.e. hack the kernel to make this more likely). I'll play with that in a little while...

Created attachment 311582 [details]
fault injector -- always do a gc pass in nlm_lookup_host
With this patch on the server, I didn't even need a second host:
lockd D 0000000000000000 0 2401 1 2402 2380 (L-TLB)
ffff81000e7d7c80 0000000000000046 ffffffff800243d4 ffff81000fa44060
ffff81001f8ed678 000000000000000a ffff81000fe04b80 ffff8100175d2e40
0000002ed4fc5349 00000000000093aa ffff81000fe04d68 0000000000000001
Call Trace:
[<ffffffff800243d4>] file_move+0x1d/0x49
[<ffffffff80068201>] __down+0xc3/0xd8
[<ffffffff8008f41e>] default_wake_function+0x0/0xe
[<ffffffff80067e4e>] __down_failed+0x35/0x3a
[<ffffffff883f0610>] :lockd:.text.lock.svclock+0x5/0x71
[<ffffffff883f158c>] :lockd:nlm_traverse_files+0x75/0x134
[<ffffffff883ee540>] :lockd:nlm_gc_hosts+0x4e/0x166
[<ffffffff883ee829>] :lockd:nlm_lookup_host+0x61/0x2b1
[<ffffffff883f03c5>] :lockd:nlmsvc_lock+0xd7/0x31d
[<ffffffff883f3a0e>] :lockd:nlm4svc_proc_lock+0x8f/0xda
[<ffffffff88313711>] :sunrpc:svc_process+0x3da/0x71b
[<ffffffff883ef127>] :lockd:lockd+0x0/0x271
[<ffffffff883ef2ae>] :lockd:lockd+0x187/0x271
[<ffffffff80061079>] child_rip+0xa/0x11
[<ffffffff800606a8>] restore_args+0x0/0x30
[<ffffffff800d05c7>] zone_statistics+0x3e/0x6d
[<ffffffff883ef127>] :lockd:lockd+0x0/0x271
[<ffffffff8006106f>] child_rip+0x0/0x11
...all this patch does is make it so that we always do a gc pass when calling
into this function. Since the timing of gc passes is based on jiffies, this
could happen at any time. Now I'll verify whether the proposed patch fixes it.
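The failure mode itself is just a second down() on a semaphore the calling thread already holds: f_sema is an ordinary counting semaphore, not a recursive lock, so the second down() sleeps forever. Here is a userspace sketch of the same pattern using POSIX semaphores (my own illustration, not kernel code); sem_trywait() stands in for the second down() so the demonstration reports the would-be deadlock instead of actually hanging:

```c
#include <errno.h>
#include <semaphore.h>

/* Model of the lockd deadlock: the thread takes f_sema in nlmsvc_lock()
 * and then, via nlm_lookup_host() -> nlm_gc_hosts() ->
 * nlm_traverse_files(), tries to take the same f_sema again.
 * Returns 1 if the second acquisition would block forever. */
int would_self_deadlock(void)
{
    sem_t f_sema;
    int deadlock;

    sem_init(&f_sema, 0, 1);     /* binary semaphore, like f_sema */
    sem_wait(&f_sema);           /* first down(): nlmsvc_lock() */

    /* Second down() on the same semaphore, from the gc pass.  With
     * sem_trywait() we get EAGAIN instead of sleeping; a real down()
     * here sleeps until someone posts, which nobody ever will. */
    deadlock = (sem_trywait(&f_sema) < 0 && errno == EAGAIN);

    sem_post(&f_sema);
    sem_destroy(&f_sema);
    return deadlock;
}
```

This also shows why the eventual fix works: if the nlm_lookup_host() call (and hence any gc pass) happens before the first down(&file->f_sema), the semaphore is never taken twice by the same thread, which matches the title of the final patch below.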
Proposed patch seems to fix this. That said, I wonder whether we should change this code around so that it doesn't require so many calls into nlm_lookup_host. It seems like it might be more efficient to just pass a host pointer to nlmsvc_lock, since we've done a lookup in nlm4svc_proc_lock already. There is still a bit of investigation to do with this patch:

TODO:
1) check RHEL4 for vulnerability
2) I don't completely understand why we need to take an extra nlm_host reference in nlmsvc_lock but not in nlmsvc_testlock. Determine why that is.
3) possibly send a patch upstream to eliminate the extra nlm_lookup_host calls here. We ought to be able to just pass a host pointer from the callers of both nlmsvc_lock/testlock since they do a lookup anyway.

RHEL4 does not seem to be vulnerable. Also, I take back what I said before: this is a regression, caused by the patch for 196318. Before that patch, we did not call nlmsvc_create_block with the f_sema held.

Also, I think I understand the difference in reference count handling. In the case of setting a lock, we're going to have a persistent host staying on the list, so we can't allow it to be gc'ed until after the unlock. When testing a lock, once the call exits the host is eligible for garbage collection, so we don't need to take an extra reference for the nlm_host. The only thing left is to consider an upstream patch for efficiency...

This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being marked as a blocker for this release. Please resolve ASAP.

Created attachment 311599 [details]
proposed patch 2
Here's a second proposed patch that eliminates these duplicate calls into
nlmsvc_lookup_host. It's untested (even for compilation) and the delta from the
original patch will need to go upstream before we can take it for RHEL, but it
should work.
I'll plan to test it out and push the upstream patch if it looks good.
Patch to eliminate the duplicate calls to nlmsvc_lookup_host pushed upstream. Awaiting comment.

Last weekend the unpatched kernel misbehaved again; I have obtained a lockd trace identical to the one in comment #7. The kernel including the patch from comment #9 is now running; I will report if anything goes wrong, but I have high hopes that it won't ;-) Thanks for the effort, and I hope to see the patch in a released kernel as soon as possible.

Thanks Bogdan. Looks like Bruce Fields accepted the upstream patch in his tree, so the final patch for 5.3 will probably be closer to the one in comment #25. I plan to give it a bit more testing in the next day or so and will probably post it for review internally later this week.

Created attachment 311951 [details]
patch -- lockd: don't call nlmsvc_lookup_host with the f_sema held
Final patch proposed internally
in kernel-2.6.18-99.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

(In reply to comment #31)
> in kernel-2.6.18-99.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5

There is no more 99'er kernel therein. Are the 102 or 103'er kernels patched alike?

Yes, this patch should be in any -99.el5 kernel or later (including 102, 103, etc).

It's rather sad that Red Hat doesn't consider this regression serious enough to make it part of released updates, like the recent -92.1.10. It makes the situation very similar to that during RHEL 5.1, when sysadmins had to choose between running an older (5.0) kernel without NFS problems or the latest one with all the security updates.

+1 to include in an updates release :) After applying the patch to the latest 5.2 kernel, it solved our issue, and the system has been running for 6 days without issue.

I also report the issue running the NFS server with the

Linux name 2.6.18-92.1.10.el5xen #1 SMP Wed Jul 23 04:11:52 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

kernel, and seeing the lockd daemon go into the D state:

ps auxwww | grep lockd
root        27  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/0]
root        28  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/1]
root        29  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/2]
root        30  0.0  0.0      0     0 ?        S<   Aug07   0:00 [kblockd/3]
root      5183  0.0  0.0      0     0 ?        D    Aug07   0:00 [lockd]

After that, various RHEL4 (2.6.9-78.ELsmp #1 SMP Wed Jul 9 15:39:47 EDT 2008 i686 i686 i386 GNU/Linux) and RHEL5 clients go bananas, with dmesg reporting:

lockd: server 192.168.8.2 not responding, still trying

This is really BAD. Sorry for shouting, but please expedite the release of a patched kernel. Best regards, GM

Me too ...
only every 10 days, but I would be really pleased about a kernel update.

Just to report that I have been running the

Linux dias.uio.no 2.6.18-104.el5xen #1 SMP Tue Aug 12 17:52:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

patched kernel for two weeks now, with moderate NFS loads (30k write requests per second from 20-40 clients), and I have not seen the issue again. GM

Does Comment #24 imply that a kernel with the patch will be released for 5.2, or will people need to wait until 5.3 (or 5.4) for this fix to be included? I'm just not clear what 'this release' refers to. I'm not sure if I can find any way to claim that this is a security issue, but I suppose it does allow a trivial DOS against lockd.

"This release" means 5.3 in this instance. My understanding though is that this bug has been proposed for 5.2.z inclusion in bug 459083, but I'm not clear on when it'll be released. As far as this being a security issue... I don't really see it as such. It would be *very* hard to predictably time this in such a way as to make it happen reliably, which is why I had to hack lockd to make it occur when I was testing it.

Both #453094 and #459083 seem to show 'Version: 5.2' and both have 'Keywords: Regression, ZStream', so it isn't obvious to me that one is actually about a proposed fix for the next update release. Anyway, I'd not spotted #459083 before, so thanks for pointing me at it. Yes, sorry, I misread Comment #4 to imply that it was pretty easy to trigger (or at least that it happened fairly often). However, it does seem to happen (in normal use with a few dozen clients) about once a week or so, and that is without anything trying to trigger it. -- Jon

Well, for us it's a security issue in the sense that due to this bug, our NFS server is still stuck on 2.6.18-53.xxx. This is a much bigger problem on systems that support heavy lock traffic.
Our case in point is a fileserver containing home directories and mail stores, mounted by a mailserver that is both an SMTP hub and IMAP server and which runs procmail and spamassassin for incoming messages (~300 users). On the fileserver, the frequency of lockd hangs is correlated with the influx of spam e-mail. What we've been seeing is 3-4 lockd hangs per day during high spam levels and 1 hang every ~2 days during lulls. We can't downgrade because of protocol negotiation problems with legacy clients that don't know about nfsv4. I've tried a test kernel as mentioned above, but it crashes on the way up in one of the HP mgmt hw drivers supplied by HP. My unsatisfactory workaround is to run a script via cron that checks every 5 minutes for a hung lockd and reboots if one is found. This is unsatisfactory because users notice and rightly perceive the hangs as system instability. The fileserver->mailserver configuration is certainly not optimal architecture, but we're stuck with it for the moment.

Just a "me too" - my RHEL5/x86_64 server with kernel-2.6.18-92.1.10.el5 had the same problem after the recent kernel upgrade. I was not able to get the trace, because the dmesg buffer was too small. But the symptoms were the same - lockd stuck in the "D" state, clients timing out on flock() operations. Now I have installed the kernel from http://people.redhat.com/dzickus/el5 (kernel-2.6.18-116.el5.x86_64.rpm), and we will see whether it helps.

After a day of using Don Zickus' kernel-2.6.18-116.el5.x86_64.rpm, I have not seen a lockd lockup yet. However, with this kernel, the system feels _very_ sluggish. In iostat, I frequently see "await" times of several seconds, and our MRTG graphs show the average CPU time for the last day being 100-120 % (we have 4 CPUs, so it means that more than one CPU has been used full time), most of which is accounted to system (not user) time. I am not sure where to report this, as this kernel is not an official release.
I have now downloaded a src.rpm of 2.6.18-92.1.10.el5, and I am rebuilding it with just linux-2.6-fs-lockd-nlmsvc_lookup_host-called-with-f_sema-held.patch added. I will try to boot it later today.

Jeff, this appears to have been fixed in http://rhn.redhat.com/errata/RHSA-2008-0885.html (kernel 2.6.18-92.1.13.el5). I was nailed by this one (dovecot index locking and NFS-mounted home directories) too, but fortunately I only had one occurrence of this until this update came out.

There have been some problems with recent development kernels (unrelated to this bug), and that's probably what Jan is seeing here. These problems should get worked out before release. I'd probably recommend that anyone seeing this issue use a zstream kernel -92.1.13 or later and not one of the current development builds.

Just some feedback: I have used 2.6.18-92.1.6 with the initial patch and the testing kernel 2.6.18-99 for 78 and 61 days respectively on busy NFS servers affected earlier by this bug; neither of them has shown the symptoms anymore. As 2.6.18-92.1.13 came out with a fix, I will change those 2 NFS servers to run it as soon as they can be taken down. Thanks to Jeff Layton for cooking up the patch and to Red Hat for listening to us and releasing it.

I can see in the changelog that the patch correcting the issue reported here indeed appeared in:

* Thu Sep 04 2008 Jiri Pirko <jpirko> [2.6.18-92.1.13.el5]
- [fs] lockd: nlmsvc_lookup_host called with f_sema held (Jeff Layton ) [459083 453094]

Could someone please close this bug report for completeness? The status is left as ON_QA at this moment.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-0225.html

As for the slowdown I reported in comment #51, which according to comment #53 was supposed to be fixed before the release: yesterday I upgraded my system to 2.6.18-128.el5 (from 2.6.18-92.1.22.el5), and I experience the same slowdown I reported in comment #51 for the development build 2.6.18-116.el5. Tonight I will reboot back to kernel-2.6.18-92.1.22.el5, and if that makes the problem disappear, I will open a new bug.