Bug 191831
Summary: kernel BUG at include/asm/spinlock.h:133!
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.0
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Cheryl L. Southard <cld>
Assignee: Peter Staubach <staubach>
QA Contact: Brian Brock <bbrock>
CC: cwebster, jas, jbaron, jeffery.hanano, jmccann, ken.depetris, paulw, primoz.tolar, racedo, raines, raymond.marx, richard.cunningham, santoshbr, steved
Keywords: Regression
Target Milestone: ---
Target Release: ---
Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-08 01:18:21 UTC
Description
Cheryl L. Southard
2006-05-15 23:46:19 UTC
Created attachment 129142 [details]
/var/log/messages
The new 2.6.9-34.0.1.EL kernel does not fix the problem. I just tried installing it on another computer that was crashing, and it continued to crash with these spinlock errors. Also, this problem occurs on both smp and non-smp computers.

We just experienced the same problem on our Dell PowerEdge 2800 running RHEL 4 AS. It happened in the middle of a build on a remote machine using an NFS-mounted share served from this machine. Has a fix been issued yet? This problem has never occurred before. This production server has been returned to service and I do not wish to try to induce the problem. I will report if it happens again.

Kernel: 2.6.9-34.ELsmp

Relevant /var/log/messages entries immediately prior to failure:
---------------------------------
May 31 10:42:19 pegasus kernel: eip: f8ecdc00
May 31 10:42:19 pegasus kernel: ------------[ cut here ]------------
May 31 10:42:19 pegasus kernel: kernel BUG at include/asm/spinlock.h:133!
May 31 10:42:19 pegasus kernel: invalid operand: 0000 [#1]
May 31 10:42:19 pegasus kernel: SMP
May 31 10:42:19 pegasus kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfsd exportfs lockd nfs_acl sunrpc md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random shpchp e1000 bonding(U) floppy st sg ext3 jbd megaraid_mbox megaraid_mm aic7xxx sd_mod scsi_mod
May 31 10:42:19 pegasus kernel: CPU: 2
May 31 10:42:19 pegasus kernel: EIP: 0060:[<c02d11e8>] Not tainted VLI
May 31 10:42:19 pegasus kernel: EFLAGS: 00010216 (2.6.9-34.ELsmp)
May 31 10:42:19 pegasus kernel: EIP is at _spin_lock+0x1c/0x34
May 31 10:42:19 pegasus kernel: eax: c02e4ca6 ebx: c6453644 ecx: f61d5c70 edx: f8ecdc00
May 31 10:42:19 pegasus kernel: esi: c645363c edi: f716a700 ebp: 46000000 esp: f61d5c74
May 31 10:42:19 pegasus kernel: ds: 007b es: 007b ss: 0068
May 31 10:42:19 pegasus kernel: Process nfsd (pid: 2761, threadinfo=f61d5000 task=f75748b0)
May 31 10:42:19 pegasus kernel: Stack: f61c9810 f8ecdc00 f61c9810 00000001 00000000 f8975084 00000000 c6b4904c
May 31 10:42:19 pegasus kernel: f88f0aa8 ffffff8c f61ca000 c645363c f61d5ec8 c586b400 c6b490c8 0000007c
May 31 10:42:19 pegasus kernel: f5c60718 0000007c 0000007c c027bec6 c60d30cc 00025200 0000fa4b f5c60718
May 31 10:42:19 pegasus kernel: Call Trace:
May 31 10:42:19 pegasus kernel: [<f8ecdc00>] nfsd_acceptable+0x48/0xba [nfsd]
May 31 10:42:19 pegasus kernel: [<f8975084>] find_exported_dentry+0x84/0x5e8 [exportfs]
May 31 10:42:19 pegasus kernel: [<c027bec6>] skb_copy_datagram_iovec+0x53/0x1e5
May 31 10:42:19 pegasus kernel: [<c027992b>] release_sock+0xf/0x4f
May 31 10:42:19 pegasus kernel: [<c029e1a2>] tcp_recvmsg+0x64a/0x681
May 31 10:42:19 pegasus kernel: [<c0279a58>] sock_common_recvmsg+0x30/0x46
May 31 10:42:19 pegasus kernel: [<c0276720>] sock_recvmsg+0xef/0x10c
May 31 10:42:19 pegasus kernel: [<c02765e9>] sock_sendmsg+0xdb/0xf7
May 31 10:42:19 pegasus kernel: [<c011cbf2>] recalc_task_prio+0x128/0x133
May 31 10:42:19 pegasus kernel: [<c011cc85>] activate_task+0x88/0x95
May 31 10:42:19 pegasus kernel: [<c011d1a3>] try_to_wake_up+0x281/0x28c
May 31 10:42:19 pegasus kernel: [<c011e75d>] __wake_up_common+0x36/0x51
May 31 10:42:19 pegasus kernel: [<c011e7a1>] __wake_up+0x29/0x3c
May 31 10:42:19 pegasus kernel: [<f8ed250b>] svc_expkey_lookup+0x1f0/0x322 [nfsd]
May 31 10:42:19 pegasus kernel: [<f897588e>] export_decode_fh+0x61/0x6d [exportfs]
May 31 10:42:19 pegasus kernel: [<f8ecdbb8>] nfsd_acceptable+0x0/0xba [nfsd]
May 31 10:42:19 pegasus kernel: [<f897582d>] export_decode_fh+0x0/0x6d [exportfs]
May 31 10:42:19 pegasus kernel: [<f8ece067>] fh_verify+0x3f5/0x5f6 [nfsd]
May 31 10:42:19 pegasus kernel: [<f8ecdbb8>] nfsd_acceptable+0x0/0xba [nfsd]
May 31 10:42:19 pegasus kernel: [<f8eceb3c>] nfsd_lookup+0x45/0x3ad [nfsd]
May 31 10:42:19 pegasus kernel: [<f8eb1383>] svcauth_unix_set_client+0xa7/0xb5 [sunrpc]
May 31 10:42:19 pegasus kernel: [<f8eccfb0>] nfsd_proc_lookup+0x5f/0x71 [nfsd]
May 31 10:42:19 pegasus kernel: [<f8ed4c1d>] nfssvc_decode_diropargs+0x0/0xa7 [nfsd]
May 31 10:42:19 pegasus kernel: [<f8ecc681>] nfsd_dispatch+0xba/0x16d [nfsd]
May 31 10:42:19 pegasus kernel: [<f8eae55b>] svc_process+0x432/0x6d7 [sunrpc]
May 31 10:42:19 pegasus kernel: [<f8ecc45a>] nfsd+0x1cc/0x339 [nfsd]
May 31 10:42:19 pegasus kernel: [<f8ecc28e>] nfsd+0x0/0x339 [nfsd]
May 31 10:42:19 pegasus kernel: [<c01041f5>] kernel_thread_helper+0x5/0xb
May 31 10:42:19 pegasus kernel: Code: 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 53 89 c3 81 78 04 ad 4e ad de 74 18 ff 74 24 04 68 a6 4c 2e c0 e8 54 14 e5 ff 58 5a <0f> 0b 85 00 60 3d 2e c0 f0 fe 0b 79 09 f3 90 80 3b 00 7e f9 eb
May 31 10:42:19 pegasus kernel: <0>Fatal exception: panic in 5 seconds
May 31 10:42:21 pegasus ntpd[2948]: synchronized to 192.168.2.5, stratum 2
(System hung at this point - all functions, including console, are unavailable)
May 31 10:56:49 pegasus syslogd 1.4.1: restart.
...
---------------------------------
Excerpt from /etc/exports:
---------------------------------
/home/cwebster/aegis av8bdev(rw,async) av8bios(rw,async)
/home/lachman/aegis av8bdev(rw,async) av8bios(rw,async)
...
/archive/trainer av8bdev(rw,async) av8bios(rw,async)
---------------------------------
Each developer has a development directory exported from his home. These are automounted in the same place on each of two legacy development systems. One is a Concurrent Computer Corp. PowerHawk (ppc) running PowerMAXOS 4.3 and the other is a Sun SuperSparc running Solaris 6. /archive/trainer is an exported source code repository, also mounted on the two legacy systems.

Okay, it happened again. As soon as I got to the same point on a remote build, it immediately hung again. At the point the RHEL server panics, the "gmake" on the PowerHawk is executing an "rsh" to the Sparc. Both the PowerHawk and Sparc are executing commands on files located in directories NFS-mounted from the RHEL server.
I have not changed anything on this server since the last up2date session on Mon 13 Mar 2006 11:01:37 PM EST. I've done over 100 similar builds since then without any indications of a problem. Now, all of a sudden, we're having these NFS-related kernel panics. I have reverted to kernel version 2.6.9-22.0.2.ELsmp and I've been able to get through a successful build without another kernel panic... so far. Kernel panic messages are very similar for both failures.

Relevant excerpts from /var/log/messages:
----------------------------------------------------------
May 31 11:47:36 pegasus kernel: eip: f8ecdc00
May 31 11:47:36 pegasus kernel: ------------[ cut here ]------------
May 31 11:47:36 pegasus kernel: kernel BUG at include/asm/spinlock.h:133!
May 31 11:47:36 pegasus kernel: invalid operand: 0000 [#1]
May 31 11:47:36 pegasus kernel: SMP
May 31 11:47:36 pegasus kernel: Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core nfsd exportfs lockd nfs_acl sunrpc md5 ipv6 dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random shpchp e1000 bonding(U) floppy st sg ext3 jbd megaraid_mbox megaraid_mm aic7xxx sd_mod scsi_mod
May 31 11:47:36 pegasus kernel: CPU: 2
May 31 11:47:36 pegasus kernel: EIP: 0060:[<c02d11e8>] Not tainted VLI
May 31 11:47:36 pegasus kernel: EFLAGS: 00010216 (2.6.9-34.ELsmp)
May 31 11:47:36 pegasus kernel: EIP is at _spin_lock+0x1c/0x34
May 31 11:47:36 pegasus kernel: eax: c02e4ca6 ebx: f4a6dd64 ecx: f69e2ca4 edx: f8ecdc00
May 31 11:47:36 pegasus kernel: esi: f4a6dd5c edi: f712b8c0 ebp: 46000000 esp: f69e2ca8
May 31 11:47:36 pegasus kernel: ds: 007b es: 007b ss: 0068
May 31 11:47:36 pegasus kernel: Process nfsd (pid: 2773, threadinfo=f69e2000 task=f69e06b0)
May 31 11:47:36 pegasus kernel: Stack: f4a6dd5c f8ecdc00 f5b74010 00000001 00000000 f8975084 c027bec6 f65f48cc
May 31 11:47:36 pegasus kernel: f88f0aa8 ffffff8c f4e18b98 f482676c f69e2efc c5861e00 f4e18980 c027992b
May 31 11:47:36 pegasus kernel: 0000006c 00000246 0000006c c029e1a2 00000004 f5b4a880 00000001 00000000
May 31 11:47:36 pegasus kernel: Call Trace:
May 31 11:47:36 pegasus kernel: [<f8ecdc00>] nfsd_acceptable+0x48/0xba [nfsd]
May 31 11:47:36 pegasus kernel: [<f8975084>] find_exported_dentry+0x84/0x5e8 [exportfs]
May 31 11:47:36 pegasus kernel: [<c027bec6>] skb_copy_datagram_iovec+0x53/0x1e5
May 31 11:47:36 pegasus kernel: [<c027992b>] release_sock+0xf/0x4f
May 31 11:47:36 pegasus kernel: [<c029e1a2>] tcp_recvmsg+0x64a/0x681
May 31 11:47:36 pegasus kernel: [<c0279a58>] sock_common_recvmsg+0x30/0x46
May 31 11:47:36 pegasus kernel: [<c0276720>] sock_recvmsg+0xef/0x10c
May 31 11:47:36 pegasus kernel: [<c02765e9>] sock_sendmsg+0xdb/0xf7
May 31 11:47:36 pegasus kernel: [<c011cbf2>] recalc_task_prio+0x128/0x133
May 31 11:47:36 pegasus kernel: [<c011cc85>] activate_task+0x88/0x95
May 31 11:47:36 pegasus kernel: [<c011d1a3>] try_to_wake_up+0x281/0x28c
May 31 11:47:36 pegasus kernel: [<c011e75d>] __wake_up_common+0x36/0x51
May 31 11:47:36 pegasus kernel: [<c011e7a1>] __wake_up+0x29/0x3c
May 31 11:47:36 pegasus kernel: [<f8eae9d3>] svc_sock_enqueue+0x1d3/0x20f [sunrpc]
May 31 11:47:36 pegasus kernel: [<f8eaf963>] svc_tcp_recvfrom+0x304/0x376 [sunrpc]
May 31 11:47:36 pegasus kernel: [<f8ed250b>] svc_expkey_lookup+0x1f0/0x322 [nfsd]
May 31 11:47:36 pegasus kernel: [<f897588e>] export_decode_fh+0x61/0x6d [exportfs]
May 31 11:47:36 pegasus kernel: [<f8ecdbb8>] nfsd_acceptable+0x0/0xba [nfsd]
May 31 11:47:36 pegasus kernel: [<f897582d>] export_decode_fh+0x0/0x6d [exportfs]
May 31 11:47:36 pegasus kernel: [<f8ece067>] fh_verify+0x3f5/0x5f6 [nfsd]
May 31 11:47:36 pegasus kernel: [<f8ecdbb8>] nfsd_acceptable+0x0/0xba [nfsd]
May 31 11:47:36 pegasus kernel: [<f8ed6c5b>] nfsacld_proc_getattr+0x6a/0x6f [nfsd]
May 31 11:47:36 pegasus kernel: [<f8ed6df2>] nfsaclsvc_decode_fhandleargs+0x0/0x21 [nfsd]
May 31 11:47:36 pegasus kernel: [<f8ecc681>] nfsd_dispatch+0xba/0x16d [nfsd]
May 31 11:47:36 pegasus kernel: [<f8eae55b>] svc_process+0x432/0x6d7 [sunrpc]
May 31 11:47:36 pegasus kernel: [<f8ecc45a>] nfsd+0x1cc/0x339 [nfsd]
May 31 11:47:36 pegasus kernel: [<f8ecc28e>] nfsd+0x0/0x339 [nfsd]
May 31 11:47:36 pegasus kernel: [<c01041f5>] kernel_thread_helper+0x5/0xb
May 31 11:47:36 pegasus kernel: Code: 00 75 09 f0 81 02 00 00 00 01 30 c9 89 c8 c3 53 89 c3 81 78 04 ad 4e ad de 74 18 ff 74 24 04 68 a6 4c 2e c0 e8 54 14 e5 ff 58 5a <0f> 0b 85 00 60 3d 2e c0 f0 fe 0b 79 09 f3 90 80 3b 00 7e f9 eb
May 31 11:47:36 pegasus kernel: <0>Fatal exception: panic in 5 seconds
May 31 11:58:54 pegasus syslogd 1.4.1: restart.
----------------------------------------------------------

Created attachment 130903 [details]
/var/log/messages
Hi Folks, I'm also encountering this problem. The unit is a Dell PowerEdge 2850. The relevant log entries are posted above. I'd be very interested if there is a solution to this issue. Also, has going back to the old kernel worked as a workaround until a solution is found? Thanks, - Jeff

I originally reverted to kernel version 2.6.9-22.0.2.ELsmp, where it doesn't panic. However, Red Hat Support gave me a workaround that seems to be working: add the "no_subtree_check" option to /etc/exports entries. I've added this to all my /etc/exports entries and installed a test kernel that allows the netdump client to run with my bonded Ethernet interface. The current kernels will not support bonded interfaces with netdump. I wanted to get a good crash dump if it did panic again, so I've since upgraded to a test kernel [2.6.9-37.ELsmp] available from [http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/]. The test kernel does not fix the panic bug, but it does allow me to run a netdump client with my bonded Ethernet interface. Red Hat Support informs me that the fix for this issue will likely be in the RHEL 4 Update 4 release. Last word I got was that it is still in testing.

After running for weeks on 2.6.9-34.ELsmp, I have suddenly had several systems start to get the spinlock.h panic. In one, it would panic within seconds of a reboot until we rebooted it to the old 2.6.9-22.0.2.ELsmp kernel. I am not sure I buy the "no_subtree_check" explanation, as this system was exporting only whole filesystems from their root, no subdirectories of filesystems. Does this subtree check still happen then?

Adding "no_subtree_check" was a workaround that Red Hat Support suggested for my circumstances. I'm pretty sure that the "bug" is not in NFS itself, but in the kernel source code related to it. As you can see from the description above, we _do_ export subdirectories of /home and /archive. So, this is a reasonable workaround for us. I can't say for sure if it is circumventing the problem completely. All I know is that I haven't had a kernel panic since. If I do, however, I'll have some good crash dump info to provide Red Hat, since I'm running netdump.

Here is the full text of the "workaround" message from Red Hat Support. Again, this was based upon the symptoms we are seeing at our site with this platform and configuration. It may not work for you.
-----------------------------------------------------
This is not a solution but a temporary workaround. You need to mount the NFS shares using the NFSv3 protocol on the client side:

mount -o nfsvers=3 172.16.36.109:/backup /mnt/rem-backup

One more suggested workaround is to specify the "no_subtree_check" option in the exports (NFS server).
-----------------------------------------------------
Using the "nfsvers=3" option on the client side was not possible for us because we use "legacy" NFS clients that do not allow this option. This may be a viable workaround for you, though.

no_subtree_check added to the exports options does NOT work. Our main home directory server, after upgrading to 4.3, would crash within 20 minutes of being booted with this "kernel BUG at include/asm/spinlock.h" error, even after putting no_subtree_check in all the exports (which, btw, are all full filesystem mounts, not subdirectories). Other, less loaded servers may take days to panic. The only solution is to boot into the old 2.6.9-22.0.2.ELsmp kernel. Even kernel 2.6.9-34, which was the last update for 4.2, crashes, so it is definitely some change in the 2.6.9-22 to 2.6.9-34 move. We cannot force our 300+ clients to all use nfsvers=3.

I feel your pain, Paul. Have you submitted a support request? That's the preferred method of resolving critical issues. If you paid for the support, why not use it? When lots of customers register an issue, it gives them an idea of just how critical it is.
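As a sketch of the exports-side workaround discussed above: the helper name and sed expression below are mine, not from Red Hat Support, and the sample line is taken from the /etc/exports excerpt earlier in this report.

```shell
# Hypothetical helper: append no_subtree_check to every host option
# list in exports-style input. The sed pattern is illustrative only;
# review the result before installing it as /etc/exports.
add_no_subtree_check() {
  sed 's/(\([^)]*\))/(\1,no_subtree_check)/g'
}

# Applied to one of the export lines quoted earlier in this report:
echo '/home/cwebster/aegis av8bdev(rw,async) av8bios(rw,async)' \
  | add_no_subtree_check
```

After editing the real /etc/exports, `exportfs -ra` makes the server re-read it without restarting nfsd.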
I was recently notified that there's a test kernel available at: http://people.redhat.com/~jbaron/rhel4/RPMS.kernel/ It contains a patch for the kernel bug as well as the "netdump-bonded-interface" patch, so netdump will work on bonded Ethernet interfaces. I went ahead and installed it and haven't seen any problems to date. My problem wasn't as persistent as yours, though. I'm told that the RHEL4 U4 beta is available too. If you don't want to try a test kernel or beta release, you're better off staying with the old kernel and waiting for the official release of RHEL4 U4 in the next few months.

We believe that this is a duplicate of bz #178848, which is resolved in the U4 beta available from: http://people.redhat.com/~jbaron/rhel4/ I'd appreciate it if somebody could verify the fix with the U4 beta kernel.

I've installed the U4 beta kernel and will be booting into it at the end of the day today. I'll post the crash dump if it panics. I'm running netdump to another RHEL 4 server.

Three days of development using the new kernel, including a series of intense test builds, have elapsed without incident. The test builds were designed to simulate the same conditions under which the previous kernels panicked. I could not access bug #178848 to compare symptoms.

Created attachment 133414 [details]
/var/log/messages
Hi, We tried adding "no_subtree_check" to /etc/exports, but that didn't fix the problem. We also tried upgrading to the beta kernel, 2.6.9-42.ELsmp, and one of our computers crashed with the spinlock bug within a day. The above attachment from our /var/log/messages file shows the crash. We now have about 5 computers at 2.6.9-42.ELsmp that crash with this spinlock bug.

I have a server that until this weekend was running kernel-smp-2.6.9-34. I did a long overdue update on it, and it got updated to kernel-smp-2.6.9-34.0.2 as well as glibc-2.3.4-2.19 (a total of 172 rpm updates). Within minutes, and sometimes seconds, of this box booting and running NFS, it would crash with this spinlock.h panic. I tried going back to the 2.6.9-34 kernel, which it was happily using the day before, but it still panicked, which now makes me believe it has something more to do with the glibc update. Anyway, I tried the beta 2.6.9-42.ELsmp kernel and it still panicked. I have now installed 2.6.9-22.0.2.ELsmp from 4.2, and it is stable now. I upgraded several other servers exactly the same way; they still have the new 2.6.9-34.0.2 running and have not shown the problem (yet). The server that does have the problem is different in that it is the busiest and also runs Samba as a PDC, a FlexNet license server, and an ntpd master. This also means it is a critical server and I cannot afford to do beta testing on it. Anyway, I think it might be glibc instead of the kernel that is the source of the problem.

Hi, We have a Red Hat AS 4.0 U3 with the latest kernel, and it crashes when we try to export the NFS file system there. The latest patches didn't solve the problem, so I have done a workaround that took me some time, and it seems to be working. I have forced the server to work with NFS version 3 and let mountd work with versions 2 and 3. I have also set the firewall to block TCP connections to port 2049, and it seems that the machine is working fine, but over UDP.
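The firewall-plus-UDP workaround just described can be sketched as follows. This is the commenter's stopgap, not a recommended fix; the iptables rule details, server name, and mount point are assumptions, not from the report.

```shell
# On the NFS server: reject NFS-over-TCP on port 2049 so only the UDP
# service remains reachable (stopgap only; TCP is normally preferred).
iptables -I INPUT -p tcp --dport 2049 -j REJECT --reject-with tcp-reset

# On the clients: mount explicitly with NFSv3 over UDP so they never
# attempt TCP ("server:/export" and "/mnt" are placeholders).
mount -o nfsvers=3,proto=udp server:/export /mnt
```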
Regards, Shalom

Would it be possible for someone who can reproduce the problem to try using an Update 4 kernel, please?

I have used the Update 4 kernel 2.6.9-42.ELsmp and it didn't solve the problem. By mistake I wrote that we have U3. The first thing that I did was install the latest patches.

Thanx for trying. Just to be sure -- which combinations of NFS protocol version and transport choice work, and which ones do not? Out of NFSv2/UDP, NFSv2/TCP, NFSv3/UDP, and NFSv3/TCP, what works and what does not?

I have tried NFSv2/TCP, NFSv3/TCP, NFSv2/UDP and NFSv3/UDP. Works: all the UDP options. Doesn't work: all the TCP options. I have blocked TCP with iptables.

Thanx! That will help me to look at the correct areas.

Have duplicated this problem on RHEL4 U3 when the client is a Solaris Sparc which has NFS mounted with -o vers=2 and then performs a file copy operation.

Yes, that problem should be resolved in U4. Solaris attempts to mount NFSv2 with ACLs enabled, and that was triggering the bug. U4 disables ACL checking in NFSv2 and should work around this. Apparently, however, there are other ways to trigger this problem that don't involve ACLs on NFSv2. John, how reproducible is your situation?

Created attachment 136084 [details]
/var/log/message dump
kernel: kernel BUG at include/asm/spinlock.h:133!

I am also experiencing this problem; we have two identical HP servers. Both are configured and patched exactly the same, running the same software. The kernels were upgraded (along with all other RHN patches) about 1 month ago, and the 2.6.9-42.ELsmp kernel had been running without incident since the patches. Today the server crashed and would not recover on reboot. It hung at different points of booting; regressing to 2.6.9-34.ELsmp has caused the problem to (for now?) go away. The second (less busy) server has had no problems at all. These servers handle thousands of mount/unmount NFS requests daily, from Sun, AIX, and other Linux servers. Since this is a production unit, I cannot test (or try) methods of fixing. Are there any certain methods to offer greater stability? I have read this thread but am unsure if any of the 'solutions' are actual fixes, since this problem seems difficult to reproduce.

For the deployments which are seeing these problems, is NFSv2 being forced on the clients? There was some mention of legacy clients; what are these?

Hi, We are not forcing the clients to mount with NFSv2. It's not the clients that are crashing, though. Only the NFS servers get the spinlock error. And they seem to crash when we run the exportfs command in the nfs startup scripts.

If the clients are not being forced to NFSv2, then why are they mounting using NFSv2? The only clients which support the NFS_ACL protocol default to using NFSv3 or, now, NFSv4. Things have to go really wrong before they will revert back to NFSv2 unless they are told to. So, the question again -- what are these legacy clients which are causing the NFS server to fail?

One possible workaround (Peter, correct me if I'm wrong) would be to have mountd deny NFSv2 mount requests. To do this, you'd want to add this line to /etc/sysconfig/nfs:

MOUNTD_NFS_V2=no

...of course, this assumes you are not using NFSv2 at all, which should be the case on any reasonably modern OS. This will, unfortunately, not have any effect on hosts that already have NFSv2 mounts on this server (since mounts persist across server reboots), so you'd need to check all the clients and make sure that no such mounts exist.

To this day we still don't have *any* confirmed cases where this problem occurred and there was no NFSv2 traffic. We'd definitely be interested if anyone can come up with such a case.

Committed in stream U5 build 42.24. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/ I hope it fixes this :)

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.

(In reply to comment #47)
> One possible workaround (Peter correct me if I'm wrong), would be to have mountd
> deny NFSv2 mount requests. To do this, you'd want to add this line to
> /etc/sysconfig/nfs:
>
> MOUNTD_NFS_V2=no

This setting only affects mountd. nfsd can still provide v2 unless the following is also set in /etc/sysconfig/nfs:

RPCNFSDARGS="-N 2"

I was able to reliably crash a 2.6.9-42.EL machine by mounting from a Solaris client using "-o vers=2", even with MOUNTD_NFS_V2=no. Adding RPCNFSDARGS as above resolved the problem.

QE ack for 4.5.

(In reply to comment #59)
> committed in stream U5 build 42.24. A test kernel with this patch is available
> from http://people.redhat.com/~jbaron/rhel4/
>
> i hope it fixes this :)

Hi Jason, Thanks for the beta kernel. We tried a bunch of them from that directory:

2.6.9-42.24
2.6.9-42.32
2.6.9-42.36

They all seem to fix the spinlock problem! Thanks for the fix! We've tried it on about 4 computers so far that were spinlocking, and so far the problem has not reoccurred. However, we have noticed on all 3 of these beta kernels that they cause a somewhat different NFS bug to appear. Where do I report problems with these beta kernels? Thanks again for the fix!

What sort of bug?

(In reply to comment #74)
> What sort of bug?

Well, I haven't had time to go through all of the source code and figure out exactly what's going on, but the problem is with systems running the 2.6.9-42.24, 2.6.9-42.32 and 2.6.9-42.36 kernels and with the mail client called "pine". When we are running pine and are in the "MESSAGE INDEX" routine looking at the index of a mail file that is NFS mounted, we are never notified of new e-mails. We have to exit from the current mail file and then re-open it, or quit pine and restart it, to see any incoming e-mail.

*** Bug 220771 has been marked as a duplicate of this bug. ***

Okie doke. I'm glad that that was easily resolved. :-)

*** Bug 228273 has been marked as a duplicate of this bug. ***

*** Bug 230094 has been marked as a duplicate of this bug. ***

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

Closing per today's call - issue resolved with the given fix. Internal Status set to 'Resolved'. Status set to: Closed by Tech. Resolution set to: 'NotABug'. This event sent from IssueTracker by sfolkwil, issue 123455.

I just wanted to let you know that I found a workaround for the pine problem which started happening with this "fixed" kernel. Ever since the spinlock problem was fixed with these beta kernels, and even now with the newly released RHEL 4 U5 kernel (2.6.9-55.EL), we've been having problems with pine seeing new e-mails. I mentioned it in one of my comments above. The workaround is to NFS-mount our /var/mail spool directory "udp" instead of "tcp". With "tcp" we are never notified of new incoming e-mails. With "udp" everything works fine and as expected.
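Collecting the two server-side NFSv2-denial settings from the comments above into one place, a configuration sketch (RHEL4-era initscripts assumed; the rpcinfo check is my own suggestion for verifying it, not from the report):

```shell
# /etc/sysconfig/nfs -- deny NFSv2 at both layers (sketch):
MOUNTD_NFS_V2=no          # mountd refuses NFSv2 MNT requests
RPCNFSDARGS="-N 2"        # nfsd itself stops serving NFSv2

# After `service nfs restart`, one way to confirm v2 is gone is to
# list the NFS versions still registered with the portmapper:
#   rpcinfo -p localhost | grep nfs
```

As noted in the thread, this has no effect on clients that already hold NFSv2 mounts; those must be found and remounted.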
This is probably mentioned in another bugzilla, but I wanted to follow up in this one because I brought it up back in January. Thanks again for the fix.
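The pine workaround in the closing comments amounts to forcing the mail-spool mount onto UDP. A hypothetical /etc/fstab entry for the mail clients ("mailserver" is a placeholder hostname, and the exact option set is an assumption consistent with the mount options used elsewhere in this report):

```shell
# /etc/fstab -- mount the spool over NFSv3/UDP instead of TCP.
mailserver:/var/mail  /var/mail  nfs  nfsvers=3,proto=udp  0 0
```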