Bug 8168
Summary: | NFS Server Crashes for SMP kernel | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | joseph |
Component: | nfs-utils | Assignee: | Michael K. Johnson <johnsonm> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 7.0 | CC: | joseph, scot, starback, steve, system |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i386 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2002-12-14 01:59:23 UTC | Type: | --- |
Description
joseph
2000-01-04 17:52:42 UTC
I am experiencing the same problem with a NON-SMP kernel, 2.2.12-20:

Feb 24 11:07:47 gi2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000008
Feb 24 11:07:47 gi2 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Feb 24 11:07:47 gi2 kernel: *pde = 00000000
Feb 24 11:07:47 gi2 kernel: Oops: 0000
Feb 24 11:07:47 gi2 kernel: CPU: 0
Feb 24 11:07:47 gi2 kernel: EIP: 0010: [sound:sound_install_audiodrv_R990d1ca0+-149799/512]
Feb 24 11:07:47 gi2 kernel: EFLAGS: 00010286
Feb 24 11:07:47 gi2 kernel: eax: 00000000 ebx: c208d000 ecx: 00000000 edx: 00000000
Feb 24 11:07:47 gi2 kernel: esi: c208d000 edi: c196401c ebp: c208d000 esp: c1d0ff5c
Feb 24 11:07:47 gi2 kernel: ds: 0018 es: 0018 ss: 0018
Feb 24 11:07:47 gi2 kernel: Process nfsd (pid: 32465, process nr: 29, stackpage=c1d0f000)
Feb 24 11:07:47 gi2 kernel: Stack: c1964014 c1964014 c48b345e c208d000 c196401c c1957080 c208d0f4 c48bcd80
Feb 24 11:07:47 gi2 kernel: c19570bc c4898388 c208d000 c1964014 c1d0e000 c1d0e000 00000001 c208d000
Feb 24 11:07:47 gi2 kernel: c1957080 c48bd0c0 00000001 00000002 000186a3 00000002 c1964014 c48bcc2c
Feb 24 11:07:47 gi2 kernel: Call Trace: [sound:sound_install_audiodrv_R990d1ca0+-176890/512] [sound:sound_install_audiodrv_R990d1ca0+-137688/512] [sound:sound_install_audiodrv_R990d1ca0+-287696/512] [sound:sound_install_audiodrv_R990d1ca0+-136856/512] [sound:sound_install_audiodrv_R990d1ca0+-138028/512] [sound:sound_install_audiodrv_R990d1ca0+-177451/512] [kernel_thread+35/48]
Feb 24 11:07:47 gi2 kernel: Code: 8b 58 08 85 db 75 08 31 c9 e9 45 01 00 00 90 66 8b 43 22 66

This occurred during a reboot of the system:

Feb 24 12:18:57 gi2 nfs: Starting NFS services: succeeded
Feb 24 12:18:58 gi2 nfs: rpc.rquotad startup succeeded
Feb 24 12:18:58 gi2 nfs: rpc.mountd startup succeeded
Feb 24 12:18:59 gi2 nfs: rpc.nfsd startup succeeded
:
Feb 24 12:19:08 gi2 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000008
Feb 24 12:19:08 gi2 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Feb 24 12:19:08 gi2 kernel: *pde = 00000000
Feb 24 12:19:08 gi2 kernel: Oops: 0000
Feb 24 12:19:08 gi2 kernel: CPU: 0
Feb 24 12:19:08 gi2 kernel: EIP: 0010: [sunrpc:rpc_system_err_Rc39cf57e+69845/141375247]
Feb 24 12:19:08 gi2 kernel: EFLAGS: 00010286
Feb 24 12:19:08 gi2 kernel: eax: 00000000 ebx: c17b6600 ecx: 00000000 edx: 00000000
Feb 24 12:19:08 gi2 kernel: esi: c17b6600 edi: c1d2001c ebp: c17b6600 esp: c1d25f5c
Feb 24 12:19:08 gi2 kernel: ds: 0018 es: 0018 ss: 0018
Feb 24 12:19:08 gi2 kernel: Process nfsd (pid: 601, process nr: 36, stackpage=c1d25000)
Feb 24 12:19:08 gi2 kernel: Stack: c1d20014 c1d20014 c489545e c17b6600 c1d2001c c17b7c30 c17b66f4 c489ed80
Feb 24 12:19:08 gi2 kernel: c17b7c6c c4883388 c17b6600 c1d20014 c1d24000 c1d24000 00000000 c17b6600
Feb 24 12:19:08 gi2 kernel: c17b7c30 c489f0c0 00000000 00000002 000186a3 00000002 c1d20014 c489ec2c
Feb 24 12:19:08 gi2 kernel: Call Trace: [sunrpc:rpc_system_err_Rc39cf57e+42754/141402338] [sunrpc:rpc_system_err_Rc39cf57e+81956/141363136] [lockd:nlmclnt_proc_R6be7fa0a+-36832/5324] [sunrpc:rpc_system_err_Rc39cf57e+82788/141362304] [sunrpc:rpc_system_err_Rc39cf57e+81616/141363476] [sunrpc:rpc_system_err_Rc39cf57e+42193/141402899] [kernel_thread+35/48]
Feb 24 12:19:08 gi2 kernel: Code: 8b 58 08 85 db 75 08 31 c9 e9 45 01 00 00 90 66 8b 43 22 66

I forgot to mention that we can't kill the nfsd processes:

PID TTY STAT TIME COMMAND
594 ? DW 0:00 [nfsd]
595 ? DW 0:00 [nfsd]
596 ? DW 0:00 [nfsd]
597 ? DW 0:00 [nfsd]
598 ? DW 0:00 [nfsd]
599 ? DW 0:00 [nfsd]
600 ? DW 0:00 [nfsd]

We are using the following RPMs:

knfsd-1.4.7-7
portmap-4.0-17
knfsd-clients-1.4.7-7

We were finally able to get our NFS server back into operation by removing knfsd-1.4.7-7 and knfsd-clients-1.4.7-7, removing the /var/lib/nfs directory, reinstalling both packages from the RPMs, and rebooting the server. Any ideas as to why this crash occurred?
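
Both oopses above show the same fault address (00000008), eax = 00000000, and the same leading Code bytes. As a rough sketch, the first instruction can be decoded by hand to see how those three facts fit together. The helper name `decode_mov_load` is hypothetical and handles only the single instruction form needed here. It restates what the dump already shows and does not identify which kernel structure was NULL.

```python
# Minimal sketch: decode the first instruction of the "Code:" line by hand.
# decode_mov_load is a hypothetical helper, not part of any kernel tooling.
# It handles only opcode 0x8B (MOV r32, r/m32) with mod=01, i.e. [reg + disp8].

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code_hex):
    """Return (dest_reg, base_reg, displacement) for a leading '8b modrm disp8'."""
    code = bytes.fromhex(code_hex.replace(" ", ""))
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("first byte is not MOV r32, r/m32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the [reg + disp8] form without SIB is handled")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


# Code bytes and eax value copied from the oops reports above.
dest, base, disp = decode_mov_load("8b 58 08 85 db 75 08 31 c9")
registers = {"eax": 0x00000000}
fault_address = (registers[base] + disp) & 0xFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest} faults at {fault_address:08x}")
```

The decoded load reads a field at offset 8 through the pointer in eax, which is zero in both dumps, so the computed address matches the reported virtual address 00000008.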
I've experienced the same problem recently on a totally different machine. Different details though: it's a UP machine, PII 400, running straight-up 6.1 (knfsd-1.4.7-7). It is (was) exporting home drives for three other Linux boxes and one Sparc/Solaris box (SunOS jade.celox.net 5.7 Generic_106541-02 sun4u sparc SUNW,Ultra-5_10). It is also the NIS master for those machines.

The setup ran fine for about a month, then our network began to saturate on a regular basis and the collisions/framing errors were getting kind of bad. That was when the first oops occurred. NFS stopped working and we were unable to shut it down. The server was power cycled and then it ran fine for another couple of weeks, until about three days ago. Another clue is that the NFS-exported partition has recently been filling to 100% of user space periodically.

Anyway, about three days ago, the box got the exact same oops and had to be rebooted. This time, however, the problem stuck and it would oops every time it was restarted. As a quick fix I told them to move the home directories to the Sun machine (got to hear "I thought Linux was supposed to be a good fileserver"). I eventually fixed it so it could boot up without oopsing by mv'ing all the files in /var/lib/nfs, then touching etab, rmtab, and xtab. I've got the old files that were in the directory if you want them.

BTW: eth0: Intel EtherExpress Pro 10/100 at 0xe400, 00:90:27:C1:E3:93, IRQ 10.

From /var/log/messages, right after lpd is started by initscripts:

Mar 15 14:29:34 silver exportfs[522]: stevepc.celox.net has non-inet addr
Mar 15 14:29:34 silver exportfs: exportfs: stevepc.celox.net has non-inet addr
Mar 15 14:29:34 silver kernel: Installing knfsd (copyright (C) 1996 okir.de).
Mar 15 14:29:34 silver nfs: Starting NFS services: succeeded
Mar 15 14:29:35 silver nfs: rpc.rquotad startup succeeded
Mar 15 14:29:35 silver nfs: rpc.mountd startup succeeded
Mar 15 14:29:36 silver nfs: rpc.nfsd startup succeeded
Mar 15 14:29:36 silver yppasswdd: rpc.yppasswdd startup succeeded
Mar 15 14:29:36 silver keytable: Loading keymap:
Mar 15 14:29:36 silver keytable: Loading system font:
Mar 15 14:29:36 silver rc: Starting keytable succeeded
Mar 15 14:29:37 silver kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000008
Mar 15 14:29:37 silver kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Mar 15 14:29:37 silver kernel: *pde = 00000000
Mar 15 14:29:37 silver kernel: Oops: 0000
Mar 15 14:29:37 silver kernel: CPU: 0
Mar 15 14:29:37 silver kernel: EIP: 0010:[<f0036da2>]
Mar 15 14:29:37 silver kernel: EFLAGS: 00010286
Mar 15 14:29:37 silver kernel: eax: 00000000 ebx: de6fd600 ecx: 00000000 edx: ef49001c
Mar 15 14:29:37 silver kernel: esi: de6fd600 edi: ef490014 ebp: de6fd600 esp: d9a8ff60
Mar 15 14:29:37 silver kernel: ds: 0018 es: 0018 ss: 0018
Mar 15 14:29:37 silver kernel: Process nfsd (pid: 565, processnr: 32, stackpage=d9a8f000)
Mar 15 14:29:37 silver kernel: Stack: ef490014 f003046e de6fd600 ef49001c de6fe900 de6fd6f4 f0039b80 de6fe93c
Mar 15 14:29:37 silver kernel: f001e3ec de6fd600 ef490014 d9a8e000 d9a8e000 00000000 de6fd600 de6fe900
Mar 15 14:29:37 silver kernel: f0039ec0 00000000 00000002 000186a3 00000002 ef490014 f0039a2c 00000000
Mar 15 14:29:37 silver kernel: Call Trace: [<f003046e>] [<f0039b80>] [<f001e3ec>] [<f0039ec0>] [<f0039a2c>] [<f003022d>] [kernel_thread+35/48]
Mar 15 14:29:37 silver kernel: Code: 8b 58 08 85 db 75 07 31 d2 e9 fd 00 00 00 66 8b 43 22 66 c1

The bit about stevepc from exportfs probably has something to do with the machine being changed from a Win98 box to a Linux box and its name being changed.
Somehow, the first time it mounted silver's disks, it was given the old name (the new name being "asics"). Hope that helps.

-- Steve

We are also experiencing problems on a Dell PowerEdge 1300 PIII-500MHz dual-processor machine! The machine ran fine for about a week in production. Then, from 2000-05-05 on, it started crashing 1-3 times PER DAY! But the kernel messages are always different. I had the nfsd one once, some "Unable to handle kernel paging request", klogd, etc. The last two crashes were just half an hour apart. I just deactivated NFS to see if this helps.

assigned to johnsonm

I was experiencing these same issues on a PIII 700 SMP machine with 2.2.14 (from VA 6.2.1). Applying the patched kernel, etc. available at http://nfs.sourceforge.net remedied these problems in our environment.

The following message gives some clue on how to fix this problem: http://www.geocrawler.com/archives/3/789/2000/3/0/3503316/
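
The workaround Steve reported earlier in this thread (moving all the files out of /var/lib/nfs, then touching etab, rmtab, and xtab) can be sketched roughly as below. This is an illustration under assumptions, not a tested recovery procedure: the function name `reset_nfs_state` and the backup location are hypothetical, and it assumes the NFS services are stopped, the script runs as root, and the backup directory lies outside the state directory.

```python
import shutil
from pathlib import Path

# File names taken from the comment in this thread.
STATE_FILES = ("etab", "rmtab", "xtab")


def reset_nfs_state(state_dir, backup_dir):
    """Move everything out of state_dir, then recreate empty state files.

    Hypothetical helper. backup_dir must not be inside state_dir.
    Returns the names of the entries that were moved aside.
    """
    state_dir, backup_dir = Path(state_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    # Materialize the listing first so moving entries does not disturb iteration.
    for entry in sorted(state_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
        moved.append(entry.name)

    # Equivalent of "touch etab rmtab xtab" in the now-empty directory.
    for name in STATE_FILES:
        (state_dir / name).touch()
    return moved
```

A call such as `reset_nfs_state("/var/lib/nfs", "/root/nfs-state.old")` (the second path is only an example) keeps the old files available for later inspection, as Steve did, rather than deleting them.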