8168 – NFS Server Crashes for SMP kernel

Summary: NFS Server Crashes for SMP kernel

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	nfs-utils
Sub Component:
Version:	7.0
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Michael K. Johnson
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2000-01-04 17:52 UTC by joseph
Modified:	2008-05-01 15:37 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2002-12-14 01:59:23 UTC
Embargoed:

Description joseph 2000-01-04 17:52:42 UTC

On a dual PII-500MHz system running Red Hat 6.1 with the SMP kernel
2.2.12-20, I get a kernel panic from the NFS daemon (nfsd).

The message log reads...

Jan 4 12:47:44 localhost kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000008
...
Jan 4 12:47:46 localhost kernel: Call Trace: [nfsd_dispatch+266/344]
[svc_process+692/1316] [nfsd+321/648] [kernel_thread+35/48]
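
Each bracketed entry in a call trace like the one above appears to follow the form [symbol+offset/size], optionally prefixed with a module name. The Python sketch below is an illustration, not part of the original report. It splits such entries apart and flags any whose offset falls outside the symbol, which usually suggests the address was resolved against the wrong symbol. The names `parse_call_trace` and `is_plausible` are made up for this example.

```python
import re

# Matches entries such as [nfsd_dispatch+266/344] or
# [sunrpc:rpc_system_err_Rc39cf57e+69845/141375247], including
# negative offsets like [sound:some_symbol+-149799/512].
ENTRY = re.compile(
    r"\[(?:(?P<module>\w+):)?(?P<symbol>\w+)\+(?P<offset>-?\d+)/(?P<size>\d+)\]"
)

def parse_call_trace(text):
    """Return a list of (module, symbol, offset, size) tuples found in text."""
    frames = []
    for m in ENTRY.finditer(text):
        frames.append((m.group("module"), m.group("symbol"),
                       int(m.group("offset")), int(m.group("size"))))
    return frames

def is_plausible(frame):
    """An address inside a symbol should satisfy 0 <= offset < size."""
    _, _, offset, size = frame
    return 0 <= offset < size

trace = ("Call Trace: [nfsd_dispatch+266/344] [svc_process+692/1316] "
         "[nfsd+321/648] [kernel_thread+35/48]")
for frame in parse_call_trace(trace):
    print(frame, "plausible" if is_plausible(frame) else "suspect")
```

All four frames in this trace pass the check, so the path kernel_thread -> nfsd -> svc_process -> nfsd_dispatch is probably resolved correctly.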

Comment 1 scot 2000-02-24 20:04:59 UTC

I am experiencing the same problem with a non-SMP kernel, 2.2.12-20:

Feb 24 11:07:47 gi2 kernel: Unable to handle kernel NULL pointer dereference at
virtual address 00000008
Feb 24 11:07:47 gi2 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Feb 24 11:07:47 gi2 kernel: *pde = 00000000
Feb 24 11:07:47 gi2 kernel: Oops: 0000
Feb 24 11:07:47 gi2 kernel: CPU:    0
Feb 24 11:07:47 gi2 kernel: EIP:    0010:
[sound:sound_install_audiodrv_R990d1ca0+-149799/512]
Feb 24 11:07:47 gi2 kernel: EFLAGS: 00010286
Feb 24 11:07:47 gi2 kernel: eax: 00000000   ebx: c208d000   ecx: 00000000
edx: 00000000
Feb 24 11:07:47 gi2 kernel: esi: c208d000   edi: c196401c   ebp: c208d000
esp: c1d0ff5c
Feb 24 11:07:47 gi2 kernel: ds: 0018   es: 0018   ss: 0018
Feb 24 11:07:47 gi2 kernel: Process nfsd (pid: 32465, process nr: 29,
stackpage=c1d0f000)
Feb 24 11:07:47 gi2 kernel: Stack: c1964014 c1964014 c48b345e c208d000 c196401c
c1957080 c208d0f4 c48bcd80
Feb 24 11:07:47 gi2 kernel:        c19570bc c4898388 c208d000 c1964014 c1d0e000
c1d0e000 00000001 c208d000
Feb 24 11:07:47 gi2 kernel:        c1957080 c48bd0c0 00000001 00000002 000186a3
00000002 c1964014 c48bcc2c
Feb 24 11:07:47 gi2 kernel: Call Trace:
[sound:sound_install_audiodrv_R990d1ca0+-176890/512]
[sound:sound_install_audiodrv_R990d1ca0+-137688/512]
[sound:sound_install_audiodrv_R990d1ca0+-287696/512]
[sound:sound_install_audiodrv_R990d1ca0+-136856/512]
[sound:sound_install_audiodrv_R990d1ca0+-138028/512]
[sound:sound_install_audiodrv_R990d1ca0+-177451/512]
[kernel_thread+35/48]
Feb 24 11:07:47 gi2 kernel: Code: 8b 58 08 85 db 75 08 31 c9 e9 45 01 00 00 90
66 8b 43 22 66

This occurred during a reboot of the system:

Feb 24 12:18:57 gi2 nfs: Starting NFS services:  succeeded
Feb 24 12:18:58 gi2 nfs: rpc.rquotad startup succeeded
Feb 24 12:18:58 gi2 nfs: rpc.mountd startup succeeded
Feb 24 12:18:59 gi2 nfs: rpc.nfsd startup succeeded
:
Feb 24 12:19:08 gi2 kernel: Unable to handle kernel NULL pointer dereference at
virtual address 00000008
Feb 24 12:19:08 gi2 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Feb 24 12:19:08 gi2 kernel: *pde = 00000000
Feb 24 12:19:08 gi2 kernel: Oops: 0000
Feb 24 12:19:08 gi2 kernel: CPU:    0
Feb 24 12:19:08 gi2 kernel: EIP:    0010:
[sunrpc:rpc_system_err_Rc39cf57e+69845/141375247]
Feb 24 12:19:08 gi2 kernel: EFLAGS: 00010286
Feb 24 12:19:08 gi2 kernel: eax: 00000000   ebx: c17b6600   ecx: 00000000
edx: 00000000
Feb 24 12:19:08 gi2 kernel: esi: c17b6600   edi: c1d2001c   ebp: c17b6600
esp: c1d25f5c
Feb 24 12:19:08 gi2 kernel: ds: 0018   es: 0018   ss: 0018
Feb 24 12:19:08 gi2 kernel: Process nfsd (pid: 601, process nr: 36,
stackpage=c1d25000)
Feb 24 12:19:08 gi2 kernel: Stack: c1d20014 c1d20014 c489545e c17b6600 c1d2001c
c17b7c30 c17b66f4 c489ed80
Feb 24 12:19:08 gi2 kernel:        c17b7c6c c4883388 c17b6600 c1d20014 c1d24000
c1d24000 00000000 c17b6600
Feb 24 12:19:08 gi2 kernel:        c17b7c30 c489f0c0 00000000 00000002 000186a3
00000002 c1d20014 c489ec2c
Feb 24 12:19:08 gi2 kernel: Call Trace:
[sunrpc:rpc_system_err_Rc39cf57e+42754/141402338]
[sunrpc:rpc_system_err_Rc39cf57e+81956/141363136]
[lockd:nlmclnt_proc_R6be7fa0a+-36832/5324]
[sunrpc:rpc_system_err_Rc39cf57e+82788/141362304]
[sunrpc:rpc_system_err_Rc39cf57e+81616/141363476]
[sunrpc:rpc_system_err_Rc39cf57e+42193/141402899]
[kernel_thread+35/48]
Feb 24 12:19:08 gi2 kernel: Code: 8b 58 08 85 db 75 08 31 c9 e9 45 01 00 00 90
66 8b 43 22 66
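
The "Code:" bytes and the register dump in these oopses are enough to see why the faulting address is 00000008. The Python sketch below is an illustration, not part of the original report. It assumes the "Code:" line starts at the faulting instruction, as 2.2-era oops output normally does, and decodes only the one addressing form needed here: opcode 8b with an 8-bit displacement and no SIB byte. The function name `decode_mov_load` is made up for this example.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_load(code_hex):
    """Decode a leading `8b /r` (MOV r32, r/m32) with an 8-bit displacement.

    Handles only mod == 01 and rm != 100 (no SIB byte), which is all the
    oops above needs. Returns (dest_reg, base_reg, displacement).
    """
    code = bytes.fromhex(code_hex.replace(" ", ""))
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not a MOV r32, r/m32 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("addressing form not handled by this sketch")
    if disp >= 0x80:  # the 8-bit displacement is sign-extended
        disp -= 0x100
    return REG32[reg], REG32[rm], disp

# Register values copied from the second oops above (eax is 0 in both).
registers = {"eax": 0x00000000, "ebx": 0xC17B6600}
dest, base, disp = decode_mov_load("8b 58 08 85 db 75 08 31 c9")
fault = (registers[base] + disp) & 0xFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest} -> access at {fault:08x}")
```

Under those assumptions the first instruction loads from offset 8 of whatever eax points to, and eax is zero, so the code is most likely reading a field at offset 8 of a NULL structure pointer. Separately, the stack word 000186a3 is 100003 in decimal, which is the RPC program number registered for NFS; reading the following 00000002 as the protocol version is a guess.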

Comment 2 scot 2000-02-24 20:11:59 UTC

I forgot to mention that we can't kill the nfsd processes:

  PID TTY      STAT   TIME COMMAND
  594 ?        DW     0:00 [nfsd]
  595 ?        DW     0:00 [nfsd]
  596 ?        DW     0:00 [nfsd]
  597 ?        DW     0:00 [nfsd]
  598 ?        DW     0:00 [nfsd]
  599 ?        DW     0:00 [nfsd]
  600 ?        DW     0:00 [nfsd]

We are using the following RPMS:
  knfsd-1.4.7-7
  portmap-4.0-17
  knfsd-clients-1.4.7-7

Comment 3 scot 2000-02-24 22:35:59 UTC

We were finally able to get our NFS server back into operation by removing
knfsd-1.4.7-7 and knfsd-clients-1.4.7-7, removing the /var/lib/nfs directory,
reinstalling both packages from their RPMs, and rebooting the server.

Any ideas as to why this crash occurred?

Comment 4 Steve "in India" Borho 2000-03-19 05:58:59 UTC

I've experienced the same problem recently on a totally different machine.

Different details though:  It's a UP machine, PII 400, running straight up 6.1
(knfsd-1.4.7-7).  It is (was) exporting home drives for three other Linux boxes
and one Sparc/Solaris box (SunOS jade.celox.net 5.7 Generic_106541-02 sun4u
sparc SUNW,Ultra-5_10).  It is also the NIS master for those machines.  The
setup ran fine for about a month, then our network load began to saturate on a
regular basis and the collisions/framing errors were getting kind of bad.  That
was when the first oops occurred.  NFS stopped working and we were unable to
shut it down.  The server was power cycled and then it ran fine for another
couple weeks until about three days ago.  Another clue is that the NFS-exported
partition has recently been filling to 100% of user space periodically.
Anyway, about 3 days ago, the box got the exact same oops and had to be
rebooted.  This time, however, the problem stuck and it would oops every time it
was restarted.  As a quick fix I told them to move the home directories to the
Sun machine (got to hear "I thought Linux was supposed to be a good
fileserver").  I eventually fixed it so it could boot up without oopsing by
mv'ing all the files in /var/lib/nfs then touching etab, rmtab, and xtab.  I've
got the old files that were in the directory if you want them.
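
The workaround described above (move everything out of /var/lib/nfs, then create empty etab, rmtab, and xtab files) could be scripted roughly as follows. This Python sketch is an illustration, not part of the original report. It assumes NFS services are stopped first and that the backup directory lies outside the state directory. The function name `reset_nfs_state` is made up for this example, and the demonstration runs against a scratch directory, not the real /var/lib/nfs.

```python
import shutil
import tempfile
from pathlib import Path

STATE_FILES = ("etab", "rmtab", "xtab")

def reset_nfs_state(state_dir, backup_dir):
    """Move every entry out of state_dir into backup_dir, then recreate
    the three table files as empty files. Returns the moved entry names."""
    state_dir, backup_dir = Path(state_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True)  # raises if the backup already exists
    moved = []
    for entry in sorted(state_dir.iterdir()):
        shutil.move(str(entry), str(backup_dir / entry.name))
        moved.append(entry.name)
    for name in STATE_FILES:
        (state_dir / name).touch()
    return moved

# Demonstration in a temporary directory with a placeholder rmtab.
with tempfile.TemporaryDirectory() as scratch:
    state = Path(scratch) / "nfs"
    state.mkdir()
    (state / "rmtab").write_text("placeholder contents\n")
    moved = reset_nfs_state(state, Path(scratch) / "nfs.old")
    print(moved, sorted(p.name for p in state.iterdir()))
```

Keeping the old files in a backup directory rather than deleting them matches what the comment describes and leaves them available for later inspection.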

BTW: eth0: Intel EtherExpress Pro 10/100 at 0xe400, 00:90:27:C1:E3:93, IRQ 10.

From /var/log/messages, right after lpd is started by initscripts:
Mar 15 14:29:34 silver exportfs[522]: stevepc.celox.net has non-inet addr
Mar 15 14:29:34 silver exportfs: exportfs: stevepc.celox.net has non-inet addr
Mar 15 14:29:34 silver kernel: Installing knfsd (copyright (C) 1996
okir.de).
Mar 15 14:29:34 silver nfs: Starting NFS services:  succeeded
Mar 15 14:29:35 silver nfs: rpc.rquotad startup succeeded
Mar 15 14:29:35 silver nfs: rpc.mountd startup succeeded
Mar 15 14:29:36 silver nfs: rpc.nfsd startup succeeded
Mar 15 14:29:36 silver yppasswdd: rpc.yppasswdd startup succeeded
Mar 15 14:29:36 silver keytable: Loading keymap:
Mar 15 14:29:36 silver keytable: Loading system font:
Mar 15 14:29:36 silver rc: Starting keytable succeeded
Mar 15 14:29:37 silver kernel: Unable to handle kernel NULL pointer dereference
at virtual address 00000008
Mar 15 14:29:37 silver kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Mar 15 14:29:37 silver kernel: *pde = 00000000
Mar 15 14:29:37 silver kernel: Oops: 0000
Mar 15 14:29:37 silver kernel: CPU:    0
Mar 15 14:29:37 silver kernel: EIP:    0010:[<f0036da2>]
Mar 15 14:29:37 silver kernel: EFLAGS: 00010286
Mar 15 14:29:37 silver kernel: eax: 00000000   ebx: de6fd600   ecx: 00000000
edx: ef49001c
Mar 15 14:29:37 silver kernel: esi: de6fd600   edi: ef490014   ebp: de6fd600
esp: d9a8ff60
Mar 15 14:29:37 silver kernel: ds: 0018   es: 0018   ss: 0018
Mar 15 14:29:37 silver kernel: Process nfsd (pid: 565, process nr: 32,
stackpage=d9a8f000)
Mar 15 14:29:37 silver kernel: Stack: ef490014 f003046e de6fd600 ef49001c
de6fe900 de6fd6f4 f0039b80 de6fe93c
Mar 15 14:29:37 silver kernel:        f001e3ec de6fd600 ef490014 d9a8e000
d9a8e000 00000000 de6fd600 de6fe900
Mar 15 14:29:37 silver kernel:        f0039ec0 00000000 00000002 000186a3
00000002 ef490014 f0039a2c 00000000
Mar 15 14:29:37 silver kernel: Call Trace: [<f003046e>] [<f0039b80>]
[<f001e3ec>] [<f0039ec0>] [<f0039a2c>] [<f003022d>] [kernel_thread+35/48]
Mar 15 14:29:37 silver kernel: Code: 8b 58 08 85 db 75 07 31 d2 e9 fd 00 00 00
66 8b 43 22 66 c1

The bit about stevepc from exportfs probably has something to do with the
machine being changed from a Win98 box to a Linux box and its name being
changed.  Somehow, the first time it mounted silver's disks, it was given the
old name (new name being "asics").

Hope that helps.

--
Steve

Comment 5 system 2000-05-10 08:23:59 UTC

We are also experiencing problems on a Dell PowerEdge 1300 dual-processor
PIII-500MHz machine!
The machine ran fine for about a week in production. Then, from 2000-05-05 on, it
started crashing 1-3 times PER DAY!
But the kernel messages are always different: once it was nfsd, other times
"Unable to handle kernel paging request", klogd, etc.
The last two crashes were just half an hour apart.
I have just deactivated NFS to see if this helps.

Comment 6 Cristian Gafton 2000-08-09 02:35:16 UTC

assigned to johnsonm

Comment 7 Ian Prowell 2000-09-22 23:27:51 UTC

I was experiencing these same issues on a PIII 700 SMP machine with 2.2.14 (from
VA 6.2.1).  Applying the patched kernel, etc. available at
http://nfs.sourceforge.net remedied these problems in our environment.

Comment 8 scot 2000-09-23 17:00:20 UTC

The following message gives some clues on how to fix this problem:

http://www.geocrawler.com/archives/3/789/2000/3/0/3503316/
