I have knfsd-1.4.4 and must use the new, fixed restart every couple of days. If I forget, the server will stop responding. After re-booting I get an error message in about 1-10 seconds and the server stops responding again. The only way to bring the server back up is to re-boot every computer on the network, then re-boot the NFS server.

My network contains 3 Linux NFS servers and about 30 i386 Linux and Ultra 60 SunOS 5.6 clients. I have not yet applied the mountd growth fix I found a couple of minutes ago.

The error message just after re-boot of the NFS server:

Linux version 2.2.5-22 (root.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Wed Jun 2 09:02:27 EDT 1999
Detected 451026999 Hz processor.
...
ncr53c876-0-<6,*>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15)
SCSI device sda: hdwr sector= 512 bytes. Sectors= 35970860 [17563 MB] [17.6 GB]
 sda: sda1 sda2 sda3 < sda5 sda6 >
...
Installing knfsd (copyright (C) 1996 okir.de).
nfsd_init: initialized fhcache, entries=256
eth0: Changing PNIC configuration to full-duplex, CSR6 812e0200.
Unable to handle kernel NULL pointer dereference at virtual address 00000008
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c807d98e>]
EFLAGS: 00010282
eax: 00000000 ebx: c7dc8000 ecx: 00000000 edx: c676801c
esi: c7dc8000 edi: c6768014 ebp: c7dc8000 esp: c67dbf60
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 450, process nr: 24, stackpage=c67db000)
Stack: c6768014 c8077462 c7dc8000 c676801c c6e0b360 c7dc80f4 c8080680 c6e0b39c
       c8065354 c7dc8000 c6768014 c67da000 c67da000 00000000 c7dc8000 c6e0b360
       c80809a0 00000000 00000002 000186a3 00000002 c6768014 c808052c 00000000
Call Trace: [<c8077462>] [<c8080680>] [<c8065354>] [<c80809a0>] [<c808052c>] [<c8077221>] [<c010813b>]
Code: 8b 58 08 85 db 75 07 31 d2 e9 fd 00 00 00 66 8b 43 22 66 c1

Unable to handle kernel NULL pointer dereference at virtual address 00000008
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c807d98e>]
EFLAGS: 00010282
eax: 00000000 ebx: c67d8a00 ecx: 00000000 edx: c678801c
esi: c67d8a00 edi: c6788014 ebp: c67d8a00 esp: c6787f60
ds: 0018 es: 0018 ss: 0018
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 453, process nr: 27, stackpage=c6787000)
Stack: c6788014 c8077462 c67d8a00 c678801c c6e0ba20 c67d8af4 c8080680 c6e0ba5c
       c8065354 c67d8a00 c6788014 c6786000 c6786000 00000000 c67d8a00 c6e0ba20
       c80809a0 00000000 00000002 000186a3 00000002 c6788014 c808052c 00000000
Call Trace: [<c8077462>] [<c8080680>] [<c8065354>] [<c80809a0>] [<c808052c>] [<c8077221>] [<c010813b>]
Code: 8b 58 08 85 db 75 07 31 d2 e9 fd 00 00 00 66 8b 43 22 66 c1
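Both oopses above report the same EIP and the same call trace, which suggests two nfsd threads hitting the same faulting instruction. A minimal Python sketch of how one might confirm that mechanically from a saved log follows. The function name `summarize_oopses` and the trimmed sample text are illustrative, not part of any tool, and the patterns assume raw hex addresses in the `[<...>]` form shown in this report.

```python
import re

# Trimmed excerpt of the two oopses from this report (illustrative sample).
OOPS_TEXT = """\
EIP: 0010:[<c807d98e>]
Process nfsd (pid: 450, process nr: 24, stackpage=c67db000)
Call Trace: [<c8077462>] [<c8080680>] [<c8065354>] [<c80809a0>] [<c808052c>] [<c8077221>] [<c010813b>]
EIP: 0010:[<c807d98e>]
Process nfsd (pid: 453, process nr: 27, stackpage=c6787000)
Call Trace: [<c8077462>] [<c8080680>] [<c8065354>] [<c80809a0>] [<c808052c>] [<c8077221>] [<c010813b>]
"""


def summarize_oopses(text):
    """Return one (pid, eip, call_trace) tuple per oops found in text."""
    # Faulting instruction address, e.g. "EIP: 0010:[<c807d98e>]".
    eips = re.findall(r"EIP:\s+[0-9a-f]{4}:\[<([0-9a-f]{8})>\]", text)
    # PID of the process that was running when the oops occurred.
    pids = [int(p) for p in re.findall(r"Process \S+ \(pid: (\d+)", text)]
    # Each call trace is a run of bracketed hex return addresses.
    raw_traces = re.findall(r"Call Trace:((?:\s*\[<[0-9a-f]{8}>\])+)", text)
    traces = [tuple(re.findall(r"[0-9a-f]{8}", t)) for t in raw_traces]
    return list(zip(pids, eips, traces))


oopses = summarize_oopses(OOPS_TEXT)
# True when every oops shares one EIP and one call trace.
same_fault = len({(eip, trace) for _, eip, trace in oopses}) == 1
print("pids:", [pid for pid, _, _ in oopses])
print("same fault site:", same_fault)
```

Matching on the EIP and trace rather than on register contents is deliberate: registers and stack addresses differ per thread, while the code addresses identify the faulting path.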
This problem could not be duplicated once I upgraded to glibc-2.1.2-2.i386.rpm. The memory leak and the NULL pointer dereference both went away.
This problem appears to be resolved.
Well, this doesn't fix the problem for a stock RH 6.0 distro, so perhaps the kernel upgrade is also required.

# netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
...
udp 65456 0 *:2049 *:*
...

Linux version 2.2.5-15 (root.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Mon Apr 19 23:00:46 EDT 1999
Detected 448978865 Hz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 447.28 BogoMIPS
Memory: 127972k/131008k available (996k kernel code, 412k reserved, 1568k data, 60k init)
VFS: Diskquotas version dquot_6.4.0 initialized
CPU: Intel Celeron (Mendocino) stepping 00

Unable to handle kernel NULL pointer dereference at virtual address 00000008
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[3c59x+127218/77496320]
EFLAGS: 00010282
eax: 00000000 ebx: c0a7be00 ecx: 00000000 edx: c130401c
esi: c0a7be00 edi: c1304014 ebp: c0a7be00 esp: c15a9f60
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 500, process nr: 25, stackpage=c15a9000)
Stack: c1304014 c803e462 c0a7be00 c130401c c039db60 c0a7bef4 c8047680 c039db9c
       c8023354 c0a7be00 c1304014 c15a8000 c15a8000 00000001 c0a7be00 c039db60
       c80479a0 00000001 00000002 000186a3 00000002 c1304014 c804752c 00000000
Call Trace: [3c59x+101318/77496320] [3c59x+138724/77496320] [lockd:nlmclnt_proc_R05b69af3+-36864/5332] [3c59x+139524/77496320] [3c59x+138384/77496320] [3c59x+100741/77496320] [kernel_thread+35/48]
Code: 8b 58 08 85 db 75 07 31 d2 e9 fd 00 00 00 66 8b 43 22 66 c1
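The netstat output in the comment above shows 65456 bytes sitting in the receive queue of UDP port 2049 (the NFS port), which is consistent with nfsd no longer draining its socket. A minimal sketch of how one might watch for that condition is below. The helper name `backlog_on_port` is hypothetical, and the parsing assumes the column layout of the `netstat -a` output quoted in this report.

```python
# Sample input: the header and the one relevant row quoted in this report.
NETSTAT_LINES = [
    "Proto Recv-Q Send-Q Local Address Foreign Address State",
    "udp 65456 0 *:2049 *:*",
]


def backlog_on_port(lines, port=2049):
    """Return (proto, recv_q) for sockets on `port` with unread data queued."""
    results = []
    for line in lines:
        fields = line.split()
        # Skip headers, elisions, and anything that is not a tcp/udp row.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        proto, recv_q, _send_q, local = fields[:4]
        if not recv_q.isdigit():
            continue
        # The local address looks like "*:2049"; take the part after the last colon.
        if local.rsplit(":", 1)[-1] == str(port) and int(recv_q) > 0:
            results.append((proto, int(recv_q)))
    return results


print(backlog_on_port(NETSTAT_LINES))  # prints [('udp', 65456)]
```

A Recv-Q value that stays large across repeated runs is the interesting signal; a briefly non-zero value on a busy server is normal.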
A kernel upgrade and replacement of the broken knfsd with nfs-utils are required. Weird that this port lockup/kernel oops took so long to show up.