Bug 61653 - lockd hangs, requires system reboot
Status: CLOSED CURRENTRELEASE
Product: Red Hat Linux
Classification: Retired
Component: nfs-server
Version: 7.1
Hardware: i386 Linux
Priority: high
Severity: high
Assigned To: Ben LaHaise
Reported: 2002-03-22 11:47 EST by Trent Doyle
Modified: 2008-05-01 11:38 EDT
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2002-05-14 15:07:25 EDT


Attachments
Ksymoops Output from a crash (8.94 KB, text/plain)
2002-04-08 16:22 EDT, Ron Reed
Latest oops output processed thru Ksymoops (8.92 KB, text/plain)
2002-04-17 17:16 EDT, Ron Reed
Ksymoops processed oops from 5/1 (10.16 KB, text/plain)
2002-05-02 11:31 EDT, Ron Reed
Ksymoops processed oops output, 5/1 - crash right after reboot. (10.16 KB, text/plain)
2002-05-02 11:33 EDT, Ron Reed

Description Trent Doyle 2002-03-22 11:47:10 EST
Please note that we are running 7.1sbe from Dell, not plain 7.1.

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20011126 Netscape6/6.2.1

Description of problem:
We have had recurring problems with our main NFS server, averaging about once or
twice a week.

Version-Release number of selected component (if applicable):


How reproducible:
Couldn't Reproduce


Additional info:

We're running the latest kernel, 2.4.9-31. Here is some output from one of the
system logs:
Mar 17 04:22:34 sgp-nfs kernel:  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000010
Mar 17 04:22:34 sgp-nfs kernel:  printing eip:
Mar 17 04:22:34 sgp-nfs kernel: c0145d93
Mar 17 04:22:34 sgp-nfs kernel: *pde = 00000000
Mar 17 04:22:34 sgp-nfs kernel: Oops: 0000
Mar 17 04:22:34 sgp-nfs kernel: Kernel 2.4.9-31
Mar 17 04:22:34 sgp-nfs kernel: CPU:    0
Mar 17 04:22:34 sgp-nfs kernel: EIP:    0010:[posix_locks_deadlock+67/96]    Not tainted
Mar 17 04:22:34 sgp-nfs kernel: EIP:    0010:[<c0145d93>]    Not tainted
Mar 17 04:22:34 sgp-nfs kernel: EFLAGS: 00010207
Mar 17 04:22:34 sgp-nfs kernel: EIP is at posix_locks_deadlock [kernel] 0x43
Mar 17 04:22:34 sgp-nfs kernel: eax: fffffffc   ebx: d4ca7240   ecx: 000003f9   edx: 00000000
Mar 17 04:22:34 sgp-nfs kernel: esi: 000003f0   edi: c5180d80   ebp: 00000001   esp: cdd7feec
Mar 17 04:22:34 sgp-nfs kernel: ds: 0018   es: 0018   ss: 0018
Mar 17 04:22:34 sgp-nfs kernel: Process smbd (pid: 1008, stackpage=cdd7f000)
Mar 17 04:22:34 sgp-nfs kernel: Stack: d41f8344 ffffffdd d4429e40 c0146174 d41f8680 d41f8344 0000000b 2a1f06dd
Mar 17 04:22:34 sgp-nfs kernel:        c79361d0 cf316000 00000000 ffffffeb 00000000 00000000 d41f83fc d41f828c
Mar 17 04:22:34 sgp-nfs kernel:        00000000 00000006 00000001 00000000 c19ebd20 bfffda8c cdd7ff88 d41f8680
Mar 17 04:22:34 sgp-nfs kernel: Call Trace: [posix_lock_file+180/1376] posix_lock_file [kernel] 0xb4
Mar 17 04:22:34 sgp-nfs kernel: Call Trace: [<c0146174>] posix_lock_file [kernel] 0xb4
Mar 17 04:22:34 sgp-nfs kernel: [fcntl_setlk64+324/464] fcntl_setlk64 [kernel] 0x144
Mar 17 04:22:34 sgp-nfs kernel: [<c0147304>] fcntl_setlk64 [kernel] 0x144
Mar 17 04:22:34 sgp-nfs kernel: [filp_open+77/96] filp_open [kernel] 0x4d
Mar 17 04:22:34 sgp-nfs kernel: [<c0135fcd>] filp_open [kernel] 0x4d
Mar 17 04:22:34 sgp-nfs kernel: [getname+94/160] getname [kernel] 0x5e
Mar 17 04:22:34 sgp-nfs kernel: [<c013f9ce>] getname [kernel] 0x5e
Mar 17 04:22:34 sgp-nfs kernel: [sys_fcntl64+109/160] sys_fcntl64 [kernel] 0x6d
Mar 17 04:22:34 sgp-nfs kernel: [<c01435fd>] sys_fcntl64 [kernel] 0x6d
Mar 17 04:22:34 sgp-nfs kernel: [system_call+51/56] system_call [kernel] 0x33
Mar 17 04:22:34 sgp-nfs kernel: [<c0106f3b>] system_call [kernel] 0x33
Mar 17 04:22:34 sgp-nfs kernel:
Mar 17 04:22:34 sgp-nfs kernel:
Mar 17 04:22:34 sgp-nfs kernel: Code: 39 58 14 75 05 39 48 18 74 d9 8b 12 81 fa b0 3b 2b c0 75 e9
Comment 1 Trent Doyle 2002-03-25 09:47:18 EST
We are wondering whether this has anything to do with the fact that we have
Apache files on the NFS server.

03/25/2002: another occurrence. Here is another excerpt from /var/log/messages:
Mar 25 02:08:00 sgp-nfs kernel: Unable to handle kernel paging request at virtual address 652e3642
Mar 25 02:08:00 sgp-nfs kernel:  printing eip:
Mar 25 02:08:00 sgp-nfs kernel: c0145d93
Mar 25 02:08:00 sgp-nfs kernel: *pde = 00000000
Mar 25 02:08:00 sgp-nfs kernel: Oops: 0000
Mar 25 02:08:00 sgp-nfs kernel: Kernel 2.4.9-31
Mar 25 02:08:00 sgp-nfs kernel: CPU:    0
Mar 25 02:08:00 sgp-nfs kernel: EIP:    0010:[posix_locks_deadlock+67/96]    Not tainted
Mar 25 02:08:00 sgp-nfs kernel: EIP:    0010:[<c0145d93>]    Not tainted
Mar 25 02:08:00 sgp-nfs kernel: EFLAGS: 00010a87
Mar 25 02:08:00 sgp-nfs kernel: EIP is at posix_locks_deadlock [kernel] 0x43
Mar 25 02:08:00 sgp-nfs kernel: eax: 652e362e   ebx: d1adfe40   ecx: 00003d0e   edx: 652e3632
Mar 25 02:08:00 sgp-nfs kernel: esi: 00003d0f   edi: d1adfe40   ebp: d21e91d4   esp: d2883f30
Mar 25 02:08:00 sgp-nfs kernel: ds: 0018   es: 0018   ss: 0018
Mar 25 02:08:00 sgp-nfs kernel: Process lockd (pid: 839, stackpage=d2883000)
Mar 25 02:08:00 sgp-nfs kernel: Stack: d3d9d600 d3d9dee8 d3d9dee8 e095f487 d3d9d600 d21e91d4 c3f63000 e0963b7a
Mar 25 02:08:00 sgp-nfs kernel:        dfa37e00 d2883f5c d3d9d5b4 d3d9dea0 d3d9d2a0 dfa37e00 e096960c e0963d47
Mar 25 02:08:00 sgp-nfs kernel:        dfa37e00 d3d9dea0 d3d9d5ac 00000001 d3d9d5a0 d1adfe40 d3d9dea0 dfa37f38
Mar 25 02:08:00 sgp-nfs kernel: Call Trace: [eepro100:__insmod_eepro100_S.bss_L16+405383/72048169] lockd_down_Ra7b91a7b [lockd] 0x767
Mar 25 02:08:00 sgp-nfs kernel: Call Trace: [<e095f487>] lockd_down_Ra7b91a7b [lockd] 0x767
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+423546/72030006] nlmsvc_invalidate_client_Rb1c3f825 [lockd] 0x277a
Mar 25 02:08:00 sgp-nfs kernel: [<e0963b7a>] nlmsvc_invalidate_client_Rb1c3f825 [lockd] 0x277a
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+446732/72006820] __insmod_lockd_S.data_L2956 [lockd] 0x8cc
Mar 25 02:08:00 sgp-nfs kernel: [<e096960c>] __insmod_lockd_S.data_L2956 [lockd] 0x8cc
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+424007/72029545] nlmsvc_invalidate_client_Rb1c3f825 [lockd] 0x2947
Mar 25 02:08:00 sgp-nfs kernel: [<e0963d47>] nlmsvc_invalidate_client_Rb1c3f825 [lockd] 0x2947
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+282487/72171065] svc_process_R7eb1336f [sunrpc] 0x2d7
Mar 25 02:08:00 sgp-nfs kernel: [<e0941477>] svc_process_R7eb1336f [sunrpc] 0x2d7
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+444600/72008952] __insmod_lockd_S.data_L2956 [lockd] 0x78
Mar 25 02:08:00 sgp-nfs kernel: [<e0968db8>] __insmod_lockd_S.data_L2956 [lockd] 0x78
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+444636/72008916] __insmod_lockd_S.data_L2956 [lockd] 0x9c
Mar 25 02:08:00 sgp-nfs kernel: [<e0968ddc>] __insmod_lockd_S.data_L2956 [lockd] 0x9c
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+403030/72050522] nlmclnt_proc_Rd9df9c43 [lockd] 0x16d6
Mar 25 02:08:00 sgp-nfs kernel: [<e095eb56>] nlmclnt_proc_Rd9df9c43 [lockd] 0x16d6
Mar 25 02:08:00 sgp-nfs kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Mar 25 02:08:00 sgp-nfs kernel: [<c0105726>] kernel_thread [kernel] 0x26
Mar 25 02:08:00 sgp-nfs kernel: [eepro100:__insmod_eepro100_S.bss_L16+402592/72050960] nlmclnt_proc_Rd9df9c43 [lockd] 0x1520
Mar 25 02:08:00 sgp-nfs kernel: [<e095e9a0>] nlmclnt_proc_Rd9df9c43 [lockd] 0x1520
Mar 25 02:08:00 sgp-nfs kernel:
Mar 25 02:08:00 sgp-nfs kernel:
Mar 25 02:08:00 sgp-nfs kernel: Code: 39 58 14 75 05 39 48 18 74 d9 8b 12 81 fa b0 3b 2b c0 75 e9

Comment 2 Arjan van de Ven 2002-03-25 09:50:19 EST
Weird question: the filesystem in question isn't vfat, is it?
Comment 3 Trent Doyle 2002-03-26 17:04:32 EST
03/26/2002
One contributing factor may be that this server is our NFS server and shares the
config files for Apache (version 1.3.14 or 1.3.19; I've checked with a systems
programmer and he wasn't sure either) to a Sun Ultra 10 running Solaris 7.
The system does continue sharing files to other clients, however, but for some
reason it doesn't appear to re-validate the share holding the config files for
the Ultra 10.
Comment 4 Trent Doyle 2002-03-26 17:06:56 EST
03/26/2002
No, it is ext2; I just checked /etc/fstab.

Trent
Comment 5 Arjan van de Ven 2002-03-26 17:10:05 EST
OK, is this the 2.4.3-6 kernel?
If so, then upgrading to 2.4.9-31 might be worth a shot. Quite a few NFS problems
have been fixed since 2.4.3-6.
Comment 6 Trent Doyle 2002-03-27 11:49:34 EST
We have the latest kernel (2.4.9-31) installed on the system. That was one of
the fixes we had already been given, along with changing the network driver
module from eepro100 to epro100 (all that did was lock the system up even faster
than before, so we went back to eepro100). So far, none of the supposed "fixes"
have fixed the problem.
Comment 7 Trent Doyle 2002-04-08 15:09:34 EDT
2002/04/08
Just checking status. Nothing new. Does Red Hat have any other suggestions?

Trent Doyle
Comment 8 Michael K. Johnson 2002-04-08 15:42:53 EDT
Please add the line
insmod_opt=-S
to /etc/modules.conf.
That will cause symbols like __insmod_lockd_S.data_L2956 to get reasonable names.

Then, in /etc/sysconfig/syslog, also add "-x" to KLOGD_OPTIONS so that klogd
does not do its own address translation and mangle the log messages.

That should help us get better debugging output.
Comment 9 Michael K. Johnson 2002-04-08 15:59:04 EDT
Are you running samba?
What about fam?

chkconfig --list samba
chkconfig --list sgi_fam
Comment 10 Ron Reed 2002-04-08 16:16:36 EDT
Samba does run on this server, but it does not appear to be a factor. The Samba
traffic is very light, only once per hour. Other NFS machines of ours also show
this crash, and they do not run Samba.
Comment 11 Ron Reed 2002-04-08 16:22:26 EDT
Created attachment 52744 [details]
Ksymoops Output from a crash
Comment 12 Ben LaHaise 2002-04-08 16:48:53 EDT
You might want to try the following patch against fs/locks.c. This might be the
source of the problem; I've sent a couple of questions to the maintainer of the
code about this.

Index: locks.c
===================================================================
RCS file: /bcrl/cvs/CVSROOT/net-aio/linux/fs/locks.c,v
retrieving revision 1.1.1.1
diff -u -u -r1.1.1.1 locks.c
--- locks.c	2 Apr 2002 23:47:24 -0000	1.1.1.1
+++ locks.c	8 Apr 2002 20:46:00 -0000
@@ -440,7 +440,7 @@
 	while (!list_empty(&blocker->fl_block)) {
 		struct file_lock *waiter = list_entry(blocker->fl_block.next, struct file_lock, fl_block);
 
-		if (wait) {
+		if (0) {
 			locks_notify_blocked(waiter);
 			/* Let the blocked process remove waiter from the
 			 * block list when it gets scheduled.
Comment 13 Ron Reed 2002-04-08 16:59:11 EDT
I have made the change to the locks.c file, but how do I recompile it into the
nfs module? The 2.4.9-31 kernel was installed via RPM on this machine; I have
the source RPM installed, but the kernel has never been compiled on it.
Comment 14 Ron Reed 2002-04-17 17:16:07 EDT
Created attachment 54244 [details]
Latest oops output processed thru Ksymoops
Comment 15 Ron Reed 2002-04-17 17:18:30 EDT
Just added another oops output. Still waiting for instructions on how to compile
the suggested patch into a module.
Comment 16 Ben LaHaise 2002-04-17 17:26:23 EDT
Who is your TAO?
Comment 17 Ron Reed 2002-04-18 09:28:22 EDT
I am not sure what you mean by TAO. Technical Contact would be me.
Comment 18 Michael K. Johnson 2002-04-18 10:26:43 EDT
TAO would be your service contract contact at Red Hat.  Do you have
a service contract?
Comment 19 Ron Reed 2002-04-18 11:29:59 EDT
We do not have a service contract. We did pay for one problem resolution, but
since they could not fix it, they told us to put the problem in here. They
offered to refund the money for the resolution, but we chose to keep the
resolution for any later problems. Do you have the standard .config file
that was used for the 2.4.9-31 kernel? That is all I need to recompile.
Comment 20 Ben LaHaise 2002-04-18 11:51:59 EDT
The .config is included with the kernel source rpm.
Comment 21 Michael K. Johnson 2002-04-18 12:11:36 EDT
To be more specific, the .config files are kept in the configs
subdirectory below the kernel source, one for each kernel that
is built.
Comment 22 Ron Reed 2002-04-18 12:28:01 EDT
I feel stupid. I found them right after I posted that message. I have a newly
compiled kernel now and will be rebooting the server into it shortly.
Any ideas on how to stress-test it for this recurring error?
Comment 23 Ron Reed 2002-05-02 11:31:04 EDT
Created attachment 56164 [details]
Ksymoops processed oops from 5/1
Comment 24 Ron Reed 2002-05-02 11:33:08 EDT
Created attachment 56165 [details]
Ksymoops processed oops output, 5/1 - crash right after reboot.
Comment 25 Ron Reed 2002-05-02 11:36:56 EDT
Has anyone found a way to trigger this error for testing purposes? Is there
anything else I can provide to help solve it? This is a production file
server with over 400 GB of storage that cannot be down for any extended period.
I will try again to compile the suggestion from bcrl@redhat.com and see if I can
get the machine to boot with this change.
Comment 26 Michael K. Johnson 2002-05-02 13:20:59 EDT
We don't have a way of testing it here, no; it's not happening
to us. Without a reproducer, we don't have much choice.
Comment 27 Trent Doyle 2002-05-13 16:06:16 EDT
Had another lockd process go south.  Preliminary bug info as follows from
/var/log/messages:
May 13 16:40:09 sgp-nfs rpc.mountd: authenticated mount request from jester1.sgp.arm.gov:739 for /files0/SunOS5.7/apps/web (/files0)
May 13 16:40:22 sgp-nfs rpc.mountd: authenticated unmount request from r1.sgp.arm.gov:750 for /files0/res/apps/res (/files0)
May 13 16:40:22 sgp-nfs rpc.mountd: authenticated unmount request from r1.sgp.arm.gov:750 for /files0/res/apps/cse (/files0)
May 13 16:40:22 sgp-nfs rpc.mountd: authenticated unmount request from r1.sgp.arm.gov:750 for /files0/res/home/sds (/files0)
May 13 16:40:22 sgp-nfs rpc.mountd: authenticated unmount request from r1.sgp.arm.gov:750 for /files0/res/home/sgpdq (/files0)
May 13 16:40:23 sgp-nfs rpc.mountd: authenticated mount request from jester1.sgp.arm.gov:739 for /files0/SunOS5.7/data/collection (/files0)
May 13 16:40:27 sgp-nfs kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000010
May 13 16:40:27 sgp-nfs kernel:  printing eip:
May 13 16:40:27 sgp-nfs kernel: c0145d93
May 13 16:40:27 sgp-nfs kernel: *pde = 00000000
May 13 16:40:27 sgp-nfs kernel: Oops: 0000
May 13 16:40:27 sgp-nfs kernel: Kernel 2.4.9-31
May 13 16:40:27 sgp-nfs kernel: CPU:    0
May 13 16:40:27 sgp-nfs kernel: EIP:    0010:[<c0145d93>]    Not tainted
May 13 16:40:27 sgp-nfs kernel: EFLAGS: 00010207
May 13 16:40:27 sgp-nfs kernel: EIP is at posix_locks_deadlock [kernel] 0x43 
May 13 16:40:27 sgp-nfs kernel: eax: fffffffc   ebx: c37c28c0   ecx: 00006afe   edx: 00000000
May 13 16:40:27 sgp-nfs kernel: esi: 00004ad7   edi: c37c28c0   ebp: df525230   esp: d1e57f30
May 13 16:40:27 sgp-nfs kernel: ds: 0018   es: 0018   ss: 0018
May 13 16:40:27 sgp-nfs kernel: Process lockd (pid: 832, stackpage=d1e57000)
May 13 16:40:27 sgp-nfs kernel: Stack: c6167f00 d4225bc8 d4225bc8 e0942487 c6167f00 df525230 c2373c00 e0946b7a
May 13 16:40:27 sgp-nfs kernel:        dfa37e00 d1e57f5c c6167eb4 d4225b80 c61673a0 dfa37e00 e094e67c e0946d47
May 13 16:40:27 sgp-nfs kernel:        dfa37e00 d4225b80 c6167eac 00000001 c6167ea0 c37c28c0 d4225b80 dfa37f38
May 13 16:40:27 sgp-nfs kernel: Call Trace: [<e0942487>] nlmsvc_lock [lockd] 0x1d7 
May 13 16:40:27 sgp-nfs kernel: [<e0946b7a>] nlm4svc_retrieve_args [lockd] 0xaa 
May 13 16:40:27 sgp-nfs kernel: [<e094e67c>] nlmsvc_procedures4 [lockd] 0x40 
May 13 16:40:27 sgp-nfs kernel: [<e0946d47>] nlm4svc_proc_lock [lockd] 0x97 
May 13 16:40:27 sgp-nfs kernel: [<e0968477>] svc_process_R7eb1336f [sunrpc] 0x2d7 
May 13 16:40:27 sgp-nfs kernel: [<e094de28>] nlmsvc_version4 [lockd] 0x0 
May 13 16:40:27 sgp-nfs kernel: [<e094de4c>] nlmsvc_program [lockd] 0x0 
May 13 16:40:27 sgp-nfs kernel: [<e0941b56>] lockd [lockd] 0x1b6 
May 13 16:40:27 sgp-nfs kernel: [<c0105726>] kernel_thread [kernel] 0x26 
May 13 16:40:27 sgp-nfs kernel: [<e09419a0>] lockd [lockd] 0x0 
May 13 16:40:27 sgp-nfs kernel: 
May 13 16:40:27 sgp-nfs kernel: 
May 13 16:40:27 sgp-nfs kernel: Code: 39 58 14 75 05 39 48 18 74 d9 8b 12 81 fa b0 3b 2b c0 75 e9
Comment 28 Ron Reed 2002-05-14 11:18:49 EDT
Can you tell from the oops outputs what is causing this problem? If we can
figure out what is causing lockd to crash, maybe a program can be written
that will test for the problem, and then we can start working on a fix.
Comment 29 Ben LaHaise 2002-05-14 12:35:10 EDT
Please test either the 2.4.9 kernel included with AS 2.1 or the 2.4.18-4 errata
kernel for 7.3.
Comment 30 Ron Reed 2002-05-14 12:57:07 EDT
This is a production NFS server. I can't upgrade the kernel without testing it
first. I have a test system, but I can't replicate the error easily. That is why
I am asking about a test program.

I can find the 2.4.18-3 kernel on Red Hat's FTP server; do I need to get the
2.4.9 or 2.4.18-4 kernel from elsewhere?
Comment 31 Ben LaHaise 2002-05-14 13:09:46 EDT
Is http://rhn.redhat.com/errata/RHBA-2002-085.html not an accurate description
of where to obtain the 2.4.18-4 update?
Comment 32 Ron Reed 2002-05-14 13:58:06 EDT
I am getting the newer kernel now, but this still does me no good without a way
to force the crash. This server currently runs the 2.4.9-31 kernel, but not the
one from AS 2.1. I will search out that kernel too and download it. But all of
this is a moot point if I can't find a way of duplicating the crash.
Comment 33 Ben LaHaise 2002-05-14 14:47:43 EDT
Well, your setup is the only one hitting this problem, and you've been unable to
provide a reproducer, nor have you tested the suggested patch, so there's not
terribly much that I can do other than point at new kernels that may have fixed
the problem.
Comment 34 Ron Reed 2002-05-14 15:07:20 EDT
I have submitted all these processed oops reports in the hope of getting help in
creating a reproducer. I do not understand what the oops is saying, other than
that it is a problem with lockd. As I have said before, this is a PRODUCTION NFS
server, and I cannot just install a patch to see if it works. When this crash
happens, it takes the server down for more than 30 minutes at a time. Can anyone
there look at the oops outputs and even suggest what could be causing this, or
suggest something that can be used to trigger this problem? You asked for changes
to modules.conf and the syslog configuration for better debugging messages, and I
have given them to you. Surely someone there can give a better answer than
"It is your problem, you solve it."
Comment 35 Michael K. Johnson 2002-05-14 17:04:38 EDT
We think we fixed it, based on the output you provided.  We have
offered you tested kernels with the candidate fix in them.

Since you can't provide us with a reproducer, we can't tell for
sure unless you are willing to test.  If you aren't willing to
test a candidate fix, then there's not much we can do.  The fact
that you cannot, by your own institutional rules, deploy our
candidate fix without writing a test program does not make it our
job to write the test program.  I don't know that a test program
can be written.

Since you aren't willing to test, we'll just close this one as
fixed in the current release, since we have made a change that
we expect has fixed the problem.  When you have upgraded to the
current release, or at least the kernel from the current release,
if this still occurs, you can feel free to re-open the bug report.
Comment 36 Ron Reed 2002-05-14 17:18:55 EDT
I have installed the 2.4.18-4 kernel on a test NFS machine. But since no one
there is willing to give me even a glimpse of what might have been causing the
error, I have no way of writing a reproducer either. I have sent all those oops
outputs so that someone there can at least help me narrow down what is causing
this. I understand that we are the only ones experiencing this problem, and I
do not expect you to write a test program. But what I have been asking for is
for someone to look at the output and see if there are any clues as to what
could be causing this crash. Does the oops output help you narrow it down in any
way? I have my suspicions as to what is causing this crash, but I am not sure. If
anyone has any clues from the oops outputs that they want to share, I am listening.
