44034 – NFS related panic with IBM DB2 EE [patch attached]

Summary: NFS related panic with IBM DB2 EE [patch attached]

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	kernel
Sub Component:
Version:	7.1
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2001-06-10 00:36 UTC by Michael E Brown
Modified:	2007-04-18 16:33 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2001-06-11 13:50:35 UTC
Embargoed:

Description Michael E Brown 2001-06-10 00:36:49 UTC

Description of Problem:

The panic is actually caused by a longstanding bug in the NFS server code (nfsd). The 2.4.3-6 kernel exposes it because its list_del aggressively clears list pointers. The problem is that nfsd removes a dentry from its parent's d_child list using list_del. Later it does a dput of the same dentry, and that dput can reach a code path that does list_del on d_child again. This is not allowed: after list_del the list entry is undefined, and to reuse it you must re-initialise it. The correct fix is to replace the list_del in the nfsd code with list_del_init (patch attached below).

How Reproducible:

IBM DB2 EE, creating a database, or just about anything over NFS will cause this.

Steps to Reproduce:
1. 
2. 
3. 

Actual Results:


Expected Results:


Additional Information:
 
Index: linux/2.4/fs/nfsd/nfsfh.c
diff -u linux/2.4/fs/nfsd/nfsfh.c:1.1.1.8.4.1 linux/2.4/fs/nfsd/nfsfh.c:1.1.1.8.4.2
--- linux/2.4/fs/nfsd/nfsfh.c:1.1.1.8.4.1       Wed May  9 11:08:54 2001
+++ linux/2.4/fs/nfsd/nfsfh.c   Fri Jun  8 16:18:33 2001
@@ -238,7 +238,7 @@
         * make it an IS_ROOT instead
         */
        spin_lock(&dcache_lock);
-       list_del(&tdentry->d_child);
+       list_del_init(&tdentry->d_child);
        tdentry->d_parent = tdentry;
        spin_unlock(&dcache_lock);
        d_rehash(target);

Comment 1 Michael E Brown 2001-06-10 00:48:32 UTC

Stack trace of panic:

Unable to handle kernel NULL pointer dereference at virtual address 00000004
c014b13d
Oops: 0002
CPU:    0
EIP:    0010:[<c014b13d>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: c4a6dcc0   ecx: c4a6dce8   edx: 00000000
esi: c0256ed8   edi: c59197a0   ebp: c4ba79e8   esp: c4ba78a0
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 3230, stackpage=c4ba7000)
Stack: c1239148 c4a6d9e8 c4a6dcc0 c4a6d9c0 c4a6dcc0 c4a6d9c0 c68af335 c4a6dcc0
       c4a6d9c0 00000000 c4ba79e8 c4a6d9c0 c59197a0 c68af59a c4a6d9c0 c59197a0
       c4ba79e8 00000000 00706d74 c5e07d80 c1e9ec00 c014d972 c1e9ec00 00007458
Call Trace: [<c68af335>] [<c68af59a>] [<c014d972>] [<c016bf17>] [<c015fa28>]
   [<c02275a0>] [<c015fa51>] [<c014d972>] [<c015fbfd>] [<c014b0b4>] [<c68af887>]
   [<c68afdb8>] [<c68affde>] [<c012125b>] [<c0121326>] [<c01468f6>] [<c0139fc3>]
   [<c01324ce>] [<c0139e6f>] [<c0139ebf>] [<c0139f4d>] [<c015c1d4>] [<c015c2e0>]
   [<c01c31fc>] [<c01c3382>] [<c01c4e4e>] [<c687136d>] [<c686ed08>] [<c01f49b3>]
   [<c01f4d65>] [<c01ccf3c>] [<c01d7b66>] [<c01d7a70>] [<c01cd26a>] [<c01d7a70>]
   [<c01cd2a1>] [<c01d762c>] [<c01d7a70>] [<c01d7be0>] [<c01ccf3c>] [<c01d7dca>]
   [<c01d7be0>] [<c01cd26a>] [<c01d7be0>] [<c01cd2a1>] [<c014da0a>] [<c01d7a1d>]
   [<c01d7be0>] [<c01fa3a5>] [<c01c728c>] [<c68b0ceb>] [<c68b69df>] [<c68bdea0>]
   [<c68ad5d1>] [<c68bdea0>] [<c6873d23>] [<c68bd738>] [<c68bd758>] [<c68ad3b7>]
   [<c68bd720>] [<c0105676>] [<c68ad190>]
Code: 89 50 04 89 02 c7 43 28 00 00 00 00 c7 41 04 00 00 00 00 8b

>>EIP; c014b13d <dput+ad/180>   <=====
Trace; c68af335 <[nfsd]d_splice+e5/160>
Trace; c68af59a <[nfsd]splice+ea/140>
Trace; c014d972 <iget4+52/100>
Trace; c016bf17 <ll_rw_block+167/1b0>
Trace; c015fa28 <ext2_find_entry+228/380>
Trace; c02275a0 <quota_versions+2880/716c>
Trace; c015fa51 <ext2_find_entry+251/380>
Trace; c014d972 <iget4+52/100>
Trace; c015fbfd <ext2_lookup+7d/90>
Trace; c014b0b4 <dput+24/180>
Trace; c68af887 <[nfsd]find_fh_dentry+297/3b0>
Trace; c68afdb8 <[nfsd]fh_verify+418/690>
Trace; c68affde <[nfsd]fh_verify+63e/690>
Trace; c012125b <deliver_signal+4b/90>
Trace; c0121326 <send_sig_info+86/b0>
Trace; c01468f6 <send_sigio_to_task+b6/c0>
Trace; c0139fc3 <__refile_buffer+63/70>
Trace; c01324ce <nr_free_buffer_pages+e/60>
Trace; c0139e6f <balance_dirty_state+f/50>
Trace; c0139ebf <balance_dirty+f/30>
Trace; c0139f4d <mark_buffer_dirty+3d/50>
Trace; c015c1d4 <ext2_new_block+a64/c00>
Trace; c015c2e0 <ext2_new_block+b70/c00>
Trace; c01c31fc <kfree_skbmem+c/70>
Trace; c01c3382 <__kfree_skb+122/130>
Trace; c01c4e4e <skb_free_datagram+1e/30>
Trace; c687136d <[sunrpc]rpc_unlock_task+4d/70>
Trace; c686ed08 <[sunrpc]udp_data_ready+248/280>
Trace; c01f49b3 <udp_queue_rcv_skb+143/1e0>
Trace; c01f4d65 <udp_rcv+145/260>
Trace; c01ccf3c <nf_iterate+2c/80>
Trace; c01d7b66 <ip_local_deliver_finish+f6/170>
Trace; c01d7a70 <ip_local_deliver_finish+0/170>
Trace; c01cd26a <nf_hook_slow+da/180>
Trace; c01d7a70 <ip_local_deliver_finish+0/170>
Trace; c01cd2a1 <nf_hook_slow+111/180>
Trace; c01d762c <ip_local_deliver+1ac/1c0>
Trace; c01d7a70 <ip_local_deliver_finish+0/170>
Trace; c01d7be0 <ip_rcv_finish+0/240>
Trace; c01ccf3c <nf_iterate+2c/80>
Trace; c01d7dca <ip_rcv_finish+1ea/240>
Trace; c01d7be0 <ip_rcv_finish+0/240>
Trace; c01cd26a <nf_hook_slow+da/180>
Trace; c01d7be0 <ip_rcv_finish+0/240>
Trace; c01cd2a1 <nf_hook_slow+111/180>
Trace; c014da0a <iget4+ea/100>
Trace; c01d7a1d <ip_rcv+3dd/430>
Trace; c01d7be0 <ip_rcv_finish+0/240>
Trace; c01fa3a5 <inet_sendmsg+35/40>
Trace; c01c728c <net_rx_action+1cc/300>
Trace; c68b0ceb <[nfsd]nfsd_lookup+6b/400>
Trace; c68b69df <[nfsd]nfsd3_proc_lookup+cf/e0>
Trace; c68bdea0 <[nfsd]nfsd_procedures3+60/2c0>
Trace; c68ad5d1 <[nfsd]nfsd_dispatch+c1/170>
Trace; c68bdea0 <[nfsd]nfsd_procedures3+60/2c0>
Trace; c6873d23 <[sunrpc]svc_process+3f3/5b0>
Trace; c68bd738 <[nfsd]nfsd_version3+0/10>
Trace; c68bd758 <[nfsd]nfsd_program+0/18>
Trace; c68ad3b7 <[nfsd]nfsd+227/380>
Trace; c68bd720 <[nfsd]nfsd_list+0/0>
Trace; c0105676 <kernel_thread+26/30>
Trace; c68ad190 <[nfsd]nfsd+0/380>
Code;  c014b13d <dput+ad/180>
00000000 <_EIP>:
Code;  c014b13d <dput+ad/180>   <=====
   0:   89 50 04                  mov    %edx,0x4(%eax)   <=====
Code;  c014b140 <dput+b0/180>
   3:   89 02                     mov    %eax,(%edx)
Code;  c014b142 <dput+b2/180>
   5:   c7 43 28 00 00 00 00      movl   $0x0,0x28(%ebx)
Code;  c014b149 <dput+b9/180>
   c:   c7 41 04 00 00 00 00      movl   $0x0,0x4(%ecx)
Code;  c014b150 <dput+c0/180>
  13:   8b 00                     mov    (%eax),%eax

Comment 2 Michael E Brown 2001-06-10 00:50:10 UTC

Oops, the following statement in the bug report was poorly worded:
>>IBM DB2 EE, creating a database, or just about anything over NFS will cause this.

It should have read:

In IBM DB2 EE, doing just about anything over NFS will cause a kernel panic.

Comment 3 Kostas Georgiou 2001-06-11 13:50:31 UTC

This looks just like the problem I have with my file server: almost any program that runs under Tru64 and produces a lot of output (200+ MB) crashes the file server (Red Hat 7.1).

I'll try to find the time to build an RPM today or tomorrow and see if this patch cures the problem.

Comment 4 Doug Ledford 2001-07-10 19:22:42 UTC

This patch has been incorporated into the upstream kernels. It should already be in the latest Rawhide kernel and will most likely be in the next errata kernel for 7.1.
