
Bug 569342

Summary: [5.4] nfsd dereferences uninitialized list head on error exit in nfsd4_list_rec_dir()
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Target Milestone: rc
Reporter: Sachin Prabhu <sprabhu>
Assignee: Jeff Layton <jlayton>
QA Contact: yanfu,wang <yanwang>
CC: cward, eguan, jlayton, moshiro, rwheeler, steved, yanwang
Doc Type: Bug Fix
Last Closed: 2011-01-13 20:36:22 UTC
Attachments: System tap script to reproduce this problem (flags: none)

Description Sachin Prabhu 2010-03-01 11:13:01 UTC
A user has pointed out the following bug, which affects the RHEL 5 kernel:

http://marc.info/?l=git-commits-head&m=122835242904003&w=2

The user hasn't hit this in a real-world environment yet. However, they would like it to be fixed in the RHEL 5 kernel.

Upstream patch:

commit e4625eb826de4f6774ee602c442ba23b686bdcc7
Author: J. Bruce Fields <bfields.edu>
Date:   Mon Nov 24 10:32:46 2008 -0600

   nfsd: use of unitialized list head on error exit in nfs4recover.c
   
The issue is caused by list_entry() being called on an uninitialised list head. The faulty error-exit path is taken when the call to dentry_open() fails.

An artificial reproducer using a SystemTap script is attached to this bug. It produces the following Oops message:

NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: 
 [<ffffffff801511e0>] list_del+0x1/0x71
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /module/xt_tcpudp/sections/__versions
CPU 3 
Modules linked in: stap_f4f59a63ff17cb87b40f0a59a6173f0c_7612(U) nfsd exportfs auth_rpcgss nfs fscache nfs_acl ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi cpufreq_ondemand acpi_cpufreq freq_table dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss sr_mod cdrom snd_pcm shpchp snd_timer snd_page_alloc snd_hwdep snd soundcore r8169 pcspkr mii i2c_i801 sg serio_raw i2c_core dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5505, comm: rpc.nfsd Tainted: G      2.6.18-164.el5 #1
RIP: 0010:[<ffffffff801511e0>]  [<ffffffff801511e0>] list_del+0x1/0x71
RSP: 0018:ffff8100cf1a9de8  EFLAGS: 00010213
RAX: ffffffffffffffea RBX: fffffffffffffff8 RCX: 00000000000004d6
RDX: ffff81012aa19dc0 RSI: ffff810104dc08e8 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000040000010
R10: ffff81012aa19dc0 R11: 00000000000000f0 R12: 0000000000000801
R13: 00000000ffffffea R14: ffffffff8887b13b R15: 0000000000000001
FS:  00002b12706b06e0(0000) GS:ffff8101043d5640(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000000ac339000 CR4: 00000000000006e0
Process rpc.nfsd (pid: 5505, threadinfo ffff8100cf1a8000, task ffff81012b7c9080)
Stack:  fffffffffffffff8 ffffffff8887b046 0000000000000000 0000000000000000
 ffff810104dc08e8 0000000000000000 0000000000000008 0000000000000000
 0000000000000801 ffff8100cf1a9f50 0000000000002000 ffffffff80065b2d
Call Trace:
 [<ffffffff8887b046>] :nfsd:nfsd4_list_rec_dir+0xf1/0x141
 [<ffffffff80065b2d>] kretprobe_trampoline+0x0/0x4b
 [<ffffffff88875f48>] :nfsd:nfs4_state_start+0xf1/0x18f
 [<ffffffff8885f3ae>] :nfsd:nfsd_svc+0x6c/0x1e9
 [<ffffffff8885ff8e>] :nfsd:write_threads+0x0/0xa9
 [<ffffffff8885fffd>] :nfsd:write_threads+0x6f/0xa9
 [<ffffffff8002b8e9>] get_zeroed_page+0x21/0x82
 [<ffffffff800f0900>] simple_transaction_get+0x8b/0xa5
 [<ffffffff8885ff8e>] :nfsd:write_threads+0x0/0xa9
 [<ffffffff8885fd59>] :nfsd:nfsctl_transaction_write+0x42/0x77
 [<ffffffff80016927>] vfs_write+0xce/0x174
 [<ffffffff800171df>] sys_write+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 48 8b 47 08 48 89 fb 48 8b 10 48 39 fa 74 1b 48 89 fe 31 c0 
RIP  [<ffffffff801511e0>] list_del+0x1/0x71
 RSP <ffff8100cf1a9de8>

Comment 1 Sachin Prabhu 2010-03-01 11:15:46 UTC
Created attachment 397041 [details]
System tap script to reproduce this problem

SystemTap script that causes the call to dentry_open() to fail when it is made from nfsd4_list_rec_dir()

Comment 2 Jeff Layton 2010-05-26 12:54:13 UTC
Looks like a reasonable patch to consider.

Comment 4 RHEL Program Management 2010-06-07 19:49:39 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Jarod Wilson 2010-06-29 13:35:12 UTC
in kernel-2.6.18-205.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 7 Jarod Wilson 2010-06-29 13:39:34 UTC
Not sure yet what went wrong w/the release script, but that should have been "in kernel-2.6.18-204.el5" (in build 204, not 205).

Comment 13 yanfu,wang 2010-10-20 07:27:13 UTC
Can be reproduced on RHEL5-U5:
Starting NFS daemon: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP: 
 [<ffffffff80154d56>] list_del+0x1/0x71
PGD 0 
Oops: 0000 [1] SMP 
last sysfs file: /module/xfrm_nalgo/sections/__versions
CPU 3 
Modules linked in: stap_e6ebdefaa161a929d889430f4de44641_7609(U) nfs fscache nfsd exportfs nfs_acl auth_rpcgss radeon drm autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand powernow_k8 freq_table ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac lp joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss amd64_edac_mod snd_pcm shpchp snd_timer snd_page_alloc ide_cd i2c_piix4 edac_mc snd_hwdep parport_pc i2c_core cdrom floppy parport snd soundcore pcspkr r8169 serio_raw e1000 mii dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ahci libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 6374, comm: rpc.nfsd Tainted: G      2.6.18-194.el5 #1
RIP: 0010:[<ffffffff80154d56>]  [<ffffffff80154d56>] list_del+0x1/0x71
RSP: 0018:ffff81018d37dde8  EFLAGS: 00010213
RAX: ffffffffffffffea RBX: fffffffffffffff8 RCX: 00000000000005c9
RDX: ffff8101919780c0 RSI: 0000000000004000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000040000010
R10: ffff8101919780c0 R11: 00000000000000f0 R12: 0000000000000801
R13: 00000000ffffffea R14: ffffffff8881b2bf R15: 0000000000000001
FS:  00002adcb53ab6e0(0000) GS:ffff810107b97640(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 00000001903a5000 CR4: 00000000000006e0
Process rpc.nfsd (pid: 6374, threadinfo ffff81018d37c000, task ffff810191bbd860)
Stack:  fffffffffffffff8 ffffffff8881b1ca 0000000000000000 0000000000000000
 ffff8101a5325c48 0000000000000000 0000000000000008 0000000000000000
 0000000000000801 ffff81018d37df50 0000000000002000 ffffffff80066b5d
Call Trace:
 [<ffffffff8881b1ca>] :nfsd:nfsd4_list_rec_dir+0xf1/0x141
 [<ffffffff80066b5d>] kretprobe_trampoline+0x0/0x4b
 [<ffffffff88816172>] :nfsd:nfs4_state_start+0xf1/0x18f
 [<ffffffff887ff3ae>] :nfsd:nfsd_svc+0x6c/0x1e9
 [<ffffffff887fff8e>] :nfsd:write_threads+0x0/0xa9
 [<ffffffff887ffffd>] :nfsd:write_threads+0x6f/0xa9
 [<ffffffff8002bb08>] get_zeroed_page+0x21/0x82
 [<ffffffff800f3817>] simple_transaction_get+0x8b/0xa5
 [<ffffffff887fff8e>] :nfsd:write_threads+0x0/0xa9
 [<ffffffff887ffd59>] :nfsd:nfsctl_transaction_write+0x42/0x77
 [<ffffffff80016a49>] vfs_write+0xce/0x174
 [<ffffffff80017316>] sys_write+0x45/0x6e
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0


Code: 48 8b 47 08 48 89 fb 48 8b 10 48 39 fa 74 1b 48 89 fe 31 c0 
RIP  [<ffffffff80154d56>] list_del+0x1/0x71
 RSP <ffff81018d37dde8>
CR2: 0000000000000008
 <0>Kernel panic - not syncing: Fatal exception


Verified on RHEL5.6-Server-20101014.0 on i386 and x86_64; no kernel panic occurs now.
# uname -a
Linux dell-per210-01.lab.bos.redhat.com 2.6.18-227.el5 #1 SMP Tue Oct 12 18:50:50 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Start and stop the nfs service while the SystemTap script is running on a terminal:
# stap -vg nfsd.stp
Pass 1: parsed user script and 72 library script(s) using 87156virt/21488res/2600shr kb, in 250usr/20sys/294real ms.
Pass 2: analyzed script: 4 probe(s), 15 function(s), 3 embed(s), 4 global(s) using 165656virt/40656res/4348shr kb, in 920usr/580sys/1963real ms.
Pass 3: translated to C into "/tmp/stapHfFFAp/stap_64bc674c617e9f3dfe04f1947cf50457_7557.c" using 169504virt/46084res/8556shr kb, in 530usr/30sys/567real ms.
Pass 4: compiled C into "stap_64bc674c617e9f3dfe04f1947cf50457_7557.ko" in 4610usr/610sys/5291real ms.
Pass 5: starting run.
      0 rpc.nfsd(13283): -> nfsd4_list_rec_dir
	dir is v4recovery
dentry_open: v4recovery, inode.ino = 12091669, flag = 0
dentry_open: return = -22
     21 rpc.nfsd(13283): <- nfsd4_list_rec_dir: return = -22
      0 nfsd4(13285): -> nfsd4_list_rec_dir
	dir is v4recovery
dentry_open: v4recovery, inode.ino = 12091669, flag = 0
dentry_open: return = -22
     24 nfsd4(13285): <- nfsd4_list_rec_dir: return = -22

On another terminal, no kernel panic occurs:
# /etc/init.d/nfs start
Starting NFS services:  [  OK  ]
Starting NFS quotas: [  OK  ]
Starting NFS daemon: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
nfsd4: failed loading clients from recovery directory v4recovery
NFSD: Failure reading reboot recovery data
NFSD: starting 90-second grace period
[  OK  ]
Starting NFS mountd: [  OK  ]

Comment 15 errata-xmlrpc 2011-01-13 20:36:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html