Bug 596111 - nfs related kernel BUG
Summary: nfs related kernel BUG
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.2
Hardware: All
OS: Linux
medium
high
Target Milestone: 1.3
: ---
Assignee: John Kacur
QA Contact: David Sommerseth
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-05-26 10:42 UTC by John Kacur
Modified: 2016-05-22 23:30 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-09-15 10:13:00 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
nfs fix from John Stultz (6.92 KB, patch)
2010-05-26 10:46 UTC, John Kacur
no flags Details | Diff

Description John Kacur 2010-05-26 10:42:48 UTC
kernel BUG at kernel/rtmutex.c:808!
invalid opcode: 0000 [#1] PREEMPT SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:04.6/class
Modules linked in: nfs nfs_acl auth_rpcgss autofs4 i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill lockd sunrpc ipv6 loop dm_multipath scsi_dh video output sbs sbshc battery ac parport_pc lp parport bnx2 ipmi_devintf(+) hpwdt hpilo ipmi_si(+) ipmi_msghandler snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq button snd_seq_device serio_raw snd_pcm_oss snd_mixer_oss snd_pcm tpm_tis tpm tpm_bios snd_timer pata_acpi snd iTCO_wdt i5k_amb ata_generic hwmon iTCO_vendor_support soundcore snd_page_alloc pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata cciss sd_mod crc_t10dif scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: nf_defrag_ipv4]

Pid: 1040, comm: test1 Tainted: G        W   2.6.33.4-rt20.18.el5rt #1 /ProLiant DL380 G5
EIP: 0060:[<c06ec419>] EFLAGS: 00010046 CPU: 0
EIP is at rt_spin_lock_slowlock+0x36/0x13e
EAX: f7760f50 EBX: ede01e40 ECX: 00000292 EDX: f7760f50
ESI: ede01e40 EDI: 00000292 EBP: f0781d2c ESP: f0781cdc
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 preempt:00000001
Process test1 (pid: 1040, ti=f0781000 task=f7760f50 task.ti=f0781000)
Stack:
 ff603194 ede01dc8 f0781000 f0781d3c 000080d0 f74003c0 f0781d10 c04e4461
<0> 00000000 f74003c0 f0781d0c c06ec84a f3efb980 f85224ca 00000000 f74003c0
<0> f3efba40 ede01e40 ede01e40 00000000 f0781d38 c0463773 ede01dc8 f0781d40
Call Trace:
 [<c04e4461>] ? __inode_add_to_lists+0xb3/0xba
 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14
 [<f85224ca>] ? kzalloc+0x10/0x12 [nfs]
 [<c0463773>] ? rt_spin_lock_fastlock+0x41/0x44
 [<c06ec872>] ? rt_spin_lock+0x12/0x14
 [<c04e412e>] ? igrab+0x19/0x39
 [<f8523631>] ? nfs4_get_open_state+0xe5/0x15c [nfs]
 [<f8519056>] ? nfs4_opendata_to_nfs4_state+0x166/0x20a [nfs]
 [<f85194c5>] ? nfs4_do_open+0x213/0x2f0 [nfs]
 [<f851bdaf>] ? nfs4_atomic_open+0xb5/0x157 [nfs]
 [<c06e01a4>] ? quirk_nopciamd+0x46/0x4d
 [<c04cdd02>] ? slab_irq_enable+0x39/0x6c
 [<c04ceef1>] ? kmem_cache_alloc+0x3d/0xad
 [<c04e2aef>] ? d_alloc+0x22/0x170
 [<f85051c2>] ? nfs_atomic_lookup+0x97/0xd7 [nfs]
 [<c04e2c32>] ? d_alloc+0x165/0x170
 [<c04dad7a>] ? __lookup_hash+0xa2/0xbc
 [<c04dadbb>] ? lookup_hash+0x27/0x2a
 [<c04dd089>] ? do_filp_open+0x21a/0x7c9
 [<c04a844f>] ? unlock_page+0x13/0x15
 [<c04bcf54>] ? handle_mm_fault+0x148/0x623
 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14
 [<c04e678b>] ? alloc_fd+0xc9/0xd3
 [<c04d1849>] ? do_sys_open+0x58/0x10d
 [<c04642f3>] ? rt_up_read+0x1c/0x1e
 [<c04d1952>] ? sys_open+0x28/0x2e
 [<c04d197d>] ? sys_creat+0x25/0x27
 [<c0402b93>] ? sysenter_do_call+0x12/0x22
Code: 44 00 00 89 c6 c7 45 e8 00 00 00 00 e8 33 06 00 00 89 c7 89 f0 e8 9d 73 d7 ff 8b 46 14 64 8b 15 ac 74 94 c0 83 e0 fc 39 d0 75 04 <0f> 0b eb fe 8b 02 e8 99 71 d7 ff 89 45 b8 64 8b 1d ac 74 94 c0 
EIP: [<c06ec419>] rt_spin_lock_slowlock+0x36/0x13e SS:ESP 0068:f0781cdc
---[ end trace 931201f9fda900a0 ]---
Kernel panic - not syncing: Fatal exception
Pid: 1040, comm: test1 Tainted: G      D W   2.6.33.4-rt20.18.el5rt #1
Call Trace:
 [<c043b104>] panic+0x55/0x101
 [<c06edde4>] oops_end+0x98/0xa5
 [<c04055e0>] die+0x58/0x5e
 [<c06ed690>] do_trap+0x8e/0xa7
 [<c04039bb>] ? do_invalid_op+0x0/0x96
 [<c0403a47>] do_invalid_op+0x8c/0x96
 [<c06ec419>] ? rt_spin_lock_slowlock+0x36/0x13e
 [<c04359f6>] ? check_preempt_wakeup+0x11b/0x174
 [<c06ecb16>] ? _raw_spin_unlock_irqrestore+0x21/0x2c
 [<c04370dc>] ? try_to_wake_up+0x32d/0x341
 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14
 [<c06ed437>] error_code+0x73/0x78
 [<c06ec419>] ? rt_spin_lock_slowlock+0x36/0x13e
 [<c04e4461>] ? __inode_add_to_lists+0xb3/0xba
 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14
 [<f85224ca>] ? kzalloc+0x10/0x12 [nfs]
 [<c0463773>] rt_spin_lock_fastlock+0x41/0x44
 [<c06ec872>] rt_spin_lock+0x12/0x14
 [<c04e412e>] igrab+0x19/0x39
 [<f8523631>] nfs4_get_open_state+0xe5/0x15c [nfs]
 [<f8519056>] nfs4_opendata_to_nfs4_state+0x166/0x20a [nfs]
 [<f85194c5>] nfs4_do_open+0x213/0x2f0 [nfs]
 [<f851bdaf>] nfs4_atomic_open+0xb5/0x157 [nfs]
 [<c06e01a4>] ? quirk_nopciamd+0x46/0x4d
 [<c04cdd02>] ? slab_irq_enable+0x39/0x6c
 [<c04ceef1>] ? kmem_cache_alloc+0x3d/0xad
 [<c04e2aef>] ? d_alloc+0x22/0x170
 [<f85051c2>] nfs_atomic_lookup+0x97/0xd7 [nfs]
 [<c04e2c32>] ? d_alloc+0x165/0x170
 [<c04dad7a>] __lookup_hash+0xa2/0xbc
 [<c04dadbb>] lookup_hash+0x27/0x2a
 [<c04dd089>] do_filp_open+0x21a/0x7c9
 [<c04a844f>] ? unlock_page+0x13/0x15
 [<c04bcf54>] ? handle_mm_fault+0x148/0x623
 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14
 [<c04e678b>] ? alloc_fd+0xc9/0xd3
 [<c04d1849>] do_sys_open+0x58/0x10d
 [<c04642f3>] ? rt_up_read+0x1c/0x1e
 [<c04d1952>] sys_open+0x28/0x2e
 [<c04d197d>] sys_creat+0x25/0x27
 [<c0402b93>] sysenter_do_call+0x12/0x22

Comment 1 John Kacur 2010-05-26 10:46:44 UTC
Created attachment 416760 [details]
nfs fix from John Stultz

Comment 2 Luis Claudio R. Goncalves 2010-05-27 12:12:32 UTC
The patch seems to have fixed the kernel panics, but now it uncovered the next bug, observed in almost all of the tests:


======= I386

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359093
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359078
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359152
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359152
...

------------[ cut here ]------------
WARNING: at fs/namespace.c:1203 umount_tree+0xce/0x10b()
Hardware name: ProLiant DL320 G5 
Modules linked in: nfs nfs_acl auth_rpcgss autofs4 i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill lockd sunrpc ipv6 loop dm_multipath scsi_dh video output sbs sbshc battery ac parport_pc lp parport tg3 sg ipmi_devintf hpilo ipmi_si ipmi_msghandler hpwdt snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm serio_raw button snd_timer snd soundcore tpm_tis snd_page_alloc tpm tpm_bios iTCO_wdt iTCO_vendor_support pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod pata_acpi ata_piix ata_generic libata sd_mod crc_t10dif scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: nf_defrag_ipv4]
Pid: 13811, comm: mount.nfs4 Tainted: G        W   2.6.33.4-rt20.19.el5rt #1
Call Trace:
 [<c04e7ca7>] ? umount_tree+0xce/0x10b
 [<c043b03c>] warn_slowpath_common+0x7a/0x91
 [<c04e7ca7>] ? umount_tree+0xce/0x10b
 [<c043b06d>] warn_slowpath_null+0x1a/0x1c
 [<c04e7ca7>] umount_tree+0xce/0x10b
 [<c04e83fe>] put_mnt_ns+0x83/0xa7
 [<c04dc835>] ? vfs_path_lookup+0x7e/0x8d
 [<f84c8acb>] nfs_follow_remote_path+0x4b/0xf8 [nfs]
 [<c04cdd02>] ? slab_irq_enable+0x39/0x6c
 [<f84c8a76>] ? nfs_do_root_mount+0x70/0x7a [nfs]
 [<f84c8bdf>] nfs4_try_mount+0x67/0x9f [nfs]
 [<c04b86db>] ? strndup_user+0x45/0x64
 [<f84c8e46>] nfs4_get_sb+0x22f/0x2a4 [nfs]
 [<c04d1595>] ? __alloc_percpu+0xf/0x12
 [<c04d54ea>] vfs_kern_mount+0x8b/0x131
 [<c04e6ba4>] ? get_fs_type+0x39/0x8d
 [<c04d5661>] do_kern_mount+0x3c/0xbb
 [<c04e92a9>] do_mount+0x5f9/0x669
 [<c04bac08>] ? page_address+0x17/0x5b
 [<c04acdb9>] ? __get_free_pages+0x27/0x2d
 [<c04e7ae7>] ? copy_mount_options+0x2c/0xdd
 [<c04e9386>] sys_mount+0x6d/0x99
 [<c0402b93>] sysenter_do_call+0x12/0x22
---[ end trace d817ceec35a87c27 ]---



======= X86_64

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14372046
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14371209
...

------------[ cut here ]------------
WARNING: at fs/namespace.c:1203 umount_tree+0xf3/0x13a()
Hardware name: ProLiant DL320 G5 
Modules linked in: nfs nfs_acl auth_rpcgss autofs4 i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill lockd sunrpc ipv6 loop dm_multipath scsi_dh video output sbs sbshc battery ac parport_pc lp parport tg3 sg ipmi_devintf ipmi_si ipmi_msghandler hpwdt hpilo snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device button snd_pcm_oss snd_mixer_oss tpm_tis tpm snd_pcm serio_raw tpm_bios snd_timer snd soundcore snd_page_alloc shpchp iTCO_wdt i3000_edac pcspkr iTCO_vendor_support edac_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod pata_acpi ata_piix ata_generic libata sd_mod crc_t10dif scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: nf_defrag_ipv4]
Pid: 23073, comm: mount.nfs4 Tainted: G        W   2.6.33.4-rt20.19.el5rt #1
Call Trace:
 [<ffffffff8110cd74>] ? umount_tree+0xf3/0x13a
 [<ffffffff8104238b>] warn_slowpath_common+0x7c/0x94
 [<ffffffff810423b7>] warn_slowpath_null+0x14/0x16
 [<ffffffff8110cd74>] umount_tree+0xf3/0x13a
 [<ffffffff8110d674>] put_mnt_ns+0x9b/0xc5
 [<ffffffffa049b654>] nfs_follow_remote_path+0x5c/0x12f [nfs]
 [<ffffffff810eca71>] ? slab_irq_enable+0x47/0x93
 [<ffffffffa049b5e6>] ? nfs_do_root_mount+0x86/0x98 [nfs]
 [<ffffffff810d0497>] ? strndup_user+0x42/0x80
 [<ffffffffa049b7a1>] nfs4_try_mount+0x7a/0xb4 [nfs]
 [<ffffffffa049ba53>] nfs4_get_sb+0x278/0x302 [nfs]
 [<ffffffff810f7957>] vfs_kern_mount+0x9e/0x14f
 [<ffffffff810f7b06>] do_kern_mount+0x4c/0xdd
 [<ffffffff8110e7e0>] do_mount+0x6d3/0x759
 [<ffffffff8110e8ea>] sys_mount+0x84/0xbd
 [<ffffffff8108e447>] ? audit_syscall_entry+0x103/0x12f
 [<ffffffff81002d1b>] system_call_fastpath+0x16/0x1b
---[ end trace 1cec4b11ad46cb11 ]---

Comment 3 john stultz 2010-05-27 22:18:11 UTC
Just to confirm, its just a MNT_MOUNTED related WARN_ON, and the tests completed successfully?

Did the kernel you were testing include commit e86825210e29c0be2691e7055f942555d02a4314 ?

Comment 4 john stultz 2010-05-27 23:33:13 UTC
Hrmm.. So looking at nfs_follow_remote_path()

static int nfs_follow_remote_path(struct vfsmount *root_mnt,
		const char *export_path, struct vfsmount *mnt_target)
{
	struct mnt_namespace *ns_private;
	struct nameidata nd;
	struct super_block *s;
	int ret;

	ns_private = create_mnt_ns(root_mnt);
	ret = PTR_ERR(ns_private);
	if (IS_ERR(ns_private))
		goto out_mntput;

	ret = vfs_path_lookup(root_mnt->mnt_root, root_mnt,
			export_path, LOOKUP_FOLLOW, &nd);

	put_mnt_ns(ns_private);
...

The issue here is we call create_mnt_ns(), with allocates the namespace, then seemingly never do anything with the new namespace then call put_mnt_ns, which tries to unmount it.

Trying to read over the history for the rational here. Commit 301933a0acfdec837fd8b4884093b3f0fff01d8a seems to be where this is introduced, and has the rational. I'm still trying to figure out if its correct or if its an oddball case where we want to cleanup the vfsmount even if its not mounted.

Comment 5 john stultz 2010-05-27 23:34:04 UTC
Bah. Copied the wrong hash. c02d7adf8c5429727a98bad1d039bccad4c61c50 is where it was introduced.

Comment 7 John Kacur 2010-05-28 10:48:26 UTC
(In reply to comment #3)
> Just to confirm, its just a MNT_MOUNTED related WARN_ON, and the tests
> completed successfully?
> 
> Did the kernel you were testing include commit
> e86825210e29c0be2691e7055f942555d02a4314 ?    

Yes, it does include the above commit.


Note You need to log in before you can comment on or make changes to this bug.