kernel BUG at kernel/rtmutex.c:808! invalid opcode: 0000 [#1] PREEMPT SMP last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:04.6/class Modules linked in: nfs nfs_acl auth_rpcgss autofs4 i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill lockd sunrpc ipv6 loop dm_multipath scsi_dh video output sbs sbshc battery ac parport_pc lp parport bnx2 ipmi_devintf(+) hpwdt hpilo ipmi_si(+) ipmi_msghandler snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq button snd_seq_device serio_raw snd_pcm_oss snd_mixer_oss snd_pcm tpm_tis tpm tpm_bios snd_timer pata_acpi snd iTCO_wdt i5k_amb ata_generic hwmon iTCO_vendor_support soundcore snd_page_alloc pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod ata_piix libata cciss sd_mod crc_t10dif scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: nf_defrag_ipv4] Pid: 1040, comm: test1 Tainted: G W 2.6.33.4-rt20.18.el5rt #1 /ProLiant DL380 G5 EIP: 0060:[<c06ec419>] EFLAGS: 00010046 CPU: 0 EIP is at rt_spin_lock_slowlock+0x36/0x13e EAX: f7760f50 EBX: ede01e40 ECX: 00000292 EDX: f7760f50 ESI: ede01e40 EDI: 00000292 EBP: f0781d2c ESP: f0781cdc DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 preempt:00000001 Process test1 (pid: 1040, ti=f0781000 task=f7760f50 task.ti=f0781000) Stack: ff603194 ede01dc8 f0781000 f0781d3c 000080d0 f74003c0 f0781d10 c04e4461 <0> 00000000 f74003c0 f0781d0c c06ec84a f3efb980 f85224ca 00000000 f74003c0 <0> f3efba40 ede01e40 ede01e40 00000000 f0781d38 c0463773 ede01dc8 f0781d40 Call Trace: [<c04e4461>] ? __inode_add_to_lists+0xb3/0xba [<c06ec84a>] ? rt_spin_unlock+0x12/0x14 [<f85224ca>] ? kzalloc+0x10/0x12 [nfs] [<c0463773>] ? rt_spin_lock_fastlock+0x41/0x44 [<c06ec872>] ? rt_spin_lock+0x12/0x14 [<c04e412e>] ? igrab+0x19/0x39 [<f8523631>] ? nfs4_get_open_state+0xe5/0x15c [nfs] [<f8519056>] ? nfs4_opendata_to_nfs4_state+0x166/0x20a [nfs] [<f85194c5>] ? nfs4_do_open+0x213/0x2f0 [nfs] [<f851bdaf>] ? nfs4_atomic_open+0xb5/0x157 [nfs] [<c06e01a4>] ? quirk_nopciamd+0x46/0x4d [<c04cdd02>] ? slab_irq_enable+0x39/0x6c [<c04ceef1>] ? kmem_cache_alloc+0x3d/0xad [<c04e2aef>] ? d_alloc+0x22/0x170 [<f85051c2>] ? nfs_atomic_lookup+0x97/0xd7 [nfs] [<c04e2c32>] ? d_alloc+0x165/0x170 [<c04dad7a>] ? __lookup_hash+0xa2/0xbc [<c04dadbb>] ? lookup_hash+0x27/0x2a [<c04dd089>] ? do_filp_open+0x21a/0x7c9 [<c04a844f>] ? unlock_page+0x13/0x15 [<c04bcf54>] ? handle_mm_fault+0x148/0x623 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14 [<c04e678b>] ? alloc_fd+0xc9/0xd3 [<c04d1849>] ? do_sys_open+0x58/0x10d [<c04642f3>] ? rt_up_read+0x1c/0x1e [<c04d1952>] ? sys_open+0x28/0x2e [<c04d197d>] ? sys_creat+0x25/0x27 [<c0402b93>] ? sysenter_do_call+0x12/0x22 Code: 44 00 00 89 c6 c7 45 e8 00 00 00 00 e8 33 06 00 00 89 c7 89 f0 e8 9d 73 d7 ff 8b 46 14 64 8b 15 ac 74 94 c0 83 e0 fc 39 d0 75 04 <0f> 0b eb fe 8b 02 e8 99 71 d7 ff 89 45 b8 64 8b 1d ac 74 94 c0 EIP: [<c06ec419>] rt_spin_lock_slowlock+0x36/0x13e SS:ESP 0068:f0781cdc ---[ end trace 931201f9fda900a0 ]--- Kernel panic - not syncing: Fatal exception Pid: 1040, comm: test1 Tainted: G D W 2.6.33.4-rt20.18.el5rt #1 Call Trace: [<c043b104>] panic+0x55/0x101 [<c06edde4>] oops_end+0x98/0xa5 [<c04055e0>] die+0x58/0x5e [<c06ed690>] do_trap+0x8e/0xa7 [<c04039bb>] ? do_invalid_op+0x0/0x96 [<c0403a47>] do_invalid_op+0x8c/0x96 [<c06ec419>] ? rt_spin_lock_slowlock+0x36/0x13e [<c04359f6>] ? check_preempt_wakeup+0x11b/0x174 [<c06ecb16>] ? _raw_spin_unlock_irqrestore+0x21/0x2c [<c04370dc>] ? try_to_wake_up+0x32d/0x341 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14 [<c06ed437>] error_code+0x73/0x78 [<c06ec419>] ? rt_spin_lock_slowlock+0x36/0x13e [<c04e4461>] ? __inode_add_to_lists+0xb3/0xba [<c06ec84a>] ? rt_spin_unlock+0x12/0x14 [<f85224ca>] ? kzalloc+0x10/0x12 [nfs] [<c0463773>] rt_spin_lock_fastlock+0x41/0x44 [<c06ec872>] rt_spin_lock+0x12/0x14 [<c04e412e>] igrab+0x19/0x39 [<f8523631>] nfs4_get_open_state+0xe5/0x15c [nfs] [<f8519056>] nfs4_opendata_to_nfs4_state+0x166/0x20a [nfs] [<f85194c5>] nfs4_do_open+0x213/0x2f0 [nfs] [<f851bdaf>] nfs4_atomic_open+0xb5/0x157 [nfs] [<c06e01a4>] ? quirk_nopciamd+0x46/0x4d [<c04cdd02>] ? slab_irq_enable+0x39/0x6c [<c04ceef1>] ? kmem_cache_alloc+0x3d/0xad [<c04e2aef>] ? d_alloc+0x22/0x170 [<f85051c2>] nfs_atomic_lookup+0x97/0xd7 [nfs] [<c04e2c32>] ? d_alloc+0x165/0x170 [<c04dad7a>] __lookup_hash+0xa2/0xbc [<c04dadbb>] lookup_hash+0x27/0x2a [<c04dd089>] do_filp_open+0x21a/0x7c9 [<c04a844f>] ? unlock_page+0x13/0x15 [<c04bcf54>] ? handle_mm_fault+0x148/0x623 [<c06ec84a>] ? rt_spin_unlock+0x12/0x14 [<c04e678b>] ? alloc_fd+0xc9/0xd3 [<c04d1849>] do_sys_open+0x58/0x10d [<c04642f3>] ? rt_up_read+0x1c/0x1e [<c04d1952>] sys_open+0x28/0x2e [<c04d197d>] sys_creat+0x25/0x27 [<c0402b93>] sysenter_do_call+0x12/0x22
Created attachment 416760 [details] nfs fix from John Stultz
The patch seems to have fixed the kernel panics, but now it uncovered the next bug, observed in almost all of the tests: ======= I386 http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359093 http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359078 http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359152 http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14359152 ... ------------[ cut here ]------------ WARNING: at fs/namespace.c:1203 umount_tree+0xce/0x10b() Hardware name: ProLiant DL320 G5 Modules linked in: nfs nfs_acl auth_rpcgss autofs4 i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill lockd sunrpc ipv6 loop dm_multipath scsi_dh video output sbs sbshc battery ac parport_pc lp parport tg3 sg ipmi_devintf hpilo ipmi_si ipmi_msghandler hpwdt snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm serio_raw button snd_timer snd soundcore tpm_tis snd_page_alloc tpm tpm_bios iTCO_wdt iTCO_vendor_support pcspkr dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod pata_acpi ata_piix ata_generic libata sd_mod crc_t10dif scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: nf_defrag_ipv4] Pid: 13811, comm: mount.nfs4 Tainted: G W 2.6.33.4-rt20.19.el5rt #1 Call Trace: [<c04e7ca7>] ? umount_tree+0xce/0x10b [<c043b03c>] warn_slowpath_common+0x7a/0x91 [<c04e7ca7>] ? umount_tree+0xce/0x10b [<c043b06d>] warn_slowpath_null+0x1a/0x1c [<c04e7ca7>] umount_tree+0xce/0x10b [<c04e83fe>] put_mnt_ns+0x83/0xa7 [<c04dc835>] ? vfs_path_lookup+0x7e/0x8d [<f84c8acb>] nfs_follow_remote_path+0x4b/0xf8 [nfs] [<c04cdd02>] ? slab_irq_enable+0x39/0x6c [<f84c8a76>] ? nfs_do_root_mount+0x70/0x7a [nfs] [<f84c8bdf>] nfs4_try_mount+0x67/0x9f [nfs] [<c04b86db>] ? strndup_user+0x45/0x64 [<f84c8e46>] nfs4_get_sb+0x22f/0x2a4 [nfs] [<c04d1595>] ? __alloc_percpu+0xf/0x12 [<c04d54ea>] vfs_kern_mount+0x8b/0x131 [<c04e6ba4>] ? get_fs_type+0x39/0x8d [<c04d5661>] do_kern_mount+0x3c/0xbb [<c04e92a9>] do_mount+0x5f9/0x669 [<c04bac08>] ? page_address+0x17/0x5b [<c04acdb9>] ? __get_free_pages+0x27/0x2d [<c04e7ae7>] ? copy_mount_options+0x2c/0xdd [<c04e9386>] sys_mount+0x6d/0x99 [<c0402b93>] sysenter_do_call+0x12/0x22 ---[ end trace d817ceec35a87c27 ]--- ======= X86_64 http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14372046 http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14371209 ... ------------[ cut here ]------------ WARNING: at fs/namespace.c:1203 umount_tree+0xf3/0x13a() Hardware name: ProLiant DL320 G5 Modules linked in: nfs nfs_acl auth_rpcgss autofs4 i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill lockd sunrpc ipv6 loop dm_multipath scsi_dh video output sbs sbshc battery ac parport_pc lp parport tg3 sg ipmi_devintf ipmi_si ipmi_msghandler hpwdt hpilo snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device button snd_pcm_oss snd_mixer_oss tpm_tis tpm snd_pcm serio_raw tpm_bios snd_timer snd soundcore snd_page_alloc shpchp iTCO_wdt i3000_edac pcspkr iTCO_vendor_support edac_core dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod pata_acpi ata_piix ata_generic libata sd_mod crc_t10dif scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: nf_defrag_ipv4] Pid: 23073, comm: mount.nfs4 Tainted: G W 2.6.33.4-rt20.19.el5rt #1 Call Trace: [<ffffffff8110cd74>] ? umount_tree+0xf3/0x13a [<ffffffff8104238b>] warn_slowpath_common+0x7c/0x94 [<ffffffff810423b7>] warn_slowpath_null+0x14/0x16 [<ffffffff8110cd74>] umount_tree+0xf3/0x13a [<ffffffff8110d674>] put_mnt_ns+0x9b/0xc5 [<ffffffffa049b654>] nfs_follow_remote_path+0x5c/0x12f [nfs] [<ffffffff810eca71>] ? slab_irq_enable+0x47/0x93 [<ffffffffa049b5e6>] ? nfs_do_root_mount+0x86/0x98 [nfs] [<ffffffff810d0497>] ? strndup_user+0x42/0x80 [<ffffffffa049b7a1>] nfs4_try_mount+0x7a/0xb4 [nfs] [<ffffffffa049ba53>] nfs4_get_sb+0x278/0x302 [nfs] [<ffffffff810f7957>] vfs_kern_mount+0x9e/0x14f [<ffffffff810f7b06>] do_kern_mount+0x4c/0xdd [<ffffffff8110e7e0>] do_mount+0x6d3/0x759 [<ffffffff8110e8ea>] sys_mount+0x84/0xbd [<ffffffff8108e447>] ? audit_syscall_entry+0x103/0x12f [<ffffffff81002d1b>] system_call_fastpath+0x16/0x1b ---[ end trace 1cec4b11ad46cb11 ]---
Just to confirm, its just a MNT_MOUNTED related WARN_ON, and the tests completed successfully? Did the kernel you were testing include commit e86825210e29c0be2691e7055f942555d02a4314 ?
Hrmm.. So looking at nfs_follow_remote_path() static int nfs_follow_remote_path(struct vfsmount *root_mnt, const char *export_path, struct vfsmount *mnt_target) { struct mnt_namespace *ns_private; struct nameidata nd; struct super_block *s; int ret; ns_private = create_mnt_ns(root_mnt); ret = PTR_ERR(ns_private); if (IS_ERR(ns_private)) goto out_mntput; ret = vfs_path_lookup(root_mnt->mnt_root, root_mnt, export_path, LOOKUP_FOLLOW, &nd); put_mnt_ns(ns_private); ... The issue here is we call create_mnt_ns(), with allocates the namespace, then seemingly never do anything with the new namespace then call put_mnt_ns, which tries to unmount it. Trying to read over the history for the rational here. Commit 301933a0acfdec837fd8b4884093b3f0fff01d8a seems to be where this is introduced, and has the rational. I'm still trying to figure out if its correct or if its an oddball case where we want to cleanup the vfsmount even if its not mounted.
Bah. Copied the wrong hash. c02d7adf8c5429727a98bad1d039bccad4c61c50 is where it was introduced.
(In reply to comment #3) > Just to confirm, its just a MNT_MOUNTED related WARN_ON, and the tests > completed successfully? > > Did the kernel you were testing include commit > e86825210e29c0be2691e7055f942555d02a4314 ? Yes, it does include the above commit.