Description of problem:
fsstress via nfs leads to kernel panic

Version-Release number of selected component (if applicable):
[root@intel-d3c69-01 ~]# uname -a
Linux intel-d3c69-01.rhts.eng.bos.redhat.com 2.6.32-118.el6.i686 #1 SMP Tue Feb 22 11:12:47 EST 2011 i686 i686 i386 GNU/Linux

How reproducible:
Not sure, I haven't tested it multiple times

Steps to Reproduce:
1. Install fsstress from LTP
2. Set up an nfs server, which could be:
   mkdir /home/testdir; mkdir /mnt/nfs
   echo "/home/testdir *(rw,no_root_squash)" >> /etc/exports
   service nfs start
   mount -t nfs localhost:/home/testdir /mnt/nfs
3. Run fsstress
   fsstress -d /mnt/nfs -n 1000 -p 1000
4. Wait for the kernel panic; I guess it may take several hours

Actual results:
Kernel panic with the following call trace:

BUG: unable to handle kernel NULL pointer dereference at 00000504
IP: [<c08227a5>] _spin_lock_irqsave+0x15/0x30
*pdpt = 000000002b7f0001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1e.0/0000:01:01.0/local_cpus
Modules linked in: nfs fscache nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc cpufreq_ondemand acpi_cpufreq ipv6 dm_mirror dm_region_hash dm_log ppdev parport_pc parport e100 mii microcode serio_raw i2c_i801 sg snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000e ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif firewire_ohci firewire_core crc_itu_t ahci i915 drm_kms_helper drm i2c_algo_bit i2c_core video output dm_mod [last unloaded: scsi_wait_scan]

Pid: 26595, comm: fsstress Not tainted (2.6.32-118.el6.i686 #1) Product Name To Be Filled By O.E.M.
EIP: 0060:[<c08227a5>] EFLAGS: 00010002 CPU: 2
EIP is at _spin_lock_irqsave+0x15/0x30
EAX: 00000100 EBX: 00000002 ECX: 00000504 EDX: 00000001
ESI: 0000000b EDI: 00000001 EBP: bf85a4f8 ESP: ec3a5f80
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process fsstress (pid: 26595, ti=ec3a4000 task=dadae030 task.ti=ec3a4000)
Stack:
 dadae000 c04691e9 c08278f4 00000060 ec3a5fb4 bf85a510 c08235d0 bf85a4f8
<0> c0823615 08049490 51eb851f bf85be58 c0823257 bf85be58 bf85a510 00000000
<0> bf85a510 bf85a510 bf85a4f8 bf85a510 0000007b 0000007b 00000000 00000033
Call Trace:
 [<c04691e9>] ? force_sig_info+0x29/0xd0
 [<c08278f4>] ? iret_exc+0x68/0xa6e
 [<c08235d0>] ? do_device_not_available+0x0/0x60
 [<c0823615>] ? do_device_not_available+0x45/0x60
 [<c0823257>] ? error_code+0x73/0x78
Code: 30 d2 89 d0 c3 90 f0 83 28 01 79 05 e8 45 ff ff ff c3 8d 74 26 00 53 89 c1 9c 58 8d 74 26 00 89 c3 fa 90 8d 74 26 00 66 b8 00 01 <f0> 66 0f c1 01 38 e0 74 0e f3 90 8a 01 eb f6 66 83 39 00 75 f4
EIP: [<c08227a5>] _spin_lock_irqsave+0x15/0x30 SS:ESP 0068:ec3a5f80
CR2: 0000000000000504
---[ end trace 62bb013f88be2a00 ]---
Kernel panic - not syncing: Fatal exception
Pid: 26595, comm: fsstress Tainted: G D ---------------- 2.6.32-118.el6.i686 #1
Call Trace:
 [<c0820067>] ? panic+0x42/0xf9
 [<c0823e6c>] ? oops_end+0xbc/0xd0
 [<c04323b2>] ? no_context+0xc2/0x190
 [<c043262b>] ? bad_area+0x3b/0x50
 [<c0432ace>] ? __do_page_fault+0x34e/0x420
 [<c082590a>] ? do_page_fault+0x2a/0x90
 [<c0823ad8>] ? do_general_protection+0x48/0x210
 [<c08258e0>] ? do_page_fault+0x0/0x90
 [<c0823257>] ? error_code+0x73/0x78
 [<c08227a5>] ? _spin_lock_irqsave+0x15/0x30
 [<c04691e9>] ? force_sig_info+0x29/0xd0
 [<c08278f4>] ? iret_exc+0x68/0xa6e
 [<c08235d0>] ? do_device_not_available+0x0/0x60
 [<c0823615>] ? do_device_not_available+0x45/0x60
 [<c0823257>] ? error_code+0x73/0x78
panic occurred, switching back to text console

Expected results:
No panic, test passes

Additional info:
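For reference, the reproduction steps above can be rolled into a single script. This is only a sketch of the setup described in this report; the assumption that fsstress (built from LTP) is already in $PATH and the extra exportfs call are mine, not part of the original run:

#!/bin/sh
# Sketch of the loopback-NFS reproducer described above.
# Assumes fsstress from LTP is already installed and in $PATH.
mkdir -p /home/testdir /mnt/nfs
echo "/home/testdir *(rw,no_root_squash)" >> /etc/exports
service nfs start            # RHEL 6 style init script
exportfs -ra                 # re-read /etc/exports in case nfsd was already running
mount -t nfs localhost:/home/testdir /mnt/nfs
# 1000 operations per process across 1000 processes; the panic (if it
# reproduces at all) reportedly takes several hours to show up.
fsstress -d /mnt/nfs -n 1000 -p 1000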
I'll test more with and without nfs, and update the results here.
fsstress blocked for more than 120 seconds on another i386 host; it did not panic this time.

INFO: task fsstress:5984 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D f011dba8 0 5984 5975 0x00000000
 f1d18030 00000086 c0540345 f011dba8 002a1068 5c5c8b82 00000000 00000004 c1f081a0 f06ce900 00000a72 c0ade1a0 c0ade1a0 f1d182d8 c0ade1a0 c0ad9bd4 c0ade1a0 6c81473b 00000a72 f1d182d8 ffffffff 6c812a96 c1f48700 f72c405c
Call Trace:
 [<c0540345>] ? mntput_no_expire+0x15/0xd0
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c043f931>] ? enqueue_task_fair+0x31/0x70
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:5985 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D db38bf4c 0 5985 5975 0x00000000
 f1e98030 00000082 00000002 db38bf4c c1f03bd4 00000000 00000000 00000003 c1ec81a0 f06ce740 00000a75 c0ade1a0 c0ade1a0 f1e982d8 c0ade1a0 c0ad9bd4 c0ade1a0 0a3c2424 00000a75 f1e982d8 c1f03bd4 0a3c0b63 00aadd6a f1e98030
Call Trace:
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6001 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D c1ec3d00 0 6001 5975 0x00000000
 f07ffab0 00000086 00000400 c1ec3d00 00000002 efd3ff48 efd3ff44 00000006 c1f881a0 f2bdd200 00000a70 c0ade1a0 c0ade1a0 f07ffd58 c0ade1a0 c0ad9bd4 c0ade1a0 c302a8b2 00000a70 f07ffd58 c1ec3bd4 00000400 00aa9988 f07ffab0
Call Trace:
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6007 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D e5799f4c 0 6007 5975 0x00000000
 d7ee7ab0 00000086 00000002 e5799f4c c1e83bd4 00000000 00000000 c09fa020 c09fa020 f0669ac0 00000a6e c0ade1a0 c0ade1a0 d7ee7d58 c0ade1a0 c0ad9bd4 c0ade1a0 bd6faf97 00000a6e d7ee7d58 c1e83bd4 c040b1c0 00aa73a6 d7ee7ab0
Call Trace:
 [<c040b1c0>] ? do_IRQ+0x50/0xc0
 [<c040a030>] ? common_interrupt+0x30/0x38
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6009 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D d7139f4c 0 6009 5975 0x00000000
 d7ee7030 00000082 00000002 d7139f4c c1f43bd4 00000000 00000000 00000002 c1e881a0 f0669e40 00000a6f c0ade1a0 c0ade1a0 d7ee72d8 c0ade1a0 c0ad9bd4 c0ade1a0 0df3affc 00000a6f d7ee72d8 c1f43bd4 0df398ac 00aa78d2 d7ee7030
Call Trace:
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6015 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D c1f03e00 0 6015 5975 0x00000000
 ec643030 00000086 000007ff c1f03e00 00000002 d7057f48 d7057f44 00000002 c1e881a0 f05b43c0 00000a71 c0ade1a0 c0ade1a0 ec6432d8 c0ade1a0 c0ad9bd4 c0ade1a0 4e09ccc2 00000a71 ec6432d8 c1f03bd4 000007ff 00aa9e30 ec643030
Call Trace:
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6016 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D 00000004 0 6016 5975 0x00000000
 deb3dab0 00000082 f70c2000 00000004 00000001 c0463325 d36273e0 000048e5 00000000 f05b4c80 00000a6c c0ade1a0 c0ade1a0 deb3dd58 c0ade1a0 c0ad9bd4 c0ade1a0 d362840c 00000a6c deb3dd58 efbd2000 c09f99e4 c04b6115 c0459b15
Call Trace:
 [<c0463325>] ? run_timer_softirq+0x35/0x2c0
 [<c04b6115>] ? rcu_process_callbacks+0x35/0x40
 [<c0459b15>] ? __do_softirq+0xb5/0x1b0
 [<c0459d75>] ? irq_exit+0x35/0x70
 [<c0427523>] ? smp_apic_timer_interrupt+0x53/0x90
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6039 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D d855e888 0 6039 5975 0x00000000
 efb85030 00000086 f87edca0 d855e888 0029ffba f118ce48 00000000 0000355b 00000000 f1e2c200 00000a6e c0ade1a0 c0ade1a0 efb852d8 c0ade1a0 c0ad9bd4 c0ade1a0 7d460832 00000a6e efb852d8 ef4da000 7d45e1b0 c1f88700 f72c405c
Call Trace:
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c043f931>] ? enqueue_task_fair+0x31/0x70
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6048 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D c1e83e00 0 6048 5975 0x00000000
 d1b32030 00000082 00000400 c1e83e00 00000002 d5ef5f48 d5ef5f44 00000000 c1e081a0 f072e040 00000a6f c0ade1a0 c0ade1a0 d1b322d8 c0ade1a0 c0ad9bd4 c0ade1a0 819cc28d 00000a6f d1b322d8 c1e83bd4 00000400 00aa8035 d1b32030
Call Trace:
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
INFO: task fsstress:6055 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D ed888c70 0 6055 5975 0x00000000
 ed889ab0 00000086 ef575c58 ed888c70 002a04bb 4cfe0106 00000000 000080c5 00000000 f0192900 00000a6f c0ade1a0 c0ade1a0 ed889d58 c0ade1a0 c0ad9bd4 c0ade1a0 79882cc4 00000a6f ed889d58 ed914000 7988195e c1e48700 f72c405c
Call Trace:
 [<c043f41d>] ? enqueue_entity+0x37d/0x400
 [<c043f931>] ? enqueue_task_fair+0x31/0x70
 [<c08218f8>] ? __mutex_lock_slowpath+0xd8/0x140
 [<c08217fd>] ? mutex_lock+0x1d/0x40
 [<c054d7b0>] ? sync_filesystems+0x10/0x100
 [<c054d8de>] ? sys_sync+0xe/0x40
 [<c0409adf>] ? sysenter_do_call+0x12/0x28
This is not a regression; it also happens on the 6.0 GA kernel.
Also found this on an s390x host, with a different call trace; I'm not sure whether they share the same root cause.

 [<00000000004be12e>] mutex_lock+0x5a/0x60
 [<00000000002800be>] sync_filesystems+0x3a/0x184
 [<000000000028028e>] sys_sync+0x32/0x64
 [<0000000000118464>] sysc_tracego+0xe/0x14
 [<0000020000137f1a>] 0x20000137f1a
INFO: task fsstress:2198 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D 00000000004be032 0 2198 2186 0x00000200
 000000001d107bb0 00000000010e4e00 000000001d107bb0 000000001d107bd8 000000000380f518 00000000008a5e00 00000000010e4e00 000000000380f518 000000000380f518 0000000000000000 00000000024ee990 000000000080ee98 00000000008a5e00 00000000024eee28 000000000380f4e0 00000000010e4e00 00000000004c6c78 00000000004bcbae 000000001d107c10 000000001d107dc8
Call Trace:
 ([<00000000004bcbae>] schedule+0x5aa/0xf84)
 [<00000000004be032>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004be12e>] mutex_lock+0x5a/0x60
 [<00000000002800be>] sync_filesystems+0x3a/0x184
 [<000000000028028e>] sys_sync+0x32/0x64
 [<0000000000118464>] sysc_tracego+0xe/0x14
 [<0000020000137f1a>] 0x20000137f1a
INFO: task fsstress:2199 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fsstress D 00000000004be032 0 2199 2186 0x00000200
 000000001f1b3bb0 00000000010e4e00 000000001f1b3bb0 000000001f1b3bd8 000000001d533418 00000000008a5e00 00000000010e4e00 000000001d533418 000000001d533418 0000000000000000 00000000050ba540 000000000080ee98 00000000008a5e00 00000000050ba9d8 000000001d5333e0 00000000010e4e00 00000000004c6c78 00000000004bcbae 000000001f1b3c10 000000001f1b3dc8
Call Trace:
 ([<00000000004bcbae>] schedule+0x5aa/0xf84)
 [<00000000004be032>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004be12e>] mutex_lock+0x5a/0x60
 [<00000000002800be>] sync_filesystems+0x3a/0x184
 [<000000000028028e>] sys_sync+0x32/0x64
 [<0000000000118464>] sysc_tracego+0xe/0x14
 [<0000020000137f1a>] 0x20000137f1a

Below is the output of "echo w > /proc/sysrq-trigger"; it doesn't seem very informative ...
 [<00000000002800be>] sync_filesystems+0x3a/0x184
 [<000000000028028e>] sys_sync+0x32/0x64
 [<0000000000118464>] sysc_tracego+0xe/0x14
 [<0000020000137f1a>] 0x20000137f1a
fsstress D 00000000004be032 0 3018 2186 0x00000200
 0000000017677bb0 00000000010e4e00 0000000017677bb0 0000000017677bd8 0000000017c1c478 00000000008a5e00 00000000010e4e00 0000000017c1c478 0000000017c1c478 0000000000000001 0000000017677e00 000000000080ee98 00000000008a5e00 000000001782ad28 0000000017c1c440 00000000010e4e00 00000000004c6c78 00000000004bcbae 0000000017677c10 0000000017677dc8
Call Trace:
 ([<00000000004bcbae>] schedule+0x5aa/0xf84)
 [<00000000004be032>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004be12e>] mutex_lock+0x5a/0x60
 [<00000000002800be>] sync_filesystems+0x3a/0x184
 [<000000000028028e>] sys_sync+0x32/0x64
 [<0000000000118464>] sysc_tracego+0xe/0x14
 [<0000020000137f1a>] 0x20000137f1a
fsstress D 00000000004be032 0 3019 2186 0x00000200
 000000001799fbb0 00000000010e4e00 000000001799fbb0 000000001799fbd8 00000000024ba578 00000000008a5e00 00000000010e4e00 00000000024ba578 00000000024ba578 0000000000000000 000000001782a040 000000000080ee98 00000000008a5e00 000000001782a4d8 00000000024ba540 00000000010e4e00 00000000004c6c78 00000000004bcbae 000000001799fc10 000000001799fdc8
Call Trace:
 ([<00000000004bcbae>] schedule+0x5aa/0xf84)
 [<00000000004be032>] __mutex_lock_slowpath+0xa6/0x148
 [<00000000004be12e>] mutex_lock+0x5a/0x60
.rt_runtime : 950.000000
runnable tasks:
 task PID tree-key switches prio exec-runtime sum-exec sum-sleep
-------------------------------------------------------------------------------- --------------------------
R bash 2017 96765.804301 87 120 96765.804301 1 80.667980 245534.321129 /
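For anyone re-running this, a minimal sketch of how a sysrq-w report like the one above can be captured; the output file name is my own choice, not something from the original run:

echo 1 > /proc/sys/kernel/sysrq    # make sure the magic SysRq interface is enabled
echo w > /proc/sysrq-trigger       # dump all blocked (uninterruptible) tasks to the kernel log
dmesg > /tmp/sysrq-w.txt           # the report lands in the kernel ring buffer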
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Created attachment 492022 [details]
Call trace and sysrq-w output

Got a similar call trace on an x86_64 host when testing xfs without nfs, so this seems to be a filesystem-independent issue. The host did not hang; fsstress eventually finished. Changing platform to ALL.
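The "xfs without nfs" run presumably amounts to something like the following; /dev/sdb1 and /mnt/xfs are placeholders I picked for illustration, not the actual devices on the test host:

mkfs.xfs -f /dev/sdb1                  # scratch device, its contents are destroyed
mkdir -p /mnt/xfs
mount /dev/sdb1 /mnt/xfs
fsstress -d /mnt/xfs -n 1000 -p 1000   # same workload as the NFS case, run directly on xfs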
The testing mentioned in comment 7 was performed on the 2.6.32-131.0.1.el6 kernel. The panic described in comment 0 has not been seen a second time.
(In reply to comment #0)
> Description of problem:
> fsstress via nfs leads to kernel panic
>
> Version-Release number of selected component (if applicable):
> [root@intel-d3c69-01 ~]# uname -a
> Linux intel-d3c69-01.rhts.eng.bos.redhat.com 2.6.32-118.el6.i686 #1 SMP Tue Feb
> 22 11:12:47 EST 2011 i686 i686 i386 GNU/Linux
>
> How reproducible:
> Not sure, I haven't tested it multiple times
>
> Steps to Reproduce:
> 1. Install fsstress from LTP
> 2. Set up an nfs server, which could be:
> mkdir /home/testdir; mkdir /mnt/nfs
> echo "/home/testdir *(rw,no_root_squash)" >> /etc/exports
> service nfs start
> mount -t nfs localhost:/home/testdir /mnt/nfs
> 3. Run fsstress
> fsstress -d /mnt/nfs -n 1000 -p 1000
> 4. Wait for the kernel panic; I guess it may take several hours

Does this panic/hang happen when a non-loopback (a remote server) mount point is used?
(In reply to comment #9)
> Does this panic/hang happen when a non-loopback (a remote server)
> mount point is used?

I re-tested on the 2.6.32-131.0.15.el6 kernel and the issue seems to be gone: I saw neither a panic nor a hang. I tested both a locally mounted nfs and a remotely mounted nfs.
Ok, thanks. Closing bug per comment #10. Please reopen if this reappears.