Description of problem: Testsuite case in frysk exercises the kernel with tracing a small executable (using strace), where the executable exec's a given command in a separate thread. When that non-main thread completed, a kernel bug is triggered. Version-Release number of selected component (if applicable): kernel 2.6.19-1.2895.fc6 How reproducible: About 3 out of 5 attempts. Steps to Reproduce: 1. Compile threadexec.c (attached) 2. Issue the command: strace -f -o /dev/null ./threadexec /bin/echo hi | grep hi 3. Repeat a few times if necessary Actual results: Program execution will print 'hi' on a single line. Kernel BUG will be reported in kernel messages. Expected results: Program execution should print 'hi' on a single line. Additional info: The bugzilla report for frysk is: http://sourceware.org/bugzilla/show_bug.cgi?id=3820 A similar earlier problem existed that prompted this test case to be written. The original frysk bug was: http://sourceware.org/bugzilla/show_bug.cgi?id=2130
Created attachment 147721 [details] Executable that is to be strace'd
This should already be fixed by pulling the current utrace patch (backported) into fc6.
This is still happening with the 2.6.19-1.2911.6.5.fc6 kernel that was released a few days ago. Caused the following kernel log entries, resulting in the machine getting stuck, requiring a power cycle. Mar 6 19:30:07 ca-tools2 kernel: BUGging on (tsk->exit_state != EXIT_ZOMBIE) Mar 6 19:30:07 ca-tools2 kernel: ------------[ cut here ]------------ Mar 6 19:30:07 ca-tools2 kernel: kernel BUG at kernel/utrace.c:193! Mar 6 19:30:07 ca-tools2 kernel: invalid opcode: 0000 [#1] Mar 6 19:30:07 ca-tools2 kernel: SMP Mar 6 19:30:07 ca-tools2 kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed Mar 6 19:30:07 ca-tools2 kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 acpi_cpufreq freq_table dm_mirror dm_multipath dm_mod video sbs i2c_ec i2c_core button battery ac lp floppy joydev sg snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm tg3 snd_timer ide_cd snd cdrom pcspkr serio_raw soundcore parport_pc snd_page_alloc parport iTCO_wdt ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Mar 6 19:30:07 ca-tools2 kernel: CPU: 1 Mar 6 19:30:07 ca-tools2 kernel: EIP: 0060:[<c04522fb>] Not tainted VLI Mar 6 19:30:07 ca-tools2 kernel: EFLAGS: 00010296 (2.6.19-1.2911.6.5.fc6 #1) Mar 6 19:30:07 ca-tools2 kernel: EIP is at check_dead_utrace+0x52/0x10d Mar 6 19:30:07 ca-tools2 kernel: eax: 0000002f ebx: c21d8b70 ecx: c068b8d0 edx: 00000082 Mar 6 19:30:07 ca-tools2 kernel: esi: 00000000 edi: ef866560 ebp: ef866560 esp: efd28df8 Mar 6 19:30:07 ca-tools2 kernel: ds: 007b es: 007b ss: 0068 Mar 6 19:30:07 ca-tools2 kernel: Process threadexec (pid: 15892, ti=efd28000 task=c21d8b70 task.ti=efd28000) Mar 6 19:30:07 ca-tools2 kernel: Stack: c06360d3 c06416b9 ef866568 00000000 ef866568 ef866560 c0452410 00000000 Mar 6 19:30:07 ca-tools2 kernel: c21d8b70 ef866568 00000020 c21d8b70 ef866560 c0453782 00000000 00000000 Mar 6 19:30:07 ca-tools2 kernel: f75695f8 00000246 c21d9018 f75695c0 efe70bc0 ef866570 00000010 ef866560 Mar 6 19:30:07 ca-tools2 kernel: Call Trace: Mar 6 19:30:07 ca-tools2 kernel: [<c0452410>] remove_detached+0x5a/0x6d Mar 6 19:30:07 ca-tools2 kernel: [<c0453782>] utrace_report_death+0x221/0x229 Mar 6 19:30:07 ca-tools2 kernel: [<c0426527>] do_exit+0x6f5/0x79b Mar 6 19:30:07 ca-tools2 kernel: [<c042665a>] sys_exit_group+0x0/0xd Mar 6 19:30:07 ca-tools2 kernel: [<c042f046>] get_signal_to_deliver+0x38b/0x3b3 Mar 6 19:30:07 ca-tools2 kernel: [<c0403658>] do_notify_resume+0x83/0x6c6 Mar 6 19:30:07 ca-tools2 kernel: [<c04040ee>] work_notifysig+0x13/0x25 Mar 6 19:30:07 ca-tools2 kernel: ======================= Mar 6 19:30:07 ca-tools2 kernel: Code: f0 f7 d0 83 e2 08 85 d0 0f 84 80 00 00 00 85 c9 74 7c 83 f9 10 74 1c c7 44 24 04 b9 16 64 c0 c7 04 24 d3 60 63 c0 e8 8a 16 fd ff <0f> 0b c1 00 8b 16 64 c0 83 bb 98 00 00 00 ff 75 33 b8 20 00 00 Mar 6 19:30:07 ca-tools2 kernel: EIP: [<c04522fb>] check_dead_utrace+0x52/0x10d SS:ESP 0068:efd28df8 Mar 6 19:30:07 ca-tools2 kernel: <1>Fixing recursive fault but reboot is needed! Mar 7 06:50:13 ca-tools2 kernel: BUGging on (tsk->exit_state != EXIT_ZOMBIE) Mar 7 06:50:13 ca-tools2 kernel: ------------[ cut here ]------------ Mar 7 06:50:13 ca-tools2 kernel: kernel BUG at kernel/utrace.c:193! Mar 7 06:50:13 ca-tools2 kernel: invalid opcode: 0000 [#2] Mar 7 06:50:13 ca-tools2 kernel: SMP Mar 7 06:50:13 ca-tools2 kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed Mar 7 06:50:13 ca-tools2 kernel: Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 acpi_cpufreq freq_table dm_mirror dm_multipath dm_mod video sbs i2c_ec i2c_core button battery ac lp floppy joydev sg snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm tg3 snd_timer ide_cd snd cdrom pcspkr serio_raw soundcore parport_pc snd_page_alloc parport iTCO_wdt ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Mar 7 06:50:13 ca-tools2 kernel: CPU: 0 Mar 7 06:50:13 ca-tools2 kernel: EIP: 0060:[<c04522fb>] Not tainted VLI Mar 7 06:50:13 ca-tools2 kernel: EFLAGS: 00010296 (2.6.19-1.2911.6.5.fc6 #1) Mar 7 06:50:13 ca-tools2 kernel: EIP is at check_dead_utrace+0x52/0x10d Mar 7 06:50:13 ca-tools2 kernel: eax: 0000002f ebx: ed4a8690 ecx: c068b8d0 edx: 00000082 Mar 7 06:50:13 ca-tools2 kernel: esi: 00000000 edi: ef8662c0 ebp: ef8662c0 esp: dc7e1df8 Mar 7 06:50:13 ca-tools2 kernel: ds: 007b es: 007b ss: 0068 Mar 7 06:50:13 ca-tools2 kernel: Process threadexec (pid: 12960, ti=dc7e1000 task=ed4a8690 task.ti=dc7e1000) Mar 7 06:50:13 ca-tools2 kernel: Stack: c06360d3 c06416b9 ef8662c8 00000000 ef8662c8 ef8662c0 c0452410 0000000Mar 7 06:50:13 ca-tools2 kernel: ed4a8690 ef8662c8 00000020 ed4a8690 ef8662c0 c0453782 00000000 00000000 Mar 7 06:50:13 ca-tools2 kernel: f70d2af8 00000246 ed4a8b38 f70d2ac0 cf8d46e0 ef8662d0 00000010 ef8662c0 Mar 7 06:50:13 ca-tools2 kernel: Call Trace: Mar 7 06:50:13 ca-tools2 kernel: [<c0452410>] remove_detached+0x5a/0x6d Mar 7 06:50:13 ca-tools2 kernel: [<c0453782>] utrace_report_death+0x221/0x229 Mar 7 06:50:13 ca-tools2 kernel: [<c0426527>] do_exit+0x6f5/0x79b Mar 7 06:50:13 ca-tools2 kernel: [<c042665a>] sys_exit_group+0x0/0xd Mar 7 06:50:13 ca-tools2 kernel: [<c042f046>] get_signal_to_deliver+0x38b/0x3b3 Mar 7 06:50:13 ca-tools2 kernel: [<c0403658>] do_notify_resume+0x83/0x6c6 Mar 7 06:50:13 ca-tools2 kernel: [<c04040ee>] work_notifysig+0x13/0x25 Mar 7 06:50:13 ca-tools2 kernel: ======================= Mar 7 06:50:13 ca-tools2 kernel: Code: f0 f7 d0 83 e2 08 85 d0 0f 84 80 00 00 00 85 c9 74 7c 83 f9 10 74 1c c7 44 24 04 b9 16 64 c0 c7 04 24 d3 60 63 c0 e8 8a 16 fd ff <0f> 0b c1 00 8b 16 64 c0 83 bb 98 00 00 00 ff 75 33 b8 20 00 00 Mar 7 06:50:13 ca-tools2 kernel: EIP: [<c04522fb>] check_dead_utrace+0x52/0x10d SS:ESP 0068:dc7e1df8 0 Mar 7 06:50:13 ca-tools2 kernel: <1>Fixing recursive fault but reboot is needed! Mar 7 06:50:23 ca-tools2 kernel: BUG: soft lockup detected on CPU#1! Mar 7 06:50:23 ca-tools2 kernel: [<c040566b>] dump_trace+0x69/0x1b6 Mar 7 06:50:23 ca-tools2 kernel: [<c04057d0>] show_trace_log_lvl+0x18/0x2c Mar 7 06:50:23 ca-tools2 kernel: [<c0405dc6>] show_trace+0xf/0x11 Mar 7 06:50:23 ca-tools2 kernel: [<c0405ec3>] dump_stack+0x15/0x17 Mar 7 06:50:23 ca-tools2 kernel: [<c044e18d>] softlockup_tick+0xad/0xc4 Mar 7 06:50:23 ca-tools2 kernel: [<c042cc54>] update_process_times+0x39/0x5c Mar 7 06:50:23 ca-tools2 kernel: [<c0415f2d>] smp_apic_timer_interrupt+0x92/0xb1 Mar 7 06:50:23 ca-tools2 kernel: [<c040506f>] apic_timer_interrupt+0x1f/0x30 Mar 7 06:50:23 ca-tools2 kernel: [<c04ef677>] _raw_spin_lock+0x6f/0xdc Mar 7 06:50:23 ca-tools2 kernel: [<c0452263>] utrace_release_task+0x37/0x64 Mar 7 06:50:23 ca-tools2 kernel: [<c0425071>] release_task+0x1d/0x2f0 Mar 7 06:50:23 ca-tools2 kernel: [<c04756e5>] flush_old_exec+0x2e3/0x841 Mar 7 06:50:23 ca-tools2 kernel: [<c0499dbd>] load_elf_binary+0x479/0x15a0 Mar 7 06:50:23 ca-tools2 kernel: [<c0474cf5>] search_binary_handler+0xa4/0x260 Mar 7 06:50:23 ca-tools2 kernel: [<c04765d5>] do_execve+0x158/0x1f5 Mar 7 06:50:23 ca-tools2 kernel: [<c0402344>] sys_execve+0x2f/0x4f Mar 7 06:50:23 ca-tools2 kernel: [<c0404057>] syscall_call+0x7/0xb Mar 7 06:50:23 ca-tools2 kernel: [<40000402>] 0x40000402 Mar 7 06:50:23 ca-tools2 kernel: ======================= Mar 7 06:51:52 ca-tools2 kernel: BUG: spinlock lockup on CPU#1, threadexec/12960, ef8662d0 (Not tainted) Mar 7 06:51:52 ca-tools2 kernel: [<c040566b>] dump_trace+0x69/0x1b6 Mar 7 06:51:52 ca-tools2 kernel: [<c04057d0>] show_trace_log_lvl+0x18/0x2c Mar 7 06:51:52 ca-tools2 kernel: [<c0405dc6>] show_trace+0xf/0x11 Mar 7 06:51:52 ca-tools2 kernel: [<c0405ec3>] dump_stack+0x15/0x17 Mar 7 06:51:52 ca-tools2 kernel: [<c04ef6c7>] _raw_spin_lock+0xbf/0xdc Mar 7 06:51:52 ca-tools2 kernel: [<c0452263>] utrace_release_task+0x37/0x64 Mar 7 06:51:52 ca-tools2 kernel: [<c0425071>] release_task+0x1d/0x2f0 Mar 7 06:51:52 ca-tools2 kernel: [<c04756e5>] flush_old_exec+0x2e3/0x841 Mar 7 06:51:52 ca-tools2 kernel: [<c0499dbd>] load_elf_binary+0x479/0x15a0 Mar 7 06:51:52 ca-tools2 kernel: [<c0474cf5>] search_binary_handler+0xa4/0x260 Mar 7 06:51:52 ca-tools2 kernel: [<c04765d5>] do_execve+0x158/0x1f5 Mar 7 06:51:52 ca-tools2 kernel: [<c0402344>] sys_execve+0x2f/0x4f Mar 7 06:51:52 ca-tools2 kernel: [<c0404057>] syscall_call+0x7/0xb Mar 7 06:51:52 ca-tools2 kernel: [<40000402>] 0x40000402 Mar 7 06:51:52 ca-tools2 kernel: =======================
I believe my current utrace patches now fix this bug (http://redhat.com/~roland/utrace/). Those will go into rawhide very soon I'm sure, but need to be backported for an FC6 kernel update.
Should be fixed in kernel 2.6.20-1.2925.fc6. Please test.
Using 2.6.20-1.2925.fc6, we experience total system hangs every so often while executing the frysk test suite. It is a lot more reproducible on i386 than it is on x86_64, or so it seems on the tests systems, but both systems have experienced these hangs. The hang is such that no kernel messages are written to kernel logs, making it a bit harder to capture what is going on and where exactly it is crashing. Verbatim copy of the console state at time of the crash follows: Stack: ffff81000483b160 ffff810037e62500 ffff8100613820c0 000000000000000a 0000000000000001 ffffffff802ccadc 0000000000000009 ffff8100b707a2c0 ffff81000483f1a0 ffffffff80294fdb ffff81000483f2a0 0000000000000000 Call Trace: <IRQ> [<ffffffff802ccadc>] kmem_rcu_free+0x19/0x38 [<ffffffff80294fdb>] __rcu_process_callbacks+0x12d/0x1bc [<ffffffff8029508d>] rcu_process_callbacks+0x23/0x43 [<ffffffff8028b742>] tasklet_action+0x53/0x9d [<ffffffff8025a23c>] call_softirq+0x1c/0x28 [<ffffffff80211f97>] __do_softirq+0x55/0x2c [<ffffffff8025a23c>] call_softirq+0x1c/0x28 <EOI> [<ffffffff8028b630>] ksoftirqd+0x0/0xbf [<ffffffff8026758d>] do_softirq+0x2c/0x85 [<ffffffff8028b68f>] ksoftirqd+0x5f/0xbf [<ffffffff802317ea>] kthread+0xd0/0xff [<ffffffff80259ec8>] child_rip+0xa/0x12 [<ffffffff8023171a>] kthread+0x0/0xff [<ffffffff80259ebe>] child_rip+0x0/0x12 Code: 0f 0b eb fe 0f ba 33 07 48 83 c3 40 48 ff ca 48 83 fa ff 75 RIP [<ffffffff802cab21>] kmem_freepages+0xef/0x132 RSP <ffff8100df623ee8> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
2.6.20-1.2933 has a new version of utrace. please test that.
Running with 2.6.20-1.2948.fc6 we no longer see this failure take place.