Bug 436678

Summary: ptrace: Testcase block-step crashes the machine
Product: [Fedora] Fedora
Reporter: Jan Kratochvil <jan.kratochvil>
Component: kernel
Assignee: Roland McGrath <roland>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: high
Version: rawhide
CC: kernel-mgr
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-03-11 18:29:09 UTC

Description Jan Kratochvil 2008-03-09 06:16:33 UTC
Description of problem:
Running the ptrace testsuite http://sourceware.org/systemtap/wiki/utrace/tests
crashes the machine even with the plain `make check', which is meant to run
only the non-crashing tests.

Version-Release number of selected component (if applicable):
kernel-2.6.25-0.101.rc4.git3.fc9.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. tests/block-step

Actual results:
general protection fault: 0000 [1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: snd_hda_intel snd_usb_audio snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc snd_usb_lib snd_rawmidi snd_seq_device snd_hwdep snd soundcore
nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 dm_mirror dm_mod uinput
parport_pc parport floppy pcspkr 8139too 8139cp mii button sr_mod cdrom sg
ata_piix ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
[last unloaded: freq_table]
Pid: 1831, comm: block-step Not tainted 2.6.25-0.101.rc4.git3.fc9 #1
RIP: 0010:[<ffffffff8100a88b>]  [<ffffffff8100a88b>] __switch_to+0x244/0x2fb
RSP: 0018:ffff81000dcd1c68  EFLAGS: 00000046
RAX: 0000000000000002 RBX: ffff81000dce4000 RCX: 00000000000001d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff81000dcd0000
RBP: ffff81000dcd1c98 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff8129e70c R11: ffff81000dca08f0 R12: ffff81000dca0000
R13: 0000000000000000 R14: ffff810001069c80 R15: 0000000000000000
FS:  00007f6c70c306f0(0000) GS:ffffffff81415000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003b986ce650 CR3: 000000000dc87000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process block-step (pid: 1831, threadinfo ffff81000dcd0000, task ffff81000dce4000)
Stack:  ffff81000dcd1ca8 0000000000000046 ffff81000106c180 ffff81000e0da000
 ffff8100164f2a00 0000000000000000 ffff81000dce4000 ffffffff8129edc7
 ffff81000dcd1d88 0000000000000046 ffff81000dcd1cd8 0000000000000296
Call Trace:
 [<ffffffff8129edc7>] thread_return+0x0/0xac
 [<ffffffff81012106>] ? native_sched_clock+0x50/0x6d
 [<ffffffff8103fcfd>] ? ptrace_stop+0x199/0x1a2
 [<ffffffff810410ec>] ? get_signal_to_deliver+0xc8/0x324
 [<ffffffff8100b329>] ? do_notify_resume+0xb2/0x8bd
 [<ffffffff810539af>] ? trace_hardirqs_on+0xf1/0x115
 [<ffffffff81012106>] ? native_sched_clock+0x50/0x6d
 [<ffffffff812a0e6e>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff810539af>] ? trace_hardirqs_on+0xf1/0x115
 [<ffffffff812a0e6e>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff812a1e12>] ? paranoid_userspace1+0x44/0x4c


Code: 0f 30 48 89 f2 66 b9 00 06 89 f0 48 c1 ea 20 0f 30 31 d2 48 8b 83 28 07 00
00 48 39 d0 74 0e 48 89 c2 b9 d9 01 00 00 48 c1 ea 20 <0f> 30 48 8b 57 10 f7 c2
00 00 20 00 74 3c 48 8b 83 c0 04 00 00
RIP  [<ffffffff8100a88b>] __switch_to+0x244/0x2fb
 RSP <ffff81000dcd1c68>
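
A reading of the dump above, offered as an inference rather than something stated in this report: the bytes marked `<0f> 30` in the Code: line are the opcode of the x86 WRMSR instruction, which writes EDX:EAX to the MSR selected by ECX. With RCX = 0x1d9, RAX = 0x2 and RDX = 0, that would be a write of the BTF (branch trap flag) bit to the IA32_DEBUGCTL MSR, which is the bit block stepping relies on. The sketch below only decodes those register values; the helper names are hypothetical and the constants come from the Intel architecture manuals, not from this bug.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural constants (Intel SDM); not taken from the bug report. */
#define MSR_IA32_DEBUGCTL 0x1d9u        /* debug control MSR */
#define DEBUGCTL_LBR      (1ull << 0)   /* last branch record enable */
#define DEBUGCTL_BTF      (1ull << 1)   /* single-step on branches */

/* Hypothetical helper: true if the two bytes are the WRMSR opcode (0f 30). */
int is_wrmsr(const uint8_t *insn)
{
    return insn[0] == 0x0f && insn[1] == 0x30;
}

/* Hypothetical helper: WRMSR writes EDX:EAX, so combine the low halves. */
uint64_t wrmsr_value(uint64_t rax, uint64_t rdx)
{
    return ((rdx & 0xffffffffu) << 32) | (rax & 0xffffffffu);
}

/* Hypothetical helper: print what a WRMSR with these registers would do. */
void describe_wrmsr(uint64_t rax, uint64_t rcx, uint64_t rdx)
{
    uint64_t value = wrmsr_value(rax, rdx);

    printf("wrmsr to MSR 0x%llx%s, value 0x%llx (BTF=%d, LBR=%d)\n",
           (unsigned long long)rcx,
           rcx == MSR_IA32_DEBUGCTL ? " (IA32_DEBUGCTL)" : "",
           (unsigned long long)value,
           (value & DEBUGCTL_BTF) != 0,
           (value & DEBUGCTL_LBR) != 0);
}
```

Calling describe_wrmsr(0x2, 0x1d9, 0x0) with the register values from the dump reports a write to IA32_DEBUGCTL with only BTF set.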

Expected results:
No crash.

Additional info:
On F8 kernel-2.6.24.3-12.fc8.x86_64 the testcase safely aborts (expected, as
the feature is unsupported there):
  ./block-step
  block-step: block-step.c:121: main: Unexpected error: Input/output error.
  Aborted
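
For context, here is a minimal sketch of how a tracer might request a block step and how an unsupported kernel would answer. It is not the testsuite's block-step.c; the function name try_block_step is hypothetical, and the fallback value 33 for PTRACE_SINGLEBLOCK is the x86 request number from the kernel's asm/ptrace-abi.h, assumed here in case the libc headers do not provide it. A kernel that does not know the request rejects it with EIO, which matches the "Input/output error" message above.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PTRACE_SINGLEBLOCK
#define PTRACE_SINGLEBLOCK 33   /* assumed: x86 value from asm/ptrace-abi.h */
#endif

/*
 * Hypothetical helper: fork a traced child, ask the kernel to run it
 * until the next branch, and return 0 on success or the errno of the
 * first failing call.
 */
int try_block_step(void)
{
    int status;
    int err = 0;
    pid_t child = fork();

    if (child < 0)
        return errno;

    if (child == 0) {
        /* Child: become a tracee, then stop so the parent can act. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Parent: wait for the child to stop, then request a block step. */
    if (waitpid(child, &status, WUNTRACED) < 0)
        err = errno;
    else if (ptrace(PTRACE_SINGLEBLOCK, child, NULL, NULL) < 0)
        err = errno;            /* EIO if the request is unsupported */
    else
        waitpid(child, &status, 0);   /* stop at the next branch, or exit */

    /* Clean up the child whatever happened above. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return err;
}
```

A caller can compare the return value against EIO to tell "feature unsupported" apart from other failures such as EPERM in a restricted environment.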

Not moving `block-step' from SAFE to CRASHERS as the problem is present only in
Rawhide so far.

Comment 1 Roland McGrath 2008-03-10 01:21:31 UTC
Again, actually not utrace at all (it's not even in rawhide yet).
It's ptrace-related x86 arch code.
It is also my problem though. ;-)

Comment 2 Jan Kratochvil 2008-03-10 11:50:55 UTC
Sorry, I should have known that the current Rawhide has no utrace patches applied.

"Upstream"
  http://sourceware.org/systemtap/wiki/utrace/tests
has moved block-step into CRASHERS, as the crash has now been reproduced on the
vanilla build:

kernel-vanilla-2.6.25-0.101.rc4.git3.fc8.x86_64
http://koji.fedoraproject.org/koji/taskinfo?taskID=507618

general protection fault: 0000 [1] SMP DEBUG_PAGEALLOC
CPU 0
Modules linked in: snd_hda_intel snd_usb_audio snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc snd_usb_lib snd_rawmidi snd_seq_device snd_hwdep snd soundcore
nfs nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 dm_mirror dm_mod uinput
parport_pc parport floppy pcspkr 8139too 8139cp mii button sr_mod cdrom sg
ata_piix ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
[last unloaded: freq_table]
Pid: 1885, comm: block-step Not tainted 2.6.25-0.101.rc4.git3.fc8 #1
RIP: 0010:[<ffffffff8100ab79>]  [<ffffffff8100ab79>] __switch_to+0x218/0x2bc
RSP: 0018:ffff81000e42dca8  EFLAGS: 00000046
RAX: 0000000000000002 RBX: ffff81000e4cc470 RCX: 00000000000001d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff81000e42c000
RBP: ffff81000e42dcd8 R08: ffff810001069e00 R09: ffff81000106c390
R10: 0000000000000000 R11: 0000000000000001 R12: ffff81000e408470
R13: 0000000000000000 R14: ffff81000e408000 R15: ffff81000e4cc000
FS:  00007fa22e1d66f0(0000) GS:ffffffff81412000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003b986ce650 CR3: 0000000011483000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process block-step (pid: 1885, threadinfo ffff81000e42c000, task ffff81000e4cc000)
Stack:  ffff81000e42dce8 ffff81000106c300 ffff81001613d400 ffff81001613ca00
 0000000000000000 0000000000000001 ffff81000e4cc000 ffffffff812967d3
 ffff81000e42dd78 0000000000000046 ffff81000e42dd18 ffffffff8104fe28
Call Trace:
 [<ffffffff812967d3>] thread_return+0x0/0xa9
 [<ffffffff8104fe28>] ? lock_release_holdtime+0x45/0x4a
 [<ffffffff8103db2b>] ? ptrace_stop+0x161/0x166
 [<ffffffff8103e8cb>] ? get_signal_to_deliver+0xc8/0x316
 [<ffffffff8100b2e0>] ? do_notify_resume+0xac/0x868
 [<ffffffff81134f5f>] ? __up_read+0x7a/0x83
 [<ffffffff8104fe28>] ? lock_release_holdtime+0x45/0x4a
 [<ffffffff81012116>] ? native_sched_clock+0x50/0x6d
 [<ffffffff81012116>] ? native_sched_clock+0x50/0x6d
 [<ffffffff812987d0>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff81051d44>] ? trace_hardirqs_on+0xf1/0x115
 [<ffffffff812987d0>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff81299762>] ? paranoid_userspace1+0x44/0x4c


Code: 0f 30 48 89 f2 66 b9 00 06 89 f0 48 c1 ea 20 0f 30 31 d2 48 8b 83 b8 02 00
00 48 39 d0 74 0e 48 89 c2 b9 d9 01 00 00 48 c1 ea 20 <0f> 30 f6 47 12 20 74 2a
48 8b 43 50 0f 23 c0 48 8b 43 58 0f 23 
RIP  [<ffffffff8100ab79>] __switch_to+0x218/0x2bc
 RSP <ffff81000e42dca8>
---[ end trace ca143223eefdc828 ]---
BUG: sleeping function called from invalid context at kernel/rwsem.c:21
in_atomic():0, irqs_disabled():1
INFO: lockdep is turned off.
irq event stamp: 324
hardirqs last  enabled at (323): [<ffffffff8129932c>]
_spin_unlock_irqrestore+0x3f/0x47
hardirqs last disabled at (324): [<ffffffff812961b5>] schedule+0xf6/0x714
softirqs last  enabled at (168): [<ffffffff81038233>] __do_softirq+0xda/0xe3
softirqs last disabled at (165): [<ffffffff8100d19c>] call_softirq+0x1c/0x28
Pid: 1885, comm: block-step Tainted: G      D  2.6.25-0.101.rc4.git3.fc8 #1

Call Trace:
 [<ffffffff8104fd60>] ? print_irqtrace_events+0x110/0x114
 [<ffffffff8102ac30>] __might_sleep+0xda/0xdc
 [<ffffffff81297b27>] down_read+0x20/0x6d
 [<ffffffff81061594>] acct_collect+0x4c/0x19c
 [<ffffffff81036395>] do_exit+0x219/0x757
 [<ffffffff812999fe>] oops_begin+0x0/0x96
 [<ffffffff8100d9fa>] die+0x5d/0x66
 [<ffffffff8129a0ad>] do_general_protection+0x128/0x131
 [<ffffffff812994fd>] error_exit+0x0/0xa9
 [<ffffffff8100ab79>] ? __switch_to+0x218/0x2bc
 [<ffffffff812967d3>] ? thread_return+0x0/0xa9
 [<ffffffff8104fe28>] ? lock_release_holdtime+0x45/0x4a
 [<ffffffff8103db2b>] ? ptrace_stop+0x161/0x166
 [<ffffffff8103e8cb>] ? get_signal_to_deliver+0xc8/0x316
 [<ffffffff8100b2e0>] ? do_notify_resume+0xac/0x868
 [<ffffffff81134f5f>] ? __up_read+0x7a/0x83
 [<ffffffff8104fe28>] ? lock_release_holdtime+0x45/0x4a
 [<ffffffff81012116>] ? native_sched_clock+0x50/0x6d
 [<ffffffff81012116>] ? native_sched_clock+0x50/0x6d
 [<ffffffff812987d0>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff81051d44>] ? trace_hardirqs_on+0xf1/0x115
 [<ffffffff812987d0>] ? trace_hardirqs_on_thunk+0x35/0x3a
 [<ffffffff81299762>] ? paranoid_userspace1+0x44/0x4c


Comment 3 Roland McGrath 2008-03-10 21:38:53 UTC
I can't reproduce any crash on 2.6.25-0.101.rc4.git3.fc9 on x86_64.
block-step.c rev 1.4

Was your test on real hardware or kvm?

Comment 4 Jan Kratochvil 2008-03-10 21:46:28 UTC
kvm... going to try reproducing it on real RHTS hardware.


Comment 5 Jan Kratochvil 2008-03-11 18:29:09 UTC
OK, closing, as this is only a host kernel/KVM problem: Bug 437028.

Removing the `crasher' category upstream as KVM crashes should not hurt much.