Bug 843640

Summary: Use of bogus address for perf uprobes can cause the kernel to hang
Product: [Fedora] Fedora Reporter: William Cohen <wcohen>
Component: kernelAssignee: Oleg Nesterov <onestero>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 17CC: anton, chaowang, fche, gansalmon, itamar, jistone, jonathan, kernel-maint, madhu.chinakonda, mjw, onestero
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-08 15:16:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Script that demonstrates the kernel hang with the perf uprobes support
none
further simplify reproducer none

Description William Cohen 2012-07-26 20:38:42 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 William Cohen 2012-07-26 20:52:27 UTC
Description of problem:

Use of bogus information for perf uprobes can cause the kernel to hang. Have a short script that reproduces the problem, pr14348_perf.sh which will be attached to this bug.


Version-Release number of selected component (if applicable):
kernel-3.4.6-2.fc17.x86_64
perf-3.4.6-2.fc17.x86_64


How reproducible:
Every time

Steps to Reproduce:
1. Unpack the pr14348.tar.gz
2. As root run the pr14348_perf.sh script
  
Actual results:

The machine hangs and backtraces are printed out showing where the processor is hanging.

[  796.066002] BUG: soft lockup - CPU#0 stuck for 22s! [buildid.x:1014]
[  796.066010] Modules linked in: bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables tpm_bios snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_page_alloc snd_timer e1000 snd joydev i2c_piix4 i2c_core floppy microcode soundcore pcspkr virtio_balloon uinput virtio_blk
[  796.066010] CPU 0 
[  796.066010] Pid: 1014, comm: buildid.x Not tainted 3.5.0+ #1 Red Hat KVM
[  796.066010] RIP: 0010:[<ffffffff812d2015>]  [<ffffffff812d2015>] rb_insert_color+0x35/0x150
[  796.066010] RSP: 0018:ffff88003be23af0  EFLAGS: 00000282
[  796.066010] RAX: ffff88003b8f53a8 RBX: ffff88003b8f53a8 RCX: ffff88003b8f53a8
[  796.066010] RDX: ffff88003b8f53a8 RSI: ffff880037414e08 RDI: ffff88003b8f53a8
[  796.066010] RBP: ffff88003be23b18 R08: ffff88003b8f53a8 R09: 0000000000000000
[  796.066010] R10: ffff88003b8f53a8 R11: ffff88003b8f5370 R12: 0000000000200000
[  796.066010] R13: ffff88003be23ae0 R14: 0000000000000000 R15: 00007ffff2d5d000
[  796.066010] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  796.066010] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  796.066010] CR2: 00000036c9284360 CR3: 000000003b728000 CR4: 00000000000006f0
[  796.066010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  796.066010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  796.066010] Process buildid.x (pid: 1014, threadinfo ffff88003be22000, task ffff88003b9f2e20)
[  796.066010] Stack:
[  796.066010]  ffff880037414e00 ffff88003b8f5370 ffff88003b8f53a8 ffff88003b8f53b0
[  796.066010]  ffff8800298acc98 ffff88003be23b28 ffffffff81152460 ffff88003be23b78
[  796.066010]  ffffffff811524e8 000000005011ad7a ffff88003b8f5370 ffff88003b8f5370
[  796.066010] Call Trace:
[  796.066010]  [<ffffffff81152460>] __vma_link_rb+0x30/0x40
[  796.066010]  [<ffffffff811524e8>] vma_link+0x78/0xe0
[  796.066010]  [<ffffffff8115460f>] mmap_region+0x56f/0x630
[  796.066010]  [<ffffffff8115491d>] do_mmap_pgoff+0x24d/0x340
[  796.066010]  [<ffffffff8114246c>] vm_mmap_pgoff+0x7c/0xb0
[  796.066010]  [<ffffffff811424cd>] vm_mmap+0x2d/0x40
[  796.066010]  [<ffffffff811d76f7>] load_elf_binary+0x627/0x1a70
[  796.066010]  [<ffffffff8114edf5>] ? __get_user_pages+0x175/0x580
[  796.066010]  [<ffffffff8118bb35>] ? copy_strings.isra.26+0x2b5/0x300
[  796.066010]  [<ffffffff811d70d0>] ? elf_core_dump+0x1230/0x1230
[  796.066010]  [<ffffffff8118be77>] search_binary_handler+0x117/0x320
[  796.066010]  [<ffffffff8118d3c4>] do_execve_common.isra.30+0x464/0x530
[  796.066010]  [<ffffffff8118d4ab>] do_execve+0x1b/0x20
[  796.066010]  [<ffffffff8101c3e7>] sys_execve+0x47/0x70
[  796.066010]  [<ffffffff8161166c>] stub_execve+0x6c/0xc0
[  796.066010] Code: 56 49 89 f6 41 55 41 54 53 4c 8b 27 66 2e 0f 1f 84 00 00 00 00 00 49 83 e4 fc 74 4a 49 8b 04 24 a8 01 75 42 48 89 c3 48 83 e3 fc <4c> 8b 6b 10 4d 39 e5 74 7a 4d 85 ed 74 45 49 8b 55 00 f6 c2 01 



Expected results:

That this use of perf just reports an error and the machine continues to run.


Additional info:

This problem was first encountered with running the
systemtap-testsuite-1.8-4 tests on Fedora 17 machine.  And the initial
bug was reported in
http://sourceware.org/bugzilla/show_bug.cgi?id=14348

Comment 2 William Cohen 2012-07-26 20:53:45 UTC
Created attachment 600605 [details]
Script that demonstrates the kernel hang with the perf uprobes support

Comment 3 Josh Boyer 2012-07-27 00:09:00 UTC
(In reply to comment #1)
> Description of problem:
> 
> Use of bogus information for perf uprobes can cause the kernel to hang. Have
> a short script that reproduces the problem, pr14348_perf.sh which will be
> attached to this bug.
> 
> 
> Version-Release number of selected component (if applicable):
> kernel-3.4.6-2.fc17.x86_64
> perf-3.4.6-2.fc17.x86_64
> 
> 
> How reproducible:
> Every time
> 
> Steps to Reproduce:
> 1. Unpack the pr14348.tar.gz
> 2. As root run the pr14348_perf.sh script
>   
> Actual results:
> 
> The machine hangs and backtraces are printed out showing where the processor
> is hanging.
> 
> [  796.066002] BUG: soft lockup - CPU#0 stuck for 22s! [buildid.x:1014]
> [  796.066010] Modules linked in: bnep bluetooth rfkill ip6t_REJECT
> nf_conntrack_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_defrag_ipv4 xt_state
> nf_conntrack ip6table_filter ip6_tables tpm_bios snd_intel8x0 snd_ac97_codec
> ac97_bus snd_pcm snd_page_alloc snd_timer e1000 snd joydev i2c_piix4
> i2c_core floppy microcode soundcore pcspkr virtio_balloon uinput virtio_blk
> [  796.066010] CPU 0 
> [  796.066010] Pid: 1014, comm: buildid.x Not tainted 3.5.0+ #1 Red Hat KVM

That's not a Fedora kernel.  It's not even 3.4 based.


> [  796.066010] RIP: 0010:[<ffffffff812d2015>]  [<ffffffff812d2015>]
> rb_insert_color+0x35/0x150
> [  796.066010] RSP: 0018:ffff88003be23af0  EFLAGS: 00000282
> [  796.066010] RAX: ffff88003b8f53a8 RBX: ffff88003b8f53a8 RCX:
> ffff88003b8f53a8
> [  796.066010] RDX: ffff88003b8f53a8 RSI: ffff880037414e08 RDI:
> ffff88003b8f53a8
> [  796.066010] RBP: ffff88003be23b18 R08: ffff88003b8f53a8 R09:
> 0000000000000000
> [  796.066010] R10: ffff88003b8f53a8 R11: ffff88003b8f5370 R12:
> 0000000000200000
> [  796.066010] R13: ffff88003be23ae0 R14: 0000000000000000 R15:
> 00007ffff2d5d000
> [  796.066010] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000)
> knlGS:0000000000000000
> [  796.066010] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  796.066010] CR2: 00000036c9284360 CR3: 000000003b728000 CR4:
> 00000000000006f0
> [  796.066010] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  796.066010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [  796.066010] Process buildid.x (pid: 1014, threadinfo ffff88003be22000,
> task ffff88003b9f2e20)
> [  796.066010] Stack:
> [  796.066010]  ffff880037414e00 ffff88003b8f5370 ffff88003b8f53a8
> ffff88003b8f53b0
> [  796.066010]  ffff8800298acc98 ffff88003be23b28 ffffffff81152460
> ffff88003be23b78
> [  796.066010]  ffffffff811524e8 000000005011ad7a ffff88003b8f5370
> ffff88003b8f5370
> [  796.066010] Call Trace:
> [  796.066010]  [<ffffffff81152460>] __vma_link_rb+0x30/0x40
> [  796.066010]  [<ffffffff811524e8>] vma_link+0x78/0xe0
> [  796.066010]  [<ffffffff8115460f>] mmap_region+0x56f/0x630
> [  796.066010]  [<ffffffff8115491d>] do_mmap_pgoff+0x24d/0x340
> [  796.066010]  [<ffffffff8114246c>] vm_mmap_pgoff+0x7c/0xb0
> [  796.066010]  [<ffffffff811424cd>] vm_mmap+0x2d/0x40
> [  796.066010]  [<ffffffff811d76f7>] load_elf_binary+0x627/0x1a70
> [  796.066010]  [<ffffffff8114edf5>] ? __get_user_pages+0x175/0x580
> [  796.066010]  [<ffffffff8118bb35>] ? copy_strings.isra.26+0x2b5/0x300
> [  796.066010]  [<ffffffff811d70d0>] ? elf_core_dump+0x1230/0x1230
> [  796.066010]  [<ffffffff8118be77>] search_binary_handler+0x117/0x320
> [  796.066010]  [<ffffffff8118d3c4>] do_execve_common.isra.30+0x464/0x530
> [  796.066010]  [<ffffffff8118d4ab>] do_execve+0x1b/0x20
> [  796.066010]  [<ffffffff8101c3e7>] sys_execve+0x47/0x70
> [  796.066010]  [<ffffffff8161166c>] stub_execve+0x6c/0xc0
> [  796.066010] Code: 56 49 89 f6 41 55 41 54 53 4c 8b 27 66 2e 0f 1f 84 00
> 00 00 00 00 49 83 e4 fc 74 4a 49 8b 04 24 a8 01 75 42 48 89 c3 48 83 e3 fc
> <4c> 8b 6b 10 4d 39 e5 74 7a 4d 85 ed 74 45 49 8b 55 00 f6 c2 01 

Anton, Oleg, any thoughts?

Comment 4 William Cohen 2012-07-27 03:00:32 UTC
You are correct that previous backtrace was an stock upstream 3.5.0+
compiled with config based on f17.  Below is a traceback from the same
pr14348_perf.sh reproducer, the backtrace looks very similar.


[  156.047010] BUG: soft lockup - CPU#0 stuck for 23s! [buildid.x:976]
[  156.047010] Modules linked in: bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables tpm_bios snd_intel8x0 microcode snd_ac97_codec ac97_bus e1000 snd_pcm snd_page_alloc i2c_piix4 i2c_core snd_timer snd soundcore virtio_balloon uinput virtio_blk [last unloaded: scsi_wait_scan]
[  156.047010] CPU 0 
[  156.047010] Modules linked in: bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ip6table_filter ip6_tables tpm_bios snd_intel8x0 microcode snd_ac97_codec ac97_bus e1000 snd_pcm snd_page_alloc i2c_piix4 i2c_core snd_timer snd soundcore virtio_balloon uinput virtio_blk [last unloaded: scsi_wait_scan]
[  156.047010] 
[  156.047010] Pid: 976, comm: buildid.x Not tainted 3.4.6-2.fc17.x86_64 #1 Red Hat KVM
[  156.047010] RIP: 0010:[<ffffffff812c6f0a>]  [<ffffffff812c6f0a>] rb_insert_color+0x2a/0x150
[  156.047010] RSP: 0018:ffff88003d185b00  EFLAGS: 00000286
[  156.047010] RAX: ffff88003bfaa0e8 RBX: ffff88003bfaa0e8 RCX: ffff88003bfaa0e8
[  156.047010] RDX: ffff88003bfaa0e8 RSI: ffff880036827c08 RDI: ffff88003bfaa0e8
[  156.047010] RBP: ffff88003d185b28 R08: ffff88003bfaa0e8 R09: 0000000000000000
[  156.047010] R10: ffff88003bfaa0e8 R11: ffff88003bfaa0b0 R12: 0000000000000000
[  156.047010] R13: ffff88003d185b28 R14: 0000000000200000 R15: 00007fff24338000
[  156.047010] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  156.047010] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  156.047010] CR2: 00000036c9284360 CR3: 000000003689e000 CR4: 00000000000006f0
[  156.047010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  156.047010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  156.047010] Process buildid.x (pid: 976, threadinfo ffff88003d184000, task ffff88003d380000)
[  156.047010] Stack:
[  156.047010]  ffff880036827c00 ffff88003bfaa0b0 ffff88003bfaa0e8 ffff88003bfaa0f0
[  156.047010]  ffff880026d29a40 ffff88003d185b38 ffffffff8114b440 ffff88003d185b88
[  156.047010]  ffffffff8114b4c8 0000000050120327 ffff88003bfaa0b0 ffff88003bfaa0b0
[  156.047010] Call Trace:
[  156.047010]  [<ffffffff8114b440>] __vma_link_rb+0x30/0x40
[  156.047010]  [<ffffffff8114b4c8>] vma_link+0x78/0xe0
[  156.047010]  [<ffffffff8114d12d>] mmap_region+0x56d/0x630
[  156.047010]  [<ffffffff8114d53d>] do_mmap_pgoff+0x34d/0x3a0
[  156.047010]  [<ffffffff8114d7ed>] do_mmap+0x2d/0x40
[  156.047010]  [<ffffffff815ed9ea>] elf_map+0x119/0x14a
[  156.047010]  [<ffffffff811d07b9>] load_elf_binary+0x5e9/0x18f0
[  156.047010]  [<ffffffff811871d5>] ? copy_strings.isra.24+0x2b5/0x300
[  156.047010]  [<ffffffff811d01d0>] ? notesize.isra.6+0x30/0x30
[  156.047010]  [<ffffffff81187517>] search_binary_handler+0x117/0x320
[  156.047010]  [<ffffffff81188a54>] do_execve_common.isra.28+0x434/0x500
[  156.047010]  [<ffffffff81188b3b>] do_execve+0x1b/0x20
[  156.047010]  [<ffffffff8101c3c7>] sys_execve+0x47/0x70
[  156.047010]  [<ffffffff815fd12c>] stub_execve+0x6c/0xc0
[  156.047010] Code: 00 55 48 89 e5 41 57 49 89 ff 41 56 49 89 f6 41 55 41 54 53 4c 8b 27 66 2e 0f 1f 84 00 00 00 00 00 49 83 e4 fc 74 4a 49 8b 04 24 <a8> 01 75 42 48 89 c3 48 83 e3 fc 4c 8b 6b 10 4d 39 e5 74 7a 4d 
[  156.047010] Call Trace:
[  156.047010]  [<ffffffff8114b440>] __vma_link_rb+0x30/0x40
[  156.047010]  [<ffffffff8114b4c8>] vma_link+0x78/0xe0
[  156.047010]  [<ffffffff8114d12d>] mmap_region+0x56d/0x630
[  156.047010]  [<ffffffff8114d53d>] do_mmap_pgoff+0x34d/0x3a0
[  156.047010]  [<ffffffff8114d7ed>] do_mmap+0x2d/0x40
[  156.047010]  [<ffffffff815ed9ea>] elf_map+0x119/0x14a
[  156.047010]  [<ffffffff811d07b9>] load_elf_binary+0x5e9/0x18f0
[  156.047010]  [<ffffffff811871d5>] ? copy_strings.isra.24+0x2b5/0x300
[  156.047010]  [<ffffffff811d01d0>] ? notesize.isra.6+0x30/0x30
[  156.047010]  [<ffffffff81187517>] search_binary_handler+0x117/0x320
[  156.047010]  [<ffffffff81188a54>] do_execve_common.isra.28+0x434/0x500
[  156.047010]  [<ffffffff81188b3b>] do_execve+0x1b/0x20
[  156.047010]  [<ffffffff8101c3c7>] sys_execve+0x47/0x70
[  156.047010]  [<ffffffff815fd12c>] stub_execve+0x6c/0xc0

Comment 5 William Cohen 2012-07-27 03:18:23 UTC
The addition of the longer  .note.gnu.build-id in the altered version of buildid.x places the probe point in dummy_frame function.

0000000000400530 <frame_dummy>:
  400530:	48 83 3d 08 02 20 00 	cmpq   $0x0,0x200208(%rip)        # 600740 <__JCR_END__>
  400537:	00 
  400538:	74 1b                	je     400555 <frame_dummy+0x25>
  40053a:	b8 00 00 00 00       	mov    $0x0,%eax
  40053f:	48 85 c0             	test   %rax,%rax
  400542:	74 11                	je     400555 <frame_dummy+0x25>
  400544:	55                   	push   %rbp
  400545:	bf 40 07 60 00       	mov    $0x600740,%edi <<<<uprobe here
  40054a:	48 89 e5             	mov    %rsp,%rbp
  40054d:	ff d0                	callq  *%rax
  40054f:	5d                   	pop    %rbp
  400550:	e9 7b ff ff ff       	jmpq   4004d0 <register_tm_clones>
  400555:	e9 76 ff ff ff       	jmpq   4004d0 <register_tm_clones>
  40055a:	66 90                	xchg   %ax,%ax

Comment 6 Dave Jones 2012-07-27 15:30:59 UTC
with upstream kernels, you're going to get a much faster turnaround reporting bugs like this directly yourself than us acting as middle-man.

With 800+ open bugs, we just don't have the manpower to champion every single bug to its upstream maintainer.

Comment 9 Oleg Nesterov 2012-07-27 16:25:12 UTC
I'll try to reserve the testing machine and reproduce.

But can I ask a couple of questions right now? I am looking
into pr14348_perf.sh, and I do not understand what is essential
and what is not.

Am I understand correctly?

        - "bogus information" means
          "#okay mung the executable and try running again" in the script

        - the kernel hangs at "perf record -e uprobes:FOO -aR ./buildid.x"
          (2nd one)

        - everything before "mung the executable and try running again"
          is not needed to reproduce

Yes?

And thank you very much for the simple reproducer.

Comment 10 William Cohen 2012-07-27 16:45:25 UTC
Created attachment 600832 [details]
further simplify reproducer

You can further simply the example by removing the first setup of perf/perf record/perf report like the attached. The first run of perf was more to verify that the mechanism was working and data was being collected.  The objcopy changes the location of the instruction and appears to still place the probe in the text section but a different function. The big problem is that this causes the kernel to hang.  The kernel should be doing at enough checking that probing a userspace application isn't going hang the a processor in the kernel.

Comment 11 Oleg Nesterov 2012-07-28 12:07:05 UTC
(In reply to comment #10)
>
> You can further simply the example by removing the first setup of perf/perf
> record/perf report like the attached.

And even more. You do not need perf. Just
"echo 1 >> /sys/kernel/debug/tracing/events/uprobes/enable"
and run the binary.

Comment 12 Oleg Nesterov 2012-07-28 16:39:21 UTC
OK, I sent the fix upstream.

William, if it works for you, please add your tested-by tag.

If it doesn't we have another problem ;)

Comment 13 Mark Wielaard 2012-07-28 17:49:11 UTC
https://lkml.org/lkml/2012/7/28/59

Comment 14 William Cohen 2012-07-30 19:33:24 UTC
The proposed upstream fixes the problem with the kernel getting hung:

http://marc.info/?l=linux-kernel&m=134365481827073&w=2

Comment 15 Josh Boyer 2012-08-02 12:59:19 UTC
Srikar seems to have proposed a different fix for this (I think)

http://marc.info/?l=linux-kernel&m=134390221001981&w=2

Comment 16 Oleg Nesterov 2012-08-02 14:07:46 UTC
(In reply to comment #15)
> Srikar seems to have proposed a different fix for this (I think)
> 
> http://marc.info/?l=linux-kernel&m=134390221001981&w=2

Amazing. So far I think Srikar doesn't really understand
the problem(s).

http://marc.info/?l=linux-kernel&m=134391607007134

Comment 17 Josh Boyer 2012-08-08 13:37:35 UTC
Anton provided a uprobes backport patchset recently.  It included:

uprobes: mmap_region() corrupts mm->mm_rb if uprobe_mmap() fails

I believe this should be fixed in git now.

Comment 18 Anton Arapov 2012-08-08 14:15:39 UTC
I confirm.

Comment 19 Josh Boyer 2012-10-08 15:16:39 UTC
Fixed in an update.