I was trying to set up a way to measure the latency of some checksumming routines, and decided to try using the ftrace function profiler. If I do this on my x86_64 KVM guest, the guest either reboots immediately or just hangs, with no oops message or anything printed to the console:

# echo 1 > /sys/kernel/debug/tracing/function_profile_enabled

The guest kernel is the latest rawhide kernel as of a day or two ago: 3.8.0-0.rc6.git1.1.fc19.x86_64
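For anyone following along, here is a minimal sketch of the repro sequence. It assumes debugfs is mounted at the conventional /sys/kernel/debug and the kernel was built with CONFIG_FUNCTION_PROFILER; the guard just makes it a harmless no-op on kernels or machines where the profiler isn't available:

```shell
#!/bin/sh
# Repro sketch; needs root. On the affected kernels the first write
# below is enough to reboot or hang the machine.
TRACING=/sys/kernel/debug/tracing

if [ -w "$TRACING/function_profile_enabled" ]; then
    echo 1 > "$TRACING/function_profile_enabled"   # start profiling
    sleep 2                                        # let some samples accumulate
    echo 0 > "$TRACING/function_profile_enabled"   # stop profiling
    head "$TRACING/trace_stat/function0"           # per-CPU hit counts / timings
    status=enabled
else
    status=unavailable
fi
echo "profiler: $status"
```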
FWIW, same result with kernel-3.8.0-0.rc6.git3.3.fc19 as well
Do you get the same reboot/lockup if you run the function tracer?

echo function > current_tracer
Nope, that works fine.
I just remembered that function profiling defaults to using function_graph. Could you try that one too?

echo function_graph > current_tracer

See if that crashes.
Nope, that doesn't crash either:

[root@rawhide tracing]# echo function_graph > current_tracer
[root@rawhide tracing]# echo $?
0
[root@rawhide tracing]# cat current_tracer
function_graph
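For reference, switching tracers is just a write to current_tracer; a rough sketch of how I'm flipping between them (same assumed /sys/kernel/debug/tracing layout, guarded so it does nothing where tracing isn't available):

```shell
#!/bin/sh
TRACING=/sys/kernel/debug/tracing

set_tracer() {
    # Select a tracer, then read back what the kernel actually accepted.
    if [ -w "$TRACING/current_tracer" ]; then
        echo "$1" > "$TRACING/current_tracer" 2>/dev/null
        tracer=$(cat "$TRACING/current_tracer")
    else
        tracer=unavailable
    fi
}

set_tracer function_graph   # the tracer the profiler uses under the hood
echo "current tracer: $tracer"
set_tracer nop              # restore the default when finished
```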
It only crashes on kvm guests and not bare metal?
I haven't tested it on bare metal. I don't have a bare metal rawhide machine set up at the moment.
Ok, I installed kernel-3.8.0-0.rc6.git3.3.fc19 on my main workstation (bare metal), and it did indeed reboot. FWIW, my main workstation has an Intel CPU, and the host where the KVM guest lives is running an AMD CPU. So maybe it's not anything HW-specific?
Also, I can't reproduce it on 3.7.5-201.fc18.x86_64, so it appears to be a regression introduced in 3.8.
I don't have a rawhide config. My latest test box is only f18, and as you pointed out, it doesn't have the issue. I'm testing with the f18 config on 3.8-rc6 and there's no problem. Can you attach the kernel config file for 3.8.0-0.rc6.git3.3.fc19? I'll see if I can reproduce it on my f18 box. Thanks.
Created attachment 695425 [details] kernel config for 3.8.0-0.rc6.git3.3.fc19.x86_64 Sure... I didn't have a rawhide install on bare metal either. I just installed the f19 kernel on my f18 box to reproduce it there. A full rawhide install doesn't seem to be necessary.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Is this still an issue with the 3.9 kernels in F19?
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
Sorry, I missed that you had requested info from me (the needinfo flag wasn't set)... I don't have an f19 3.9 kernel handy, but the problem still exists on 3.9.0-0.rc7.git3.1.fc20.x86_64.
Interestingly, I did see this pop up on the KVM serial console when I did this:

[192580.150024] ------------[ cut here ]------------

...but nothing else. The box also didn't do a hard reset this time, but rather just hung hard. With that, though, I can collect a vmcore -- stay tuned...
Hrm... crash seems to be having "issues" with vmcores dumped from KVM, so while I have a "virsh dump", I'm getting the following error when I try to open it with crash:

crash: read error: kernel virtual address: ffffffff81807090 type: "cpu_possible_mask"

...I also generated a "virsh dump" from a running (uncrashed) guest and got the same result. I'll open a bug against crash and see whether we can get that resolved so we can work on chasing this down.
Ok, it turns out that "virsh dump --memory-only" is how you're supposed to generate these vmcores nowadays. With that I get vmcores that crash can open, but after this bug fires, a lot of the memory seems to be scrambled. I get a lot of warnings:

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

crash: cannot determine thread return address
please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory
please wait... (gathering task table data)
crash: duplicate task in pid_hash: ffffffff81090948
crash: invalid task address: ffff880079960000
crash: invalid kernel virtual address: 2abeffffff73  type: "fill_thread_info"
crash: invalid task address: ffffffff81090948
crash: invalid task address: ffff8800798fcdc0
WARNING: active task ffff88007941a6e0 on cpu 0: corrupt cpu value: 2165486936

...so this seems to be causing some sort of mem corruption? I can provide a vmcore if it'll help, but it looks pretty trashed...
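In case anyone needs to retrace the vmcore collection, here's the sequence I ended up with. The guest name and paths are hypothetical (adjust for your setup), and it assumes libvirt plus a vmlinux with debuginfo for the guest kernel:

```shell
#!/bin/sh
# Sketch only -- guest name and paths are hypothetical; adjust to taste.
GUEST=rawhide
VMCORE=/var/tmp/${GUEST}.vmcore
VMLINUX=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux

if command -v virsh >/dev/null 2>&1; then
    # --memory-only writes a plain ELF core that crash/gdb understand;
    # a bare "virsh dump" produces a libvirt-specific format that crash
    # can't read (hence the earlier read errors).
    virsh dump --memory-only "$GUEST" "$VMCORE"
    result=dumped
else
    result=no-virsh
fi
echo "dump: $result"
echo "analyze interactively with: crash $VMLINUX $VMCORE"
```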
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
Jeff?
Oops, sorry -- I dropped the ball... Testing with 3.12.0-0.rc4.git0.1.fc21.x86_64 seems to be ok, as does 3.11.2-201.fc19.x86_64. It's unclear to me which kernel actually fixed it, but I guess we can call this one resolved.