I was trying to set up a way to measure the latency of some checksumming routines, and decided to try using the ftrace function profiler. If I do this on my x86_64 KVM guest, the guest either reboots immediately or just hangs, with no oops message or anything printed to the console:

# echo 1 > /sys/kernel/debug/tracing/function_profile_enabled

The guest kernel is the latest rawhide kernel as of a day or two ago: 3.8.0-0.rc6.git1.1.fc19.x86_64
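For anyone following along, here is a minimal sketch of the repro sequence. It assumes debugfs is mounted at the conventional /sys/kernel/debug and the kernel was built with CONFIG_FUNCTION_PROFILER; the guard just makes it a harmless no-op on kernels or machines where the profiler isn't available:

```shell
#!/bin/sh
# Repro sketch; needs root. On the affected kernels the first write
# below is enough to reboot or hang the machine.
TRACING=/sys/kernel/debug/tracing

if [ -w "$TRACING/function_profile_enabled" ]; then
    echo 1 > "$TRACING/function_profile_enabled"   # start profiling
    sleep 2                                        # let some samples accumulate
    echo 0 > "$TRACING/function_profile_enabled"   # stop profiling
    head "$TRACING/trace_stat/function0"           # per-CPU hit counts / timings
    status=enabled
else
    status=unavailable
fi
echo "profiler: $status"
```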
FWIW, same result with kernel-3.8.0-0.rc6.git3.3.fc19 as well
Do you get the same reboot/lockup if you run the function tracer?

echo function > current_tracer
Nope, that works fine.
I just remembered that function profiling defaults to using function_graph. Could you try that one too?

echo function_graph > current_tracer

See if that crashes.
Nope, that doesn't crash either:

[root@rawhide tracing]# echo function_graph > current_tracer
[root@rawhide tracing]# echo $?
0
[root@rawhide tracing]# cat current_tracer
function_graph
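For reference, switching tracers is just a write to current_tracer; a rough sketch of how I'm flipping between them (same assumed /sys/kernel/debug/tracing layout, guarded so it does nothing where tracing isn't available):

```shell
#!/bin/sh
TRACING=/sys/kernel/debug/tracing

set_tracer() {
    # Select a tracer, then read back what the kernel actually accepted.
    if [ -w "$TRACING/current_tracer" ]; then
        echo "$1" > "$TRACING/current_tracer" 2>/dev/null
        tracer=$(cat "$TRACING/current_tracer")
    else
        tracer=unavailable
    fi
}

set_tracer function_graph   # the tracer the profiler uses under the hood
echo "current tracer: $tracer"
set_tracer nop              # restore the default when finished
```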
It only crashes on kvm guests and not bare metal?
I haven't tested it on bare metal. I don't have a bare metal rawhide machine set up at the moment.
Ok, I installed kernel-3.8.0-0.rc6.git3.3.fc19 on my main workstation (bare metal), and it did indeed reboot. FWIW, my main workstation has an Intel CPU, and the host where the KVM guest lives is running an AMD CPU. So maybe it's not anything HW-specific?
Also, I can't reproduce it on 3.7.5-201.fc18.x86_64, so it appears to be a regression introduced in 3.8.
I don't have a rawhide config. My latest test box is only f18, and as you pointed out, it doesn't have the issue. I'm testing with the f18 config on 3.8-rc6 and there's no problem. Can you attach the kernel config file for 3.8.0-0.rc6.git3.3.fc19? I'll see if I can reproduce it on my f18 box. Thanks.
Created attachment 695425 [details] kernel config for 3.8.0-0.rc6.git3.3.fc19.x86_64 Sure... I didn't have a rawhide install on bare metal either. I just installed the f19 kernel on my f18 box to reproduce it there. A full rawhide install doesn't seem to be necessary.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Is this still an issue with the 3.9 kernels in F19?
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.
Sorry, I missed that you had requested info from me (the needinfo flag wasn't set)... I don't have an f19 3.9 kernel handy, but the problem still exists on 3.9.0-0.rc7.git3.1.fc20.x86_64.
Interestingly, I did see this pop up on the KVM serial console when I did this:

[192580.150024] ------------[ cut here ]------------

...but nothing else. The box also didn't do a hard reset this time, but rather just hung hard. With that, though, I can collect a vmcore -- stay tuned...
Hrm... crash seems to be having "issues" with vmcores dumped from KVM, so while I have a "virsh dump", I'm getting the following error when I try to open it with crash:

crash: read error: kernel virtual address: ffffffff81807090 type: "cpu_possible_mask"

...I also generated a "virsh dump" from a running (uncrashed) guest and got the same result. I'll open a bug against crash and see whether we can get that resolved so we can work on chasing this down.
Ok, it turns out that "virsh dump --memory-only" is how you're supposed to generate these vmcores nowadays. With that I get vmcores that crash can open, but after this bug fires, a lot of the memory seems to be scrambled. I get a lot of warnings:

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

crash: cannot determine thread return address
please wait... (gathering module symbol data)
WARNING: cannot access vmalloc'd module memory
please wait... (gathering task table data)
crash: duplicate task in pid_hash: ffffffff81090948
crash: invalid task address: ffff880079960000
crash: invalid kernel virtual address: 2abeffffff73  type: "fill_thread_info"
crash: invalid task address: ffffffff81090948
crash: invalid task address: ffff8800798fcdc0
WARNING: active task ffff88007941a6e0 on cpu 0: corrupt cpu value: 2165486936

...so this seems to be causing some sort of mem corruption? I can provide a vmcore if it'll help, but it looks pretty trashed...
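In case anyone needs to retrace the vmcore collection, here's the sequence I ended up with. The guest name and paths are hypothetical (adjust for your setup), and it assumes libvirt plus a vmlinux with debuginfo for the guest kernel:

```shell
#!/bin/sh
# Sketch only -- guest name and paths are hypothetical; adjust to taste.
GUEST=rawhide
VMCORE=/var/tmp/${GUEST}.vmcore
VMLINUX=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux

if command -v virsh >/dev/null 2>&1; then
    # --memory-only writes a plain ELF core that crash/gdb understand;
    # a bare "virsh dump" produces a libvirt-specific format that crash
    # can't read (hence the earlier read errors).
    virsh dump --memory-only "$GUEST" "$VMCORE"
    result=dumped
else
    result=no-virsh
fi
echo "dump: $result"
echo "analyze interactively with: crash $VMLINUX $VMCORE"
```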
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
Jeff?
Oops, sorry -- I dropped the ball... Testing with 3.12.0-0.rc4.git0.1.fc21.x86_64 seems to be ok, as does 3.11.2-201.fc19.x86_64. It's unclear to me which kernel actually fixed it, but I guess we can call this one resolved.