Description of problem:
If you run a testcase `block-step' on a new host+guest kernel the guest kernel crashes. Only the testcase should report failure.

Version-Release number of selected component (if applicable):
kernel-2.6.24.3-12.fc8.x86_64 (F8, tried)
(in fact I did verify it on Rawhide but I read the sources below)
kvm-60-3.fc8.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Run on the host machine: kernel-2.6.24.3-12.fc8.x86_64
   (tried this one, latest Rawhide+upstream kernels would IMO behave the same)
2. Run qemu-kvm.
3. Run kernel-2.6.25-0.101.rc4.git3.fc9 as the guest kernel.
   (kernel-2.6.24.3-12.fc8.x86_64 would not work as it still does not support PTRACE_SINGLEBLOCK, any Rawhide later+upstream kernels would IMO behave the same)
4. wget http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/block-step.c?cvsroot=systemtap
5. gcc -o block-step block-step.c -Wall -ggdb2 -D_GNU_SOURCE
6. ./block-step;echo $?

Actual results:
Guest kernel crash (see the Bug 436678 for the dumps).

Expected results:
(no crash)
Value 0 (PASS) or 1 (FAIL) should get returned, depending on whether the DEBUGCTLMSR_BTF CPU feature would get emulated by KVM.

Additional info:
qemu-kvm now prints:
Mar 11 07:44:13 host0 kernel: kvm: 13319: cpu0 unhandled wrmsr: 0x1d9
Mar 11 07:44:13 host0 kernel: inject_general_protection: rip 0xffffffff8100a88b

qemu-system-x86_64 works fine (and the testcase FAILs there even on the new kernels).

qemu-0.9.1-4:
void helper_wrmsr(void)
    default:
        /* XXX: exception ? */
        break;

kernel-2.6.25-0.101.rc4.git3.fc9:
int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
    default:
        pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
        return 1;

I did not try/find a hardware not supporting MSR_IA32_DEBUGCTLMSR how it does behave (if it gets ignored or some exception is invoked by the CPU there).
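To make the difference between the two quoted default branches concrete, here is a minimal Python model of them. It is only a sketch: the function names and the dictionary of known MSRs are made up for illustration and are not qemu or KVM code. The one behaviour it takes from the report is that qemu silently drops a write to an unknown MSR, while KVM returns non-zero, which the host log shows as an injected general protection fault.

```python
MSR_IA32_DEBUGCTLMSR = 0x1D9  # register number seen in the host log


def qemu_style_wrmsr(known_msrs, msr, data):
    """Model of the quoted helper_wrmsr() default: unknown writes are dropped."""
    if msr in known_msrs:
        known_msrs[msr] = data
    # Hypothetical return value; the real helper returns nothing.
    return "ok"


def kvm_style_wrmsr(known_msrs, msr, data):
    """Model of the quoted kvm_set_msr_common() default: unknown writes fail."""
    if msr in known_msrs:
        known_msrs[msr] = data
        return "ok"
    # Corresponds to 'return 1', after which the guest receives #GP.
    return "#GP"
```

With an empty set of known MSRs, the first function accepts the DEBUGCTLMSR write and the second reports "#GP", which matches the observation that the testcase merely FAILs under qemu-system-x86_64 but the guest crashes under qemu-kvm.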
The main question here is if KVM is reporting as a machine that should have that MSR. It's probably more or less kosher to freak all the way out when an unknown MSR is written. The (2.6.25) guest kernel code is supposed to detect the CPU models that don't have it. If KVM is reporting as hardware that does not have it but the guest kernel wrongly thinks it does have it, then it's the guest kernel's fault.
A bunch of fixes for kvm just went in. Can you try the latest rawhide as both guest and host and see if that works?
OK, on both host + guest running:
kernel-2.6.25-0.121.rc5.git4.fc9.x86_64
and:
kvm-63-2.fc9.x86_64
I got:
kernel: kvm: 10897: cpu0 svm_set_msr: MSR_IA32_DEBUGCTLMSR 0x2, nop
and the testcase result code 1 (FAIL)
The problem still exists for host & guest:
kernel-2.6.25-0.167.rc7.git2.fc9.x86_64
but this time on an Intel Core2 T7200 (Lenovo T60).

I checked that kernel-2.6.25-0.170.rc7.git3.fc9 contains the code only for AMD:
./arch/x86/kvm/svm.c: pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",

It was (most probably) checked before in Comment 3 on:
Dual-Core AMD Opteron(tm) Processor 8220 SE

(I am unaware of where the detection Roland wrote about in Comment 1 should be.)
Should be fixed in 2.6.25-final.
Still crashing on kernel-2.6.25-1.fc9.x86_64:
kvm: 7805: cpu0 unhandled wrmsr: 0x1d9 data 2
kvm: 7805: cpu0 unhandled wrmsr: 0x1d9 data 0

I will reopen it after a reboot to a more recent Rawhide kernel but this one is already 2.6.25-final.
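For anyone triaging similar host logs, the "unhandled wrmsr" lines can be picked apart mechanically. The following is a small sketch; the helper name and the returned dictionary layout are hypothetical. It assumes the message format quoted in Comment 0, where both the MSR number and the data come from hex format specifiers, and it also accepts the older form without the "data" field.

```python
import re

# Matches e.g. "kvm: 7805: cpu0 unhandled wrmsr: 0x1d9 data 2".
UNHANDLED_WRMSR = re.compile(
    r"kvm: (?P<pid>\d+): cpu(?P<cpu>\d+) unhandled wrmsr: "
    r"(?P<msr>0x[0-9a-fA-F]+)(?: data (?P<data>[0-9a-fA-F]+))?"
)


def parse_unhandled_wrmsr(line):
    """Return pid, cpu, msr and data from a log line, or None if no match."""
    match = UNHANDLED_WRMSR.search(line)
    if match is None:
        return None
    data = match.group("data")
    return {
        "pid": int(match.group("pid")),
        "cpu": int(match.group("cpu")),
        "msr": int(match.group("msr"), 16),
        # The data field is printed as hex without a 0x prefix.
        "data": int(data, 16) if data is not None else None,
    }
```

Applied to the two lines above, this yields MSR 0x1d9 with data 2 and then data 0, i.e. the guest first sets a bit in DEBUGCTLMSR and then clears it again.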
Still crashing in kernel-2.6.25-8.fc9.x86_64. (and I do not see a relevant changelog entry in kernel-2.6.25-14.fc9.x86_64 or kernel-2.6.25.1-1.fc10)
I don't think the MSR is unsupported on the host CPU -- KVM is just failing to implement it in the guest. It should not crash the guest and it looks like the code that went into SVM should also go into vmx.c. Or even better since it's the same code maybe it should just be in arch/x86/kvm/x86.c:kvm_[gs]et_msr_common() ?? This commit fixed the problem in SVM: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a2938c807024ba30191e3bd593430c0659d75717
It looks pretty straightforward to move the code from commit a2938c80 into the generic KVM code so both Intel and AMD processors handle this case...
It isn't straightforward. We need to see how Intel cpus handle last-branch-record virtualization. What can potentially be done is to allow writes to the MSR that don't turn on LBR, and only fail those that do.
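The policy suggested here (accept writes that leave LBR off, fail only those that turn it on) can be sketched as a single predicate. This is an illustration of the idea, not KVM code: the function name is hypothetical, and the only facts used are the architectural bit positions of DEBUGCTLMSR, where bit 0 is LBR and bit 1 is BTF.

```python
DEBUGCTLMSR_LBR = 1 << 0  # last-branch-record enable
DEBUGCTLMSR_BTF = 1 << 1  # branch trap flag, used for block stepping


def debugctl_write_allowed(data):
    """Accept a DEBUGCTLMSR write unless it tries to enable LBR."""
    return (data & DEBUGCTLMSR_LBR) == 0
```

Under this rule the values seen in the logs in this bug (0x2 and 0x0) would be accepted, while 0x1 or 0x3 would still fail. Note that accepting the write is not the same as emulating BTF, so a guest could still see PTRACE_SINGLEBLOCK behave differently from real hardware.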
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Just a confirmation that the exception generated for the unsupported wrmsr types is right:

RDMSR
http://www.cs.inf.ethz.ch/stricker/lab/doc/intel-part1.pdf
#GP(0) If the value in ECX specifies a reserved or unimplemented MSR address.

WRMSR
http://www.cs.inf.ethz.ch/stricker/lab/doc/intel-part2.pdf
#GP(0) If the value in ECX specifies a reserved or unimplemented MSR address.

It is just that the guest kernel should not crash on an unsupported MSR register; this may happen for DEBUGCTLMSR=0x1d9 on real silicon i586 (-> a different kernel Bug).

qemu-system-x86_64 of qemu-0.9.1-6.fc9.x86_64 just ignores the wrmsr instructions for unknown registers; it does not crash. The testcase
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/block-step.c?cvsroot=systemtap
will return exit code 2 as "unsupported".

Still, it is a qemu bug: because qemu does not support some essential MSR registers, a generated exception would stop the Linux kernel from booting:

console [earlyser0] enabled
end_pfn_map = 1048576
PANIC: early exception 0d rip 10:ffffffff81468302 error 0 cr2 0
Pid: 0, comm: swapper Not tainted 2.6.25.10-86.fc9.x86_64 #1
Call Trace:
 [<ffffffff81475500>] ? add_active_range+0x39/0xef
 [<ffffffff81468302>] ? mtrr_bp_init+0xda/0x137
 [<ffffffff814675ff>] ? e820_end_of_ram+0x5c/0x6b
 [<ffffffff81465da9>] ? setup_arch+0x22d/0x4ee
 [<ffffffff8104dbcf>] ? clockevents_register_notifier+0x27/0x34
 [<ffffffff8145f935>] ? start_kernel+0x76/0x2f4
 [<ffffffff8145f1dc>] ? _sinittext+0x1dc/0x1e3
RIP 0x10

I am considering this Bug as the KVM RFE for DEBUGCTLMSR=0x1d9 support. I am going to open another Bug for the ptrace detection of whether DEBUGCTLMSR=0x1d9 is supported by the underlying hardware.
Re-assigning kvm.ko bugs to the kvm package for easier tracking
Just FYI; in current F-11 kvm, this block-step program no longer causes a guest crash. It now causes:

kvm: 7962: cpu1 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop

to be printed on the host dmesg. Additionally, the block-step program inside the guest now has a return code of 2. Is that sufficient to address this BZ, or are you asking for full LBR virtualization?

Chris Lalancette
No, block-step should return code 0 (everything working) or 1 (due to EIO). Return code 2 is not acceptable for a bug-free kernel.

Bug 456175 Comment 1 by Roland McGrath:
> There is no x86-64 hardware without debugctlmsr, so that is just a kvm issue.
[...]
> The existing code (now upstream) checks >= 6 against the same number that's
> shown in "cpu family". So that check would not let the K6 try it, and
> PTRACE_SINGLEBLOCK would get EIO.
>
> The model check is compiled away by CONFIG_X86_DEBUGCTLMSR.
[...]

Therefore I assume that Roland McGrath (Cc'ed) would not accept an x86_64 runtime model check that would be there only for KVM guests, as any real x86_64 hardware supports debugctlmsr.
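As a userspace illustration of the quoted criterion, the check against "cpu family" can be sketched like this. It only mirrors Roland's description (family >= 6); it is not the kernel's code, the function names are hypothetical, and per the quote the real check is compiled away when CONFIG_X86_DEBUGCTLMSR is set.

```python
def cpu_family(cpuinfo_text):
    """Return the first 'cpu family' value from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu family":
            return int(value.strip())
    return None


def block_step_expected(cpuinfo_text):
    """Mirror the quoted criterion: block step is attempted for family >= 6."""
    family = cpu_family(cpuinfo_text)
    return family is not None and family >= 6
```

On a real machine the input would be the contents of /proc/cpuinfo. A K6, which reports family 5, gives False, matching the EIO case in the quote. A KVM guest reporting family 6 or later gives True even though KVM does not emulate the MSR, which is exactly the mismatch discussed in this bug.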
The upstream x86 kernel maintainers can decide if the CONFIG_X86_DEBUGCTLMSR and/or arch_has_block_step() criteria should change. AFAIK the existing definitions are the right criteria for real hardware. If KVM folks want the kernel to use new criteria specially tailored for how KVM differs from real hardware, they should take that issue upstream.
Sounds like the issue applies to F11 too; setting version to rawhide
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Verified block-step return code 2 on: host+guest kernel-2.6.32.11-99.fc12.x86_64 host qemu-system-x86-0.11.0-13.fc12.x86_64
kernel-debug-2.6.35.6-50.fc14.x86_64 qemu-system-x86-0.12.5-1.fc13.x86_64 kvm: 6881: cpu0 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop
Doesn't look like relevant kernel code has changed for a while, so moving to rawhide.
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19
Is this still a problem with 3.9 based F19 kernels?
Yes; BTW the reproducer in Comment 0 is really simple to run. host: kernel-3.8.4-202.fc18.x86_64 guest: kernel-3.8.4-202.fc18.x86_64
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs. Fedora 19 has now been rebased to 3.11.1-200.fc19. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel. If you experience different issues, please open a new bug report for those.
kernel-3.11.1-200.fc19.x86_64

The testcase returns rc 2 in the guest and the host kernel reports:
kvm [PID]: vcpu2 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop

It no longer crashes; PTRACE_SINGLEBLOCK just silently does the same as PTRACE_SINGLESTEP, which means PTRACE_SINGLEBLOCK cannot be used in KVM. A better fix would be to support PTRACE_SINGLEBLOCK even in KVM, but the crash is fixed, so I am closing this bug.