Bug 437028 - KVM: Intel/VMX: host kernel should support DEBUGCTLMSR=0x1d9
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 19
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: fedora-kernel-kvm
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-03-11 18:27 UTC by Jan Kratochvil
Modified: 2013-09-29 14:47 UTC
CC List: 16 users

Fixed In Version: kernel-3.11.1-200.fc19.x86_64
Clone Of:
Environment:
Last Closed: 2013-09-29 14:47:32 UTC
Type: ---
Embargoed:



Description Jan Kratochvil 2008-03-11 18:27:27 UTC
Description of problem:
If you run the testcase `block-step' with a new host and guest kernel, the guest
kernel crashes.  Only the testcase itself should report a failure.

Version-Release number of selected component (if applicable):
kernel-2.6.24.3-12.fc8.x86_64 (F8, tried)
(in fact I did not verify it on Rawhide but I read the sources below)
kvm-60-3.fc8.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Run on the host machine:
   kernel-2.6.24.3-12.fc8.x86_64
   (tried this one, latest Rawhide+upstream kernels would IMO behave the same)
2. Run qemu-kvm.
3. Run kernel-2.6.25-0.101.rc4.git3.fc9 as the guest kernel.
   (kernel-2.6.24.3-12.fc8.x86_64 would not work as a guest because it does not
    yet support PTRACE_SINGLEBLOCK; any later Rawhide or upstream kernel would
    IMO behave the same)
4. wget -O block-step.c 'http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/block-step.c?cvsroot=systemtap'
5. gcc -o block-step block-step.c -Wall -ggdb2 -D_GNU_SOURCE
6. ./block-step; echo $?

Actual results:
Guest kernel crash (see Bug 436678 for the dumps).

Expected results:
(no crash)
Exit code 0 (PASS) or 1 (FAIL) should be returned, depending on whether KVM
emulates the DEBUGCTLMSR_BTF CPU feature.

Additional info:
qemu-kvm now prints:
Mar 11 07:44:13 host0 kernel: kvm: 13319: cpu0 unhandled wrmsr: 0x1d9
Mar 11 07:44:13 host0 kernel: inject_general_protection: rip 0xffffffff8100a88b

qemu-system-x86_64 works fine (and the testcase FAILs there even on the new
kernels).

qemu-0.9.1-4:
void helper_wrmsr(void)
    default:
        /* XXX: exception ? */
        break;

kernel-2.6.25-0.101.rc4.git3.fc9:
int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
        default:
                pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
                return 1;

I did not find any hardware lacking MSR_IA32_DEBUGCTLMSR to check how it
behaves (whether the write is ignored or the CPU raises an exception).
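
The two excerpts above differ only in what happens on the default path. A minimal sketch of that difference follows, as a toy model and not the qemu or KVM source: the enum names and `toy_wrmsr()` are hypothetical, and the assumption that a nonzero return from kvm_set_msr_common() leads to the injected #GP is inferred from the `inject_general_protection` log line.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_DEBUGCTLMSR 0x1d9u /* MSR number from the log line above */

/* Hypothetical policies for a write to an MSR the emulator does not know. */
enum unknown_msr_policy {
    POLICY_IGNORE,   /* like the qemu-0.9.1 helper_wrmsr() default case */
    POLICY_INJECT_GP /* like kvm_set_msr_common() returning 1 */
};

enum wrmsr_result { WRMSR_OK, WRMSR_GP_FAULT };

/* Toy model: no MSR is implemented, so every write takes the default path. */
static enum wrmsr_result toy_wrmsr(enum unknown_msr_policy policy,
                                   uint32_t msr, uint64_t data)
{
    (void)msr;  /* unused: the toy model knows no MSRs at all */
    (void)data;
    assert(policy == POLICY_IGNORE || policy == POLICY_INJECT_GP);
    return policy == POLICY_INJECT_GP ? WRMSR_GP_FAULT : WRMSR_OK;
}
```

With POLICY_IGNORE the guest continues as if the write succeeded; with POLICY_INJECT_GP the guest must be prepared to handle the fault.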

Comment 1 Roland McGrath 2008-03-11 20:08:55 UTC
The main question here is if KVM is reporting as a machine that should have that
MSR.  It's probably more or less kosher to freak all the way out when an unknown
MSR is written.  The (2.6.25) guest kernel code is supposed to detect the CPU
models that don't have it.  If KVM is reporting as hardware that does not have
it but the guest kernel wrongly thinks it does have it, then it's the guest
kernel's fault.

Comment 2 Chuck Ebbert 2008-03-13 20:38:55 UTC
A bunch of fixes for kvm just went in. Can you try the latest rawhide as both
guest and host and see if that works?


Comment 3 Jan Kratochvil 2008-03-16 17:08:29 UTC
OK, on both host + guest running: kernel-2.6.25-0.121.rc5.git4.fc9.x86_64
and: kvm-63-2.fc9.x86_64
I got:
  kernel: kvm: 10897: cpu0 svm_set_msr: MSR_IA32_DEBUGCTLMSR 0x2, nop
and the testcase result code
1 (FAIL)


Comment 4 Jan Kratochvil 2008-03-28 19:58:30 UTC
The problem still exists for host & guest:
  kernel-2.6.25-0.167.rc7.git2.fc9.x86_64
but this time on
  Intel Core2 T7200 (Lenovo T60)

I checked that kernel-2.6.25-0.170.rc7.git3.fc9 contains the code only for AMD:
./arch/x86/kvm/svm.c: pr_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTLMSR 0x%llx, nop\n",

Comment 3 was (most probably) checked on:
  Dual-Core AMD Opteron(tm) Processor 8220 SE

(I do not know where the detection Roland wrote about in Comment 1 should be.)


Comment 5 Chuck Ebbert 2008-04-27 04:03:51 UTC
Should be fixed in 2.6.25-final.

Comment 6 Jan Kratochvil 2008-04-27 20:00:00 UTC
Still crashing on kernel-2.6.25-1.fc9.x86_64:
kvm: 7805: cpu0 unhandled wrmsr: 0x1d9 data 2
kvm: 7805: cpu0 unhandled wrmsr: 0x1d9 data 0

I will reopen it after a reboot to a more recent Rawhide kernel but this one is
already 2.6.25-final.


Comment 7 Jan Kratochvil 2008-05-04 18:17:06 UTC
Still crashing in kernel-2.6.25-8.fc9.x86_64.
(and I do not see a relevant changelog entry in kernel-2.6.25-14.fc9.x86_64 or
kernel-2.6.25.1-1.fc10)


Comment 8 Chuck Ebbert 2008-05-07 03:41:36 UTC
I don't think the MSR is unsupported on the host CPU -- KVM is just failing to
implement it in the guest. It should not crash the guest and it looks like the
code that went into SVM should also go into vmx.c.

Or even better since it's the same code maybe it should just be in
arch/x86/kvm/x86.c:kvm_[gs]et_msr_common() ??

This commit fixed the problem in SVM:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a2938c807024ba30191e3bd593430c0659d75717


Comment 9 Chuck Ebbert 2008-05-13 05:44:34 UTC
It looks pretty straightforward to move the code from commit a2938c80 into the
generic KVM code so both Intel and AMD processors handle this case...

Comment 10 Avi Kivity 2008-05-13 07:32:28 UTC
It isn't straightforward.  We need to see how Intel cpus handle 
last-branch-record virtualization.

What can potentially be done is to allow writes to the MSR that don't turn on 
LBR, and only fail those that do.
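
The policy suggested here can be sketched as a single bit test. This is only an illustration under stated assumptions, not the KVM implementation: `debugctl_write_acceptable()` and `debugctl_examples()` are hypothetical names, and the bit positions (LBR = bit 0, BTF = bit 1) follow the Linux kernel's definitions for this MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as used by the Linux kernel for MSR_IA32_DEBUGCTLMSR. */
#define DEBUGCTLMSR_LBR (1ULL << 0) /* last-branch-record enable */
#define DEBUGCTLMSR_BTF (1ULL << 1) /* single-step on branches */

/* Hypothetical helper: true if the write does not turn on LBR. */
static bool debugctl_write_acceptable(uint64_t data)
{
    return (data & DEBUGCTLMSR_LBR) == 0;
}

/* Usage: the values seen in this report (0x0 and 0x2) pass; LBR does not. */
static void debugctl_examples(void)
{
    assert(debugctl_write_acceptable(0x0));
    assert(debugctl_write_acceptable(DEBUGCTLMSR_BTF));
    assert(!debugctl_write_acceptable(DEBUGCTLMSR_LBR));
}
```

Accepting a write this way only avoids the fault; whether BTF then actually takes effect in the guest is a separate question.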

Comment 11 Bug Zapper 2008-05-14 05:57:12 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Jan Kratochvil 2008-07-21 21:43:44 UTC
Just a confirmation that the exception generated for accesses to unsupported
MSRs is right:
RDMSR
http://www.cs.inf.ethz.ch/stricker/lab/doc/intel-part1.pdf
#GP(0) If the value in ECX specifies a reserved or unimplemented MSR address.

WRMSR
http://www.cs.inf.ethz.ch/stricker/lab/doc/intel-part2.pdf
#GP(0) If the value in ECX specifies a reserved or unimplemented MSR address.

However, the guest kernel should not crash on an unsupported MSR; this may
happen for DEBUGCTLMSR=0x1d9 on real i586 silicon (-> a different kernel Bug).

qemu-system-x86_64 from qemu-0.9.1-6.fc9.x86_64 just ignores wrmsr
instructions for unknown registers; it does not crash.  The testcase
http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/block-step.c?cvsroot=systemtap
will return exit code 2 as "unsupported".
Still, it is a qemu bug; but because qemu does not support some essential MSRs,
generating the exception would prevent the Linux kernel from booting:
console [earlyser0] enabled
end_pfn_map = 1048576
PANIC: early exception 0d rip 10:ffffffff81468302 error 0 cr2 0
Pid: 0, comm: swapper Not tainted 2.6.25.10-86.fc9.x86_64 #1

Call Trace:
 [<ffffffff81475500>] ? add_active_range+0x39/0xef
 [<ffffffff81468302>] ? mtrr_bp_init+0xda/0x137
 [<ffffffff814675ff>] ? e820_end_of_ram+0x5c/0x6b
 [<ffffffff81465da9>] ? setup_arch+0x22d/0x4ee
 [<ffffffff8104dbcf>] ? clockevents_register_notifier+0x27/0x34
 [<ffffffff8145f935>] ? start_kernel+0x76/0x2f4
 [<ffffffff8145f1dc>] ? _sinittext+0x1dc/0x1e3

RIP 0x10

I am treating this Bug as the KVM RFE for DEBUGCTLMSR=0x1d9 support.
I am going to open another Bug for ptrace detection of whether
DEBUGCTLMSR=0x1d9 is supported by the underlying hardware.


Comment 13 Mark McLoughlin 2008-11-11 17:27:54 UTC
Re-assigning kvm.ko bugs to the kvm package for easier tracking

Comment 14 Chris Lalancette 2009-03-02 14:39:12 UTC
Just FYI; in current F-11 kvm, this block-step program no longer causes a guest crash.  It now causes:

kvm: 7962: cpu1 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop

to be printed on the host dmesg.  Additionally, the block-step program inside the guest now has a return code of 2.  Is that sufficient to address this BZ, or are you asking for full LBR virtualization?

Chris Lalancette

Comment 15 Jan Kratochvil 2009-03-04 12:45:02 UTC
No, block-step should return code 0 when everything works, or 1 due to EIO.
Return code 2 is not acceptable for a bug-free kernel.

Bug 456175 Comment 1 by Roland McGrath:
> There is no x86-64 hardware without debugctlmsr, so that is just a kvm issue. 
[...]
> The existing code (now upstream) checks >= 6 against the same number that's
> shown in "cpu family".  So that check would not let the K6 try it, and
> PTRACE_SINGLEBLOCK would get EIO.
> 
> The model check is compiled away by CONFIG_X86_DEBUGCTLMSR.
[...]

Therefore I assume that Roland McGrath (Cc'ed) would not accept an x86_64 runtime model check that exists only for KVM guests, since all real x86_64 hardware supports debugctlmsr.
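
The criteria quoted above can be sketched as follows. This is a toy stand-in built only from the quoted description, not the kernel source: `toy_arch_has_block_step()` is a hypothetical name, and the boolean parameter models whether CONFIG_X86_DEBUGCTLMSR is set at build time.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the check described in the quote:
 *  - with CONFIG_X86_DEBUGCTLMSR the model check is compiled away;
 *  - otherwise block stepping is offered only for "cpu family" >= 6.
 */
static bool toy_arch_has_block_step(bool config_x86_debugctlmsr, int cpu_family)
{
    assert(cpu_family >= 0);
    if (config_x86_debugctlmsr)
        return true; /* no runtime check at all, as on x86-64 */
    return cpu_family >= 6;
}
```

For example, a K6 (family 5) without the config option yields false, so PTRACE_SINGLEBLOCK would get EIO there, while any x86-64 build yields true regardless of what the hypervisor actually emulates.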

Comment 16 Roland McGrath 2009-03-04 19:57:09 UTC
The upstream x86 kernel maintainers can decide if the CONFIG_X86_DEBUGCTLMSR and/or arch_has_block_step() criteria should change.  AFAIK the existing definitions are the right criteria for real hardware.  If KVM folks want the kernel to use new criteria specially tailored for how KVM differs from real hardware, they should take that issue upstream.

Comment 17 Mark McLoughlin 2009-03-20 17:07:44 UTC
Sounds like the issue applies to F11 too; setting version to rawhide

Comment 18 Bug Zapper 2009-06-09 09:28:59 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 19 Bug Zapper 2010-04-27 11:56:36 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 20 Jan Kratochvil 2010-04-27 19:11:07 UTC
Verified block-step return code 2 on:
host+guest kernel-2.6.32.11-99.fc12.x86_64
host qemu-system-x86-0.11.0-13.fc12.x86_64

Comment 21 Bug Zapper 2010-11-04 12:00:39 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 22 Jan Kratochvil 2010-11-04 17:10:27 UTC
kernel-debug-2.6.35.6-50.fc14.x86_64
qemu-system-x86-0.12.5-1.fc13.x86_64
kvm: 6881: cpu0 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop

Comment 23 Cole Robinson 2012-05-20 23:38:42 UTC
Doesn't look like relevant kernel code has changed for a while, so moving to rawhide.

Comment 24 Fedora End Of Life 2013-04-03 20:07:17 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 25 Justin M. Forbes 2013-04-05 15:46:16 UTC
Is this still a problem with 3.9 based F19 kernels?

Comment 26 Jan Kratochvil 2013-04-05 15:51:10 UTC
Yes; BTW the reproducer in Comment 0 is really simple to run.

host:  kernel-3.8.4-202.fc18.x86_64
guest: kernel-3.8.4-202.fc18.x86_64

Comment 27 Josh Boyer 2013-09-18 20:29:21 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 19 kernel bugs.

Fedora 19 has now been rebased to 3.11.1-200.fc19.  Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 28 Jan Kratochvil 2013-09-29 14:47:32 UTC
kernel-3.11.1-200.fc19.x86_64

Testcase returns rc 2 in guest and host kernel reports:
kvm [PID]: vcpu2 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop

It no longer crashes; PTRACE_SINGLEBLOCK just silently behaves the same as PTRACE_SINGLESTEP, that is, PTRACE_SINGLEBLOCK cannot be used in KVM.

A better fix would be to support PTRACE_SINGLEBLOCK even in KVM, but the crash is fixed, therefore I am closing this bug.
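
To illustrate what is lost when the BTF write is a nop, here is a toy model of the two stepping modes. It is a sketch under assumptions, not code from the testcase or the kernel: `toy_count_stops()` is a hypothetical helper that counts debugger stops over a run of instructions, assuming block stepping stops only after a taken branch while single stepping stops after every instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * taken_branch[i] is true if instruction i is a taken branch.
 * btf_honored == true  models working PTRACE_SINGLEBLOCK (stop per branch);
 * btf_honored == false models the KVM "nop" case, where every instruction
 * stops, i.e. the same behaviour as PTRACE_SINGLESTEP.
 */
static size_t toy_count_stops(const bool *taken_branch, size_t n,
                              bool btf_honored)
{
    size_t stops = 0;

    assert(taken_branch != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!btf_honored || taken_branch[i])
            stops++;
    }
    return stops;
}
```

For a run of five instructions of which two are taken branches, the model gives two stops with BTF honored and five without it.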

