Bug 1259200 - old machine types in qemu-kvm-rhev should not expose vPMU to KVM guest
Summary: old machine types in qemu-kvm-rhev should not expose vPMU to KVM guest
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Wei Huang (AMD)
QA Contact: Virtualization Bugs
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Depends On:
Blocks: 1259210
Reported: 2015-09-02 07:46 UTC by Xiaoqing Wei
Modified: 2016-09-27 14:02 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: New KVM code allows KVM hosts to emulate a PMU on AMD CPUs, but migration support for the PMU is not available.
Consequence: KVM guests will detect a PMU when running on RHEL-7.2 hosts, and will be able to use it. However, the PMU state won't be migrated if the VM is migrated to another host. So the PMU is not fully functional if the VM is migrated.
Workaround (if any): Using the PMU in a KVM guest is still not supported, so users are advised to not rely on performance counters being fully functional.
Result: No PMU-related errors if the PMU is not used.
See Also: Bug 1076010 - failed to access perfctr error shown after VM clean install (without -cpu host)
Clone Of:
Clones: 1259210
Environment:
Last Closed: 2016-09-27 13:58:50 UTC
Target Upstream Version:
Embargoed:



Description Xiaoqing Wei 2015-09-02 07:46:08 UTC
Description of problem:

Recently, with the backport of PMU series into RHEL 7,

75f2bf3 [x86] kvm: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs
65be410 [x86] kvm: Implement AMD vPMU code for KVM
c7c4aa0 [x86] kvm: Define kvm_pmu_ops to support vPMU function dispatch
9d57867 [x86] kvm: vpmu: introduce kvm_pmu_msr_idx_to_pmc
148c080 [x86] kvm: vpmu: reorder PMU functions
6e0cbda [x86] kvm: vpmu: whitespace and stylistic adjustments in PMU code
74212f8 [x86] kvm: vpmu: use the new macros to go between PMC, PMU and VCPU
56b426d [x86] kvm: vpmu: introduce pmu.h header
e13fb99 [x86] kvm: vpmu: rename a few PMU functions

SVM hosts expose the K7-level PMU feature to KVM guests, with no way to turn it on or off.

A VM booted with the -M pc-i440fx-rhel7.0.0 machine type still shows the PMU in guest dmesg.

Since the machine type cannot control this guest-visible CPUID change, it is likely to break migration.

Version-Release number of selected component (if applicable):
kernel-3.10.0-290.el7.x86_64 or later (the host kernel; build 290 includes the patches above)
qemu-kvm-rhev-2.3.0-21.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot a VM on an AMD host (host kernel >= 290) with:
    -machine pc-i440fx-rhel7.0.0 \
    -cpu 'Opteron_G5'

2. In the guest, run dmesg|less to check whether the PMU is detected.

Actual results:
PMU detected

Expected results:
The PMU should not be detected with an old machine type.
7.0.0 GA was not able to expose this feature to the guest.

Additional info:

Since the PMU emulation code is in the host kernel, qemu-kvm (rhel) is likely affected too. I'll test it and clone this bug later if it is affected.

Comment 1 Xiaoqing Wei 2015-09-02 07:49:33 UTC
Hello Wei,

Could you please have a look at this bug?

Thank you.

Comment 3 Xiaoqing Wei 2015-09-02 08:08:25 UTC
I think controlling this through the machine type would be better than introducing a new option, so I changed the bug summary.

Comment 4 Andrew Jones 2015-09-04 13:22:52 UTC
Here are the scenarios I can think of

probed     to-version vs. from-version    result
-------------------------------------------------------
before     >=                             OK
after      >=                             OK
before     <                              output "unimplemented msr" on use
after      <                              OK, PMU not used

Note, the '<' scenarios are probably not supported.

So, unless we can come up with a supported, potentially problematic scenario to test (and find that it is indeed problematic by testing), then I think this bug should be closed as not-a-bug.
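
The four scenarios in the table above can be written as a small lookup. This is only a sketch: the function name, the tuple-based version comparison, and the reading of "probed" as "when the guest probed the PMU relative to the migration" are assumptions for illustration, not something the comment spells out.

```python
# Hypothetical encoding of the scenario table; keys are (probed, relation).
SCENARIOS = {
    ("before", ">="): "OK",
    ("after", ">="): "OK",
    ("before", "<"): 'output "unimplemented msr" on use',
    ("after", "<"): "OK, PMU not used",
}


def migration_result(probed, to_version, from_version):
    """Return the expected result for one migration scenario.

    probed       -- "before" or "after" (assumed: PMU probe time vs. migration)
    to_version   -- destination host version, e.g. (7, 2)
    from_version -- source host version, e.g. (7, 1)
    """
    relation = ">=" if to_version >= from_version else "<"
    return SCENARIOS[(probed, relation)]


if __name__ == "__main__":
    # Migrating from a 7.2 host back to a 7.1 host after the guest
    # already probed the PMU on 7.2.
    print(migration_result("before", (7, 1), (7, 2)))
```

Only the ("before", "<") row reports a visible problem, which matches the note that the '<' scenarios are probably not supported.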

Comment 5 Karen Noel 2015-10-03 11:59:23 UTC
Can someone confirm that live migration is not broken between RHEL 7.1 <--> RHEL 7.2 hosts with 6.5 and 7.1 machine types on AMD? Thanks.

Comment 6 Wei Huang (AMD) 2015-10-06 04:45:37 UTC
(In reply to Karen Noel from comment #5)
> Can someone confirm that live migration is not broken between RHEL 7.1 <-->
> RHEL 7.2 hosts with 6.5 and 7.1 machine types on AMD? Thanks.

I tried migration between RHEL 7.1 and 7.2 hosts. The guest VM was a fresh copy installed from RHEL 7.1 ISO and I tested two VM machine types: RHEL6.5.0 and pc-i440fx-rhel7.0.0 (default). The migration didn't break based on what I saw. 

-Wei

Comment 7 juzhang 2015-10-08 06:49:45 UTC
(In reply to Karen Noel from comment #5)
> Can someone confirm that live migration is not broken between RHEL 7.1 <-->
> RHEL 7.2 hosts with 6.5 and 7.1 machine types on AMD? Thanks.

Hi Xiaoqing,

Could you give an update?

Best Regards,
Junyi

Comment 8 Xiaoqing Wei 2015-10-08 09:16:04 UTC
(In reply to juzhang from comment #7)
> (In reply to Karen Noel from comment #5)
> > Can someone confirm that live migration is not broken between RHEL 7.1 <-->
> > RHEL 7.2 hosts with 6.5 and 7.1 machine types on AMD? Thanks.
> 
> Hi Xiaoqing,
> 
> Could you give an update?
> 

Sure,

The test results posted by Wei in C#6 are identical to mine.
The VM is working after migration, but the underlying PMU bit has changed.
I filed this bug because I thought we should not bring hidden changes to old machine types; that could be a potential break.
Correct me if I am wrong.

Regards,
Xiaoqing.

Comment 9 Paolo Bonzini 2015-10-19 09:57:14 UTC
Xiaoqing is right and I think this would ordinarily be a problem, but isn't PMU support technology preview?  In this case I think it's enough to add something to the release notes.

Comment 10 Eduardo Habkost 2015-10-19 15:25:13 UTC
PMU reporting on CPUID is disabled since qemu-kvm-1.5.2-3.el7 to avoid this kind of problem (see bug 853101). Can you clarify which CPUID bits are changing on migration, and what are the PMU messages seen on guest dmesg?

Comment 11 Xiaoqing Wei 2015-10-20 03:30:47 UTC
(In reply to Eduardo Habkost from comment #10)
> PMU reporting on CPUID is disabled since qemu-kvm-1.5.2-3.el7 to avoid this
> kind of problem (see bug 853101). Can you clarify which CPUID bits are
> changing on migration, and what are the PMU messages seen on guest dmesg?

Eduardo,

I don't have two AMD hosts on hand, so I can't test it right now.
I will try to find another rig for cross-host live migration.

My guess is this bit: CPUID_EXT3_PERFNB

I found this change by booting VMs on hosts with the 7.1 GA kernel and a 7.2 kernel (> 290), and checking each VM's /proc/cpuinfo.

Comment 12 Eduardo Habkost 2015-10-20 19:38:56 UTC
By looking at the guest Linux code, it looks like it will enable the AMD PMU driver unconditionally if CPU vendor is AMD and CPUID family >= 6. I'm not sure there's anything we can do on QEMU side.
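
The guest-side condition described above (vendor is AMD and CPUID family >= 6) can be sketched as follows. The function names and the sample EAX value are illustrative assumptions; the family decoding follows the standard CPUID leaf 1 layout, where the extended family field is added only when the base family is 0xF.

```python
def cpuid_family(eax):
    """Decode the display family from CPUID leaf 1 EAX."""
    base_family = (eax >> 8) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended family is only added when the base family is 0xF.
    if base_family == 0xF:
        return base_family + ext_family
    return base_family


def guest_assumes_amd_pmu(vendor, eax):
    """Approximate the check described in this comment (not kernel code)."""
    return vendor == "AuthenticAMD" and cpuid_family(eax) >= 6


if __name__ == "__main__":
    # 0x00600F20 is a sample family 15h (decimal 21) signature, used here
    # as an assumed example value.
    sample_eax = 0x00600F20
    print(cpuid_family(sample_eax))
    print(guest_assumes_amd_pmu("AuthenticAMD", sample_eax))
```

Nothing in this check depends on a PMU-specific CPUID bit, which is why a machine type cannot switch it off from the QEMU side.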

Comment 13 Xiaoqing Wei 2015-10-21 02:58:19 UTC
(In reply to Eduardo Habkost from comment #12)
> By looking at the guest Linux code, it looks like it will enable the AMD PMU
> driver unconditionally if CPU vendor is AMD and CPUID family >= 6.


Yes, that's exactly what I mean: it is unconditionally exposed to the guest.
But isn't that on the host kernel side?

> I'm not sure there's anything we can do on QEMU side.

I don't know the details; I just remember that vPMU for Intel could be controlled by machine type?

Comment 14 Eduardo Habkost 2015-10-21 14:19:58 UTC
(In reply to Xiaoqing Wei from comment #13)
> (In reply to Eduardo Habkost from comment #12)
> > By looking at the guest Linux code, it looks like it will enable the AMD PMU
> > driver unconditionally if CPU vendor is AMD and CPUID family >= 6.
> 
> 
> Yes, that's exactly what I mean: it is unconditionally exposed to the guest.
> But isn't that on the host kernel side?

That's on the guest side. If the CPU vendor is AMD and family >= 6, you will see "Performance Events: AMD PMU driver" in guest dmesg regardless of host behavior (you would see that message even if there were absolutely no PMU emulation code implemented on the host side).

> > I'm not sure there's anything we can do on QEMU side.
> 
> I don't know the details; I just remember that vPMU for Intel could be controlled by
> machine type?

We disable reporting of architectural performance counters on CPUID[0xA], but we can't prevent the guest from making assumptions about the PMU based on family/model values.


I would like to clarify this from comment #11:

> I found this change by booting VMs on hosts with the 7.1 GA kernel and a 7.2 kernel (> 290), and checking each VM's /proc/cpuinfo.

You shouldn't see any difference on /proc/cpuinfo when changing the host version. Could you clarify what exactly looks different on the guest-side depending on the host version?

Comment 15 Xiaoqing Wei 2015-10-22 03:02:27 UTC
(In reply to Eduardo Habkost from comment #14)

> 
> I would like to clarify this from comment #11:
> 
> > I found this change by booting VMs on hosts with the 7.1 GA kernel and a 7.2 kernel (> 290), and checking each VM's /proc/cpuinfo.
> 
> You shouldn't see any difference on /proc/cpuinfo when changing the host
> version. Could you clarify what exactly looks different on the guest-side
> depending on the host version?

Oops, my fault, it's in dmesg

as C#0, step2
2. in guest: dmesg|less to check whether PMU is detected

Comment 16 Eduardo Habkost 2015-10-23 18:12:44 UTC
(In reply to Xiaoqing Wei from comment #15)
> > I would like to clarify this from comment #11:
> > 
> > > I found this change by booting VMs on hosts with the 7.1 GA kernel and a 7.2 kernel (> 290), and checking each VM's /proc/cpuinfo.
> > 
> > You shouldn't see any difference on /proc/cpuinfo when changing the host
> > version. Could you clarify what exactly looks different on the guest-side
> > depending on the host version?
> 
> Oops, my fault, it's in dmesg
> 
> as C#0, step2
> 2. in guest: dmesg|less to check whether PMU is detected

So, did you find actual changes in dmesg, or did you only notice that the guest unconditionally assumes a PMU is available in all cases? As Drew suggested in comment #4, I suggest we close this as NOTABUG unless there's something that actually breaks when upgrading the host.

Comment 17 Xiaoqing Wei 2015-10-25 09:17:52 UTC
(In reply to Eduardo Habkost from comment #16)
> (In reply to Xiaoqing Wei from comment #15)
> > > I would like to clarify this from comment #11:
> > > 
> > > > I found this change by booting VMs on hosts with the 7.1 GA kernel and a 7.2 kernel (> 290), and checking each VM's /proc/cpuinfo.
> > > 
> > > You shouldn't see any difference on /proc/cpuinfo when changing the host
> > > version. Could you clarify what exactly looks different on the guest-side
> > > depending on the host version?
> > 
> > Oops, my fault, it's in dmesg
> > 
> > as C#0, step2
> > 2. in guest: dmesg|less to check whether PMU is detected
> 
> So, did you find actual changes in dmesg, or did you only notice that the
> guest unconditionally assumes a PMU is available in all cases? As Drew
> suggested in comment #4, I suggest we close this as NOTABUG unless there's
> something that actually breaks when upgrading the host.

Yes, dmesg does change when booting a VM on hosts with kernels before and after the vPMU patches (same guest image, same guest kernel version).

On kernel-290+:
Actual results:
PMU detected in guest dmesg

On an old kernel, e.g. kernel-123:
Expected results:
PMU is not detected in guest dmesg

Comment 18 Eduardo Habkost 2015-10-26 19:31:13 UTC
I confirm that when running kernel-120 on the host, I see this:
"Broken PMU hardware detected, using software events only."

That's because the guest kernel checks if the PMU MSRs really work, before enabling PMU usage. But I don't think we can do anything to avoid that, if the kernel doesn't provide an interface to disable PMU emulation.
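
A rough way to tell the two guest outcomes apart is to scan the guest dmesg text for the two messages quoted in this bug. The function name and return labels below are made up for illustration. The "broken" message is checked first on the assumption that both lines can appear in the same log when the guest's MSR probe fails.

```python
# Message strings as quoted in this bug report.
AMD_PMU_DRIVER = "Performance Events: AMD PMU driver"
BROKEN_PMU = "Broken PMU hardware detected, using software events only."


def classify_guest_pmu(dmesg_text):
    """Classify guest PMU state from dmesg output (illustrative helper)."""
    # Check the failure message first: it may follow the driver banner.
    if BROKEN_PMU in dmesg_text:
        return "software-only"
    if AMD_PMU_DRIVER in dmesg_text:
        return "pmu-detected"
    return "no-pmu-message"


if __name__ == "__main__":
    new_host_log = "Performance Events: AMD PMU driver.\n"
    old_host_log = new_host_log + BROKEN_PMU + "\n"
    print(classify_guest_pmu(new_host_log))
    print(classify_guest_pmu(old_host_log))
```

In a guest, the input could come from the saved output of `dmesg`, as in step 2 of the reproducer.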

Comment 21 Paolo Bonzini 2016-09-27 13:58:50 UTC
There is no CPUID bit for the AMD MSRs 0xC0010000..0xC0010007.

PERFNB refers to MSRs 0xC0010200..0xC001020B

