1420903 – Seeing cpu affinity is not supported messages on compute node

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-nova
Sub Component:
Version:	9.0 (Mitaka)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	zstream
Target Release:	9.0 (Mitaka)
Assignee:	Sahid Ferdjaoui
QA Contact:	Prasanth Anbalagan
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1515165 1563067 1591385

Reported:	2017-02-09 20:09 UTC by Jeremy
Modified:	2020-12-21 19:34 UTC
CC List:	19 users
Fixed In Version:	openstack-nova-13.1.4-13.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1515165 1563067 1591385
Environment:
Last Closed:	2018-03-15 12:43:27 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
OpenStack gerrit	393254	0	None	MERGED	libvirt: fix vCPU usage reporing for LXC/QEMU guests	2020-12-18 01:04:17 UTC
Red Hat Product Errata	RHBA-2018:0538	0	None	None	None	2018-03-15 12:44:58 UTC

Description Jeremy 2017-02-09 20:09:45 UTC

Description of problem:
seeing the following messages in nova-compute.log
How can we prevent the messages? Why are we seeing these messages?

2017-02-09 04:38:16.125 51668 WARNING nova.virt.libvirt.driver [req-c0de8ab8-16ff-4caf-88b7-59e729045ca4 - - - - -] couldn't obtain the vpu count from domain id: a137b075-1c2d-41e6-a3ea-a5aeea6a182c, exception: Requested operation is not valid: cpu affinity is not supported



Version-Release number of selected component (if applicable):
grep qemu sosreport-20170209-125702/compute-4.nuvem-intera.local/installed-rpms
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch            Tue Jan 10 04:58:47 2017
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.2.x86_64          Tue Jan 10 05:00:20 2017
qemu-img-rhev-2.6.0-27.el7.x86_64                           Tue Jan 10 04:59:34 2017
qemu-kvm-common-rhev-2.6.0-27.el7.x86_64                    Tue Jan 10 04:59:28 2017
qemu-kvm-rhev-2.6.0-27.el7.x86_64                           Tue Jan 10 05:00:21 2017



Additional info:
virt_type is kvm

cpu has vmx flag and lsmod shows kvm_intel loaded.

Comment 1 Red Hat Bugzilla Rules Engine 2017-02-09 20:09:55 UTC

This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 4 Jeremy 2017-02-10 14:20:03 UTC

Hello,
Also seeing these messages on the system, not sure if it's related or what it means.

Feb  7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

Comment 5 Daniel Berrangé 2017-02-10 16:25:48 UTC

(In reply to Jeremy from comment #0)
> Description of problem:
> seeing the following messages in nova-compute.log
> How can we prevent the messages? Why are we seeing these messages?
> 
> 2017-02-09 04:38:16.125 51668 WARNING nova.virt.libvirt.driver
> [req-c0de8ab8-16ff-4caf-88b7-59e729045ca4 - - - - -] couldn't obtain the vpu
> count from domain id: a137b075-1c2d-41e6-a3ea-a5aeea6a182c, exception:
> Requested operation is not valid: cpu affinity is not supported

[snip]

Libvirt only reports this error message if the guest does not provide a thread-per-VCPU. This means the guest must be running TCG, not KVM....

> Additional info:
> virt_type is kvm

So I'm sceptical about this claim. Perhaps a guest was launched with QEMU before nova's virt_type was changed to kvm.

In any case, the real bug is in nova: it was asking for vCPU affinity data for TCG guests, which will always fail.
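
To find out which guests trigger the warning, the domain UUIDs can be pulled out of nova-compute.log. The following is a minimal sketch, assuming the warning text matches the line quoted in this report; the function name affected_domains and the regular expression are illustrative and not part of nova.

```python
import re

# Pattern for the warning quoted in this report. The message text
# (including "vpu count") is copied from the log line, not from nova source.
WARNING_RE = re.compile(
    r"couldn't obtain the vpu count from domain id: "
    r"(?P<uuid>[0-9a-f-]{36}), exception: (?P<error>.*)$"
)


def affected_domains(log_lines):
    """Return a dict mapping domain UUID to the set of errors logged for it."""
    seen = {}
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            error = match.group("error").strip()
            seen.setdefault(match.group("uuid"), set()).add(error)
    return seen


# The log line from the description, split only for readability.
sample = [
    "2017-02-09 04:38:16.125 51668 WARNING nova.virt.libvirt.driver "
    "[req-c0de8ab8-16ff-4caf-88b7-59e729045ca4 - - - - -] couldn't obtain the vpu "
    "count from domain id: a137b075-1c2d-41e6-a3ea-a5aeea6a182c, exception: "
    "Requested operation is not valid: cpu affinity is not supported",
]

for uuid, errors in affected_domains(sample).items():
    print(uuid, sorted(errors))
```

Each UUID returned can then be checked against its domain definition to see whether the guest really runs under KVM.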

Comment 6 Daniel Berrangé 2017-02-10 16:28:03 UTC

The nova bug was fixed upstream in

commit 2fdab3b922b0d99f415902462de967a910a6594b
Author: Daniel P. Berrange <berrange>
Date:   Wed Nov 2 14:46:06 2016 +0000

    libvirt: fix vCPU usage reporing for LXC/QEMU guests
    
    Currently if Nova is using the libvirt LXC driver, it is
    hardcoded to report 1 vCPU used on the host, regardless
    of how many containers are running.
    
    Meanwhile for QEMU (aka TCG) guests, the guest.get_vcpu_info
    method is throwing an exception, since QEMU does not use
    a dedicated thread per vCPU currently. The effect is that
    on QEMU hosts, we're reporting 0 vCPUs used on the host
    regardless of how many guests are running
    
    This causes the 'get_available_resources' method to report
    incorrect 'vcpus_used' values for the compute node. By a
    stroke of luck, the resource tracker merely logs this
    value and then throws it away, instead counting vcpu
    usage based on vcpus declared against the flavour.  Now
    ignoring the hypervisor reported data is arguably a bug
    in the resource tracker, because it means it is overcounting
    resource consumption for plain QEMU guests (they can only
    ever consume 1 pCPU of time, regardless of vCPU count).
    Fixing the resource tracker is out of scope for now, but
    we should at least ensure we're reporting accurate data
    to it, even if it is only used for logging at this time.
    
    If a host does not report detailed vCPU usage from libvirt
    then we should default to reporting 1 vCPU per guest, so
    that the 'vcpus_used' field reports some reasonably
    meaningful data on host CPU usage.
    
    Closes-bug: #1638889
    Change-Id: I627d30d61f8ead6211f78a1c79ffd79b81333f86
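
The fallback described in the last paragraph of the commit message can be sketched roughly as follows. This is an illustration only, not the code from the upstream change: FakeGuest, VcpuInfoUnavailable and count_vcpus_used are hypothetical stand-ins, and only the method name get_vcpu_info is taken from the commit text.

```python
class VcpuInfoUnavailable(Exception):
    """Hypothetical stand-in for the error libvirt raises for TCG guests."""


class FakeGuest:
    """Hypothetical guest. vcpus=None models a guest without per-vCPU threads."""

    def __init__(self, vcpus):
        self._vcpus = vcpus

    def get_vcpu_info(self):
        if self._vcpus is None:
            raise VcpuInfoUnavailable("cpu affinity is not supported")
        return list(range(self._vcpus))


def count_vcpus_used(guests):
    """Sum vCPUs over guests, counting 1 for guests with no detailed data."""
    total = 0
    for guest in guests:
        try:
            total += len(guest.get_vcpu_info())
        except VcpuInfoUnavailable:
            # No per-vCPU data available: default to one vCPU for this guest
            # instead of logging a warning and counting zero.
            total += 1
    return total


guests = [FakeGuest(4), FakeGuest(None), FakeGuest(2)]
print(count_vcpus_used(guests))
```

With this shape, a guest that cannot report vCPU details still contributes to the host total rather than being dropped from it.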

Comment 7 Jeremy 2017-02-10 17:10:02 UTC

Thanks Daniel. Do we know what version of nova has this fix so we can tell the customer to upgrade to that version?

Comment 9 Jeremy 2017-02-17 14:08:23 UTC

Thanks. Any idea what the other message they are seeing may mean, or whether it's a problem?


Feb  7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

Comment 10 Sahid Ferdjaoui 2017-02-20 09:46:24 UTC

(In reply to Jeremy from comment #9)
> Thanks. Any idea what the other message they are seeing may mean, or
> whether it's a problem?
> 
> 
> Feb  7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

I'm no expert in this area, but it seems that in some circumstances KVM needs access to the CPU MSRs [0][1]. According to the man page, a kernel module might be needed. I checked the sosreport and it is not loaded on the compute node.
  
  $ cat lsmod  | grep msr | wc -l
  0
  
There is a patch to hide these messages [2]:
 
  $ echo 1 > /sys/module/kvm/parameters/ignore_msrs

Back to the description: the customer reports using virt_type=kvm, and Daniel has some doubts about that in comment #5. Before moving forward we need to confirm whether the running domain that raises these error messages is actually configured as TCG (please share the domain XML with us), because if it is not, the issue could be in a lower-level component.
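
Once the domain XML is available (for example from `virsh dumpxml`), the virtualization type can be read from the `type` attribute of the root `<domain>` element: `qemu` means TCG, `kvm` means hardware-accelerated KVM. A minimal sketch, where the helper name domain_virt_type and the sample XML are made up for illustration:

```python
import xml.etree.ElementTree as ET


def domain_virt_type(domain_xml):
    """Return the 'type' attribute of the root <domain> element."""
    root = ET.fromstring(domain_xml)
    if root.tag != "domain":
        raise ValueError("not a libvirt domain definition")
    return root.get("type")


# Hypothetical, heavily trimmed domain definition for illustration only.
SAMPLE_TCG = """
<domain type='qemu'>
  <name>instance-example</name>
  <vcpu>2</vcpu>
</domain>
"""

virt_type = domain_virt_type(SAMPLE_TCG)
if virt_type == "qemu":
    print("TCG guest: per-vCPU affinity data is not expected to be available")
else:
    print("virt type:", virt_type)
```

If the affected domain reports `kvm` here, the explanation from comment #5 would not apply and the problem would need to be investigated further down the stack.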

[0] https://www.kernel.org/doc/Documentation/virtual/kvm/msr.txt
[1] http://man7.org/linux/man-pages/man4/msr.4.html
[2] http://lkml.iu.edu/hypermail/linux/kernel/0908.2/01472.html

Comment 16 Vasili Namatov 2018-02-25 16:02:59 UTC

Tip: As verified on OSPD8, rebooting the compute node resolves the issue.

Comment 19 errata-xmlrpc 2018-03-15 12:43:27 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0538
