Description of problem:
seeing the following messages in nova-compute.log
How can we prevent the messages? Why are we seeing these messages?

2017-02-09 04:38:16.125 51668 WARNING nova.virt.libvirt.driver [req-c0de8ab8-16ff-4caf-88b7-59e729045ca4 - - - - -] couldn't obtain the vpu count from domain id: a137b075-1c2d-41e6-a3ea-a5aeea6a182c, exception: Requested operation is not valid: cpu affinity is not supported

Version-Release number of selected component (if applicable):
grep qemu sosreport-20170209-125702/compute-4.nuvem-intera.local/installed-rpms
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch Tue Jan 10 04:58:47 2017
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.2.x86_64 Tue Jan 10 05:00:20 2017
qemu-img-rhev-2.6.0-27.el7.x86_64 Tue Jan 10 04:59:34 2017
qemu-kvm-common-rhev-2.6.0-27.el7.x86_64 Tue Jan 10 04:59:28 2017
qemu-kvm-rhev-2.6.0-27.el7.x86_64 Tue Jan 10 05:00:21 2017

Additional info:
virt_type is kvm
cpu has vmx flag and lsmod shows kvm_intel loaded.
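To find out which domains trigger the warning, the log can be scanned for the message quoted above. The following is only a sketch: the regular expression is derived from the single log line in this report, and the helper name `affected_domains` is made up for illustration.

```python
import re

# Pattern based on the one warning line quoted in this report (assumption:
# other occurrences follow the same layout).
WARNING_RE = re.compile(
    r"couldn't obtain the vpu count from domain id: "
    r"(?P<uuid>[0-9a-f-]{36}), exception: (?P<reason>.*)"
)


def affected_domains(log_lines):
    """Map each domain UUID seen in the warning to the first reported reason."""
    seen = {}
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            seen.setdefault(match.group("uuid"), match.group("reason").strip())
    return seen
```

Feeding it the lines of nova-compute.log (for example `affected_domains(open(path))`, with `path` pointing at the log file) gives the list of domain UUIDs whose XML is worth inspecting.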
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Hello,

We are also seeing this message on the system; not sure whether it is related or what it means.

Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed
(In reply to Jeremy from comment #0)
> Description of problem:
> seeing the following messages in nova-compute.log
> How can we prevent the messages? Why are we seeing these messages?
>
> 2017-02-09 04:38:16.125 51668 WARNING nova.virt.libvirt.driver
> [req-c0de8ab8-16ff-4caf-88b7-59e729045ca4 - - - - -] couldn't obtain the vpu
> count from domain id: a137b075-1c2d-41e6-a3ea-a5aeea6a182c, exception:
> Requested operation is not valid: cpu affinity is not supported

[snip]

Libvirt only reports this error message if the guest does not provide a thread-per-VCPU. This means the guest must be running TCG, not KVM...

> Additional info:
> virt_type is kvm

So I'm sceptical about this claim. Perhaps a guest was launched with QEMU before the nova virt_type was changed to kvm.

In any case, the real bug is in nova, because it was asking for vCPU affinity data for TCG guests, which will always fail.
The nova bug was fixed upstream in

commit 2fdab3b922b0d99f415902462de967a910a6594b
Author: Daniel P. Berrange <berrange>
Date: Wed Nov 2 14:46:06 2016 +0000

    libvirt: fix vCPU usage reporing for LXC/QEMU guests

    Currently if Nova is using the libvirt LXC driver, it is hardcoded to report 1 vCPU used on the host, regardless of how many containers are running. Meanwhile for QEMU (aka TCG) guests, the guest.get_vcpu_info method is throwing an exception, since QEMU does not use a dedicated thread per vCPU currently. The effect is that on QEMU hosts, we're reporting 0 vCPUs used on the host regardless of how many guests are running

    This causes the 'get_available_resources' method to report incorrect 'vcpus_used' values for the compute node. By a stroke of luck, the resource tracker merely logs this value and then throws it away, instead counting vcpu usage based on vcpus declared against the flavour.

    Now ignoring the hypervisor reported data is arguably a bug in the resource tracker, because it means it is overcounting resource consumption for plain QEMU guests (they can only ever consume 1 pCPU of time, regardless of vCPU count). Fixing the resource tracker is out of scope for now, but we should at least ensure we're reporting accurate data to it, even if it is only used for logging at this time.

    If a host does not report detailed vCPU usage from libvirt then we should default to reporting 1 vCPU per guest, so that the 'vcpus_used' field reports some reasonably meaningful data on host CPU usage.

    Closes-bug: #1638889
    Change-Id: I627d30d61f8ead6211f78a1c79ffd79b81333f86
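The fallback described in the commit message (count 1 vCPU for a guest when detailed vCPU data is unavailable) can be sketched roughly as below. This is not the actual nova patch: `VcpuInfoUnavailable`, `FakeGuest` and `count_vcpus_used` are hypothetical names used for illustration, and only the `get_vcpu_info` method name comes from the commit message.

```python
class VcpuInfoUnavailable(Exception):
    """Hypothetical stand-in for the error raised for TCG/LXC guests."""


class FakeGuest:
    """Hypothetical guest object; only get_vcpu_info() is named in the commit."""

    def __init__(self, vcpus, has_vcpu_threads=True):
        self.vcpus = vcpus
        self.has_vcpu_threads = has_vcpu_threads

    def get_vcpu_info(self):
        # Guests without a thread per vCPU cannot report per-vCPU details.
        if not self.has_vcpu_threads:
            raise VcpuInfoUnavailable("cpu affinity is not supported")
        return list(range(self.vcpus))


def count_vcpus_used(guests):
    """Sum vCPUs across guests, defaulting to 1 when details are unavailable."""
    total = 0
    for guest in guests:
        try:
            total += len(guest.get_vcpu_info())
        except VcpuInfoUnavailable:
            # Fall back to 1 vCPU per guest instead of silently counting 0.
            total += 1
    return total
```

With one KVM-style guest of 4 vCPUs and one TCG-style guest, `count_vcpus_used([FakeGuest(4), FakeGuest(2, has_vcpu_threads=False)])` counts 4 + 1 vCPUs rather than dropping the second guest from the total.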
Thanks Daniel. Do we know what version of nova has this fix so we can tell the customer to upgrade to that version?
Thanks. Any idea what the other message they are seeing may mean, or whether it's a problem?

Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed
(In reply to Jeremy from comment #9)
> Thanks any idea what the other message they are seeing may mean? or if it's
> a problem.
>
> Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

I'm no expert in this area, but it seems that in some circumstances KVM needs access to the CPU MSRs [0][1]. According to the man page, a kernel module might be needed. I checked the sosreport and it is not loaded on the compute node:

$ cat lsmod | grep msr | wc -l
0

There is a patch to hide these messages [2]:

$ echo 1 > /sys/module/kvm/parameters/ignore_msrs

Back to the description: the customer reports using virt_type=kvm, and Daniel seems to have some doubts about that in his comment #5. Before moving forward, we need to check whether the running domain that raises those error messages is actually configured as TCG (please share the domain XML with us), because if it is not, the issue could be related to a lower-level component.

[0] https://www.kernel.org/doc/Documentation/virtual/kvm/msr.txt
[1] http://man7.org/linux/man-pages/man4/msr.4.html
[2] http://lkml.iu.edu/hypermail/linux/kernel/0908.2/01472.html
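Once the domain XML is available (for instance from `virsh dumpxml`), the KVM-versus-TCG question can be answered from the `type` attribute of the root `<domain>` element: libvirt uses `kvm` for hardware-accelerated guests and `qemu` for plain emulation (TCG). A minimal sketch, where the helper names are made up for illustration:

```python
import xml.etree.ElementTree as ET


def domain_virt_type(domain_xml):
    """Return the 'type' attribute of the root <domain> element."""
    root = ET.fromstring(domain_xml)
    if root.tag != "domain":
        raise ValueError("not a libvirt domain XML document")
    return root.get("type")


def is_tcg(domain_xml):
    """True when the domain is defined as plain QEMU emulation (TCG)."""
    return domain_virt_type(domain_xml) == "qemu"
```

If `is_tcg` returns True for the domain named in the warning, that would match Daniel's explanation; if it returns False, the problem is more likely below nova.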
Tip: As verified on OSPD8, rebooting the compute node resolves the issue.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0538