Bug 1420903
| Summary: | Seeing cpu affinity is not supported messages on compute node | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Jeremy <jmelvin> |
| Component: | openstack-nova | Assignee: | Sahid Ferdjaoui <sferdjao> |
| Status: | CLOSED ERRATA | QA Contact: | Prasanth Anbalagan <panbalag> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 9.0 (Mitaka) | CC: | berrange, ccollett, dasmith, dhill, eglynn, jmelvin, kchamart, knoel, lyarwood, mburns, mrezanin, rbalakri, saime, sbauza, sferdjao, sgordon, srevivo, vasili.namatov, vromanso |
| Target Milestone: | zstream | Keywords: | Triaged, ZStream |
| Target Release: | 9.0 (Mitaka) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-nova-13.1.4-13.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1515165 1563067 1591385 | Environment: | |
| Last Closed: | 2018-03-15 12:43:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1515165, 1563067, 1591385 | | |
Description
Jeremy
2017-02-09 20:09:45 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Hello,

Also seeing these messages on the system, not sure if it's related or what it means.

Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

(In reply to Jeremy from comment #0)
> Description of problem:
> seeing the following messages in nova-compute.log
> How can we prevent the messages? Why are we seeing these messages?
>
> 2017-02-09 04:38:16.125 51668 WARNING nova.virt.libvirt.driver
> [req-c0de8ab8-16ff-4caf-88b7-59e729045ca4 - - - - -] couldn't obtain the vpu
> count from domain id: a137b075-1c2d-41e6-a3ea-a5aeea6a182c, exception:
> Requested operation is not valid: cpu affinity is not supported

[snip]

Libvirt only reports this error message if the guest does not provide a thread-per-VCPU. This means the guest must be running TCG, not KVM....

> Additional info:
> virt_type is kvm

So I'm sceptical about this claim. Perhaps a guest was launched with QEMU before the nova virt_type was changed to kvm.

In any case, the real bug is in nova, because it was asking for vCPU affinity data for TCG guests, which will always fail.

The nova bug was fixed upstream in

commit 2fdab3b922b0d99f415902462de967a910a6594b
Author: Daniel P. Berrange <berrange>
Date: Wed Nov 2 14:46:06 2016 +0000

    libvirt: fix vCPU usage reporing for LXC/QEMU guests

    Currently if Nova is using the libvirt LXC driver, it is hardcoded
    to report 1 vCPU used on the host, regardless of how many containers
    are running.

    Meanwhile for QEMU (aka TCG) guests, the guest.get_vcpu_info method
    is throwing an exception, since QEMU does not use a dedicated thread
    per vCPU currently. The effect is that on QEMU hosts, we're reporting
    0 vCPUs used on the host regardless of how many guests are running

    This causes the 'get_available_resources' method to report incorrect
    'vcpus_used' values for the compute node.

    By a stroke of luck, the resource tracker merely logs this value and
    then throws it away, instead counting vcpu usage based on vcpus
    declared against the flavour.

    Now ignoring the hypervisor reported data is arguably a bug in the
    resource tracker, because it means it is overcounting resource
    consumption for plain QEMU guests (they can only ever consume 1 pCPU
    of time, regardless of vCPU count).

    Fixing the resource tracker is out of scope for now, but we should at
    least ensure we're reporting accurate data to it, even if it is only
    used for logging at this time.

    If a host does not report detailed vCPU usage from libvirt then we
    should default to reporting 1 vCPU per guest, so that the
    'vcpus_used' field reports some reasonably meaningful data on host
    CPU usage.

    Closes-bug: #1638889
    Change-Id: I627d30d61f8ead6211f78a1c79ffd79b81333f86

Thanks Daniel. Do we know what version of nova has this fix so we can tell the customer to upgrade to that version?

Thanks any idea what the other message they are seeing may mean? or if it's a problem.

Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

(In reply to Jeremy from comment #9)
> Thanks any idea what the other message they are seeing may mean? or if it's
> a problem.
>
>
> Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

I'm not an expert in this area at all, but it seems that in some circumstances KVM needs access to the CPU MSRs [0][1]. According to the manpage, a kernel module might be needed. I checked the sosreport and the module is not loaded on the compute node.

$ cat lsmod | grep msr | wc -l
0

There is a patch to hide these messages [2]:

$ echo 1 > /sys/module/kvm/parameters/ignore_msrs

Back to the description now: the customer reported using virt_type=kvm, and Daniel in his comment #5 seems to have some doubts. Before moving forward, we need to confirm whether the running domain that raises those error messages is actually configured as TCG (please share the domain XML with us), because if it is not, the issue could be related to a lower component.

[0] https://www.kernel.org/doc/Documentation/virtual/kvm/msr.txt
[1] http://man7.org/linux/man-pages/man4/msr.4.html
[2] http://lkml.iu.edu/hypermail/linux/kernel/0908.2/01472.html

Tip: As verified on OSPD8, rebooting the compute node resolves the issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0538
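
For readers who want to see the fallback that the upstream commit message describes (count 1 vCPU for any guest whose detailed vCPU data cannot be obtained), here is a minimal Python sketch. It is not nova's code: `FakeGuest`, `VcpuInfoUnavailable` and `count_vcpus_used` are hypothetical names invented for illustration, and `get_vcpu_info` only mirrors the method name used in the commit message.

```python
class VcpuInfoUnavailable(Exception):
    """Hypothetical stand-in for the libvirt error seen in the log."""


class FakeGuest:
    """Hypothetical guest object; nova wraps a real libvirt domain instead."""

    def __init__(self, name, vcpu_threads=None):
        self.name = name
        # None models a TCG or LXC guest with no per-vCPU thread data.
        self._vcpu_threads = vcpu_threads

    def get_vcpu_info(self):
        if self._vcpu_threads is None:
            raise VcpuInfoUnavailable("cpu affinity is not supported")
        return list(range(self._vcpu_threads))


def count_vcpus_used(guests):
    """Sum vCPUs across guests, assuming 1 when details are unavailable."""
    total = 0
    for guest in guests:
        try:
            total += len(guest.get_vcpu_info())
        except VcpuInfoUnavailable:
            # Fallback from the commit message: count the guest as one
            # vCPU instead of leaving it out of the total.
            total += 1
    return total


guests = [FakeGuest("kvm-guest", vcpu_threads=4), FakeGuest("tcg-guest")]
print(count_vcpus_used(guests))
```

With one 4-vCPU KVM-style guest and one guest that raises, the total is 5 rather than 4, which is the "reasonably meaningful" figure the commit message asks for.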
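
The `cat lsmod | grep msr | wc -l` check from the sosreport can also be done with an exact module-name match, since `grep msr` matches any line containing that substring. The sketch below assumes the usual `lsmod` layout (a header row, then one module per line with the name in the first column); `loaded_modules` and the sample text are hypothetical and not taken from the customer's sosreport.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names from lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


# Hypothetical sample, only for illustration.
SAMPLE_LSMOD = """\
Module                  Size  Used by
kvm_intel             172032  0
kvm                   557056  1 kvm_intel
"""

print("msr" in loaded_modules(SAMPLE_LSMOD))
```

For this sample the check prints `False`, matching the "not loaded" result reported in the comment.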
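
Once the domain XML is available (for example from `virsh dumpxml`), whether a guest is TCG or KVM can be read from the `type` attribute of the root `<domain>` element: `qemu` means plain emulation (TCG) and `kvm` means hardware acceleration. A minimal Python sketch of that check follows; the helper names and the trimmed sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def domain_virt_type(domain_xml):
    """Return the 'type' attribute of the root <domain> element."""
    root = ET.fromstring(domain_xml)
    if root.tag != "domain":
        raise ValueError("not a libvirt domain XML document")
    return root.get("type")


def is_tcg(domain_xml):
    # type='qemu' is plain emulation (TCG); type='kvm' is KVM.
    return domain_virt_type(domain_xml) == "qemu"


# Hypothetical, heavily trimmed domain XML used only for illustration.
SAMPLE_XML = (
    "<domain type='qemu'>"
    "<name>instance-example</name><vcpu>2</vcpu>"
    "</domain>"
)

print(is_tcg(SAMPLE_XML))
```

A domain that reports `type='kvm'` while still producing the "cpu affinity is not supported" warning would point away from the TCG explanation and toward a lower component, as the comment suggests.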