Bug 1420903
| Field | Value |
|---|---|
| Summary | Seeing cpu affinity is not supported messages on compute node |
| Product | Red Hat OpenStack |
| Component | openstack-nova |
| Version | 9.0 (Mitaka) |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Reporter | Jeremy <jmelvin> |
| Assignee | Sahid Ferdjaoui <sferdjao> |
| QA Contact | Prasanth Anbalagan <panbalag> |
| CC | berrange, ccollett, dasmith, dhill, eglynn, jmelvin, kchamart, knoel, lyarwood, mburns, mrezanin, rbalakri, saime, sbauza, sferdjao, sgordon, srevivo, vasili.namatov, vromanso |
| Target Milestone | zstream |
| Target Release | 9.0 (Mitaka) |
| Keywords | Triaged, ZStream |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | openstack-nova-13.1.4-13.el7ost |
| Type | Bug |
| Last Closed | 2018-03-15 12:43:27 UTC |
| Bug Blocks | 1515165, 1563067, 1591385 |

Description
Jeremy
2017-02-09 20:09:45 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Hello,

Also seeing these messages on the system, not sure if it's related or what it means.

Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

(In reply to Jeremy from comment #0)
> Description of problem:
> seeing the following messages in nova-compute.log
> How can we prevent the messages? Why are we seeing these messages?
>
> 2017-02-09 04:38:16.125 51668 WARNING nova.virt.libvirt.driver
> [req-c0de8ab8-16ff-4caf-88b7-59e729045ca4 - - - - -] couldn't obtain the vpu
> count from domain id: a137b075-1c2d-41e6-a3ea-a5aeea6a182c, exception:
> Requested operation is not valid: cpu affinity is not supported

[snip]

Libvirt only reports this error message if the guest does not provide a thread-per-VCPU. This means the guest must be running TCG, not KVM....

> Additional info:
> virt_type is kvm

So I'm sceptical about this claim. Perhaps there was a guest launched with QEMU before nova virt_type was then changed to kvm.

In any case, the real bug is nova because it was asking for vCPU affinity data for TCG guests which will always fail

The nova bug was fixed upstream in
commit 2fdab3b922b0d99f415902462de967a910a6594b
Author: Daniel P. Berrange <berrange>
Date: Wed Nov 2 14:46:06 2016 +0000

libvirt: fix vCPU usage reporting for LXC/QEMU guests

Currently if Nova is using the libvirt LXC driver, it is
hardcoded to report 1 vCPU used on the host, regardless
of how many containers are running.

Meanwhile for QEMU (aka TCG) guests, the guest.get_vcpu_info
method is throwing an exception, since QEMU does not use
a dedicated thread per vCPU currently. The effect is that
on QEMU hosts, we're reporting 0 vCPUs used on the host
regardless of how many guests are running

This causes the 'get_available_resources' method to report
incorrect 'vcpus_used' values for the compute node. By a
stroke of luck, the resource tracker merely logs this
value and then throws it away, instead counting vcpu
usage based on vcpus declared against the flavour. Now
ignoring the hypervisor reported data is arguably a bug
in the resource tracker, because it means it is overcounting
resource consumption for plain QEMU guests (they can only
ever consume 1 pCPU of time, regardless of vCPU count).

Fixing the resource tracker is out of scope for now, but
we should at least ensure we're reporting accurate data
to it, even if it is only used for logging at this time.

If a host does not report detailed vCPU usage from libvirt
then we should default to reporting 1 vCPU per guest, so
that the 'vcpus_used' field reports some reasonably
meaningful data on host CPU usage.

Closes-bug: #1638889
Change-Id: I627d30d61f8ead6211f78a1c79ffd79b81333f86

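The fallback described in the commit message (count 1 vCPU for a guest whose detailed vCPU data cannot be read, instead of 0) can be sketched roughly as follows. This is an illustration only, not the code from the nova patch: `FakeGuest`, `VcpuInfoUnavailable` and `count_vcpus_used` are hypothetical names, and the fake `get_vcpu_info` method merely stands in for the libvirt call that raises "cpu affinity is not supported" for TCG guests.

```python
class VcpuInfoUnavailable(Exception):
    """Hypothetical stand-in for the libvirt error seen with TCG guests."""


class FakeGuest:
    """Hypothetical guest; vcpus=None models a guest without per-vCPU threads."""

    def __init__(self, name, vcpus):
        self.name = name
        self._vcpus = vcpus

    def get_vcpu_info(self):
        # Mimic the failure mode from the log message in this bug.
        if self._vcpus is None:
            raise VcpuInfoUnavailable("cpu affinity is not supported")
        return list(range(self._vcpus))


def count_vcpus_used(guests):
    """Sum vCPUs across guests, defaulting to 1 when details are unavailable."""
    total = 0
    for guest in guests:
        try:
            total += len(guest.get_vcpu_info())
        except VcpuInfoUnavailable:
            # No per-vCPU data (for example a TCG guest): count 1, not 0.
            total += 1
    return total


if __name__ == "__main__":
    guests = [FakeGuest("kvm-guest", 4), FakeGuest("tcg-guest", None)]
    print(count_vcpus_used(guests))
```

With one 4-vCPU guest and one guest that cannot report vCPU details, the sketch counts 5 vCPUs in use rather than 4.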
Thanks Daniel. Do we know what version of nova has this fix so we can tell the customer to upgrade to that version?

Thanks. Any idea what the other message they are seeing may mean, or if it's a problem?

Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

(In reply to Jeremy from comment #9)
> Thanks any idea what the other message they are seeing may mean? or if it's
> a problem.
>
> Feb 7 19:21:37 compute-4 kernel: kvm_get_msr_common: 6 callbacks suppressed

I'm not an expert in this area, but it seems that in some circumstances KVM needs access to the CPU MSRs [0][1]. According to the man page, a kernel module might be needed. I checked the sosreport and it is not loaded on the compute node.

$ cat lsmod | grep msr | wc -l
0

There is a patch to hide these messages [2]:

$ echo 1 > /sys/module/kvm/parameters/ignore_msrs

Back to the description now: the customer reported using virt_type=kvm, and Daniel in his comment #5 seems to have some doubts. Before moving forward, we need to confirm whether the running domain that raises those error messages is actually configured as TCG (please share the domain XML with us), because if it is not, the issue could be related to a lower-level component.

[0] https://www.kernel.org/doc/Documentation/virtual/kvm/msr.txt
[1] http://man7.org/linux/man-pages/man4/msr.4.html
[2] http://lkml.iu.edu/hypermail/linux/kernel/0908.2/01472.html

Tip: As verified on OSPD8, rebooting the compute node resolves the issue.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0538
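
On the TCG-versus-KVM question raised in the thread: libvirt records the virtualization type in the `type` attribute of the root `<domain>` element of the domain XML (`qemu` for plain emulation, i.e. TCG, and `kvm` for KVM). A minimal sketch of that check, assuming the XML has already been dumped to a string (for instance with `virsh dumpxml`); the function name and the trimmed sample XML are hypothetical and not taken from the customer's system.

```python
import xml.etree.ElementTree as ET


def domain_virt_type(domain_xml: str) -> str:
    """Return the 'type' attribute of the root <domain> element."""
    root = ET.fromstring(domain_xml)
    if root.tag != "domain":
        raise ValueError("not a libvirt domain XML document")
    return root.get("type", "")


# Hypothetical, heavily trimmed domain XML for illustration only.
SAMPLE_XML = """
<domain type='qemu'>
  <name>instance-00000001</name>
  <vcpu>2</vcpu>
</domain>
"""

if __name__ == "__main__":
    virt_type = domain_virt_type(SAMPLE_XML)
    # 'qemu' means plain emulation (TCG); 'kvm' means hardware acceleration.
    print(virt_type)
```

A domain reporting `qemu` here would match Daniel's explanation of the warning, while `kvm` would point toward a lower-level component, as noted in the comment above.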