Description of problem:
virsh vcpupin returns wrong info on a large machine

Version-Release number of selected component (if applicable):
libvirt-3.2.0-14.el7_4.3.x86_64
qemu-kvm-rhev-2.9.0-16.el7_4.5.x86_64
kernel: 3.10.0-693.2.1.el7.x86_64

How reproducible:
100% on a large machine

Steps to Reproduce:
1. Start a guest with 384 cpus (without the numa node, hugepage, and cpupin parts in the xml):
<vcpu placement='static'>384</vcpu>

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 19    r7-4t                          running

2. Check the vcpupin of the guest:
# virsh vcpupin r7-4t
VCPU: CPU Affinity
----------------------------------
   0: 0-359
   1: 0-359
   2: 0-359
......
 280: 0-359
......
 300: 0-359
......

3. Do vcpupin operations:
# virsh vcpupin r7-4t 280 300-301
# virsh vcpupin r7-4t 300 330-331

4. Check the virsh vcpupin info; the info for vcpu280 is wrong:
# virsh vcpupin r7-4t
VCPU: CPU Affinity
----------------------------------
   0: 0-359
   1: 0-359
   2: 0-359
......
 280: 300-301,320,322,325
......
 300: 330-331
......

5. Check that the guest xml is correct:
# virsh dumpxml r7-4t
<domain type='kvm' id='19'>
  <name>r7-4t</name>
......
  <vcpu placement='static'>384</vcpu>
  <cputune>
    <vcpupin vcpu='280' cpuset='300-301'/>
    <vcpupin vcpu='300' cpuset='330-331'/>
  </cputune>
......

6. Check that the cgroup and taskset are correct:
# cgget -g cpuset /machine.slice/machine-qemu\\x2d19\\x2dr7\\x2d4t.scope/vcpu280| grep cpuset.cpus
cpuset.cpus: 300-301

# cat 185024/cpuset
/machine.slice/machine-qemu\x2d19\x2dr7\x2d4t.scope/vcpu280

# taskset -c -p 185024
pid 185024's current affinity list: 300,301

Actual result:
In step 4, vcpupin returns wrong info for vcpu280 (300-301,320,322,325), although the xml, cgroup, and taskset all show 300-301.

Expected result:
In step 4, vcpupin returns the correct info (300-301 for vcpu280).

Others:
1. The numad service is not started.
Fixed upstream by:

commit 51f9f80d350e633adf479c6a9b3c55f82ca9cbd4
Author:     Allen, John <John.Allen>
CommitDate: 2019-04-25 10:18:48 +0200

    Handle copying bitmaps to larger data buffers

    If a bitmap of a shorter length than the data buffer is passed to
    virBitmapToDataBuf, it will read off the end of the bitmap and copy junk
    into the returned buffer. Add a check to only copy the length of the
    bitmap to the buffer.

    The problem can be observed after setting a vcpu affinity using the vcpupin
    command on a system with a large number of cores:
      # virsh vcpupin example_domain 0 0
      # virsh vcpupin example_domain 0
         VCPU   CPU Affinity
        ---------------------------
         0      0,192,197-198,202

    Signed-off-by: John Allen <john.allen>

git describe: v5.2.0-360-g51f9f80d35
contains: v5.3.0-rc1~7
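For illustration, here is a minimal sketch of the failure mode the commit message describes and of the kind of length check it adds. This is not the libvirt code: `demo_bitmap` and `demo_bitmap_to_buf` are hypothetical names, and the byte-based storage layout is an assumption made to keep the example short.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bitmap; NOT libvirt's virBitmap type. */
typedef struct {
    size_t nbits;               /* number of valid bits in the bitmap */
    const unsigned char *bytes; /* backing storage, ceil(nbits / 8) bytes;
                                   unused bits in the last byte are assumed 0 */
} demo_bitmap;

/*
 * Copy a bitmap into a caller-supplied buffer of buflen bytes.
 *
 * A naive version would do memcpy(buf, b->bytes, buflen). When the buffer
 * is larger than the bitmap, that reads past the end of the bitmap storage
 * and copies whatever happens to be there, which is how bogus CPU numbers
 * could show up in an affinity list.
 *
 * This sketch zeroes the buffer first and copies at most the bitmap's own
 * length.
 */
static void demo_bitmap_to_buf(const demo_bitmap *b,
                               unsigned char *buf, size_t buflen)
{
    size_t nbytes = (b->nbits + CHAR_BIT - 1) / CHAR_BIT;

    memset(buf, 0, buflen);  /* bytes beyond the bitmap must read as 0 */
    if (nbytes > buflen)
        nbytes = buflen;     /* never write past the destination either */
    memcpy(buf, b->bytes, nbytes);
}
```

With a one-byte bitmap that has only bit 0 set (CPU 0) and a 48-byte destination buffer (room for 384 CPUs), the first byte of the result is 0x01 and the remaining 47 bytes are zero, regardless of what was in the buffer or behind the bitmap beforehand.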
According to the patch, this bug seems to be the same as
Bug 1703159 - virsh vcpupin reports bogus affinities (RHEL-7.7)
and
Bug 1703160 - virsh vcpupin reports bogus affinities (RHEL-8.1.0)

What is strange is that I cannot hit this issue on 5 versions before libvirt-6.0.0-1.el8. For example: libvirt-5.6.0-7.module+el8.2.0+4670+07fe2774.x86_64

Version:
libvirt-5.6.0-7.module+el8.2.0+4670+07fe2774.x86_64
kernel-4.18.0-187.el8.x86_64
qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64

Steps:
# virsh domstate test82

# virsh dumpxml test82 |grep vcpu
  <vcpu placement='static'>2</vcpu>

# virsh start test82
Domain test82 started

# virsh vcpupin test82
 VCPU   CPU Affinity
----------------------
 0      0-447
 1      0-447

# virsh vcpupin test82 0 0

# virsh vcpupin test82
 VCPU   CPU Affinity
----------------------
 0      0
 1      0-447

Also, the version in the bug description is libvirt-3.2.0-14.el7_4.3.x86_64, which is for RHEL-7.
So could you please check whether this problem still exists on RHEL-8.2.0AV?
The patch was included in the upstream libvirt release v5.3.0, so even RHEL-AV-8.1.0 should be fixed already.
So, according to the previous comment, I will mark this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017