Bug 1611062

Summary: "virsh vcpucount --guest" fails after hotunplug a vcpu with intermediate order by "setvcpu"
Product: Red Hat Enterprise Linux 7
Component: qemu-guest-agent
Version: 7.6
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Fangge Jin <fjin>
Assignee: Marc-Andre Lureau <marcandre.lureau>
QA Contact: FuXiangChun <xfu>
CC: chayang, hhuang, jiyan, juzhang, pkrempa
Target Milestone: rc
Fixed In Version: qemu-guest-agent-2.12.0-3.el7
Type: Bug
Last Closed: 2019-08-06 12:51:26 UTC
Bug Blocks: 1651787

Description Fangge Jin 2018-08-02 03:18:27 UTC
Description of problem:
"virsh vcpucount --guest" fails after hotunplug a vcpu with intermediate order by "setvcpu"

Version-Release number of selected component:
Host: libvirt-4.5.0-5.virtcov.el7.x86_64
Guest: qemu-guest-agent-2.12.0-1.el7.x86_64
Guest: kernel-3.10.0-924.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
0. Prepare a guest with the guest agent configured.

1. Start the guest with more than two vcpus:
# virsh dumpxml rhel7.5 |grep hot|head -11
    <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/>
    <vcpu id='1' enabled='yes' hotpluggable='yes' order='3'/>
    <vcpu id='2' enabled='yes' hotpluggable='yes' order='4'/>
    <vcpu id='3' enabled='yes' hotpluggable='yes' order='5'/>
    <vcpu id='4' enabled='yes' hotpluggable='yes' order='6'/>
    <vcpu id='5' enabled='yes' hotpluggable='yes' order='2'/>
    <vcpu id='6' enabled='no' hotpluggable='yes'/>
    <vcpu id='7' enabled='no' hotpluggable='yes'/>
    <vcpu id='8' enabled='no' hotpluggable='yes'/>
    <vcpu id='9' enabled='no' hotpluggable='yes'/>
    <vcpu id='10' enabled='no' hotpluggable='yes'/>

# virsh vcpucount rhel7.5
maximum      config       240
maximum      live         240
current      config         3
current      live           6

2. Disable vcpu 4 (which has the maximum order, 6), and query the vcpu count with "virsh vcpucount rhel7.5 --guest":
# virsh setvcpu rhel7.5 4 --disable

# virsh vcpucount rhel7.5 --guest
5

3. Disable vcpu 5 (which has order 2, not the maximum order):
# virsh setvcpu rhel7.5 5 --disable

# virsh vcpucount rhel7.5 --guest
error: internal error: unable to execute QEMU agent command 'guest-get-vcpus': open("/sys/devices/system/cpu/cpu2/"): No such file or directory

[In guest]# lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE
0   0    0      0    0:0:0:0       yes
1   0    0      0    1:1:0:0       yes
3   0    0      1    2:2:1:0       yes
4   0    0      1    3:3:1:0       yes
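
The lscpu output shows the root cause: the unplug left a hole in the guest's CPU numbering (cpu2 is gone while cpu3 and cpu4 remain), and guest-get-vcpus evidently walks the ids as if they were contiguous, erroring out on the first gap. A minimal, hypothetical C repro of that failure mode (an illustration, not qemu-ga's source), runnable inside the guest from step 3:

/* Hypothetical repro sketch, not qemu-ga code: walk vcpu ids as if they
 * were contiguous and abort on the first hole, like the unfixed agent.
 * In the guest from step 3 this fails at cpu2. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long nconf = sysconf(_SC_NPROCESSORS_CONF);

    for (long id = 0; id < nconf; id++) {
        char path[64];

        snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%ld/", id);
        if (access(path, F_OK) != 0) {
            /* The condition the agent turns into the "open(...):
             * No such file or directory" error shown above. */
            fprintf(stderr, "open(\"%s\"): No such file or directory\n", path);
            return 1;
        }
        printf("cpu%ld present\n", id);
    }
    return 0;
}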

Actual results:
As in step 3: "virsh vcpucount --guest" fails with an internal error after a vcpu with an intermediate order is disabled.

Expected results:
"virsh vcpucount --guest" can succeed.

Comment 2 Igor Mammedov 2018-09-06 08:21:36 UTC
I'm in favour of Laszlo's opinion [1] that the --guest variant shouldn't be used when real cpu hotplug is in use, so libvirt shouldn't allow simultaneous use of fake and real vcpu hotplug.
Given there were no objections from CCed libvirt folks, reassigning BZ to libvirt.

1) https://www.mail-archive.com/qemu-devel@nongnu.org/msg559421.html

Comment 3 Peter Krempa 2018-09-06 09:02:45 UTC
I don't see a reason to disallow it since it is technically possible.

Additionally, at certain points of the VM lifecycle we are unable to know which of the two approaches the user wishes to use.

Also, the boot cpu(s), which are not unpluggable, can't be disabled in any other way.

Note that we query the data from qemu first and then modify it to reach the desired vcpu count, so if qemu reports wrong data in the get API, it should be fixed, especially if the data is not accepted back via the set API.

There are two kinds of APIs:
1) the old one, where we magically disable some vcpus to satisfy the user's request for a target vcpu count

2) the new one, where the user is able to select which vcpus are online

We also have an API that provides the guest topology from the get command to the user, so if the data is misleading, users may complain.

Looking at the conversation in the linked thread, the data is invalid, and we even have a check for it:

        /* This shouldn't happen, but we can't trust the guest agent */
        if (!cpuinfo[i].online && !cpuinfo[i].offlinable) {
            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                           _("Invalid data provided by guest agent"));
            return -1;
        }


I don't see a compelling reason to cripple this feature, and libvirt is correctly able to handle sparse output from the get API.

This seems plainly like a bug in qemu-ga. I've also replied to the e-mail thread. Back to the guest-agent component to fix the output.

Comment 4 Marc-Andre Lureau 2018-11-15 09:06:53 UTC
Igor wrote a patch; we can backport b4bf912a6c19449e68af7b4173a8c6da21904d99 "qga: ignore non present cpus when handling qmp_guest_get_vcpus()".
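
For reference, a standalone sketch of the approach the commit subject describes (a paraphrase under stated assumptions, not the verbatim upstream patch): iterate the possible vcpu ids and silently skip any id whose sysfs directory is missing, so a hole such as cpu2 no longer aborts the whole listing.

/* Sketch of "ignore non present cpus", not the verbatim patch. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long lo = 0, hi, reported = 0;
    FILE *f = fopen("/sys/devices/system/cpu/possible", "r");

    /* "possible" reads like "0-239"; fall back to the configured count. */
    hi = sysconf(_SC_NPROCESSORS_CONF) - 1;
    if (f) {
        if (fscanf(f, "%ld-%ld", &lo, &hi) == 1)
            hi = lo;      /* single-cpu form, e.g. just "0" */
        fclose(f);
    }

    for (long id = lo; id <= hi; id++) {
        char path[64];

        snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%ld/", id);
        if (access(path, F_OK) != 0)
            continue;     /* non-present vcpu: skip it, don't error out */
        printf("cpu%ld present\n", id);
        reported++;
    }
    printf("%ld vcpus reported\n", reported);
    return 0;
}

Run in the guest from step 3, this would report the four present vcpus (0, 1, 3 and 4) instead of failing, matching the behaviour verified in comment 7.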

Comment 6 Miroslav Rezanina 2018-12-12 16:01:43 UTC
Fix included in qemu-guest-agent-2.12.0-3.el7

Comment 7 FuXiangChun 2018-12-19 09:10:09 UTC
Verified the bug with qemu-guest-agent-2.12.0-3.el7. The results are as expected, so moving this bug to VERIFIED.

# virsh vcpucount rhel7
maximum      config       240
maximum      live         240
current      config         6
current      live           6

# virsh setvcpu rhel7 4 --disable

# virsh vcpucount rhel7 --guest
4

# virsh setvcpu rhel7 5 --disable

# virsh vcpucount rhel7 --guest
4

# virsh vcpucount rhel7
maximum      config       240
maximum      live         240
current      config         6
current      live           4

Comment 9 errata-xmlrpc 2019-08-06 12:51:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2124