Red Hat Bugzilla – Bug 1161540
kvm_init_vcpu failed for cpu hot-plugging in NUMA
Last modified: 2017-08-17 02:14:08 EDT
Description of problem:
Hot-plugging a vCPU in a NUMA guest causes the guest to exit. The guest log says:
"kvm_init_vcpu failed: Cannot allocate memory"

Version:
libvirt-1.2.8-6.el7.x86_64
qemu-kvm-1.5.3-77.el7.x86_64
kernel-3.10.0-195.el7.x86_64

How reproducible: 100%

Steps to reproduce:
0. Prepare a NUMA host whose DMA32 zone is in node 0.
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 63531 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 62750 MB
node distances:
node   0   1
  0:  10  11
  1:  11  10
# grep "zone DMA" /proc/zoneinfo
Node 0, zone    DMA32

1. Start a guest with 2 currently used vCPUs.
# virsh dumpxml r71
...
<vcpu placement='auto' current='2'>4</vcpu>
<numatune>
  <memory mode='strict' placement='auto'/>
</numatune>
...
# virsh start r71
numad suggests binding memory to node 1:
# cat /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dr71.scope/cpuset.mems
1

2. Hot-plug a vCPU.
# virsh setvcpus r71 3
error: Unable to read from monitor: Connection reset by peer

Expected result: hot-plug works.

Workaround: before hot-plugging a vCPU, widen the domain's emulator memory binding:
# echo 0-1 > /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dr71.scope/cpuset.mems
# echo 0-1 > /sys/fs/cgroup/cpuset/machine.slice/machine-qemu\\x2dr71.scope/emulator/cpuset.mems
# virsh setvcpus r71 3
# virsh vcpucount r71
maximum      config         4
maximum      live           4
current      config         2
current      live           3
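The trigger for this bug is that the DMA32 zone lives on only one host node, and the strict memory binding chosen by numad excludes that node. The check in step 0 can be sketched as a small standalone script; the `zoneinfo_sample` here-string is a hypothetical stand-in for the real `/proc/zoneinfo` so the snippet runs anywhere:

```shell
#!/bin/sh
# Hypothetical sample of /proc/zoneinfo zone headers; on a real host you
# would read the file itself (e.g. grep 'zone *DMA32' /proc/zoneinfo).
zoneinfo_sample="Node 0, zone      DMA
Node 0, zone    DMA32
Node 0, zone   Normal
Node 1, zone   Normal"

# Extract the node number that hosts the DMA32 zone: the node id is the
# second field ("0,"), with a trailing comma to strip.
dma32_node=$(printf '%s\n' "$zoneinfo_sample" \
    | awk '/zone[[:space:]]+DMA32/ {print $2}' | tr -d ',')

echo "DMA32 zone is on node: $dma32_node"
```

If the domain's `cpuset.mems` does not include that node, a vCPU hot-plug will hit the `kvm_init_vcpu` allocation failure described above.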
Upstream patch proposed: https://www.redhat.com/archives/libvir-list/2014-December/msg00718.html
There are still some problems with this, and they might be bigger than we thought. The latest ideas are discussed upstream: https://www.redhat.com/archives/libvir-list/2014-December/msg00998.html
This bug is fixed in libvirt-1.2.8-15.el7:

1. Prepare a NUMA host with the older libvirt.
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65514 MB
node 0 free: 62974 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 62821 MB
node distances:
node   0   1
  0:  10  11
  1:  11  10
# grep DMA32 /proc/zoneinfo
Node 0, zone    DMA32
# rpm -q libvirt
libvirt-1.2.8-13.el7

2. Configure the guest XML to mbind to host node 1 (the node without a DMA32 zone).
# virsh edit rhel7
...
<vcpu placement='static' current='2'>5</vcpu>
<numatune>
  <memory mode='strict' nodeset='1'/>
</numatune>
...

3. Start the guest.
# virsh start rhel7

4. Hot-plug a vCPU.
# virsh setvcpus rhel7 3
error: Unable to read from monitor: Connection reset by peer

5. With a guest running, upgrade to libvirt-1.2.8-15.el7.
# virsh start rhel7
# yum install libvirt
# rpm -q libvirt
libvirt-1.2.8-15.el7.x86_64

6. Hot-plug a vCPU again.
# virsh setvcpus rhel7 3
# virsh vcpucount rhel7
maximum      config         5
maximum      live           5
current      config         2
current      live           3
# virsh destroy rhel7
Domain rhel7 destroyed

7. Restart the guest and hot-plug a vCPU on libvirt-1.2.8-15.el7.
# virsh start rhel7
Domain rhel7 started
# virsh setvcpus rhel7 3
# virsh vcpucount rhel7
maximum      config         5
maximum      live           5
current      config         2
current      live           3
# virsh destroy rhel7
Domain rhel7 destroyed

According to the 7 steps above, this bug is fixed, and I will change the status to VERIFIED.
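The verification above checks that `current live` rose from 2 to 3 after `virsh setvcpus`. A scripted version of that check can be sketched as follows; the `vcpucount_output` here-string is hypothetical sample output so the snippet is self-contained (in a real run you would capture `virsh vcpucount rhel7` instead):

```shell
#!/bin/sh
# Hypothetical sample of `virsh vcpucount <domain>` output after hotplug.
vcpucount_output="maximum      config         5
maximum      live           5
current      config         2
current      live           3"

# Parse the "current live" count (third whitespace-separated field).
current_live=$(printf '%s\n' "$vcpucount_output" \
    | awk '/current[[:space:]]+live/ {print $3}')

echo "current live vcpus: $current_live"
```

A verification harness would then compare `$current_live` against the value passed to `virsh setvcpus` and fail the test if they differ.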
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html