Bug 1710687 - postcopy migration failed when guest with numa+hugepage+nodeset setting
Summary: postcopy migration failed when guest with numa+hugepage+nodeset setting
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: rc
Target Release: 8.2
Assignee: Virtualization Maintenance
QA Contact: Jing Qi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-05-16 04:50 UTC by yafu
Modified: 2021-08-15 07:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-15 07:26:50 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments:
guest xml (3.30 KB, application/xml)
2019-05-16 04:51 UTC, yafu
qemu log and libvirtd log on source host (156.00 KB, application/gzip)
2019-05-16 04:58 UTC, yafu
guest xml - update (4.93 KB, text/plain)
2019-05-16 05:04 UTC, yafu

Description yafu 2019-05-16 04:50:23 UTC
Description of problem:
Postcopy migration fails when the guest is configured with numa+hugepage+nodeset settings.

Version-Release number of selected component (if applicable):
libvirt-4.5.0-17.el7.x86_64
qemu-kvm-rhev-2.12.0-27.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Configure hugepages on both the source and target hosts.

2. Start a guest with numa+hugepage+nodeset settings:
#virsh dumpxml vm1
...
<memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
...
<cpu mode='custom' match='exact' check='full'>
  ...
    <numa>
      <cell id='0' cpus='0-1' memory='1025024' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1025024' unit='KiB'/>
    </numa>
  </cpu>
...

3. Check the qemu command line:
#ps aux | grep -i numa
...-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-vm1,size=1049624576 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-vm1,size=1049624576 -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 ...

4. Do postcopy migration:
# virsh migrate vm1 qemu+ssh://10.66.4.143/system --live --verbose   --postcopy
error: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported

5. Check the libvirtd log:
# cat /var/log/libvirt/libvirtd.log | grep -i 'migrate-set'
2019-05-16 03:22:48.260+0000: 14503: debug : virJSONValueToString:2005 : result={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"}
2019-05-16 03:22:48.260+0000: 14503: debug : qemuMonitorJSONCommandWithFd:305 : Send command '{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"}' for write with FD -1
2019-05-16 03:22:48.260+0000: 14503: info : qemuMonitorSend:1083 : QEMU_MONITOR_SEND_MSG: mon=0x7fc70c014f80 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"}
2019-05-16 03:22:48.260+0000: 14500: info : qemuMonitorIOWrite:551 : QEMU_MONITOR_IO_WRITE: mon=0x7fc70c014f80 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"}


Actual results:
Postcopy migration fails when the guest is configured with numa+hugepage+nodeset settings.

Expected results:
Postcopy migration should succeed when the guest is configured with numa+hugepage+nodeset settings.

Additional info:
1. It works well when the guest has only the numa+hugepage settings (no nodeset).
2. Since I could not find any error info from the qemu side in the log, I filed the bug against libvirt. Please correct the component if I made a mistake.
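
Note: the only configuration difference described above is the nodeset attribute on the hugepages <page> element. The following is a minimal Python sketch for checking a domain XML for that attribute before attempting a postcopy migration. The <domain> wrapper, the sample strings, and the function name pages_with_nodeset are illustrative assumptions, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Trimmed-down domain XML based on the snippet in this report. The <domain>
# wrapper and the variant without nodeset are added here for illustration.
WITH_NODESET = """
<domain type='kvm'>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
</domain>
"""
WITHOUT_NODESET = WITH_NODESET.replace(" nodeset='0-1'", "")


def pages_with_nodeset(domain_xml):
    """Return the attributes of every hugepages <page> that has a nodeset."""
    root = ET.fromstring(domain_xml)
    pages = root.findall("./memoryBacking/hugepages/page")
    return [dict(page.attrib) for page in pages if "nodeset" in page.attrib]


if __name__ == "__main__":
    print(pages_with_nodeset(WITH_NODESET))
    print(pages_with_nodeset(WITHOUT_NODESET))
```

In practice the input would be the output of virsh dumpxml for the guest. A non-empty result only says the guest matches the configuration reported here; it does not explain why QEMU rejects postcopy.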

Comment 2 yafu 2019-05-16 04:51:40 UTC
Created attachment 1569326
guest xml

Comment 3 yafu 2019-05-16 04:58:28 UTC
Created attachment 1569328
qemu log and libvirtd log on source host

Comment 4 yafu 2019-05-16 05:04:01 UTC
Created attachment 1569330
guest xml - update

Please see guest xml here.

Comment 5 Jiri Denemark 2020-03-10 14:27:37 UTC
This looks like a limitation on QEMU side. Could you please recheck with
current libvirt and QEMU?

Comment 6 yafu 2020-03-20 04:28:30 UTC
(In reply to Jiri Denemark from comment #5)
> This looks like a limitation on QEMU side. Could you please recheck with
> current libvirt and QEMU?

It works well with:
libvirt-daemon-6.0.0-13.el8.x86_64
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64

Comment 7 Jiri Denemark 2020-03-20 09:56:12 UTC
OK, thanks.

Comment 8 jiyan 2020-04-08 09:47:51 UTC
Tested this bug with latest RHEL-820AV with the following detailed components and steps.

(SRC host and DST host) Version:
libvirt-6.0.0-16.module+el8.2.0+6131+4e715f3b.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6131+4e715f3b.x86_64
kernel-4.18.0-193.el8.x86_64

Steps:
1: Prepare a VM on the SRC host, start VM and check qemu cmd line
# virsh domstate vm1 
shut off

# virsh dumpxml vm1  --inactive 
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0-1'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cpu mode='host-model' check='partial'>
    <feature policy='disable' name='vmx'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1025024' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1025024' unit='KiB'/>
    </numa>
  </cpu>

# virsh start vm1 
Domain vm1 started

# ps -ef | grep vm1
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/2-vm1,size=1049624576 
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 
-object memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/2-vm1,size=1049624576 
-numa node,nodeid=1,cpus=2-3,memdev=ram-node1 

2. Migrate VM with post-copy parameter from SRC host to DST host
# virsh migrate vm1 qemu+ssh://****/system --live --verbose --postcopy
error: internal error: unable to execute QEMU command 'migrate-set-capabilities': Postcopy is not supported

3. Log from SRC host
# vim /var/log/libvirt/libvirtd.log
2020-04-08 09:40:25.952+0000: 6836: info : qemuMonitorSend:996 : QEMU_MONITOR_SEND_MSG: mon=0x7efc400033d0 msg={"execute":"query-migrate-capabilities","id":"libvirt-2"} fd=-1
2020-04-08 09:40:25.953+0000: 6759: info : qemuMonitorIOWrite:453 : QEMU_MONITOR_IO_WRITE: mon=0x7efc400033d0 buf={"execute":"query-migrate-capabilities","id":"libvirt-2"} len=59 ret=59 errno=0
2020-04-08 09:40:25.956+0000: 6836: info : qemuMonitorSend:996 : QEMU_MONITOR_SEND_MSG: mon=0x7efc400033d0 msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"} fd=-1
2020-04-08 09:40:25.956+0000: 6759: info : qemuMonitorIOWrite:453 : QEMU_MONITOR_IO_WRITE: mon=0x7efc400033d0 buf={"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true}]},"id":"libvirt-3"} len=125 ret=125 errno=0
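
Note: to make log excerpts like the one above easier to read, here is a rough Python sketch that pulls the QMP commands out of QEMU_MONITOR_SEND_MSG lines and lists which migration capabilities were set. The sample lines are shortened from this comment, and the function names are hypothetical helpers, not libvirt tooling.

```python
import json

MARKER = "QEMU_MONITOR_SEND_MSG:"

# Two lines shortened from the libvirtd.log excerpt in this comment.
SAMPLE_LOG = (
    '2020-04-08 09:40:25.952+0000: 6836: info : qemuMonitorSend:996 : '
    'QEMU_MONITOR_SEND_MSG: mon=0x7efc400033d0 '
    'msg={"execute":"query-migrate-capabilities","id":"libvirt-2"}\n'
    '2020-04-08 09:40:25.956+0000: 6836: info : qemuMonitorSend:996 : '
    'QEMU_MONITOR_SEND_MSG: mon=0x7efc400033d0 '
    'msg={"execute":"migrate-set-capabilities","arguments":{"capabilities":'
    '[{"capability":"events","state":true}]},"id":"libvirt-3"}\n'
)


def sent_qmp_commands(log_text):
    """Return the JSON commands found after 'msg=' on SEND_MSG log lines."""
    decoder = json.JSONDecoder()
    commands = []
    for line in log_text.splitlines():
        marker_pos = line.find(MARKER)
        if marker_pos == -1:
            continue
        msg_pos = line.find("msg=", marker_pos)
        if msg_pos == -1:
            continue
        try:
            # raw_decode stops at the end of the JSON object, so trailing
            # text such as " fd=-1" is ignored.
            command, _ = decoder.raw_decode(line, msg_pos + len("msg="))
        except json.JSONDecodeError:
            continue
        commands.append(command)
    return commands


def capabilities_set(commands):
    """List (capability, state) pairs from migrate-set-capabilities calls."""
    pairs = []
    for command in commands:
        if command.get("execute") != "migrate-set-capabilities":
            continue
        for cap in command.get("arguments", {}).get("capabilities", []):
            pairs.append((cap["capability"], cap["state"]))
    return pairs


if __name__ == "__main__":
    print(capabilities_set(sent_qmp_commands(SAMPLE_LOG)))
```

On the excerpt quoted here, the only capability that shows up is "events"; the command that actually triggered the "Postcopy is not supported" error is not part of the pasted lines.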

Comment 9 jiyan 2020-04-08 09:53:25 UTC
In comment 6, the test succeeded because there was no "nodeset" attribute in the memoryBacking/hugepages/page element (confirmed with yafu@).
I also tested that scenario with the following components and steps.

(SRC host and DST host) Version:
libvirt-6.0.0-16.module+el8.2.0+6131+4e715f3b.x86_64
qemu-kvm-4.2.0-17.module+el8.2.0+6131+4e715f3b.x86_64
kernel-4.18.0-193.el8.x86_64

Steps:
1: Prepare a VM on the SRC host, start VM and check qemu cmd line
# virsh domstate vm1 
shut off

# virsh dumpxml vm1  --inactive 
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cpu mode='host-model' check='partial'>
    <feature policy='disable' name='vmx'/>
    <numa>
      <cell id='0' cpus='0-1' memory='1025024' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='1025024' unit='KiB'/>
    </numa>
  </cpu>

# virsh start vm1 
Domain vm1 started

# ps -ef | grep vm1
-mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/3-vm1 
-numa node,nodeid=0,cpus=0-1,mem=1001 -numa node,nodeid=1,cpus=2-3,mem=1001 

2. Migrate VM with post-copy parameter from SRC host to DST host
# virsh migrate vm1 qemu+ssh://****/system --live --verbose --postcopy
Migration: [100 %]
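
Note: comparing the command lines from comment 8 and comment 9, the failing guest is started with per-node memory-backend-file objects, while the working one uses -mem-path with mem=1001 per node. The sketch below (Python, helper name parse_memory_backends is hypothetical) extracts those objects from a command line so the two cases can be told apart. It also checks that size=1049624576 is simply the cell memory of 1025024 KiB multiplied by 1024, just as mem=1001 is 1025024 KiB divided by 1024. This only describes the observed difference and is not a diagnosis of the failure.

```python
import shlex

# Command-line fragments copied from comment 8 (with nodeset) and
# comment 9 (without nodeset).
NODESET_CMDLINE = (
    "-object memory-backend-file,id=ram-node0,prealloc=yes,"
    "mem-path=/dev/hugepages/libvirt/qemu/2-vm1,size=1049624576 "
    "-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 "
    "-object memory-backend-file,id=ram-node1,prealloc=yes,"
    "mem-path=/dev/hugepages/libvirt/qemu/2-vm1,size=1049624576 "
    "-numa node,nodeid=1,cpus=2-3,memdev=ram-node1"
)
PLAIN_CMDLINE = (
    "-mem-prealloc -mem-path /dev/hugepages/libvirt/qemu/3-vm1 "
    "-numa node,nodeid=0,cpus=0-1,mem=1001 "
    "-numa node,nodeid=1,cpus=2-3,mem=1001"
)

CELL_MEMORY_KIB = 1025024  # from <cell ... memory='1025024' unit='KiB'/>


def parse_memory_backends(cmdline):
    """Return the properties of each '-object memory-backend-file,...'."""
    args = shlex.split(cmdline)
    backends = []
    for flag, value in zip(args, args[1:]):
        if flag == "-object" and value.startswith("memory-backend-file,"):
            # Drop the object type, then split each key=value property.
            props = dict(item.split("=", 1) for item in value.split(",")[1:])
            backends.append(props)
    return backends


if __name__ == "__main__":
    for backend in parse_memory_backends(NODESET_CMDLINE):
        matches = int(backend["size"]) == CELL_MEMORY_KIB * 1024
        print(backend["id"], backend["size"], "matches cell memory:", matches)
    print("backends without nodeset:", parse_memory_backends(PLAIN_CMDLINE))
```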

Comment 10 jiyan 2020-04-08 09:54:42 UTC
Hi Jiri
Could you pls check comment 8 and comment 9 again?
Thank you in advance. :)

Comment 13 RHEL Program Management 2021-08-15 07:26:50 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

