Bug 1170093 - guest NUMA failed to migrate when machine is rhel6.5.0
Summary: guest NUMA failed to migrate when machine is rhel6.5.0
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Eduardo Habkost
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 1175397
Reported: 2014-12-03 08:53 UTC by Jincheng Miao
Modified: 2015-03-05 09:59 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.1.2-17.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1175397
Environment:
Last Closed: 2015-03-05 09:59:12 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:0624 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2015-03-05 14:37:36 UTC

Description Jincheng Miao 2014-12-03 08:53:01 UTC
Description of problem:
Migration fails when the guest is configured with NUMA.

version:
libvirt-1.2.8-9.el7.x86_64
qemu-kvm-rhev-2.1.2-13.el7.x86_64
kernel-3.10.0-206.el7.x86_64

How reproducible:
100%

Steps to reproduce:
1. Add the following to the domain XML:

# virsh dumpxml aaa
...
  <memory unit='KiB'>3145728</memory>
  <currentMemory unit='KiB'>3145728</currentMemory>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576'/>
      <cell id='1' cpus='2-3' memory='2097152'/>
    </numa>
  </cpu>
  <os>
    <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
    <boot dev='hd'/>
  </os>
...

# ps -ef | grep aaa
qemu     19033     1  2 16:41 ?        00:00:15 /usr/libexec/qemu-kvm -name aaa -S -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=1024 -numa node,nodeid=1,cpus=2-3,mem=2048 -uuid ab2aa5f3-d216-d426-2dd0-7fc318b252f7 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/aaa.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/mnt/jmiao/r71-latest.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/aaa.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on



2. Add an iptables rule on the destination host:

# iptables -I INPUT -p tcp --dport 49152:49261 -j ACCEPT


3. Migrate the guest:

# virsh migrate --live aaa qemu+ssh://$TARGET/system --verbose --unsafe
root@$TARGET's password:
Migration: [100 %]
error: internal error: early end of file from monitor: possible problem:
qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:904: shadow_bios: Assertion `ram != ((void *)0)' failed.


Expected result:
Migration succeeds.

Extra information:
If the '-machine' argument is set to rhel7.0.0, the problem does not reproduce.
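
A minimal sketch, not part of the original report, of how one might flag a domain XML that matches the configuration described above (machine type rhel6.5.0 together with guest NUMA cells). The function name `matches_reported_config` is hypothetical, and the check only mirrors the two conditions stated in this report; it is not a definitive test for the bug.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: the name and the exact conditions are assumptions
# drawn from this report (machine='rhel6.5.0' plus <numa> cells).
def matches_reported_config(domain_xml: str) -> bool:
    root = ET.fromstring(domain_xml)

    # <os><type machine='...'> carries the machine type.
    os_type = root.find("./os/type")
    machine = os_type.get("machine", "") if os_type is not None else ""

    # <cpu><numa><cell .../> entries define the guest NUMA nodes.
    numa_cells = root.findall("./cpu/numa/cell")

    return machine == "rhel6.5.0" and len(numa_cells) > 0


SAMPLE_XML = """
<domain type='kvm'>
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576'/>
      <cell id='1' cpus='2-3' memory='2097152'/>
    </numa>
  </cpu>
  <os>
    <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
    <boot dev='hd'/>
  </os>
</domain>
"""

if __name__ == "__main__":
    print(matches_reported_config(SAMPLE_XML))
```

In practice the XML would come from `virsh dumpxml <domain>`; the sample here only repeats the fragment shown in step 1.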

Comment 2 Eduardo Habkost 2014-12-03 15:49:08 UTC
It is caused by the hack added to fix bug 1027565, which breaks when using NUMA and memory-backend objects.

Comment 6 vivian zhang 2014-12-17 03:51:36 UTC
Hi Eduardo,

I found a similar issue. Could you confirm whether it has the same cause and can be fixed by the same patch?
Thanks.

Description:
Migration fails with an error when the guest is configured with OVMF firmware and machine type rhel6.5.0.
With a machine type older than rhel6.5.0, such as rhel6.4.0, migration fails with the same error.

Product version
libvirt-1.2.8-10.el7.x86_64
qemu-kvm-rhev-2.1.2-15.el7.x86_64
OVMF-20140822-7.git9ece15a.el7.x86_64

How reproducible:
100%

Steps:
1. Prepare a migration environment with an NFS-hosted image shared between the source and target hosts.

2. Make sure OVMF is installed on both the source and target hosts:
# rpm -q OVMF
OVMF-20140822-7.git9ece15a.el7.x86_64

3. Install a UEFI guest with virt-manager, then set machine type 'rhel6.5.0' and the OVMF firmware in the guest XML as shown below:

# virsh dumpxml rhel7new
...
<os>
    <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/rhel7new_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
...

4. Start the guest; it works well:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 27    rhel7new                       running

5. Migrate the guest; qemu-kvm fails with an error:
# virsh migrate rhel7new --live qemu+ssh://10.66.6.205/system --verbose
root@10.66.6.205's password:
Migration: [100 %]error: internal error: early end of file from monitor: possible problem:
RHEL-6 compat: ich9-usb-uhci1: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci2: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci3: irq_pin = 3
qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:906: shadow_bios: Assertion `bios != ((void *)0)' failed.

6. With the machine type changed to pc-i440fx-rhel7.1.0 or pc-i440fx-rhel7.0.0, migration succeeds.

7. With the OVMF configuration <nvram template='/usr/share/OVMF/OVMF_VARS.fd'></nvram> removed, migration also succeeds.

Actual result:
Migration fails with an error.

Expected result:
Migration should succeed when the guest is configured with OVMF firmware and machine type rhel6.5.0.

Comment 7 Jeff Nelson 2014-12-17 04:05:13 UTC
Fix included in qemu-kvm-rhev-2.1.2-17.el7

Comment 9 vivian zhang 2014-12-17 07:11:58 UTC
Hello Jeff,
I tried the fixed build qemu-kvm-rhev-2.1.2-17.el7,
but migration still fails with OVMF firmware and machine type rhel6.5.0:
# virsh migrate rhel7new --live qemu+ssh://10.66.6.205/system --verbose
root@10.66.6.205's password: 
Migration: [100 %]error: internal error: early end of file from monitor: possible problem:
RHEL-6 compat: ich9-usb-uhci1: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci2: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci3: irq_pin = 3
2014-12-17T05:41:51.612642Z qemu-kvm: usb-redir warning: usb-redir connection broken during migration

qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:906: shadow_bios: Assertion `bios != ((void *)0)' failed.

So I filed a new bug, 1175099, to track this.

Comment 10 Laszlo Ersek 2014-12-17 08:30:37 UTC
OVMF is completely unsupported on the rhel6.5.0 machine type.

Comment 11 Jeff Nelson 2014-12-17 15:01:48 UTC
As explained in Comment 10, the issue described in Comment 6 (and again in Comment 9) is not a bug, it's an invalid configuration.

Please re-verify using the configuration as described in the original problem report.

Comment 12 Michal Privoznik 2014-12-18 11:47:28 UTC
I've proposed a patch on the libvirt upstream list:

https://www.redhat.com/archives/libvir-list/2014-December/msg00931.html

Comment 13 Michal Privoznik 2014-12-19 07:05:15 UTC
And just pushed the patch:

commit f309db1f4d51009bad0d32e12efc75530b66836b
Author:     Michal Privoznik <mprivozn@redhat.com>
AuthorDate: Thu Dec 18 12:36:48 2014 +0100
Commit:     Michal Privoznik <mprivozn@redhat.com>
CommitDate: Fri Dec 19 07:44:44 2014 +0100

    qemu: Create memory-backend-{ram,file} iff needed
    
    Libvirt BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1175397
    QEMU BZ:    https://bugzilla.redhat.com/show_bug.cgi?id=1170093
    
    In qemu there are two interesting arguments:
    
    1) -numa to create a guest NUMA node
    2) -object memory-backend-{ram,file} to tell qemu which memory
    region on which host's NUMA node it should allocate the guest
    memory from.
    
    Combining these two together we can instruct qemu to create a
    guest NUMA node that is tied to a host NUMA node. And it works
    just fine. However, depending on machine type used, there might
    be some issued during migration when OVMF is enabled (see QEMU
    BZ). While this truly is a QEMU bug, we can help avoiding it. The
    problem lies within the memory backend objects somewhere. Having
    said that, fix on our side consists on putting those objects on
    the command line if and only if needed. For instance, while
    previously we would construct this (in all ways correct) command
    line:
    
        -object memory-backend-ram,size=256M,id=ram-node0 \
        -numa node,nodeid=0,cpus=0,memdev=ram-node0
    
    now we create just:
    
        -numa node,nodeid=0,cpus=0,mem=256
    
    because the backend object is obviously not tied to any specific
    host NUMA node.
    
    Signed-off-by: Michal Privoznik <mprivozn@redhat.com>

v1.2.11-60-gf309db1
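
The rule in the commit message above can be sketched roughly as follows. This is an illustrative Python sketch, not libvirt's actual implementation (libvirt is written in C); the function name `numa_node_args` and the `host_nodes` parameter are hypothetical. The `host-nodes` and `policy=bind` properties on the backend object are assumed here as the way a host NUMA binding would be expressed.

```python
from typing import List, Optional

# Hypothetical helper illustrating "create memory-backend-ram iff needed".
# host_nodes=None means the guest node is not tied to any host NUMA node.
def numa_node_args(nodeid: int, cpus: str, mem_mib: int,
                   host_nodes: Optional[str] = None) -> List[str]:
    if host_nodes is None:
        # No host binding requested: the plain mem= form is enough,
        # so no backend object is put on the command line.
        return ["-numa", f"node,nodeid={nodeid},cpus={cpus},mem={mem_mib}"]

    # Host binding requested: a backend object is needed to carry it.
    backend_id = f"ram-node{nodeid}"
    return [
        "-object",
        f"memory-backend-ram,size={mem_mib}M,id={backend_id},"
        f"host-nodes={host_nodes},policy=bind",
        "-numa",
        f"node,nodeid={nodeid},cpus={cpus},memdev={backend_id}",
    ]


if __name__ == "__main__":
    # Unbound node: matches the "now we create just" form in the commit message.
    print(" ".join(numa_node_args(0, "0", 256)))
    # Bound node: backend object plus memdev= reference.
    print(" ".join(numa_node_args(0, "0", 256, host_nodes="0")))
```

With this split, a guest that only defines NUMA cells never gets `-numa memdev`, which is the form the rhel6.5.0 machine type has trouble with.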

Comment 14 Qian Guo 2014-12-22 09:09:41 UTC
Reproduced this bug with qemu-kvm-rhev-2.1.2-13.el7.x86_64

Steps:
1. Boot the guest with memory-backend objects and NUMA:
# /usr/libexec/qemu-kvm -cpu Penryn  -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,size=2048M,id=ram-node1  -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -enable-kvm -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x3 -name test -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -spice disable-ticketing,port=5001 -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=/mnt/rhel7-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2,bus=pci.0,addr=0x5 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0  -netdev tap,id=netdev1,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,bus=pci.0,addr=0x6,netdev=netdev1,id=vn2,mac=02:48:a7:f1:00:48 -boot menu=on -qmp unix:/tmp/q1,server,nowait -monitor unix:/tmp/m1,server,nowait 

2. Launch the destination qemu in listening mode on the destination node:
# /usr/libexec/qemu-kvm -cpu Penryn  -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,size=2048M,id=ram-node1  -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -enable-kvm -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x3 -name test -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -spice disable-ticketing,port=5001 -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=/mnt/rhel7-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2,bus=pci.0,addr=0x5 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0  -netdev tap,id=netdev1,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,bus=pci.0,addr=0x6,netdev=netdev1,id=vn2,mac=02:48:a7:f1:00:48 -boot menu=on -qmp unix:/tmp/q1,server,nowait -monitor unix:/tmp/m1,server,nowait -incoming tcp:0:4444



3. Migrate.

Result:

On the destination, qemu dumps core:
qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:904: shadow_bios: Assertion `ram != ((void *)0)' failed.
Aborted (core dumped)


So this bug is reproduced.

Verified this bug with qemu-kvm-rhev-2.1.2-17.el7.x86_64

Steps:

Try to boot a guest with the rhel6.5.0 machine type together with NUMA memdev:

# /usr/libexec/qemu-kvm -cpu Penryn  -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,size=2048M,id=ram-node1  -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -enable-kvm -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x3 -name test -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -spice disable-ticketing,port=5001 -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=/mnt/rhel7-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2,bus=pci.0,addr=0x5 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0  -netdev tap,id=netdev1,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,bus=pci.0,addr=0x6,netdev=netdev1,id=vn2,mac=02:48:a7:f1:00:48 -boot menu=on -qmp unix:/tmp/q1,server,nowait -monitor unix:/tmp/m1,server,nowait 


Results:

(qemu) qemu-kvm: -numa memdev is not supported by machine rhel6.5.0


The fixed qemu-kvm-rhev rejects this configuration at startup instead of crashing during migration, so this bug is fixed.


Additionally:

If the guest is booted only with
-numa node,nodeid=0,cpus=0-1,memdev=2048M  -numa node,nodeid=1,cpus=2-3,memdev=2048M

but without -object, migration finishes successfully with both the fixed and the unfixed version.

Comment 16 vivian zhang 2014-12-24 01:35:16 UTC
(In reply to Jeff Nelson from comment #11)
> As explained in Comment 10, the issue described in Comment 6 (and again in
> Comment 9) is not a bug, it's an invalid configuration.
> 
> Please re-verify using the configuration as described in the original
> problem report.

Hello Jeff,

Sorry for the late response.
I have used the original configuration described in this bug to verify this issue.

From the libvirt side I get the result below. Could you check whether this is the expected behaviour?

version:
libvirt-1.2.8-11.el7.x86_64
qemu-kvm-rhev-2.1.2-17.el7.x86_64

steps:
1. Prepare a guest with NUMA and -M rhel6.5.0:

# virsh dumpxml rhel7
....
<os>
    <type arch='x86_64' machine='rhel6.5.0'>hvm</type>
    <loader readonly='yes' type='rom'>/usr/share/seabios/bios.bin</loader>
    <boot dev='hd'/>
    <bootmenu enable='yes' timeout='3000'/>
  </os>
...
 <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576'/>
    </numa>
  </cpu>
...

2. Start the guest; an error is reported and the guest fails to boot:

# virsh start rhel7
error: Failed to start domain rhel7
error: internal error: early end of file from monitor: possible problem:
2014-12-24T01:30:15.569450Z qemu-kvm: -numa memdev is not supported by machine rhel6.5.0

Comment 17 Jeff Nelson 2015-01-13 17:12:54 UTC
Looks OK to me, but deferring to BZ owner to confirm.

Comment 18 Eduardo Habkost 2015-01-13 17:34:02 UTC
If you need migration to work using libvirt + rhel6.5.0 machine-type + NUMA, you need the fix for bug 1175397. That means using libvirt-1.2.8-12.el7 or newer.
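
As a rough illustration of the version requirement above, the following sketch checks whether an installed libvirt package string is at least libvirt-1.2.8-12.el7. The function names `parse_libvirt_nvr` and `has_libvirt_fix` are hypothetical, and the comparison is a simplification: it assumes plain numeric RHEL 7 build strings of the form seen in this report and does not implement full RPM version comparison.

```python
import re
from typing import Tuple

# Minimum build named in the comment above: libvirt-1.2.8-12.el7
MIN_FIXED: Tuple[Tuple[int, ...], int] = ((1, 2, 8), 12)

# Hypothetical parser for strings such as "libvirt-1.2.8-11.el7.x86_64".
# Assumption: version is dotted integers, release starts with an integer
# followed by ".el7". Anything else is rejected rather than guessed at.
def parse_libvirt_nvr(nvr: str) -> Tuple[Tuple[int, ...], int]:
    match = re.match(r"^libvirt-(\d+(?:\.\d+)*)-(\d+)\.el7", nvr)
    if match is None:
        raise ValueError(f"unrecognized libvirt package string: {nvr!r}")
    version = tuple(int(part) for part in match.group(1).split("."))
    release = int(match.group(2))
    return version, release


def has_libvirt_fix(nvr: str) -> bool:
    # Tuples compare element by element: version first, then release.
    return parse_libvirt_nvr(nvr) >= MIN_FIXED


if __name__ == "__main__":
    for package in ("libvirt-1.2.8-11.el7.x86_64", "libvirt-1.2.8-12.el7.x86_64"):
        print(package, has_libvirt_fix(package))
```

The package string would typically come from `rpm -q libvirt`; for anything beyond simple numeric builds, a real RPM version comparison should be used instead.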

Comment 20 errata-xmlrpc 2015-03-05 09:59:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0624.html

