Bug 1170093
| Summary: | guest NUMA failed to migrate when machine is rhel6.5.0 | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jincheng Miao <jmiao> | |
| Component: | qemu-kvm-rhev | Assignee: | Eduardo Habkost <ehabkost> | |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | medium | |||
| Version: | 7.1 | CC: | amit.shah, dgilbert, dyuan, ehabkost, hhuang, honzhang, huding, jen, juzhang, knoel, lersek, lhuang, lmiksik, mprivozn, mzhan, qiguo, quintela, virt-maint, vivianzhang, xfu | |
| Target Milestone: | rc | Keywords: | Regression | |
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | qemu-kvm-rhev-2.1.2-17.el7 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1175397 (view as bug list) | Environment: | ||
| Last Closed: | 2015-03-05 09:59:12 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1175397 | |||
It is caused by the hack added to fix bug 1027565, which breaks when using NUMA and memory-backend objects.

Hi Eduardo Habkost,
I found a similar issue. Please help confirm whether both are caused by the same problem and could be fixed with the same patch.
Thanks
Description:
Migration fails with an error when the guest is configured with OVMF BIOS + machine type=rhel6.5.0.
When the machine type is set lower than rhel6.5.0, such as rhel6.4.0, migration fails with the same error.
Product version
libvirt-1.2.8-10.el7.x86_64
qemu-kvm-rhev-2.1.2-15.el7.x86_64
OVMF-20140822-7.git9ece15a.el7.x86_64
How reproducible
100%
Steps:
1. Prepare a migration environment with an NFS-shared image between the source and target hosts
2. Make sure OVMF is installed on both the source and target hosts
# rpm -q OVMF
OVMF-20140822-7.git9ece15a.el7.x86_64
3. Install a UEFI guest with virt-manager; make sure the guest has the configuration below, with machine type='rhel6.5.0' and the OVMF BIOS set in the guest XML
# virsh dumpxml rhel7new
...
<os>
<type arch='x86_64' machine='rhel6.5.0'>hvm</type>
<loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
<nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/rhel7new_VARS.fd</nvram>
<boot dev='hd'/>
</os>
...
4. Start the guest; it works well
# virsh list --all
Id Name State
----------------------------------------------------
27 rhel7new running
5. Migrate this guest; qemu-kvm fails with an error
# virsh migrate rhel7new --live qemu+ssh://10.66.6.205/system --verbose
root.6.205's password:
Migration: [100 %]error: internal error: early end of file from monitor: possible problem:
RHEL-6 compat: ich9-usb-uhci1: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci2: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci3: irq_pin = 3
qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:906: shadow_bios: Assertion `bios != ((void *)0)' failed.
6. When the machine type is changed to pc-i440fx-rhel7.1.0 or pc-i440fx-rhel7.0.0, migration succeeds
7. When the OVMF BIOS configuration <nvram template='/usr/share/OVMF/OVMF_VARS.fd'></nvram> is deleted, migration also succeeds
Actual result:
Migration fails with an error
Expected result:
Migration should succeed when the guest is configured with OVMF BIOS + machine type=rhel6.5.0
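Steps 6 and 7 above show that the failure needs both a RHEL-6 machine type and the pflash (OVMF) loader. The following is a minimal sketch of a pre-migration check for that combination, written against the `<os>` fragment shown in step 3. The function name `uses_pflash_on_rhel6_machine` is hypothetical, and treating every `rhel6.` machine type alike is an assumption based only on the rhel6.5.0 and rhel6.4.0 cases mentioned in this report.

```python
import xml.etree.ElementTree as ET


def uses_pflash_on_rhel6_machine(domain_xml: str) -> bool:
    """Return True if the domain combines a rhel6.* machine type with a pflash loader.

    Hypothetical helper: it only inspects <os><type machine=...> and
    <os><loader type=...>, as shown in the 'virsh dumpxml' output above.
    """
    root = ET.fromstring(domain_xml)
    os_elem = root.find("os")
    if os_elem is None:
        return False

    type_elem = os_elem.find("type")
    loader_elem = os_elem.find("loader")

    machine = type_elem.get("machine", "") if type_elem is not None else ""
    is_pflash = loader_elem is not None and loader_elem.get("type") == "pflash"

    # Assumption: every "rhel6." machine type behaves like rhel6.5.0 here.
    return machine.startswith("rhel6.") and is_pflash
```

The XML passed in would be the full output of `virsh dumpxml`; the check says nothing about whether migration will succeed for other reasons.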
Fix included in qemu-kvm-rhev-2.1.2-17.el7

Hello Jeff,
I tried the fixed build qemu-kvm-rhev-2.1.2-17.el7, but migration still fails with OVMF BIOS + machine type=rhel6.5.0:
# virsh migrate rhel7new --live qemu+ssh://10.66.6.205/system --verbose
root.6.205's password:
Migration: [100 %]error: internal error: early end of file from monitor: possible problem:
RHEL-6 compat: ich9-usb-uhci1: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci2: irq_pin = 3
RHEL-6 compat: ich9-usb-uhci3: irq_pin = 3
2014-12-17T05:41:51.612642Z qemu-kvm: usb-redir warning: usb-redir connection broken during migration
qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:906: shadow_bios: Assertion `bios != ((void *)0)' failed.
So I filed a new bug 1175099 to track this.

OVMF is completely unsupported on the rhel6.5.0 machine type.

As explained in Comment 10, the issue described in Comment 6 (and again in Comment 9) is not a bug, it's an invalid configuration.
Please re-verify using the configuration as described in the original problem report.

I've proposed a patch on libvirt's upstream list:
https://www.redhat.com/archives/libvir-list/2014-December/msg00931.html

And just pushed the patch:
commit f309db1f4d51009bad0d32e12efc75530b66836b
Author: Michal Privoznik <mprivozn>
AuthorDate: Thu Dec 18 12:36:48 2014 +0100
Commit: Michal Privoznik <mprivozn>
CommitDate: Fri Dec 19 07:44:44 2014 +0100
qemu: Create memory-backend-{ram,file} iff needed
Libvirt BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1175397
QEMU BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1170093
In qemu there are two interesting arguments:
1) -numa to create a guest NUMA node
2) -object memory-backend-{ram,file} to tell qemu which memory
region on which host's NUMA node it should allocate the guest
memory from.
Combining these two together we can instruct qemu to create a
guest NUMA node that is tied to a host NUMA node. And it works
just fine. However, depending on the machine type used, there might
be some issues during migration when OVMF is enabled (see QEMU
BZ). While this truly is a QEMU bug, we can help avoid it. The
problem lies within the memory backend objects somewhere. Having
said that, the fix on our side consists of putting those objects on
the command line if and only if needed. For instance, while
previously we would construct this (in all ways correct) command
line:
-object memory-backend-ram,size=256M,id=ram-node0 \
-numa node,nodeid=0,cpus=0,memdev=ram-node0
now we create just:
-numa node,nodeid=0,cpus=0,mem=256
because the backend object is obviously not tied to any specific
host NUMA node.
Signed-off-by: Michal Privoznik <mprivozn>
v1.2.11-60-gf309db1
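The "if and only if needed" rule from the commit message can be sketched as a small decision function. This is an illustration only, not libvirt's actual code: the function name `numa_node_args` and its parameters are hypothetical, and the single condition used here (whether the guest node is bound to host NUMA nodes) is a simplifying assumption. The `host-nodes` and `policy` properties are QEMU memory-backend options; the values chosen are examples.

```python
from typing import List, Optional


def numa_node_args(nodeid: int, cpus: str, size_mib: int,
                   host_nodes: Optional[str] = None) -> List[str]:
    """Build the QEMU arguments for one guest NUMA node.

    Hypothetical sketch: a memory backend object is emitted only when the
    guest node has to be tied to specific host NUMA nodes.
    """
    if host_nodes is None:
        # No host binding requested: the plain form is enough.
        return ["-numa", f"node,nodeid={nodeid},cpus={cpus},mem={size_mib}"]

    backend_id = f"ram-node{nodeid}"
    return [
        "-object",
        f"memory-backend-ram,size={size_mib}M,id={backend_id},"
        f"host-nodes={host_nodes},policy=bind",
        "-numa",
        f"node,nodeid={nodeid},cpus={cpus},memdev={backend_id}",
    ]


if __name__ == "__main__":
    # Unbound node: matches the shorter command line from the commit message.
    print(" ".join(numa_node_args(0, "0", 256)))
    # Node bound to host node 0: a backend object is still required.
    print(" ".join(numa_node_args(0, "0", 256, host_nodes="0")))
```

With no host binding the function produces the shorter `-numa node,...,mem=256` form, so a guest on a RHEL-6 machine type never sees a `memdev` it cannot handle.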
Reproduced this bug with qemu-kvm-rhev-2.1.2-13.el7.x86_64
Steps:
1. Boot the guest with memory-backend objects and NUMA:
# /usr/libexec/qemu-kvm -cpu Penryn -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,size=2048M,id=ram-node1 -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -enable-kvm -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x3 -name test -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -spice disable-ticketing,port=5001 -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=/mnt/rhel7-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2,bus=pci.0,addr=0x5 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0 -netdev tap,id=netdev1,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,bus=pci.0,addr=0x6,netdev=netdev1,id=vn2,mac=02:48:a7:f1:00:48 -boot menu=on -qmp unix:/tmp/q1,server,nowait -monitor unix:/tmp/m1,server,nowait
2. Launch the dst qemu in listening mode on the dst node:
# /usr/libexec/qemu-kvm -cpu Penryn -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,size=2048M,id=ram-node1 -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -enable-kvm -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x3 -name test -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -spice disable-ticketing,port=5001 -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=/mnt/rhel7-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2,bus=pci.0,addr=0x5 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0 -netdev tap,id=netdev1,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,bus=pci.0,addr=0x6,netdev=netdev1,id=vn2,mac=02:48:a7:f1:00:48 -boot menu=on -qmp unix:/tmp/q1,server,nowait -monitor unix:/tmp/m1,server,nowait -incoming tcp:0:4444
3. Migrate
Result:
In dst, qemu core dumped:
qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:904: shadow_bios: Assertion `ram != ((void *)0)' failed.
Aborted (core dumped)
So this bug is reproduced.

Verified this bug with qemu-kvm-rhev-2.1.2-17.el7.x86_64
Steps:
Try to boot the guest with the rhel6.5.0 machine type together with NUMA memdev:
# /usr/libexec/qemu-kvm -cpu Penryn -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0 -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-ram,size=2048M,id=ram-node1 -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -enable-kvm -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x3 -name test -nodefaults -nodefconfig -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -spice disable-ticketing,port=5001 -vga qxl -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -monitor stdio -drive file=/mnt/rhel7-64.qcow2,if=none,media=disk,format=qcow2,rerror=stop,werror=stop,aio=native,id=scsi-disk0 -device virtio-scsi-pci,id=bus2,bus=pci.0,addr=0x5 -device scsi-hd,bus=bus2.0,drive=scsi-disk0,id=disk0 -netdev tap,id=netdev1,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,bus=pci.0,addr=0x6,netdev=netdev1,id=vn2,mac=02:48:a7:f1:00:48 -boot menu=on -qmp unix:/tmp/q1,server,nowait -monitor unix:/tmp/m1,server,nowait
Results:
(qemu) qemu-kvm: -numa memdev is not supported by machine rhel6.5.0
The fixed qemu-kvm-rhev rejects this configuration, so this bug is fixed.
Additionally, if the guest is booted only with -numa node,nodeid=0,cpus=0-1,memdev=2048M -numa node,nodeid=1,cpus=2-3,memdev=2048M but w/o -object, the migration finishes successfully with both the fixed and the unfixed version.

(In reply to Jeff Nelson from comment #11)
> As explained in Comment 10, the issue described in Comment 6 (and again in
> Comment 9) is not a bug, it's an invalid configuration.
>
> Please re-verify using the configuration as described in the original
> problem report.
Hello Jeff,
Sorry for my late response. I have used the original configuration described in this bug to verify this issue. From the libvirt side I get the result below; please help check whether it is the expected behaviour.
version:
libvirt-1.2.8-11.el7.x86_64
qemu-kvm-rhev-2.1.2-17.el7.x86_64
steps:
1. Prepare a guest with NUMA and -M rhel6.5.0
# virsh dumpxml rhel7
....
<os>
<type arch='x86_64' machine='rhel6.5.0'>hvm</type>
<loader readonly='yes' type='rom'>/usr/share/seabios/bios.bin</loader>
<boot dev='hd'/>
<bootmenu enable='yes' timeout='3000'/>
</os>
...
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='1048576'/>
</numa>
</cpu>
...
2. Start the guest; an error is reported and the guest does not boot
# virsh start rhel7
error: Failed to start domain rhel7
error: internal error: early end of file from monitor: possible problem:
2014-12-24T01:30:15.569450Z qemu-kvm: -numa memdev is not supported by machine rhel6.5.0

Looks OK to me, but deferring to BZ owner to confirm.

If you need migration to work using libvirt + rhel6.5.0 machine-type + NUMA, you need the fix for bug 1175397. That means using libvirt-1.2.8-12.el7 or newer.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHSA-2015-0624.html
Description of problem:
Migration fails when the guest is configured with NUMA.
version:
libvirt-1.2.8-9.el7.x86_64
qemu-kvm-rhev-2.1.2-13.el7.x86_64
kernel-3.10.0-206.el7.x86_64
How reproducible:
100%
Steps to reproduce:
1. Add the following to the domain XML
# virsh dumpxml aaa
...
<memory unit='KiB'>3145728</memory>
<currentMemory unit='KiB'>3145728</currentMemory>
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='1048576'/>
<cell id='1' cpus='2-3' memory='2097152'/>
</numa>
</cpu>
<os>
<type arch='x86_64' machine='rhel6.5.0'>hvm</type>
<boot dev='hd'/>
</os>
...
# ps -ef | grep aaa
qemu 19033 1 2 16:41 ? 00:00:15 /usr/libexec/qemu-kvm -name aaa -S -machine rhel6.5.0,accel=kvm,usb=off -m 3072 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=1024 -numa node,nodeid=1,cpus=2-3,mem=2048 -uuid ab2aa5f3-d216-d426-2dd0-7fc318b252f7 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/aaa.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/mnt/jmiao/r71-latest.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/aaa.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:0 -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2. Add an iptables rule on the destination
# iptables -I INPUT -p tcp --dport 49152:49261 -j ACCEPT
3. Migrate it
# virsh migrate --live aaa qemu+ssh://$TARGET/system --verbose --unsafe
root@$TARGET's password:
Migration: [100 %]
error: internal error: early end of file from monitor: possible problem:
qemu-kvm: /builddir/build/BUILD/qemu-2.1.2/savevm.c:904: shadow_bios: Assertion `ram != ((void *)0)' failed.
Expected result:
Migration succeeds.
Extra information:
If the '-machine' argument is set to rhel7.0.0, the problem is not reproduced.
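For readers comparing the domain XML with the `ps` output in this description, the relationship between the `<cell>` elements and the `-numa` arguments can be sketched as below. This is a hedged illustration, not libvirt's implementation: the function name `numa_args_from_domain` is hypothetical, and it assumes the cell `memory` attribute is in KiB (libvirt's default unit) and that QEMU's `mem=` value is in MiB.

```python
import xml.etree.ElementTree as ET
from typing import List


def numa_args_from_domain(domain_xml: str) -> List[str]:
    """Translate <cpu><numa><cell> elements into '-numa node,...' arguments.

    Hypothetical helper: assumes 'memory' is given in KiB and converts it
    to the MiB value used after 'mem='.
    """
    root = ET.fromstring(domain_xml)
    args: List[str] = []
    for cell in root.findall("./cpu/numa/cell"):
        mem_mib = int(cell.get("memory")) // 1024  # KiB -> MiB
        args += [
            "-numa",
            f"node,nodeid={cell.get('id')},cpus={cell.get('cpus')},mem={mem_mib}",
        ]
    return args


if __name__ == "__main__":
    xml = """
    <domain>
      <cpu>
        <numa>
          <cell id='0' cpus='0-1' memory='1048576'/>
          <cell id='1' cpus='2-3' memory='2097152'/>
        </numa>
      </cpu>
    </domain>
    """
    print(" ".join(numa_args_from_domain(xml)))
```

For the two cells in this report, the sketch yields `mem=1024` and `mem=2048`, which matches the `-numa` arguments visible in the `ps` output.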