Bug 1342434
| Summary: | qemu core dump when starting a guest with more than 54 nested pcie switches | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Yang Yang <yanyang> |
| Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> |
| Status: | CLOSED ERRATA | QA Contact: | jingzhao <jinzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | chayang, dgilbert, hhuang, jinzhao, juli, juzhang, mrezanin, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | qemu-kvm-rhev-2.9.0-1.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-01 23:32:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1311684 | | |
| Attachments: | | | |
Description
Yang Yang
2016-06-03 09:27:44 UTC
Created attachment 1164396: backtrace
Hi David,

This problem appears when we have nested devices, e.g. bridges.
It happens because the vmstate builds some kind of id that "grows" with the nesting until it goes out of bounds.
I am sure you know much more about it.

Can you please advise on a good way to handle this problem?

Thanks,
Marcel

(In reply to Marcel Apfelbaum from comment #2)
> Hi David,
>
> This problem appears when we have nested devices, e.g. bridges.
> It happens because the vmstate builds some kind of id that "grows"
> with the nesting until it goes out of bounds.
> I am sure you know much more about it.
>
> Can you please advise on a good way to handle this problem?

(Recreated on upstream)
Hmm, that's fun; you're right, it is the id length that's the problem:

    vmstate_register_with_alias_id: dev=pci.53 vmsd->name=xio3130-express-downstream-port instance_id=-1
    vmstate_register_with_alias_id: (dev case) id=0000:00:0f.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0

That long path is ~260 characters in length; it comes from qdev_get_dev_path.

There's no simple fix to get migration to deal with longer paths; the path becomes se->idstr, which is written into the migration stream as a 1-byte length followed by the string (see save_section_header), so it would need a format change to cope with longer names.

I'd say the qdev path is unnecessarily long in this case (why does each bridge add 3 digits?), but I don't think we can change it again without breaking compatibility; I think the names form part of RAMBlock names (e.g. if a card is attached to the end of this bus with its own ROM).
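To make the length arithmetic concrete, here is a minimal shell sketch. It is not QEMU code: `nested_path` and `fits_one_byte_length` are hypothetical helpers that only mimic the shape of the path shown above (a root-bus address plus one `:00.0` component per nested bridge) and test it against the 255 bytes that a one-byte length field can express. Whether QEMU applies exactly this bound to the path alone is an assumption.

```shell
#!/bin/sh
# Sketch only, not QEMU code. nested_path is a hypothetical helper that
# mimics the paths in this report: a root-bus address followed by one
# ":00.0" component per nested bridge.
nested_path() {
    np_path=$1
    np_left=$2
    while [ "$np_left" -gt 0 ]; do
        np_path="$np_path:00.0"
        np_left=$((np_left - 1))
    done
    printf '%s\n' "$np_path"
}

# A one-byte length field can describe at most 255 bytes of string.
fits_one_byte_length() {
    [ "${#1}" -le 255 ]
}

for depth in 10 48 49; do
    p=$(nested_path "0000:00:0f.0" "$depth")
    if fits_one_byte_length "$p"; then
        echo "depth $depth: ${#p} chars, fits"
    else
        echo "depth $depth: ${#p} chars, too long for a 1-byte length"
    fi
done
```

Each nesting level adds five characters, so the path crosses the one-byte limit between 48 and 49 nested components regardless of how short the device names themselves are.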
I think we should probably fix vmstate_register_with_alias_id to fail cleanly and device_set_realized to check its return value; but the migration code uses unchecked pstr* functions all over with these IDs, so there are lots of cases that all seem to need fixing/checking.

Dave

> Thanks,
> Marcel

I've got a partial fix, but it isn't failing cleanly, so I think this is part of a much bigger job to clean a lot of things up; we could impose an arbitrary PCI bus depth limit as a stop-gap if we find we need to, but I don't think it's urgent.

Dave

Reproduced with qemu-kvm-rhev-2.6.0-5.el7.x86_64: it supports only 23 nested switches, and the issue is hit when booting a guest with 24 nested switches.

The backtrace from the core file follows:

    (gdb) bt
    #0  0x00007f73fa0f05f7 in raise () from /lib64/libc.so.6
    #1  0x00007f73fa0f1ce8 in abort () from /lib64/libc.so.6
    #2  0x00007f73fa0e9566 in __assert_fail_base () from /lib64/libc.so.6
    #3  0x00007f73fa0e9612 in __assert_fail () from /lib64/libc.so.6
    #4  0x00007f740247b4ad in vmstate_register_with_alias_id (dev=dev@entry=0x7f7409810800, instance_id=<optimized out>, instance_id@entry=-1, vmsd=0x7f7402ba6a80 <vmstate_xio3130_downstream>, opaque=opaque@entry=0x7f7409810800, alias_id=alias_id@entry=-1, required_for_version=required_for_version@entry=0) at /usr/src/debug/qemu-2.6.0/migration/savevm.c:622
    #5  0x00007f74025718da in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7fffa27cd1e8) at hw/core/qdev.c:1085
    #6  0x00007f740264ed7e in property_set_bool (obj=0x7f7409810800, v=<optimized out>, name=<optimized out>, opaque=0x7f74093e45b0, errp=0x7fffa27cd1e8) at qom/object.c:1853
    #7  0x00007f74026529d7 in object_property_set_qobject (obj=0x7f7409810800, value=<optimized out>, name=0x7f7402748fad "realized", errp=0x7fffa27cd1e8) at qom/qom-qobject.c:26
    #8  0x00007f7402650850 in object_property_set_bool (obj=0x7f7409810800, value=<optimized out>, name=0x7f7402748fad "realized", errp=0x7fffa27cd1e8) at qom/object.c:1150
    #9  0x00007f74025229cc in qdev_device_add (opts=0x7f7404cff400, errp=errp@entry=0x7fffa27cd2c0) at qdev-monitor.c:618
    #10 0x00007f740252c7a7 in device_init_func (opaque=<optimized out>, opts=<optimized out>, errp=<optimized out>) at vl.c:2362
    #11 0x00007f74026f8fea in qemu_opts_foreach (list=<optimized out>, func=func@entry=0x7f740252c780 <device_init_func>, opaque=opaque@entry=0x0, errp=errp@entry=0x0) at util/qemu-option.c:1116
    #12 0x00007f7402422980 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4552

*** Bug 1360601 has been marked as a duplicate of this bug. ***

Patches posted upstream:
    vmstate_register_with_alias_id: Take an Error **
    migration: Check for ID length
    vmstate registration: check return values

Fixed in 2.9; upstream IDs bc5c4f21966977c4ff00, 581f08bac22bdd5e081a, 581f08bac22bdd5e081a

*** Bug 1058597 has been marked as a duplicate of this bug. ***

*** Bug 1058200 has been marked as a duplicate of this bug. ***

*** Bug 1058622 has been marked as a duplicate of this bug. ***

1. Reproduced the bz on qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
2. Didn't hit the issue on qemu-kvm-rhev-2.9.0-1.el7.x86_64, but hit the "Path too long for VMState" issue

Detailed info:

1. Boot the guest with the script [1]:
    sh script 24

2. Hit the following issue:
    (qemu) qemu-kvm: -device x3130-upstream,bus=downstream23,id=upstream24: Path too long for VMState (0000:00:03.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0)

BTW: booting with 23 nested switches also hits the above issue.

The guest boots successfully with 22 nested switches.
[1]
    #!/bin/sh

    MACHINE=q35
    SMP=4,cores=2,threads=2,sockets=1
    MEM=2G
    GUEST_IMG=/home/test/rhel/rhel74.qcow2
    IMG_FORMAT=qcow2

    # Base command line: root port -> upstream -> downstream0
    CLI="/usr/libexec/qemu-kvm -enable-kvm -M $MACHINE -cpu SandyBridge -smp $SMP -m $MEM -name vm1 \
      -drive file=$GUEST_IMG,if=none,id=guest-img,format=$IMG_FORMAT,werror=stop,rerror=stop \
      -device ide-hd,drive=guest-img,bus=ide.0,unit=0,id=os-disk,bootindex=0 \
      -spice port=5931,disable-ticketing -vga qxl -monitor stdio \
      -serial unix:/tmp/console,server,nowait -qmp tcp:0:6666,server,nowait \
      -chardev file,path=/home/seabios.log,id=seabios \
      -device isa-debugcon,chardev=seabios,iobase=0x402 \
      -boot menu=on,reboot-timeout=8,strict=on \
      -device pcie-root-port,bus=pcie.0,id=root.0,slot=3 \
      -device x3130-upstream,bus=root.0,id=upstream \
      -device xio3130-downstream,bus=upstream,id=downstream0,chassis=1"

    # Each iteration nests one more upstream/downstream port pair
    while [ ${i:=0} -lt ${1:-0} ]
    do
        dstreamId=$((i+1))
        ustreamId=$((i+1))
        chassisId=$((dstreamId+1))
        blkDiskId=$((i))

        CLI="$CLI -device x3130-upstream,bus=downstream$i,id=upstream$ustreamId"
        CLI="$CLI -device xio3130-downstream,bus=upstream$ustreamId,id=downstream$dstreamId,chassis=$chassisId"
        i=$((i+1))
    done

    # Attach a USB controller to the deepest downstream port
    CLI="$CLI -device usb-ehci,bus=downstream$i,id=ehci"

    $CLI

Hi David,

According to the above test result and comment 3, is there a bz for tracking the new issue? If not, QE will open a new bz to track it and close this one. Do you agree?

Thanks
Jing

(In reply to jingzhao from comment #12)
> [...]
> Hi David
>
> According to the above test result and comment 3, is there a bz for tracking
> the new issue? If not, QE will open a new bz to track it and close this one.
> Do you agree?

So we've fixed the crash that used to occur; it's now giving you a correct error because you're not allowed to make a tree that deep. That's not a bug.

Dave

> Thanks
> Jing

According to comment 12 and comment 13, changed to verified status.

Thanks
Jing

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
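
The boundary reported in comment 12 (22 nested switches boot; 23 and 24 give "Path too long for VMState") can be checked with simple arithmetic. The sketch below does not run QEMU. It assumes that every port created by the reproducer sits at slot 00.0 and so appends `:00.0` to the path, and that the path itself must fit in the 255 bytes a one-byte length can express; the exact bound used by the fix is not stated in this report. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: estimates which reproducer depths fit, it does not run QEMU.
# Assumption: the path itself must fit in the 255 bytes that a one-byte
# length can express; the exact bound used by the fix is not stated here.
MAX_PATH_LEN=255
BASE="0000:00:03.0"   # root port path, taken from the error message
SEG=":00.0"           # appended once per nested device

# Path length of the usb-ehci device, the deepest device the reproducer
# creates when run as "sh script N": upstream + downstream0, then two
# ports per loop iteration, then the ehci controller itself.
deepest_path_len() {
    segments=$(( 2 * $1 + 3 ))
    echo $(( ${#BASE} + ${#SEG} * segments ))
}

# Succeeds when the deepest (and therefore longest) path is short enough.
reproducer_fits() {
    [ "$(deepest_path_len "$1")" -le "$MAX_PATH_LEN" ]
}

for n in 22 23 24; do
    if reproducer_fits "$n"; then
        echo "N=$n: deepest path $(deepest_path_len "$n") chars, fits"
    else
        echo "N=$n: deepest path $(deepest_path_len "$n") chars, too long"
    fi
done
```

Under these assumptions the deepest path is 247 characters for N=22 and 257 for N=23, which is consistent with the observed results.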