Bug 1342434

Summary: qemu core dump when starting a guest with more than 54 nested pcie switches
Product: Red Hat Enterprise Linux 7
Reporter: Yang Yang <yanyang>
Component: qemu-kvm-rhev
Assignee: Dr. David Alan Gilbert <dgilbert>
Status: CLOSED ERRATA
QA Contact: jingzhao <jinzhao>
Severity: unspecified
Priority: unspecified
Version: 7.3
CC: chayang, dgilbert, hhuang, jinzhao, juli, juzhang, mrezanin, virt-maint
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7
Last Closed: 2017-08-01 23:32:13 UTC
Type: Bug
Bug Blocks: 1311684
Attachments: backtrace (flags: none)

Description Yang Yang 2016-06-03 09:27:44 UTC

Description of problem:
qemu core dump when starting a guest with more than 54 nested pcie switches

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-4.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. boot a guest

/usr/libexec/qemu-kvm -machine pc-q35-rhel7.2.0 \
-device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \
-device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 \
-device ioh3420,port=0x10,chassis=3,id=pci.3,bus=pcie.0,addr=0xf \
-device x3130-upstream,id=pci.4,bus=pci.3,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=5,id=pci.5,bus=pci.4,addr=0x0 \
-device x3130-upstream,id=pci.6,bus=pci.5,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=7,id=pci.7,bus=pci.6,addr=0x0 \
-device x3130-upstream,id=pci.8,bus=pci.7,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=9,id=pci.9,bus=pci.8,addr=0x0 \
-device x3130-upstream,id=pci.10,bus=pci.9,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=11,id=pci.11,bus=pci.10,addr=0x0 \
-device x3130-upstream,id=pci.12,bus=pci.11,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=13,id=pci.13,bus=pci.12,addr=0x0 \
-device x3130-upstream,id=pci.14,bus=pci.13,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=15,id=pci.15,bus=pci.14,addr=0x0 \
-device x3130-upstream,id=pci.16,bus=pci.15,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=17,id=pci.17,bus=pci.16,addr=0x0 \
-device x3130-upstream,id=pci.18,bus=pci.17,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=19,id=pci.19,bus=pci.18,addr=0x0 \
-device x3130-upstream,id=pci.20,bus=pci.19,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=21,id=pci.21,bus=pci.20,addr=0x0 \
-device x3130-upstream,id=pci.22,bus=pci.21,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=23,id=pci.23,bus=pci.22,addr=0x0 \
-device x3130-upstream,id=pci.24,bus=pci.23,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=25,id=pci.25,bus=pci.24,addr=0x0 \
-device x3130-upstream,id=pci.26,bus=pci.25,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=27,id=pci.27,bus=pci.26,addr=0x0 \
-device x3130-upstream,id=pci.28,bus=pci.27,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=29,id=pci.29,bus=pci.28,addr=0x0 \
-device x3130-upstream,id=pci.30,bus=pci.29,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=31,id=pci.31,bus=pci.30,addr=0x0 \
-device x3130-upstream,id=pci.32,bus=pci.31,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=33,id=pci.33,bus=pci.32,addr=0x0 \
-device x3130-upstream,id=pci.34,bus=pci.33,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=35,id=pci.35,bus=pci.34,addr=0x0 \
-device x3130-upstream,id=pci.36,bus=pci.35,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=37,id=pci.37,bus=pci.36,addr=0x0 \
-device x3130-upstream,id=pci.38,bus=pci.37,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=39,id=pci.39,bus=pci.38,addr=0x0 \
-device x3130-upstream,id=pci.40,bus=pci.39,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=41,id=pci.41,bus=pci.40,addr=0x0 \
-device x3130-upstream,id=pci.42,bus=pci.41,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=43,id=pci.43,bus=pci.42,addr=0x0 \
-device x3130-upstream,id=pci.44,bus=pci.43,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=45,id=pci.45,bus=pci.44,addr=0x0 \
-device x3130-upstream,id=pci.46,bus=pci.45,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=47,id=pci.47,bus=pci.46,addr=0x0 \
-device x3130-upstream,id=pci.48,bus=pci.47,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=49,id=pci.49,bus=pci.48,addr=0x0 \
-device x3130-upstream,id=pci.50,bus=pci.49,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=51,id=pci.51,bus=pci.50,addr=0x0 \
-device x3130-upstream,id=pci.52,bus=pci.51,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=53,id=pci.53,bus=pci.52,addr=0x0 \
-device x3130-upstream,id=pci.54,bus=pci.53,addr=0x0 \
-device xio3130-downstream,port=0x0,chassis=55,id=pci.55,bus=pci.54,addr=0x0 \
-drive file=/mnt/nfs2/RHEL-7.3-latest.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
-device virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 \
-monitor stdio -spice port=5931,disable-ticketing -boot menu=on -m 2G -qmp tcp:0:6666,server,nowait
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
qemu-kvm: /builddir/build/BUILD/qemu-2.6.0/migration/savevm.c:622: vmstate_register_with_alias_id: Assertion `!se->compat || se->instance_id == 0' failed.
Aborted (core dumped)
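
The alternating x3130-upstream / xio3130-downstream arguments in the command above follow a regular pattern, so they can be generated instead of typed by hand. The following is a minimal Python sketch of that pattern only; the function name nested_switch_args and its parameters are hypothetical and are not part of QEMU or of the original report.

```python
def nested_switch_args(first_id=4, last_id=55):
    """Return the -device arguments for the nested switch chain.

    Each upstream port pci.N is plugged into bus pci.(N-1); the
    downstream port pci.(N+1) is plugged into pci.N and uses N+1
    as its chassis number, matching the command line in the report.
    """
    args = []
    for up in range(first_id, last_id, 2):
        down = up + 1
        args.append(
            f"-device x3130-upstream,id=pci.{up},bus=pci.{up - 1},addr=0x0"
        )
        args.append(
            "-device xio3130-downstream,port=0x0,"
            f"chassis={down},id=pci.{down},bus=pci.{up},addr=0x0"
        )
    return args
```

Joining the returned list with " \" and a newline gives the block of lines that sits between the ioh3420 root port and the -drive option in the reproducer.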

Actual results:
QEMU aborts with the assertion failure shown above and dumps core.

Expected results:
QEMU does not crash.

Additional info:

Comment 1 Yang Yang 2016-06-03 09:28:31 UTC

Created attachment 1164396 [details]
backtrace

Comment 2 Marcel Apfelbaum 2016-06-05 08:51:44 UTC

Hi David,

This problem appears when we have nested devices, e.g. bridges.
It happens because the vmstate builds some kind of id that "grows"
with the nesting until it goes out of bounds.
I am sure you know much more about it.

Can you please advise on a good way to handle this problem?
Thanks,
Marcel

Comment 3 Dr. David Alan Gilbert 2016-06-06 09:23:59 UTC

(In reply to Marcel Apfelbaum from comment #2)
> Hi David,
> 
> This problem appears when we have nested devices, e.g. bridges.
> It happens because the vmstate builds some kind of id that "grows"
> with the nesting until it goes out of bounds.
> I am sure you know much more about it.
> 
> Can you please advise on a good way to handle this problem?

(Recreated on upstream)

Hmm that's fun; you're right it is the id length that's the problem;

vmstate_register_with_alias_id: dev=pci.53 vmsd->name=xio3130-express-downstream-port instance_id=-1
vmstate_register_with_alias_id: (dev case) id=0000:00:0f.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0

that long path is ~260 characters in length - it comes from qdev_get_dev_path.
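
To see how the path grows with nesting, here is a small Python sketch. It assumes the path has the shape shown in the trace above: a 12-character root-port address followed by one ":00.0" per bridge hop. The names ROOT, HOP and dev_path are illustrative only; this is not how qdev_get_dev_path is implemented.

```python
# Root port address as printed in the trace: domain:bus:slot.function
ROOT = "0000:00:0f.0"
# Assumption: every nested bridge appends the slot.function of the
# device on its secondary bus, which is always 00.0 in this reproducer.
HOP = ":00.0"


def dev_path(hops):
    """Build the device path for a device `hops` bridges below the root port."""
    return ROOT + HOP * hops


def path_length(hops):
    """Length in characters of the path for the given nesting depth."""
    return len(ROOT) + len(HOP) * hops
```

Under these assumptions the length is 12 + 5 * hops, so a device 50 hops below the root port gets a 262-character path, which is in line with the "~260 characters" figure above.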

There's no simple fix to getting migration to deal with longer paths; the path becomes se->idstr, which is written into the migration stream as a 1-byte length followed by the string (see save_section_header), so it would need a format change to make it cope with longer names.
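
The limit follows from the encoding itself: a 1-byte length field cannot describe a string longer than 255 bytes. The Python sketch below illustrates a generic length-prefixed string of that kind; encode_idstr and decode_idstr are hypothetical helpers written for this illustration and do not reproduce QEMU's save_section_header.

```python
import struct

MAX_LEN = 255  # largest value an unsigned 1-byte length can hold


def encode_idstr(idstr):
    """Encode a string as a 1-byte length followed by its bytes.

    Raises ValueError instead of truncating when the string does not fit.
    """
    raw = idstr.encode("ascii")
    if len(raw) > MAX_LEN:
        raise ValueError(f"id too long for 1-byte length: {len(raw)} bytes")
    return struct.pack("B", len(raw)) + raw


def decode_idstr(buf):
    """Decode a string written by encode_idstr from the start of buf."""
    (length,) = struct.unpack_from("B", buf, 0)
    return buf[1:1 + length].decode("ascii")
```

With this layout an id such as the roughly 260-character path quoted above cannot be represented at all, so the only options are to reject it up front or to change the stream format.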

I'd say the qdev path is unnecessarily long in this case (why does each bridge add 3 digits?) - but I don't think we can change it again without breaking compatibility - I think the names form part of RAMBlock names (e.g. if a card is attached to the end of this bus with its own ROM).

I think we should probably fix vmstate_register_with_alias_id to fail cleanly and device_set_realized to check its return value; but the migration code uses unchecked pstr* functions all over with these IDs, so there are lots of cases that all seem to need fixing/checking.

Dave

> Thanks,
> Marcel

Comment 4 Dr. David Alan Gilbert 2016-06-07 12:29:51 UTC

I've got a partial fix; but it isn't failing cleanly so I think this is part of a much bigger job to clean a lot of things up; we could impose an arbitrary PCI bus depth limit as a stop-gap if we find we need to, but I don't think it's urgent.

Dave

Comment 5 jingzhao 2016-06-13 03:15:16 UTC

Reproduced with qemu-kvm-rhev-2.6.0-5.el7.x86_64. Only 23 nested switches are supported; the issue is hit when booting a guest with 24 nested switches. The backtrace from the core file follows:
(gdb) bt
#0  0x00007f73fa0f05f7 in raise () from /lib64/libc.so.6
#1  0x00007f73fa0f1ce8 in abort () from /lib64/libc.so.6
#2  0x00007f73fa0e9566 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f73fa0e9612 in __assert_fail () from /lib64/libc.so.6
#4  0x00007f740247b4ad in vmstate_register_with_alias_id (dev=dev@entry=0x7f7409810800, instance_id=<optimized out>, 
    instance_id@entry=-1, vmsd=0x7f7402ba6a80 <vmstate_xio3130_downstream>, opaque=opaque@entry=0x7f7409810800, 
    alias_id=alias_id@entry=-1, required_for_version=required_for_version@entry=0)
    at /usr/src/debug/qemu-2.6.0/migration/savevm.c:622
#5  0x00007f74025718da in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7fffa27cd1e8)
    at hw/core/qdev.c:1085
#6  0x00007f740264ed7e in property_set_bool (obj=0x7f7409810800, v=<optimized out>, name=<optimized out>, 
    opaque=0x7f74093e45b0, errp=0x7fffa27cd1e8) at qom/object.c:1853
#7  0x00007f74026529d7 in object_property_set_qobject (obj=0x7f7409810800, value=<optimized out>, 
    name=0x7f7402748fad "realized", errp=0x7fffa27cd1e8) at qom/qom-qobject.c:26
#8  0x00007f7402650850 in object_property_set_bool (obj=0x7f7409810800, value=<optimized out>, 
    name=0x7f7402748fad "realized", errp=0x7fffa27cd1e8) at qom/object.c:1150
#9  0x00007f74025229cc in qdev_device_add (opts=0x7f7404cff400, errp=errp@entry=0x7fffa27cd2c0) at qdev-monitor.c:618
#10 0x00007f740252c7a7 in device_init_func (opaque=<optimized out>, opts=<optimized out>, errp=<optimized out>)
    at vl.c:2362
#11 0x00007f74026f8fea in qemu_opts_foreach (list=<optimized out>, 
    func=func@entry=0x7f740252c780 <device_init_func>, opaque=opaque@entry=0x0, errp=errp@entry=0x0)
    at util/qemu-option.c:1116
#12 0x00007f7402422980 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4552

Comment 6 Dr. David Alan Gilbert 2016-08-01 09:49:22 UTC

*** Bug 1360601 has been marked as a duplicate of this bug. ***

Comment 7 Dr. David Alan Gilbert 2017-01-09 20:15:13 UTC

Patches posted upstream:

  vmstate_register_with_alias_id: Take an Error **
  migration: Check for ID length
  vmstate registration: check return values

Comment 8 Dr. David Alan Gilbert 2017-02-07 18:08:21 UTC

Fixed in 2.9; upstream IDs bc5c4f21966977c4ff00, 581f08bac22bdd5e081a, 581f08bac22bdd5e081a

Comment 9 Marcel Apfelbaum 2017-02-14 10:21:49 UTC

*** Bug 1058597 has been marked as a duplicate of this bug. ***

Comment 10 Dr. David Alan Gilbert 2017-02-14 10:27:11 UTC

*** Bug 1058200 has been marked as a duplicate of this bug. ***

Comment 11 Dr. David Alan Gilbert 2017-02-14 10:28:56 UTC

*** Bug 1058622 has been marked as a duplicate of this bug. ***

Comment 12 jingzhao 2017-04-26 06:31:27 UTC

1. Reproduced the bug on qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64.

2. Did not hit the crash on qemu-kvm-rhev-2.9.0-1.el7.x86_64, but hit a "Path too long for VMState" error instead.

Details follow.

1. Boot the guest with the script [1]:
sh script 24

2. QEMU reports the following error:
(qemu) qemu-kvm: -device x3130-upstream,bus=downstream23,id=upstream24: Path too long for VMState (0000:00:03.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0:00.0)

Note: booting with 23 nested switches hits the same error.

The guest boots successfully with 22 nested switches.

[1]
#!/bin/sh

MACHINE=q35
SMP=4,cores=2,threads=2,sockets=1
MEM=2G
GUEST_IMG=/home/test/rhel/rhel74.qcow2
IMG_FORMAT=qcow2

CLI="/usr/libexec/qemu-kvm -enable-kvm -M $MACHINE -cpu SandyBridge -smp $SMP -m $MEM -name vm1 -drive file=$GUEST_IMG,if=none,id=guest-img,format=$IMG_FORMAT,werror=stop,rerror=stop -device ide-hd,drive=guest-img,bus=ide.0,unit=0,id=os-disk,bootindex=0 -spice port=5931,disable-ticketing -vga qxl -monitor stdio -serial unix:/tmp/console,server,nowait -qmp tcp:0:6666,server,nowait -chardev file,path=/home/seabios.log,id=seabios -device isa-debugcon,chardev=seabios,iobase=0x402 -boot menu=on,reboot-timeout=8,strict=on -device pcie-root-port,bus=pcie.0,id=root.0,slot=3 -device x3130-upstream,bus=root.0,id=upstream -device xio3130-downstream,bus=upstream,id=downstream0,chassis=1"
while [ ${i:=0} -lt ${1:-0} ]
do
    dstreamId=$((i+1))
    ustreamId=$((i+1))
    chassisId=$((dstreamId+1))
    blkDiskId=$((i))


    CLI="$CLI -device x3130-upstream,bus=downstream$i,id=upstream$ustreamId"    
    CLI="$CLI -device xio3130-downstream,bus=upstream$ustreamId,id=downstream$dstreamId,chassis=$chassisId"
    i=$((i+1))
done
   CLI="$CLI -device usb-ehci,bus=downstream$i,id=ehci"

$CLI

Hi David,

According to the above test result and comment 3, is there a bz tracking the new issue? If not, QE will open a new bz to track it and close this one.
Do you agree?

Thanks
Jing

Comment 13 Dr. David Alan Gilbert 2017-04-26 08:36:23 UTC

(In reply to jingzhao from comment #12)

So we've fixed the crash that used to occur; it's now giving you a correct error because you're not allowed to make a tree that deep. That's not a bug.

Dave


Comment 14 jingzhao 2017-04-27 02:30:29 UTC

According to comment 12 and comment 13, changed the status to VERIFIED.

Thanks
Jing

Comment 16 errata-xmlrpc 2017-08-01 23:32:13 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
