Bug 1377083
Summary: | [ppc64le] SLOF crashes during boot when adding two pci-bridge to the guest | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | xianwang <xianwang> |
Component: | SLOF | Assignee: | Thomas Huth <thuth> |
Status: | CLOSED ERRATA | QA Contact: | xianwang <xianwang> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 7.3 | CC: | dgibson, hannsj_uhl, knoel, mrezanin, qzhang, thuth, virt-maint, xianwang, yhong, zhengtli |
Target Milestone: | rc | | |
Target Release: | --- | | |
Hardware: | ppc64le | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | SLOF-20161019 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2017-08-01 22:33:27 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1392055 | | |
Bug Blocks: | 1401400 | | |
Description
xianwang
2016-09-18 09:19:17 UTC
Seems like SLOF is crashing while trying to set up the bridges. When using "-serial stdio", the output of SLOF looks like this:

    SLOF **********************************************************************
    QEMU Starting
     Build Date = Aug 3 2016 08:51:23
     FW Version = git-8ae8607893c859e2
     Press "s" to enter Open Firmware.

    Populating /vdevice methods
    Populating /vdevice/vty@71000000
    Populating /vdevice/nvram@71000001
    Populating /pci@800000020000000
     00 2000 (B) : 1b36 0001 pci*
     00 1800 (B) : 1b36 0001 pci*
     52 4498 (�) : ffff ffff
    ( 300 ) Data Storage Exception [ 1dc572a0 ]

        R0 .. R7         R8 .. R15        R16 .. R23       R24 .. R31
    000000001dbe4614 7c1043a67c0902c6 0000000000000000 000000001dbfb930
    000000001e45eff0 000000001dbe0c74 0000000000000000 0000000000000006
    000000001dc02008 000000001dc43038 0000000000000000 000000001dbf8a00
    000000001dc45000 000000001fbc23c8 000000001dbe0e58 000000001dbfb760
    0000000000000000 0000000000000001 0000000000000047 0000000000000003
    000000001dc572a0 0000000000000000 000000001e53641b ffffffffffffffff
    000000001dc43040 0000000000000000 000000001e52664c 000000001e45b010
    7c1043a67c0902a6 0000000000000000 000000001dbe119c 000000001e4eaa90

        CR / XER         LR / CTR        SRR0 / SRR1      DAR / DSISR
            84000024 000000001dbe2188 000000001dbe1538 7c1043a67c0902c6
    0000000000000000 000000001dbe1514 8000000000001000         40000000

Seems like you do not even need a graphics card or a virtio-blk device to trigger the issue - I get the same crash in SLOF with this simplified command line already:

    sudo /usr/libexec/qemu-kvm -nodefaults -nographic -serial mon:stdio \
     -device pci-bridge,chassis_nr=1,id=bridge0,addr=0x03 \
     -device pci-bridge,chassis_nr=2,id=bridge1,addr=0x04 \
     -device virtio-balloon,bus=bridge0,addr=0x04

There are two problems here:

1) The crash of SLOF happens because it hits a stack underflow when it detects an invalid PCI device type.
I've sent a fix for this problem to the upstream mailing list here: https://lists.ozlabs.org/pipermail/slof/2016-September/001290.html

2) The PCI device is not recognized properly. I think this happens because SLOF internally enumerates the PCI buses in ascending order, but QEMU presents the PCI devices in the device tree in descending order. There was a patch for QEMU almost a year ago to fix this (https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg06381.html - "spapr/pci: populate PCI DT in reverse order"), and this problem is indeed fixed when I apply that patch locally. However, the patch has not been included upstream, so I've got to see whether we can re-activate that discussion or fix this problem somehow in SLOF instead...

I've had a closer look at the bus enumeration in SLOF now: It keeps track of the current PCI bus number in a variable called "pci-bus-number", which is incremented each time a new PCI bridge is found. This value is then used to program the "Secondary Bus Number Register" and the "Subordinate Bus Number Register" in the config space of the PCI bridge (see the pci-bridge-probe function in SLOF). However, since the bridge enumeration has already been done by QEMU and is represented in descending order in the device tree, the "pci-bus-number" values do not match the values from QEMU at all, and thus the bus number registers of the bridge get configured completely wrong.

SLOF should scan the children of the bridge's device tree node instead to get the right values for the secondary and subordinate bus numbers.

Actually, SLOF should simply not write the secondary and subordinate bus number registers at all - since this has already been done by QEMU!
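To make the mismatch described above concrete, here is a small toy model in Python. It is not SLOF or QEMU code: the `Bridge` class, the `counter_based_numbers` helper, and the pre-assigned bus numbers (slot 3 on bus 1, slot 4 on bus 2) are assumptions chosen for illustration. It only shows how a running counter applied to bridges visited in descending order yields bus numbers that disagree with an earlier ascending assignment.

```python
# Toy model only: names and bus numbers are illustrative assumptions,
# not taken from the SLOF or QEMU sources.
from dataclasses import dataclass


@dataclass
class Bridge:
    slot: int              # PCI device number of the bridge on bus 0
    preset_secondary: int  # secondary bus number assumed to be set already


def counter_based_numbers(bridges_in_visit_order):
    """Hand out secondary bus numbers from a running counter,
    incremented once for each bridge that is found."""
    bus_number = 0
    assigned = {}
    for bridge in bridges_in_visit_order:
        bus_number += 1
        assigned[bridge.slot] = bus_number
    return assigned


# Assumed pre-assigned layout: slot 3 -> bus 1, slot 4 -> bus 2.
bridges = [Bridge(slot=3, preset_secondary=1),
           Bridge(slot=4, preset_secondary=2)]

# The device tree lists the bridges in descending order,
# so the bridge in slot 4 is visited first.
visit_order = sorted(bridges, key=lambda b: b.slot, reverse=True)

assigned = counter_based_numbers(visit_order)

# slot -> (pre-assigned number, counter-based number) wherever they differ
mismatches = {b.slot: (b.preset_secondary, assigned[b.slot])
              for b in bridges
              if b.preset_secondary != assigned[b.slot]}
print(mismatches)
```

In this toy setup both bridges end up with swapped numbers, which matches the observation that overwriting the registers from a counter is what breaks the configuration, while leaving the already programmed values alone avoids the problem.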
I've now sent a patch to the upstream mailing list which should fix this issue: https://patchwork.ozlabs.org/patch/675528/

Patches have been merged upstream:
https://github.com/aik/SLOF/commit/a6db31fda1cb23e24b
https://github.com/aik/SLOF/commit/e44b7f074f549f7830

Fixed by rebase

Verification steps:

1. Version:
Host: 3.10.0-623.el7.ppc64le
Qemu: qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848
SLOF: SLOF.noarch 20170303-1.git66d250e.el7

2. Steps to verify:
Same as in the Description above.

3. Actual results:

    SLOF **********************************************************************
    QEMU Starting
     Build Date = Mar 14 2017 08:36:17
     FW Version = mockbuild@ release 20170303
     Press "s" to enter Open Firmware.

    Populating /vdevice methods
    Populating /vdevice/vty@71000000
    Populating /vdevice/nvram@71000001
    Populating /pci@800000020000000
     00 0000 (D) : 1234 1111 qemu vga
     00 0800 (D) : 1af4 1003 virtio [ serial ]
     00 1000 (D) : 1af4 1004 virtio [ scsi ]
    Populating /pci@800000020000000/scsi@2
     SCSI: Looking for devices
     00 1800 (B) : 1b36 0001 pci*
     01 2000 (D) : 1af4 1001 virtio [ block ]
     00 2000 (B) : 1b36 0001 pci*
    Installing QEMU fb

    Scanning USB
    No console specified using hvterm

    Welcome to Open Firmware

    Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
    This program and the accompanying materials are made available
    under the terms of the BSD License available at
    http://www.opensource.org/licenses/bsd-license.php

    Trying to load: from: /pci@800000020000000/pci-bridge@3/scsi@4 ...
    E3405: No such device
    Trying to load: from: /pci@800000020000000/pci@3/scsi@4 ... Successfully loaded

    Red Hat Enterprise Linux Server (3.10.0-623.el7.ppc64le) 7.4 (Maipo)
    Red Hat Enterprise Linux Server (3.10.0-612.el7.ppc64le) 7.4 (Maipo)
    Red Hat Enterprise Linux Server (0-rescue-9ac7e2bb987f42d3be31f3ae292f3e>

    Use the ^ and v keys to change the selection.
    Press 'e' to edit the selected item, or 'c' for a command prompt.
    OF stdout device is: /vdevice/vty@71000000
    Preparing to boot Linux version 3.10.0-623.el7.ppc64le (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Mar 21 20:33:46 EDT 2017
    Detected machine type: 0000000000000101
    Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
    Calling ibm,client-architecture-support... done
    command line: BOOT_IMAGE=/vmlinuz-3.10.0-623.el7.ppc64le root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8
    memory layout at init:
     memory_limit : 0000000000000000 (16 MB aligned)
     alloc_bottom : 0000000004b90000
     alloc_top : 0000000020000000
     alloc_top_hi : 0000000020000000
     rmo_top : 0000000020000000
     ram_top : 0000000020000000
    found display : /pci@800000020000000/vga@0, opening... done
    instantiating rtas at 0x000000001daf0000... done
    prom_hold_cpus: skipped
    copying OF device tree...
    Building dt strings...
    Building dt structure...
    Device tree strings 0x0000000004ba0000 -> 0x0000000004ba0b38
    Device tree struct 0x0000000004bb0000 -> 0x0000000004bc0000
    Calling quiesce...
    returning from prom_init
    CF000012 CF000015ch Linux ppc64le #1 SMP Tue Mar 2

    Red Hat Enterprise Linux Server 7.4 Beta (Maipo)
    Kernel 3.10.0-623.el7.ppc64le on an ppc64le

    localhost login:

    [root@localhost ~]# lspci
    00:00.0 VGA compatible controller: Device 1234:1111 (rev 02)
    00:01.0 Communication controller: Red Hat, Inc Virtio console
    00:02.0 SCSI storage controller: Red Hat, Inc Virtio SCSI
    00:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
    00:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
    01:04.0 SCSI storage controller: Red Hat, Inc Virtio block device

This bug is fixed; changing the status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2093