Bug 1377083 - [ppc64le] SLOF crashes during boot when adding two pci-bridge to the guest
Summary: [ppc64le] SLOF crashes during boot when adding two pci-bridge to the guest
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: SLOF
Version: 7.3
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Thomas Huth
QA Contact: xianwang
URL:
Whiteboard:
Depends On: 1392055
Blocks: 1401400
 
Reported: 2016-09-18 09:19 UTC by xianwang
Modified: 2017-08-01 22:33 UTC

Fixed In Version: SLOF-20161019
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 22:33:27 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:2093 0 normal SHIPPED_LIVE SLOF bug fix and enhancement update 2017-08-01 19:35:59 UTC

Description xianwang 2016-09-18 09:19:17 UTC
Description of problem:
The guest display is never initialized (VNC shows "Guest has not initialized the display (yet)") when two pci-bridge devices are added to the guest, but it works when only one is added.

Version-Release number of selected component (if applicable):
host kernel: 3.10.0-505.el7.ppc64le
qemu-kvm-rhev: qemu-kvm-rhev-2.6.0-24.el7.ppc64le

How reproducible:
5/5

Steps to Reproduce:
1. Boot a guest with the following options on the command line:

    -device pci-bridge,chassis_nr=1,id=bridge0,addr=0x03 \
    -device pci-bridge,chassis_nr=2,id=bridge1,addr=0x04 \

2. Check the graphical console over VNC.


Actual results:
Guest has not initialized the display (yet)

Expected results:
The VNC console should show the guest boot process.

Additional info:
full command:
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -nodefaults  \
    -machine pseries-rhel7.3.0 \
    -vga std  \
    -device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=01 \
    -device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x02 \
    -device pci-bridge,chassis_nr=1,id=bridge0,addr=0x03 \
    -device pci-bridge,chassis_nr=2,id=bridge1,addr=0x04 \
    -drive file=/root/RHEL7.3.qcow2,if=none,id=blk1 \
    -device virtio-blk-pci,scsi=off,drive=blk1,id=hd,bus=bridge0,addr=0x04,bootindex=1 \
    -monitor stdio \
    -vnc :1

Comment 3 Thomas Huth 2016-09-22 10:25:40 UTC
Seems like SLOF is crashing while trying to set up the bridges. When using "-serial stdio", the output of SLOF looks like this:

SLOF **********************************************************************
QEMU Starting
 Build Date = Aug  3 2016 08:51:23
 FW Version = git-8ae8607893c859e2
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
                     00 2000 (B) : 1b36 0001    pci*
                     00 1800 (B) : 1b36 0001    pci*
                     52 4498 (�) : ffff ffff     

( 300 ) Data Storage Exception [ 1dc572a0 ]


    R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31
000000001dbe4614   7c1043a67c0902c6   0000000000000000   000000001dbfb930   
000000001e45eff0   000000001dbe0c74   0000000000000000   0000000000000006   
000000001dc02008   000000001dc43038   0000000000000000   000000001dbf8a00   
000000001dc45000   000000001fbc23c8   000000001dbe0e58   000000001dbfb760   
0000000000000000   0000000000000001   0000000000000047   0000000000000003   
000000001dc572a0   0000000000000000   000000001e53641b   ffffffffffffffff   
000000001dc43040   0000000000000000   000000001e52664c   000000001e45b010   
7c1043a67c0902a6   0000000000000000   000000001dbe119c   000000001e4eaa90   

    CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
        84000024   000000001dbe2188   000000001dbe1538   7c1043a67c0902c6   
0000000000000000   000000001dbe1514   8000000000001000           40000000
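
The PCI probe lines in the SLOF output above (for example "00 1800 (B) : 1b36 0001") can be mapped back to the addr= values on the QEMU command line. The following is a minimal sketch, assuming the first column is the bus number and the second column follows the usual PCI configuration address layout (device number in bits 15:11, function in bits 10:8). That assumption matches the log: 1800 decodes to slot 3 and 2000 to slot 4, the two addr= values of the bridges. The helper names `decode_probe_line` and `format_bdf` are made up for this illustration.

```python
def decode_probe_line(line):
    """Split one SLOF PCI probe line into bus, slot, function and IDs."""
    left, right = line.split(":", 1)
    bus_hex, addr_hex = left.split()[:2]
    ids = right.split()
    addr = int(addr_hex, 16)
    return {
        "bus": int(bus_hex, 16),
        "slot": (addr >> 11) & 0x1F,      # device number, bits 15:11
        "function": (addr >> 8) & 0x07,   # function number, bits 10:8
        "vendor": ids[0],
        "device": ids[1],
    }


def format_bdf(entry):
    """Render a decoded entry in lspci-style bus:slot.function notation."""
    return f"{entry['bus']:02x}:{entry['slot']:02x}.{entry['function']:x}"


if __name__ == "__main__":
    for probe in ("00 2000 (B) : 1b36 0001    pci*",
                  "00 1800 (B) : 1b36 0001    pci*"):
        print(format_bdf(decode_probe_line(probe)))
```

Decoded this way, the third probe line (bus 52, vendor and device ffff ffff) is clearly not a device that exists on the command line, which fits the diagnosis that SLOF is looking at the wrong bus.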

Comment 4 Thomas Huth 2016-09-22 10:53:48 UTC
It seems you do not even need a graphics card or a virtio-blk device to trigger the issue: I get the same crash in SLOF with this simplified command line:

sudo /usr/libexec/qemu-kvm -nodefaults -nographic -serial mon:stdio \
  -device pci-bridge,chassis_nr=1,id=bridge0,addr=0x03 \
  -device pci-bridge,chassis_nr=2,id=bridge1,addr=0x04 \
  -device virtio-balloon,bus=bridge0,addr=0x04

Comment 5 Thomas Huth 2016-09-23 07:14:03 UTC
There are two problems here:

1) The crash of SLOF happens because it hits a stack underflow when it detects an invalid PCI device type. I've sent a fix for this problem to the upstream mailing list here:

https://lists.ozlabs.org/pipermail/slof/2016-September/001290.html

2) The PCI device is not recognized properly. I think this happens because SLOF internally enumerates the PCI buses in ascending order, but QEMU presents the PCI devices in the device tree in descending order. There was a patch for QEMU almost a year ago to fix this (https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg06381.html - "spapr/pci: populate PCI DT in reverse order"), and this problem here is indeed fixed when I apply that patch here locally. However, the patch has not been included in upstream, so I've got to see whether we can re-activate that discussion or fix this problem somehow in SLOF instead...

Comment 6 Thomas Huth 2016-09-23 11:50:55 UTC
I've had a closer look at the bus enumeration in SLOF now: it keeps track of the current PCI bus number in a variable called "pci-bus-number", which is incremented each time a new PCI bridge is found. This value is then used to program the "Secondary Bus Number Register" and the "Subordinate Bus Number Register" in the config space of the PCI bridge (see the pci-bridge-probe function in SLOF). However, since the bridge enumeration has already been done by QEMU and is represented in descending order in the device tree, the "pci-bus-number" values do not match the values from QEMU at all, and thus the bus number registers of the bridge get configured completely wrong. SLOF should instead scan the children of the bridge's device tree node to get the right values for the secondary and subordinate bus numbers.
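
The mismatch described above can be sketched with a small model. This is only an illustration in Python, not SLOF's Forth code: `slof_counter_numbering` is a hypothetical stand-in for the running "pci-bus-number" counter, and the secondary bus numbers attributed to QEMU are assumptions (bus 1 behind bridge0 agrees with the lspci output later in this report, bus 2 behind bridge1 is a guess).

```python
def slof_counter_numbering(bridges_in_dt_order):
    """Number the buses behind each bridge with a simple running counter,
    in the order in which the bridges are discovered."""
    pci_bus_number = 0
    assigned = {}
    for bridge in bridges_in_dt_order:
        pci_bus_number += 1
        assigned[bridge["id"]] = pci_bus_number
    return assigned


# Assumed QEMU view: the bridges appear in descending slot order, and
# their secondary bus numbers were already assigned by QEMU.
qemu_view = [
    {"id": "bridge1", "slot": 4, "secondary": 2},
    {"id": "bridge0", "slot": 3, "secondary": 1},
]

slof_view = slof_counter_numbering(qemu_view)

# Map each bridge to (QEMU value, counter value) where the two disagree.
mismatches = {
    bridge["id"]: (bridge["secondary"], slof_view[bridge["id"]])
    for bridge in qemu_view
    if bridge["secondary"] != slof_view[bridge["id"]]
}

if __name__ == "__main__":
    print(mismatches)
```

In this model both bridges end up with swapped numbers, so a device that QEMU placed on bus 1 behind bridge0 would be looked for behind the wrong bridge.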

Comment 7 Thomas Huth 2016-09-27 12:02:16 UTC
Actually, SLOF should simply not write the secondary and subordinate bus number registers at all - since this has already been done by QEMU! I've now sent a patch to the upstream mailing list which should fix this issue:

https://patchwork.ozlabs.org/patch/675528/
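
The idea of leaving the bus number registers alone can be illustrated as follows. This is a hedged sketch and not the patch linked above: the config space is modelled as a plain bytearray, the `probe_bridge` function and its parameters are invented for the example, and only the register offsets (0x18 primary, 0x19 secondary, 0x1A subordinate in a PCI-to-PCI bridge header) are standard.

```python
PRIMARY_BUS = 0x18      # Primary Bus Number register
SECONDARY_BUS = 0x19    # Secondary Bus Number register
SUBORDINATE_BUS = 0x1A  # Subordinate Bus Number register


def read_bridge_bus_numbers(config_space):
    """Return (primary, secondary, subordinate) as currently programmed."""
    return (config_space[PRIMARY_BUS],
            config_space[SECONDARY_BUS],
            config_space[SUBORDINATE_BUS])


def probe_bridge(config_space, trust_existing=True, next_bus=None):
    """Probe a bridge. With trust_existing=True the registers are only
    read; otherwise they are overwritten with the firmware's own counter
    value, which models the old behaviour."""
    if not trust_existing:
        config_space[SECONDARY_BUS] = next_bus
        config_space[SUBORDINATE_BUS] = next_bus
    return read_bridge_bus_numbers(config_space)


def make_bridge(primary, secondary, subordinate):
    """Build a fake 64-byte config header as the hypervisor left it."""
    config_space = bytearray(64)
    config_space[PRIMARY_BUS] = primary
    config_space[SECONDARY_BUS] = secondary
    config_space[SUBORDINATE_BUS] = subordinate
    return config_space


if __name__ == "__main__":
    bridge0 = make_bridge(0, 1, 1)
    print(probe_bridge(bridge0))                                   # read-only probe
    print(probe_bridge(bridge0, trust_existing=False, next_bus=2))  # old behaviour
```

With the read-only probe the firmware and the hypervisor agree by construction, because there is only one source of truth for the bus numbers.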

Comment 10 Miroslav Rezanina 2017-03-14 13:52:04 UTC
Fixed by rebase

Comment 12 Yongxue Hong 2017-03-23 08:51:04 UTC
Verification steps:

1. Versions:
Host: 3.10.0-623.el7.ppc64le
Qemu: qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848
SLOF: SLOF.noarch 20170303-1.git66d250e.el7

2. Steps to verify:
Same as in the Description above.

3. Actual results:
SLOF **********************************************************************
QEMU Starting
 Build Date = Mar 14 2017 08:36:17
 FW Version = mockbuild@ release 20170303
 Press "s" to enter Open Firmware.

Populating /vdevice methods
Populating /vdevice/vty@71000000
Populating /vdevice/nvram@71000001
Populating /pci@800000020000000
                     00 0000 (D) : 1234 1111    qemu vga
                     00 0800 (D) : 1af4 1003    virtio [ serial ]
                     00 1000 (D) : 1af4 1004    virtio [ scsi ]
Populating /pci@800000020000000/scsi@2
       SCSI: Looking for devices
                     00 1800 (B) : 1b36 0001    pci*
                     01 2000 (D) : 1af4 1001    virtio [ block ]
                     00 2000 (B) : 1b36 0001    pci*
Installing QEMU fb


Scanning USB 
No console specified using hvterm
     
  Welcome to Open Firmware

  Copyright (c) 2004, 2011 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/pci-bridge@3/scsi@4 ... 
E3405: No such device
Trying to load:  from: /pci@800000020000000/pci@3/scsi@4 ...   Successfully loaded

      Red Hat Enterprise Linux Server (3.10.0-623.el7.ppc64le) 7.4 (Maipo)      
      Red Hat Enterprise Linux Server (3.10.0-612.el7.ppc64le) 7.4 (Maipo)     
      Red Hat Enterprise Linux Server (0-rescue-9ac7e2bb987f42d3be31f3ae292f3e>

      Use the ^ and v keys to change the selection.                       
      Press 'e' to edit the selected item, or 'c' for a command prompt.                                                                               
                                                                            
OF stdout device is: /vdevice/vty@71000000
Preparing to boot Linux version 3.10.0-623.el7.ppc64le (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Mar 21 20:33:46 EDT 2017
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/vmlinuz-3.10.0-623.el7.ppc64le root=/dev/mapper/rhel-root ro crashkernel=auto rd.lvm.lv=rhel/root rd.lvm.lv=rhel/swap rhgb quiet LANG=en_US.UTF-8
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 0000000004b90000
  alloc_top    : 0000000020000000
  alloc_top_hi : 0000000020000000
  rmo_top      : 0000000020000000
  ram_top      : 0000000020000000
found display   : /pci@800000020000000/vga@0, opening... done
instantiating rtas at 0x000000001daf0000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000004ba0000 -> 0x0000000004ba0b38
Device tree struct  0x0000000004bb0000 -> 0x0000000004bc0000
Calling quiesce...
returning from prom_init
CF000012
CF000015ch
Linux ppc64le
#1 SMP Tue Mar 2
Red Hat Enterprise Linux Server 7.4 Beta (Maipo)
Kernel 3.10.0-623.el7.ppc64le on an ppc64le

localhost login: 
[root@localhost ~]# lspci
lspci
00:00.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:01.0 Communication controller: Red Hat, Inc Virtio console
00:02.0 SCSI storage controller: Red Hat, Inc Virtio SCSI
00:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
01:04.0 SCSI storage controller: Red Hat, Inc Virtio block device


The bug is fixed; changing the status to VERIFIED.

Comment 13 errata-xmlrpc 2017-08-01 22:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2093

