Bug 1333238 - Q35 machine cannot boot up successfully with more than 3 virtio-scsi storage controllers under a switch
Summary: Q35 machine can not boot up successfully with more than 3 virtio-scsi storage...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ovmf
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-05-05 05:16 UTC by yduan
Modified: 2016-11-04 08:40 UTC
CC List: 12 users

Fixed In Version: ovmf-20160608-1.git988715a.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 08:40:53 UTC
Target Upstream Version:
Embargoed:


Attachments
Q35 machine can not boot up successfully with more than 3 virtio-scsi storage controller under switch (11.37 KB, image/png)
2016-05-05 05:16 UTC, yduan
3 virtio-scsi controller (97.74 KB, text/plain)
2016-05-05 16:13 UTC, yduan
4 virtio-scsi controller (91.34 KB, text/plain)
2016-05-05 16:14 UTC, yduan


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1271457 0 medium CLOSED seabios has no output if boot guest with more than 8 pci-bridge disks. 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1320908 0 unspecified CLOSED [PCI bridge] Automatically generated PCI/PCI bridge spec leads seabios out of memory 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2016:2608 0 normal SHIPPED_LIVE ovmf bug fix and enhancement update 2016-11-03 15:27:02 UTC

Internal Links: 1271457 1320908

Description yduan 2016-05-05 05:16:47 UTC
Created attachment 1154094 [details]
Q35 machine can not boot up successfully with more than 3 virtio-scsi storage controller under switch

Description of problem:
A Q35 machine cannot boot up successfully with more than 3 virtio-scsi storage controllers under a switch.

Version-Release number of selected component (if applicable):
Host:
  kernel-3.10.0-382.el7.x86_64
  qemu-kvm-rhev-2.5.0-4.el7.x86_64
  OVMF-20160202-2.gitd7c0dfa.el7.noarch
Guest:
  kernel-3.10.0-382.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM using the following command:
/usr/libexec/qemu-kvm \
 -S \
 -name 'rhel7.3-64' \
 -machine q35,accel=kvm,usb=off,vmport=off \
 -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
 -drive file=/home/scsi_test2_Q35/rhev7.3_VARS.fd,if=pflash,format=raw,unit=1 \
 -m 4096 \
 -smp 4,maxcpus=4,cores=2,threads=2,sockets=1 \
 -cpu SandyBridge,enforce \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -vga qxl \
 -device AC97,bus=pcie.0 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20151214-111528-C6FB1EaX,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151214-111528-C6FB1EaX,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idSWJ5gV \
 -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151214-111528-C6FB1EaX,server,nowait \
 -device isa-serial,chardev=serial_id_serial0 \
 -chardev socket,id=seabioslog_id_20151214-111528-C6FB1EaX,path=/tmp/seabios-20151214-111528-C6FB1EaX,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20151214-111528-C6FB1EaX,iobase=0x402 \
 -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pcie.0 \
 -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pcie.0 \
 -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pcie.0 \
 -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pcie.0 \
 -device usb-tablet,id=usb-tablet1 \
 -netdev tap,id=netdev0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/ifdown_script \
 -device virtio-net-pci,mac=BA:BC:13:83:4F:BD,id=net0,netdev=netdev0,status=on,bus=pcie.0 \
 -device ioh3420,bus=pcie.0,id=root.0,slot=1 \
 -device x3130-upstream,bus=root.0,id=upstream1 \
 -device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
 -device xio3130-downstream,bus=upstream1,id=downstream2,chassis=2 \
 -device xio3130-downstream,bus=upstream1,id=downstream3,chassis=3 \
 -device xio3130-downstream,bus=upstream1,id=downstream4,chassis=4 \
 -device xio3130-downstream,bus=upstream1,id=downstream5,chassis=5 \
 -device xio3130-downstream,bus=upstream1,id=downstream6,chassis=6 \
 -device virtio-scsi-pci,bus=downstream1,id=scsi_pci_bus0 \
 -drive file=/dev/sdb,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=0,physical_block_size=512,logical_block_size=512,serial=12345678900987654321,ver=SYSDISK,wwn=0x123,channel=0,scsi-id=0,lun=0 \
 -device virtio-scsi-pci,bus=downstream2,id=scsi_pci_bus1 \
 -device virtio-scsi-pci,bus=downstream3,id=scsi_pci_bus2 \
 -device virtio-scsi-pci,bus=downstream4,id=scsi_pci_bus3 \
 -boot menu=on \
 -enable-kvm \
 -monitor stdio \
 -spice port=5900,disable-ticketing \
 -qmp tcp:0:6666,server,nowait

Actual results:
Cannot boot up the VM correctly; see the attachment for details.

Expected results:
The VM boots up successfully.

Additional info:
Not reproducible with the following command:
/usr/libexec/qemu-kvm \
 -S \
 -name 'rhel7.3-64' \
 -machine q35,accel=kvm,usb=off,vmport=off \
 -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
 -drive file=/home/scsi_test2_Q35/rhev7.3_VARS.fd,if=pflash,format=raw,unit=1 \
 -m 4096 \
 -smp 4,maxcpus=4,cores=2,threads=2,sockets=1 \
 -cpu SandyBridge,enforce \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -vga qxl \
 -device AC97,bus=pcie.0 \
 -chardev socket,id=qmp_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20151214-111528-C6FB1EaX,server,nowait \
 -mon chardev=qmp_id_qmpmonitor1,mode=control \
 -chardev socket,id=qmp_id_catch_monitor,path=/tmp/monitor-catch_monitor-20151214-111528-C6FB1EaX,server,nowait \
 -mon chardev=qmp_id_catch_monitor,mode=control \
 -device pvpanic,ioport=0x505,id=idSWJ5gV \
 -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20151214-111528-C6FB1EaX,server,nowait \
 -device isa-serial,chardev=serial_id_serial0 \
 -chardev socket,id=seabioslog_id_20151214-111528-C6FB1EaX,path=/tmp/seabios-20151214-111528-C6FB1EaX,server,nowait \
 -device isa-debugcon,chardev=seabioslog_id_20151214-111528-C6FB1EaX,iobase=0x402 \
 -device ich9-usb-ehci1,id=usb1,addr=1d.7,multifunction=on,bus=pcie.0 \
 -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=1d.0,firstport=0,bus=pcie.0 \
 -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=1d.2,firstport=2,bus=pcie.0 \
 -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=1d.4,firstport=4,bus=pcie.0 \
 -device usb-tablet,id=usb-tablet1 \
 -netdev tap,id=netdev0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/ifdown_script \
 -device virtio-net-pci,mac=BA:BC:13:83:4F:BD,id=net0,netdev=netdev0,status=on,bus=pcie.0 \
 -device ioh3420,bus=pcie.0,id=root.0,slot=1 \
 -device x3130-upstream,bus=root.0,id=upstream1 \
 -device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
 -device xio3130-downstream,bus=upstream1,id=downstream2,chassis=2 \
 -device xio3130-downstream,bus=upstream1,id=downstream3,chassis=3 \
 -device xio3130-downstream,bus=upstream1,id=downstream4,chassis=4 \
 -device xio3130-downstream,bus=upstream1,id=downstream5,chassis=5 \
 -device xio3130-downstream,bus=upstream1,id=downstream6,chassis=6 \
 -device virtio-scsi-pci,bus=downstream1,id=scsi_pci_bus0 \
 -drive file=/dev/sdb,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=0,physical_block_size=512,logical_block_size=512,serial=12345678900987654321,ver=SYSDISK,wwn=0x123,channel=0,scsi-id=0,lun=0 \
 -device virtio-scsi-pci,bus=downstream2,id=scsi_pci_bus1 \
 -device virtio-scsi-pci,bus=downstream3,id=scsi_pci_bus2 \
 -boot menu=on \
 -enable-kvm \
 -monitor stdio \
 -spice port=5900,disable-ticketing \
 -qmp tcp:0:6666,server,nowait

Comment 1 jingzhao 2016-05-05 05:32:51 UTC
Tried the above steps and couldn't reproduce the issue with:
seabios-1.9.1-3.el7.x86_64
qemu-kvm-rhev-2.5.0-4.el7.x86_64 
kernel-3.10.0-382.el7.x86_64 

Following is the qemu command:
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-nodefaults -rtc base=utc \
-m 4G \
-smp 2,sockets=2,cores=1,threads=1 \
-enable-kvm \
-name rhel7 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-serial unix:/tmp/serial0,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-vga std \
-vnc :0 \
-qmp tcp:0:6666,server,nowait \
-chardev file,path=/home/seabios.log,id=seabios \
-device isa-debugcon,chardev=seabios,iobase=0x402 \
-device ioh3420,bus=pcie.0,id=root.0,slot=1 \
-device x3130-upstream,bus=root.0,id=upstream1 \
-device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
-device xio3130-downstream,bus=upstream1,id=downstream2,chassis=2 \
-device xio3130-downstream,bus=upstream1,id=downstream3,chassis=3 \
-device xio3130-downstream,bus=upstream1,id=downstream4,chassis=4 \
-device virtio-scsi-pci,bus=downstream1,id=scsi_pci_bus0 \
-device virtio-scsi-pci,bus=downstream2,id=scsi_pci_bus1 \
-drive file=/home/rhel.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=threads \
-device scsi-hd,bus=scsi_pci_bus1.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=0 \
-device virtio-scsi-pci,bus=downstream3,id=scsi_pci_bus2 \
-device virtio-scsi-pci,bus=downstream4,id=scsi_pci_bus3 \
-monitor stdio

Comment 2 yduan 2016-05-05 06:32:34 UTC
(In reply to yduan from comment #0)
> [full reproducer from comment #0 quoted; snipped, identical to the description above]

Highlight:
The difference between the two qemu-kvm command lines is the number of virtio-scsi controllers under the switch ("-device virtio-scsi-pci,..."): one has 4 and the other has 3.

Comment 3 Laszlo Ersek 2016-05-05 07:51:11 UTC
Yanbin Duan,

thanks for pointing out the difference between the two command lines; that's helpful. Can you please attach the OVMF debug log for both cases (successful and failed)?

My suspicion is the following: OVMF currently provides an IO port space with size 0x4000 for PCI resource assignment (from 0xC000 upwards). AFAIK each bridge / downstream port requires 0x1000 ports for devices behind it. In the failing case, I think the IO port space might be exhausted, and one of the bridges (maybe the one where your boot disk is) cannot be enumerated successfully.

If that's the case (which should be possible to ascertain from the OVMF logs), then this issue is similar to bug 1271457. CC'ing Marcel and Alex.

What number of downstream ports / bridges / virtio-scsi HBAs would be considered acceptable? I'm unsure how much I can lower the IO space start address from 0xC000.

Thanks.
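The arithmetic behind this suspicion can be sketched quickly. This is a toy check, assuming the 0xC000..0xFFFF window and the 0x1000-per-bridge granularity described above; the reservation of one 4K window for the root bus itself is an assumption here (consistent with the 3-vs-4 behavior in the reproducer):

```python
# Hypothetical capacity check for OVMF's PCI IO aperture (per comment 3).
IO_BASE = 0xC000   # start of the space OVMF offers for PCI IO BARs
IO_END = 0x10000   # end of the 16-bit IO port space
WINDOW = 0x1000    # minimum IO window per bridge / downstream port

total_windows = (IO_END - IO_BASE) // WINDOW   # 4 windows of 4K each
root_bus_windows = 1                           # assumption: root bus keeps one for itself
usable_for_ports = total_windows - root_bus_windows
print(usable_for_ports)                        # 3 downstream ports fit; a 4th cannot
```

This matches the observation that three virtio-scsi HBAs under the switch boot fine while four do not.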

Comment 5 yduan 2016-05-05 16:13:28 UTC
Created attachment 1154297 [details]
3 virtio-scsi controller

Comment 6 yduan 2016-05-05 16:14:20 UTC
Created attachment 1154298 [details]
4 virtio-scsi controller

Comment 7 yduan 2016-05-05 16:19:26 UTC
(In reply to Laszlo Ersek from comment #3)
> [comment #3 quoted in full; snipped]

OVMF debug logs with 3 and 4 virtio-scsi controllers are attached.

Thanks.

Comment 8 Laszlo Ersek 2016-05-06 09:44:04 UTC
Thanks for the logs. It is exactly what I suspected:

- The root bus gets an IO port aperture of size 0x1000, starting at 0xF000,
  for the IO BARs of the devices that are directly plugged into it.

- The upstream port called "upstream1" (device type "x3130-upstream") gets
  an aperture of size 0x3000, starting at 0xC000.

- Three downstream ports (type "xio3130-downstream", names "downstream1"
  through "downstream3") each get an aperture of size 0x1000, carved out of
  the above 0x3000, at 0xC000, 0xD000, and 0xE000, respectively.

- The rest of the downstream ports (type "xio3130-downstream", names
  "downstream4" through "downstream6") don't get anything, because there are
  no devices plugged into them.

- When a device (in this case: a virtio-scsi-pci device) is plugged into
  "downstream4", then "downstream4" too requires another 0x1000 aperture.
  But, since OVMF makes only 0xC000..0xFFFF (inclusive) available for PCI IO
  BAR allocation at the moment, this resource request fails.

- In turn, the edk2 PCI Bus driver identifies one of the devices *not*
  satisfying which causes the least damage, and leaves it unprogrammed /
  without resources. In particular, the victim seems to be the
  virtio-scsi-pci HBA with exactly the scsi-hd disk that we'd like to boot
  off of, so we land in the UEFI shell.
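
The carve-out described above can be modeled with a minimal greedy allocator. This is not OVMF's actual PciBusDxe logic (in particular, the "least damage" victim selection Laszlo mentions below is not modeled); it only illustrates how handing out 4K windows from 0xC000 upward, with the root bus keeping the top window at 0xF000, leaves a fourth populated downstream port without resources:

```python
# Toy model of the IO-window carve-out (assumptions: 0xC000..0xFFFF aperture,
# 4K window per populated downstream port, root bus keeps the 0xF000 window).
def allocate(ports_with_devices, base=0xC000, end=0x10000, win=0x1000):
    result = {"root-bus": end - win}  # root bus keeps the top 4K window
    cursor = base                     # switch apertures carved from the low end
    for port in ports_with_devices:
        if cursor + win > end - win:  # would collide with the root bus window
            result[port] = None       # request fails; device left unprogrammed
        else:
            result[port] = cursor
            cursor += win
    return result

ok = allocate(["downstream1", "downstream2", "downstream3"])
bad = allocate(["downstream1", "downstream2", "downstream3", "downstream4"])
print(ok)   # three ports get 0xC000 / 0xD000 / 0xE000, as in the log above
print(bad)  # downstream4 -> None: one HBA ends up without an IO window
```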

Enlarging the IO range that OVMF offers for PCI IO BAR allocation is
technically trivial (it requires changing one constant, from 0xC000 to
something lower), but, as I asked in comment 3, what is the number of
bridges / downstream ports that would be deemed acceptable?

Say, if I lower it to 0xB000, then four bridges / downstream ports are going
to work. Will someone file a BZ the day after, for five bridges / ports?

NB, the specific example in this BZ is quite pathological -- it makes no sense
to me to create four virtio-scsi-pci HBAs and plug them into separate
downstream ports. What one actually needs are "leaf devices" (disks, CD-ROMs),
and plenty of those can already be plugged into a single virtio-scsi HBA.

So, if this is just a synthetic use case, then I'm prone to close it as
NOTABUG.

If there is a business justification, with an *actual number*, I could look
into that number. (In which case I'd have to figure out how low the IO range
base address can go safely, from 0xC000).

Comment 9 Karen Noel 2016-05-06 16:46:18 UTC
Adding Laine.

I think the use case we should try to cover is one pcie bus per numa node, assuming a single guest can be as wide as the host, same number of numa nodes and same number of vcpus as host threads. Thanks.

Comment 10 Laine Stump 2016-05-06 17:02:46 UTC
This isn't about having 5 (or whatever) buses that can each accept 31 endpoint devices (as is the case with pci-bridge on 440FX). The problem is that each downstream port can only accept a single endpoint device. Compared to the kinds of tests done for 440FX machines (where they plug in multiple hundreds of disk devices), limiting to 4 or 5 devices seems, well, *limiting*. The config in this test case is really quite basic - 5 ports, so a max of 5 endpoint devices, which is really nothing.

An upstream port has 32 ports to plug in downstream ports. How can all those ports be used (in real hardware even) if the IO address space is exhausted so quickly? What can we do to scale up to the numbers of devices that people are accustomed to with 440FX-based virtual machines? (or am I missing some unusual limiting factor in this particular case that won't be an issue in general?)

Comment 11 Laine Stump 2016-05-06 17:15:25 UTC
Is the problem here that the virtio-scsi devices are legacy PCI devices rather than PCIe? Would it behave more reasonably with "disable_modern=off,disable_legacy=on" (or is it "disable-modern=off,disable-legacy=on"?) for the virtio-scsi devices?

Comment 12 Alex Williamson 2016-05-06 17:34:49 UTC
(In reply to Karen Noel from comment #9)
> I think the use case we should try to cover is one pcie bus per numa node,
> assuming a single guest can be as wide as the host, same number of numa
> nodes and same number of vcpus as host threads. Thanks.

While a lot of the "bus" terminology from conventional PCI carries over to PCIe, PCIe is a point-to-point interconnect; there really is no bus, so it's not clear what a PCIe bus per NUMA node really means. Does this mean just the host bridge itself? An individual PCIe root port per host bridge only allows us a single PCI slot. If we incorporate a PCIe switch, each downstream port is a bridge, leading to issues like we see here.

Part of the problem is the mismatch between virtio devices using I/O port space, which has (had?) performance advantages in an x86 VM vs MMIO space, but requires use of a very limited resource.  Real hardware has moved away from I/O port space, partially for this reason and partially because it simply doesn't exist on many architectures.  Note for instance that SR-IOV VFs don't even support I/O port space.

There are ways to get more I/O port space, for instance we could create a sparse mapping per host bridge that translates to I/O port space on the PCI "bus".  This is how ia64 supported multiple I/O port spaces.  It's not supported by all guests though and requires some convoluted ACPI to describe, so we'd need to figure out whether that's worthwhile.  We might also consider only allocating I/O port resources on the primary host bridge, but I'm not sure what that does to the supportability of virtio (or QEMU's collection of emulated devices for that matter).

The most direct approach is probably as Laszlo suggests, to figure out what assumptions lead us to start allocating device I/O port space at 0xc000 and figure out how much more space we can easily give ourselves.  It's really only a stop-gap solution though until we move away from devices that require I/O port space or at least allow the system firmware to fail gracefully in allocating I/O port space for devices.

Comment 13 Laszlo Ersek 2016-05-06 18:14:55 UTC
Great comments, thank you guys.

In random order:

* At least in the case of OVMF, the firmware does fail gracefully when it
  cannot satisfy this or that IO BAR request. I don't remember the details
  without looking, but the PCI Bus driver has logic to minimize the impact
  of such an allocation failure, and (IIRC) it tries to exclude the bridge
  with the fewest devices on it, or some such. Either way, the boot progress
  doesn't stop; it continues with those devices that could be enumerated.

  In this specific case, the boot wasn't successful ultimately because
  exactly the one device with the bootable system got excluded (even though
  the exhaustion was triggered by another device -- I guess when there is no
  clear loser for the "Out of Resource Killer", it is unspecified which one
  is picked from the candidates).

* I found an interesting comment in edk2 today. It is related to a boolean
  config knob that allows the PCI bus driver to probe bridges for IO port
  space granularities smaller than 4K. It is called
  "PcdPciBridgeIoAlignmentProbe", and the comments go like:

In "MdeModulePkg/MdeModulePkg.dec":
> Indicates if the PciBus driver probes non-standard, such as 2K/1K/512,
> granularity for PCI to PCI bridge I/O window.

In "MdeModulePkg/Bus/Pci/PciBusDxe/PciLib.c":
> If non-stardard PCI Bridge I/O window alignment is supported, set I/O
> aligment to minimum possible alignment for root bridge.

In "MdeModulePkg/Bus/Pci/PciBusDxe/PciEnumeratorSupport.c":
> if PcdPciBridgeIoAlignmentProbe is TRUE, PCI bus driver probes PCI bridge
> supporting non-stardard I/O window alignment less than 4K.

and

> Check any bits of bit 3-1 of I/O Base Register are writable. if so, it is
> assumed non-stardard I/O window alignment is supported by this bridge. Per
> spec, bit 3-1 of I/O Base Register are reserved bits, so its content can't
> be assumed.

  The setting defaults to FALSE. Obviously, I tested things with TRUE as
  well. Interestingly (and somewhat disappointingly), while the root bridge
  was affected by this setting (i.e., it only allocated 512 (0x200) bytes IO
  port space for itself, rather than the original 0x1000 (4096)), the
  bridges on the downstream ports were *not* affected -- they insisted on
  their 0x1000 chunks.

  I attribute this to QEMU's host bridge models supporting smaller-than-4k
  IO window alignments, and the non-host bridge models not supporting the
  same. I guess we could patch this in QEMU, technically, but it could be a
  very hard sell, considering it is non-standard.

* "disable-modern=off,disable-legacy=on" should indeed solve this (thanks
  Laine). I remember (from working on bug 1257882) that with these
  properties set, the virtio devices need no IO ports. And, the PCI Bus
  driver in edk2 apparently only tries to allocate an IO port window for a
  bridge if there are devices on the bridge that actually need IO ports.

  So, I recommend that this test case be retried with
  "disable-modern=off,disable-legacy=on" for all virtio devices, and that we
  ignore the size of the IO port space for now. After all, according to
  Alex, "Real hardware has moved away from I/O port space" too.

  Bug 1257882 is fixed by the latest (pending) OVMF rebase,
  "ovmf-20160419-1.git90bb4c5.el7". Unfortunately, that rebase is blocked by
  bug 1329559. Thus, I can't ask Yanbin Duan to retest with
  "disable-modern=off,disable-legacy=on" using an official OVMF build right
  now, but I can provide my scratch build of said rebase. Please see the
  link in the next comment.

  (I always archive my scratch builds from Brew, until the related RHBZ
  transitions to VERIFIED.)

Comment 15 Laine Stump 2016-05-06 18:25:07 UTC
Hmm, adding those options may or may not lead to a usable system. I learned about them in Bug 1330024, and just noticed that Bug 1330024 Comment 10 (the paragraph starting with "Michael says there is a bug") says there is a problem with those settings and block devices that won't be fixed until qemu 2.7.

Comment 16 Laszlo Ersek 2016-05-06 20:32:07 UTC
Sigh.

I'm curious what that virtio-block problem is, but anyway, if it's going to
be fixed in 2.7, then that's what we have to work with.

So, I tried to look into lowering the base from 0xC000 -- just on the source
code level. First of all, I fired up my long-term OVMF Fedora Q35 guest
(libvirt of course), and issued

  virsh qemu-monitor-command ovmf.fedora.q35 --hmp info mtree

Here's the interesting part:

> address-space: I/O
>   0000000000000000-000000000000ffff (prio 0, RW): io
>     0000000000000000-0000000000000007 (prio 0, RW): dma-chan
>     0000000000000008-000000000000000f (prio 0, RW): dma-cont
>     0000000000000020-0000000000000021 (prio 0, RW): kvm-pic
>     0000000000000040-0000000000000043 (prio 0, RW): kvm-pit
>     0000000000000060-0000000000000060 (prio 0, RW): i8042-data
>     0000000000000061-0000000000000061 (prio 0, RW): pcspk
>     0000000000000064-0000000000000064 (prio 0, RW): i8042-cmd
>     0000000000000070-0000000000000071 (prio 0, RW): rtc
>     000000000000007e-000000000000007f (prio 0, RW): kvmvapic
>     0000000000000080-0000000000000080 (prio 0, RW): ioport80
>     0000000000000081-0000000000000083 (prio 0, RW): dma-page
>     0000000000000087-0000000000000087 (prio 0, RW): dma-page
>     0000000000000089-000000000000008b (prio 0, RW): dma-page
>     000000000000008f-000000000000008f (prio 0, RW): dma-page
>     0000000000000092-0000000000000092 (prio 0, RW): port92
>     00000000000000a0-00000000000000a1 (prio 0, RW): kvm-pic
>     00000000000000b2-00000000000000b3 (prio 0, RW): apm-io
>     00000000000000c0-00000000000000cf (prio 0, RW): dma-chan
>     00000000000000d0-00000000000000df (prio 0, RW): dma-cont
>     00000000000000f0-00000000000000f0 (prio 0, RW): ioportF0
>     00000000000001ce-00000000000001d1 (prio 0, RW): vbe
>     00000000000003b4-00000000000003b5 (prio 0, RW): vga
>     00000000000003ba-00000000000003ba (prio 0, RW): vga
>     00000000000003c0-00000000000003cf (prio 0, RW): vga
>     00000000000003d4-00000000000003d5 (prio 0, RW): vga
>     00000000000003da-00000000000003da (prio 0, RW): vga
>     00000000000003f8-00000000000003ff (prio 0, RW): serial
>     0000000000000402-0000000000000402 (prio 0, RW): isa-debugcon
>     00000000000004d0-00000000000004d0 (prio 0, RW): kvm-elcr
>     00000000000004d1-00000000000004d1 (prio 0, RW): kvm-elcr
>     0000000000000510-0000000000000511 (prio 0, RW): fwcfg
>     0000000000000514-000000000000051b (prio 0, RW): fwcfg.dma
>     0000000000000cd8-0000000000000cf7 (prio 0, RW): acpi-cpu-hotplug
>     0000000000000cf8-0000000000000cfb (prio 0, RW): pci-conf-idx
>     0000000000000cf9-0000000000000cf9 (prio 1, RW): lpc-reset-control
>     0000000000000cfc-0000000000000cff (prio 0, RW): pci-conf-data
>     0000000000005658-0000000000005658 (prio 0, RW): vmport
>     000000000000b000-000000000000b07f (prio 0, RW): ich9-pm
>       000000000000b000-000000000000b003 (prio 0, RW): acpi-evt
>       000000000000b004-000000000000b005 (prio 0, RW): acpi-cnt
>       000000000000b008-000000000000b00b (prio 0, RW): acpi-tmr
>       000000000000b020-000000000000b02f (prio 0, RW): acpi-gpe0
>       000000000000b030-000000000000b037 (prio 0, RW): acpi-smi
>       000000000000b060-000000000000b07f (prio 0, RW): sm-tco
>     000000000000c000-000000000000cfff (prio 1, RW): alias pci_bridge_io
>                                                     @pci_bridge_io
>                                          000000000000c000-000000000000cfff
>     000000000000d000-000000000000d03f (prio 1, RW): pm-smbus
>     000000000000d040-000000000000d05f (prio 1, RW): ahci-idp
>     000000000000d060-000000000000d07f (prio 1, RW): qxl-ioports

Observations:

- Most of the mess is in the first 0x1000 bytes

- Let me return later to vmport (at 0x5658)

- The ich9-pm register block is at 0xB000 (= PMBASE), directly under 0xC000.
  This is not PCI related, but it is nonetheless programmed by OVMF, and can
  be changed.

- "pci_bridge_io" at 0xC000, "pm-smbus", "ahci-idp" and "qxl-ioports" are
  all PCI resources (bridge window and individual IO BARs), allocated by the
  edk2 PCI Bus driver.

We can also look at "pci_bridge_io" more closely:

> memory-region: pci_bridge_io
>   0000000000000000-000000000000ffff (prio 0, RW): pci_bridge_io
>     000000000000c000-000000000000c03f (prio 1, RW): virtio-pci
>     000000000000c040-000000000000c07f (prio 1, RW): virtio-pci
>     000000000000c080-000000000000c09f (prio 1, RW): virtio-pci
>     000000000000c0a0-000000000000c0bf (prio 1, RW): virtio-pci
>     000000000000c0c0-000000000000c0df (prio 1, RW): uhci
>     000000000000c0e0-000000000000c0ff (prio 1, RW): uhci
>     000000000000c100-000000000000c11f (prio 1, RW): uhci
>     000000000000c120-000000000000c13f (prio 1, RW): virtio-pci

Again, the result of the PCI resource allocation. I can directly correlate
the PCI BARs with the edk2 enumeration debug log.


Okay then, let's look at this PMBASE thing. Clearly, the first thing to
check is SeaBIOS -- what does SeaBIOS do? The following:

  https://code.coreboot.org/p/seabios/source/commit/a217de932969
  https://code.coreboot.org/p/seabios/source/commit/7eac0c4e

(Adjacent commits; see the mailing list discussion at:
  http://www.seabios.org/pipermail/seabios/2014-May/008040.html
  http://www.seabios.org/pipermail/seabios/2014-May/008038.html
  http://www.seabios.org/pipermail/seabios/2014-May/008041.html
  http://www.seabios.org/pipermail/seabios/2014-May/008039.html
  http://www.seabios.org/pipermail/seabios/2014-May/008045.html
  http://www.seabios.org/pipermail/seabios/2014-May/008044.html
  http://www.seabios.org/pipermail/seabios/2014-May/008043.html
)

In other words, SeaBIOS used to place the PMBASE at fixed 0xB000 (just like
OVMF does to this day) and the IO BARs at 0xC000. (Search "src/fw/pciinit.c"
for the string "traditionally used for pci io".)

However, once QEMU's ACPI table generator was able to advertise a
dynamically programmed PMBASE, SeaBIOS lowered the PMBASE to 0x0600, and
started to use the following ranges for IO BAR allocation:

- if the IO BARs don't need more than 4K IO ports in total, then keep them
  above 0xC000,
- otherwise:
  - on Q35: place them in [0x1000, 0x1_0000)
  - on PIIX4: place them in [0x1000, 0xA000)

These SeaBIOS commits are from May 2014.
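That policy can be sketched as follows; the helper name and shape are illustrative, not actual SeaBIOS code, and the threshold uses the 0x4000 (16K ports) figure corrected in comment 17:

```python
def seabios_io_window(total_io_need, machine):
    """Sketch of the SeaBIOS IO BAR allocation policy described above.

    Hypothetical helper, not SeaBIOS source. Returns a half-open
    [base, limit) IO port range for PCI IO BAR allocation.
    """
    if total_io_need <= 0x4000:
        # Everything fits in the traditional area above 0xC000.
        return (0xC000, 0x10000)
    # Otherwise fall back to a larger range, depending on the machine type.
    if machine == "q35":
        return (0x1000, 0x10000)
    if machine == "piix4":
        return (0x1000, 0xA000)
    raise ValueError("unknown machine: %s" % machine)
```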

This is where vmport comes in. In QEMU, "vmport" is a PC machine type
property, of type OnOffAuto. The pc_q35_init() function looks at this knob,
and unless the user set it to OFF or ON, it is auto-set to ON if we're not
running on Xen.
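In pseudocode, that knob resolution amounts to something like this (names are illustrative, not QEMU source):

```python
# Sketch of how pc_q35_init() resolves the "vmport" machine property
# (OnOffAuto). Hypothetical helper, not actual QEMU code.
ON, OFF, AUTO = "on", "off", "auto"

def vmport_enabled(setting, running_on_xen):
    if setting == AUTO:
        # AUTO resolves to ON unless we're running on Xen.
        return not running_on_xen
    return setting == ON
```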

Then, we have:

  pc_q35_init()
    pc_basic_device_init()
      vmport_init()
        isa_create_simple("vmport")
          ...
            vmport_realizefn()
              isa_register_ioport(isadev, &s->io, 0x5658);

That's right, the vmport device registers a fixed single-byte IO port within
the IO address space that is granted to the firmware, for PCI BAR allocation
purposes. This fixed IO port seems to date back to commit 548df2acc6fcd,
from the year 2007.

So, I think that (a) SeaBIOS's current "greedy" IO BAR range is not correct,
(b) in OVMF I can try to program the PMBASE at 0x600 (same as SeaBIOS), and
lower the base of the PCI IO area to 0x6000. That should be good enough for
10 bridges (0x6000 through 0xF000), including root bridges, without
overlapping the "vmport" device's IO port.
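A quick sanity check of that arithmetic, with the values taken straight from the layout above:

```python
# Proposed Q35 layout from this comment: PMBASE at 0x600, PCI IO aperture
# at [0x6000, 0x10000), one 4K-aligned IO window per bridge.
IO_BASE, IO_LIMIT, BLOCK = 0x6000, 0x10000, 0x1000
VMPORT = 0x5658  # fixed vmport IO port

windows = list(range(IO_BASE, IO_LIMIT, BLOCK))
assert len(windows) == 10                  # room for 10 bridges, incl. root
assert not (IO_BASE <= VMPORT < IO_LIMIT)  # vmport stays below the aperture
```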

Comment 17 Laszlo Ersek 2016-05-06 20:39:11 UTC
(In reply to Laszlo Ersek from comment #16)

> - if the IO BARs don't need more than 4K IO ports in total, then keep them
>   above 0xC000,

sorry, that should be "more than 0x4000 total" -- 16K ports

Comment 18 Alex Williamson 2016-05-06 21:26:51 UTC
(In reply to Laszlo Ersek from comment #16)
> 
> That's right, the vmport device registers a fixed single-byte IO port within
> the IO address space that is granted to the firmware, for PCI BAR allocation
> purposes. This fixed IO port seems to date back to commit 548df2acc6fcd,
> from the year 2007.


It's fixed per the VMware spec:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1009458

1009458 is the document number in case that link breaks.

Interesting to note that the x86 IN instruction normally modifies only EAX, while if the port is probed on a VMware-compatible hypervisor, EAX through EDX are all modified.  I wonder if this is why I have vfio-users reporting that they need to disable vmport in some cases: if PCI re-used that address, it's not just a conflict; unexpected registers are getting changed that might not be scratch registers.  Nice analysis, Laszlo.

Comment 19 Laszlo Ersek 2016-05-06 21:31:14 UTC
On PIIX4, the situation is worse:

>     0000000000000cfc-0000000000000cff (prio 0, RW): pci-conf-data
>     0000000000005658-0000000000005658 (prio 0, RW): vmport
>     000000000000ae00-000000000000ae13 (prio 0, RW): acpi-pci-hotplug
>     000000000000af00-000000000000af1f (prio 0, RW): acpi-cpu-hotplug
>     000000000000afe0-000000000000afe3 (prio 0, RW): acpi-gpe0
>     000000000000b000-000000000000b03f (prio 0, RW): piix4-pm
>       000000000000b000-000000000000b003 (prio 0, RW): acpi-evt
>       000000000000b004-000000000000b005 (prio 0, RW): acpi-cnt
>       000000000000b008-000000000000b00b (prio 0, RW): acpi-tmr
>     000000000000b100-000000000000b13f (prio 0, RW): pm-smbus
>     000000000000c000-000000000000c03f (prio 1, RW): virtio-pci

The acpi-* port ranges in [0xA000..0xAFFF] are practically unmovable
(SeaBIOS steers clear of them too).

The piix4-pm block is the same old PMBASE story, so that's fine.

The "pm-smbus" device is different on i440fx than on ich9. On i440fx, it is
not a PCI BAR, but a separately programmable resource. OVMF doesn't care
about it at the moment, at all, and the QEMU default location is 0xB100.
(See the piix4_pm_init() call in "hw/i386/pc_piix.c".)

So, by reprogramming PMBASE to 0x0600, and moving pm-smbus to 0x0700 (I
guess... that's what SeaBIOS commit a217de932969 does), I could free up
another 0x1000 ports, at 0xB000. But the stuff at 0xA000 is unmovable, so 
the following blocks would result:

[0x1000..0x4FFF]: 4 blocks (room for 4 bridges)
[0x6000..0x9FFF]: 4 blocks (room for 4 bridges)
[0xB000..0xFFFF]: 5 blocks (room for 5 bridges)

In other words, on PIIX4 we could advance from the current 4 bridges to 5
bridges (including root bridges). I think that simply doesn't justify any
patches.

So, this will have to be Q35 only in OVMF.
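The block counts above can be double-checked; the sketch below assumes, as the conclusion implies, that the firmware's PCI IO aperture must be a single contiguous range, so only the largest run of free blocks matters:

```python
# Free 4K IO blocks on PIIX4 after moving PMBASE to 0x0600 and pm-smbus to
# 0x0700, per this comment. The contiguity assumption is an inference from
# the conclusion above, not a stated fact.
BLOCK = 0x1000
runs = {                        # half-open [base, limit) ranges
    "low":  (0x1000, 0x5000),   # 0x5000 block skipped (vmport sits at 0x5658)
    "mid":  (0x6000, 0xA000),   # 0xA000 holds the unmovable acpi-* ports
    "high": (0xB000, 0x10000),  # freed by moving PMBASE out of 0xB000
}
blocks = {k: (hi - lo) // BLOCK for k, (lo, hi) in runs.items()}
assert blocks == {"low": 4, "mid": 4, "high": 5}
assert max(blocks.values()) == 5  # at best 5 bridges, up from 4 today
```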

Comment 20 Marcel Apfelbaum 2016-05-08 09:05:57 UTC
Hi,

It seems I am late to the party, here is my opinion anyway:

1. We do have a Q35 (PCIe) problematic limitation on the number of
   PCI devices requiring IO space we can use. The reason is that
   each device has to be placed on its own bridge and we are out
   of IO space really fast.

2. Trying to increase the available IO space is always good
   but it does not solve our problem, we will still have room
   for like max 16 devices, not good enough.

3. Non-standard approach to limit the IO space the bridges require
   is a no go either.

4. The right way to go IMO is to make virtio devices PCIe by default,
   (disable-modern=off) and the firmware should allow the PCI bridges
   to have MEM only ranges and not IO.

5. A problem I was not aware of is that IO mode is faster than MEM.
   Is this (still) true? We need to check this to be sure.

6. Another problem is the *possible* bug in virtio block devices
   using the disable-modern=off. If anybody knows something about
   it please do tell.

Thanks,
Marcel

Comment 21 Laszlo Ersek 2016-05-09 08:34:33 UTC
(better click "Unwrap comments" near the top)

(In reply to Marcel Apfelbaum from comment #20)

> 1. We do have a Q35 (PCIe) problematic limitation on the number of
>    PCI devices requiring IO space we can use. The reason is that
>    each device has to be placed on its own bridge and we are out
>    of IO space really fast.
> 
> 2. Trying to increase the available IO space is always good
>    but it does not solve our problem, we will still have room
>    for like max 16 devices, not good enough.
> 
> 3. Non-standard approach to limit the IO space the bridges require
>    is a no go either.

We seem to have consensus for these three.

> 4. The right way to go IMO is to make virtio devices PCIe by default,
>    (disable-modern=off) and the firmware should allow the PCI bridges
>    to have MEM only ranges and not IO.

Right, the edk2 PCI driver stack satisfies this.

For example, consider the following topology (printed from within the
guest):

> -[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
>            +-01.0  Red Hat, Inc. QXL paravirtual graphic card
>            +-02.0-[01-05]----00.0-[02-05]--+-00.0-[03]----00.0  Red Hat, Inc Device 1048
>            |                               +-01.0-[04]----00.0  Red Hat, Inc Device 1041
>            |                               \-02.0-[05]----00.0  Red Hat, Inc Device 1041
>            +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
>            +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
>            \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

The QEMU command line is:

> ISO=/mnt/data/isos/fedora/23/Fedora-Live-Workstation-x86_64-23-10.iso
> CODE=/usr/share/OVMF/OVMF_CODE.secboot.fd
> TMPL=/usr/share/OVMF/OVMF_VARS.fd
> TFTP=/var/lib/dnsmasq
> BF=shim.efi
> MODERN=disable-legacy=on,disable-modern=off
> 
> cp $TMPL vars12.fd
> 
> qemu-system-x86_64 \
>   -m 2048 \
>   \
>   -machine q35,smm=on,accel=kvm \
>   -global driver=cfi.pflash01,property=secure,value=on \
>   \
>   -device qxl-vga \
>   -drive if=pflash,format=raw,file=$CODE,readonly \
>   -drive if=pflash,format=raw,file=vars12.fd \
>   \
>   -chardev file,id=debugfile,path=debug12.log \
>   -device isa-debugcon,iobase=0x402,chardev=debugfile \
>   \
>   -chardev stdio,id=char0,signal=off,mux=on \
>   -mon chardev=char0,mode=readline,default \
>   -serial chardev:char0 \
>   \
>   -device ioh3420,id=root,bus=pcie.0 \
>   -device x3130-upstream,id=up,bus=root \
>   \
>   -drive id=cdrom,if=none,readonly,format=raw,cache=writethrough,file=$ISO \
>   -device xio3130-downstream,id=down1,bus=up,chassis=1 \
>   -device virtio-scsi-pci,id=scsi0,bus=down1,$MODERN \
>   -device scsi-cd,bus=scsi0.0,drive=cdrom,bootindex=0 \
>   \
>   -netdev user,id=netdev0,hostfwd=tcp:127.0.0.1:2223-:22,tftp=$TFTP,bootfile=$BF \
>   -device xio3130-downstream,id=down2,bus=up,chassis=2 \
>   -device virtio-net-pci,bus=down2,netdev=netdev0,$MODERN,bootindex=1 \
>   \
>   -netdev user,id=netdev1 \
>   -device xio3130-downstream,id=down3,bus=up,chassis=3 \
>   -device virtio-net-pci,bus=down3,netdev=netdev1,$MODERN \
>   \
>   -global ICH9-LPC.disable_s3=0

The edk2 PCI Bus driver prints the following resource map:

> PciHostBridge: SubmitResources for PciRoot(0x0)
>  I/O: Granularity/SpecificFlag = 0 / 01
>       Length/Alignment = 0x1000 / 0xFFF
>  Mem: Granularity/SpecificFlag = 32 / 00
>       Length/Alignment = 0x9400000 / 0x3FFFFFF
>  Mem: Granularity/SpecificFlag = 64 / 00
>       Length/Alignment = 0x800000 / 0x7FFFFF
> PciBus: HostBridge->SubmitResources() - Success
> PciHostBridge: NotifyPhase (AllocateResources)
>  RootBridge: PciRoot(0x0)
>   Mem: Base/Length/Alignment = 90000000/9400000/3FFFFFF - Success
>   Mem64: Base/Length/Alignment = 800000000/800000/7FFFFF - Success
>   I/O: Base/Length/Alignment = C000/1000/FFF - Success
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Success
> PciBus: Resource Map for Root Bridge PciRoot(0x0)
> Type =   Io16; Base = 0xC000;	Length = 0x1000;	Alignment = 0xFFF
>    Base = 0xC000;	Length = 0x40;	Alignment = 0x3F;	Owner = PCI [00|1F|03:20]
>    Base = 0xC040;	Length = 0x20;	Alignment = 0x1F;	Owner = PCI [00|1F|02:20]
>    Base = 0xC060;	Length = 0x20;	Alignment = 0x1F;	Owner = PCI [00|01|00:1C]
> Type =  Mem32; Base = 0x90000000;	Length = 0x9400000;	Alignment = 0x3FFFFFF
>    Base = 0x90000000;	Length = 0x4000000;	Alignment = 0x3FFFFFF;	Owner = PCI [00|01|00:14]
>    Base = 0x94000000;	Length = 0x4000000;	Alignment = 0x3FFFFFF;	Owner = PCI [00|01|00:10]
>    Base = 0x98000000;	Length = 0x1300000;	Alignment = 0x7FFFFF;	Owner = PPB [00|02|00:**]
>    Base = 0x99300000;	Length = 0x2000;	Alignment = 0x1FFF;	Owner = PCI [00|01|00:18]
>    Base = 0x99302000;	Length = 0x1000;	Alignment = 0xFFF;	Owner = PCI [00|1F|02:24]
> Type =  Mem64; Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF
>    Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PPB [00|02|00:**]; Type = PMem64
> 
> PciBus: Resource Map for Bridge [00|02|00]
> Type =  Mem32; Base = 0x98000000;	Length = 0x1300000;	Alignment = 0x7FFFFF
>    Base = 0x98000000;	Length = 0x1300000;	Alignment = 0x7FFFFF;	Owner = PPB [01|00|00:**]
> Type = PMem64; Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF
>    Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PPB [01|00|00:**]
> 
> PciBus: Resource Map for Bridge [01|00|00]
> Type =  Mem32; Base = 0x98000000;	Length = 0x1300000;	Alignment = 0x7FFFFF
>    Base = 0x98000000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PPB [02|01|00:**]
>    Base = 0x98800000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PPB [02|02|00:**]
>    Base = 0x99000000;	Length = 0x100000;	Alignment = 0xFFFFF;	Owner = PPB [02|02|00:**]
>    Base = 0x99100000;	Length = 0x100000;	Alignment = 0xFFFFF;	Owner = PPB [02|01|00:**]
>    Base = 0x99200000;	Length = 0x100000;	Alignment = 0xFFFFF;	Owner = PPB [02|00|00:**]
> Type = PMem64; Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF
>    Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PPB [02|00|00:**]
> 
> PciBus: Resource Map for Bridge [02|00|00]
> Type =  Mem32; Base = 0x99200000;	Length = 0x100000;	Alignment = 0xFFFFF
>    Base = 0x99200000;	Length = 0x1000;	Alignment = 0xFFF;	Owner = PCI [03|00|00:14]
> Type = PMem64; Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF
>    Base = 0x800000000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PCI [03|00|00:20]
> 
> PciBus: Resource Map for Bridge [02|01|00]
> Type =  Mem32; Base = 0x98000000;	Length = 0x800000;	Alignment = 0x7FFFFF
>    Base = 0x98000000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PCI [04|00|00:20]; Type = PMem64
> Type =  Mem32; Base = 0x99100000;	Length = 0x100000;	Alignment = 0xFFFFF
>    Base = 0x99100000;	Length = 0x1000;	Alignment = 0xFFF;	Owner = PCI [04|00|00:14]
> 
> PciBus: Resource Map for Bridge [02|02|00]
> Type =  Mem32; Base = 0x98800000;	Length = 0x800000;	Alignment = 0x7FFFFF
>    Base = 0x98800000;	Length = 0x800000;	Alignment = 0x7FFFFF;	Owner = PCI [05|00|00:20]; Type = PMem64
> Type =  Mem32; Base = 0x99000000;	Length = 0x100000;	Alignment = 0xFFFFF
>    Base = 0x99000000;	Length = 0x1000;	Alignment = 0xFFF;	Owner = PCI [05|00|00:14]

IO ports are allocated only for QXL (00:01.0), the on-board SATA controller
(00:1f.2) and the on-board SMBus controller (00:1f.3); and those belong
directly to the root bridge.

The
- bridges belonging to
  - the root port (00:02.0),
  - the upstream port (01:00.0),
  - the downstream ports (02:00.0, 02:01.0, 02:02.0),
- and the virtio-1.0 devices in the downstream ports:
  - 03:00.0 -- virtio-scsi,
  - 04:00.0 -- virtio-net,
  - 05:00.0 -- virtio-net

have MMIO resources only.

(In reply to Marcel Apfelbaum from comment #20)

> 
> 5. A problem I was not aware of is that IO mode is faster than MEM.
>    Is this (still) true? We need to check this to be sure.
> 
> 6. Another problem is the *possible* bug in virtio block devices
>    using the disable-modern=off. If anybody knows something about
>    it please do tell.

Ah, from your comments in the BZ linked by Laine, I thought you had already
learned the details from Michael. It seems only Michael knows the details.
Can you ask him please?

Thanks!

Comment 22 Laszlo Ersek 2016-05-09 20:08:32 UTC
CC'ing Gerd for comment 16 through comment 18, should he want to consider vmport in the SeaBIOS allocation logic.

Comment 23 Laszlo Ersek 2016-05-10 00:38:28 UTC
Posted upstream series
http://thread.gmane.org/gmane.comp.bios.edk2.devel/11943

Comment 24 Gerd Hoffmann 2016-05-10 12:52:04 UTC
> Then, we have:
> 
>             vmport_realizefn()
>               isa_register_ioport(isadev, &s->io, 0x5658);

> So, I think that (a) SeaBIOS's current "greedy" IO BAR range is not correct,

Yes.

But I/O BARs are usually pretty small.  You also have only one PCIe device per "bus" b/c it isn't really a bus.  Typical use is a small range at the start of each bridge region.  I have yet to see a configuration where a PCI device ends up overlapping the vmport.

If it turns out to actually happen we can add vmport detection to seabios to avoid the range in case vmport is present.

Oh, and turning off vmport by default would be a good idea too IMHO.  At least we have a switch for it meanwhile.

Comment 25 Laszlo Ersek 2016-05-10 15:39:19 UTC
(In reply to Gerd Hoffmann from comment #24)
> > Then, we have:
> > 
> >             vmport_realizefn()
> >               isa_register_ioport(isadev, &s->io, 0x5658);
> 
> > So, I think that (a) SeaBIOS's current "greedy" IO BAR range is not correct,
> 
> Yes.
> 
> But I/O bars are usually pretty small.  You also have one pcie device per
> "bus" only b/c it isn't really a bus.  Typical use is a small range at the
> start of each bridge region.  I have yet to see a configuration where a pci
> device ends up overlapping the vmport.

Sure; I wasn't implying it was a grave bug, just that there was an overlap you might want to check out. (I was surprised by the overlap though.)

> If it turns out to actually happen we can add vmport detection to seabios to
> avoid the range in case vmport is present.
> 
> Oh, and turning off vmport by default would be a good idea too IMHO.  At
> least we have a switch for it meanwhile.

I agree. I don't even know why it defaults to "enabled".

Comment 26 Laszlo Ersek 2016-05-17 15:23:43 UTC
Posted upstream v2:
http://thread.gmane.org/gmane.comp.bios.edk2.devel/12320

Comment 27 Laszlo Ersek 2016-05-17 18:51:57 UTC
(In reply to Laszlo Ersek from comment #26)
> Posted upstream v2:
> http://thread.gmane.org/gmane.comp.bios.edk2.devel/12320

Commit range b41ef3251809..bba734ab4c7c.

Comment 29 jingzhao 2016-08-11 02:45:57 UTC
Reproduced with qemu-kvm-rhev-2.5.0-4.el7.x86_64.rpm and OVMF-20160419-2.git90bb4c5.el7.noarch.rpm.

Verified it with OVMF-20160419-2.git90bb4c5.el7.noarch.rpm and qemu-kvm-rhev-2.6.0-19.el7.x86_64

1. Boot guest with following cli:
/usr/libexec/qemu-kvm \
 -S \
 -name 'rhel7.3-64' \
 -machine q35,accel=kvm,usb=off,vmport=off \
 -drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
 -drive file=/home/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
 -m 4096 \
 -smp 4,maxcpus=4,cores=2,threads=2,sockets=1 \
 -cpu Nehalem \
 -rtc base=localtime,clock=host,driftfix=slew \
 -nodefaults \
 -vga qxl \
 -netdev tap,id=netdev0,vhost=on \
 -device virtio-net-pci,mac=BA:BC:13:83:4F:BD,id=net0,netdev=netdev0,status=on,bus=pcie.0 \
 -device ioh3420,bus=pcie.0,id=root.0,slot=1 \
 -device x3130-upstream,bus=root.0,id=upstream1 \
 -device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
 -device xio3130-downstream,bus=upstream1,id=downstream2,chassis=2 \
 -device xio3130-downstream,bus=upstream1,id=downstream3,chassis=3 \
 -device xio3130-downstream,bus=upstream1,id=downstream4,chassis=4 \
 -device xio3130-downstream,bus=upstream1,id=downstream5,chassis=5 \
 -device xio3130-downstream,bus=upstream1,id=downstream6,chassis=6 \
 -device virtio-scsi-pci,bus=downstream1,id=scsi_pci_bus0 \
 -drive file=/home/pxb-ovmf.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=0,physical_block_size=512,logical_block_size=512,serial=12345678900987654321,ver=SYSDISK,wwn=0x123,channel=0,scsi-id=0,lun=0 \
 -device virtio-scsi-pci,bus=downstream2,id=scsi_pci_bus1 \
 -device virtio-scsi-pci,bus=downstream3,id=scsi_pci_bus2 \
 -device virtio-scsi-pci,bus=downstream4,id=scsi_pci_bus3 \
 -boot menu=on \
 -enable-kvm \
 -monitor stdio \
 -spice port=5900,disable-ticketing \
 -qmp tcp:0:6666,server,nowait

2. rhel7.3 guest can boot up successfully

Comment 30 aihua liang 2016-09-12 03:35:54 UTC
Verified; the problem has been resolved.

Verified Version:
  Kernel Version:3.10.0-500.el7.x86_64
  qemu-kvm-version:qemu-kvm-rhev-2.6.0-22.el7.x86_64
  OVMF Version:OVMF-20160608-3.git988715a.el7.noarch


Verified Steps:
1.Start guest using cmds:
 /usr/libexec/qemu-kvm \
-name 'rhel7.3-64' \
-machine q35,accel=kvm,usb=off,vmport=off \
-drive file=/usr/share/OVMF/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/usr/share/OVMF/OVMF_Client_VARS.fd,if=pflash,format=raw,unit=1 \
-m 4096 \
-smp 4,maxcpus=4,cores=2,threads=2,sockets=1 \
-cpu Nehalem \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-vga qxl \
-netdev tap,id=netdev0,vhost=on \
-device virtio-net-pci,mac=BA:BC:13:83:4F:BD,id=net0,netdev=netdev0,status=on,bus=pcie.0 \
-device ioh3420,bus=pcie.0,id=root.0,slot=1 \
-device x3130-upstream,bus=root.0,id=upstream1 \
-device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
-device xio3130-downstream,bus=upstream1,id=downstream2,chassis=2 \
-device xio3130-downstream,bus=upstream1,id=downstream3,chassis=3 \
-device xio3130-downstream,bus=upstream1,id=downstream4,chassis=4 \
-device xio3130-downstream,bus=upstream1,id=downstream5,chassis=5 \
-device xio3130-downstream,bus=upstream1,id=downstream6,chassis=6 \
-device virtio-scsi-pci,bus=downstream1,id=scsi_pci_bus0 \
-drive file=/home/73test/script/ovmf/img/rhel72/rhel72_64.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_sysdisk,bus=scsi_pci_bus0.0,id=device_sysdisk,bootindex=0,physical_block_size=512,logical_block_size=512,serial=12345678900987654321,ver=SYSDISK,wwn=0x123,channel=0,scsi-id=0,lun=0 \
-device virtio-scsi-pci,bus=downstream2,id=scsi_pci_bus1 \
-device virtio-scsi-pci,bus=downstream3,id=scsi_pci_bus2 \
-device virtio-scsi-pci,bus=downstream4,id=scsi_pci_bus3 \
-boot menu=on \
-enable-kvm \
-monitor stdio \
-spice port=5900,disable-ticketing \
-qmp tcp:0:6666,server,nowait

2.In guest, execute "lspci"

Verified Result:
 Guest can start successfully.

 [root@dhcp-9-37 ~]# lspci
00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
00:03.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (rev 02)
02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
02:02.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
02:03.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
02:04.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
02:05.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (rev 01)
03:00.0 SCSI storage controller: Red Hat, Inc Device 1048 (rev 01)
04:00.0 SCSI storage controller: Red Hat, Inc Device 1048 (rev 01)
05:00.0 SCSI storage controller: Red Hat, Inc Device 1048 (rev 01)
06:00.0 SCSI storage controller: Red Hat, Inc Device 1048 (rev 01)

Comment 32 errata-xmlrpc 2016-11-04 08:40:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2608.html

