Bug 1271457

Summary: seabios has no output when booting a guest with more than 8 pci-bridge disks.
Product: Red Hat Enterprise Linux 7
Reporter: Qian Guo <qiguo>
Component: qemu-kvm-rhev
Assignee: Marcel Apfelbaum <marcel>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 7.2
CC: alex.williamson, jinzhao, juzhang, knoel, michen, virt-maint, xfu
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-23 11:55:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Attachments: The vnc window has no output.

Description Qian Guo 2015-10-14 03:14:38 UTC
Created attachment 1082676 [details]
The vnc window has no output.

Description of problem:
From the VDI monitor (I use VNC) there is no SeaBIOS output, and none from the SeaBIOS Unix socket either.

The guest can boot if there are fewer than 9 pci-bridge disks.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.3.0-29.el7.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Boot the guest with more than 8 pci-bridge disks and with a SeaBIOS debug socket:
...
-chardev socket,id=seabioslog_id_20151013-073546-vS9GRL3g,path=/tmp/seabios1,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20151013-073546-vS9GRL3g,iobase=0x402 \
...


Actual results:
From the SeaBIOS socket there is no output:
# nc -U /tmp/seabios1

From the VNC window, refer to the attachment.



Expected results:
SeaBIOS output is shown and the guest can boot.

Additional info:
The full command line that fails to boot:
/usr/libexec/qemu-kvm -net none -M pc -smp 4,cores=4,threads=1,sockets=1 -m 4G \
-name vm1 -vnc :1 -monitor stdio \
-vga cirrus \
-device pci-bridge,chassis_nr=1,id=bridge0,addr=0x12 \
-drive file=/home/rhel72cp1.raw,cache=none,if=none,id=drive-virtio-disk0,format=raw \
-device virtio-scsi-pci,bus=bridge0,addr=0x04,id=scsi1 \
-device scsi-hd,drive=drive-virtio-disk0,bus=scsi1.0,id=hd1 \
-serial unix:/tmp/bridge-con,server,nowait -boot menu=on \
-device pci-bridge,chassis_nr=2,id=bridge2,addr=0x03 \
-drive file=/home/image2,if=none,id=hd2,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd2,id=os-disk2,bus=bridge2,addr=0x04 \
-device pci-bridge,chassis_nr=3,id=bridge3,addr=0x04 \
-drive file=/home/image3,if=none,id=hd3,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd3,id=os-disk3,bus=bridge3,addr=0x04 \
-device pci-bridge,chassis_nr=4,id=bridge4,addr=0x05 \
-drive file=/home/image4,if=none,id=hd4,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd4,id=os-disk4,bus=bridge4,addr=0x04 \
-device pci-bridge,chassis_nr=5,id=bridge5,addr=0x06 \
-drive file=/home/image5,if=none,id=hd5,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd5,id=os-disk5,bus=bridge5,addr=0x04 \
-device pci-bridge,chassis_nr=6,id=bridge6,addr=0x07 \
-drive file=/home/image6,if=none,id=hd6,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd6,id=os-disk6,bus=bridge6,addr=0x04 \
-device pci-bridge,chassis_nr=7,id=bridge7,addr=0x08 \
-drive file=/home/image7,if=none,id=hd7,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd7,id=os-disk7,bus=bridge7,addr=0x04 \
-device pci-bridge,chassis_nr=8,id=bridge8,addr=0x09 \
-drive file=/home/image8,if=none,id=hd8,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd8,id=os-disk8,bus=bridge8,addr=0x04 \
-drive file=/home/image9,if=none,id=hd9,format=qcow2,werror=stop,rerror=stop -device virtio-blk-pci,scsi=off,drive=hd9,id=os-disk9,bus=bridge9,addr=0x04 \
-chardev socket,id=seabioslog_id_20151013-073546-vS9GRL3g,path=/tmp/seabios1,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20151013-073546-vS9GRL3g,iobase=0x402 \

Comment 1 Qian Guo 2015-10-14 03:16:08 UTC
# rpm -qa |grep seabios
seabios-bin-1.7.5-11.el7.noarch

Comment 3 Marcel Apfelbaum 2015-12-23 11:55:47 UTC
This is not a bug; let me explain why. The PCI-to-PCI bridge spec requires assigning each PCI bridge a 4K IO window, even if the devices behind the bridge require much less. Since the IO space is very limited (64K, if I remember correctly), there is a limit to how many PCI bridges can be given IO space.
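
The budget described above can be put into a back-of-envelope calculation. This is only a sketch: the 4 KiB window size and 64 KiB total come from the comment, while the reserve for bus 0 devices is an assumption chosen for illustration.

```shell
# 64 KiB of legacy x86 IO port space, 4 KiB minimum window per bridge.
# ASSUMPTION: ROOT_RESERVE is illustrative; the real amount claimed by
# bus 0 devices (VGA, serial, disk controllers...) varies by machine.
IO_SPACE=$((64 * 1024))
WINDOW=$((4 * 1024))
ROOT_RESERVE=$((2 * WINDOW))

echo $(( (IO_SPACE - ROOT_RESERVE) / WINDOW ))   # 14 windows left
```

With a larger, more realistic reserve for bus 0, the count drops toward the 9-10 bridges Marcel mentions.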

However, if you need more than 9-10 pci-bridges you can nest them (bridge behind bridge). This works because the nested bridges share the same IO range as the bridge on bus 0 above them.

Something like:


 `for i in {1..8}; do echo -device pci-bridge,chassis_nr=$i,id=b$i -device pci-bridge,chassis_nr=$((2*$i)),bus=b$i,addr=1; done`

This gives 16 bridges, and the pattern can be extended further.
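
Expanded, the one-liner above produces device pairs like the following. Note the `id=n$i` names for the nested bridges are added here for clarity and are not part of the original command:

```shell
# Build QEMU arguments for 8 top-level bridges, each carrying one
# nested bridge that shares its parent's IO window instead of
# claiming its own 4K range on bus 0.
args=""
for i in 1 2 3 4 5 6 7 8; do
  args="$args -device pci-bridge,chassis_nr=$i,id=b$i"
  args="$args -device pci-bridge,chassis_nr=$((2 * i)),id=n$i,bus=b$i,addr=1"
done
echo "$args"
```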

Comment 4 Alex Williamson 2016-04-27 15:20:11 UTC
Marcel, it may not be possible for SeaBIOS to give the devices all the ioport space they need in this configuration, but don't you think we should fail more gracefully here?  4K may be the granularity at which we can program ioport space to a subordinate bus, but assigning no ioport space is also an option.  Clearly we won't be able to program ioport resources of those devices behind bridges that don't have any allocated to the bridge, but not all devices require ioport space.  Even in the physical world, a device may expose ioport space, but only require it in very limited use cases, such as for boot, but not for runtime support.

IMO there is a bug here and SeaBIOS should fail more gracefully, perhaps with a warning output to the console about exhausted ioport resources.  I would expect robust physical hardware to do the same rather than simply failing to boot.

Comment 5 Marcel Apfelbaum 2016-04-27 16:12:39 UTC
(In reply to Alex Williamson from comment #4)
> Marcel, it may not be possible for SeaBIOS to give the devices all the
> ioport space they need in this configuration, but don't you think we should
> fail more gracefully here?  4K may be the granularity at which we can
> program ioport space to a subordinate bus, but assigning no ioport space is
> also an option.  Clearly we won't be able to program ioport resources of
> those devices behind bridges that don't have any allocated to the bridge,
> but not all devices require ioport space.  Even in the physical world, a
> device may expose ioport space, but only require it in very limited use
> cases, such as for boot, but not for runtime support.
> 

Hi Alex,
Thank you for having a look at this problem.

> IMO there is a bug here and SeaBIOS should fail more gracefully, perhaps
> with a warning output to the console about exhausted ioport resources.  I
> would expect robust physical hardware to do the same rather than simply
> failing to boot.

Once the IO space is exhausted, SeaBIOS fails with a panic message
that can be seen, at least, with the debug device enabled. By the way,
halting is the "standard" SeaBIOS behavior whenever any kind of
resource can't be assigned, not only IO.

Michael had another idea regarding what SeaBIOS should do when no IO space
remains: instead of panic, just skip assigning IO space to those bridges.

I didn't quite like this approach because it is not a 'fair' way to assign
resources: the devices behind a bridge that got IO may not need it, while
devices behind a skipped bridge may.

I preferred another approach: skip assigning IO to bridges whose attached
devices do not require IO.

Thanks,
Marcel
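
The allocation policy Marcel prefers, combined with the "hobble along on exhaustion" behavior discussed in the following comments, could be sketched like this. This is hypothetical pseudologic, not actual SeaBIOS code: each bridge is reduced to a line of per-device IO BAR sizes, where 0 means the device requests no IO ports.

```shell
# Give a 4 KiB IO window only to bridges whose subordinate devices
# actually request IO ports; when the space runs out, record the
# shortfall instead of panicking.
assign_io() {
  remaining=$((64 * 1024))
  window=4096
  i=0
  while read -r sizes; do
    need=0
    for s in $sizes; do
      [ "$s" -gt 0 ] && need=1
    done
    if [ $need -eq 0 ]; then
      echo "bridge$i: 0"              # no device needs IO: skip bridge
    elif [ $remaining -lt $window ]; then
      echo "bridge$i: 0 (exhausted)"  # report and hobble along
    else
      echo "bridge$i: $window"
      remaining=$((remaining - window))
    fi
    i=$((i + 1))
  done
}

# Four bridges: no-IO, IO, no-IO, IO -- only the 2nd and 4th get windows.
printf '0\n16 0\n0 0\n256\n' | assign_io
```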

Comment 6 Alex Williamson 2016-04-27 16:30:58 UTC
Hi Marcel,

(In reply to Marcel Apfelbaum from comment #5)
> (In reply to Alex Williamson from comment #4)
> > IMO there is a bug here and SeaBIOS should fail more gracefully, perhaps
> > with a warning output to the console about exhausted ioport resources.  I
> > would expect robust physical hardware to do the same rather than simply
> > failing to boot.
> 
> Once the IO space is exhausted, SeaBIOS fails with a panic message
> that can be seen, at least, with the debug device enabled. By the way,
> halting is the "standard" SeaBIOS behavior whenever any kind of
> resource can't be assigned, not only IO.
> 
> Michael had another idea regarding what SeaBIOS should do when no IO space
> remains: instead of panic, just skip assigning IO space to those bridges.

Right, once ioport space is exhausted, simply stop assigning it, report error, and hobble along with a partially programmed PCI space.  The guest may be able to reallocate the buses with more knowledge of the specific devices or the guest drivers may not actually make use of the unprogrammed ioport space.

> I didn't quite like this approach because it is not a 'fair' way to assign
> resources: the devices behind a bridge that got IO may not need it, while
> devices behind a skipped bridge may.

At the BIOS level we can only determine which devices are requesting ioport resources, whether they actually need it or not is for the option ROM and guest level drivers to decide.
 
> I preferred another approach: skip assigning IO to bridges whose attached
> devices do not require IO.

Sure, that seems like a bug in itself if SeaBIOS is wasting ioport resources where none is actually used by subordinate devices, but that's really a corner case to the problem of ioport space being exhausted.  This seems like a case where we could improve SeaBIOS to an "enterprise class" by adding robustness to fail gracefully and predictably in this case of resource exhaustion to achieve a better user experience.  Of course libvirt could also make a default configuration that gives SeaBIOS a better chance of success, per your suggestions, which is being addressed in bug 1320908 and led me to this one.  Thanks

Comment 7 Marcel Apfelbaum 2016-04-27 19:09:08 UTC
(In reply to Alex Williamson from comment #6)
> Hi Marcel,
> 
> (In reply to Marcel Apfelbaum from comment #5)
> > (In reply to Alex Williamson from comment #4)
> > > IMO there is a bug here and SeaBIOS should fail more gracefully, perhaps
> > > with a warning output to the console about exhausted ioport resources.  I
> > > would expect robust physical hardware to do the same rather than simply
> > > failing to boot.
> > 
> > Once the IO space is exhausted, SeaBIOS fails with a panic message
> > that can be seen, at least, with the debug device enabled. By the way,
> > halting is the "standard" SeaBIOS behavior whenever any kind of
> > resource can't be assigned, not only IO.
> > 
> > Michael had another idea regarding what SeaBIOS should do when no IO space
> > remains: instead of panic, just skip assigning IO space to those bridges.

Hi Alex,

> 
> Right, once ioport space is exhausted, simply stop assigning it, report
> error, and hobble along with a partially programmed PCI space.  The guest
> may be able to reallocate the buses with more knowledge of the specific
> devices or the guest drivers may not actually make use of the unprogrammed
> ioport space.
> 

OK, since both you and Michael think it is a plausible solution, I can give it a try.
How do you propose to report the error? I didn't check yet, but I think the
resource-allocation step runs before the VGA drivers are initialized, and the
other way would be to report back to QEMU. However, the fw_cfg channel goes only one way.

> > I didn't quite like this approach because it is not a 'fair' way to assign
> > resources: the devices behind a bridge that got IO may not need it, while
> > devices behind a skipped bridge may.
> 
> At the BIOS level we can only determine which devices are requesting ioport
> resources, whether they actually need it or not is for the option ROM and
> guest level drivers to decide.
>  
> > I preferred another approach: skip assigning IO to bridges whose attached
> > devices do not require IO.
> 
> Sure, that seems like a bug in itself if SeaBIOS is wasting ioport resources
> where none is actually used by subordinate devices, but that's really a
> corner case to the problem of ioport space being exhausted.  This seems like
> a case where we could improve SeaBIOS to an "enterprise class" by adding
> robustness to fail gracefully and predictably in this case of resource
> exhaustion to achieve a better user experience.  Of course libvirt could
> also make a default configuration that gives SeaBIOS a better chance of
> success, per your suggestions, which is being addressed in bug 1320908 and
> led me to this one.  Thanks

I get your point: we need a way to report back to QEMU. I'll speak with Gerd about it.
Thanks,
Marcel

Comment 8 Alex Williamson 2016-04-27 19:31:35 UTC
What would we expect to happen on bare metal where we don't have a hypervisor?  The BIOS would probably spit out some warning on the console (default VGA) with a timeout or key press to continue.  I would think SeaBIOS could do something similar.  The error may occur prior to VGA init, but we don't necessarily need to post the warning immediately at the point we hit the exhaustion.  We could simply flag that it occurred, write out the details to the debug port, and splash the warning to the console after it posts.  I'm not sure what QEMU would do if it were notified of the error other than just an error_report().  Thanks