Bug 847546 - seabios sits for 3 seconds doing nothing during boot
seabios sits for 3 seconds doing nothing during boot
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel (Show other bugs)
rawhide
Unspecified Unspecified
unspecified Severity unspecified
: ---
: ---
Assigned To: Kernel Maintainer List
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-12 17:26 EDT by Richard W.M. Jones
Modified: 2013-01-09 07:06 EST (History)
14 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-13 12:33:39 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Richard W.M. Jones 2012-08-12 17:26:36 EDT
Description of problem:

$ annotate-output +'%T:%N' guestfish -a /dev/null run -v
[...]
22:17:54:457493465 E: [00514ms] /usr/bin/qemu-kvm [...]
22:17:54:460912406 E: Google, Inc.
22:17:54:464330486 E: Serial Graphics Adapter 01/14/12
22:17:54:467727552 E: SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (\
mockbuild@) Sat Jan 14 23:18:18 UTC 2012
22:17:54:471107681 E: Term: 80x24
22:17:54:474397757 E: 4 0
22:17:54:478236572 E: x1b[2J^MSeaBIOS (version 1.7.0-20120722_040125-)^M
22:17:54:481615939 E: ^M
22:17:54:624122918 E: ^MBooting from ROM...^M
22:17:57:767809071 E: ^MProbing EDD (edd=off to disable)... ok^M
22:17:58:033771349 E: x1b[2J[    0.000000] Initializing cgroup subsys cpuset
22:17:58:037214742 E: [    0.000000] Initializing cgroup subsys cpu

Notice the 3 second pause 22:17:54:624122918 -> 22:17:57:767809071.
That's 3/8ths of the entire time libguestfs takes to boot up.

Version-Release number of selected component (if applicable):

$ rpm -qf /usr/share/qemu/sgabios.bin
qemu-system-x86-1.1.1-1.fc18.x86_64

How reproducible:

100%

Steps to Reproduce:

Run this command:
  annotate-output +'%T:%N' guestfish -a /dev/null run -v
in Fedora 18+.

Observe that it hangs for 3 seconds.

FYI here is the full qemu command line we are using:

/usr/bin/qemu-kvm     -global virtio-blk-pci.scsi=off     -nodefconfig     -nodefaults     -nographic     -device virtio-scsi-pci,id=scsi     -drive file=/dev/null,id=hd0,if=none     -device scsi-hd,drive=hd0     -drive file=/var/tmp/.guestfs-1000/root.2326,snapshot=on,id=appliance,if=none,cache=unsafe     -device scsi-hd,drive=appliance     -machine accel=kvm:tcg     -m 500     -no-reboot     -no-hpet     -device virtio-serial     -serial stdio     -device sga     -chardev socket,path=/tmp/libguestfslnVlGO/guestfsd.sock,id=channel0     -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0     -kernel /var/tmp/.guestfs-1000/kernel.2326     -initrd /var/tmp/.guestfs-1000/initrd.2326     -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm '
Comment 1 Richard W.M. Jones 2012-08-12 17:37:36 EDT
Also tested with qemu 2:1.2-0.1.20120806git3e430569.fc18.

This also suffers from the same problem -- 3 second delay in
sgabios.

It also contains some sort of bug in virtio-scsi emulation
so that the kernel crashes when the guest does 'insmod virtio_scsi.ko'.
I'll open a separate bug about that.
Comment 2 Richard W.M. Jones 2012-08-12 17:55:19 EDT
Sorry, wrong component.  The bios is located in the
sgabios package:

sgabios-bin-0-0.20110622SVN.fc17.noarch

(I tried the later sgabios, but it is completely
broken.  I will file a separate bug ...)
Comment 3 Richard W.M. Jones 2012-08-12 18:29:23 EDT
... SeaBIOS ...
Comment 4 Richard W.M. Jones 2012-08-13 08:18:32 EDT
Even enabling full debugging in seabios didn't make the
problem obvious, so I took a Fedora 17 (baremetal) host
and updated it component by component to Fedora 18 until
the bug manifested itself.

The component to blame is the *host kernel* (NB: not the
guest kernel).

Updating from 3.4.0-1.fc17.x86_64 -> 3.6.0-0.rc1.git4.1.fc18.x86_64
is what triggers the problem.

So I guess this is some sort of regression in the KVM module,
perhaps causing guest memory or I/O access to be slower.
Comment 5 Richard W.M. Jones 2012-08-13 09:00:50 EDT
I tracked this down using gdb, and it does seem to be a
regression (severe slow down) in the 'rep insb' instruction.

(gdb) info registers 
eax            0x500011 5242897
ecx            0x354990 3492240
edx            0x511    1297
ebx            0x0      0
esp            0x6f40   0x6f40
ebp            0x6f40   0x6f40
esi            0x0      0
edi            0x2b1400 2823168
eip            0x15e    0x15e
eflags         0x86     [ PF SF ]
cs             0xc380   50048
ss             0x0      0
ds             0xc380   50048
es             0x10     16
fs             0x0      0
gs             0x0      0
(gdb) x/5i ($cs*16)+$eip
   0xc395e:     rep insb (%dx),%es:(%edi)   <--- here
   0xc3961:     mov    $0xa,%ax
   0xc3964:     mov    $0x510,%dx
   0xc3967:     out    %ax,(%dx)
   0xc3968:     mov    $0x511,%dx

Port 0x511 is PORT_QEMU_CFG_DATA, ie. fw_cfg data.

I don't know the full size of the read or the destination,
because by the time I hit ^C it had already copied maybe
half of the data, so %ecx and %edi have already decremented/
incremented from the initial values.

Lots of places in the seabios code call qemu_cfg_read and
it could be any one of them, so still debugging this.
Comment 6 Josh Boyer 2012-08-13 09:55:39 EDT
(In reply to comment #4)
> Even enabling full debugging in seabios didn't make the
> problem obvious, so I took a Fedora 17 (baremetal) host
> and updated it component by component to Fedora 18 until
> the bug manifested itself.
> 
> The component to blame is the *host kernel* (NB: not the
> guest kernel).
> 
> Updating from 3.4.0-1.fc17.x86_64 -> 3.6.0-0.rc1.git4.1.fc18.x86_64
> is what triggers the problem.
> 
> So I guess this is some sort of regression in the KVM module,
> perhaps causing guest memory or I/O access to be slower.

That has all of the various kernel debugging options enabled.  They are known to slow things down.  You might want to test a non-debug kernel.  Grab 

http://koji.fedoraproject.org/koji/buildinfo?buildID=345823

or wait for the first -rc2 kernel to be built if that one has bugs that hold you up.
Comment 7 Richard W.M. Jones 2012-08-13 10:35:47 EDT
I don't know why there is a regression.

However I do have a fix for this, which is to apply Gleb's
(non-upstream) patch series to improve rep ins performance:

http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95259

I applied this series on top of 3.6.0-0.rc1.git3.2 and
essentially the pause goes away.  Going from "Booting the ROM"
to the first kernel message is about 0.8 seconds (versus 3 before).
Comment 8 Richard W.M. Jones 2012-08-13 10:37:22 EDT
(In reply to comment #6)
> That has all of the various kernel debugging options enabled.  They are
> known to slow things down.  You might want to test a non-debug kernel.  Grab 

How can I find out if a kernel has debugging options enabled?
Comment 9 Richard W.M. Jones 2012-08-13 10:43:18 EDT
(In reply to comment #7)
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95259
> 
> I applied this series on top of 3.6.0-0.rc1.git3.2 and
> essentially the pause goes away.  Going from "Booting the ROM"
> to the first kernel message is about 0.8 seconds (versus 3 before).

Just to be clearer about this.  I tested two identical host
kernels except that one had Gleb's patch series applied.  So
whether or not debugging is enabled in this kernel, the patch
series still makes a big difference.
Comment 10 Josh Boyer 2012-08-13 10:46:29 EDT
(In reply to comment #8)
> (In reply to comment #6)
> > That has all of the various kernel debugging options enabled.  They are
> > known to slow things down.  You might want to test a non-debug kernel.  Grab 
> 
> How can I find out if a kernel has debugging options enabled?

If you're using rawhide, we build one kernel without debug set per RC/release.  So the first build of rc1, rc2, etc will have them off.  Otherwise, they're always enabled.  Anything with -gitX will have debug options enabled.
Comment 11 Richard W.M. Jones 2012-08-13 12:27:16 EDT
(In reply to comment #6)
> That has all of the various kernel debugging options enabled.  They are
> known to slow things down.  You might want to test a non-debug kernel.

What I did was I removed Gleb's patches and cherry picked
d1027f01240c3426211a80146f8c36c2b5736865 (which disables
debug options).

So now I have a kernel which has debugging off, but is otherwise
completely identical to the one I was testing before (eg. in
comment 4).

I rebooted the host with this kernel and ran the test again.

The delay is now ~ 1.2 seconds, so it does seem likely that
the regression was caused by adding debugging options.

Nevertheless Gleb's patches are a clear benefit (even with
debugging on).
Comment 12 Richard W.M. Jones 2012-08-13 12:33:39 EDT
OK I'm going to close this because I'm satisfied with
the boot times I'm getting without debug enabled.

Note You need to log in before you can comment on or make changes to this bug.