Description of problem: $ annotate-output +'%T:%N' guestfish -a /dev/null run -v [...] 22:17:54:457493465 E: [00514ms] /usr/bin/qemu-kvm [...] 22:17:54:460912406 E: Google, Inc. 22:17:54:464330486 E: Serial Graphics Adapter 01/14/12 22:17:54:467727552 E: SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (\ mockbuild@) Sat Jan 14 23:18:18 UTC 2012 22:17:54:471107681 E: Term: 80x24 22:17:54:474397757 E: 4 0 22:17:54:478236572 E: x1b[2J^MSeaBIOS (version 1.7.0-20120722_040125-)^M 22:17:54:481615939 E: ^M 22:17:54:624122918 E: ^MBooting from ROM...^M 22:17:57:767809071 E: ^MProbing EDD (edd=off to disable)... ok^M 22:17:58:033771349 E: x1b[2J[ 0.000000] Initializing cgroup subsys cpuset 22:17:58:037214742 E: [ 0.000000] Initializing cgroup subsys cpu Notice the 3 second pause 22:17:54:624122918 -> 22:17:57:767809071. That's 3/8ths of the entire time libguestfs takes to boot up. Version-Release number of selected component (if applicable): $ rpm -qf /usr/share/qemu/sgabios.bin qemu-system-x86-1.1.1-1.fc18.x86_64 How reproducible: 100% Steps to Reproduce: Run this command: annotate-output +'%T:%N' guestfish -a /dev/null run -v in Fedora 18+. Observe that it hangs for 3 seconds. FYI here is the full qemu command line we are using: /usr/bin/qemu-kvm -global virtio-blk-pci.scsi=off -nodefconfig -nodefaults -nographic -device virtio-scsi-pci,id=scsi -drive file=/dev/null,id=hd0,if=none -device scsi-hd,drive=hd0 -drive file=/var/tmp/.guestfs-1000/root.2326,snapshot=on,id=appliance,if=none,cache=unsafe -device scsi-hd,drive=appliance -machine accel=kvm:tcg -m 500 -no-reboot -no-hpet -device virtio-serial -serial stdio -device sga -chardev socket,path=/tmp/libguestfslnVlGO/guestfsd.sock,id=channel0 -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 -kernel /var/tmp/.guestfs-1000/kernel.2326 -initrd /var/tmp/.guestfs-1000/initrd.2326 -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm '
Also tested with qemu 2:1.2-0.1.20120806git3e430569.fc18. This also suffers from the same problem -- 3 second delay in sgabios. It also contains some sort of bug in virtio-scsi emulation so that the kernel crashes when the guest does 'insmod virtio_scsi.ko'. I'll open a separate bug about that.
Sorry, wrong component. The bios is located in the sgabios package: sgabios-bin-0-0.20110622SVN.fc17.noarch (I tried the later sgabios, but it is completely broken. I will file a separate bug ...)
... SeaBIOS ...
Even enabling full debugging in seabios didn't make the problem obvious, so I took a Fedora 17 (baremetal) host and updated it component by component to Fedora 18 until the bug manifested itself. The component to blame is the *host kernel* (NB: not the guest kernel). Updating from 3.4.0-1.fc17.x86_64 -> 3.6.0-0.rc1.git4.1.fc18.x86_64 is what triggers the problem. So I guess this is some sort of regression in the KVM module, perhaps causing guest memory or I/O access to be slower.
I tracked this down using gdb, and it does seem to be a regression (severe slow down) in the 'rep insb' instruction. (gdb) info registers eax 0x500011 5242897 ecx 0x354990 3492240 edx 0x511 1297 ebx 0x0 0 esp 0x6f40 0x6f40 ebp 0x6f40 0x6f40 esi 0x0 0 edi 0x2b1400 2823168 eip 0x15e 0x15e eflags 0x86 [ PF SF ] cs 0xc380 50048 ss 0x0 0 ds 0xc380 50048 es 0x10 16 fs 0x0 0 gs 0x0 0 (gdb) x/5i ($cs*16)+$eip 0xc395e: rep insb (%dx),%es:(%edi) <--- here 0xc3961: mov $0xa,%ax 0xc3964: mov $0x510,%dx 0xc3967: out %ax,(%dx) 0xc3968: mov $0x511,%dx Port 0x511 is PORT_QEMU_CFG_DATA, ie. fw_cfg data. I don't know the full size of the read or the destination, because by the time I hit ^C it had already copied maybe half of the data, so %ecx and %edi have already decremented/ incremented from the initial values. Lots of places in the seabios code call qemu_cfg_read and it could be any one of them, so still debugging this.
(In reply to comment #4) > Even enabling full debugging in seabios didn't make the > problem obvious, so I took a Fedora 17 (baremetal) host > and updated it component by component to Fedora 18 until > the bug manifested itself. > > The component to blame is the *host kernel* (NB: not the > guest kernel). > > Updating from 3.4.0-1.fc17.x86_64 -> 3.6.0-0.rc1.git4.1.fc18.x86_64 > is what triggers the problem. > > So I guess this is some sort of regression in the KVM module, > perhaps causing guest memory or I/O access to be slower. That has all of the various kernel debugging options enabled. They are known to slow things down. You might want to test a non-debug kernel. Grab http://koji.fedoraproject.org/koji/buildinfo?buildID=345823 or wait for the first -rc2 kernel to be built if that one has bugs that hold you up.
I don't know why there is a regression. However I do have a fix for this, which is to apply Gleb's (non-upstream) patch series to improve rep ins performance: http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95259 I applied this series on top of 3.6.0-0.rc1.git3.2 and essentially the pause goes away. Going from "Booting the ROM" to the first kernel message is about 0.8 seconds (versus 3 before).
(In reply to comment #6) > That has all of the various kernel debugging options enabled. They are > known to slow things down. You might want to test a non-debug kernel. Grab How can I find out if a kernel has debugging options enabled?
(In reply to comment #7) > http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95259 > > I applied this series on top of 3.6.0-0.rc1.git3.2 and > essentially the pause goes away. Going from "Booting the ROM" > to the first kernel message is about 0.8 seconds (versus 3 before). Just to be clearer about this. I tested two identical host kernels except that one had Gleb's patch series applied. So whether or not debugging is enabled in this kernel, the patch series still makes a big difference.
(In reply to comment #8) > (In reply to comment #6) > > That has all of the various kernel debugging options enabled. They are > > known to slow things down. You might want to test a non-debug kernel. Grab > > How can I find out if a kernel has debugging options enabled? If you're using rawhide, we build one kernel without debug set per RC/release. So the first build of rc1, rc2, etc will have them off. Otherwise, they're always enabled. Anything with -gitX will have debug options enabled.
(In reply to comment #6) > That has all of the various kernel debugging options enabled. They are > known to slow things down. You might want to test a non-debug kernel. What I did was I removed Gleb's patches and cherry picked d1027f01240c3426211a80146f8c36c2b5736865 (which disables debug options). So now I have a kernel which has debugging off, but is otherwise completely identical to the one I was testing before (eg. in comment 4). I rebooted the host with this kernel and ran the test again. The delay is now ~ 1.2 seconds, so it does seem likely that the regression was caused by adding debugging options. Nevertheless Gleb's patches are a clear benefit (even with debugging on).
OK I'm going to close this because I'm satisfied with the boot times I'm getting without debug enabled.