Red Hat Bugzilla – Bug 744426
Big slowdown, kernel reports "Clocksource tsc unstable", with Linux 3.1-rc9 when run in qemu
Last modified: 2012-11-24 10:05:20 EST
Description of problem:
When booting Linux 3.1-rc9 in qemu (with TCG), there is a big
slowdown during boot. It is much slower than Linux 3.1-rc6.
In some cases it takes up to 2 minutes to do this:
[ 19.059454] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[ 134.360659] Clocksource tsc unstable (delta = 115167492624 ns)
[ 134.435965] Switching to clocksource jiffies
[ 134.502575] ifconfig used greatest stack depth: 2992 bytes left
Here is another example from a different host machine. The
delay is only 15 seconds, but still much longer than it used to be:
[ 15.908844] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[ 33.685023] lvm used greatest stack depth: 2880 bytes left
[ 39.742930] Clocksource tsc unstable (delta = 83125775 ns)
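As a sanity check (my arithmetic, not part of the original report), the reported deltas line up with the printk timestamps in the first excerpt: the first delta is roughly 115 seconds, matching the jump from 19 s to 134 s, while the second is only about 83 ms:

```shell
# Convert the two reported deltas from nanoseconds to seconds.
awk 'BEGIN { printf "delta1 = %.3f s\n", 115167492624 / 1e9 }'
awk 'BEGIN { printf "delta2 = %.3f s\n",     83125775 / 1e9 }'
```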
Version-Release number of selected component (if applicable):
kernel-3.1.0-0.rc9.git0.0.fc17.x86_64 (but rc6 is GOOD)
Steps to Reproduce:
1. Just boot the kernel in qemu with TCG.
Full speed ahead.
I'm going to git bisect this one, since we have a pretty
narrow range of possible commits to test.
No slowdown with regular accelerated KVM.
Have not tested on baremetal.
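When comparing the TCG and KVM cases, one way (my addition, not from the report) to see which clocksource a booted guest actually ended up with is the kernel's standard sysfs interface:

```shell
# These files are provided by the generic clocksource framework on
# any modern Linux system. In the failing case, current_clocksource
# should read "jiffies" after the kernel gives up on the TSC.
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
```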
Did you ever bisect this? Does it still happen with 3.1.0-7.fc16? Do you have a command line to recreate using TCG?
I'm not sure why you think this is the kernel's fault. TCG, as I understand it, is dynamic translation of instructions on the fly. If an instruction stream changes, it seems perfectly reasonable that performance can vary.
This is still happening with kernel 3.2.0-0.rc0.git4.1.fc17
from a few days ago. It's definitely a bug, although probably
one in qemu: a step shouldn't go from taking no time to taking
several minutes without us understanding exactly why.
I've not had time to bisect this.
I cannot reproduce this locally (only in Koji, where it
happens all the time), but if it were to be reproducible
then something like this should do it:
cat > sleep.c <<'EOF'
#include <unistd.h>

int
main (void)
{
  write (2, "sleeping\n", 9);
  write (2, "exiting\n", 8);
  return 0;
}
EOF
gcc -static sleep.c -o init
echo 'init' | cpio -o -c > /tmp/initrd
qemu-system-x86_64 -nodefconfig -machine accel=tcg -nodefaults \
    -nographic -m 500 -no-reboot -no-hpet \
    -device virtio-serial -serial stdio \
    -kernel /boot/vmlinuz-3.1.0-0.rc9.git0.0.fc17.x86_64 \
    -initrd /tmp/initrd \
    -append 'panic=1 console=ttyS0 no_timer_check acpi=off'
From: "Richard W.M. Jones" <email@example.com>
But why do we have all this timer detection code running in the virt
path? It makes no sense for Linux guests to have flaky timing loops
and calculations, when all of this information is already known by the
hypervisor and could just be passed up to the guest. Why don't we
just put all of this stuff in the dmi information about what timers
are available and what speeds they run at, and pass the whole lot over
to Linux and be done with it?
*** Bug 870042 has been marked as a duplicate of this bug. ***
Please let us know whether the lpj= setting resolves the problem for you.
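For reference (my paraphrase, not part of the original comment): lpj= pins the kernel's loops-per-jiffy value, so the delay-loop calibration at boot is skipped; under TCG that calibration can take an unpredictable amount of guest time. A plausible way to use it, with a placeholder value rather than one from this bug, is:

```shell
# Read the value the kernel calibrated on an earlier, healthy boot.
# dmesg prints it as, e.g., "... BogoMIPS (lpj=3392106)".
dmesg | grep -o 'lpj=[0-9]*'

# Then pass that value on the guest kernel command line, e.g.:
#   -append 'lpj=3392106 panic=1 console=ttyS0 no_timer_check acpi=off'
```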
I'm going to close this one because it's old and the slowdown
no longer occurs.
However I am going to try your lpj= suggestion to see if
it improves the clocksource stability problems under TCG.
I tested this again and it does seem to have gone away.
I have also pushed a patch to libguestfs so it will try to
pass the lpj=... parameter (TCG only):