Occasionally boot hangs under qemu at the point shown. Probably about 1 time in 20. /usr/bin/qemu-kvm \ -drive file=test.img,cache=off,if=virtio \ -nodefconfig \ -machine accel=kvm:tcg \ -nodefaults \ -nographic \ -m 500 \ -no-reboot \ -no-hpet \ -device virtio-serial \ -serial stdio \ -chardev socket,path=/tmp/libguestfs3ISUDX/guestfsd.sock,id=channel0 \ -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \ -kernel /var/tmp/.guestfs-419/kernel.19164 \ -initrd /var/tmp/.guestfs-419/initrd.19164 \ -append 'panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=linux ' \ -drive file=/var/tmp/.guestfs-419/root.19164,snapshot=on,if=virtio,cache=unsafe libguestfs: accept_from_daemon: 0x2636ca0 g->state = 1 Could not access KVM kernel module: No such file or directory failed to initialize KVM: No such file or directory Back to tcg accelerator. Could not open option rom 'sgabios.bin': No such file or directory [ 0.000000] Initializing cgroup subsys cpuset [ 0.000000] Initializing cgroup subsys cpu [ 0.000000] Linux version 3.0-0.rc7.git3.1.fc16.x86_64 (mockbuild.fedoraproject.org) (gcc version 4.6.1 20110715 (Red Hat 4.6.1-3) (GCC) ) #1 SMP Fri Jul 15 22:56:12 UTC 2011 [ 0.000000] Command line: panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=linux [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: 0000000000000000 - 000000000009bc00 (usable) [ 0.000000] BIOS-e820: 000000000009bc00 - 00000000000a0000 (reserved) [ 0.000000] BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) [ 0.000000] BIOS-e820: 0000000000100000 - 000000001f3fd000 (usable) [ 0.000000] BIOS-e820: 000000001f3fd000 - 000000001f400000 (reserved) [ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved) [ 0.000000] NX (Execute Disable) protection: active [ 0.000000] DMI 2.4 present. [ 0.000000] No AGP bridge found [ 0.000000] last_pfn = 0x1f3fd max_arch_pfn = 0x400000000 [ 0.000000] x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 [ 0.000000] found SMP MP-table at [ffff8800000fdaf0] fdaf0 [ 0.000000] init_memory_mapping: 0000000000000000-000000001f3fd000 [ 0.000000] RAMDISK: 1f2d4000 - 1f3f0000 [ 0.000000] No NUMA configuration found [ 0.000000] Faking a node at 0000000000000000-000000001f3fd000 [ 0.000000] Initmem setup node 0 0000000000000000-000000001f3fd000 [ 0.000000] NODE_DATA [000000001f2bf000 - 000000001f2d3fff] [ 0.000000] Zone PFN ranges: [ 0.000000] DMA 0x00000010 -> 0x00001000 [ 0.000000] DMA32 0x00001000 -> 0x00100000 [ 0.000000] Normal empty [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[2] active PFN ranges [ 0.000000] 0: 0x00000010 -> 0x0000009b [ 0.000000] 0: 0x00000100 -> 0x0001f3fd [ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org [ 0.000000] Intel MultiProcessor Specification v1.4 [ 0.000000] MPTABLE: OEM ID: BOCHSCPU [ 0.000000] MPTABLE: Product ID: 0.1 [ 0.000000] MPTABLE: APIC at: 0xFEE00000 [ 0.000000] Processor #0 (Bootup-CPU) [ 0.000000] IOAPIC[0]: apic_id 1, version 17, address 0xfec00000, GSI 0-23 [ 0.000000] Processors: 1 [ 0.000000] SMP: Allowing 1 CPUs, 0 hotplug CPUs [ 0.000000] PM: Registered nosave memory: 000000000009b000 - 000000000009c000 [ 0.000000] PM: Registered nosave memory: 000000000009c000 - 00000000000a0000 [ 0.000000] PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000 [ 0.000000] PM: Registered nosave memory: 00000000000f0000 - 0000000000100000 [ 0.000000] Allocating PCI resources starting at 1f400000 (gap: 1f400000:e0bc0000) [ 0.000000] Booting paravirtualized kernel on bare hardware [ 0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:1 nr_node_ids:1 [ 0.000000] PERCPU: Embedded 475 pages/cpu @ffff88001f000000 s1916672 r8192 d20736 u2097152 [ 0.000000] Built 1 zonelists in Node order, mobility grouping on. Total pages: 126125 [ 0.000000] Policy zone: DMA32 [ 0.000000] Kernel command line: panic=1 console=ttyS0 udevtimeout=300 noapic acpi=off printk.time=1 cgroup_disable=memory selinux=0 guestfs_verbose=1 TERM=linux [ 0.000000] Disabling memory control group subsystem [ 0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes) [ 0.000000] Checking aperture... [ 0.000000] No AGP bridge found [ 0.000000] Memory: 473184k/511988k available (5134k kernel code, 468k absent, 38336k reserved, 6583k data, 2780k init) [ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 [ 0.000000] Hierarchical RCU implementation. [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled. [ 0.000000] RCU lockdep checking is enabled. [ 0.000000] NR_IRQS:33024 nr_irqs:256 16 [ 0.000000] Console: colour dummy device 80x25 [ 0.000000] console [ttyS0] enabled [ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar [ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8 [ 0.000000] ... MAX_LOCK_DEPTH: 48 [ 0.000000] ... MAX_LOCKDEP_KEYS: 8191 [ 0.000000] ... CLASSHASH_SIZE: 4096 [ 0.000000] ... MAX_LOCKDEP_ENTRIES: 16384 [ 0.000000] ... MAX_LOCKDEP_CHAINS: 32768 [ 0.000000] ... CHAINHASH_SIZE: 16384 [ 0.000000] memory used by lock dependency info: 6367 kB [ 0.000000] per task-struct memory footprint: 2688 bytes [ 0.000000] Fast TSC calibration using PIT [ 0.000000] Detected 2333.006 MHz processor. [ 0.008999] Calibrating delay loop (skipped), value calculated using timer frequency.. 4666.01 BogoMIPS (lpj=2333006) [ 0.010317] pid_max: default: 32768 minimum: 301 [ 0.025576] Security Framework initialized [ 0.028428] SELinux: Disabled at boot. [ 0.046736] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes) [ 0.051646] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes) [ 0.054478] Mount-cache hash table entries: 256 [ 0.109218] Initializing cgroup subsys cpuacct [ 0.110212] Initializing cgroup subsys memory [ 0.114265] Initializing cgroup subsys devices [ 0.114584] Initializing cgroup subsys freezer [ 0.115116] Initializing cgroup subsys net_cls [ 0.115376] Initializing cgroup subsys blkio [ 0.117156] Initializing cgroup subsys perf_event [ 0.124115] mce: CPU supports 10 MCE banks [ 0.126631] SMP alternatives: switching to UP code [ 0.350568] Freeing SMP alternatives: 12k freed [ 0.351793] ftrace: allocating 25740 entries in 101 pages [ 0.358954] CPU0: AMD QEMU Virtual CPU version 0.14.50 stepping 03
Does NOT seem to happen with qemu 2:0.15.0-0.1.20110718525e3df.fc16
I can't reproduce this with qemu-0.15.0-0.2.20110718525e3df.fc16.x86_64 either. However I'll make an observation: the next line in normal output would be the one about verifying the APIC: [ 0.405947] CPU0: AMD QEMU Virtual CPU version 0.14.50 stepping 03 [ 0.516930] APIC timer disabled due to verification failure We've had lots of problems with the qemu APIC in the past, so if we can reproduce this, we should try adding LIBGUESTFS_APPEND=noapic .
Closing since it appears to have "fixed itself" for some reason.
I've just seen this happen again with: 2:qemu-kvm-0.15.0-0.2.20110718525e3df.fc16.i686 Last messages before the hang: [ 0.000999] SELinux: Disabled at boot. [ 0.000999] Mount-cache hash table entries: 512 [ 0.000999] Initializing cgroup subsys cpuacct [ 0.000999] Initializing cgroup subsys memory [ 0.000999] Initializing cgroup subsys devices [ 0.000999] Initializing cgroup subsys freezer [ 0.000999] Initializing cgroup subsys net_cls [ 0.000999] Initializing cgroup subsys blkio [ 0.000999] Initializing cgroup subsys perf_event [ 0.000999] mce: CPU supports 10 MCE banks [ 0.000999] SMP alternatives: switching to UP code [ 0.000999] Freeing SMP alternatives: 12k freed [ 0.000999] ftrace: allocating 24340 entries in 48 pages [ 0.000999] Enabling APIC mode: Flat. Using 1 I/O APICs [ 0.000999] CPU0: AMD QEMU Virtual CPU version 0.14.50 stepping 03
This doesn't seem to happen in Rawhide with 2:qemu-kvm-0.15.0-1.fc17.i686. Can we backport 0.15.0 to F16?
Assign to correct component.
Unfortunately I've seen this again on Rawhide 2:qemu-kvm-0.15.0-3.fc17.x86_64. Failed build: http://koji.fedoraproject.org/koji/taskinfo?taskID=3296778
Is there some way to find out where it's hanging?
No, I tried to reproduce this, but I can't reproduce it locally. It only happens inside Koji which is like the worst place to do any debugging.
I tried to reproduce tris bug with our VPS. Problem occurs only when guest kernel has parameter "noapic". (In reply to comment #9) > No, I tried to reproduce this, but I can't reproduce it > locally. It only happens inside Koji which is like the > worst place to do any debugging.
Interesting ... I'm going to remove that parameter in the next release and see if it happens again.
This bug happens without noapic (ie. with APICs). I'm still trying to reproduce it locally in the hope of capturing a kernel dump when it happens.
After 1764 iterations of my boot test, I have reproduced this bug on my local machine! Now to capture some information from the KVM process ...
I captured %rip and the hang happens in the loop in arch/x86/kernel/apic.c : calibrate_APIC_clock http://lxr.linux.no/#linux+v3.0.4/arch/x86/kernel/apic/apic.c#L632 while (lapic_cal_loops <= LAPIC_CAL_LOOPS) cpu_relax(); <--- hangs here The assembly is: ffffffff81b849f3: eb 02 jmp ffffffff81b849f7 <setup_b oot_APIC_clock+0xa3> ffffffff81b849f5: f3 90 pause ffffffff81b849f7: 83 3d e6 f4 05 00 64 cmpl $0x64,0x5f4e6(%rip) # ffffffff81be3ee4 <lapic_cal_loops> ffffffff81b849fe: 7e f5 jle ffffffff81b849f5 <setup_boot_APIC_clock+0xa1> (%rip pointed to the pause instruction) Note that it consumes lots of CPU so it seems to be spinning in this loop, not stopped at the pause.
(In reply to comment #14) > while (lapic_cal_loops <= LAPIC_CAL_LOOPS) > cpu_relax(); <--- hangs here To clarify what I said: %rip points to that instruction. It doesn't "hang" there. It appears to be spinning in this loop forever.
Still happening occasionally. Note the strange times which I see often (but not always): [ 0.000000] Fast TSC calibration using PIT [ 0.000000] Detected 2532.902 MHz processor. [ 0.000504] Calibrating delay loop (skipped), value calculated using timer frequency.. 5065.80 BogoMIPS (lpj=2532902) [ 0.000999] pid_max: default: 32768 minimum: 301 [ 0.000999] Security Framework initialized [ 0.000999] SELinux: Disabled at boot. [ 0.000999] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes) [ 0.000999] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes) [ 0.000999] Mount-cache hash table entries: 256 [ 0.000999] Initializing cgroup subsys cpuacct [ 0.000999] Initializing cgroup subsys memory [ 0.000999] Initializing cgroup subsys devices [ 0.000999] Initializing cgroup subsys freezer [ 0.000999] Initializing cgroup subsys net_cls [ 0.000999] Initializing cgroup subsys blkio [ 0.000999] Initializing cgroup subsys perf_event [ 0.000999] mce: CPU supports 10 MCE banks [ 0.000999] SMP alternatives: switching to UP code [ 0.000999] Freeing SMP alternatives: 12k freed [ 0.000999] ftrace: allocating 25829 entries in 102 pages [ 0.000999] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 [ 0.000999] CPU0: AMD QEMU Virtual CPU version 0.15.0 stepping 03
With current version of kernel and qemu-kvm problem does not occurs. I think this bug can be closed.
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
Closing per Comment #17, if anyone hits it again please reopen