Bug 998065 - libguestfs kernel hang in RHEL 6.5
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libguestfs
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assigned To: Richard W.M. Jones
QA Contact: Virtualization Bugs
Depends On:
Blocks:
Reported: 2013-08-16 18:46 EDT by Colin Walters
Modified: 2014-05-20 07:05 EDT
CC List: 5 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 998108
Environment:
Last Closed: 2014-05-20 07:05:16 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Colin Walters 2013-08-16 18:46:40 EDT
This is almost certainly a regression in kernel 2.6.32-410.el6.x86_64, but filing here.

[root@pluto libvirt]# LIBGUESTFS_DEBUG=1 guestfish -a /var/lib/libvirt/gnome-ostree-local.img --ro -m /dev/sda3 -m /dev/sda1:/boot
libguestfs: create: flags = 0, handle = 0x22f1540
libguestfs: launch: attach-method=appliance
libguestfs: launch: tmpdir=/tmp/libguestfsQ8lPnC
libguestfs: launch: umask=0022
libguestfs: launch: euid=0
libguestfs: command: run: febootstrap-supermin-helper
libguestfs: command: run: \ --verbose
libguestfs: command: run: \ -f checksum
libguestfs: command: run: \ /usr/lib64/guestfs/supermin.d
libguestfs: command: run: \ x86_64
supermin helper [00000ms] whitelist = (not specified), host_cpu = x86_64, kernel = (null), initrd = (null), appliance = (null)
supermin helper [00000ms] inputs[0] = /usr/lib64/guestfs/supermin.d
checking modpath /lib/modules/2.6.32-381.el6.x86_64 is a directory
picked vmlinuz-2.6.32-381.el6.x86_64 because modpath /lib/modules/2.6.32-381.el6.x86_64 exists
checking modpath /lib/modules/2.6.32-400.el6.x86_64 is a directory
picked vmlinuz-2.6.32-400.el6.x86_64 because modpath /lib/modules/2.6.32-400.el6.x86_64 exists
checking modpath /lib/modules/2.6.32-410.el6.x86_64 is a directory
picked vmlinuz-2.6.32-410.el6.x86_64 because modpath /lib/modules/2.6.32-410.el6.x86_64 exists
supermin helper [00001ms] finished creating kernel
supermin helper [00002ms] visiting /usr/lib64/guestfs/supermin.d
supermin helper [00002ms] visiting /usr/lib64/guestfs/supermin.d/base.img
supermin helper [00002ms] visiting /usr/lib64/guestfs/supermin.d/daemon.img
supermin helper [00002ms] visiting /usr/lib64/guestfs/supermin.d/hostfiles
supermin helper [00022ms] visiting /usr/lib64/guestfs/supermin.d/init.img
supermin helper [00022ms] visiting /usr/lib64/guestfs/supermin.d/udev-rules.img
supermin helper [00022ms] adding kernel modules
supermin helper [00057ms] finished creating appliance
libguestfs: checksum of existing appliance: 99c7bcc2b5b4a1d498810ddcb2dc02b23cc4fcab261201e01e5df4f12ba112e7
libguestfs: [00061ms] begin testing qemu features
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -nographic
libguestfs: command: run: \ -help
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -nographic
libguestfs: command: run: \ -version
libguestfs: qemu version 0.12
libguestfs: command: run: /usr/libexec/qemu-kvm
libguestfs: command: run: \ -nographic
libguestfs: command: run: \ -machine accel=kvm:tcg
libguestfs: command: run: \ -device ?
libguestfs: [00183ms] finished testing qemu features
libguestfs: accept_from_daemon: 0x22f1540 g->state = 1
[00184ms] /usr/libexec/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -nodefaults \
    -nographic \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/var/lib/libvirt/gnome-ostree-local.img,snapshot=on,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-0/root.14884,snapshot=on,id=appliance,if=none,cache=unsafe \
    -device scsi-hd,drive=appliance \
    -machine accel=kvm:tcg \
    -m 500 \
    -no-reboot \
    -device virtio-serial \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfsQ8lPnC/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -kernel /var/tmp/.guestfs-0/kernel.14884 \
    -initrd /var/tmp/.guestfs-0/initrd.14884 \
    -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm'
Google, Inc.
Serial Graphics Adapter 07/26/11
SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $ (mockbuild@hs20-bc2-3.build.redhat.com) Tue Jul 26 15:05:08 UTC 2011
Term: 80x24
4 0
SeaBIOS (version seabios-0.6.1.2-28.el6)
Probing EDD (edd=off to disable)... ok
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-410.el6.x86_64 (mockbuild@x86-002.build.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 7 12:07:46 EDT 2013
Command line: panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
Disabled fast string operations
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009d800 (usable)
 BIOS-e820: 000000000009d800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001f3fd000 (usable)
 BIOS-e820: 000000001f3fd000 - 000000001f400000 (reserved)
 BIOS-e820: 00000000fffbc000 - 0000000100000000 (reserved)
DMI 2.4 present.
SMBIOS version 2.4 @ 0xFDA40
Hypervisor detected: KVM
last_pfn = 0x1f3fd max_arch_pfn = 0x400000000
PAT not supported by CPU.
init_memory_mapping: 0000000000000000-000000001f3fd000
RAMDISK: 1f1a8000 - 1f3efc00
No NUMA configuration found
Faking a node at 0000000000000000-000000001f3fd000
Bootmem setup node 0 0000000000000000-000000001f3fd000
  NODE_DATA [0000000000009000 - 000000000003cfff]
  bootmap [000000000003d000 -  0000000000040e7f] pages 4
(7 early reservations) ==> bootmem [0000000000 - 001f3fd000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0001000000 - 000201fae4]    TEXT DATA BSS ==> [0001000000 - 000201fae4]
  #3 [001f1a8000 - 001f3efc00]          RAMDISK ==> [001f1a8000 - 001f3efc00]
  #4 [000009d800 - 0000100000]    BIOS reserved ==> [000009d800 - 0000100000]
  #5 [0002020000 - 0002020059]              BRK ==> [0002020000 - 0002020059]
  #6 [0000008000 - 0000009000]          PGTABLE ==> [0000008000 - 0000009000]
found SMP MP-table at [ffff8800000fda60] fda60
kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: cpu 0, msr 0:1c277c1, boot clock
Zone PFN ranges:
  DMA      0x00000001 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000001 -> 0x0000009d
    0: 0x00000100 -> 0x0001f3fd
SFI: Simple Firmware Interface v0.7 http://simplefirmware.org
Intel MultiProcessor Specification v1.4
MPTABLE: OEM ID: BOCHSCPU
MPTABLE: Product ID: 0.1         
MPTABLE: APIC at: 0xFEE00000
Processor #0 (Bootup-CPU)
I/O APIC #0 Version 17 at 0xFEC00000.
Processors: 1
SMP: Allowing 1 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 000000000009d000 - 000000000009e000
PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
Allocating PCI resources starting at 1f400000 (gap: 1f400000:e0bbc000)
Booting paravirtualized kernel on KVM
NR_CPUS:4096 nr_cpumask_bits:1 nr_cpu_ids:1 nr_node_ids:1
PERCPU: Embedded 31 pages/cpu @ffff880002200000 s94872 r8192 d23912 u2097152
pcpu-alloc: s94872 r8192 d23912 u2097152 alloc=1*2097152
pcpu-alloc: [0] 0 
kvm-clock: cpu 0, msr 0:22167c1, primary cpu clock
kvm-stealtime: cpu 0, msr 220e880
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 126045
Policy zone: DMA32
Kernel command line: panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=xterm
[    0.000000] Disabling memory control group subsystem
[    0.000000] PID hash table entries: 2048 (order: 2, 16384 bytes)
[    0.000000] Checking aperture...
[    0.000000] No AGP bridge found
[    0.000000] Memory: 483936k/511988k available (5295k kernel code, 400k absent, 27652k reserved, 7053k data, 1268k init)
[    0.000000] Hierarchical RCU implementation.
[    0.000000] NR_IRQS:33024 nr_irqs:256
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [ttyS0] enabled
[    0.000000] Detected 2591.580 MHz processor.
[    0.001999] Calibrating delay loop (skipped) preset value.. 5183.16 BogoMIPS (lpj=2591580)
[    0.001999] pid_max: default: 32768 minimum: 301
[    0.002130] Security Framework initialized
[    0.002422] SELinux:  Disabled at boot.
[    0.003066] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
[    0.003648] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
[    0.004058] Mount-cache hash table entries: 256
[    0.004728] Initializing cgroup subsys ns
[    0.005007] Initializing cgroup subsys cpuacct
[    0.005341] Initializing cgroup subsys memory
[    0.005648] Initializing cgroup subsys devices
[    0.006005] Initializing cgroup subsys freezer
[    0.006294] Initializing cgroup subsys net_cls
[    0.006616] Initializing cgroup subsys blkio
[    0.007010] Initializing cgroup subsys perf_event
[    0.007329] Initializing cgroup subsys net_prio
[    0.008033] Disabled fast string operations
[    0.008568] mce: CPU supports 10 MCE banks
[    0.008902] alternatives: switching to unfair spinlock
[    0.011475] SMP alternatives: switching to UP code
[    0.022889] Freeing SMP alternatives: 36k freed
[    0.023020] ftrace: converting mcount calls to 0f 1f 44 00 00
[    0.023382] ftrace: allocating 21719 entries in 86 pages
[    0.027066] APIC routing finalized to flat.
[    0.027591] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.027998] CPU0: Intel QEMU Virtual CPU version (cpu64-rhel6) stepping 03
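For anyone comparing the kernel that the appliance boots against the kernel running on the host, a minimal Python sketch along these lines can pull the relevant versions out of LIBGUESTFS_DEBUG=1 output like the log above. The function name and both regular expressions are illustrative assumptions based only on the "picked vmlinuz-..." and "Linux version ..." lines shown in this report; they are not part of libguestfs or supermin.

```python
import re

# Both patterns are assumptions derived from the log lines quoted in this bug.
PICKED_RE = re.compile(r"^picked vmlinuz-(\S+) because modpath")
BOOTED_RE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?Linux version (\S+)")


def appliance_kernels(log_text):
    """Return (last kernel the helper logged as picked, kernel that booted).

    Either element is None if the corresponding line is not in the log.
    """
    picked = None
    booted = None
    for line in log_text.splitlines():
        line = line.strip()
        m = PICKED_RE.match(line)
        if m:
            # The helper logs one line per candidate. Keeping the last one is
            # an assumption that happens to match the log in this report.
            picked = m.group(1)
            continue
        m = BOOTED_RE.match(line)
        if m and booted is None:
            booted = m.group(1)
    return picked, booted


SAMPLE = """\
picked vmlinuz-2.6.32-381.el6.x86_64 because modpath /lib/modules/2.6.32-381.el6.x86_64 exists
picked vmlinuz-2.6.32-410.el6.x86_64 because modpath /lib/modules/2.6.32-410.el6.x86_64 exists
Linux version 2.6.32-410.el6.x86_64 (mockbuild@x86-002.build.bos.redhat.com) #1 SMP
"""

if __name__ == "__main__":
    print(appliance_kernels(SAMPLE))
```

Comparing the result with `uname -r` on the host makes it easy to spot the situation described here, where the host runs one kernel and the appliance boots a newer installed one.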
Comment 1 Richard W.M. Jones 2013-08-17 02:34:47 EDT
With the -412 kernel, nested (i.e. using TCG), I'm getting a slightly
different problem.  Lots of:

[   24.678207] Clocksource tsc unstable (delta = 75815452 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   26.311900] Clocksource tsc unstable (delta = 78919174 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   32.618435] Clocksource tsc unstable (delta = 197138175 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   34.296502] Clocksource tsc unstable (delta = 167980713 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   63.056573] Clocksource tsc unstable (delta = 71039379 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   69.328987] Clocksource tsc unstable (delta = 226577207 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   72.595910] Clocksource tsc unstable (delta = 267708526 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   78.234690] Clocksource tsc unstable (delta = 85922093 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   81.805477] Clocksource tsc unstable (delta = 71247525 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   82.420955] Clocksource tsc unstable (delta = 115523027 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.

and not much progress being made.

Upstream we switched over to using kvmclock (038ed0a08e & c53b459fdd)
which we should probably do in RHEL too since it would avoid most of
this trouble.

I will test on baremetal next.
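To quantify how often and how badly the TSC drifts in a log like the one above, a small Python sketch such as the following could be used. The regular expression is an assumption based solely on the message format quoted in this comment, and the function names are hypothetical.

```python
import re

# Pattern is an assumption based on the kernel messages quoted in this comment.
UNSTABLE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Clocksource tsc unstable \(delta = (-?\d+) ns\)"
)


def tsc_unstable_events(log_text):
    """Return a list of (timestamp_seconds, delta_ns), one per warning."""
    return [
        (float(ts), int(delta))
        for ts, delta in UNSTABLE_RE.findall(log_text)
    ]


def summarize(events):
    """Return the warning count, largest delta in ms, and time span covered."""
    if not events:
        return {"count": 0, "max_delta_ms": 0.0, "span_s": 0.0}
    times = [t for t, _ in events]
    deltas = [d for _, d in events]
    return {
        "count": len(events),
        "max_delta_ms": max(deltas) / 1e6,
        "span_s": max(times) - min(times),
    }


SAMPLE = """\
[   24.678207] Clocksource tsc unstable (delta = 75815452 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   26.311900] Clocksource tsc unstable (delta = 78919174 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
[   72.595910] Clocksource tsc unstable (delta = 267708526 ns).  Enable clocksource failover by adding clocksource_failover kernel parameter.
"""

if __name__ == "__main__":
    print(summarize(tsc_unstable_events(SAMPLE)))
```

Feeding it the full console output instead of `SAMPLE` gives a rough picture of whether the warnings are occasional or continuous.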
Comment 2 Richard W.M. Jones 2013-08-17 02:43:40 EDT
It works OK for me on baremetal (with the -412 kernel).

Is the error reproducible every time, or only occasionally?

Are you using this on baremetal or nested (e.g. in a cloud VM)?
Comment 3 Colin Walters 2013-08-17 07:49:53 EDT
(In reply to Richard W.M. Jones from comment #2)
> It works OK for me on baremetal (with the -412 kernel).

Hmm.  So you're booting -412 on -412?  I'm sadly stuck on 2.6.32-381.el6.x86_64 due to 
https://bugzilla.redhat.com/show_bug.cgi?id=987060

Although I haven't tested -412 yet as a host.  Give me a bit to context switch and try it.

> Is the error reproducible every time, or only occasionally?

Hangs every time.

> Are you using this on baremetal or nested (e.g. in a cloud VM)?

Baremetal; Lenovo T420s laptop.
Comment 4 Richard W.M. Jones 2013-08-17 08:36:16 EDT
Updates from IRC conversations and others:

- Would be interesting to know if the kernel eventually
  prints out anything, or if nothing is printed before
  the libguestfs-test-tool timeout (10 mins).

- I have tried the -412 kernel on 3 systems, 2 baremetal,
  1 virtualized, and I can't reproduce it.  Note the bug
  was reported on -410 so this is not necessarily indicative.
  Would be interesting to know if the -412 or -413 kernel
  also shows the bug.

- Colin tried adding -cpu host,+kvmclock to the qemu command
  line, but that didn't make any difference.  The bug still
  happened with kvmclock enabled (hence I'm removing the
  blocked bugs).
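When testing whether kvmclock is actually in effect on a system that does boot, one way to check is to read the active clocksource from sysfs. The sketch below is a minimal Python example under stated assumptions: it relies on the standard Linux sysfs files `current_clocksource` and `available_clocksource`, the helper names are hypothetical, and it does not cover how to get a shell inside the appliance in the first place.

```python
import os

# Standard Linux sysfs location. It is a parameter so the functions can be
# pointed at another directory, for example a copy taken from a guest.
SYSFS_CLOCKSOURCE = "/sys/devices/system/clocksource/clocksource0"


def read_clocksource(base=SYSFS_CLOCKSOURCE):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    try:
        with open(os.path.join(base, "current_clocksource")) as f:
            current = f.read().strip()
        with open(os.path.join(base, "available_clocksource")) as f:
            available = f.read().split()
    except OSError:
        return None, []
    return current, available


def uses_kvmclock(base=SYSFS_CLOCKSOURCE):
    """True if the active clocksource is kvm-clock."""
    current, _ = read_clocksource(base)
    return current == "kvm-clock"


if __name__ == "__main__":
    print(read_clocksource())
```

This only confirms which clocksource is selected. As noted above, the hang in this report still occurred with kvmclock enabled, so a positive result does not rule the bug out.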
Comment 6 RHEL Product and Program Management 2013-10-13 22:41:31 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 7 bfan 2014-03-16 22:58:02 EDT
Hello Colin Walters,

Do you still see this kernel hang with the latest RHEL 6 kernel?
Comment 8 Richard W.M. Jones 2014-03-17 04:25:47 EDT
I don't see this, and we'd have heard about it if it was
happening in the released RHEL 6.5 kernel.  My guess is it
was a temporary blip in an unreleased kernel.
Comment 9 Richard W.M. Jones 2014-05-20 07:05:16 EDT
I'm closing this based on my reasoning in comment 8.

Please reopen this if you see the same problem again, or open
another bug if you see a different kernel hang.
