Bug 1060126 - libguests appliance creation (with 'direct' backend) hangs on a Btrfs machine
Summary: libguests appliance creation (with 'direct' backend) hangs on a Btrfs machine
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libguestfs
Version: unspecified
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Richard W.M. Jones
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-01-31 10:55 UTC by Kashyap Chamarthy
Modified: 2014-02-09 17:25 UTC (History)
3 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-02-09 17:25:03 UTC
Embargoed:


Attachments (Terms of Use)
Complete stdout of libguestfs-test-tool (48.99 KB, text/plain)
2014-01-31 10:55 UTC, Kashyap Chamarthy
no flags Details
Complete stderr of libguestfs test tool with direct backend and TCG.txt (45.45 KB, text/plain)
2014-02-01 20:43 UTC, Kashyap Chamarthy
no flags Details
Complete stderr of libguestfs-test-tool with direct backend and TCG (10.18 KB, text/plain)
2014-02-01 20:58 UTC, Kashyap Chamarthy
no flags Details
Successful stderr of libguestfs-test-tool with direct backend (46.36 KB, text/plain)
2014-02-09 17:20 UTC, Kashyap Chamarthy
no flags Details
Successful stderr of libguestfs-test-tool with 'libvirt' backend (50.45 KB, text/plain)
2014-02-09 17:23 UTC, Kashyap Chamarthy
no flags Details

Description Kashyap Chamarthy 2014-01-31 10:55:32 UTC
Created attachment 857793 [details]
Complete stdout of libguestfs-test-tool

Description of problem
----------------------

Running libguestfs-test-tool ( with 'direct' backend) on a machine with
btrfs just hangs. 

    # NOTE: Full stdout, refer attachment

    $ libguestfs-test-tool
    [. . .]
    guestfsd: main_loop: new request, len 0x34
    guestfsd: main_loop: proc 3 (touch) took 0.06 seconds
    guestfsd: main_loop: new request, len 0x28
    umount /sysroot
    fsync /dev/sda
    libguestfs: sending SIGTERM to process 48146

Contextual `strace` of qemu-system-x86 further below. (Full strace is
37MB)


Version
-------

  $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm \
    libguestfs btrfs-progs
  3.14.0-0.rc0.git12.1.fc21.x86_64
  qemu-system-x86-1.7.0-4.fc21.x86_64
  libvirt-daemon-kvm-1.2.1-2.fc21.x86_64
  libguestfs-1.25.29-1.fc21.x86_64
  btrfs-progs-0.20.rc1.20131114git9f0c53f-1.fc20.x86_64


How reproducible: Consistently


Steps to reproduce
------------------

- Make a qumu wrapper script to strace qemu process, set libguestfs
  backend as 'direct' (use QEMU)

- Run libguestfs-test-tool with the above qemu wrapper to capture strace
  results of QEMU binary:
 
      $ cat qemu-strace.sh 
      #!/bin/bash -
      exec strace -o /tmp/qemu-strace.log -f -s 1024 /usr/bin/qemu-system-x86_64 "$@"
      
      $ chmod +x qemu-strace.sh
      
      $ export LIBGUESTFS_QEMU=/root/qemu-strace.sh 
      
      $ export LIBGUESTFS_BACKEND=direct
    
      $ guestfish get_backend
      direct
    
      $ libguestfs-test-tool


Actual results
--------------

libguestfs-test-tool


Expected results
----------------

Appliance creation should be successful on a Btrfs machine.


Additional info
---------------

Contextual `strace` of qemu-system-x86

===========
[. . .]
48170 +++ exited with 0 +++
48168 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
48168 madvise(0x7f7c14ffa000, 8368128, MADV_DONTNEED) = 0
48168 munmap(0x7f7c15ffc000, 8392704)   = 0
48168 _exit(0)                          = ?
48168 +++ exited with 0 +++
48161 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
48161 madvise(0x7f7c485fc000, 8368128, MADV_DONTNEED) = 0
48161 munmap(0x7f7c147f9000, 8392704)   = 0
48161 _exit(0)                          = ?
48161 +++ exited with 0 +++
48165 <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
48165 madvise(0x7f7c167fd000, 8368128, MADV_DONTNEED) = 0
48165 munmap(0x7f7c243ff000, 8392704)   = 0
48165 _exit(0)                          = ?
48165 +++ exited with 0 +++
48154 <... ioctl resumed> , 0)          = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN, 0)             = 0
48154 ioctl(13, KVM_RUN <unfinished ...>
48152 <... ppoll resumed> )             = ? ERESTARTNOHAND (To be restarted if no handler)

48152 --- SIGWINCH {si_signo=SIGWINCH, si_code=SI_KERNEL} ---
48152 ppoll([{fd=80, events=POLLIN|POLLERR|POLLHUP}, {fd=79, events=POLLIN|POLLERR|POLLHUP}, {fd=78, events=POLLIN|POLLERR|POLLHUP}, {fd=77, events=POLLIN|POLLERR|POLLHUP}, {fd=76, events=POLLIN|POLLERR|POLLHUP}, {fd=75, events=POLLIN|POLLERR|POLLHUP}, {fd=74, events=POLLIN|POLLERR|POLLHUP}, {fd=73, events=POLLIN|POLLERR|POLLHUP}, {fd=72, events=POLLIN|POLLERR|POLLHUP}, {fd=71, events=POLLIN|POLLERR|POLLHUP}, {fd=70, events=POLLIN|POLLERR|POLLHUP}, {fd=69, events=POLLIN|POLLERR|POLLHUP}, {fd=68, events=POLLIN|POLLERR|POLLHUP}, {fd=67, events=POLLIN|POLLERR|POLLHUP}, {fd=66, events=POLLIN|POLLERR|POLLHUP}, {fd=65, events=POLLIN|POLLERR|POLLHUP}, {fd=64, events=POLLIN|POLLERR|POLLHUP}, {fd=63, events=POLLIN|POLLERR|POLLHUP}, {fd=62, events=POLLIN|POLLERR|POLLHUP}, {fd=61, events=POLLIN|POLLERR|POLLHUP}, {fd=60, events=POLLIN|POLLERR|POLLHUP}, {fd=59, events=POLLIN|POLLERR|POLLHUP}, {fd=58, events=POLLIN|POLLERR|POLLHUP}, {fd=57, events=POLLIN|POLLERR|POLLHUP}, {fd=56, events=POLLIN|POLLERR|POLLHUP}, {fd=55, events=POLLIN|POLLERR|POLLHUP}, {fd=54, events=POLLIN|POLLERR|POLLHUP}, {fd=53, events=POLLIN|POLLERR|POLLHUP}, {fd=52, events=POLLIN|POLLERR|POLLHUP}, {fd=51, events=POLLIN|POLLERR|POLLHUP}, {fd=50, events=POLLIN|POLLERR|POLLHUP}, {fd=49, events=POLLIN|POLLERR|POLLHUP}, {fd=48, events=POLLIN|POLLERR|POLLHUP}, {fd=47, events=POLLIN|POLLERR|POLLHUP}, {fd=46, events=POLLIN|POLLERR|POLLHUP}, {fd=45, events=POLLIN|POLLERR|POLLHUP}, {fd=44, events=POLLIN|POLLERR|POLLHUP}, {fd=43, events=POLLIN|POLLERR|POLLHUP}, {fd=42, events=POLLIN|POLLERR|POLLHUP}, {fd=41, events=POLLIN|POLLERR|POLLHUP}, {fd=40, events=POLLIN|POLLERR|POLLHUP}, {fd=39, events=POLLIN|POLLERR|POLLHUP}, {fd=38, events=POLLIN|POLLERR|POLLHUP}, {fd=37, events=POLLIN|POLLERR|POLLHUP}, {fd=36, events=POLLIN|POLLERR|POLLHUP}, {fd=35, events=POLLIN|POLLERR|POLLHUP}, {fd=34, events=POLLIN|POLLERR|POLLHUP}, {fd=33, events=POLLIN|POLLERR|POLLHUP}, {fd=32, events=POLLIN|POLLERR|POLLHUP}, {fd=31, events=POLLIN|POLLERR|POLLHUP}, {fd=30, events=POLLIN|POLLERR|POLLHUP}, {fd=29, events=POLLIN|POLLERR|POLLHUP}, {fd=28, events=POLLIN|POLLERR|POLLHUP}, {fd=27, events=POLLIN|POLLERR|POLLHUP}, {fd=26, events=POLLIN|POLLERR|POLLHUP}, {fd=25, events=POLLIN|POLLERR|POLLHUP}, {fd=24, events=POLLIN|POLLERR|POLLHUP}, {fd=23, events=POLLIN|POLLERR|POLLHUP}, {fd=22, events=POLLIN|POLLERR|POLLHUP}, {fd=21, events=POLLIN|POLLERR|POLLHUP}, {fd=20, events=POLLIN|POLLERR|POLLHUP}, {fd=19, events=POLLIN|POLLERR|POLLHUP}, {fd=18, events=POLLIN|POLLERR|POLLHUP}, {fd=17, events=POLLIN|POLLERR|POLLHUP}, {fd=16, events=POLLIN|POLLERR|POLLHUP}, {fd=15, events=POLLIN|POLLERR|POLLHUP}, {fd=14, events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=5, events=POLLIN}, {fd=0, events=POLLIN}, {fd=6, events=POLLIN}, {fd=11, events=POLLIN}, {fd=4, events=POLLIN}], 73, NULL, NULL, 8
(END)
===========


Just for reference, Btrfs partitioning scheme on the host:

  $ df -hT
  Filesystem     Type      Size  Used Avail Use% Mounted on
  /dev/sda3      btrfs     123G   15G  107G  13% /
  devtmpfs       devtmpfs   32G     0   32G   0% /dev
  tmpfs          tmpfs      32G     0   32G   0% /dev/shm
  tmpfs          tmpfs      32G  1.4M   32G   1% /run
  tmpfs          tmpfs      32G     0   32G   0% /sys/fs/cgroup
  tmpfs          tmpfs      32G   42M   32G   1% /tmp
  /dev/sda3      btrfs     123G   15G  107G  13% /home

Comment 1 Kashyap Chamarthy 2014-01-31 10:59:03 UTC
(In reply to Kashyap Chamarthy from comment #0)
> Created attachment 857793 [details]
> Complete stdout of libguestfs-test-tool
> 
> Description of problem
> ----------------------
> 
> Running libguestfs-test-tool ( with 'direct' backend) on a machine with
> btrfs just hangs. 
> 
>     # NOTE: Full stdout, refer attachment
> 
>     $ libguestfs-test-tool
>     [. . .]
>     guestfsd: main_loop: new request, len 0x34
>     guestfsd: main_loop: proc 3 (touch) took 0.06 seconds
>     guestfsd: main_loop: new request, len 0x28
>     umount /sysroot
>     fsync /dev/sda
>     libguestfs: sending SIGTERM to process 48146
> 
> Contextual `strace` of qemu-system-x86 further below. (Full strace is
> 37MB)
> 
> 
> Version
> -------
> 
>   $ uname -r; rpm -q qemu-system-x86 libvirt-daemon-kvm \
>     libguestfs btrfs-progs
>   3.14.0-0.rc0.git12.1.fc21.x86_64
>   qemu-system-x86-1.7.0-4.fc21.x86_64
>   libvirt-daemon-kvm-1.2.1-2.fc21.x86_64
>   libguestfs-1.25.29-1.fc21.x86_64
>   btrfs-progs-0.20.rc1.20131114git9f0c53f-1.fc20.x86_64
> 
> 
> How reproducible: Consistently
> 
> 
> Steps to reproduce
> ------------------
> 
> - Make a qumu wrapper script to strace qemu process, set libguestfs
>   backend as 'direct' (use QEMU)
> 
> - Run libguestfs-test-tool with the above qemu wrapper to capture strace
>   results of QEMU binary:
>  
>       $ cat qemu-strace.sh 
>       #!/bin/bash -
>       exec strace -o /tmp/qemu-strace.log -f -s 1024
> /usr/bin/qemu-system-x86_64 "$@"
>       
>       $ chmod +x qemu-strace.sh
>       
>       $ export LIBGUESTFS_QEMU=/root/qemu-strace.sh 
>       
>       $ export LIBGUESTFS_BACKEND=direct
>     
>       $ guestfish get_backend
>       direct
>     
>       $ libguestfs-test-tool
> 
> 
> Actual results
> --------------
> 
> libguestfs-test-tool

s/libguestfs-test-tool/Appliance creation hangs as noted in the attachment

    https://bugzilla.redhat.com/attachment.cgi?id=857793

Comment 2 Kashyap Chamarthy 2014-01-31 11:00:30 UTC
Complete `strace`: http://kashyapc.fedorapeople.org/temp/qemu-strace.log

Comment 3 Richard W.M. Jones 2014-02-01 07:55:54 UTC
I suspect this isn't btrfs, but just a general hang.  I've seen
Rawhide hanging quite a lot when running the tests, but it is
frustratingly unreproducible.  It indicates some kind of kernel/
qemu emulation bug.

Firstly, I'm assuming because of previous conversations that this
is running on your baremetal 48 core monster machine?  So we're
not talking about nested KVM.

'clocksource: kvm-clock' which indicates that it is using KVM.
Also KVM_RUN ioctls appear in the strace.

I notice that you've set the backend to 'direct'.  Does it happen
with the libvirt backend?

Comment 4 Richard W.M. Jones 2014-02-01 07:56:38 UTC
Also you could try:

  export LIBGUESTFS_BACKEND_SETTINGS=force_tcg

(in combination with backend direct & libvirt).

Comment 5 Kashyap Chamarthy 2014-02-01 17:55:34 UTC
(In reply to Richard W.M. Jones from comment #3)
> I suspect this isn't btrfs, but just a general hang.  I've seen
> Rawhide hanging quite a lot when running the tests, but it is
> frustratingly unreproducible.  It indicates some kind of kernel/
> qemu emulation bug.

I see. Even, I noticed some compiles were hanging in a unreliable way, but I haven't done a systematic analysis. Next time I compile, will try to keep notes.

> 
> Firstly, I'm assuming because of previous conversations that this
> is running on your baremetal 48 core monster machine? So we're
> not talking about nested KVM.

Correct, just bare-metal with: F20, with Kernel, QEMU, Libvirt from Rawhide.

> 
> 'clocksource: kvm-clock' which indicates that it is using KVM.
> Also KVM_RUN ioctls appear in the strace.
> 
> I notice that you've set the backend to 'direct'.  Does it happen
> with the libvirt backend?

IIRC, it did hang with libvirt back-end, let me double confirm.

Yes, will try the 'force_tcg' back-end setting with both 'direct' and 'libvirt' back-end combinations.

Comment 6 Kashyap Chamarthy 2014-02-01 19:50:22 UTC
With 'libvirt' backend (and KVM)
--------------------------------

    $ guestfish get_backend
    libvirt
  
       $ time libguestfs-test-tool |& tee stderr-libvirt-kvm.txt
        ************************************************************
        *                    IMPORTANT NOTICE
        *
        * When reporting bugs, include the COMPLETE, UNEDITED
        * output below in your bug report.
        *
        ************************************************************
   LIBGUESTFS_QEMU=/usr/local/bin/qemu-strace.sh
   LIBGUESTFS_BACKEND=libvirt
   PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/root/bin
   SELinux: Permissive
   guestfs_get_append: (null)
   guestfs_get_autosync: 1
   guestfs_get_backend: libvirt
   guestfs_get_backend_settings: []
   guestfs_get_cachedir: /var/tmp
   guestfs_get_direct: 0
   guestfs_get_hv: /usr/local/bin/qemu-strace.sh
   guestfs_get_memsize: 500
   guestfs_get_network: 0
   guestfs_get_path: /usr/lib64/guestfs
   guestfs_get_pgroup: 0
   guestfs_get_program: libguestfs-test-tool
   guestfs_get_recovery_proc: 1
   guestfs_get_selinux: 0
   guestfs_get_smp: 1
   guestfs_get_tmpdir: /tmp
   guestfs_get_trace: 0
   guestfs_get_verbose: 1
   host_cpu: x86_64
   Launching appliance, timeout set to 600 seconds.
   libguestfs: launch: program=libguestfs-test-tool
   libguestfs: launch: version=1.25.29fedora=21,release=1.fc21,libvirt
   libguestfs: launch: backend registered: unix
   libguestfs: launch: backend registered: uml
   libguestfs: launch: backend registered: libvirt
   libguestfs: launch: backend registered: direct
   libguestfs: launch: backend=libvirt
   libguestfs: launch: tmpdir=/tmp/libguestfsHhzoO1
   libguestfs: launch: umask=0022
   libguestfs: launch: euid=0
   libguestfs: libvirt version = 1002001 (1.2.1)
   libguestfs: guest random name = guestfs-ehitwgt8t5q9o250
   libguestfs: [00000ms] connect to libvirt
   libguestfs: opening libvirt handle: URI = qemu:///system, auth = default+wrapper, flags = 0
   libguestfs: successfully opened libvirt handle: conn = 0x7f7547ee4880
   libguestfs: [00015ms] get libvirt capabilities    
       real    10m3.155s
       user    0m0.020s
       sys     0m0.322s
     
       $ echo $?
       0

Note
----

The reason for 10 minutes to run w/ wrapper (from IRC discussion with
Rich:

  From IRC discussion with Rich: I's basically because when you use a
  qemu wrapper w/ strace libvirt has trouble running qemu in order to
  query its capabilities.

Comment 7 Kashyap Chamarthy 2014-02-01 20:43:01 UTC
Created attachment 858116 [details]
Complete stderr of libguestfs test tool with direct backend and TCG.txt

So, even with setting 'force, /direct/ backend, the appliance seems to be hung just as previously:

------
.
.
.
[   40.500245] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
[   40.669298] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
guestfsd: main_loop: proc 1 (mount) took 0.76 seconds
guestfsd: main_loop: new request, len 0x34
guestfsd: main_loop: proc 3 (touch) took 0.02 seconds
guestfsd: main_loop: new request, len 0x28
umount /sysroot
fsync /dev/sda
libguestfs: sending SIGTERM to process 40190
------

Comment 8 Kashyap Chamarthy 2014-02-01 20:58:58 UTC
Created attachment 858129 [details]
Complete stderr of libguestfs-test-tool with direct backend and TCG

Now for real, test with 'direct' backend and force_tcg.

With TCG, it's hung here:

    $ export LIBGUESTFS_BACKEND=direct

    $ export LIBGUESTFS_BACKEND_SETTINGS=force_tcg

    $ guestfish get_backend_settings
    force_tcg

    $ libguestfs-test-tool |& tee stderr-direct-tcg.txt
    .
    .
    .
    [    0.000000] \tRCU lockdep checking is enabled.
    [    0.000000] \tRCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=1.
    [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
    [    0.000000] NR_IRQS:524544 nr_irqs:256 16
    [    0.000000] Console: colour *CGA 80x25
    [    0.000000] console [ttyS0] enabled
    [    0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
    [    0.000000] ... MAX_LOCKDEP_SUBCLASSES:  8
    [    0.000000] ... MAX_LOCK_DEPTH:          48
    [    0.000000] ... MAX_LOCKDEP_KEYS:        8191
    [    0.000000] ... CLASSHASH_SIZE:          4096
    [    0.000000] ... MAX_LOCKDEP_ENTRIES:     16384
    [    0.000000] ... MAX_LOCKDEP_CHAINS:      32768
    [    0.000000] ... CHAINHASH_SIZE:          16384
    [    0.000000]  memory used by lock dependency info: 6367 kB
    [    0.000000]  per task-struct memory footprint: 2688 bytes
    [    0.000000] kmemleak: Kernel memory leak detector disabled
    main-loop: WARNING: I/O thread spun for 1000 iterations

Comment 9 Kashyap Chamarthy 2014-02-02 06:05:58 UTC
(In reply to Kashyap Chamarthy from comment #8)
> Created attachment 858129 [details]
> Complete stderr of libguestfs-test-tool with direct backend and TCG
> 
> Now for real, test with 'direct' backend and force_tcg.

Here's the complete strace with 'direct' backend and force_tcg:

   http://kashyapc.fedorapeople.org/temp/qemu-strace-direct-backend-with-tcg.log


PS: Forgot to note in the previous comment -- disregard comment #7]

Comment 10 Kashyap Chamarthy 2014-02-09 17:20:40 UTC
Created attachment 861085 [details]
Successful stderr of libguestfs-test-tool with direct backend

Ok, I re-ran the test with "direct" and "libvirt" backends with (KVM), both are successful on this Btrfs machine with 3.14 rc1 Kernel -- 3.14.0-0.rc1.git3.1.fc21.x86_64

  $ rpm -q libvirt-daemon-kvm qemu-system-x86 libguestfs
  libvirt-daemon-kvm-1.2.1-2.fc21.x86_64
  qemu-system-x86-1.7.0-4.fc21.x86_64
  libguestfs-1.25.29-1.fc21.x86_64

Comment 11 Kashyap Chamarthy 2014-02-09 17:23:56 UTC
Created attachment 861097 [details]
Successful stderr of libguestfs-test-tool with 'libvirt' backend

Comment 12 Kashyap Chamarthy 2014-02-09 17:25:03 UTC
Since a mysterious rawhide Kernel fixed it, closing it with rationale: RAWHIDE

I noted the fixed Kernel version in Comment #10


Note You need to log in before you can comment on or make changes to this bug.