Bug 1087127
Summary: | vm stuck in build phase; guestfs instance appears stuck
---|---
Product: | Red Hat OpenStack
Component: | openstack-nova
Version: | 4.0
Status: | CLOSED INSUFFICIENT_DATA
Reporter: | Ian Wienand <iwienand>
Assignee: | Nikola Dipanov <ndipanov>
QA Contact: | Ami Jeain <ajeain>
Severity: | low
Priority: | low
CC: | aortega, eglynn, gdubreui, iwienand, kchamart, lars, ndipanov, rjones, sgordon, vpopovic, wfoster, yeylon
Target Milestone: | ga
Target Release: | 6.0 (Juno)
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2014-07-31 19:17:49 UTC
Description (Ian Wienand, 2014-04-14 03:36:47 UTC)

Created attachment 886005 [details]: strace of qemu
Nothing particularly out of place in the backtrace:

---
(gdb) bt full
#0  0x00007eff0c0ce5e3 in select () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x00007eff0f1563f8 in main_loop_wait (timeout=1000) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4046
        ioh = <value optimized out>
        rfds = {fds_bits = {10995118440465, 0 <repeats 15 times>}}
        wfds = {fds_bits = {0 <repeats 16 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = <value optimized out>
        nfds = 43
        tv = {tv_sec = 0, tv_usec = 281508}
#2  0x00007eff0f1794ea in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2245
        fds = {41, 42}
        mask = {__val = {268443712, 0 <repeats 15 times>}}
        sigfd = 43
#3  0x00007eff0f159349 in main_loop (argc=39, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4266
        r = <value optimized out>
#4  main (argc=39, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6644
        gdbstub_dev = 0x0
        i = <value optimized out>
        snapshot = 0
        linux_boot = 1
        initrd_filename = 0x7fffa5d75c08 "/var/tmp/.guestfs-162/initrd.9578"
        kernel_filename = 0x7fffa5d75bde "/var/tmp/.guestfs-162/kernel.9578"
        kernel_cmdline = 0x7fffa5d75e16 "panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=linux"
        boot_devices = "cad", '\000' <repeats 29 times>
        ds = <value optimized out>
        dcl = <value optimized out>
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = 0x7eff0f6858f0
        opts = <value optimized out>
        olist = <value optimized out>
        optind = 39
        optarg = 0x7fffa5d75e16 "panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 TERM=linux"
        loadvm = 0x0
        machine = 0x7eff0f67ede0
        cpu_model = 0x7fffa5d75bb5 "host,+kvmclock"
        fds = {252483160, 32511}
        tb_size = 0
        pid_file = 0x0
        incoming = 0x0
        fd = 0
        pwd = 0x0
        chroot_dir = 0x0
        run_as = 0x0
        env = <value optimized out>
        show_vnc_port = 0
        defconfig = <value optimized out>
        defconfig_verbose = <value optimized out>
---

---
(gdb) info threads
  2 Thread 0x7eff05c4c700 (LWP 12466)  0x00007eff0c0cdb37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7eff0f0a7980 (LWP 12458)  0x00007eff0c0ce5e3 in select () at ../sysdeps/unix/syscall-template.S:82
(gdb) thread 2
[Switching to thread 2 (Thread 0x7eff05c4c700 (LWP 12466))]#0  0x00007eff0c0cdb37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt full
#0  0x00007eff0c0cdb37 in ioctl () at ../sysdeps/unix/syscall-template.S:82
No locals.
#1  0x00007eff0f17afaa in kvm_run (env=0x7eff117ce800) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1015
        r = 0
        kvm = 0x7eff112fec40
        run = 0x7eff0f0c1000
        fd = 37
#2  0x00007eff0f17b469 in kvm_cpu_exec (env=<value optimized out>) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1744
        r = <value optimized out>
#3  0x00007eff0f17c34d in kvm_main_loop_cpu (_env=0x7eff117ce800) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2005
        run_cpu = <value optimized out>
#4  ap_main_loop (_env=0x7eff117ce800) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2061
        env = 0x7eff117ce800
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = <value optimized out>
#5  0x00007eff0ea8d9d1 in start_thread (arg=0x7eff05c4c700) at pthread_create.c:301
        __res = <value optimized out>
        pd = 0x7eff05c4c700
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139633778542336, 1553379894277813129, 139633929888608, 139633778543040, 0, 3, -1697648968769883255, -1697668385152060535}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
#6  0x00007eff0c0d5b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.
---

---
(gdb) up
#1  0x00007eff0f17afaa in kvm_run (env=0x7eff117ce800) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1015
1015            r = ioctl(fd, KVM_RUN, 0);
---

Created attachment 886008 [details]: lsof of qemu
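(Aside: an lsof-style view of a process's open files can also be pulled straight from /proc, which is handy when lsof isn't installed on a compute node. A minimal sketch, assuming a Linux /proc filesystem; the helper name is mine, not part of any tool mentioned here:)

```python
import os

def list_fds(pid):
    """Return {fd: target} for a process by reading /proc/<pid>/fd,
    roughly the open-files portion of what `lsof -p <pid>` reports."""
    fd_dir = "/proc/%d/fd" % pid
    fds = {}
    for name in os.listdir(fd_dir):
        try:
            # each entry is a symlink to the file, socket, or pipe
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # fd was closed between listdir() and readlink(); skip it
            pass
    return fds

if __name__ == "__main__":
    for fd, target in sorted(list_fds(os.getpid()).items()):
        print("%3d -> %s" % (fd, target))
```

Reading another user's /proc/<pid>/fd requires matching privileges (root, in the case of the qemu processes above).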
Ok, not sure I have enough insight into the vm creation process to take this much further.

---
├─nova-compute(9578)─┬─nova-compute(1155)
│                    ...
│                    ├─qemu-kvm(12458)───{qemu-kvm}(12466)
---

This qemu process has been launched by nova-compute. Once the vm is running, you end up with something like

---
qemu  24696  1  13 Apr11 ?  08:44:48 /usr/libexec/qemu-kvm -name instance-0000927d -S -M rhel6.5.0 ...
---

which has an instance-id you can correlate to libvirt via virsh, for example.

To clarify comment#5: this vm appears to be associated with pid 12458 (from timestamp analysis). Note that it has no instance-id specified on the command line, so I'm not sure how to correlate it to libvirt. An example of a correctly running vm is 24696, with its command line shown above.

A couple of thoughts:

- Do you see anything in /var/log/libvirt/qemu/vm.log?
- Maybe you can apply some logging controls in libvirt. To enable debug logs (i.e. libvirt tries to log everything that's possible, including interactions with QEMU), add

      log_level = 1
      log_outputs = 1:file:/var/tmp/libvirtd.log

  to /etc/libvirt/libvirtd.conf. More on logging here -- http://libvirt.org/logging.html

It also helps to provide the exact kernel, libvirt and QEMU versions used.

Ian: Note that nova starts up a libguestfs instance prior to starting up your actual instance, in order to do things such as:

- determine whether the disk is partitioned or not
- perform filesystem injection

Your "ps" output shows that the guestfs instance is getting stuck. This is why you're not seeing any .xml file created and nothing in the console log -- your instance has not actually been started yet.

You can use the "libguestfs-test-tool" command to verify that guestfs is working correctly in general... just run "libguestfs-test-tool" as nova with no arguments and see if you get as far as:

    ===== TEST FINISHED OK =====

Hi Ian,

Please respond to Lars' comment so we can track this better. ATM we're not sure whether to track this as a bug in nova, i.e. whether the libvirt XML should still be created if libguestfs fails, or whether nova should expect everything to be set up at the lower level and not create the XML at all if something fails during the boot process.

Thanks Lars; that is definitely the issue here. libguestfs-test-tool runs correctly; not surprising, as *most* VMs work... just this one has got stuck.

It gets a bit confusing how the threads are parented etc., but looking a bit more closely at the threads involved:

---
|-nova-compute(9578)-+
|                    |-nova-compute(12459)
|                    |-qemu-img(12456)
|                    |-qemu-kvm(12458)---{qemu-kvm}(12466)
---

What's interesting there is the qemu-img process, which is a zombie:

---
[root@host11 tmp]# ps -aef | grep [1]2456
nova  12456  9578  0 Apr11 ?  00:00:00 [qemu-img] <defunct>
---

Checking /proc, that is definitely involved here:

---
[root@host11 tmp]# ls -ld /proc/12456
dr-xr-xr-x. 8 nova nova 0 Apr 11 18:41 /proc/12456
---

The first entry for this vm was "2014-04-11 18:40:32.805", so this fits the timeframe perfectly. It's hard to tell who should be reaping this; looking at the contemporary processes just by guessing their pids and checking dates gives:

---
[root@host11 proc]# ls -ld * | grep 'Apr 11 18'
dr-xr-xr-x. 8 nova nova 0 Apr 11 18:41 12456
dr-xr-xr-x. 8 nova nova 0 Apr 11 18:40 12458
dr-xr-xr-x. 8 nova nova 0 Apr 11 18:41 12459
dr-xr-xr-x. 8 root root 0 Apr 11 18:41 12461
---

That gives us some clue: the 12459 process is at least stuck in libguestfs:

---
(gdb) bt
#0  0x000000399f2accdd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1  0x000000399f2acb50 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#2  0x000000348589c7f4 in launch_appliance (g=0x4bbf580, arg=<value optimized out>) at launch-appliance.c:611
#3  0x000000348589a9c6 in guestfs__launch (g=0x4bbf580) at launch.c:88
#4  0x000000348585e7f1 in guestfs_launch (g=0x4bbf580) at actions-5.c:168
#5  0x00007f2328374a56 in py_guestfs_launch (self=<value optimized out>, args=<value optimized out>) at guestfs-py.c:2346
#6  0x000000348ecd59e4 in call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3794
[... then we're into the python interpreter ...]
---

so we're in the loop around line 611:

---
#2  0x000000348589c7f4 in launch_appliance (g=0x4bbf580, arg=<value optimized out>) at launch-appliance.c:611
611         sleep (2);
(gdb) list
586       /* It would be nice to be able to put this in the same process
587        * group as qemu (ie. setpgid (0, qemu_pid)).  However this is
588        * not possible because we don't have any guarantee here that
589        * the qemu process has started yet.
590        */
591       if (g->pgroup)
592         setpgid (0, 0);
593
594       /* Writing to argv is hideously complicated and error prone.  See:
595        * http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/misc/ps_status.c;hb=HEAD
596        */
597
598       /* Loop around waiting for one or both of the other processes to
599        * disappear.  It's fair to say this is very hairy.  The PIDs that
600        * we are looking at might be reused by another process.  We are
601        * effectively polling.  Is the cure worse than the disease?
602        */
603       for (;;) {
604         if (kill (qemu_pid, 0) == -1)   /* qemu's gone away, we aren't needed */
605           _exit (EXIT_SUCCESS);
606         if (kill (parent_pid, 0) == -1) {
607           /* Parent's gone away, qemu still around, so kill qemu.
           */
608           kill (qemu_pid, 9);
609           _exit (EXIT_SUCCESS);
610         }
611         sleep (2);
612       }
613     }
---

---
(gdb) print parent_pid
$1 = 9578
(gdb) print qemu_pid
$2 = 12458
---

So we're back to the start really; this is checking qemu_pid (12458), which is still there, seemingly sitting doing nothing as per comment#3.

For reference, here's "g":

---
(gdb) print *g
$3 = {
  next = 0x4eebf20, state = CONFIG, verbose = false, trace = false,
  autosync = true, direct = false, recovery_proc = true,
  enable_network = false, selinux = false, pgroup = false,
  close_on_exit = true, smp = 1, memsize = 500,
  path = 0x506dd80 "/usr/lib64/guestfs",
  qemu = 0x3467890 "/usr/libexec/qemu-kvm",
  append = 0x0, qemu_params = 0x0,
  drives = 0x7f22dc001200, nr_drives = 1,
  attach_method = ATTACH_METHOD_APPLIANCE, attach_method_arg = 0x0,
  attach_ops = 0x3485aede40, last_error = 0x0, last_errnum = 0,
  tmpdir = 0x7f22f0001530 "/tmp/libguestfsvyouUp",
  env_tmpdir = 0x0, int_tmpdir = 0x0, int_cachedir = 0x0,
  error_cb = 0, error_cb_data = 0x0, error_cb_stack = 0x0,
  abort_cb = 0x399f233f90 <abort>,
  events = 0x0, nr_events = 0, fses = 0x0, nr_fses = 0,
  pda = 0x0, pda_next = 0x0, user_cancel = 0,
  launch_t = {tv_sec = 1397241644, tv_usec = 987875},
  test_fp = 0x0, unique = 0, fd = {-1, -1}, sock = 53,
  msg_next_serial = 1192960, localmountpoint = 0x0, fuse = 0x0,
  ml_dir_cache_timeout = 0, lsc_ht = 0x0, xac_ht = 0x0, rlc_ht = 0x0,
  ml_read_only = 0, ml_debug_calls = 0,
  nr_supported_credentials = 0,
  supported_credentials = {0, 0, 0, 0, 0, 0, 0, 0, 0},
  saved_libvirt_uri = 0x0,
  nr_requested_credentials = 0, requested_credentials = 0x0,
  app = {
    pid = 12458,
    recoverypid = -1,
    qemu_help = 0x7f22f0009990 "QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c) 2003-2008 Fabrice Bellard\n\nWARNING: Direct use of qemu-kvm from the command line is not recommended by Red Hat.\nWARNING: Some command"...,
    qemu_version = 0x7f22f0008420 "QEMU PC emulator version 0.12.1 (qemu-kvm-0.12.1.2), Copyright (c) 2003-2008 Fabrice Bellard\n",
    qemu_devices = 0x7f22f000d970 "name \"pci-bridge\", bus PCI\nname \"virtio-scsi-pci\", bus PCI, alias \"virtio-scsi\"\nname \"virtio-balloon-pci\", bus PCI\nname \"virtio-serial-pci\", bus PCI, alias \"virtio-serial\"\nname \"virtio-net-pci\", bus P"...,
    qemu_version_major = 0,
    qemu_version_minor = 12,
    cmdline = 0x0,
    cmdline_size = 0,
    virtio_scsi = 0
  },
  virt = {conn = 0x0, dom = 0x0}
}
---

I'm fairly certain that qemu-img has got something to do with why this vm creation is stuck.

Nested KVM involved here? Please CC me on all libguestfs bugs.

(In reply to Richard W.M. Jones from comment #11)
> Nested KVM involved here?

Nested KVM is not involved.

I don't believe there's any way to debug this retrospectively. You'll have to enable debugging:

http://libguestfs.org/guestfs-faq.1.html#debug

and capture the full debug output (can nova do this through its logging system yet?), and if it happens again the debug output will tell us what's going on.

(In reply to Richard W.M. Jones from comment #13)
> I don't believe there's any way to debug this retrospectively.

I'm not super-confident of a reproduction... I've only seen it once. My main hope is that this comes up for someone searching for a similar issue.

> You'll have to enable debugging:
> and capture the full debug output

I guess the env variable described there would have to be set for nova in the init.d file? As you'd understand, ad-hoc changes are pretty unappealing for a cluster that has many compute nodes, is managed by puppet, and gets upgraded semi-frequently. I'm not sure how chatty this level is, but managing the logs over a long period becomes troublesome.

> (can nova do this through its logging system yet?)

Please let's find out and if not get it working.
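(For reference, the debug knobs in the linked FAQ are plain environment variables, LIBGUESTFS_DEBUG=1 and LIBGUESTFS_TRACE=1, so one low-touch option is a wrapper that injects them into a child process's environment. A minimal sketch; the wrapper itself is hypothetical and not part of nova or libguestfs:)

```python
import os
import subprocess
import sys

def run_with_guestfs_debug(argv):
    """Launch a command with libguestfs debugging enabled.

    LIBGUESTFS_DEBUG / LIBGUESTFS_TRACE are the variables documented
    in the libguestfs FAQ; everything else here is illustrative.
    """
    env = dict(os.environ)
    env["LIBGUESTFS_DEBUG"] = "1"   # full appliance/qemu output on stderr
    env["LIBGUESTFS_TRACE"] = "1"   # log each libguestfs API call
    return subprocess.call(argv, env=env)

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_with_guestfs_debug(sys.argv[1:]))
```

The same effect could be had by exporting the variables in the service's init script, which is what the comment above is weighing against proper logging integration.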
Standard changes to a config are much easier to deal with.

There's an API for capturing libguestfs events into another logging system:

http://libguestfs.org/guestfs-faq.1.html#how-do-i-capture-debug-output-and-put-it-into-my-logging-system

but I don't believe it has been wired up like this in nova yet. See:

https://bugzilla.redhat.com/show_bug.cgi?id=1064948

(In reply to Ian Wienand from comment #14)
> I'm not super-confident of a reproduction... I've only seen it once. My
> main hope is that this comes up for someone searching for a similar issue

Hi Ian,

I am going to have to close this based on the available information - please re-open if you re-encounter this issue.

Thanks,
Steve
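(Addendum, for anyone landing here via search: the event API mentioned above is exposed in the Python bindings as guestfs.GuestFS.set_event_callback. The sketch below only shows the shape of a callback that forwards events into Python's logging module; the registration step is left as a comment since it needs python-libguestfs installed, and the exact event-flag choice is an assumption drawn from the linked FAQ, not something verified in this bug.)

```python
import logging

log = logging.getLogger("nova.virt.libguestfs")

def forward_guestfs_event(ev, event_handle, buf, array):
    """Callback with the signature the libguestfs Python bindings pass:
    (event bitmask, event handle, message buffer, array of uint64s).
    Forwards the message into the standard logging framework."""
    msg = buf.rstrip() if isinstance(buf, str) else buf
    log.debug("guestfs event 0x%x: %s", ev, msg)

# Registration would look roughly like this (requires python-libguestfs):
#   import guestfs
#   g = guestfs.GuestFS()
#   events = guestfs.EVENT_APPLIANCE | guestfs.EVENT_LIBRARY | guestfs.EVENT_TRACE
#   g.set_event_callback(forward_guestfs_event, events)
```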