Bug 1308736 - qemu capabilites probing hangs with libvirt 1.2.9.3
Summary: qemu capabilites probing hangs with libvirt 1.2.9.3
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Libvirt Maintainers
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-02-15 22:03 UTC by Florian Hubold
Modified: 2016-03-07 11:37 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-03-07 11:37:13 UTC
Embargoed:


Attachments (Terms of Use)
debug output of manual libvirtd start via "LIBVIRT_DEBUG=1 /usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf" (921.76 KB, text/plain)
2016-02-15 22:03 UTC, Florian Hubold
no flags Details

Description Florian Hubold 2016-02-15 22:03:27 UTC
Created attachment 1127457 [details]
debug output of manual libvirtd start via "LIBVIRT_DEBUG=1 /usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf"

Description of problem:

We are currently preparing an update for libvirt for Mageia, to fix CVE-2015-5313 and to update libvirt from 1.2.9.2 to the latest 1.2.9 maintenance release. Problem is that all builds (including 1.2.9.3, 1.2.18.2 and latest stable 1.3.1) that I've tested seem to have an issue properly starting libvirtd. 
Currently testing with 1.2.9.3 plus the patch for CVE-2015-5313 plus the other 5 bugfix patches from the v1.2.9-maint branch after 1.2.9.3 ( http://libvirt.org/git/?p=libvirt.git;a=shortlog;h=refs/heads/v1.2.9-maint )

From what I can tell, during the initial probing of libvirtd for capabilities via qemu, at first it will first run a qemu-system-arm instance, and the probing of capabilities is where it hangs. It seems to leave it's socket and pidfile /var/lib/libvirt/qemu/capabilities.{monitor.sock,pidfile} behind. 

Hence libvirtd doesn't complete its startup and won't respond to connections, everything will simply time out.

----

This is the systemctl output after regular libvirtd start (1.2.9.3) :

sudo systemctl status libvirtd -al
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled)
   Active: active (running) since Mo 2016-02-15 21:39:59 CET; 59min ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 3473 (libvirtd)
   CGroup: /system.slice/libvirtd.service
           ├─3473 /usr/sbin/libvirtd
           └─3562 /usr/bin/qemu-system-arm -S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilities.pidfile -daemonize

----

The debug output from a manual start of libvirtd will be attached as a logfile as it's quite extensive. Debug output was created as root via

LC_ALL=C LIBVIRT_DEBUG=1 /usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf 2>&1 | tee libvirtd_1.2.9.3_LIBVIRT_DEBUG_1.log

with the default upstream libvirtd.conf

----

This is the debug output from virsh capabilities:

LIBVIRT_DEBUG=1 virsh -d 0 capabilities
2016-02-15 21:27:02.442+0000: 14248: info : libvirt version: 1.2.18.2
2016-02-15 21:27:02.442+0000: 14248: debug : virGlobalInit:389 : register drivers
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:691 : driver=0x7f63ff0bd700 name=Test
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:702 : registering Test as driver 0
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:691 : driver=0x7f63ff0bfe80 name=OPENVZ
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:702 : registering OPENVZ as driver 1
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:691 : driver=0x7f63ff0c0600 name=VMWARE
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:702 : registering VMWARE as driver 2
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:691 : driver=0x7f63ff0bf4c0 name=PHYP
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:702 : registering PHYP as driver 3
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:691 : driver=0x7f63ff0c0d40 name=ESX
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:702 : registering ESX as driver 4
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:691 : driver=0x7f63ff0be200 name=remote
2016-02-15 21:27:02.442+0000: 14248: debug : virRegisterConnectDriver:702 : registering remote as driver 5
commands: "capabilities"
2016-02-15 21:27:02.442+0000: 14248: debug : virEventRegisterDefaultImpl:267 : registering default event implementation
2016-02-15 21:27:02.442+0000: 14248: debug : virEventPollAddHandle:115 : Used 0 handle slots, adding at least 10 more
2016-02-15 21:27:02.442+0000: 14248: debug : virEventPollInterruptLocked:723 : Skip interrupt, 0 0
2016-02-15 21:27:02.442+0000: 14248: info : virEventPollAddHandle:140 : EVENT_POLL_ADD_HANDLE: watch=1 fd=4 events=1 cb=0x7f63febc7dd0 opaque=(nil) ff=(nil)
2016-02-15 21:27:02.442+0000: 14248: debug : virEventRegisterImpl:234 : addHandle=0x7f63febc8480 updateHandle=0x7f63febc87b0 removeHandle=0x7f63febc7e10 addTimeout=0x7f63febc7f80 updateTimeout=0x7f63febc8160 removeTimeout=0x7f63febc8300
2016-02-15 21:27:02.442+0000: 14248: debug : virEventPollAddTimeout:230 : Used 0 timeout slots, adding at least 10 more
2016-02-15 21:27:02.442+0000: 14248: debug : virEventPollInterruptLocked:723 : Skip interrupt, 0 0
2016-02-15 21:27:02.442+0000: 14248: info : virEventPollAddTimeout:253 : EVENT_POLL_ADD_TIMEOUT: timer=1 frequency=-1 cb=0x555de8a71b00 opaque=0x7ffd8d83ce30 ff=(nil)
2016-02-15 21:27:02.442+0000: 14249: debug : virThreadJobSet:99 : Thread 14249 is now running job vshEventLoop
2016-02-15 21:27:02.442+0000: 14248: debug : virConnectOpenAuth:1339 : name=<null>, auth=0x7f63ff0bd4a0, flags=0
2016-02-15 21:27:02.442+0000: 14249: debug : virEventRunDefaultImpl:311 : running default event implementation
2016-02-15 21:27:02.442+0000: 14249: debug : virEventPollCleanupTimeouts:526 : Cleanup 1
2016-02-15 21:27:02.442+0000: 14249: debug : virEventPollCleanupHandles:575 : Cleanup 1
2016-02-15 21:27:02.442+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1b150 classname=virConnect
2016-02-15 21:27:02.442+0000: 14249: debug : virEventPollMakePollFDs:401 : Prepare n=0 w=1, f=4 e=1 d=0
2016-02-15 21:27:02.442+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1b2c0 classname=virConnectCloseCallbackData
2016-02-15 21:27:02.442+0000: 14249: debug : virEventPollCalculateTimeout:338 : Calculate expiry of 1 timers
2016-02-15 21:27:02.442+0000: 14249: debug : virEventPollCalculateTimeout:371 : No timeout is pending
2016-02-15 21:27:02.442+0000: 14249: info : virEventPollRunOnce:641 : EVENT_POLL_RUN: nhandles=1 timeout=-1
2016-02-15 21:27:02.442+0000: 14248: debug : virConnectGetConfigFile:953 : Loading config file '/etc/libvirt/libvirt.conf'
2016-02-15 21:27:02.442+0000: 14248: debug : virConfReadFile:769 : filename=/etc/libvirt/libvirt.conf
2016-02-15 21:27:02.442+0000: 14248: debug : virFileClose:103 : Closed fd 6
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1136 : no name, allowing driver auto-select
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1179 : trying driver 0 (Test) ...
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1194 : driver 0 Test returned DECLINED
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1179 : trying driver 1 (OPENVZ) ...
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1194 : driver 1 OPENVZ returned DECLINED
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1179 : trying driver 2 (VMWARE) ...
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1194 : driver 2 VMWARE returned DECLINED
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1179 : trying driver 3 (PHYP) ...
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1194 : driver 3 PHYP returned DECLINED
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1179 : trying driver 4 (ESX) ...
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1194 : driver 4 ESX returned DECLINED
2016-02-15 21:27:02.442+0000: 14248: debug : do_open:1179 : trying driver 5 (remote) ...
2016-02-15 21:27:02.442+0000: 14248: debug : remoteConnectOpen:1187 : Auto-probe remote URI
2016-02-15 21:27:02.442+0000: 14248: debug : doRemoteOpen:788 : proceeding with name = 
2016-02-15 21:27:02.442+0000: 14248: debug : doRemoteOpen:797 : Connecting with transport 1
2016-02-15 21:27:02.442+0000: 14248: debug : doRemoteOpen:889 : Proceeding with sockname /var/run/libvirt/libvirt-sock
2016-02-15 21:27:02.442+0000: 14248: debug : virNetSocketNew:226 : localAddr=0x7ffd8d83c6e0 remoteAddr=0x7ffd8d83c770 fd=6 errfd=-1 pid=0
2016-02-15 21:27:02.442+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1b7e0 classname=virNetSocket
2016-02-15 21:27:02.442+0000: 14248: info : virNetSocketNew:277 : RPC_SOCKET_NEW: sock=0x555de9a1b7e0 fd=6 errfd=-1 pid=0 localAddr=127.0.0.1;0, remoteAddr=127.0.0.1;0
2016-02-15 21:27:02.442+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1baa0 classname=virNetClient
2016-02-15 21:27:02.442+0000: 14248: info : virNetClientNew:327 : RPC_CLIENT_NEW: client=0x555de9a1baa0 sock=0x555de9a1b7e0
2016-02-15 21:27:02.442+0000: 14248: info : virObjectRef:296 : OBJECT_REF: obj=0x555de9a1baa0
2016-02-15 21:27:02.442+0000: 14248: info : virObjectRef:296 : OBJECT_REF: obj=0x555de9a1b7e0
2016-02-15 21:27:02.442+0000: 14248: debug : virEventPollInterruptLocked:727 : Interrupting
2016-02-15 21:27:02.442+0000: 14248: info : virEventPollAddHandle:140 : EVENT_POLL_ADD_HANDLE: watch=2 fd=6 events=1 cb=0x7f63fed106a0 opaque=0x555de9a1b7e0 ff=0x7f63fed10650
2016-02-15 21:27:02.442+0000: 14248: debug : virKeepAliveNew:199 : client=0x555de9a1baa0, interval=-1, count=0
2016-02-15 21:27:02.443+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1bd90 classname=virKeepAlive
2016-02-15 21:27:02.443+0000: 14248: info : virKeepAliveNew:218 : RPC_KEEPALIVE_NEW: ka=0x555de9a1bd90 client=0x555de9a1baa0
2016-02-15 21:27:02.443+0000: 14248: info : virObjectRef:296 : OBJECT_REF: obj=0x555de9a1baa0
2016-02-15 21:27:02.443+0000: 14248: info : virObjectRef:296 : OBJECT_REF: obj=0x555de9a1b2c0
2016-02-15 21:27:02.443+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1bcb0 classname=virNetClientProgram
2016-02-15 21:27:02.443+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1b9c0 classname=virNetClientProgram
2016-02-15 21:27:02.443+0000: 14248: info : virObjectNew:202 : OBJECT_NEW: obj=0x555de9a1b5b0 classname=virNetClientProgram
2016-02-15 21:27:02.443+0000: 14248: info : virObjectRef:296 : OBJECT_REF: obj=0x555de9a1bcb0
2016-02-15 21:27:02.443+0000: 14248: info : virObjectRef:296 : OBJECT_REF: obj=0x555de9a1b9c0
2016-02-15 21:27:02.443+0000: 14248: info : virObjectRef:296 : OBJECT_REF: obj=0x555de9a1b5b0
2016-02-15 21:27:02.443+0000: 14248: debug : doRemoteOpen:1006 : Trying authentication
2016-02-15 21:27:02.443+0000: 14248: debug : virNetMessageNew:46 : msg=0x555de9a1c300 tracked=0
2016-02-15 21:27:02.443+0000: 14248: debug : virNetMessageEncodePayload:376 : Encode length as 28
2016-02-15 21:27:02.443+0000: 14248: info : virNetClientSendInternal:1950 : RPC_CLIENT_MSG_TX_QUEUE: client=0x555de9a1baa0 len=28 prog=536903814 vers=1 proc=66 type=0 status=0 serial=0
2016-02-15 21:27:02.443+0000: 14248: debug : virNetClientCallNew:1903 : New call 0x555de9a1d3a0: msg=0x555de9a1c300, expectReply=1, nonBlock=0
2016-02-15 21:27:02.443+0000: 14248: debug : virNetClientIO:1712 : Outgoing message prog=536903814 version=1 serial=0 proc=66 type=0 length=28 dispatch=(nil)
2016-02-15 21:27:02.443+0000: 14248: debug : virNetClientIO:1771 : We have the buck head=0x555de9a1d3a0 call=0x555de9a1d3a0
2016-02-15 21:27:02.443+0000: 14248: info : virEventPollUpdateHandle:152 : EVENT_POLL_UPDATE_HANDLE: watch=2 events=0
2016-02-15 21:27:02.443+0000: 14248: debug : virEventPollInterruptLocked:727 : Interrupting
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollRunOnce:651 : Poll got 1 event(s)
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollDispatchTimeouts:433 : Dispatch 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollDispatchHandles:479 : Dispatch 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollDispatchHandles:493 : i=0 w=1
2016-02-15 21:27:02.443+0000: 14249: info : virEventPollDispatchHandles:507 : EVENT_POLL_DISPATCH_HANDLE: watch=1 events=1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupTimeouts:526 : Cleanup 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupHandles:575 : Cleanup 2
2016-02-15 21:27:02.443+0000: 14249: debug : virEventRunDefaultImpl:311 : running default event implementation
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupTimeouts:526 : Cleanup 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupHandles:575 : Cleanup 2
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollMakePollFDs:401 : Prepare n=0 w=1, f=4 e=1 d=0
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollMakePollFDs:401 : Prepare n=1 w=2, f=6 e=0 d=0
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCalculateTimeout:338 : Calculate expiry of 1 timers
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCalculateTimeout:371 : No timeout is pending
2016-02-15 21:27:02.443+0000: 14249: info : virEventPollRunOnce:641 : EVENT_POLL_RUN: nhandles=1 timeout=-1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollRunOnce:651 : Poll got 1 event(s)
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollDispatchTimeouts:433 : Dispatch 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollDispatchHandles:479 : Dispatch 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollDispatchHandles:493 : i=0 w=1
2016-02-15 21:27:02.443+0000: 14249: info : virEventPollDispatchHandles:507 : EVENT_POLL_DISPATCH_HANDLE: watch=1 events=1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupTimeouts:526 : Cleanup 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupHandles:575 : Cleanup 2
2016-02-15 21:27:02.443+0000: 14249: debug : virEventRunDefaultImpl:311 : running default event implementation
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupTimeouts:526 : Cleanup 1
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCleanupHandles:575 : Cleanup 2
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollMakePollFDs:401 : Prepare n=0 w=1, f=4 e=1 d=0
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollMakePollFDs:401 : Prepare n=1 w=2, f=6 e=0 d=0
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCalculateTimeout:338 : Calculate expiry of 1 timers
2016-02-15 21:27:02.443+0000: 14249: debug : virEventPollCalculateTimeout:371 : No timeout is pending
2016-02-15 21:27:02.443+0000: 14249: info : virEventPollRunOnce:641 : EVENT_POLL_RUN: nhandles=1 timeout=-1



After that, nothing happens. Same happens for other basic virsh queries like "virsh list" (when there's not even any guest defined yet).



Version-Release number of selected component (if applicable):

>= 1.2.9.3

Steps to Reproduce:
1. start libvirtd >= 1.2.9.3
2. try any basic virsh query

Actual results:

timeout

Expected results:

results of virsh query, or connection to libvirt via e.g. virt-manager or virt-install or similar


Additional info:
Mageia bugreport for the libvirt update for CVE-2015-5313 is
https://bugs.mageia.org/show_bug.cgi?id=17417

Comment 1 Florian Hubold 2016-02-15 22:21:22 UTC
gdb backtrace of libvirtd after ~30 mins runtime


(gdb) thread apply all bt

Thread 17 (Thread 0x7f088982a700 (LWP 27394)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f088982a700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 16 (Thread 0x7f0889029700 (LWP 27395)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0889029700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 15 (Thread 0x7f0888828700 (LWP 27396)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0888828700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 14 (Thread 0x7f0888027700 (LWP 27397)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0888027700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 13 (Thread 0x7f0887826700 (LWP 27398)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0887826700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 12 (Thread 0x7f0887025700 (LWP 27399)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0887025700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7f0886824700 (LWP 27400)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0886824700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7f0886023700 (LWP 27401)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
---Type <return> to continue, or q <return> to quit---
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0886023700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7f0885822700 (LWP 27402)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0885822700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7f0885021700 (LWP 27403)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0885021700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7f0857fff700 (LWP 27406)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0857fff700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7f08577fe700 (LWP 27407)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f08577fe700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7f0856ffd700 (LWP 27408)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f0856ffd700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7f08567fc700 (LWP 27409)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
#4  0x00007f089104e5bd in start_thread (arg=0x7f08567fc700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7f0855ffb700 (LWP 27410)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f089152fe06 in virCondWait () from /lib64/libvirt.so.0
#2  0x00007f089153066b in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x00007f089152fbab in virThreadHelper () from /lib64/libvirt.so.0
---Type <return> to continue, or q <return> to quit---
#4  0x00007f089104e5bd in start_thread (arg=0x7f0855ffb700) at pthread_create.c:310
#5  0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7f0854fd8700 (LWP 27411)):
#0  0x00007f0890d7fd1d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f08914e1a4a in virCommandProcessIO () from /lib64/libvirt.so.0
#2  0x00007f08914e5d02 in virCommandRun () from /lib64/libvirt.so.0
#3  0x00007f085dcb1f95 in virQEMUCapsNewForBinary () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007f085dcb3b74 in virQEMUCapsCacheLookup () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007f085dcb3f8d in virQEMUCapsInit () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007f085dcea670 in virQEMUDriverCreateCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007f085dd27e10 in qemuStateInitialize () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007f08915c2e08 in virStateInitialize () from /lib64/libvirt.so.0
#9  0x0000563fef46df9b in daemonRunStateInit ()
#10 0x00007f089152fbca in virThreadHelper () from /lib64/libvirt.so.0
#11 0x00007f089104e5bd in start_thread (arg=0x7f0854fd8700) at pthread_create.c:310
#12 0x00007f0890d8b5cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7f089217e8c0 (LWP 27392)):
#0  0x00007f0890d7fd1d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f08914f2c67 in virEventPollRunOnce () from /lib64/libvirt.so.0
#2  0x00007f08914f18f1 in virEventRunDefaultImpl () from /lib64/libvirt.so.0
#3  0x00007f0891630a8d in virNetDaemonRun () from /lib64/libvirt.so.0
#4  0x0000563fef46d86b in main ()
(gdb)

Comment 2 Florian Hubold 2016-02-15 22:24:31 UTC
Versions of packages involved:

$ rpm -qa qemu lib64virt0 lib64virt-devel libvirt-debuginfo libvirt-utils | sort
lib64virt0-1.2.18.2-1.1.mga5
lib64virt-devel-1.2.18.2-1.1.mga5
libvirt-debuginfo-1.2.9.3-1.2.mga5
libvirt-utils-1.2.18.2-1.1.mga5
qemu-2.4.0.1-1.1.mga5

qemu 2.4.0 was just for testing, happens the same way with qemu-2.1.3.

Comment 3 Florian Hubold 2016-02-15 22:39:41 UTC
Sorry for the copy&paste mixup with 1.2.18.2 ...

$ rpm -qa qemu lib64virt0 lib64virt-devel libvirt-debuginfo libvirt-utils | sort
lib64virt0-1.2.9.3-1.1.mga5
lib64virt-devel-1.2.9.3-1.1.mga5
libvirt-debuginfo-1.2.9.3-1.1.mga5
libvirt-utils-1.2.9.3-1.1.mga5
qemu-2.1.3-2.11.mga5


Please mention if you need any further info.

Comment 4 Florian Hubold 2016-02-21 20:59:15 UTC
Tried a git bisect, using the following boundaries:

git bisect bad v1.2.9.3 2>&1 | tee -a ~/libvirt_bisect.log
git bisect good ea7de69315f227a14a1960f4ad02aa397699c827 2>&1 | tee -a ~/libvirt_bisect.log

Where the good commit is the last patch in our package, on top of 1.2.9.2 (this one: http://libvirt.org/git/?p=libvirt.git;a=commit;h=ea7de69315f227a14a1960f4ad02aa397699c827 )


After doing the bisect got this:


Bisecting: 14 revisions left to test after this (roughly 4 steps)
[808a6382400b0ea64beea7681e3eca8c784019de] qemu: conf: Clarify paths that are relative to libDir
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[c9da7af61ed93fc6e8d12c3c522c17fb0afcb71b] Do not format CPU features without a model
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[a514d325d534f799d4e3af0163408fbb3e96d5bf] Report original error when QMP probing fails with new QEMU
Bisecting: 1 revision left to test after this (roughly 1 step)
[5d6cf6d4e94ad6fcb49d8b241eb4ab5202671dbc] storage: qemu: Fix security labelling of new image chain elements
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[b2436dafbf66ae227efd0cba91ede5bc9608c651] cpu: Add {Haswell,Broadwell}-noTSX CPU models


a514d325d534f799d4e3af0163408fbb3e96d5bf is the first bad commit
commit a514d325d534f799d4e3af0163408fbb3e96d5bf
Author: Daniel P. Berrange <berrange>
Date:   Thu Dec 4 15:26:41 2014 +0000

    Report original error when QMP probing fails with new QEMU
    
    If probing capabilities via QMP fails, we now have a check
    that prevents us falling back to -help parsing. Unfortunately
    the error message
    
      "Failed to probe capabilities for /usr/bin/qemu-kvm:
       unsupported configuration: QEMU 2.1.2 is too new for help parsing"
    
    is proving rather unhelpful to the user. We need to be telling
    them why QMP failed (the root cause), rather than they can't
    use -help (the side effect).
    
    To do this we should capture stderr during QMP probing, and
    if -help parsing then sees a new QEMU version, we know that
    QMP should have worked, and so we can show the messages from
    stderr. The message thus becomes
    
      "Failed to probe capabilities for /usr/bin/qemu-kvm:
       internal error: QEMU / QMP failed: Could not access
       KVM kernel module: No such file or directory
       failed to initialize KVM: No such file or directory"
    
    (cherry picked from commit 25bf888a6653ecdeca58c597c296757f4cf8a229)

:040000 040000 4024eda9f5fb1a750dc601cc5a86b3d212188d8d bce657fdee99fe8f4ac3e4a3d71886a8d77d9eb9 M	src
:040000 040000 99aa1279311ce0934b252e95f7b1d749faad0581 0b29c521ac39e009e1188ec1a6b397642320b994 M	tests



@Daniel: Can you provide me some pointers what to do now? FWIW, this is with qemu 2.1.3, as shown in comment 3.

Comment 5 Ján Tomko 2016-03-02 09:52:44 UTC
That commit started capturing QEMU's stderr output, but QEMU does not seem to cooperate.
Do you have any downstream patches on top of QEMU? Does this happen with other qemu binaries than -arm, or does it hang on -arm because it's first in the alphabet?

Note that there is currently a bug in qemu.git master with the same symptoms, but that should not be in 2.4.0 and 2.1.3:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg00130.html

Also, the only point of that commit is to provide a friendlier error message if QEMU is broken, so it should be safe to revert in your downstream build.

Comment 6 Florian Hubold 2016-03-02 19:13:44 UTC
(In reply to Ján Tomko from comment #5)

> Do you have any downstream patches on top of QEMU? 

Our package is based loosely on the fedora one, and we have less patches, all of them are from upstream AFAICS. You can take a look at http://svnweb.mageia.org/packages/cauldron/qemu/current/SOURCES/ for a list.

> Does this happen with other qemu binaries than -arm, or does it hang on -arm
> because it's first in the alphabet?

That's a good question, from what I can see it always stops at arm which is the first probably because it's the first alphabetically. Can you tell me how to "blacklist" certain qemu flavours from the available ones, to e.g. rule out that maybe only the arm one has an issue?

> Note that there is currently a bug in qemu.git master with the same
> symptoms, but that should not be in 2.4.0 and 2.1.3:
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg00130.html

Thanks for the pointer, will take a look.

> Also, the only point of that commit is to provide a friendlier error message
> if QEMU is broken, so it should be safe to revert in your downstream build.


OK, will try.


FWIW, with later libvirt versions I tried for testing often got the error
> internal error: cannot mix string I/O with daemon
when starting libvirtd interactively, does that ring any bells?

[doktor5000@Mageia5]─[17:10:47]─[~] sudo LC_ALL=C /usr/sbin/libvirtd --config /etc/libvirt/libvirtd.conf
2016-02-12 16:11:20.522+0000: 29641: info : libvirt version: 1.2.9.3
2016-02-12 16:11:20.522+0000: 29641: error : virCommandRun:2211 : internal error: cannot mix string I/O with daemon
2016-02-12 16:11:20.522+0000: 29641: warning : virQEMUCapsLogProbeFailure:3374 : Failed to probe capabilities for /bin/qemu-system-arm: internal error: cannot mix string I/O with daemon
2016-02-12 16:11:20.522+0000: 29641: error : virCommandRun:2211 : internal error: cannot mix string I/O with daemon
2016-02-12 16:11:20.522+0000: 29641: warning : virQEMUCapsLogProbeFailure:3374 : Failed to probe capabilities for /bin/qemu-system-cris: internal error: cannot mix string I/O with daemon
2016-02-12 16:11:20.522+0000: 29641: error : virCommandRun:2211 : internal error: cannot mix string I/O with daemon

[ snip - continues for all qemu flavours up to the last one ]

2016-02-12 16:11:20.527+0000: 29641: error : virCommandRun:2211 : internal error: cannot mix string I/O with daemon
2016-02-12 16:11:20.527+0000: 29641: warning : virQEMUCapsLogProbeFailure:3374 : Failed to probe capabilities for /bin/qemu-system-x86_64: internal error: cannot mix string I/O with daemon
2016-02-12 16:11:20.528+0000: 29641: error : virCommandRun:2211 : internal error: cannot mix string I/O with daemon
2016-02-12 16:11:20.528+0000: 29641: warning : virQEMUCapsLogProbeFailure:3374 : Failed to probe capabilities for /bin/qemu-kvm: internal error: cannot mix string I/O with daemon

Comment 7 Ján Tomko 2016-03-03 16:00:47 UTC
(In reply to Florian Hubold from comment #6)
> > Does this happen with other qemu binaries than -arm, or does it hang on -arm
> > because it's first in the alphabet?
> 
> That's a good question, from what I can see it always stops at arm which is
> the first probably because it's the first alphabetically. Can you tell me
> how to "blacklist" certain qemu flavours from the available ones, to e.g.
> rule out that maybe only the arm one has an issue?
> 

Temporarily removing the executable bit should do the trick.

> FWIW, with later libvirt versions I tried for testing often got the error
> > internal error: cannot mix string I/O with daemon
> when starting libvirtd interactively, does that ring any bells?
> 

That looks like a coding error inside libvirt - using both virCommandDaemonize and virCommandSetErrorBuffer for running the qemu process.

But virCommandDaemonize was only used for a few months between:
    commit 1a50ba2cb07d8bb2aa724062889deb9efd7ad9e9
    CommitDate: 2012-11-28 14:54:29 +0000
        qemu: Fix QMP Capabability Probing Failure
(contained in v1.0.1-rc1~168) until:
    commit 1b253a102fbfddb73947b46288c90d74d01fcf0f
    CommitDate: 2013-01-25 10:41:48 +0000
        Fix performance & reliabilty of QMP probing
contained in v1.0.2-rc2~19.

The virCommandSetErrorBuffer call was added by the above mentioned commit:
    commit 25bf888a6653ecdeca58c597c296757f4cf8a229
    CommitDate: 2014-12-05 10:57:46 +0000
        Report original error when QMP probing fails with new QEMU
contained in v1.2.11-rc1~27 so I have no idea how that error message could be triggered.

Comment 8 Florian Hubold 2016-03-05 15:24:33 UTC
(In reply to Ján Tomko from comment #7)
> (In reply to Florian Hubold from comment #6)
> > FWIW, with later libvirt versions I tried for testing often got the error
> > > internal error: cannot mix string I/O with daemon
> > when starting libvirtd interactively, does that ring any bells?
> > 
> 
> That looks like a coding error inside libvirt - using both
> virCommandDaemonize and virCommandSetErrorBuffer for running the qemu
> process.
> 
> But virCommandDaemonize was only used for a few months between:
>     commit 1a50ba2cb07d8bb2aa724062889deb9efd7ad9e9
>     CommitDate: 2012-11-28 14:54:29 +0000
>         qemu: Fix QMP Capabability Probing Failure
> (contained in v1.0.1-rc1~168) until:
>     commit 1b253a102fbfddb73947b46288c90d74d01fcf0f
>     CommitDate: 2013-01-25 10:41:48 +0000
>         Fix performance & reliabilty of QMP probing
> contained in v1.0.2-rc2~19.
> 
> The virCommandSetErrorBuffer call was added by the above mentioned commit:
>     commit 25bf888a6653ecdeca58c597c296757f4cf8a229
>     CommitDate: 2014-12-05 10:57:46 +0000
>         Report original error when QMP probing fails with new QEMU
> contained in v1.2.11-rc1~27 so I have no idea how that error message could
> be triggered.


(In reply to Ján Tomko from comment #5)
> Do you have any downstream patches on top of QEMU?

For what it's worth, forgot one downstream patch, maybe you can comment on that one ... although sadly I don't know the origin of that patch. It seems to be required for us to allow e.g. virt-manager or others to connect to libvirt.

This is the patch:
http://svnweb.mageia.org/packages/updates/5/libvirt/current/SOURCES/libvirt-1.2.3-mga-no-daemonize.patch?revision=955439&view=markup

This would be an explanation for the mixup of both virCommandDaemonize and virCommandSetErrorBuffer right?

----

> Also, the only point of that commit is to provide a friendlier error message
> if QEMU is broken, so it should be safe to revert in your downstream build.

After reverting that commit, tests with libvirt 1.2.9.3 seem fine, and virsh or virt-manager or similar can connect fine again.

From a quick lookup on libvirt packages for other distros, apparently nobody uses the v1.2.9-maint maintenance branch we use, so maybe nobody else noticed this issue. Closest seems to be openSUSE 13.2 but they seem to use 1.2.9 directly with several dozen upstream patches backported on top. Same as Debian (Jessie).

I'll try to switch us to one of the more current maintenance branches, I'll try first with v1.2.13-maint and then to v1.2.18-maint.

Comment 9 Daniel Berrangé 2016-03-07 11:37:13 UTC
(In reply to Florian Hubold from comment #8)
> (In reply to Ján Tomko from comment #7)
> > (In reply to Florian Hubold from comment #6)
> > > FWIW, with later libvirt versions I tried for testing often got the error
> > > > internal error: cannot mix string I/O with daemon
> > > when starting libvirtd interactively, does that ring any bells?
> > > 
> > 
> > That looks like a coding error inside libvirt - using both
> > virCommandDaemonize and virCommandSetErrorBuffer for running the qemu
> > process.
> > 
> > But virCommandDaemonize was only used for a few months between:
> >     commit 1a50ba2cb07d8bb2aa724062889deb9efd7ad9e9
> >     CommitDate: 2012-11-28 14:54:29 +0000
> >         qemu: Fix QMP Capabability Probing Failure
> > (contained in v1.0.1-rc1~168) until:
> >     commit 1b253a102fbfddb73947b46288c90d74d01fcf0f
> >     CommitDate: 2013-01-25 10:41:48 +0000
> >         Fix performance & reliabilty of QMP probing
> > contained in v1.0.2-rc2~19.
> > 
> > The virCommandSetErrorBuffer call was added by the above mentioned commit:
> >     commit 25bf888a6653ecdeca58c597c296757f4cf8a229
> >     CommitDate: 2014-12-05 10:57:46 +0000
> >         Report original error when QMP probing fails with new QEMU
> > contained in v1.2.11-rc1~27 so I have no idea how that error message could
> > be triggered.
> 
> 
> (In reply to Ján Tomko from comment #5)
> > Do you have any downstream patches on top of QEMU?
> 
> For what it's worth, forgot one downstream patch, maybe you can comment on
> that one ... although sadly I don't know the origin of that patch. It seems
> to be required for us to allow e.g. virt-manager or others to connect to
> libvirt.
> 
> This is the patch:
> http://svnweb.mageia.org/packages/updates/5/libvirt/current/SOURCES/libvirt-
> 1.2.3-mga-no-daemonize.patch?revision=955439&view=markup
> 
> This would be an explanation for the mixup of both virCommandDaemonize and
> virCommandSetErrorBuffer right?

That patch is the root cause of the problems here. The virCommandDaemonize() code is *not* functionally the same as the -daemonize flag to QEMU.  Specifically virCommandDaemonize will leave stdout/stderr open, while -daemonize will close them.  This difference violates the assumption against which commit a514d325d534f799d4e3af0163408fbb3e96d5bf was written.

So either you have to remove your non-upstream change to libvirt, or revert this error reporting change. Either way its not a libvirt upstream bug.


Note You need to log in before you can comment on or make changes to this bug.