Bug 503178

Summary: libvirtd crashes on tls connection
Product: Red Hat Enterprise Linux 5 Reporter: Daniel Berrangé <berrange>
Component: libvirtAssignee: Daniel Berrangé <berrange>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: low    
Version: 5.4CC: crobinso, nzhang, sghosh, veillard, virt-maint, xen-maint
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 503066 Environment:
Last Closed: 2009-09-02 09:22:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Daniel Berrangé 2009-05-29 12:56:24 UTC
+++ This bug was initially created as a clone of Bug #503066 +++

This is libvirt-0.6.2-8.fc11.x86_64

The following backtrace was produced by connecting with tls from a remote machine to a running libvirtd instance:

gdb --args /usr/sbin/libvirtd --listen
(gdb) r
Starting program: /usr/sbin/libvirtd --listen
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 15818.
Detaching after fork from child process 15820.
Detaching after fork from child process 15821.
[New Thread 0x7ffff1ca0910 (LWP 15822)]
[New Thread 0x7ffff129f910 (LWP 15823)]
[New Thread 0x7ffff089e910 (LWP 15824)]
[New Thread 0x7fffefe9d910 (LWP 15825)]
[New Thread 0x7fffef49c910 (LWP 15826)]
18:33:37.734: error : gnutls_record_recv: A TLS packet with unexpected length was received.
*** glibc detected *** /usr/sbin/libvirtd: double free or corruption (!prev): 0x00000000006bd530 ***
======= Backtrace: =========
/lib64/libc.so.6[0x7ffff5b40a26]
/usr/sbin/libvirtd[0x41143a]
/usr/sbin/libvirtd[0x411d18]
/usr/sbin/libvirtd[0x40ea11]
/usr/sbin/libvirtd[0x4109e9]
/usr/sbin/libvirtd[0x410cae]
/usr/sbin/libvirtd[0x4142b7]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7ffff5ae9a2d]
/usr/sbin/libvirtd[0x40db49]
======= Memory map: ========
00400000-00460000 r-xp 00000000 fd:00 21064                              /usr/sbin/libvirtd
00660000-00663000 rw-p 00060000 fd:00 21064                              /usr/sbin/libvirtd
00663000-006e4000 rw-p 00663000 00:00 0                                  [heap]
7fffee882000-7fffee89b000 r-xp 00000000 08:01 262156                     /lib64/libgcc_s-4.4.0-20090506.so.1
7fffee89b000-7fffeea9b000 ---p 00019000 08:01 262156                     /lib64/libgcc_s-4.4.0-20090506.so.1
7fffeea9b000-7fffeea9c000 rw-p 00019000 08:01 262156                     /lib64/libgcc_s-4.4.0-20090506.so.1
7fffeea9c000-7fffeea9d000 ---p 7fffeea9c000 00:00 0 
7fffeea9d000-7fffef49d000 rw-p 7fffeea9d000 00:00 0 
7fffef49d000-7fffef49e000 ---p 7fffef49d000 00:00 0 
7fffef49e000-7fffefe9e000 rw-p 7fffef49e000 00:00 0 
7fffefe9e000-7fffefe9f000 ---p 7fffefe9e000 00:00 0 
7fffefe9f000-7ffff089f000 rw-p 7fffefe9f000 00:00 0 
7ffff089f000-7ffff08a0000 ---p 7ffff089f000 00:00 0 
7ffff08a0000-7ffff12a0000 rw-p 7ffff08a0000 00:00 0 
7ffff12a0000-7ffff12a1000 ---p 7ffff12a0000 00:00 0 
7ffff12a1000-7ffff1ca1000 rw-p 7ffff12a1000 00:00 0 
7ffff1ca1000-7ffff1ca5000 r-xp 00000000 fd:00 3411614                    /usr/lib64/sasl2/libcrammd5.so.2.0.22
7ffff1ca5000-7ffff1ea5000 ---p 00004000 fd:00 3411614                    /usr/lib64/sasl2/libcrammd5.so.2.0.22
7ffff1ea5000-7ffff1ea6000 rw-p 00004000 fd:00 3411614                    /usr/lib64/sasl2/libcrammd5.so.2.0.22
7ffff1ea6000-7ffff1eaa000 r-xp 00000000 fd:00 3446402                    /usr/lib64/sasl2/liblogin.so.2.0.22
7ffff1eaa000-7ffff20a9000 ---p 00004000 fd:00 3446402                    /usr/lib64/sasl2/liblogin.so.2.0.22
7ffff20a9000-7ffff20aa000 rw-p 00003000 fd:00 3446402                    /usr/lib64/sasl2/liblogin.so.2.0.22
7ffff20aa000-7ffff20ac000 r-xp 00000000 08:01 262447                     /lib64/libkeyutils-1.2.so
7ffff20ac000-7ffff22ab000 ---p 00002000 08:01 262447                     /lib64/libkeyutils-1.2.so
7ffff22ab000-7ffff22ac000 rw-p 00001000 08:01 262447                     /lib64/libkeyutils-1.2.so
7ffff22ac000-7ffff22b5000 r-xp 00000000 fd:00 29699                      /usr/lib64/libkrb5support.so.0.1
7ffff22b5000-7ffff24b4000 ---p 00009000 fd:00 29699                      /usr/lib64/libkrb5support.so.0.1
7ffff24b4000-7ffff24b5000 rw-p 00008000 fd:00 29699                      /usr/lib64/libkrb5support.so.0.1
7ffff24b5000-7ffff24b8000 r-xp 00000000 08:01 262407                     /lib64/libcom_err.so.2.1
7ffff24b8000-7ffff26b7000 ---p 00003000 08:01 262407                     /lib64/libcom_err.so.2.1
7ffff26b7000-7ffff26b8000 rw-p 00002000 08:01 262407                     /lib64/libcom_err.so.2.1
7ffff26b8000-7ffff26dc000 r-xp 00000000 fd:00 29678                      /usr/lib64/libk5crypto.so.3.1
7ffff26dc000-7ffff28dc000 ---p 00024000 fd:00 29678                      /usr/lib64/libk5crypto.so.3.1
7ffff28dc000-7ffff28de000 rw-p 00024000 fd:00 29678                      /usr/lib64/libk5crypto.so.3.1
7ffff28de000-7ffff2979000 r-xp 00000000 fd:00 29693                      /usr/lib64/libkrb5.so.3.3
7ffff2979000-7ffff2b79000 ---p 0009b000 fd:00 29693                      /usr/lib64/libkrb5.so.3.3
7ffff2b79000-7ffff2b7d000 rw-p 0009b000 fd:00 29693                      /usr/lib64/libkrb5.so.3.3
7ffff2b7d000-7ffff2ba8000 r-xp 00000000 fd:00 29671                      /usr/lib64/libgssapi_krb5.so.2.2
7ffff2ba8000-7ffff2da7000 ---p 0002b000 fd:00 29671                      /usr/lib64/libgssapi_krb5.so.2.2
7ffff2da7000-7ffff2da9000 rw-p 0002a000 fd:00 29671                      /usr/lib64/libgssapi_krb5.so.2.2
7ffff2da9000-7ffff2db0000 r-xp 00000000 fd:00 557857                     /usr/l
Program received signal SIGABRT, Aborted.
0x00007ffff5afe2f5 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Current language:  auto; currently minimal
Missing separate debuginfos, use: debuginfo-install db4-4.7.25-11.fc11.x86_64 e2fsprogs-libs-1.41.4-10.fc11.x86_64 keyutils-libs-1.2-5.fc11.x86_64 krb5-libs-1.6.3-20.fc11.x86_64 libgcc-4.4.0-4.x86_64 openssl-0.9.8k-4.fc11.x86_64

(gdb) bt
#0  0x00007ffff5afe2f5 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5affb20 in *__GI_abort () at abort.c:88
#2  0x00007ffff5b3b05d in __libc_message (do_abort=2, 
    fmt=0x7fffffffcc00 "/usr/lib64/libkrb5.so.3.3\n7ffff2979000-7ffff2b79000 ---p 0009b000 fd:00 29693", ' ' <repeats 22 times>, "/usr/lib64/libkrb5.so.3.3\n7ffff2b79000-7ffff2b7d000 rw-p 0009b000 fd:00 29693", ' ' <repeats 22 times>, "/u"...)
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#3  0x00007ffff5b40a26 in malloc_printerr (action=3, str=0x7ffff5c00d48 "double free or corruption (!prev)", 
    ptr=<value optimized out>) at malloc.c:6196
#4  0x000000000041143a in qemudDispatchClientFailure (client=0x6bcf60) at qemud.c:1412
#5  0x0000000000411d18 in qemudDispatchClientEvent (watch=-11344, fd=-11284, events=12, opaque=<value optimized out>)
    at qemud.c:2001
#6  0x000000000040ea11 in virEventDispatchHandles (fds=<value optimized out>, nfds=<value optimized out>) at event.c:427
#7  virEventRunOnce (fds=<value optimized out>, nfds=<value optimized out>) at event.c:561
#8  0x00000000004109e9 in qemudOneLoop () at qemud.c:2067
#9  0x0000000000410cae in qemudRunLoop (server=0x663860) at qemud.c:2172
#10 0x00000000004142b7 in main (argc=<value optimized out>, argv=<value optimized out>) at qemud.c:2921
(gdb) bt
#0  0x00007ffff5afe2f5 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007ffff5affb20 in *__GI_abort () at abort.c:88
#2  0x00007ffff5b3b05d in __libc_message (do_abort=2, 
    fmt=0x7fffffffcc00 "/usr/lib64/libkrb5.so.3.3\n7ffff2979000-7ffff2b79000 ---p 0009b000 fd:00 29693", ' ' <repeats 22 times>, "/usr/lib64/libkrb5.so.3.3\n7ffff2b79000-7ffff2b7d000 rw-p 0009b000 fd:00 29693", ' ' <repeats 22 times>, "/u"...)
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#3  0x00007ffff5b40a26 in malloc_printerr (action=3, str=0x7ffff5c00d48 "double free or corruption (!prev)", 
    ptr=<value optimized out>) at malloc.c:6196
#4  0x000000000041143a in qemudDispatchClientFailure (client=0x6bcf60) at qemud.c:1412
#5  0x0000000000411d18 in qemudDispatchClientEvent (watch=-11344, fd=-11284, events=12, opaque=<value optimized out>)
    at qemud.c:2001
#6  0x000000000040ea11 in virEventDispatchHandles (fds=<value optimized out>, nfds=<value optimized out>) at event.c:427
#7  virEventRunOnce (fds=<value optimized out>, nfds=<value optimized out>) at event.c:561
#8  0x00000000004109e9 in qemudOneLoop () at qemud.c:2067
#9  0x0000000000410cae in qemudRunLoop (server=0x663860) at qemud.c:2172
#10 0x00000000004142b7 in main (argc=<value optimized out>, argv=<value optimized out>) at qemud.c:2921

--- Additional comment from bernie on 2009-05-28 12:58:05 EDT ---

This is the client-side log:

[Thu, 28 May 2009 18:30:47 virt-manager 15356] DEBUG (connection:455) Scheduling background open thread for qemu+tls://bender.lo>
[Thu, 28 May 2009 18:30:47 virt-manager 15356] DEBUG (connection:585) Background thread is running
[Thu, 28 May 2009 18:37:34 virt-manager 15356] ERROR (connection:579) Readonly connection failed.
Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/connection.py", line 573, in _try_open
    self.vmm = libvirt.openReadOnly(self.uri)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 168, in openReadOnly
    if ret is None:raise libvirtError('virConnectOpenReadOnly() failed')
libvirtError: unable to connect to 'bender.local': Invalid argument
[Thu, 28 May 2009 18:37:34 virt-manager 15356] ERROR (connection:618) Unable to open connection to hypervisor URI 'qemu+tls://be>
<class 'libvirt.libvirtError'> unable to connect to 'bender.local': Invalid argument
None

[Thu, 28 May 2009 18:37:34 virt-manager 15356] DEBUG (connection:623) Background open thread complete, scheduling notify
[Thu, 28 May 2009 18:37:34 virt-manager 15356] DEBUG (connection:632) Notifying open result
[Thu, 28 May 2009 18:37:34 virt-manager 15356] DEBUG (error:76) Uncaught Error: Unable to open a connection to the libvirt manag>

--- Additional comment from berrange on 2009-05-28 13:23:46 EDT ---

Created an attachment (id=345799)
Avoid potential double-free

This is a prototype patch to solve what looks to be a double-free in the stack trace above.

--- Additional comment from berrange on 2009-05-28 13:24:20 EDT ---

Scratch build temporarily avaialble here

http://berrange.fedorapeople.org/rpms/bz503066/

--- Additional comment from berrange on 2009-05-29 08:52:43 EDT ---

I've managed to reproduce the problem & verified my patch fixes it.


http://www.redhat.com/archives/libvir-list/2009-May/msg00605.html

Comment 1 Daniel Berrangé 2009-05-29 12:59:54 UTC
To test:

Server side:

 - On libvirtd server configure TLS CA cert and a server cert
 - Enable listen_tls =1 in livirtd.conf
 - Add --listen to /etc/sysconfig/libvirtd
 - Restart libvirtd daemon


Client side:

 - Delibrately configure  libivrt client with a *different* CA cert
 - Connect to server using  virsh -c  qemu+tls://somehostname/system 


Results:

  - client will fail to connect
  - server will crash with SEGV

Sometimes it is neccessary to connect a couple of times with the client before it will crash the server.

Comment 2 Daniel Veillard 2009-06-03 16:02:54 UTC
libvirt-0.6.3-4.el5 has been built in dist-5E-qu-candidate with the
patch,

Daniel

Comment 4 Nan Zhang 2009-06-04 13:25:33 UTC
I tested with libvirt-0.6.3-4.el5 on RHEL-5.4, still has a problem.

Server side:

 - Enable listen_tls = 1 in livirtd.conf
 - Add --listen to /etc/sysconfig/libvirtd
 - Restart libvirtd daemon

[root@dhcp-66-70-85 ~]# service libvirtd restart
Stopping libvirtd daemon:                               [  OK  ]
Starting libvirtd daemon:                               [  OK  ]
[root@dhcp-66-70-85 ~]# service libvirtd status
libvirtd dead but subsys locked

Client side:

[root ~]# virsh -c qemu+tls://10.66.70.85/system
error: failed to connect to the hypervisor

Comment 5 Daniel Berrangé 2009-06-04 13:45:03 UTC
I can't reproduce the failure with 0.6.3-4.el5, so can you try and get more debuginfo, by installing the matching libvirt-debuginfo RPM, and then run libvirtd manually under GDB, eg

 # service libvirtd stop
 # gdb --args /usr/sbin/libvirtd --listen
 (gdb) run

And then on the remote client try connecting again, and if it crashes, then do 'thread apply all backtrace' in the GDB console.

Comment 6 Nan Zhang 2009-06-05 05:37:41 UTC
I retest with 0.6.3-5.el5. On my machine, there will be a CA error as following.

[root@dhcp-66-70-85 ~]# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]
[root@dhcp-66-70-85 ~]# gdb --args /usr/sbin/libvirtd --listen
GNU gdb Fedora (6.8-35.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/sbin/libvirtd --listen
[Thread debugging using libthread_db enabled]
[New Thread 0x2b6a263971b0 (LWP 4158)]
13:37:07.036: error : Cannot access CA certificate '/etc/pki/CA/cacert.pem': No such file or directory

Program exited with code 02.
(gdb) q
[root@dhcp-66-70-85 ~]# ls /etc/pki/CA/cacert.pem
ls: /etc/pki/CA/cacert.pem: No such file or directory
[root@dhcp-66-70-85 ~]#

Comment 7 Daniel Berrangé 2009-06-05 09:18:16 UTC
Ok, you were not actually testing the scenario I described. You need to actaully configure the server with a CA cert and server certificate and private key, otherwise libvirtd will simply quit upon startup as you see there.

Only once you have the server running with a certificate, can you test the scenario from comment #1

Comment 8 Daniel Berrangé 2009-06-05 09:19:12 UTC
This page provides info on how to configure certificates:

http://libvirt.org/remote.html#Remote_certificates

Comment 9 Nan Zhang 2009-06-05 12:31:21 UTC
Yes, I configured the server with CA, but still has error when client try to connect to the server like following.

[root ~]# virsh -c qemu+tls://10.66.70.85/system
error: failed to connect to the hypervisor

[root@dhcp-66-70-85 CA]# service libvirtd stop
Stopping libvirtd daemon:                                  [  OK  ]
[root@dhcp-66-70-85 CA]# gdb --args /usr/sbin/libvirtd --listen
GNU gdb Fedora (6.8-35.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/sbin/libvirtd --listen
[Thread debugging using libthread_db enabled]
[New Thread 0x2b709f03e8e0 (LWP 4027)]
[New Thread 0x41ded940 (LWP 4030)]
[New Thread 0x427ee940 (LWP 4031)]
[New Thread 0x431ef940 (LWP 4032)]
[New Thread 0x43bf0940 (LWP 4033)]
[New Thread 0x445f1940 (LWP 4034)]
20:11:41.304: error : gnutls_record_recv: A TLS packet with unexpected length was received.
20:12:11.743: error : gnutls_record_recv: A TLS packet with unexpected length was received.

Comment 10 Daniel Berrangé 2009-06-05 13:10:15 UTC
The client is supposed to see the error 'error: failed to connect to the hypervisor'. All that we were attempting to solve was that the server libvirtd process would crash with SEGV after the 'gnutls_record_recv: A TLS packet with unexpected length was received. ' message.

Since you're most recent comment shows the libvirtd continues to run without crashing, I'd you've now verified that the SEGV crash is solved.

Comment 11 Nan Zhang 2009-06-05 13:45:01 UTC
OK, now is the expected results as following, it's fixed with libvirt 0.6.3-5 in RHEL-5.4.

  - client: fail to connect
  - server: do not crash with SEGV

Comment 13 errata-xmlrpc 2009-09-02 09:22:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1269.html