Bug 512367 - Application using libvirt crashes when having concurrent TLS connections (gnutls problem)
Summary: Application using libvirt crashes when having concurrent TLS connections (gnutls problem)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: i386
OS: Linux
Priority: low
Severity: urgent
Target Milestone: rc
: ---
Assignee: Daniel Berrangé
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-07-17 14:42 UTC by Daniel Berrangé
Modified: 2014-06-24 03:33 UTC
9 users

Fixed In Version: libvirt-0.6.3-26.el5
Doc Type: Bug Fix
Doc Text:
Clone Of: 512350
Environment:
Last Closed: 2010-03-30 08:10:16 UTC
Target Upstream Version:
Embargoed:


Attachments
Backport of the upstream patch to the 0.6.3 RHEL-5 tree (2.47 KB, patch)
2009-12-22 13:56 UTC, Daniel Veillard
no flags
Setting up a Certificate Authority (3.54 KB, text/plain)
2010-01-07 10:23 UTC, Alex Jia
no flags
Issuing server certificates (3.86 KB, text/plain)
2010-01-07 10:24 UTC, Alex Jia
no flags
Issuing client certificates (4.28 KB, text/plain)
2010-01-07 10:25 UTC, Alex Jia
no flags
cacert.pem (709 bytes, text/plain)
2010-01-17 11:59 UTC, Gunannan Ren
no flags
servercert.pem (830 bytes, text/plain)
2010-01-17 12:00 UTC, Gunannan Ren
no flags
clientcert.pem (895 bytes, text/plain)
2010-01-17 12:01 UTC, Gunannan Ren
no flags
The scripts verifying the bug (1.12 KB, text/x-python)
2010-01-20 12:36 UTC, Gunannan Ren
no flags


Links
System: Red Hat Product Errata | ID: RHBA-2010:0205 | Private: 0 | Priority: normal | Status: SHIPPED_LIVE | Summary: libvirt bug fix and enhancement update | Last Updated: 2010-03-29 12:27:37 UTC

Description Daniel Berrangé 2009-07-17 14:42:09 UTC
+++ This bug was initially created as a clone of Bug #512350 +++

Description of problem:
I'm using libvirt's Python wrapper and run into lots of trouble when having concurrent TLS connections (to xen://localhost/). I assume this is the gnutls bug described here:

http://groups.google.com/group/google-gadgets-for-linux-user/browse_thread/thread/2c718e4e56be7a49/d16868d743ac18b4?lnk=raot
http://bugzilla.gnome.org/show_bug.cgi?id=172813

Version-Release number of selected component (if applicable):
libvirt-0.6.4 (from Debian lenny stable repo)

How reproducible:
Always; it makes libvirt unusable for me.

Steps to Reproduce:
Create two concurrent connections to xen://localhost/, e.g. using multithreading (and yes, global locks did *not* help; it's really a gnutls problem), and use the connections somehow. In my test case, thread A was getting domain information and thread B was shutting down a domain.
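The steps above can be sketched in Python as follows. This is a hypothetical reproducer, not the attached verification script (whose contents are not shown here); `reproduce` and its parameters are illustrative names, while `libvirt.open`, `lookupByName`, `info`, `shutdown` and `close` are calls from the libvirt Python bindings. The connection opener is passed in so the threading logic can be exercised without a hypervisor.

```python
import sys
import threading


def reproduce(open_conn, uri, domain_name):
    """Use two connections from two threads at the same time.

    open_conn is injected (libvirt.open in real use). On an affected build
    the process is expected to abort inside libgcrypt before this returns;
    ordinary libvirt errors are collected and returned instead.
    """
    errors = []

    def worker(action):
        try:
            conn = open_conn(uri)  # each thread opens its own TLS connection
            try:
                action(conn.lookupByName(domain_name))
            finally:
                conn.close()
        except Exception as exc:  # collect ordinary errors, don't hide them
            errors.append(exc)

    threads = [
        threading.Thread(target=worker, args=(lambda dom: dom.info(),)),      # thread A
        threading.Thread(target=worker, args=(lambda dom: dom.shutdown(),)),  # thread B
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__" and len(sys.argv) == 2:
    import libvirt  # libvirt-python bindings
    print(reproduce(libvirt.open, "xen://localhost/", sys.argv[1]))
```

Run it with the domain name as the only argument; note that thread B really does shut that domain down.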
  
Actual results:
The following error occurs and the Python process aborts immediately:
  python: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock == ((ath_mutex_t) 0)' failed
  Aborted

Expected results:
Simultaneous TLS connections should not crash the application.

Additional info:
Judging from the links above (especially http://lists.gnupg.org/pipermail/gcrypt-devel/2006-January/000911.html), it might help to call the correct gcrypt initialization functions (for threading) when starting libvirtd. In libvirt-0.6.4, gcrypt functions are not called at all.
*Quickly* porting to OpenSSL as a workaround is not an option, I guess ;-)

I will now try to use a singleton connection object as a workaround, but could somebody please solve this problem?
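The assertion under Actual results and the gcrypt suggestion above can be illustrated with a toy model. It assumes, based on the assertion text and the linked gcrypt-devel thread, that when no thread callbacks have been registered (in libgcrypt this is done with `gcry_control(GCRYCTL_SET_THREAD_CBS, ...)`), libgcrypt's mutex lock only checks a flag instead of blocking, so a second thread that tries to take a lock already held by the first trips the assertion rather than waiting. This is not libgcrypt code; `AthMutexModel` and `contended_lock` are hypothetical names.

```python
import threading

UNLOCKED, LOCKED = 0, 1


class AthMutexModel:
    """Toy mutex with an optional real implementation.

    real_lock stands in for registered thread callbacks; None means none
    were registered and the flag-checking fallback is used.
    """

    def __init__(self, real_lock=None):
        self._real = real_lock
        self._flag = UNLOCKED

    def lock(self):
        if self._real is not None:
            self._real.acquire()  # a contending thread waits here
            return
        # Fallback: no waiting at all, only a sanity check on a flag
        if self._flag != UNLOCKED:
            raise AssertionError("*lock == ((ath_mutex_t) 0)")
        self._flag = LOCKED

    def unlock(self):
        if self._real is not None:
            self._real.release()
        else:
            self._flag = UNLOCKED


def contended_lock(mutex, wait=0.5):
    """Hold mutex while a second thread tries to lock it; report the result."""
    outcome = []

    def contender():
        try:
            mutex.lock()
            mutex.unlock()
            outcome.append("acquired")
        except AssertionError:
            outcome.append("assertion failed")

    mutex.lock()
    t = threading.Thread(target=contender)
    t.start()
    t.join(timeout=wait)
    blocked = t.is_alive()  # still waiting on a real lock
    mutex.unlock()
    t.join()
    return "blocked" if blocked else outcome[0]
```

With the fallback the contender fails immediately; with a real lock it simply waits until the first thread releases it, which is the behaviour thread-safe initialization is meant to restore.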

Comment 1 Daniel Berrangé 2009-07-17 14:46:04 UTC
Actually, I'm changing my mind about this: it is not a regression. The server side only ever uses GNUTLS from a single thread, so it can't be affected. The client side was always broken.

Comment 3 Daniel Berrangé 2009-12-16 18:50:14 UTC
A proof-of-concept patch has been posted upstream:

http://www.redhat.com/archives/libvir-list/2009-December/msg00486.html

Comment 4 Daniel Veillard 2009-12-22 13:56:35 UTC
Created attachment 379825 [details]
Backport of the upstream patch to the 0.6.3 RHEL-5 tree

This is nearly identical to the upstream patch, with just some context changed
plus the extra configure bits.

Daniel

Comment 5 Daniel Veillard 2009-12-22 15:25:58 UTC
libvirt-0.6.3-26.el5 has been built into dist-5E-qu-candidate with the fix

Daniel

Comment 7 Alex Jia 2010-01-07 10:20:04 UTC
Connecting over TLS fails; the following error is raised with both the KVM and Xen hypervisors:
[root@dhcp-66-70-173 ~]# virsh -c qemu+tls://10.66.70.62/system
error: unable to connect to '10.66.70.62': Invalid argument
error: failed to connect to the hypervisor

but SSH connection is ok:
[root@dhcp-66-70-173 ~]# virsh -c qemu+ssh://10.66.70.62/system list --all
root@10.66.70.62's password:
 Id Name                 State
----------------------------------
  - rhel5u5              shut off

I am not sure whether something is missing between the client and the server; the certificates seem to be OK.


Steps to Reproduce:
server --> 10.66.70.62
client --> 10.66.70.173
1. Set up a Certificate Authority (on the server), then:
  -- scp cacert.pem 10.66.70.173:/etc/pki/CA/
  -- scp cakey.pem 10.66.70.173:/etc/pki/CA/
2. Issue server certificates, then:
  -- mkdir -p /etc/pki/libvirt/private/
  -- cp serverkey.pem /etc/pki/libvirt/private/
  -- cp servercert.pem /etc/pki/libvirt/
3. Issue client certificates, then:
  -- mkdir -p /etc/pki/libvirt/private/
  -- cp clientkey.pem /etc/pki/libvirt/private/
  -- cp clientcert.pem /etc/pki/libvirt/
4. Turn on libvirtd listening on the server:
  -- uncomment LIBVIRTD_ARGS="--listen"
  -- enable listen_tls = 1 in libvirtd.conf (/etc/libvirt/libvirtd.conf)
  -- service libvirtd restart
  -- service iptables stop
5. Connect remotely from the client to the server's libvirtd:
  -- virsh -c qemu+tls://10.66.70.62/system
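Steps 1-3 can be sanity-checked with a small script. This is a minimal sketch that only looks for the files at the paths used in those steps, assuming the CA certificate also sits in /etc/pki/CA/ on the server; it does not validate the certificates' contents, and `EXPECTED_FILES` and `missing_files` are hypothetical names.

```python
from pathlib import Path

# Files placed by steps 1-3 above, relative to the filesystem root.
EXPECTED_FILES = {
    "server": [
        "etc/pki/CA/cacert.pem",
        "etc/pki/libvirt/private/serverkey.pem",
        "etc/pki/libvirt/servercert.pem",
    ],
    "client": [
        "etc/pki/CA/cacert.pem",
        "etc/pki/libvirt/private/clientkey.pem",
        "etc/pki/libvirt/clientcert.pem",
    ],
}


def missing_files(role, root="/"):
    """Return the expected files for role ('server' or 'client') that are
    not present as regular files under root. Contents are not checked."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES[role] if not (base / rel).is_file()]


if __name__ == "__main__":
    import sys
    role = sys.argv[1] if len(sys.argv) > 1 else "client"
    print(missing_files(role) or "all expected files present")
```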

Version-Release number of selected component (if applicable):
[root@dhcp-66-70-173 clientkey]# uname -a
Linux dhcp-66-70-173.nay.redhat.com 2.6.18-183.el5 #1 SMP Mon Dec 21 18:37:42 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-66-70-173 clientkey]# lsmod|grep kvm
kvm_intel              86664  0
kvm                   223648  2 ksm,kvm_intel
[root@dhcp-66-70-173 clientkey]# rpm -qa|grep libvirt
libvirt-debuginfo-0.6.3-29.el5
libvirt-python-0.6.3-29.el5
libvirt-0.6.3-29.el5
[root@dhcp-66-70-173 clientkey]# rpm -qa|grep kvm
kvm-tools-83-140.el5
kvm-qemu-img-83-140.el5
kmod-kvm-83-140.el5
etherboot-zroms-kvm-5.4.4-13.el5
kvm-83-140.el5

Comment 8 Alex Jia 2010-01-07 10:23:16 UTC
Created attachment 382191 [details]
Setting up a Certificate Authority

Comment 9 Alex Jia 2010-01-07 10:24:14 UTC
Created attachment 382192 [details]
Issuing server certificates

Comment 10 Alex Jia 2010-01-07 10:25:24 UTC
Created attachment 382193 [details]
Issuing client certificates

Comment 11 Daniel Berrangé 2010-01-12 17:27:27 UTC
Please check that port 16514 is open on the firewall of the server running libvirtd,

e.g. by running

  telnet 10.66.70.62 16514

and seeing whether that works.
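The same reachability check can be scripted. This is a minimal Python sketch (the function name `tls_port_open` is hypothetical); like telnet, it only shows that something accepts TCP connections on the port and does not perform a TLS handshake.

```python
import socket


def tls_port_open(host, port=16514, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Equivalent to the telnet check: it proves reachability only and
    says nothing about certificates or the TLS handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `tls_port_open("10.66.70.62")` returns False while the server's firewall still blocks 16514.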

Comment 12 Gunannan Ren 2010-01-15 12:56:20 UTC
I redid the test: "telnet 10.66.70.62 16514" works and shows that the connection is established.
On the client, the same errors are still reported:
error: unable to connect to '10.66.70.62': Invalid argument
error: failed to connect to the hypervisor

On the libvirtd server, with the "--verbose" option added to the /usr/sbin/libvirtd command line, the following is shown while the client is connecting:

07:34:46.331: error : gnutls_record_recv : A TLS packet with unexpected length was received

Comment 13 Daniel Berrangé 2010-01-15 14:10:43 UTC
It looks like your x509 server certificate is not correct. In the attachment in comment 9, I see a subject line of:

Subject: O=Red Hat Emerging Technologies,CN=oirase


The 'CN=oirase' bit is supposed to be using the hostname of your server. 'oirase' is the example hostname from the libvirt documentation - you need to replace this with your own hostname when creating certificates. This must match the hostname used in the libvirt URI *exactly*, so since your URI is

 qemu+tls://10.66.70.62/system 

The server certificate should end up showing

  CN=10.66.70.62
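The matching rule can be illustrated with a small Python sketch that pulls the CN out of a subject line and compares it literally with the host part of the URI. This only illustrates the rule described above and is not how libvirt performs the check; the helper names are hypothetical, and the naive comma split assumes the subject contains no escaped commas.

```python
from urllib.parse import urlsplit


def subject_cn(subject):
    """Return the CN value from a subject such as 'O=Org,CN=host', or None.
    Naive parser: splits on commas, so escaped commas are not handled."""
    for part in subject.split(","):
        key, _, value = part.strip().partition("=")
        if key == "CN":
            return value
    return None


def uri_host(uri):
    """Host part of a libvirt URI like 'qemu+tls://host/system'.
    Note: urlsplit lowercases the hostname."""
    return urlsplit(uri).hostname


def cn_matches_uri(subject, uri):
    """True only when the CN is literally the host used in the URI, so an
    IP address never matches a DNS name (or vice versa)."""
    return subject_cn(subject) == uri_host(uri)
```

For example, `cn_matches_uri("O=Red Hat Emerging Technologies,CN=oirase", "qemu+tls://10.66.70.62/system")` is False, while a certificate with `CN=10.66.70.62` matches that URI.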

Comment 14 Gunannan Ren 2010-01-15 14:19:33 UTC
Yes, I know what you mean; I did it as you said.
It still reports the same errors as in comment 12.

Comment 15 Gunannan Ren 2010-01-15 14:24:14 UTC
Sorry, I should have redone the test before adding the comment above; I will do it again.

Comment 16 Gunannan Ren 2010-01-17 11:57:53 UTC
I tried again; the libvirtd server still reports errors while the client is connecting:

19:53:52.569: error : gnutls_record_recv: A TLS packet with unexpected length was received.

Here is the relevant information:

On the libvirtd server (10.66.70.62):

#ps -ef|grep libvirt
nobody    3257     1  0 16:35 ?        00:00:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file=  --listen-address 192.168.122.1 --except-interface lo --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-lease-max=253
root      5869  4797 14 19:53 pts/5    00:00:00 /usr/sbin/libvirtd --listen

On the client (10.66.70.64):

# virsh -c qemu+tls://10.66.70.62/system
error: unable to connect to '10.66.70.62': Invalid argument
error: failed to connect to the hypervisor

Comment 17 Gunannan Ren 2010-01-17 11:59:39 UTC
Created attachment 384904 [details]
cacert.pem

Comment 18 Gunannan Ren 2010-01-17 12:00:38 UTC
Created attachment 384905 [details]
servercert.pem

Comment 19 Gunannan Ren 2010-01-17 12:01:14 UTC
Created attachment 384906 [details]
clientcert.pem

Comment 20 Daniel Berrangé 2010-01-18 10:46:56 UTC
You still have mis-matched hostnames. You are connecting based on IP address

# virsh -c qemu+tls://10.66.70.62/system

But the servercert.pem contains

        Subject: O=RedHat test,CN=dhcp-66-70-62.nay.redhat.com


You need to use *EXACTLY* the same hostname for both; you can't mix and match hostnames with IP addresses, or vice versa.

So you need to try connecting with

  #virsh -c qemu+tls://dhcp-66-70-62.nay.redhat.com/system

Comment 21 Gunannan Ren 2010-01-19 11:26:13 UTC
Yes, using the hostname works: the TLS connection now succeeds.
Next, I will try to create concurrent connections using threads to reproduce the bug.

Comment 22 Gunannan Ren 2010-01-20 12:33:16 UTC
The bug has been fixed in libvirt-0.6.3-30.el5.

I reproduced the bug on libvirt-0.6.3-24.el5:
# python multhread.py rhel5u5_x86_64_kvm
Thread:No 1
Thread:No 2
python: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock == ((ath_mutex_t) 0)' failed.
Aborted

On libvirt-0.6.3-30.el5, there is no problem.

Comment 23 Gunannan Ren 2010-01-20 12:36:29 UTC
Created attachment 385667 [details]
The scripts verifying the bug

Comment 25 Johnny Liu 2010-02-02 10:43:20 UTC
Verified this bug PASS with libvirt-0.6.3-31.el5 on RHEL-5.5-Server-x86_64-xen

Comment 27 errata-xmlrpc 2010-03-30 08:10:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0205.html

