Bug 677724

Summary: port allocation/deallocation needs to be made thread-safe
Product: Red Hat Enterprise Linux 6 Reporter: Daniel Berrangé <berrange>
Component: libnlAssignee: Laine Stump <laine>
Status: CLOSED ERRATA QA Contact: desktop-bugs <desktop-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.1CC: dallan, dcbw, gerhard.stenzel, jens.osterkamp, jjarvis, jkachuck, laine, mhusnain, mjenner, nhorman, stefanb, tpelka
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: libnl-1.1-14.el6 Doc Type: Bug Fix
Doc Text:
The port allocation/de-allocation was not safe in multi-threaded applications and the logic was incorrect, which resulted in some applications and other libraries being unable to initialize libnl. Port allocation/de-allocation and logic is now fixed, and libnl can now be initialized.
Story Points: ---
Clone Of: 677527 Environment:
Last Closed: 2011-05-19 14:29:51 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 677729, 684385    
Attachments:
Description Flags
patch against libnl-1.1-12 none

Description Daniel Berrangé 2011-02-15 17:02:20 UTC
+++ This bug was initially created as a clone of Bug #677527 +++

Description of problem:

The port allocation/deallocation in socket.c needs to be made thread-safe. Too many different callers hidden in libraries (netcf) may call this code at the same time and corrupt the port allocation. Concrete problem may arise in libvirt.

Since I am a nice guy I provide you a patch, which I haven't tested, though:


Index: lib/socket.c
===================================================================
--- lib.orig/socket.c
+++ lib/socket.c
@@ -89,6 +89,8 @@
  * @{
  */
 
+#include <pthread.h>
+
 #include <netlink-local.h>
 #include <netlink/netlink.h>
 #include <netlink/utils.h>
@@ -117,12 +119,15 @@ static void __init init_default_cb(void)
 }
 
 static uint32_t used_ports_map[32];
+static pthread_mutex_t port_map_mutex = PTHREAD_MUTEX_INITIALIZER;
 
 static uint32_t generate_local_port(void)
 {
 	int i, n;
 	uint32_t pid = getpid() & 0x3FFFFF;
 
+	pthread_mutex_lock(&port_map_mutex);
+
 	for (i = 0; i < 32; i++) {
 		if (used_ports_map[i] == 0xFFFFFFFF)
 			continue;
@@ -141,6 +146,8 @@ static uint32_t generate_local_port(void
 		}
 	}
 
+	pthread_mutex_unlock(&port_map_mutex);
+
 	/* Out of sockets in our own PID namespace, what to do? FIXME */
 	return UINT_MAX;
 }
@@ -152,8 +159,12 @@ static void release_local_port(uint32_t 
 	if (port == UINT_MAX)
 		return;
 	
+
 	nr = port >> 22;
-	used_ports_map[nr / 32] &= ~((nr % 32) + 1);
+
+	pthread_mutex_lock(&port_map_mutex);
+	used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f));
+	pthread_mutex_unlock(&port_map_mutex);
 }
 
 /**

Now need to link against -lpthread as well.



Version-Release number of selected component (if applicable):

libnl-1.1-12.fc14.x86_64
libnl-debuginfo-1.1-12.fc14.x86_64
libnl-devel-1.1-12.fc14.x86_64

--- Additional comment from stefanb.com on 2011-02-15 08:42:50 EST ---

Created attachment 478874 [details]
make libnl thread safe and fix a port allocation bug

Here's the fix that I built libnl with. I did not link with -lpthread, so you may want to add this.

Index: libnl-1.1/lib/socket.c
===================================================================
--- libnl-1.1/lib.orig/socket.c
+++ libnl-1.1/lib/socket.c
@@ -89,6 +89,8 @@
  * @{
  */
 
+#include <pthread.h>
+
 #include <netlink-local.h>
 #include <netlink/netlink.h>
 #include <netlink/utils.h>
@@ -117,12 +119,15 @@ static void __init init_default_cb(void)
 }
 
 static uint32_t used_ports_map[32];
+static pthread_mutex_t port_map_mutex = PTHREAD_MUTEX_INITIALIZER;
 
 static uint32_t generate_local_port(void)
 {
 	int i, n;
 	uint32_t pid = getpid() & 0x3FFFFF;
 
+	pthread_mutex_lock(&port_map_mutex);
+
 	for (i = 0; i < 32; i++) {
 		if (used_ports_map[i] == 0xFFFFFFFF)
 			continue;
@@ -136,11 +141,15 @@ static uint32_t generate_local_port(void
 
 			/* PID_MAX_LIMIT is currently at 2^22, leaving 10 bit
 			 * to, i.e. 1024 unique ports per application. */
-			return pid + (n << 22);
 
+			pthread_mutex_unlock(&port_map_mutex);
+
+			return pid + (n << 22);
 		}
 	}
 
+	pthread_mutex_unlock(&port_map_mutex);
+
 	/* Out of sockets in our own PID namespace, what to do? FIXME */
 	return UINT_MAX;
 }
@@ -153,7 +162,10 @@ static void release_local_port(uint32_t 
 		return;
 	
 	nr = port >> 22;
-	used_ports_map[nr / 32] &= ~((nr % 32) + 1);
+	
+	pthread_mutex_lock(&port_map_mutex);
+	used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f));
+	pthread_mutex_unlock(&port_map_mutex);
 }
 
 /**

Comment 2 Daniel Berrangé 2011-02-15 17:05:21 UTC
This bug is causing failures in libvirt's VEPA support in some scenarios (see
upstream thread
http://www.redhat.com/archives/libvir-list/2011-February/msg00466.html)

Comment 3 RHEL Program Management 2011-02-15 17:17:57 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 4 Stefan Berger 2011-02-25 00:43:19 UTC
Dear 'Red Hat Product Management', the correct functioning of other components, i.e., those in libvirt, depend on the correct functioning of libnl. So, we would like to propose this request for RHEL 6.1.

Comment 5 Stefan Berger 2011-02-25 11:42:19 UTC
Created attachment 480972 [details]
patch against libnl-1.1-12

I added a hunk so that the final library also gets linked with -lpthread. This is necessary due to the mutex being used to protect the global used_ports_map.

Comment 8 Joseph Kachuck 2011-03-04 17:02:47 UTC
=== In Red Hat Customer Portal Case 00427769 ===
--- Comment by IBM bug, proxy on 3/4/2011 10:22 AM ---

------- Comment From gerhard.stenzel.com 2011-03-04 10:19 EDT-------
FYI ..
I updated my two RHEL 6 systems to RHEL 6.1 alpha and verified the "Unknown failure" problem still occurs sometimes if the VM uses a direct interface.  (No problems if no direct interface is used.)

I then installed libnl rpms with the patch on both systems and started a VM and migrated it back and forth several times with no problems.

The interface definition looks like this:

...
<interface type='direct'>
<mac address='52:54:00:11:11:11'/>
<source dev='eth2.3' mode='vepa'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0'/>
</interface>
...

Thanks

Comment 11 Joseph Kachuck 2011-03-21 19:56:07 UTC
Hello,
This is the information on how IBM recreates this issue:
Here is interface part from the domain xml.
...
<interface type='direct'>
<mac address='52:54:00:11:11:11'/>
<source dev='eth2.3' mode='vepa'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
...

Please let me know if you need any additional details.

Thank You
Joe Kachuck

Comment 13 Laine Stump 2011-03-21 20:18:10 UTC
More specficically, an error situation can be created by defining a kvm guest that has interface type='direct', then run virt-manager while silmultaneously starting/stopping the guest several times using "virsh start" and "virsh shutdown" (or "virsh destroy" if you're in a hurry). This will fairly quickly lead to an error condition, noted by virsh reporting that it's unable to start the guest. Once the fixed libnl is installed (and the system rebooted), you will be able to perform this test without problems.

Comment 14 IBM Bug Proxy 2011-03-26 17:38:07 UTC
------- Comment From tpnoonan.com 2011-03-02 12:06 EDT-------
(In reply to comment #9)
> (In reply to comment #8)
> > Hello,
> > I am seeing 3 BZ that are requested for 6.1, in regards to libnl.:
> > 677725 - port allocation/deallocation is wrong [in libnl-1.1-12]
> > 677724 - port allocation/deallocation needs to be made thread-safe
> Both of these are directly relevant.

internal comment don't reach red hat, making external

------- Comment From gerhard.stenzel.com 2011-03-03 04:25 EDT-------
(In reply to comment #11)
> Hello,
> BZ 677724 has a test patch, but no package. Please let me know if you would
> like the test patch to confirm if it corrects this issue.
> If so I will connect this issue to BZ 677724.

The patch in the description of BZ 677724 is the one which I referred to as "a patch from Stefan Berger for libnl also seemed to fix the problem" in comment #1 of this bugzilla.

In plain words: I tested this patch and it worked.

------- Comment From gerhard.stenzel.com 2011-03-03 10:58 EDT-------
Thank you .. just for the records, I was referring to this comment:

#############
Further investigation revealed that the following call

if (nl_connect(nlhandle, NETLINK_ROUTE) < 0) {

in src/util/macvtap.c in the function

int nlComm()

fails with errno = EADDRINUSE

When replacing the call
nl_handle_destroy(nlhandle);
with
nl_close(nlhandle);
seemed to fix this problem.

Alternatively, a patch from Stefan Berger for libnl also seemed to fix the
problem.

I am not a libnl expert, so I can not judge which is the correct solution
#############






------- Comment From tpnoonan.com 2011-03-09 10:57 EDT-------
please consider as exception for rhel6.1

------- Comment From gerhard.stenzel.com 2011-03-09 11:36 EDT-------
you are aware that without working "direct interfaces" 802.1 Qbg will not work?

See
[Bug 69817] RCP419255- support for IEEE 802.1Qbg in lldpad - bugfixes

------- Comment From mcintire.com 2011-03-23 13:08 EDT-------
Red Hat - is Gerhard's comment on 3/9 enough business justification?

Comment 15 Laine Stump 2011-03-28 06:50:45 UTC
A fix for this bug has been applied and is included in libnl-1.1-14.el6:

  https://brewweb.devel.redhat.com/buildinfo?buildID=160656

Comment 18 IBM Bug Proxy 2011-04-12 11:21:32 UTC
------- Comment From gerhard.stenzel.com 2011-04-12 07:15 EDT-------
thanks for fixing .. snap2 contains libnl-1.1-14 and I could not reproduce the problem anymore .. closing.

Comment 20 Misha H. Ali 2011-04-19 10:40:03 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The port allocation/de-allocation was not safe in multi-threaded applications and the logic was incorrect, which resulted in some applications and other libraries being unable to initialize libnl. Port allocation/de-allocation and logic is now fixed, and libnl can now be initialized.

Comment 21 errata-xmlrpc 2011-05-19 14:29:51 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0795.html