RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 677724 - port allocation/deallocation needs to be made thread-safe
Summary: port allocation/deallocation needs to be made thread-safe
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libnl
Version: 6.1
Hardware: Unspecified
OS: Linux
urgent
urgent
Target Milestone: rc
: ---
Assignee: Laine Stump
QA Contact: desktop-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 677729 684385
TreeView+ depends on / blocked
 
Reported: 2011-02-15 17:02 UTC by Daniel Berrangé
Modified: 2018-11-14 14:16 UTC (History)
12 users (show)

Fixed In Version: libnl-1.1-14.el6
Doc Type: Bug Fix
Doc Text:
The port allocation/de-allocation was not safe in multi-threaded applications and the logic was incorrect, which resulted in some applications and other libraries being unable to initialize libnl. Port allocation/de-allocation and logic is now fixed, and libnl can now be initialized.
Clone Of: 677527
Environment:
Last Closed: 2011-05-19 14:29:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
patch against libnl-1.1-12 (1.87 KB, patch)
2011-02-25 11:42 UTC, Stefan Berger
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0795 0 normal SHIPPED_LIVE libnl bug fix update 2011-05-18 18:08:09 UTC

Description Daniel Berrangé 2011-02-15 17:02:20 UTC
+++ This bug was initially created as a clone of Bug #677527 +++

Description of problem:

The port allocation/deallocation in socket.c needs to be made thread-safe. Too many different callers hidden in libraries (netcf) may call this code at the same time and corrupt the port allocation. Concrete problem may arise in libvirt.

Since I am a nice guy I provide you a patch, which I haven't tested, though:


Index: lib/socket.c
===================================================================
--- lib.orig/socket.c
+++ lib/socket.c
@@ -89,6 +89,8 @@
  * @{
  */
 
+#include <pthread.h>
+
 #include <netlink-local.h>
 #include <netlink/netlink.h>
 #include <netlink/utils.h>
@@ -117,12 +119,15 @@ static void __init init_default_cb(void)
 }
 
 static uint32_t used_ports_map[32];
+static pthread_mutex_t port_map_mutex = PTHREAD_MUTEX_INITIALIZER;
 
 static uint32_t generate_local_port(void)
 {
 	int i, n;
 	uint32_t pid = getpid() & 0x3FFFFF;
 
+	pthread_mutex_lock(&port_map_mutex);
+
 	for (i = 0; i < 32; i++) {
 		if (used_ports_map[i] == 0xFFFFFFFF)
 			continue;
@@ -141,6 +146,8 @@ static uint32_t generate_local_port(void
 		}
 	}
 
+	pthread_mutex_unlock(&port_map_mutex);
+
 	/* Out of sockets in our own PID namespace, what to do? FIXME */
 	return UINT_MAX;
 }
@@ -152,8 +159,12 @@ static void release_local_port(uint32_t 
 	if (port == UINT_MAX)
 		return;
 	
+
 	nr = port >> 22;
-	used_ports_map[nr / 32] &= ~((nr % 32) + 1);
+
+	pthread_mutex_lock(&port_map_mutex);
+	used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f));
+	pthread_mutex_unlock(&port_map_mutex);
 }
 
 /**

Now need to link against -lpthread as well.



Version-Release number of selected component (if applicable):

libnl-1.1-12.fc14.x86_64
libnl-debuginfo-1.1-12.fc14.x86_64
libnl-devel-1.1-12.fc14.x86_64

--- Additional comment from stefanb.com on 2011-02-15 08:42:50 EST ---

Created attachment 478874 [details]
make libnl thread safe and fix a port allocation bug

Here's the fix that I built libnl with. I did not link with -lpthread, so you may want to add this.

Index: libnl-1.1/lib/socket.c
===================================================================
--- libnl-1.1/lib.orig/socket.c
+++ libnl-1.1/lib/socket.c
@@ -89,6 +89,8 @@
  * @{
  */
 
+#include <pthread.h>
+
 #include <netlink-local.h>
 #include <netlink/netlink.h>
 #include <netlink/utils.h>
@@ -117,12 +119,15 @@ static void __init init_default_cb(void)
 }
 
 static uint32_t used_ports_map[32];
+static pthread_mutex_t port_map_mutex = PTHREAD_MUTEX_INITIALIZER;
 
 static uint32_t generate_local_port(void)
 {
 	int i, n;
 	uint32_t pid = getpid() & 0x3FFFFF;
 
+	pthread_mutex_lock(&port_map_mutex);
+
 	for (i = 0; i < 32; i++) {
 		if (used_ports_map[i] == 0xFFFFFFFF)
 			continue;
@@ -136,11 +141,15 @@ static uint32_t generate_local_port(void
 
 			/* PID_MAX_LIMIT is currently at 2^22, leaving 10 bit
 			 * to, i.e. 1024 unique ports per application. */
-			return pid + (n << 22);
 
+			pthread_mutex_unlock(&port_map_mutex);
+
+			return pid + (n << 22);
 		}
 	}
 
+	pthread_mutex_unlock(&port_map_mutex);
+
 	/* Out of sockets in our own PID namespace, what to do? FIXME */
 	return UINT_MAX;
 }
@@ -153,7 +162,10 @@ static void release_local_port(uint32_t 
 		return;
 	
 	nr = port >> 22;
-	used_ports_map[nr / 32] &= ~((nr % 32) + 1);
+	
+	pthread_mutex_lock(&port_map_mutex);
+	used_ports_map[nr / 32] &= ~(1 << (nr & 0x1f));
+	pthread_mutex_unlock(&port_map_mutex);
 }
 
 /**

Comment 2 Daniel Berrangé 2011-02-15 17:05:21 UTC
This bug is causing failures in libvirt's VEPA support in some scenarios (see
upstream thread
http://www.redhat.com/archives/libvir-list/2011-February/msg00466.html)

Comment 3 RHEL Program Management 2011-02-15 17:17:57 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 4 Stefan Berger 2011-02-25 00:43:19 UTC
Dear 'Red Hat Product Management', the correct functioning of other components, i.e., those in libvirt, depend on the correct functioning of libnl. So, we would like to propose this request for RHEL 6.1.

Comment 5 Stefan Berger 2011-02-25 11:42:19 UTC
Created attachment 480972 [details]
patch against libnl-1.1-12

I added a hunk so that the final library also gets linked with -lpthread. This is necessary due to the mutex being used to protect the global used_ports_map.

Comment 8 Joseph Kachuck 2011-03-04 17:02:47 UTC
=== In Red Hat Customer Portal Case 00427769 ===
--- Comment by IBM bug, proxy on 3/4/2011 10:22 AM ---

------- Comment From gerhard.stenzel.com 2011-03-04 10:19 EDT-------
FYI ..
I updated my two RHEL 6 systems to RHEL 6.1 alpha and verified the "Unknown failure" problem still occurs sometimes if the VM uses a direct interface.  (No problems if no direct interface is used.)

I then installed libnl rpms with the patch on both systems and started a VM and migrated it back and forth several times with no problems.

The interface definition looks like this:

...
<interface type='direct'>
<mac address='52:54:00:11:11:11'/>
<source dev='eth2.3' mode='vepa'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0'/>
</interface>
...

Thanks

Comment 11 Joseph Kachuck 2011-03-21 19:56:07 UTC
Hello,
This is the information on how IBM recreates this issue:
Here is interface part from the domain xml.
...
<interface type='direct'>
<mac address='52:54:00:11:11:11'/>
<source dev='eth2.3' mode='vepa'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</interface>
...

Please let me know if you need any additional details.

Thank You
Joe Kachuck

Comment 13 Laine Stump 2011-03-21 20:18:10 UTC
More specficically, an error situation can be created by defining a kvm guest that has interface type='direct', then run virt-manager while silmultaneously starting/stopping the guest several times using "virsh start" and "virsh shutdown" (or "virsh destroy" if you're in a hurry). This will fairly quickly lead to an error condition, noted by virsh reporting that it's unable to start the guest. Once the fixed libnl is installed (and the system rebooted), you will be able to perform this test without problems.

Comment 14 IBM Bug Proxy 2011-03-26 17:38:07 UTC
------- Comment From tpnoonan.com 2011-03-02 12:06 EDT-------
(In reply to comment #9)
> (In reply to comment #8)
> > Hello,
> > I am seeing 3 BZ that are requested for 6.1, in regards to libnl.:
> > 677725 - port allocation/deallocation is wrong [in libnl-1.1-12]
> > 677724 - port allocation/deallocation needs to be made thread-safe
> Both of these are directly relevant.

internal comment don't reach red hat, making external

------- Comment From gerhard.stenzel.com 2011-03-03 04:25 EDT-------
(In reply to comment #11)
> Hello,
> BZ 677724 has a test patch, but no package. Please let me know if you would
> like the test patch to confirm if it corrects this issue.
> If so I will connect this issue to BZ 677724.

The patch in the description of BZ 677724 is the one which I referred to as "a patch from Stefan Berger for libnl also seemed to fix the problem" in comment #1 of this bugzilla.

In plain words: I tested this patch and it worked.

------- Comment From gerhard.stenzel.com 2011-03-03 10:58 EDT-------
Thank you .. just for the records, I was referring to this comment:

#############
Further investigation revealed that the following call

if (nl_connect(nlhandle, NETLINK_ROUTE) < 0) {

in src/util/macvtap.c in the function

int nlComm()

fails with errno = EADDRINUSE

When replacing the call
nl_handle_destroy(nlhandle);
with
nl_close(nlhandle);
seemed to fix this problem.

Alternatively, a patch from Stefan Berger for libnl also seemed to fix the
problem.

I am not a libnl expert, so I can not judge which is the correct solution
#############






------- Comment From tpnoonan.com 2011-03-09 10:57 EDT-------
please consider as exception for rhel6.1

------- Comment From gerhard.stenzel.com 2011-03-09 11:36 EDT-------
you are aware that without working "direct interfaces" 802.1 Qbg will not work?

See
[Bug 69817] RCP419255- support for IEEE 802.1Qbg in lldpad - bugfixes

------- Comment From mcintire.com 2011-03-23 13:08 EDT-------
Red Hat - is Gerhard's comment on 3/9 enough business justification?

Comment 15 Laine Stump 2011-03-28 06:50:45 UTC
A fix for this bug has been applied and is included in libnl-1.1-14.el6:

  https://brewweb.devel.redhat.com/buildinfo?buildID=160656

Comment 18 IBM Bug Proxy 2011-04-12 11:21:32 UTC
------- Comment From gerhard.stenzel.com 2011-04-12 07:15 EDT-------
thanks for fixing .. snap2 contains libnl-1.1-14 and I could not reproduce the problem anymore .. closing.

Comment 20 Misha H. Ali 2011-04-19 10:40:03 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
The port allocation/de-allocation was not safe in multi-threaded applications and the logic was incorrect, which resulted in some applications and other libraries being unable to initialize libnl. Port allocation/de-allocation and logic is now fixed, and libnl can now be initialized.

Comment 21 errata-xmlrpc 2011-05-19 14:29:51 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0795.html


Note You need to log in before you can comment on or make changes to this bug.