Bug 474181

Summary: race in fork()
Product: Red Hat Enterprise Linux 5
Reporter: Dominik Strasser <dominik.strasser>
Component: nss_ldap
Assignee: Nalin Dahyabhai <nalin>
Status: CLOSED ERRATA
QA Contact: Ondrej Moriš <omoris>
Severity: high
Docs Contact:
Priority: high
Version: 5.2
CC: dominik.strasser, dpal, drepper, fredrik.carlsson, jplans, mirko.fit, omoris, ralph, sean
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: nss_ldap-253-36.el5
Doc Type: Bug Fix
Last Closed: 2011-01-13 18:31:56 EST

Description Dominik Strasser 2008-12-02 12:35:04 EST
Description of problem:

It is generally considered unsafe to call any system functions between fork and exec, because deadlocks can happen during this time.
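As an illustration of the rule above, here is a minimal sketch of a fork/exec sequence in which the child only makes calls that POSIX lists as async-signal-safe (execve, write, _exit). The helper name spawn_and_wait is hypothetical and is not taken from the application or from nss_ldap.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` with `argv` and return the child's exit
 * status, or -1 if the fork, the wait, or the child itself failed abnormally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no malloc/free, no stdio, no locks between fork and exec. */
        execve(path, argv, environ);

        /* Only reached if execve failed; write() and _exit() are
         * async-signal-safe, unlike fprintf() and exit(). */
        static const char msg[] = "exec failed\n";
        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
            /* Nothing safe left to do about a failed write. */
        }
        _exit(127);
    }

    /* Parent: ordinary context, any function may be used again. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the child above is only safe as long as nothing else runs on its behalf between fork and execve; handlers registered with pthread_atfork by libraries run inside fork() itself and are outside the caller's control.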

It seems glibc has such a race itself:
#3  0xf6c4d96b in free (ptr=0xc4ba000)
#4  0xf682329a in _nss_ldap_mergeconfigfromdns () from /lib/libnss_ldap.so.2
#5  0xf680d205 in _nss_ldap_mergeconfigfromdns () from /lib/libnss_ldap.so.2
#6  0xf68026a3 in _nss_ldap_mergeconfigfromdns () from /lib/libnss_ldap.so.2
#7  0xf67eeb30 in _nss_ldap_test_initgroups_ignoreuser () from /lib/libnss_ldap.so.2
#8  0xf67f21f4 in _nss_ldap_leave () from /lib/libnss_ldap.so.2
#9  0x00951b52 in fork () from /lib/libc.so.6
#10 0x00a14424 in fork () from /lib/libpthread.so.0
#11 0x0a83c2e2 in TclpCreateProcess ()

This is part of a gdb backtrace from my application, which hung at this point while trying to acquire a lock in free().

It seems that glibc calls _nss_ldap_leave in fork, after the actual fork has already happened.

Version-Release number of selected component (if applicable):

How reproducible:
Unfortunately only in my application. I tried to make a small test example but failed.

Steps to Reproduce:
Actual results:

Expected results:

Additional info:
Comment 1 Dominik Strasser 2008-12-17 16:26:59 EST
I've raised the priority because this issue leads to frequent hangs in our application.
Comment 2 Fredrik Carlsson 2008-12-29 03:11:20 EST

I can confirm this bug too. Usually it's sshd that is affected, but crond as well as postfix can be affected.

It's quite an annoying bug and affects the functioning of the servers, so it would be nice to have it fixed ;)

Comment 3 Ulrich Drepper 2008-12-30 11:40:46 EST
This is a bug in nss_ldap which is not part of glibc.

Since fork can be called asynchronously, atfork handlers are not allowed to call any function that is not async-signal-safe. nss_ldap's atfork handler calls free(), which is not async-signal-safe.
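To make the point concrete, here is a minimal sketch of one way to keep an atfork handler async-signal-safe: the child-side handler only sets a flag, and the cleanup runs later, when the library is next entered from ordinary code. This is not the nss_ldap code and not necessarily how the issue was fixed; every name below (cache_child_after_fork, cache_get_config, cache_reload_count, the placeholder configuration string) is hypothetical. The sketch also leaves out the locking a real multi-threaded module would need around its cache.

```c
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the module's cached state. */
static char *cached_config;
static int reload_count;
static volatile sig_atomic_t stale_after_fork;
static pthread_once_t register_once = PTHREAD_ONCE_INIT;

/* Child-side atfork handler: records that a fork happened and nothing
 * more. No free(), no locks, no stdio. */
void cache_child_after_fork(void)
{
    stale_after_fork = 1;
}

static void register_handlers(void)
{
    pthread_atfork(NULL, NULL, cache_child_after_fork);
}

/* Hypothetical loader; a real module would parse its configuration here. */
static char *load_config(void)
{
    static const char text[] = "placeholder config";
    char *copy = malloc(sizeof text);
    if (copy != NULL)
        memcpy(copy, text, sizeof text);
    reload_count++;
    return copy;
}

/* Hypothetical entry point, called from ordinary (non-handler) code. */
const char *cache_get_config(void)
{
    pthread_once(&register_once, register_handlers);

    if (stale_after_fork) {
        /* Deferred cleanup: done here instead of in the atfork handler. */
        stale_after_fork = 0;
        free(cached_config);
        cached_config = NULL;
    }
    if (cached_config == NULL)
        cached_config = load_config();
    return cached_config;
}

/* Number of times the configuration has been (re)loaded. */
int cache_reload_count(void)
{
    return reload_count;
}
```

With this arrangement a child that goes straight from fork to exec never touches the allocator on the module's behalf; only a child that actually calls back into the module pays for the cleanup.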
Comment 4 Fredrik Carlsson 2009-01-08 08:42:50 EST
Any news?
Comment 5 Dominik Strasser 2009-07-28 06:55:58 EDT
Any news on this issue?
It is now 8 months old, with no reaction.
Comment 6 Mirko Fit 2010-04-22 03:06:54 EDT
This issue is more than 2 years old now.
From an outside point of view the fix looks easy: find out why nss_ldap calls free() in its atfork handler and move the cleanup to a safer location.
Comment 18 errata-xmlrpc 2011-01-13 18:31:56 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.