Bug 1012343

Summary: Thread issue in glibc can cause the application to not get any identity information
Product: Red Hat Enterprise Linux 6
Reporter: Kaushik Banerjee <kbanerje>
Component: glibc
Assignee: Carlos O'Donell <codonell>
Status: CLOSED ERRATA
QA Contact: Sergey Kolosov <skolosov>
Severity: unspecified
Docs Contact: Mark Flitter <mflitter>
Priority: unspecified
Version: 6.5
CC: 1105789050, ashankar, codonell, fweimer, jhrozek, mcermak, mflitter, mnewsome, msebor, ohudlick, pfrankli, rhbugs, skolosov
Target Milestone: rc
Keywords: Patch
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glibc-2.12-1.207.el6
Doc Type: Bug Fix
Doc Text:
Fix for handling any open file descriptors in the event of thread cancellation

The use of POSIX thread cancellation could cause glibc to improperly handle open file descriptors, particularly those held open when processing identity information. To correct this and ensure that functions like getpwuid_r complete, even when the thread is being cancelled, the library calls have been changed to correctly handle open file descriptors in any call from the exec family of functions.
Last Closed: 2017-03-21 10:34:20 UTC
Type: Bug
Bug Blocks: 1361283    
Attachments:
  Tested patch. (flags: none)
  tst-cancel-getpwuid_r.c (flags: none)

Description Kaushik Banerjee 2013-09-26 10:23:53 UTC
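Description of problem:
Thread cancellation issue: after a thread doing passwd lookups is cancelled and joined, the next getpwuid_r call hangs, so the application cannot get any identity information.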

Version-Release number of selected component (if applicable):
glibc-2.12-1.130.el6

How reproducible:
Always

Steps to Reproduce:
1. Have only "passwd: files" in /etc/nsswitch.conf
2. su as a local user.
3. Compile and run "client-hang" available from bug 847043, comment 3

Actual results:
$ ./client-hang 
Cancelling thread
Joining...
Joined, trying getpwuid_r call
^C

Expected results:
The program should run to completion
$ ./client-hang 
Cancelling thread
Joining...
Joined, trying getpwuid_r call
Never get here
$

Additional info:
This issue is not reproducible on RHEL7.

Comment 4 Jakub Hrozek 2013-10-03 09:30:30 UTC
Just to add further context, this issue was discovered when QE's tests that Kaushik added above started failing. The tests were initially written to test a mutex lock issue in the sssd client and were passing in previous RHEL-6 updates.

So is there any chance the code regressed between 6.4 and 6.5?

Comment 6 Alexandre Oliva 2013-10-04 08:30:00 UTC
Ok, the good news is that I've duplicated the bug on an old 6.1 VM I had lying around, so this is not a regression.  The bad news is that we have at least two bugs here: the deadlocks we get are different depending on whether or not nscd is running.  

The former (the deadlock seen when nscd is not running) is fixed by upstream commit 312be3f9, which (among other things) adds the "c" option to various internal fopen calls so that stdio stream operations on /etc/passwd et al. are not cancellation points. As a result, the lock that guards the data structures controlling the internal setent/getent calls can no longer leak, because no other cancellation point is exercised while it is held.

The latter (the hang seen when nscd is running) turns out to be a bug in the testcase. When nscd is disabled, every call to getpwuid_r attempts to connect to the nscd socket, and connect is a mandatory cancellation point. When nscd is enabled, however, the first call makes the connection and subsequent calls reach no cancellation point at all: getpwuid_r is only an optional cancellation point, and glibc does not introduce an artificial one in it, so the cancellation request is never acted upon and the test loops forever. Thus pthread_testcancel() needs to be moved out of the #ifdef/#endif block to fix this bug in the testcase.

Comment 13 Florian Weimer 2016-02-04 14:45:46 UTC
Upstream commit from comment 6:

commit 312be3f9f5eab1643d7dcc7728c76d413d4f2640
Author: Ulrich Drepper <drepper>
Date:   Tue Nov 15 04:24:42 2011 -0500

    Clean up internal fopen uses
    
    No need to ever not use c and e.

Comment 15 Martin Sebor 2016-10-13 00:05:17 UTC
Created attachment 1209871
Tested patch.

Attached patch posted for review:
http://post-office.corp.redhat.com/archives/tools-patches/2016-September/msg00048.html

Comment 18 Carlos O'Donell 2016-12-02 18:25:50 UTC
Created attachment 1227440
tst-cancel-getpwuid_r.c

There is no reason the test for this issue should use sleep or yield. We have a perfectly acceptable semaphore implementation that guarantees you are as close as possible to issuing a getpwuid_r call, without all the overhead of sleeping.

Comment 20 errata-xmlrpc 2017-03-21 10:34:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0680.html

Comment 21 Anssi Johansson 2017-03-29 13:41:43 UTC
Oracle claims the patch was incorrect, causing memory corruption: https://blogs.oracle.com/wim/entry/oracle_linux_6_update_9

Comment 22 Florian Weimer 2017-03-29 14:00:43 UTC
(In reply to Anssi Johansson from comment #21)
> Oracle claims the patch was incorrect, causing memory corruption:
> https://blogs.oracle.com/wim/entry/oracle_linux_6_update_9

Thanks, I filed the regression as bug 1437111.

Comment 23 Carlos O'Donell 2017-03-29 15:57:00 UTC
(In reply to Florian Weimer from comment #22)
> (In reply to Anssi Johansson from comment #21)
> > Oracle claims the patch was incorrect, causing memory corruption:
> > https://blogs.oracle.com/wim/entry/oracle_linux_6_update_9
> 
> Thanks, I filed the regression as bug 1437111.

I've closed 1437111 as a duplicate.

I'm going to use bug 1437147 to track the fix and missing changes.