Bug 629949

Summary: sssd stops on upgrade
Product: Red Hat Enterprise Linux 6
Reporter: Stephen Gallagher <sgallagh>
Component: sssd
Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED CURRENTRELEASE
QA Contact: Chandrasekar Kannan <ckannan>
Severity: high
Priority: high
Version: 6.0
CC: benl, ddumas, dpal, grajaiya, jgalipea, mkhusid, myllynen, nalin, syeghiay
Target Milestone: rc
Keywords: Reopened
Hardware: All
OS: Linux
Fixed In Version: sssd-1.2.1-28.el6
Doc Type: Bug Fix
Clone Of: 606887
Clones: 638241
Last Closed: 2010-11-29 14:46:18 UTC
Bug Depends On: 606887, 658158    
Bug Blocks: 638241    

Description Stephen Gallagher 2010-09-03 11:13:01 UTC
I discovered today that a variation of this same problem is present in the SSSD shipped in RHEL 6. It is a very serious issue and needs to be resolved before the final release.

The problem is that the conditional restart of the SSSD service runs in two places rather than one. The first run happens during the %postun script, at which point the sssd binary no longer exists, so the service fails to start.

Then, when %post runs for the new package, SSSD is no longer running, so the conditional restart does not start it again.
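The failure mode described above can be modelled in a few lines of Python. This is a hypothetical sketch, not SSSD's init script: the `Service` class, its fields, and `simulate_upgrade` are assumptions made for illustration, and the model covers only the usual init-script meaning of a conditional restart (restart the service only if it is already running). It does not model RPM's scriptlet ordering.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical model of an init-script-managed daemon (illustration only)."""
    running: bool = True
    binary_present: bool = True

    def start(self) -> bool:
        # Starting fails if the daemon binary is missing from disk.
        if not self.binary_present:
            return False
        self.running = True
        return True

    def stop(self) -> None:
        self.running = False

    def condrestart(self) -> None:
        # Conditional restart: act only if the service is already running.
        if not self.running:
            return
        self.stop()
        self.start()


def simulate_upgrade() -> bool:
    """Return whether the daemon is running after the two condrestart calls."""
    svc = Service()
    # First condrestart runs while the binary is absent, as described above.
    svc.binary_present = False
    svc.condrestart()  # stops the daemon; start() then fails
    # The new binary is in place by the time the second condrestart runs.
    svc.binary_present = True
    svc.condrestart()  # the daemon is not running, so this does nothing
    return svc.running


print(simulate_upgrade())  # False
```

The key point the model captures is that a conditional restart is not idempotent with respect to failure: once one invocation leaves the daemon stopped, every later conditional restart treats "stopped" as the administrator's intent and leaves it alone.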

We need to fix this before the stable release because the problem is in the %postun script, which will run on the next update no matter what, even an update meant to fix this very problem. We need to make the fix now so that, if we have to ship errata for SSSD later, installing them won't leave users' SSSD systems stopped.

I consider this a regression and a blocker.



+++ This bug was initially created as a clone of Bug #606887 +++

Description of problem:

When sssd is upgraded, it gets shut down but is not restarted.  My upgrades are done via yum-cron.

sssd.log just shows:

(Mon Jun 21 10:40:54 2010) [sssd] [monitor_quit] (0): Terminated: killing children

Version-Release number of selected component (if applicable):
sssd-1.2.1-15.fc13.i686

--- Additional comment from updates on 2010-08-03 08:46:00 EDT ---

sssd-1.2.2-19.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/sssd-1.2.2-19.fc13

--- Additional comment from updates on 2010-08-03 10:45:04 EDT ---

sssd-1.2.2-19.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/sssd-1.2.2-19.fc12

--- Additional comment from updates on 2010-08-05 19:42:05 EDT ---

sssd-1.2.2-19.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update sssd'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/sssd-1.2.2-19.fc12

--- Additional comment from updates on 2010-08-05 19:54:20 EDT ---

sssd-1.2.2-19.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

--- Additional comment from updates on 2010-08-13 17:26:02 EDT ---

sssd-1.2.2-19.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 2 Stephen Gallagher 2010-09-03 13:13:52 UTC
I've spent a lot of time this morning digging into this, and it appears to be a different issue from what I originally surmised above.

I can only reproduce this issue on a multi-core system, and even then only intermittently, which leads me to believe that this is a race condition.

Furthermore, at one point I encountered this error message:
Cleanup        : sssd-1.2.1-28.el6.x86_64                                 3/4
*** glibc detected *** /usr/bin/python: malloc(): smallbin double linked list corrupted: 0x0000000005b35fb0 ***

This leads me to believe that the root cause actually lies in Python. It may be related to BZ #537700.

There is a workaround for SSSD. We currently run our upgrade_config.py script as part of the upgrade branch of %post in the SSSD spec file. This script is not relevant to RHEL 6: it was previously used to handle upgrades in which we had changed the names or acceptable values of options in sssd.conf, and since RHEL 6 is shipping SSSD for the first time, there are no older configurations to upgrade.

My proposed solution to this serious issue is therefore to drop the upgrade_config.py script from the %post section of the spec and to work on resolving the underlying Python bug at a later date.
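The install/upgrade split that this workaround relies on comes from the argument RPM passes to %post (`$1` in the scriptlet): 1 on a fresh install, 2 or more on an upgrade. The Python sketch below models only that branch decision. The function name `post_steps`, the `run_upgrade_config` flag, and the step label are hypothetical and are not taken from the SSSD spec file.

```python
from typing import List


def post_steps(instances_after: int, run_upgrade_config: bool) -> List[str]:
    """Hypothetical model of the branch a %post scriptlet takes.

    RPM passes %post the number of instances of the package that will be
    installed once the transaction finishes: 1 on a fresh install,
    2 or more on an upgrade.
    """
    if instances_after < 1:
        raise ValueError("%post always sees at least one installed instance")
    steps: List[str] = []
    if instances_after >= 2 and run_upgrade_config:
        # Upgrade branch: migrate renamed or revalued sssd.conf options.
        steps.append("upgrade_config.py")
    return steps


# Before the workaround: an upgrade launches the Python migration script.
print(post_steps(2, run_upgrade_config=True))   # ['upgrade_config.py']
# With the workaround: the upgrade path no longer invokes the script.
print(post_steps(2, run_upgrade_config=False))  # []
# A fresh install never ran the migration script in either case.
print(post_steps(1, run_upgrade_config=True))   # []
```

Because the migration script only ran on the upgrade branch, dropping it leaves fresh installs unchanged and removes a Python invocation from the upgrade transaction.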

Comment 11 Gowrishankar Rajaiyan 2010-09-09 12:31:35 UTC
Verified. Version: sssd-1.2.1-28.el6.

Comment 12 releng-rhel@redhat.com 2010-11-10 21:40:09 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

Comment 13 Stephen Gallagher 2010-11-24 18:01:38 UTC
Reopening. We finally tracked down the real cause of this bug upstream. See BZ #606887 for details.

Comment 14 Stephen Gallagher 2010-11-24 22:01:59 UTC
Re-proposing for 6.1.0 (and eventually Z-stream).

Requesting acks.