Red Hat Bugzilla – Bug 629949
sssd stops on upgrade
Last modified: 2015-01-04 18:43:47 EST
I discovered today that a variation of this same problem is present in SSSD in RHEL6. It is a very serious issue and needs to be resolved before the final release.
The problem is that the conditional restart of the SSSD service is run in two places, not just one. Unfortunately, the first time it is run is during the %postun script, at which point the sssd binary no longer exists: the service is stopped and then fails to start again.
Then, when %post rolls around for the new install, SSSD is no longer running, so the conditional restart does nothing.
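To make the sequence above concrete, here is a minimal Python sketch of the failure mode as described in this comment. It is a toy model, not SSSD or RPM code: the Service class, its methods, and simulate_upgrade_as_described are hypothetical names, and the scriptlet order is simply taken from the description above rather than verified against RPM.

```python
class Service:
    """Toy model of an init-script-managed daemon (hypothetical, not SSSD code)."""

    def __init__(self, running, binary_present):
        self.running = running
        self.binary_present = binary_present

    def stop(self):
        self.running = False

    def start(self):
        # Starting only succeeds if the executable is on disk.
        self.running = self.binary_present
        return self.running

    def condrestart(self):
        # A conditional restart acts only on a service that is already running.
        if not self.running:
            return False
        self.stop()
        return self.start()


def simulate_upgrade_as_described():
    """Replay the two conditional restarts in the order the comment describes."""
    svc = Service(running=True, binary_present=True)
    log = []

    # First condrestart (assumed to run in %postun): the binary is absent,
    # so the service is stopped and cannot be started again.
    svc.binary_present = False
    log.append(("postun", svc.condrestart(), svc.running))

    # Second condrestart (assumed to run in %post): the binary is present,
    # but the service is no longer running, so nothing happens.
    svc.binary_present = True
    log.append(("post", svc.condrestart(), svc.running))
    return log


if __name__ == "__main__":
    for scriptlet, restarted, running in simulate_upgrade_as_described():
        print(scriptlet, "restarted:", restarted, "running:", running)
```

In this model neither conditional restart succeeds and the service ends up stopped, which matches the reported symptom of sssd being shut down on upgrade and never restarted.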
We need to fix this before the stable release because the problem is in the %postun script (which will get run no matter what the next time we update, even to fix this problem). We need to make the fix now so that if we have to ship errata for SSSD later, the update won't cause SSSD on users' systems to stop functioning.
I consider this a regression and a blocker.
+++ This bug was initially created as a clone of Bug #606887 +++
Description of problem:
When sssd is upgraded, it gets shut down but is not restarted. My upgrades are done via yum-cron.
sssd.log just shows:
(Mon Jun 21 10:40:54 2010) [sssd] [monitor_quit] (0): Terminated: killing children
Version-Release number of selected component (if applicable):
--- Additional comment from email@example.com on 2010-08-03 08:46:00 EDT ---
sssd-1.2.2-19.fc13 has been submitted as an update for Fedora 13.
--- Additional comment from firstname.lastname@example.org on 2010-08-03 10:45:04 EDT ---
sssd-1.2.2-19.fc12 has been submitted as an update for Fedora 12.
--- Additional comment from email@example.com on 2010-08-05 19:42:05 EDT ---
sssd-1.2.2-19.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update sssd'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/sssd-1.2.2-19.fc12
--- Additional comment from firstname.lastname@example.org on 2010-08-05 19:54:20 EDT ---
sssd-1.2.2-19.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.
--- Additional comment from email@example.com on 2010-08-13 17:26:02 EDT ---
sssd-1.2.2-19.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.
I've spent a lot of time this morning digging into this, and it appears to be a different issue from what I originally surmised above.
I am only able to reproduce this issue on a multi-core system, and only inconsistently. I am therefore led to believe that this is a race-condition bug.
Furthermore, at one point I encountered this error message:
Cleanup : sssd-1.2.1-28.el6.x86_64 3/4
*** glibc detected *** /usr/bin/python: malloc(): smallbin double linked list corrupted: 0x0000000005b35fb0 ***
This leads me to believe that the root of the problem actually lies in Python. I believe it may be related to BZ #537700.
There is a workaround for SSSD. We currently run our upgrade_config.py script as part of the upgrade portion of %post in the SSSD spec file. This upgrade script has no relevance for RHEL-6 at present: it was previously used to handle upgrades where we had changed the names or acceptable values of options in sssd.conf. Since RHEL-6 is shipping SSSD for the first time, there are no such upgrades to handle.
So my proposed solution to this (serious) issue is to drop the upgrade_config.py script from the %post section of the spec and work on resolving the underlying Python bug at a later date.
Verified. Version: sssd-1.2.1-28.el6.
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
Reopening. We finally tracked down the real cause of this bug upstream. See BZ #606887 for details.
Re-proposing for 6.1.0 (and eventually Z-stream).