Bug 743841

Summary: SSSD can crash due to dbus server removing a UNIX socket
Product: Red Hat Enterprise Linux 6 Reporter: Jakub Hrozek <jhrozek>
Component: sssd Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA QA Contact: IDM QE LIST <seceng-idm-qe-list>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.2 CC: dpal, grajaiya, jgalipea, jzeleny, kbanerje, prc
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-1.5.1-57.el6 Doc Type: Bug Fix
Doc Text:
Cause: SSSD components communicate using the DBus protocol. When the DBus server is initialized, the DBus library is given a file name that represents a known interface, and DBus creates the socket on server startup. When the server shuts down, it calls a DBus cleanup function that removes the socket. Consequence: If one of the components is restarted, a race condition can cause the old component instance to remove the socket after the new instance is already running and connected to it. Fix: Path names that contain the server process's PID are passed to DBus, and a symlink with a well-known, defined path name points to the PID-decorated path name. Clients connect to the well-known symlink paths. Result: When the DBus server exits, it removes only the PID-decorated path name. Clients keep connecting to the same path no matter which server the symlink points to.
Story Points: ---
Clone Of:
: 748898 (view as bug list) Environment:
Last Closed: 2011-12-06 16:40:56 UTC Type: ---
Bug Depends On: 746265, 754121    
Bug Blocks: 748554, 748898, 749255    

Description Jakub Hrozek 2011-10-06 09:38:07 UTC
Description of problem:
SSSD components communicate using a low-level DBus protocol. When the server is initialized, dbus is given a filename that represents a known interface and creates the socket. When the server shuts down, it calls a dbus cleanup function which removes the socket.

Normally, this is OK. The problem arises when:
1) an sssd component (a back end, for example) is stopped
2) the monitor detects that it does not respond, kills it, and spawns a new one
3) the old back end calls its shutdown function /after/ the new one has started
4) dbus removes the named socket
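
For illustration only, the sequence above can be sketched with plain UNIX sockets in Python. This is not SSSD or DBus code. The function names and the socket name sbus-dp_example are hypothetical, and the step where the replacement instance clears the stale socket file before binding is an assumption made so that the second bind() succeeds.

```python
import os
import shutil
import socket
import tempfile


def start_server(path):
    """Bind a listening UNIX socket at a fixed, well-known path."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    return srv


def shutdown_server(srv, path):
    """Stand-in for a cleanup routine that removes the socket file by name."""
    srv.close()
    if os.path.exists(path):
        os.unlink(path)


def can_connect(path):
    """Return True if a client can connect to the socket at path."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return True
    except OSError:
        return False
    finally:
        client.close()


def demo_race():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "sbus-dp_example")  # hypothetical name
    try:
        old = start_server(path)
        # Assumption: the replacement instance clears the stale file
        # before binding, otherwise bind() would fail with EADDRINUSE.
        os.unlink(path)
        new = start_server(path)
        before = can_connect(path)
        # The old instance resumes and runs its shutdown late,
        # removing the path that now belongs to the new instance.
        shutdown_server(old, path)
        after = can_connect(path)
        new.close()
        return before, after
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

In this sketch, demo_race() reports whether a client could connect before and after the late shutdown. The second value is False because the new instance is still listening but its path is gone.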


Version-Release number of selected component (if applicable):
sssd-1.5.1-53

How reproducible:
always

Steps to Reproduce:
1. start sssd
2. killall -STOP sssd_be
3. wait until monitor sends TERM to sssd_be and spawns a new one
4. killall -CONT sssd_be
5. getent passwd foobar
  
Actual results:
the socket /var/lib/sss/pipes/private/sbus-dp_$domain is removed
sssd_nss crashes

Expected results:
sssd simply reconnects to the new back end, and the old back end shuts down gracefully

Additional info:
The crash was found by Kaushik Banerjee

Comment 2 Stephen Gallagher 2011-10-06 11:46:54 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/1035

Comment 3 Stephen Gallagher 2011-10-06 11:47:58 UTC
Disregard the previous comment, the correct bug is
https://fedorahosted.org/sssd/ticket/1034

Comment 6 Kaushik Banerjee 2011-10-17 12:20:40 UTC
No crashes seen anymore on going through the above steps.

Verified in build:
# rpm -qi sssd | head
Name        : sssd                         Relocations: (not relocatable)
Version     : 1.5.1                             Vendor: Red Hat, Inc.
Release     : 58.el6                        Build Date: Sat 15 Oct 2011 12:14:26 AM IST
Install Date: Mon 17 Oct 2011 12:00:06 PM IST      Build Host: x86-001.build.bos.redhat.com
Group       : Applications/System           Source RPM: sssd-1.5.1-58.el6.src.rpm
Size        : 3599114                          License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon

Comment 7 Jan Zeleny 2011-10-27 11:47:12 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: SSSD components communicate using DBus protocol. On initializing the server, DBus is given a file name that represents a known interface and creates the socket. When server shuts down, it calls a DBus cleanup function which removes the socket.
Consequence: In case one of components is restarted, a race condition can cause that the socket is removed by the old component instance after the new instance is already running and connected to it.
Fix: Symlinks are used for components to connect to the socket. When re-spawning, both old and new instance have their respective symlinks.
Result: The race condition doesn't occur any more and SSSD doesn't crash.

Comment 8 Jakub Hrozek 2011-10-27 12:11:46 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,4 +1,4 @@
-Cause: SSSD components communicate using DBus protocol. On initializing the server, DBus is given a file name that represents a known interface and creates the socket. When server shuts down, it calls a DBus cleanup function which removes the socket.
+Cause: SSSD components communicate using the DBus protocol. On initializing the DBus server, the DBus library is given a file name that represents a known interface. Dbus creates the socket on server startup. When server shuts down, it calls a DBus cleanup function which removes the socket.
 Consequence: In case one of components is restarted, a race condition can cause that the socket is removed by the old component instance after the new instance is already running and connected to it.
-Fix: Symlinks are used for components to connect to the socket. When re-spawning, both old and new instance have their respective symlinks.
+Fix: Path names that contain the server process's PID are passed to DBus and a symlink with a known and defined path name is pointed to the path name with PID. Clients connect to the well-known symlink paths.
-Result: The race condition doesn't occur any more and SSSD doesn't crash.
+Result: When the DBus server exits, it only removes the PID-decorated path name. Clients are still connected to the same path no matter what server is the symlink pointed to.
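
As a rough illustration of the updated Fix text, here is a minimal Python sketch of the PID-decorated path plus well-known symlink scheme. It is not the actual SSSD patch. The function names, the ".lnk" temporary suffix, the explicit pid argument (standing in for the real process ID so both instances can run in one process), and the symlink-then-rename replacement step are all assumptions made for the example.

```python
import os
import shutil
import socket
import tempfile


def start_server(well_known_path, pid):
    """Bind at a PID-decorated path and point the well-known symlink at it."""
    real_path = "%s.%d" % (well_known_path, pid)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(real_path)
    srv.listen(1)
    # Assumption: swap the symlink in one rename so that clients
    # never see the well-known name missing.
    tmp_link = real_path + ".lnk"
    os.symlink(real_path, tmp_link)
    os.replace(tmp_link, well_known_path)
    return srv, real_path


def shutdown_server(srv, real_path):
    """Cleanup removes only this instance's own PID-decorated path."""
    srv.close()
    if os.path.exists(real_path):
        os.unlink(real_path)


def can_connect(path):
    """Return True if a client can connect through the given path."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return True
    except OSError:
        return False
    finally:
        client.close()


def demo_fix():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "sbus-dp_example")  # hypothetical name
    try:
        old, old_real = start_server(path, pid=1001)
        new, new_real = start_server(path, pid=1002)
        # The old instance shuts down late; it touches only its own path.
        shutdown_server(old, old_real)
        result = (can_connect(path),
                  os.path.exists(old_real),
                  os.path.exists(new_real))
        new.close()
        return result
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

Here the late shutdown of the old instance leaves the well-known symlink and the new instance's socket in place, so a client connecting through the well-known path still reaches the new instance.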

Comment 9 errata-xmlrpc 2011-12-06 16:40:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1529.html