Bug 789470

Summary: [RFE] Introduce the concept of a Primary Server in SSSD
Product: Red Hat Enterprise Linux 6
Reporter: Dmitri Pal <dpal>
Component: sssd
Assignee: Jakub Hrozek <jhrozek>
Status: CLOSED ERRATA
QA Contact: Kaushik Banerjee <kbanerje>
Severity: unspecified
Priority: high
Version: 6.3
CC: grajaiya, jgalipea, prc, syeghiay
Target Milestone: rc
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: sssd-1.9.1-1.el6
Doc Type: Enhancement
Doc Text:
Cause: When the SSSD failed over to another server in its failover list, it stayed with that server for as long as it worked.
Consequence: If the SSSD failed over to a server in another geography, it did not reconnect to a closer server until it was restarted or the backup server stopped working.
Change: The concept of a "backup server" was introduced to the SSSD.
Result: If the SSSD fails over to a server which is listed as a "backup server" in the configuration, it periodically tries to reconnect to one of the "primary servers".
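
To illustrate the primary/backup split described in the Doc Text, here is a minimal sketch that generates an sssd.conf domain fragment. The option names ldap_uri, ldap_backup_uri, krb5_server and krb5_backup_server are, to my understanding, the ones documented in sssd-ldap(5) and sssd-krb5(5). All host names are hypothetical, and the fragment is not a complete working configuration.

```python
import configparser
import io

# Hypothetical host names; the domain name "LDAP" matches the log file
# name sssd_LDAP.log seen in this report. Not a complete configuration.
config = configparser.ConfigParser()
config["domain/LDAP"] = {
    "id_provider": "ldap",
    # Primary servers: tried first, and returned to after a failover.
    "ldap_uri": "ldap://primary1.example.com, ldap://primary2.example.com",
    # Backup servers: used only while no primary server is reachable.
    "ldap_backup_uri": "ldap://backup.remote.example.com",
    "auth_provider": "krb5",
    "krb5_server": "kdc1.example.com",
    "krb5_backup_server": "kdc.remote.example.com",
}

# Render the fragment as text, as it would appear in sssd.conf.
buf = io.StringIO()
config.write(buf)
sssd_conf_fragment = buf.getvalue()
print(sssd_conf_fragment)
```

Servers listed only in the primary options behave as before. The periodic reconnect applies when the SSSD is running against a server from one of the backup options.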
Last Closed: 2013-02-21 09:35:04 UTC

Description Dmitri Pal 2012-02-10 20:41:39 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/1128

Currently, when SSSD marks a server as unavailable, it ignores that server until all other servers in the list have failed; only then does it retry the first server in the list.

I would like to request that SSSD periodically recheck the status of the first server in the list, treat it as the primary, and return to using it once it is found to be functional again.

Otherwise, after a primary server recovers, every host in an entire data center has to restart its sssd daemon to return to the first server listed in sssd.conf.
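
The requested behavior can be sketched as a small model. This is not SSSD's implementation; the class FailoverModel, its fields and the server names are hypothetical. It assumes a fixed retry interval, with 30 seconds chosen only because that is the value logged during verification in this report.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class FailoverModel:
    """Toy model: prefer primaries, fall back to backups, retry primaries."""
    primaries: List[str]
    backups: List[str]
    retry_interval: int = 30            # seconds (assumption, see lead-in)
    up: Set[str] = field(default_factory=set)   # servers currently reachable
    current: Optional[str] = None
    next_retry: Optional[float] = None  # when to look for a primary again

    def _first_up(self, servers: List[str]) -> Optional[str]:
        return next((s for s in servers if s in self.up), None)

    def resolve(self, now: float) -> Optional[str]:
        """Return the server to use at time `now` (seconds)."""
        retry_due = self.next_retry is not None and now >= self.next_retry
        # Stay on the current server while it works and no retry is due.
        if self.current in self.up and not retry_due:
            return self.current
        # Otherwise prefer any reachable primary, then any reachable backup.
        chosen = self._first_up(self.primaries)
        if chosen is None:
            chosen = self._first_up(self.backups)
        self.current = chosen
        # Arm the retry timer only while running on a backup server.
        on_backup = chosen in self.backups
        self.next_retry = now + self.retry_interval if on_backup else None
        return chosen


model = FailoverModel(primaries=["server1"], backups=["server2"],
                      up={"server1", "server2"})
first = model.resolve(0)        # primary is reachable
model.up.discard("server1")     # primary goes down
failed_over = model.resolve(10) # falls back to the backup, timer armed
model.up.add("server1")         # primary recovers
still_backup = model.resolve(20)  # timer not yet due, stays on backup
recovered = model.resolve(40)     # timer due, returns to the primary
print(first, failed_over, still_backup, recovered)
```

In this model no restart is needed: once the timer expires, the next resolution returns to the first reachable primary.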

Comment 2 RHEL Program Management 2012-07-10 07:06:54 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 3 RHEL Program Management 2012-07-11 02:03:28 UTC
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development.  This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.

Comment 5 Kaushik Banerjee 2012-11-28 09:59:35 UTC
Verified with 1.9.2-21

Primary Server testing of AD Provider has been deferred.

Issues with ldap provider logged as bug 880956
Issues with krb5_kpasswd logged as bug 880546

Output of a beaker automation run for ldap provider:
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: primary_server_ldap_001 Primary server down, online after 30s
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Stopping sssd:                                             [  OK  ]
Starting sssd:                                             [  OK  ]
puser1:*:2001:2001:Posix User1:/home/puser1:
:: [   PASS   ] :: Running 'getent passwd puser1'
spawn ssh -o StrictHostKeyChecking=no root@SERVER1 /etc/init.d/dirsrv stop
root@SERVER1's password: 
Shutting down dirsrv: 
    instance1...[  OK  ]
:: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'Successfully removed connection callback'
kau1:*:1111:1111:GECOS TEST:/home/kau2:
:: [   PASS   ] :: Running 'getent passwd kau1'
(Wed Nov 28 15:17:50 2012) [sssd[be[LDAP]]] [be_primary_server_timeout_activate] (0x2000): Primary server reactivation timeout set to 30 seconds
:: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'Primary server reactivation timeout set to 30 seconds'
spawn ssh -o StrictHostKeyChecking=no root@SERVER1 /etc/init.d/dirsrv start
root@SERVER1's password: 
Starting dirsrv: 
    instance1...[  OK  ]
:: [15:17:57] ::  Sleep for 30 seconds
spawn ssh -q -l puser1 localhost echo 'login successful'
puser1@localhost's password: 
login successful
:: [   PASS   ] :: Authentication successful, as expected
:: [   PASS   ] :: Running 'auth_success puser1 Secret123'
(Wed Nov 28 15:18:20 2012) [sssd[be[LDAP]]] [be_primary_server_timeout] (0x0400): Looking for primary server!
:: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'Looking for primary server'
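
As a cross-check of the run above, the two sssd_LDAP.log lines it quotes can be compared to confirm that the search for the primary server started 30 seconds after the timeout was set. This is a minimal sketch: the helper names are hypothetical, and the timestamp format is inferred from the two quoted lines only.

```python
import re
from datetime import datetime

# The two debug lines quoted in the beaker output above.
LOG_LINES = [
    "(Wed Nov 28 15:17:50 2012) [sssd[be[LDAP]]] "
    "[be_primary_server_timeout_activate] (0x2000): "
    "Primary server reactivation timeout set to 30 seconds",
    "(Wed Nov 28 15:18:20 2012) [sssd[be[LDAP]]] "
    "[be_primary_server_timeout] (0x0400): Looking for primary server!",
]

STAMP = re.compile(r"^\((?P<ts>[^)]+)\)")


def timestamp(line):
    """Parse the leading '(Wed Nov 28 15:17:50 2012)' style timestamp."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError("no timestamp in line: %r" % line)
    return datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")


def find(lines, needle):
    """Return the first line containing `needle`."""
    return next(line for line in lines if needle in line)


armed = timestamp(find(LOG_LINES, "Primary server reactivation timeout set to"))
fired = timestamp(find(LOG_LINES, "Looking for primary server"))
elapsed = (fired - armed).total_seconds()
print(elapsed)  # 30.0
```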



Output of a beaker automation run for krb5 provider:
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: primary_server_krb5_001 One primary server down, online after 30s
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   LOG    ] :: Sleeping for 5 seconds
:: [   PASS   ] :: Authentication successful, as expected
:: [   PASS   ] :: Running 'auth_success user_srv1 Server1_123'
:: [   PASS   ] :: Running 'Stopping kdc on Server1'
:: [   PASS   ] :: Authentication successful, as expected
:: [   PASS   ] :: Running 'auth_success puser1 Server2_123'
:: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'Primary server reactivation timeout set to 30 seconds'
:: [   PASS   ] :: Running 'Starting kdc on Server1'
:: [   PASS   ] :: Authentication successful, as expected
:: [   PASS   ] :: Running 'auth_success user_srv1 Server1_123'
:: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'Looking for primary server'
:: [   LOG    ] :: Duration: 51s
:: [   LOG    ] :: Assertions: 10 good, 0 bad
:: [   PASS   ] :: RESULT: primary_server_krb5_001 One primary server down, online after 30s

Comment 6 errata-xmlrpc 2013-02-21 09:35:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0508.html