Bug 1807934

Summary: sssd failover leads to delayed and failed logins [rhel-7.7.z]
Product: Red Hat Enterprise Linux 7
Reporter: RAD team bot copy to z-stream <autobot-eus-copy>
Component: sssd
Assignee: Alexey Tikhonov <atikhono>
Status: CLOSED ERRATA
QA Contact: sssd-qe <sssd-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.8
CC: apeetham, atikhono, grajaiya, jhrozek, kbanerje, lmiksik, lslebodn, mzidek, ndehadra, ofalk, pbrezina, peter.vreman, sbose, sgoveas, tscherf
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: sssd-1.16.4-21.el7_7.3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1769755
Environment:
Last Closed: 2020-03-17 16:19:25 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1769755
Bug Blocks:

Description RAD team bot copy to z-stream 2020-02-27 14:12:35 UTC
This bug has been copied from bug #1769755 and has been proposed to be backported to 7.7 z-stream (EUS).

Comment 4 Nikhil Dehadrai 2020-03-04 10:27:08 UTC
# rpm -q ipa-server ipa-client sssd
ipa-server-4.6.5-11.el7_7.4.x86_64
ipa-client-4.6.5-11.el7_7.4.x86_64
sssd-1.16.4-21.el7_7.3.x86_64
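
The installed sssd build above matches the "Fixed In Version" field. A minimal sketch of scripting that check follows. The helper name version_ge is hypothetical, and GNU "sort -V" only approximates RPM's own version comparison, so treat the result as a convenience check rather than an authoritative one (rpmdev-vercmp from rpmdevtools follows RPM's rules more closely).

```shell
# Succeed if version-release string $1 sorts at or after $2 under GNU "sort -V".
# Assumption: "sort -V" ordering is close enough to RPM ordering for builds
# of the same package stream, as is the case here.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}
```

On the client this could be called as: version_ge "$(rpm -q --qf '%{VERSION}-%{RELEASE}' sssd)" 1.16.4-21.el7_7.3 && echo "fix present"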



Verified the bug based on the following steps and observations:
1. Set up an IPA master on RHEL 7.7.z.
2. Set up an IPA replica on RHEL 7.7.z.
3. Set up an IPA client on RHEL 7.7.z, ensuring that resolv.conf has entries for both the master and the replica.
4. Alternately stop and start the master and the replica, and check that kinit works on the client machine.
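
Step 3 depends on the client's resolv.conf listing both servers. One possible way to check that precondition before starting the run is sketched below. The helper names count_nameservers and has_nameserver are hypothetical, and the addresses in the usage line are documentation placeholders, not the addresses used in this test.

```shell
# Print the number of "nameserver" entries in a resolv.conf-style file.
# Defaults to /etc/resolv.conf; another path can be passed for testing.
count_nameservers() {
    local file="${1:-/etc/resolv.conf}"
    awk '$1 == "nameserver" { n++ } END { print n + 0 }' "$file"
}

# Succeed only if the file lists the given address as a nameserver.
has_nameserver() {
    local addr="$1" file="${2:-/etc/resolv.conf}"
    awk -v a="$addr" '$1 == "nameserver" && $2 == a { found = 1 } END { exit !found }' "$file"
}
```

Example use with placeholder addresses: has_nameserver 192.0.2.10 && has_nameserver 192.0.2.11 || echo "resolv.conf is missing a server"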


Script used:
while true; do
    # Phase 1: stop the master, make sure the replica is up, then test the client.
    date
    echo --------------------
    echo MASTER OFF
    ssh -t root@master.ipapnq.test "ipactl status"
    ssh -t root@master.ipapnq.test "ipactl stop"
    ssh -t root@master.ipapnq.test "ipactl status"
    echo REPLICA ON
    ssh -t root@replica1.ipapnq.test "ipactl restart"
    ssh -t root@replica1.ipapnq.test "ipactl status"
    # Wipe the SSSD cache so the lookups below must reach a live server.
    systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl start sssd
    kdestroy
    klist
    echo Secret123 | kinit admin
    klist
    getent passwd admin
    echo ===============================================
    # Phase 2: bring the master back, stop the replica, then test the client again.
    date
    echo --------------------
    echo MASTER ON
    ssh -t root@master.ipapnq.test "ipactl restart"
    ssh -t root@master.ipapnq.test "ipactl status"
    echo REPLICA OFF
    ssh -t root@replica1.ipapnq.test "ipactl status"
    ssh -t root@replica1.ipapnq.test "ipactl stop"
    ssh -t root@replica1.ipapnq.test "ipactl status"
    systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl start sssd
    kdestroy
    klist
    echo Secret123 | kinit admin
    klist
    getent passwd admin
    echo ===============================================
done

Ran the above script continuously for 10 minutes; kinit was successful across failover from master to replica and vice versa.
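
The script loops forever and was stopped by hand after about 10 minutes. A sketch of a time-bounded variant that also counts passing and failing iterations is shown here. The helper name run_for is hypothetical and is not part of the script that was actually run.

```shell
# Run a check command repeatedly until a time budget (in seconds) expires,
# then print how many iterations passed and failed.
# The command always runs at least once. Returns non-zero if any run failed.
# Usage: run_for SECONDS COMMAND [ARGS...]
run_for() {
    local budget="$1"
    shift
    local deadline pass=0 fail=0
    deadline=$(( $(date +%s) + budget ))
    while :; do
        if "$@" >/dev/null 2>&1; then
            pass=$(( pass + 1 ))
        else
            fail=$(( fail + 1 ))
        fi
        if [ "$(date +%s)" -ge "$deadline" ]; then
            break
        fi
    done
    echo "pass=$pass fail=$fail"
    [ "$fail" -eq 0 ]
}
```

For a 10-minute run, one failover phase could be wrapped in a function (say, a hypothetical one_failover_cycle that ends with kinit and getent) and invoked as: run_for 600 one_failover_cycle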
Observations:

Wed Mar  4 05:21:23 EST 2020
+ echo --------------------
--------------------
+ echo MASTER OFF
MASTER OFF
+ ssh -t root@master.ipapnq.test 'ipactl status'
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to master.ipapnq.test closed.
+ ssh -t root@master.ipapnq.test 'ipactl stop'
Stopping ipa-dnskeysyncd Service
Stopping ipa-otpd Service
Stopping pki-tomcatd Service
Stopping ipa-custodia Service
Stopping httpd Service
Stopping named Service
Stopping kadmin Service
Stopping krb5kdc Service
Stopping Directory Service
ipa: INFO: The ipactl command was successful
Connection to master.ipapnq.test closed.
+ ssh -t root@master.ipapnq.test 'ipactl status'
Directory Service: STOPPED
Directory Service must be running in order to obtain status of other services
ipa: INFO: The ipactl command was successful
Connection to master.ipapnq.test closed.
+ echo REPLICA ON
REPLICA ON
+ ssh -t root@replica1.ipapnq.test 'ipactl restart'
Restarting Directory Service
Restarting krb5kdc Service
Restarting kadmin Service
Restarting named Service
Restarting httpd Service
Restarting ipa-custodia Service
Restarting ntpd Service
Restarting pki-tomcatd Service
Restarting ipa-otpd Service
Restarting ipa-dnskeysyncd Service
ipa: INFO: The ipactl command was successful
Connection to replica1.ipapnq.test closed.
+ ssh -t root@replica1.ipapnq.test 'ipactl status'
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
ntpd Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to replica1.ipapnq.test closed.
+ systemctl stop sssd
+ rm -rf /var/lib/sss/db/cache_ipapnq.test.ldb /var/lib/sss/db/ccache_IPAPNQ.TEST /var/lib/sss/db/config.ldb /var/lib/sss/db/sssd.ldb /var/lib/sss/db/timestamps_ipapnq.test.ldb
+ systemctl start sssd
+ kdestroy
+ klist
klist: Credentials cache keyring 'persistent:0:0' not found
+ echo Secret123
+ kinit admin
Password for admin:
+ klist
Ticket cache: KEYRING:persistent:0:0
Default principal: admin

Valid starting     Expires            Service principal
03/04/20 05:22:02  03/05/20 05:22:02  krbtgt/IPAPNQ.TEST
+ getent passwd admin
admin:*:162200000:162200000:Administrator:/home/admin:/bin/bash
+ echo ===============================================
===============================================
+ date
Wed Mar  4 05:22:02 EST 2020
+ echo --------------------
--------------------
+ echo MASTER ON
MASTER ON
+ ssh -t root@master.ipapnq.test 'ipactl restart'
Starting Directory Service
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting httpd Service
Starting ipa-custodia Service
Starting pki-tomcatd Service
Starting ipa-otpd Service
Starting ipa-dnskeysyncd Service
ipa: INFO: The ipactl command was successful
Connection to master.ipapnq.test closed.
+ ssh -t root@master.ipapnq.test 'ipactl status'
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to master.ipapnq.test closed.
+ echo REPLICA OFF
REPLICA OFF
+ ssh -t root@replica1.ipapnq.test 'ipactl status'
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
ntpd Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to replica1.ipapnq.test closed.
+ ssh -t root@replica1.ipapnq.test 'ipactl stop'
Stopping ipa-dnskeysyncd Service
Stopping ipa-otpd Service
Stopping pki-tomcatd Service
Stopping ntpd Service
Stopping ipa-custodia Service
Stopping httpd Service
Stopping named Service
Stopping kadmin Service
Stopping krb5kdc Service
Stopping Directory Service
ipa: INFO: The ipactl command was successful
Connection to replica1.ipapnq.test closed.
+ ssh -t root@replica1.ipapnq.test 'ipactl status'
Directory Service: STOPPED
Directory Service must be running in order to obtain status of other services
ipa: INFO: The ipactl command was successful
Connection to replica1.ipapnq.test closed.
+ systemctl stop sssd
+ rm -rf /var/lib/sss/db/cache_ipapnq.test.ldb /var/lib/sss/db/ccache_IPAPNQ.TEST /var/lib/sss/db/config.ldb /var/lib/sss/db/sssd.ldb /var/lib/sss/db/timestamps_ipapnq.test.ldb
+ systemctl start sssd
+ kdestroy
+ klist
klist: Credentials cache keyring 'persistent:0:0' not found
+ echo Secret123
+ kinit admin
Password for admin:
+ klist
Ticket cache: KEYRING:persistent:0:0
Default principal: admin

Valid starting     Expires            Service principal
03/04/20 05:22:35  03/05/20 05:22:35  krbtgt/IPAPNQ.TEST
+ getent passwd admin
admin:*:162200000:162200000:Administrator:/home/admin:/bin/bash


Based on the above observations, marking the bug as VERIFIED.

Comment 6 errata-xmlrpc 2020-03-17 16:19:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0844