Bug 1769755 - sssd failover leads to delayed and failed logins
Summary: sssd failover leads to delayed and failed logins
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.8
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Sumit Bose
QA Contact: ipa-qe
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks: 1122832 1807933 1807934
Reported: 2019-11-07 11:50 UTC by Oliver Falk
Modified: 2023-09-07 21:02 UTC (History)
13 users (show)

Fixed In Version: sssd-1.16.4-35.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1807933 1807934
Environment:
Last Closed: 2020-03-31 19:44:37 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 5075 0 None closed sssd failover leads to delayed and failed logins 2020-07-09 16:53:42 UTC
Red Hat Issue Tracker SSSD-1804 0 None None None 2023-09-07 21:02:08 UTC
Red Hat Product Errata RHBA-2020:1053 0 None None None 2020-03-31 19:44:54 UTC

Description Oliver Falk 2019-11-07 11:50:22 UTC
Description of problem:
While testing an IPA deployment in my customer's environment and running various failover test scenarios, we found that under some circumstances failover did not work as expected and resulted in failed or delayed (~60-70 seconds) logins.

Version-Release number of selected component (if applicable): 1.16.4


How reproducible: Always.


Steps to Reproduce:
1. Client connected to two IPA servers (A and B)
2. Cut connection to server A
3. Login to client
4. Allow connection to server A
5. Cut connection to server B

If you repeat these steps, at some point the fail back from B to A stops working: SSSD takes a very long time to recognize that the connection to server A has been restored and to start using it again.
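
To make the delay from the steps above measurable, a lookup or login command can be wrapped in a small timing helper. This is only a sketch: the function name `timed_check` and the 10-second limit in the usage line are assumptions for illustration, not part of the original report or of SSSD.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SSSD): run a command, print how many
# whole seconds it took, and return non-zero if the command failed or
# ran longer than the given limit in seconds.
timed_check() {
    local limit=$1; shift
    local start end elapsed rc=0
    start=$(date +%s)
    # Discard the command's output; only its exit status and duration matter.
    "$@" >/dev/null 2>&1 || rc=$?
    end=$(date +%s)
    elapsed=$((end - start))
    echo "elapsed=${elapsed}s rc=${rc} cmd=$*"
    [ "$rc" -eq 0 ] && [ "$elapsed" -le "$limit" ]
}
```

For example, `timed_check 10 getent passwd admin` run on the client after each step would flag a lookup that fails or takes longer than 10 seconds.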

Actual results: Logins delayed or not working at all


Expected results: Fail over + fail back work smoothly


Additional info:
* This was already analysed by Sumit Bose and he has a fix for it available.
* Customer case will be linked.
* Exception set to ?
* We'll need this fix in 7.7 z-stream (for EUS) later as well
* It also applies to RHEL 8 AFAIK

Comment 7 Sumit Bose 2019-11-07 12:30:27 UTC
Upstream ticket:
https://pagure.io/SSSD/sssd/issue/4114

Comment 12 Sumit Bose 2019-11-29 11:16:14 UTC
SSSD-1-16:
 - 4897063996b624b71823e61c73916f47832f103a
 - a4dd1eb5087c2f8a3a9133f42efa025221edc1c9

Comment 15 Nikhil Dehadrai 2019-12-13 11:46:53 UTC
[root@master ~]# rpm -q ipa-server ipa-client
ipa-server-4.6.6-11.el7.x86_64
ipa-client-4.6.6-11.el7.x86_64



Verified the bug based on the following steps and observations:
1. Set up an IPA master on RHEL 7.8
2. Set up an IPA replica on RHEL 7.8
3. Set up an IPA client on RHEL 7.8 (ensuring that resolv.conf has entries for both master and replica)
4. Alternately start and stop the master and the replica, and check whether kinit works on the client machine


Script used:
while true; do
    date
    echo --------------------
    # First half: stop the master so the client has to fail over to the replica.
    echo MASTER OFF
    ssh -t root@master.ipa.test "ipactl status"
    ssh -t root@master.ipa.test "ipactl stop"
    ssh -t root@master.ipa.test "ipactl status"
    echo REPLICA ON
    ssh -t root@replica1.ipa.test "ipactl restart"
    ssh -t root@replica1.ipa.test "ipactl status"
    # Restart SSSD with an empty cache so nothing is answered from stale data.
    systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl start sssd
    # Drop any existing ticket, then check that authentication and lookup work.
    kdestroy
    klist
    echo Secret123 | kinit admin
    klist
    getent passwd admin
    echo ===============================================
    date
    echo --------------------
    # Second half: bring the master back and stop the replica to force fail back.
    echo MASTER ON
    ssh -t root@master.ipa.test "ipactl restart"
    ssh -t root@master.ipa.test "ipactl status"
    echo REPLICA OFF
    ssh -t root@replica1.ipa.test "ipactl status"
    ssh -t root@replica1.ipa.test "ipactl stop"
    ssh -t root@replica1.ipa.test "ipactl status"
    systemctl stop sssd; rm -rf /var/lib/sss/db/*; systemctl start sssd
    kdestroy
    klist
    echo Secret123 | kinit admin
    klist
    getent passwd admin
    echo ===============================================
done


Ran the above script continuously for 10 minutes; kinit succeeded each time, with failover from master to replica and vice versa.
Observations:

===============================================
Fri Dec 13 06:09:15 EST 2019
--------------------
MASTER OFF
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to master.ipa.test closed.
Stopping ipa-dnskeysyncd Service
Stopping ipa-otpd Service
Stopping pki-tomcatd Service
Stopping ipa-custodia Service
Stopping httpd Service
Stopping named Service
Stopping kadmin Service
Stopping krb5kdc Service
Stopping Directory Service
ipa: INFO: The ipactl command was successful
Connection to master.ipa.test closed.
Directory Service: STOPPED
Directory Service must be running in order to obtain status of other services
ipa: INFO: The ipactl command was successful
Connection to master.ipa.test closed.
REPLICA ON
Starting Directory Service
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting httpd Service
Starting ipa-custodia Service
Starting ntpd Service
Starting pki-tomcatd Service
Starting ipa-otpd Service
Starting ipa-dnskeysyncd Service
ipa: INFO: The ipactl command was successful
Connection to replica1.ipa.test closed.
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
ntpd Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to replica1.ipa.test closed.
klist: Credentials cache keyring 'persistent:0:0' not found
Password for admin: 
Ticket cache: KEYRING:persistent:0:0
Default principal: admin

Valid starting     Expires            Service principal
12/13/19 06:09:57  12/14/19 06:09:57  krbtgt/IPA.TEST
admin:*:773400000:773400000:Administrator:/home/admin:/bin/bash
===============================================
Fri Dec 13 06:09:56 EST 2019
--------------------
MASTER ON
Starting Directory Service
Starting krb5kdc Service
Starting kadmin Service
Starting named Service
Starting httpd Service
Starting ipa-custodia Service
Starting pki-tomcatd Service
Starting ipa-otpd Service
Starting ipa-dnskeysyncd Service
ipa: INFO: The ipactl command was successful
Connection to master.ipa.test closed.
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to master.ipa.test closed.
REPLICA OFF
Directory Service: RUNNING
krb5kdc Service: RUNNING
kadmin Service: RUNNING
named Service: RUNNING
httpd Service: RUNNING
ipa-custodia Service: RUNNING
ntpd Service: RUNNING
pki-tomcatd Service: RUNNING
ipa-otpd Service: RUNNING
ipa-dnskeysyncd Service: RUNNING
ipa: INFO: The ipactl command was successful
Connection to replica1.ipa.test closed.
Stopping ipa-dnskeysyncd Service
Stopping ipa-otpd Service
Stopping pki-tomcatd Service
Stopping ntpd Service
Stopping ipa-custodia Service
Stopping httpd Service
Stopping named Service
Stopping kadmin Service
Stopping krb5kdc Service
Stopping Directory Service
ipa: INFO: The ipactl command was successful
Connection to replica1.ipa.test closed.
Directory Service: STOPPED
Directory Service must be running in order to obtain status of other services
ipa: INFO: The ipactl command was successful
Connection to replica1.ipa.test closed.
klist: Credentials cache keyring 'persistent:0:0' not found
Password for admin: 
Ticket cache: KEYRING:persistent:0:0
Default principal: admin

Valid starting     Expires            Service principal
12/13/19 06:10:33  12/14/19 06:10:32  krbtgt/IPA.TEST
admin:*:773400000:773400000:Administrator:/home/admin:/bin/bash


Based on the above observations, marking the bug as "VERIFIED".
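
The check applied to the output above (one ticket and one successful lookup per master on/off switch) could also be automated over a saved log. This is a minimal sketch, assuming the loop's output was captured to a file; the function name `summarize_run` and the file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a saved run of the verification loop.
# Assumes the loop's output was captured to a plain-text file.
summarize_run() {
    local log=$1
    local half_cycles tickets lookups
    # Each half of the loop prints exactly one "MASTER OFF" or "MASTER ON" line.
    half_cycles=$(grep -c -E '^MASTER (OFF|ON)$' "$log" || true)
    # A successful kinit shows up as a krbtgt entry in the klist that follows it.
    tickets=$(grep -c 'krbtgt/' "$log" || true)
    # A successful getent lookup prints the passwd entry for admin.
    lookups=$(grep -c '^admin:' "$log" || true)
    echo "half_cycles=$half_cycles tickets=$tickets lookups=$lookups"
    # Succeed only if every half-cycle produced both a ticket and a lookup.
    [ "$half_cycles" -gt 0 ] &&
        [ "$tickets" -eq "$half_cycles" ] &&
        [ "$lookups" -eq "$half_cycles" ]
}
```

If the loop's output were saved with something like `tee run.log`, then `summarize_run run.log` would return non-zero as soon as any switch lacked a ticket or a passwd entry.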

Comment 24 errata-xmlrpc 2020-03-31 19:44:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1053

