Bug 1576597
Summary: | Clearing SSSD cache is necessary after update from 1.15 to 1.16 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Josip Vilicic <jvilicic> |
Component: | sssd | Assignee: | SSSD Maintainers <sssd-maint> |
Status: | CLOSED WORKSFORME | QA Contact: | sssd-qe <sssd-qe> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.4 | CC: | grajaiya, jhrozek, jvilicic, lslebodn, mkosek, mzidek, pbrezina, tscherf |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-05-23 10:54:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Josip Vilicic
2018-05-09 21:59:14 UTC
This bug report has no configuration or logs. Please see https://docs.pagure.org/SSSD.sssd/users/reporting_bugs.html to learn what is needed in a useful bug report. As for the cache removal when downgrading, that is expected.

OK, I found some time to actually read the case, and it is not clear to me what this bug is about:
a) Is it about the perceived performance regression? If yes, can we see log files that capture the login, id lookup, or whatever else is slow? It would be best to be able to compare the old and new versions, or ..
b) Is it about having to remove the cache when you downgrade? If yes, then that's expected: we changed the database layout and some of the indexes between 1.15 and 1.16, so the database must be upgraded, but we don't support database downgrades (often they are not even possible).

1) This bug was opened because the customer reported having to clear SSSD's cache after upgrading from 1.15 to 1.16:

----------------------
TEST #1: My first test is with the actual environment in which the problem first occurred after the upgrade of the server.
kernel: 3.10.0-862.el7.x86_64
sssd: sssd-1.16.0-19.el7.x86_64
Result: The playbook failed with error messages: "Timeout (22s) waiting for privilege escalation prompt"
Result time:
real 0m23.801s
user 0m2.091s
sys 0m0.619s
----------------------

2) The customer ran additional test cases in which they downgraded SSSD; this required them to clear the cache because SSSD would not start properly:

----------------------
TEST #2: I downgraded 'sssd' to the previous version and kept the same kernel.
kernel: 3.10.0-862.el7.x86_64
sssd: sssd-1.15.2-50.el7_4.11.x86_64
Result: The playbook completed successfully.
Result time:
real 0m13.991s
user 0m2.242s
sys 0m0.627s
As you can see, the execution took roughly half the time with version 1.15 compared to 1.16.
IMPORTANT: After the downgrade to version 1.15, when I restarted 'sssd', I got the following error:
# systemctl status sssd.service
sssd.service - System Security Services Daemon
Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/sssd.service.d
journal.conf
Active: failed (Result: exit-code) since Mon 2018-05-07 10:05:31 EDT; 9s ago
Process: 47975 ExecStart=/usr/sbin/sssd -i -f (code=exited, status=3)
Main PID: 47975 (code=exited, status=3)
May 07 10:05:31 ls0oegip100 systemd[1]: Starting System Security Services Daemon...
May 07 10:05:31 ls0oegip100 sssd[47975]: Starting up
May 07 10:05:31 ls0oegip100 sssd[47975]: Lower version of database is expected!
May 07 10:05:31 ls0oegip100 sssd[47975]: Removing cache files in /var/lib/sss/db should fix the issue, but note that removing cache files will also remove all of your cached credentials.
May 07 10:05:31 ls0oegip100 systemd[1]: sssd.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
May 07 10:05:31 ls0oegip100 systemd[1]: Failed to start System Security Services Daemon.
May 07 10:05:31 ls0oegip100 systemd[1]: Unit sssd.service entered failed state.
May 07 10:05:31 ls0oegip100 systemd[1]: sssd.service failed.
I cleared the contents of the '/var/lib/sss/db' directory, and after that 'sssd' started perfectly.
----------------------

I personally feel we can ignore Test Case #2, but it shows that they don't experience the timeout they get with SSSD 1.16.

3) Then there are 2 additional tests, in which they upgraded SSSD from 1.15 to 1.16 and experienced problems (after having cleared the SSSD cache in TEST #2 above); only after clearing SSSD's cache *after the upgrade* did things work properly:

----------------------
TEST #3: I upgraded 'sssd' to version 1.16.
kernel: 3.10.0-862.el7.x86_64
sssd: sssd-1.16.0-19.el7.x86_64
Result: The playbook failed with error messages: "Timeout (22s) waiting for privilege escalation prompt"
Result time:
real 0m23.990s
user 0m2.247s
sys 0m0.702s

TEST #4: With the same versions of the kernel and sssd, I cleared the contents of the '/var/lib/sss/db' directory.
kernel: 3.10.0-862.el7.x86_64
sssd: sssd-1.16.0-19.el7.x86_64
Result: The playbook completed successfully.
Result time:
real 0m12.109s
user 0m2.371s
sys 0m0.630s
----------------------

4) Unfortunately, the customer has moved on after the "workaround" of clearing SSSD's cache after the upgrade, and they do not have the resources to continue troubleshooting, so we do not have, and will not receive, SSSD debug logs of the failure.

Thank you very much for clearing up the confusion. Since the case is closed and none of our tests showed a performance regression, I think it makes sense to close the bug as WORKSFORME in a couple of days.

Since there is no additional information to perform any kind of post-mortem analysis, I'm going to close this bug.
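The explanation for the downgrade failure given earlier in this thread (1.16 changed the cache database layout and indexes, the database is upgraded in place, and downgrades are not supported) can be illustrated with a minimal Python sketch of a forward-only schema check. Everything in it is hypothetical: the integer versions, the `UPGRADES` steps, and the `open_cache` and `CacheVersionError` names are illustrative assumptions, not SSSD's actual code or its on-disk version strings.

```python
from __future__ import annotations


class CacheVersionError(Exception):
    """Raised when the on-disk cache is newer than the running binary supports."""


# Hypothetical one-way upgrade steps: key N upgrades a version-N cache to N + 1.
# These are NOT the real SSSD cache versions or upgrade routines.
UPGRADES = {
    0: "add new attribute indexes",
    1: "change database layout",
}


def open_cache(on_disk_version: int, binary_version: int) -> list[str]:
    """Return the upgrade steps applied to bring the cache to binary_version.

    Raises CacheVersionError if the cache is newer than the binary, because
    there is no path from a newer layout back to an older one.
    """
    if on_disk_version > binary_version:
        raise CacheVersionError(
            f"cache version {on_disk_version} is newer than supported version "
            f"{binary_version}; removing the cache files would let the daemon start"
        )
    applied = []
    for version in range(on_disk_version, binary_version):
        # Apply each forward step in order; there are no reverse steps.
        applied.append(UPGRADES[version])
    return applied


# Hypothetical mapping: pretend 1.15 understands version 1 and 1.16 version 2.
OLD_BINARY, NEW_BINARY = 1, 2

print(open_cache(OLD_BINARY, NEW_BINARY))  # upgrade: ['change database layout']
try:
    open_cache(NEW_BINARY, OLD_BINARY)  # downgrade: cache is too new
except CacheVersionError as err:
    print("daemon would fail to start:", err)
print(open_cache(OLD_BINARY, OLD_BINARY))  # fresh cache after clearing: []
```

Under these assumptions, a cache written by the newer binary cannot be opened by the older one, which matches the TEST #2 behavior, while a freshly created cache needs no upgrade steps at all. That is why clearing `/var/lib/sss/db` lets either version start, at the cost of the cached credentials the log message warns about.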