Bug 1730377 - fix sss_cache to also reset cached timestamp
Keywords: Triaged
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 7
Component: sssd
Version: 7.6
Hardware: x86_64
OS: Linux

 Reported: 2019-07-16 14:45 UTC by Paul Raines
2021-01-15 15:37 UTC

System: Github
ID: SSSD sssd issues 4872
Private: 0
Priority: None
Status: open
Summary: Silent cache corruption and entries not refreshing
Last Updated: 2021-02-08 11:34:27 UTC

Paul Raines 2019-07-16 14:45:36 UTC

Description of problem:
Changes to the LDAP server Group database will not propagate to some sssd clients using that LDAP server. Even running sss_cache -E will not fix it. The only thing that works is shutting down sssd, removing the cache_default.ldb and timestamps_default.ldb files from /var/lib/sss/db, and restarting sssd.

Version-Release number of selected component (if applicable):
sssd-1.16.2-13.el7_6.8.x86_64

How reproducible:
Very random

Steps to Reproduce:
1. Make a change to a group entry in LDAP
2. Run 'sss_cache -E' on clients
3. Check with 'getent group' on clients to see if correct

Actual results:
Group entry did not change to match LDAP server

Expected results:
Group entry should change to match LDAP server

Additional info:
Upstream issue at https://pagure.io/SSSD/sssd/issue/3886

This is a screen capture showing the issue:

[root@hound db]# getent group stroke
stroke:*:1021:judith
[root@hound db]# grep ldap4 /etc/sssd/sssd.conf
ldap_uri = ldap://ldap4.mydomain.org, ldap://ldap5.mydomain.org
[root@hound db]# ldapsearch -h ldap4 -x -b 'ou=Group,dc=mydomain,dc=org' "(cn=stroke)" | grep memberUid
memberUid: judith
memberUid: marco
memberUid: bgh12
[root@hound db]# sss_cache -G
[root@hound db]# sss_cache -E
[root@hound db]# getent group stroke
stroke:*:1021:judith
[root@hound db]# systemctl stop sssd
[root@hound db]# \rm cache_default.ldb timestamps_default.ldb
[root@hound db]# systemctl start sssd
[root@hound db]# getent group stroke
stroke:*:1021:judith,marco,bgh12

Sumit Bose 2020-11-24 16:19:44 UTC

Hi,

I tried to reproduce the issue as it was described in the upstream tickets https://pagure.io/SSSD/sssd/issue/3886 and https://pagure.io/SSSD/sssd/issue/3869 but was not successful.

Then I checked the logs from the upstream tickets again and would say that there might have been an issue on the server side which prevented the timestamp-cache logic from updating the data cache.
The timestamps in the 'Adding original mod-Timestamp' debug messages of the groups in question are typically weeks older than the timestamps of the log entries themselves. So my current best guess is that the timestamp on the server side was not updated for whatever reason (I found some bug reports about such an issue) and as a result SSSD thinks that there is no change and no update is needed.

Some logs in the upstream tickets and from the attached cases show an issue with missing timestamp cache entries (https://github.com/SSSD/sssd/issues/5121), which was recently fixed by Tomas. I was not able to reproduce the observed behavior by selectively removing timestamp entries of the objects involved.

About the attached cases in general, the main issue in the cases was a different one and looks resolved. I doubt that any of the cases really has the issue as reported in the upstream tickets.

As a result, I was not able to find an issue in SSSD with the data available. However, given that there might be cases where the server-side timestamp is out of sync, it might be worth thinking about resetting the cached timestamp with sss_cache as well, so that the object must really be read from the server and writing to the cache cannot be skipped.

If we decide that this is a good idea, we also have to decide whether this is something we want to have in RHEL-7.

bye,
Sumit

Raymond Page 2021-01-15 15:27:48 UTC

Resolution: --- → WONTFIX

^^ This type of resolution without customer interaction will directly inform my recommendations to leadership regarding RH solutions.
Specifically, the inability to reproduce is not evidence contrary to the existence of an issue, it is evidence the technical lead is incapable of reproducing the issue.
If support personnel are not capable of reproducing customer issues, then the support agreements become worthless and calls into question the technical value of RH solutions.

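The hypothesis in Sumit Bose's comment (an unchanged server-side modification timestamp makes SSSD skip the data-cache write) can be illustrated with a very small model. This is only a sketch under stated assumptions, not SSSD source code: the function name needs_data_cache_write and the sample timestamp values are hypothetical, and the real decision logic in SSSD may differ in detail.

```shell
#!/bin/sh
# Simplified model (an assumption, not SSSD code) of the skip decision
# described in the thread.
#
# needs_data_cache_write CACHED_TS SERVER_TS
# Returns 0 when the data cache would be rewritten, 1 when the write
# would be skipped because both modification timestamps match.
needs_data_cache_write() {
    cached_ts="$1"
    server_ts="$2"
    # An empty cached value models the proposed timestamp reset:
    # nothing can match, so the object is always rewritten.
    if [ -z "$cached_ts" ]; then
        return 0
    fi
    [ "$cached_ts" != "$server_ts" ]
}

# Server timestamp never moved although the members changed: skipped.
if needs_data_cache_write "20190601120000Z" "20190601120000Z"; then
    echo "rewrite"
else
    echo "skip"      # this branch is taken
fi

# Cached timestamp was reset: the write can no longer be skipped.
if needs_data_cache_write "" "20190601120000Z"; then
    echo "rewrite"   # this branch is taken
else
    echo "skip"
fi
```

In this model, invalidating only the expiry time leaves the first case unchanged, which is why the comment suggests that sss_cache should reset the cached timestamp as well.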
Alexey Tikhonov 2021-01-15 15:37:48 UTC

(In reply to Raymond Page from comment #9)
> Resolution: --- → WONTFIX
>
> ^^ This type of resolution without customer interaction will directly inform
> my recommendations to leadership regarding RH solutions.
> Specifically, the inability to reproduce is not evidence contrary to the
> existence of an issue, it is evidence the technical lead is incapable of
> reproducing the issue.
> If support personnel are not capable of reproducing customer issues, then
> the support agreements become worthless and calls into question the
> technical value of RH solutions.

Please read the explanation in comment 4 about the defined scope of the issue. Taking into account the status of RHEL7, this scope can't be addressed here and the issue will be tracked in RHEL8 bz 1902280.

Sorry for not making this comment public initially.

If there are any additional details available that are missing on the engineering side, please work with your support contacts directly.
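The manual workaround from the original report (stop sssd, remove both cache files, start sssd) could be scripted roughly as follows. This is a sketch only: the function name reset_sssd_cache and the SSSD_DB_DIR, SSSD_DOMAIN and SYSTEMCTL variables are hypothetical knobs introduced here so the function can be tried without touching a live system, and the file names assume the cache_<domain>.ldb / timestamps_<domain>.ldb pattern seen in the report for the domain named "default".

```shell
#!/bin/sh
# Sketch of the reporter's workaround. Defaults match the paths shown
# in the report; the override variables are assumptions for testing.
reset_sssd_cache() {
    db_dir="${SSSD_DB_DIR:-/var/lib/sss/db}"
    domain="${SSSD_DOMAIN:-default}"
    systemctl_cmd="${SYSTEMCTL:-systemctl}"

    # Stop the daemon first so the cache files are not in use.
    "$systemctl_cmd" stop sssd || return 1

    # Remove the data cache and the timestamp cache together, as the
    # report does.
    rm -f -- "$db_dir/cache_${domain}.ldb" \
             "$db_dir/timestamps_${domain}.ldb" || return 1

    # Start sssd again; it recreates the caches on demand.
    "$systemctl_cmd" start sssd || return 1
}
```

Run as root, a call such as "reset_sssd_cache && getent group stroke" would repeat the last steps of the screen capture. Note that removing the cache files also discards everything else stored in them, so this is a blunt tool compared with sss_cache.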

