Bug 1730377 - fix sss_cache to also reset cached timestamp
Summary: fix sss_cache to also reset cached timestamp
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.6
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Sumit Bose
QA Contact: sssd-qe
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
Reported: 2019-07-16 14:45 UTC by Paul Raines
Modified: 2021-01-15 15:37 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-15 11:46:22 UTC
Target Upstream Version:


Links
System ID: Github SSSD sssd issues 4872
Private: 0
Priority: None
Status: open
Summary: Silent cache corruption and entries not refreshing
Last Updated: 2021-02-08 11:34:27 UTC

Description Paul Raines 2019-07-16 14:45:36 UTC
Description of problem:

Changes to the LDAP server's group database do not propagate to some
sssd clients using that LDAP server. Even running sss_cache -E does
not fix it. The only thing that works is shutting down sssd, removing
the cache_default.ldb and timestamps_default.ldb files from
/var/lib/sss/db, and restarting sssd.
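
The workaround above can be sketched as a small shell helper. This is only a sketch: the function names, the DRY_RUN switch, and the SSS_DB_DIR / SSS_DOMAIN variables are made up for illustration, and the file names assume the domain is called "default" as in this report. It uses rm -f rather than a plain rm, and it must run as root when not in dry-run mode.

```shell
# Workaround from this report: stop sssd, delete both cache files, start sssd.
# Set DRY_RUN=1 to print the commands instead of running them.
# SSS_DB_DIR and SSS_DOMAIN are illustrative defaults matching this report.
SSS_DB_DIR=${SSS_DB_DIR:-/var/lib/sss/db}
SSS_DOMAIN=${SSS_DOMAIN:-default}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Stop sssd, remove the data cache and the timestamp cache, start sssd again.
reset_sssd_cache() {
    run systemctl stop sssd &&
    run rm -f "${SSS_DB_DIR}/cache_${SSS_DOMAIN}.ldb" \
              "${SSS_DB_DIR}/timestamps_${SSS_DOMAIN}.ldb" &&
    run systemctl start sssd
}
```

Running it once with DRY_RUN=1 first shows exactly which files would be removed before anything is touched.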
 
Version-Release number of selected component (if applicable):

sssd-1.16.2-13.el7_6.8.x86_64

How reproducible:

Very random

Steps to Reproduce:
1.  Make a change to group entry in LDAP
2.  Run 'sss_cache -E' on clients
3.  Check with 'getent group' on clients to see if correct

Actual results:

Group entry did not change to match LDAP server

Expected results:

Group entry should change to match LDAP server

Additional info:

Upstream issue at https://pagure.io/SSSD/sssd/issue/3886 

This is a screen capture showing the issue:

[root@hound db]# getent group stroke
stroke:*:1021:judith
[root@hound db]# grep ldap4 /etc/sssd/sssd.conf
ldap_uri = ldap://ldap4.mydomain.org, ldap://ldap5.mydomain.org
[root@hound db]# ldapsearch -h ldap4 -x -b 'ou=Group,dc=mydomain,dc=org' "(cn=stroke)" | grep memberUid
memberUid: judith
memberUid: marco
memberUid: bgh12
[root@hound db]# sss_cache -G
[root@hound db]# sss_cache -E
[root@hound db]# getent group stroke
stroke:*:1021:judith
[root@hound db]# systemctl stop sssd
[root@hound db]# \rm cache_default.ldb timestamps_default.ldb
[root@hound db]# systemctl start sssd
[root@hound db]# getent group stroke
stroke:*:1021:judith,marco,bgh12
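
The manual comparison in the capture above could be automated along these lines. This is a rough sketch, not a tested tool: missing_members is a hypothetical helper name, and the usage shown in the comment reuses the host and base DN from this report.

```shell
# Print members that the LDAP server lists but the cached group entry lacks.
# $1: one line in getent-group format, e.g. "stroke:*:1021:judith"
# $2: newline-separated member names as returned by the LDAP server
missing_members() {
    # Field 4 of the group line is the comma-separated member list.
    cached=$(printf '%s\n' "$1" | cut -d: -f4 | tr ',' '\n')
    printf '%s\n' "$2" | while IFS= read -r member; do
        [ -n "$member" ] || continue
        # -x: whole-line match, -F: fixed string, so "bob" does not match "bobby".
        printf '%s\n' "$cached" | grep -qxF -- "$member" || printf '%s\n' "$member"
    done
}

# Possible usage on a client (values taken from this report):
#   ldap_members=$(ldapsearch -x -LLL -H ldap://ldap4.mydomain.org \
#       -b 'ou=Group,dc=mydomain,dc=org' '(cn=stroke)' memberUid |
#       sed -n 's/^memberUid: //p')
#   missing_members "$(getent group stroke)" "$ldap_members"
```

An empty result means the cached entry already contains every member the server reports. The check does not detect members that were removed on the server but are still cached.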

Comment 4 Sumit Bose 2020-11-24 16:19:44 UTC
Hi,

I tried to reproduce the issue as described in the upstream tickets https://pagure.io/SSSD/sssd/issue/3886 and https://pagure.io/SSSD/sssd/issue/3869 but was not successful.

I then re-checked the logs from the upstream tickets, and I think there might have been an issue on the server side that prevented the timestamp-cache logic from updating the data cache. The timestamps in the 'Adding original mod-Timestamp' debug messages for the groups in question are typically weeks older than the date of the log entries. So my current best guess is that the timestamp on the server side was not updated for whatever reason (I found some bug reports about such an issue), and as a result SSSD thinks that there is no change and no update is needed.
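
To make the suspected failure mode concrete, here is a deliberately simplified model of that decision in shell. It is not SSSD's actual code; cache_action is a hypothetical name and the timestamps in the comments are invented examples.

```shell
# Simplified model of the decision described above; not SSSD's actual code.
# $1: modifyTimestamp reported by the server (LDAP generalized time)
# $2: modification timestamp stored with the cached entry ("" if none)
# Prints "skip" when the data cache write would be skipped, "write" otherwise.
cache_action() {
    if [ -n "$2" ] && [ "$1" = "$2" ]; then
        echo skip
    else
        echo write
    fi
}

# If the server changes memberUid but leaves modifyTimestamp untouched,
# both arguments stay equal and the model keeps answering "skip":
#   cache_action 20190601120000Z 20190601120000Z
# A changed or missing cached timestamp leads to "write":
#   cache_action 20190716144500Z 20190601120000Z
#   cache_action 20190716144500Z ""
```

In this model, a stale entry can only persist if the server-side timestamp stays the same while the group data changes, which matches the guess above.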

Some logs in the upstream tickets and from the attached cases show an issue with missing timestamp cache entries (https://github.com/SSSD/sssd/issues/5121), which was recently fixed by Tomas. I was not able to reproduce the observed behavior by selectively removing timestamp entries of the objects involved.

About the attached cases in general: the main issue in those cases was a different one and looks resolved. I doubt that any of the cases really has the issue as reported in the upstream tickets.

As a result, I was not able to find an issue in SSSD with the data available. However, given that there might be cases where the server-side timestamp is out of sync, it might be worth thinking about resetting the cached timestamp with sss_cache as well, so that the object must really be read from the server and writing to the cache cannot be skipped. If we decide that this is a good idea, we also have to decide whether this is something we want to have in RHEL-7.
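
For anyone who wants to check whether the server-side and cached timestamps really are identical on an affected client, a sketch like the following might help. ldif_attr is a hypothetical helper. The ldbsearch filter and the attribute name originalModifyTimestamp in the usage comment are assumptions about the cache layout and should be verified against the installed SSSD version.

```shell
# Print the first value of attribute $1 from LDIF text on stdin.
# Handles plain "attr: value" lines as printed by ldapsearch or ldbsearch;
# base64-encoded ("attr:: ...") and folded values are not handled.
ldif_attr() {
    sed -n "s/^$1: //p" | head -n 1
}

# Possible usage (host, base DN and file name taken from this report;
# modifyTimestamp is an operational attribute and must be requested by name;
# the ldbsearch filter and attribute name are ASSUMPTIONS, see lead-in):
#   server=$(ldapsearch -x -LLL -H ldap://ldap4.mydomain.org \
#       -b 'ou=Group,dc=mydomain,dc=org' '(cn=stroke)' modifyTimestamp |
#       ldif_attr modifyTimestamp)
#   cached=$(ldbsearch -H /var/lib/sss/db/timestamps_default.ldb \
#       '(name=stroke*)' originalModifyTimestamp |
#       ldif_attr originalModifyTimestamp)
#   [ "$server" = "$cached" ] && echo "timestamps match; cache write may be skipped"
```

If the two values match even though the member list on the server has changed, that would support the theory that the server did not update its timestamp.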

bye,
Sumit

Comment 9 Raymond Page 2021-01-15 15:27:48 UTC
Resolution: --- → WONTFIX

^^ This type of resolution without customer interaction will directly inform my recommendations to leadership regarding RH solutions.
Specifically, the inability to reproduce is not evidence contrary to the existence of an issue, it is evidence the technical lead is incapable of reproducing the issue.
If support personnel are not capable of reproducing customer issues, then the support agreements become worthless and calls into question the technical value of RH solutions.

Comment 10 Alexey Tikhonov 2021-01-15 15:37:48 UTC
(In reply to Raymond Page from comment #9)
> Resolution: --- → WONTFIX
> 
> ^^ This type of resolution without customer interaction will directly inform
> my recommendations to leadership regarding RH solutions.
> Specifically, the inability to reproduce is not evidence contrary to the
> existence of an issue, it is evidence the technical lead is incapable of
> reproducing the issue.
> If support personnel are not capable of reproducing customer issues, then
> the support agreements become worthless and calls into question the
> technical value of RH solutions.

Please read the explanation in comment 4 about the defined scope of the issue. Given the status of RHEL7, this scope can't be addressed here, and the issue will be tracked in RHEL8 bz 1902280. Sorry for not making this comment public initially.

If there are any additional details available that are missing on the engineering side, please work with your support contacts directly.

