1730377 – fix sss_cache to also reset cached timestamp


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	sssd
Sub Component:
Version:	7.6
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Sumit Bose
QA Contact:	sssd-qe
Docs Contact:
URL:
Whiteboard:	sync-to-jira
Depends On:
Blocks:

Reported:	2019-07-16 14:45 UTC by Paul Raines
Modified:	2023-12-15 16:37 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-01-15 11:46:22 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	SSSD sssd issues 4872	0	None	open	Silent cache corruption and entries not refreshing	2021-02-08 11:34:27 UTC

Description Paul Raines 2019-07-16 14:45:36 UTC

Description of problem:

Changes to the LDAP server's group database do not propagate to some
sssd clients using that LDAP server. Even running sss_cache -E does
not fix it. The only thing that works is shutting down sssd, removing
the cache_default.ldb and timestamps_default.ldb files from
/var/lib/sss/db, and restarting sssd.
 
Version-Release number of selected component (if applicable):

sssd-1.16.2-13.el7_6.8.x86_64

How reproducible:

Intermittent; occurrence appears random

Steps to Reproduce:
1.  Make a change to a group entry in LDAP
2.  Run 'sss_cache -E' on clients
3.  Check with 'getent group' on clients to see if the entry is correct

Actual results:

Group entry did not change to match LDAP server

Expected results:

Group entry should change to match LDAP server

Additional info:

Upstream issue at https://pagure.io/SSSD/sssd/issue/3886 

This is a screen capture showing the issue:

[root@hound db]# getent group stroke
stroke:*:1021:judith
[root@hound db]# grep ldap4 /etc/sssd/sssd.conf
ldap_uri = ldap://ldap4.mydomain.org, ldap://ldap5.mydomain.org
[root@hound db]# ldapsearch -h ldap4 -x -b 'ou=Group,dc=mydomain,dc=org' "(cn=stroke)" | grep memberUid
memberUid: judith
memberUid: marco
memberUid: bgh12
[root@hound db]# sss_cache -G
[root@hound db]# sss_cache -E
[root@hound db]# getent group stroke
stroke:*:1021:judith
[root@hound db]# systemctl stop sssd
[root@hound db]# \rm cache_default.ldb timestamps_default.ldb
[root@hound db]# systemctl start sssd
[root@hound db]# getent group stroke
stroke:*:1021:judith,marco,bgh12
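
The manual comparison in the capture above (client view from getent versus memberUid values from ldapsearch) can be sketched as a small check. This is only an illustration, not part of SSSD: the function names are hypothetical, the input strings are copied from the capture, and base64-encoded LDIF values ("memberUid:: ...") are not handled.

```python
def parse_getent_group(line):
    """Parse one 'name:passwd:gid:member,member' line from `getent group`."""
    name, _passwd, gid, members = line.strip().split(":", 3)
    return name, int(gid), {m for m in members.split(",") if m}


def parse_member_uids(ldif_text):
    """Collect plain-text memberUid values from ldapsearch output."""
    prefix = "memberUid:"
    return {
        line[len(prefix):].strip()
        for line in ldif_text.splitlines()
        if line.startswith(prefix)
    }


def missing_from_cache(getent_line, ldif_text):
    """Return members known to the LDAP server but absent from the client view."""
    _name, _gid, cached = parse_getent_group(getent_line)
    return sorted(parse_member_uids(ldif_text) - cached)


# Values taken from the capture above (before the cache files were removed).
getent_line = "stroke:*:1021:judith"
ldif_text = "memberUid: judith\nmemberUid: marco\nmemberUid: bgh12\n"
print(missing_from_cache(getent_line, ldif_text))  # ['bgh12', 'marco']
```

A non-empty result after running sss_cache -E and repeating the lookup would indicate the stale-entry condition reported here.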

Comment 4 Sumit Bose 2020-11-24 16:19:44 UTC

Hi,

I tried to reproduce the issue as it was described in the upstream tickets https://pagure.io/SSSD/sssd/issue/3886 and https://pagure.io/SSSD/sssd/issue/3869 but was not successful.

Then I checked the logs from the upstream tickets again and would say that there might have been an issue on the server side which prevented the timestamp-cache logic from updating the data cache. The timestamps in the 'Adding original mod-Timestamp' debug messages of the groups in question are typically weeks older than the timestamps of the log entries themselves. So my current best guess is that the timestamp on the server side was not updated for some reason (I found some bug reports about such issues) and, as a result, SSSD thinks that there is no change and no update is needed.

Some logs in the upstream tickets and from the attached cases show an issue with missing timestamp cache entries (https://github.com/SSSD/sssd/issues/5121), which was recently fixed by Tomas. I was not able to reproduce the observed behavior by selectively removing timestamp entries of the objects involved.

About the attached cases in general, the main issue in the cases was a different one and looks resolved. I doubt that any of the cases really has the issue as reported in the upstream tickets.

As a result, I was not able to find an issue in SSSD with the data available. However, given that there might be cases where the server-side timestamp is out of sync, it might be worth thinking about resetting the cached timestamp with sss_cache as well, so that the object must really be read from the server and writing to the cache cannot be skipped. If we decide that this is a good idea, we also have to decide whether this is something we want to have in RHEL-7.

bye,
Sumit
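
The behavior suspected in comment 4, and the proposed change to sss_cache, can be illustrated with a toy model. This is a sketch under stated assumptions and not SSSD's actual implementation: the class ToyCache, its methods, and the reset_timestamp flag are all hypothetical names invented for this example.

```python
class ToyCache:
    """Toy model of a data cache guarded by a per-object server timestamp."""

    def __init__(self):
        self.data = {}        # object name -> cached attributes
        self.timestamps = {}  # object name -> last seen server modifyTimestamp

    def store(self, name, attrs, server_mod_timestamp):
        """Write attrs unless the server timestamp matches the cached one.

        Returns True if the data cache was written, False if the write was skipped.
        """
        if self.timestamps.get(name) == server_mod_timestamp:
            return False  # object looks unchanged, so the old data stays in place
        self.data[name] = attrs
        self.timestamps[name] = server_mod_timestamp
        return True

    def invalidate(self, name, reset_timestamp=False):
        """Model of invalidation; reset_timestamp models the proposed behavior."""
        if reset_timestamp:
            self.timestamps.pop(name, None)


cache = ToyCache()
cache.store("stroke", ["judith"], "20190601000000Z")

# The server-side members changed, but its modifyTimestamp did not (assumed fault).
new_members = ["judith", "marco", "bgh12"]

cache.invalidate("stroke")  # invalidation that leaves the cached timestamp alone
print(cache.store("stroke", new_members, "20190601000000Z"))  # False: write skipped

cache.invalidate("stroke", reset_timestamp=True)  # proposed: also reset timestamp
print(cache.store("stroke", new_members, "20190601000000Z"))  # True: data rewritten
```

In this model, resetting the cached timestamp on invalidation is what forces the next lookup to rewrite the data even when the server reports an unchanged modification time.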

Comment 9 Raymond Page 2021-01-15 15:27:48 UTC

Resolution: --- → WONTFIX

^^ This type of resolution without customer interaction will directly inform my recommendations to leadership regarding RH solutions.
Specifically, the inability to reproduce is not evidence contrary to the existence of an issue, it is evidence the technical lead is incapable of reproducing the issue.
If support personnel are not capable of reproducing customer issues, then the support agreements become worthless and calls into question the technical value of RH solutions.

Comment 10 Alexey Tikhonov 2021-01-15 15:37:48 UTC

(In reply to Raymond Page from comment #9)
> Resolution: --- → WONTFIX
> 
> ^^ This type of resolution without customer interaction will directly inform
> my recommendations to leadership regarding RH solutions.
> Specifically, the inability to reproduce is not evidence contrary to the
> existence of an issue, it is evidence the technical lead is incapable of
> reproducing the issue.
> If support personnel are not capable of reproducing customer issues, then
> the support agreements become worthless and calls into question the
> technical value of RH solutions.

Please read the explanation in comment 4 about the defined scope of the issue. Taking into account the status of RHEL7, this scope can't be addressed here, and the issue will be tracked in RHEL8 bz 1902280. Sorry for not making this comment public initially.

If there are any additional details available that are missing on the engineering side, please work with your support contacts directly.
