Bug 1359208
| Summary: | sssd does not refresh expired cache entries with enumerate=true | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Thorsten Scherf <tscherf> |
| Component: | sssd | Assignee: | Petr Čech <pcech> |
| Status: | CLOSED ERRATA | QA Contact: | Sudhir Menon <sumenon> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.2 | CC: | grajaiya, jhrozek, ksiddiqu, lslebodn, mkosek, mzidek, pbrezina, pcech, sbose, sgoveas, tscherf |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | sssd-1.15.0-2.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-01 08:58:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Thorsten Scherf
2016-07-22 13:57:21 UTC
IIRC there is a special cleanup task for this which is currently disabled by default because it might cause some unexpected slow-downs. Please check account_cache_expiration in man sssd.conf and ldap_purge_cache_timeout in man sssd-ldap.

Hi Sumit. There is a workaround already with "ldap_enumeration_refresh_timeout", but I think it is still a bug when entries are not refreshed/removed after they have expired. ldap_purge_cache_timeout and account_cache_expiration do not seem to be an option because I want to keep old entries in the cache. I only want to get rid of them after they expired - that is, after they have been removed on the server and the cache timeout has been exceeded.

Hi Sumit and Thorsten,
I looked at the sssd-ldap man page. I am afraid this is not a bug. Thorsten, you wrote that you have a workaround with ldap_enumeration_refresh_timeout. The default value of this option is 300 seconds, so the refresh runs every 5 minutes. There is another mechanism, see ldap_purge_cache_timeout. The default behaviour with enumeration enabled is:
"Please note that if enumeration is enabled, the cleanup task is required in order to detect entries removed from the server and can't be disabled. By default, the cleanup task will run every 3 hours with enumeration enabled."
So... IMHO there is no issue. What do you think, Thorsten, Sumit?

Hi Thorsten, I would like to ask whether you agree with my explanation in Comment #3.

I think Thorsten is right in his observation that it is odd that the group members of an expired entry in the cache are not updated. Nevertheless, I'm currently not sure whether this is expected behavior with enumeration or not. Thorsten, can you check in your first example (without ldap_enumeration_refresh_timeout = 30) whether dataExpireTimestamp is updated after the lookup when the entry is expired? Additionally, can you attach the nss and backend logs with debug_level=10?

Petr, Sumit is right. I would expect an entry which is not valid anymore to be updated in the cache.
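The check requested above boils down to comparing three epoch values: dataExpireTimestamp before the lookup, the time of the lookup, and dataExpireTimestamp after it. The following is a minimal Python sketch of that comparison, not SSSD code. The function name `classify_lookup` and the rule that an entry counts as expired once the lookup time reaches dataExpireTimestamp are assumptions made for illustration.

```python
def classify_lookup(expire_before: int, expire_after: int, lookup_time: int) -> str:
    """Classify one lookup of a cached entry (hypothetical helper).

    expire_before / expire_after: dataExpireTimestamp read from the cache
    before and after the lookup. lookup_time: epoch seconds of the lookup.
    """
    # Assumption: the entry is treated as expired once lookup_time has
    # reached dataExpireTimestamp.
    if lookup_time < expire_before:
        return "still valid"
    # An expired entry that was refreshed should carry a later expiry time.
    if expire_after > expire_before:
        return "refreshed"
    return "stale"


# Values from the transcript in this report: dataExpireTimestamp stays at
# 1473090134 (17:42:14 CEST) and the second lookup happens at 17:42:59 CEST,
# which is 45 seconds later.
result = classify_lookup(1473090134, 1473090134, 1473090134 + 45)
print(result)
```

A result of "stale" corresponds to the behaviour reported here: the entry was looked up after its expiry, yet its expiry timestamp did not move.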
Here is the requested data:

########################
enumerate = true
entry_cache_timeout = 30
enum_cache_timeout = 30
########################

# service sssd stop;rm -rf /var/lib/sss/db/* /var/lib/sss/mc/*;service sssd start
Redirecting to /bin/systemctl stop sssd.service
Redirecting to /bin/systemctl start sssd.service
# ipa group-add-member --users tuser200 tgroup100
# date; ipa group-remove-member --users tuser200 tgroup100
Mon Sep 5 17:41:50 CEST 2016
Group name: tgroup100
GID: 1221800008
---------------------------
Number of members removed 1
---------------------------
# date; SSS_NSS_USE_MEMCACHE=NO getent group tgroup100
Mon Sep 5 17:41:57 CEST 2016
tgroup100:*:1221800008:tuser200
# ldbsearch -H /var/lib/sss/db/cache_coe.muc.redhat.com.ldb name=tgroup100
[...]
dataExpireTimestamp: 1473090134
member: name=tuser200,cn=users,cn=coe.muc.redhat.com,cn=sysdb
memberuid: tuser200
# date -d @1473090134
Mon Sep 5 17:42:14 CEST 2016
# date; SSS_NSS_USE_MEMCACHE=NO getent group tgroup100
Mon Sep 5 17:42:59 CEST 2016
tgroup100:*:1221800008:tuser200
# ldbsearch -H /var/lib/sss/db/cache_coe.muc.redhat.com.ldb name=tgroup100
[...]
dataExpireTimestamp: 1473090134
member: name=tuser200,cn=users,cn=coe.muc.redhat.com,cn=sysdb
memberuid: tuser200
# date -d @1473090134
Mon Sep 5 17:42:14 CEST 2016

So dataExpireTimestamp is not updated after the last lookup although the entry had already expired.

Thorsten, Sumit,
I went through the reproducer with the latest code in master. The attribute dataExpireTimestamp is updated in ts_cache (the timestamp cache) when the entry is looked up after its expiration. I applied the patch for 'better sysdb debugging' and I saw:

[sssd[be[beta]]] [sysdb_store_group] (0x1000): The group record of tgroup_1@beta did not change, only updated the timestamp cache
[sssd[be[beta]]] [sdap_save_groups] (0x4000): Group 0 processed!
[sssd[be[beta]]] [sdap_attrs_get_sid_str] (0x1000): No [objectSIDString] attribute. [0][Success]
[sssd[be[beta]]] [sdap_save_grpmem] (0x0400): Failed to get group sid
[sssd[be[beta]]] [sdap_get_primary_name] (0x0400): Processing object tgroup_1
[sssd[be[beta]]] [sdap_save_grpmem] (0x0400): Processing group tgroup_1@beta
[sssd[be[beta]]] [sdap_save_grpmem] (0x0400): No members for group [tgroup_1@beta]
[...]
[sssd[be[beta]]] [sss_ldb_ldif2log] (0x10000): ldif [
dn: name=tgroup_1@beta,cn=groups,cn=beta,cn=sysdb
changetype: modify
replace: lastUpdate
lastUpdate: 1473231233
-
replace: dataExpireTimestamp
dataExpireTimestamp: 1473231263
-
]

My interpretation is that SSSD correctly identifies that the group has no members, but storing/updating the group fails afterwards. So it is a bug, I think.

Reproducer:
# prepare
ipa user-add --first=Test --last=User --email=tuser tuser
ipa group-add tgroup_1
# repeat
systemctl stop sssd
sudo su -c "rm -f /var/lib/sss/db/*"
sudo su -c "rm -f /var/lib/sss/mc/*"
sssctl logs-remove && systemctl start sssd
date; SSS_NSS_USE_MEMCACHE=NO getent group tgroup_1
ipa group-add-member --users=tuser tgroup_1
date; SSS_NSS_USE_MEMCACHE=NO getent group tgroup_1
ipa group-remove-member --users tuser tgroup_1
date; SSS_NSS_USE_MEMCACHE=NO getent group tgroup_1

Configuration:
# cat /etc/sssd/sssd.conf
[domain/beta]
cache_credentials = True
krb5_store_password_if_offline = True
ipa_domain = beta
id_provider = ipa
auth_provider = ipa
access_provider = ipa
ipa_hostname = mirach.beta
chpass_provider = ipa
dyndns_update = True
ipa_server = _srv_, algol.beta
dyndns_iface = ens3
ldap_tls_cacert = /etc/ipa/ca.crt
enumerate = true
entry_cache_timeout = 30
debug_level = 0xFFFF0

[sssd]
services = nss, sudo, pam, ssh
domains = beta
debug_level = 0xFFFFFF0

[nss]
homedir_substring = /home
enum_cache_timeout = 30

I checked this issue again.
It occurs on SSSD 1.14 but I am not able to reproduce it on SSSD 1.13.

Upstream ticket:
https://fedorahosted.org/sssd/ticket/3182

(In reply to Petr Čech from comment #8)
> I checked this issue again.
> It occurs on SSSD 1.14 but I am not able to reproduce it on SSSD 1.13.

Which exact version of SSSD 1.13 did you use? Was it sssd-1.13.0-40.el7_2.4.x86_64?

I read the reproducer again; I forgot to wait for the timeouts... So I took sssd-1.13.0-40.el7_2.12.x86_64... and the bug occurred. I need to add comments, or wait(), to my reproducers :-) I will fix the version in the ticket.

Created attachment 1200242 [details]
Potential quick-n-dirty fix
Hi Petr,
have you checked if this patch helps to fix the issue?
bye,
Sumit
(In reply to Sumit Bose from comment #12)
> Created attachment 1200242 [details]
> Potential quick-n-dirty fix
>
> Hi Petr,
>
> have you checked if this patch helps to fix the issue?
>

IIUC, Petr cannot reproduce with SSSD 1.13. So I assume it is already fixed in the latest 1.13 and the bug is only on master (1.14).

Hi Sumit and Lukas,
I am working on it. I have other bugs that are connected to groups. I wrote some integration tests for them, and I would like to be sure that I will not break anything.
Lukas, I made a mistake. It really is broken on SSSD 1.13.0. I first checked 1.13.5, but in the wrong way.
Sumit, I am working with your "quick-n-dirty" fix. I will let you know during the day.

Ticket for this bug was
https://fedorahosted.org/sssd/ticket/3182

and the issue was solved in
https://fedorahosted.org/sssd/ticket/2940

by
e0903f41922721edf292a9f7e6605a4519db53a1 (sssd-1_14_2)

(In reply to Petr Čech from comment #16)
> Ticket for this bug was
> https://fedorahosted.org/sssd/ticket/3182
>
> and the issue was solved in
> https://fedorahosted.org/sssd/ticket/2940
>
> by
> e0903f41922721edf292a9f7e6605a4519db53a1 (sssd-1_14_2)

Are you sure that it is the same issue?
Have you tested with "enumerate = true"?

The patches for ticket #2940 do not have a test for "enumerate = true".

(In reply to Lukas Slebodnik from comment #17)
> (In reply to Petr Čech from comment #16)
> > Ticket for this bug was
> > https://fedorahosted.org/sssd/ticket/3182
> >
> > and the issue was solved in
> > https://fedorahosted.org/sssd/ticket/2940
> >
> > by
> > e0903f41922721edf292a9f7e6605a4519db53a1 (sssd-1_14_2)
>
> Are you sure that it is the same issue?
> Have you tested with "enumerate = true"?

Yes, I am sure. Yes, I have tested.

> The patches for ticket #2940 do not have a test for
> "enumerate = true".

If it is possible to add tests for "enumerate = true", I could write them. Would it be better to open a new ticket for that or to reopen #3182?
(In reply to Petr Čech from comment #18)
> (In reply to Lukas Slebodnik from comment #17)
> > (In reply to Petr Čech from comment #16)
> > > Ticket for this bug was
> > > https://fedorahosted.org/sssd/ticket/3182
> > >
> > > and the issue was solved in
> > > https://fedorahosted.org/sssd/ticket/2940
> > >
> > > by
> > > e0903f41922721edf292a9f7e6605a4519db53a1 (sssd-1_14_2)
> >
> > Are you sure that it is the same issue?
> > Have you tested with "enumerate = true"?
>
> Yes, I am sure. Yes, I have tested.
>

Thank you, good to know.

> > The patches for ticket #2940 do not have a test for
> > "enumerate = true".
>
> If it is possible to add tests for "enumerate = true", I could write them.
> Would it be better to open a new ticket for that or to reopen #3182?

It's difficult to write a reliable test for enumeration due to race conditions and timing issues.

Fix is seen in sssd-1.15.2-29.el7.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:2294
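Anyone re-verifying the fix will probably repeat the manual ldbsearch inspection from the transcript above. The output there is LDIF-style "attribute: value" text, so the fields of interest can be pulled out with a few lines of Python. This is a rough sketch that assumes unfolded lines and no base64 ("::") values; `parse_record` is a hypothetical helper, not part of SSSD or ldb, and the sample text is copied from the transcript.

```python
from collections import defaultdict
from typing import Dict, List


def parse_record(text: str) -> Dict[str, List[str]]:
    """Collect 'attribute: value' lines into a dict of value lists.

    Simplification (assumption): no folded lines, no base64 values.
    Lines without ': ' such as '[...]' are ignored.
    """
    attrs: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ": " not in line:
            continue
        name, value = line.split(": ", 1)
        attrs[name].append(value)
    return dict(attrs)


# Sample copied from the ldbsearch output quoted in this report.
SAMPLE = """\
[...]
dataExpireTimestamp: 1473090134
member: name=tuser200,cn=users,cn=coe.muc.redhat.com,cn=sysdb
memberuid: tuser200
"""

record = parse_record(SAMPLE)
expire = int(record["dataExpireTimestamp"][0])
members = record.get("memberuid", [])
print(expire, members)
```

After a lookup that happens past the expiry time, a fixed build would be expected to show a later dataExpireTimestamp and an empty memberuid list for the group once the member has been removed on the server.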