Red Hat Bugzilla – Bug 882290
arithmetic bug in the SSSD causes netgroup midpoint refresh to be always set to 10 seconds
Last modified: 2015-03-10 01:59:04 EDT
This bug was created as a clone of the upstream ticket: https://fedorahosted.org/sssd/ticket/1683

The code uses:

{{{
lifetime = dom->netgroup_timeout * (step_ctx->nctx->cache_refresh_percent / 100);
}}}

As both operands of the division are int, this is integer division. For any percentage below 100 the quotient truncates to 0, so the lifetime is always 0 and we always hit the "fallback" condition that sets the midpoint refresh to 10 seconds. The code should read something like the following, where the double literal 100.0 forces floating-point division:

{{{
lifetime = dom->netgroup_timeout * (step_ctx->nctx->cache_refresh_percent / 100.0);
}}}
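The difference between the two expressions can be reproduced in isolation. This is a minimal C sketch, not SSSD code: `netgr_lifetime_buggy`, `netgr_lifetime_fixed` and `FALLBACK_LIFETIME` are hypothetical names, the fallback is assumed to fire when the computed lifetime is 0 (the ticket only says the zero result leads to the 10-second fallback), and the 0-99 range check assumes `cache_refresh_percent` holds a percentage below 100, such as `entry_cache_nowait_percentage`.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback value named in the ticket. Triggering it on a lifetime of 0
 * is an assumption made for this sketch. */
#define FALLBACK_LIFETIME 10

/* Buggy form: cache_refresh_percent / 100 is integer division, so every
 * percentage from 0 to 99 truncates to 0 and the product is always 0. */
static uint32_t netgr_lifetime_buggy(uint32_t netgroup_timeout,
                                     int cache_refresh_percent)
{
    assert(cache_refresh_percent >= 0 && cache_refresh_percent <= 99);
    uint32_t lifetime = netgroup_timeout * (cache_refresh_percent / 100);
    if (lifetime == 0) {
        lifetime = FALLBACK_LIFETIME;
    }
    return lifetime;
}

/* Fixed form: 100.0 is a double, so the division happens in floating
 * point (50 / 100.0 == 0.5); the product is truncated to whole seconds. */
static uint32_t netgr_lifetime_fixed(uint32_t netgroup_timeout,
                                     int cache_refresh_percent)
{
    assert(cache_refresh_percent >= 0 && cache_refresh_percent <= 99);
    uint32_t lifetime =
        (uint32_t)(netgroup_timeout * (cache_refresh_percent / 100.0));
    if (lifetime == 0) {
        lifetime = FALLBACK_LIFETIME;
    }
    return lifetime;
}
```

With a timeout of 60 seconds and a percentage of 50, the buggy form returns 10 while the fixed form returns 30, which is the midpoint the reporter expected.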
I think the test should do the following:

0) Insert a netgroup into LDAP.
1) Set entry_cache_timeout to some reasonably low value for the domain. I used 60, but ideally the test would use several different values. The default value of entry_cache_nowait_percentage is 50 (percent), which is good enough for the test. Also disable the memcache for the test, since it might get in the way; set memcache_timeout to 1 second, for example.
2) Start with an empty cache to ensure a clean starting point.
3) Request the test netgroup. The returned values should correspond to what was entered in step 0.
4) Before the midpoint timeout passes, modify the netgroup.
5) Sleep into the midpoint interval. In this example, it would be 50% of 60 seconds, so sleep 30.
6) Request the same netgroup again.
7) The results returned should still be the same as in step 3, but a refresh should be scheduled in the background.
8) The background refresh could be verified in a number of ways. One is a simple grep through the logs; a better way is to request the entry again. This time, the modified netgroup should be returned.
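The timeline behind steps 3 to 7 can be modelled with a small classification function. This is a hypothetical C sketch rather than SSSD's implementation: `classify_lookup` and `enum cache_action` are illustrative names, `age` is the number of seconds since the entry was cached, and treating a percentage of 0 as "no midpoint refresh" is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* What a lookup does, depending on how old the cached entry is. */
enum cache_action {
    CACHE_RETURN,             /* before the midpoint: answer from cache only */
    CACHE_RETURN_AND_REFRESH, /* midpoint window: answer from cache and
                                 schedule a background refresh */
    CACHE_FETCH               /* expired: go to LDAP before answering */
};

static enum cache_action classify_lookup(uint32_t age,
                                         uint32_t entry_cache_timeout,
                                         int nowait_percent)
{
    assert(nowait_percent >= 0 && nowait_percent <= 99);
    /* Floating-point division, so that 50% of 60 s is 30 s and not 0 s. */
    uint32_t midpoint =
        (uint32_t)(entry_cache_timeout * (nowait_percent / 100.0));

    if (age >= entry_cache_timeout) {
        return CACHE_FETCH;
    }
    /* Assumption for this sketch: a percentage of 0 disables the refresh. */
    if (nowait_percent > 0 && age >= midpoint) {
        return CACHE_RETURN_AND_REFRESH;
    }
    return CACHE_RETURN;
}
```

With entry_cache_timeout = 60 and the default 50%, a request at age 10 is served from the cache, the request in step 6 (age 30) is served from the cache with a background refresh, and a request at age 60 or later goes to LDAP first. Calling it with other timeouts gives the sleep interval to use when the test is repeated with several values.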
Jakub, I have attached this bug to the same case I spoke to you about in bz 822236. Please let me know if you have a patch I can forward to the customer for testing. -Chris
The fix was pushed upstream yesterday. I'll build you a test package.
Created attachment 661503: sssd-logs-fresh
I do not think tuning is going to be the answer here. We would need to increase the validity times, which would come at the risk of serving stale cache entries.
Simo proposed a new periodic background task that would schedule an out-of-band refresh for any cached entry. It is being tracked as https://fedorahosted.org/sssd/ticket/1713 upstream.
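One way such a periodic task could work is sketched below. This is a hypothetical C illustration, not the design tracked in ticket 1713: `struct cached_entry`, `periodic_refresh_run` and the policy of queueing every entry that would expire before the next run are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical cached entry with only the fields this sketch needs. */
struct cached_entry {
    const char *name;
    time_t expire;      /* absolute time at which the entry becomes stale */
    int refresh_queued; /* set once an out-of-band refresh is scheduled */
};

/* One run of a hypothetical periodic task: queue an out-of-band refresh
 * for every entry that would expire before the next run, so that clients
 * keep getting cache hits instead of waiting on the server.
 * Returns the number of entries newly queued. */
static size_t periodic_refresh_run(struct cached_entry *entries, size_t count,
                                   time_t now, time_t period)
{
    size_t queued = 0;

    assert(period > 0);
    for (size_t i = 0; i < count; i++) {
        if (!entries[i].refresh_queued && entries[i].expire <= now + period) {
            /* A real task would start an asynchronous LDAP lookup here. */
            entries[i].refresh_queued = 1;
            queued++;
        }
    }
    return queued;
}
```

Because the task runs independently of client requests, entries are refreshed even if nobody asks for them during the midpoint window, which is what makes it an alternative to raising the validity times.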
Verified the BZ on SSSD version: sssd-1.9.2-41.el6.x86_64

As part of testing, a netgroup was added to the LDAP server. sssd.conf was configured with "entry_cache_nowait_percentage = 50", "memcache_timeout = 1" and "entry_cache_timeout = 60". The netgroup request was sent after clearing the sssd cache. The netgroup triple was then modified to reflect new values. Another request was sent after 30 seconds of sleep, and the domain log (/var/log/sssd/sssd_LDAP.log) was checked for the new netgroup triple.

Below is the beaker output for the automated script:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG ] :: 882290 - arithmetic bug in the SSSD causes netgroup midpoint refresh to be always set to 10 seconds
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
adding new entry "cn=netgrp_art,ou=Netgroup,dc=example,dc=com"
:: [ PASS ] :: Running '> /var/log/sssd/sssd_LDAP.log'
Stopping sssd: [ OK ]
Starting sssd: [ OK ]
:: [17:27:53] :: Sleeping for 5 seconds
:: [ PASS ] :: Running 'restart_clearing_cache'
netgrp_art (host1, kau10, example.com)
:: [ PASS ] :: Running 'getent netgroup netgrp_art'
modifying entry "cn=netgrp_art,ou=Netgroup,dc=example,dc=com"
:: [ PASS ] :: Running 'sleep 30'
netgrp_art (host1, kau10, example.com)
:: [ PASS ] :: Running 'getent netgroup netgrp_art'
:: [ PASS ] :: Running 'sleep 5'
:: [ PASS ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'host2,ami10,example.com'
deleting entry "cn=netgrp_art,ou=Netgroup,dc=example,dc=com"
:: [ PASS ] :: Running 'ldapmodify -x -D "cn=Directory Manager" -w Secret123 -H ldap://hubcap.lab.eng.pnq.redhat.com -f /tmp/delnetgrp.ldif'
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0508.html