Bug 882290

Summary: arithmetic bug in the SSSD causes netgroup midpoint refresh to be always set to 10 seconds
Product: Red Hat Enterprise Linux 6 Reporter: Jakub Hrozek <jhrozek>
Component: sssd Assignee: Jakub Hrozek <jhrozek>
Status: CLOSED ERRATA QA Contact: Kaushik Banerjee <kbanerje>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.4 CC: apeetham, chhudson, dpal, grajaiya, jgalipea, msauton, okos, pbrezina
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-1.9.2-37.el6 Doc Type: Bug Fix
Doc Text:
No documentation needed.
Last Closed: 2013-02-21 09:41:46 UTC Type: ---
Bug Blocks: 886216    
Attachments:
Description Flags
sssd-logs-fresh none

Description Jakub Hrozek 2012-11-30 15:27:45 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/1683

The code uses:
{{{
lifetime = dom->netgroup_timeout *
                (step_ctx->nctx->cache_refresh_percent / 100);
}}}

As both operands of the division are int, the quotient is truncated to 0 for any percentage below 100, so lifetime is always 0 and we always hit the "fallback" condition that sets the midpoint refresh to 10 seconds.

The code should read something like:
{{{
lifetime = dom->netgroup_timeout *
                (step_ctx->nctx->cache_refresh_percent / 100.0);
}}}
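
To make the arithmetic concrete, here is a minimal standalone sketch of the two expressions side by side. The function names and the MIDPOINT_FALLBACK constant are hypothetical, and the fallback check is only a stand-in for the real condition in the responder (assumed here to raise anything below 10 seconds to 10, per the summary); it is not the upstream patch.

```c
/* Assumed fallback value: the summary says the midpoint ends up at 10 seconds. */
#define MIDPOINT_FALLBACK 10

/* Original expression: percent / 100 is evaluated entirely in int arithmetic,
 * so it truncates to 0 for any percent below 100. */
int midpoint_int_division(int netgroup_timeout, int cache_refresh_percent)
{
    int lifetime = netgroup_timeout * (cache_refresh_percent / 100);

    /* Hypothetical stand-in for the "fallback" condition. */
    return lifetime < MIDPOINT_FALLBACK ? MIDPOINT_FALLBACK : lifetime;
}

/* Proposed expression: the 100.0 literal promotes the division to double,
 * so the fraction survives until the final conversion back to int. */
int midpoint_double_division(int netgroup_timeout, int cache_refresh_percent)
{
    int lifetime = (int)(netgroup_timeout * (cache_refresh_percent / 100.0));

    return lifetime < MIDPOINT_FALLBACK ? MIDPOINT_FALLBACK : lifetime;
}
```

With a timeout of 60 and a percentage of 50, the first function yields the 10 second fallback while the second yields 30. An integer-only form such as netgroup_timeout * cache_refresh_percent / 100 would avoid floating point as well, at the cost of a possible overflow for very large timeouts.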

Comment 1 Jakub Hrozek 2012-12-02 18:22:18 UTC
I think the test should do the following:

0) insert a netgroup into LDAP

1) set entry_cache_timeout to some reasonably low value in the domain provider. I used 60, but ideally the test would use several different values. The default value of entry_cache_nowait_percentage is 50 (percent), which is good enough for the test. Also disable the memcache for the test, since it might get in the way; for example, set memcache_timeout to 1 second.

2) Start with an empty cache to ensure a clean starting point

3) request the test netgroup; the returned values should correspond with what was entered in step 0

4) before the midpoint timeout passes, modify the netgroup

5) sleep into the midpoint interval. In this example, it would be 50% of 60 seconds, so sleep 30 seconds

6) request the same netgroup again

7) the results returned should still be the same as in step 3, but a refresh should be scheduled in the background

8) the background refresh could be verified in a number of ways -- one is a simple grep through the logs, a better way would be to request the entry again. This time, the modified netgroup should be returned.
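
The timing that steps 3 to 8 rely on can be sketched as a small classification function. This is only a model under stated assumptions: the enum and function names are hypothetical, the boundaries are assumed to be inclusive, and the 10 second floor from the description is left out. It is not SSSD code.

```c
/* What a responder would do with a cached entry of a given age (model only). */
enum cache_action {
    SERVE_CACHED,             /* before the midpoint: answer from cache */
    SERVE_CACHED_AND_REFRESH, /* past the midpoint: answer from cache and
                                 schedule a background refresh */
    REFRESH_BEFORE_REPLY      /* past the timeout: entry expired */
};

/* age and entry_cache_timeout are in seconds; nowait_percentage is 0..99. */
enum cache_action classify_request(int age, int entry_cache_timeout,
                                   int nowait_percentage)
{
    /* Midpoint computed with the corrected, non-truncating division. */
    int midpoint = (int)(entry_cache_timeout * (nowait_percentage / 100.0));

    if (age >= entry_cache_timeout) {
        return REFRESH_BEFORE_REPLY;
    }
    if (age >= midpoint) {
        return SERVE_CACHED_AND_REFRESH;
    }
    return SERVE_CACHED;
}
```

With entry_cache_timeout = 60 and the default percentage of 50, a request a few seconds after step 3 falls into SERVE_CACHED, while the request in step 6 (just after the 30 second sleep) falls into SERVE_CACHED_AND_REFRESH, which is why step 7 still sees the old triple and step 8 sees the new one.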

Comment 2 Chris Hudson 2012-12-05 04:32:12 UTC
Jakub, I have attached this bug to the same case I was speaking to you about in bz 822236...please let me know if you have a patch I can forward on to the customer for testing.

-Chris

Comment 3 Jakub Hrozek 2012-12-05 09:29:03 UTC
The fix was pushed upstream yesterday. I'll build you a test package.

Comment 14 Chris Hudson 2012-12-11 15:04:04 UTC
Created attachment 661503
sssd-logs-fresh

Comment 20 Chris Hudson 2012-12-11 22:10:53 UTC
I do not think tuning is going to be the answer here. We would need to increase the validity times, which would come at the risk of an incorrect cache.

Comment 21 Jakub Hrozek 2012-12-12 16:14:00 UTC
Simo proposed a new periodic background task that would schedule an out-of-band refresh for any cached entry. It is being tracked as https://fedorahosted.org/sssd/ticket/1713 upstream.

Comment 23 Amith 2012-12-17 12:13:56 UTC
Verified the BZ on SSSD version: sssd-1.9.2-41.el6.x86_64

As part of testing, a netgroup was added to the LDAP server. sssd.conf was configured with "entry_cache_nowait_percentage = 50", "memcache_timeout = 1" and "entry_cache_timeout = 60". The netgroup request was sent after clearing the SSSD cache. The netgroup triple was then modified to reflect new values. Another request was sent after a 30-second sleep, and the domain log was checked for the new netgroup triple.

Below is the beaker output for the automated script:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: 882290 - arithmetic bug in the SSSD causes netgroup midpoint refresh to be always set to 10 seconds
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

adding new entry "cn=netgrp_art,ou=Netgroup,dc=example,dc=com"

:: [   PASS   ] :: Running '> /var/log/sssd/sssd_LDAP.log'
Stopping sssd:                                             [  OK  ]
Starting sssd:                                             [  OK  ]
:: [17:27:53] ::  Sleeping for 5 seconds
:: [   PASS   ] :: Running 'restart_clearing_cache'
netgrp_art            (host1, kau10, example.com)
:: [   PASS   ] :: Running 'getent netgroup netgrp_art'
modifying entry "cn=netgrp_art,ou=Netgroup,dc=example,dc=com"

:: [   PASS   ] :: Running 'sleep 30'
netgrp_art            (host1, kau10, example.com)
:: [   PASS   ] :: Running 'getent netgroup netgrp_art'
:: [   PASS   ] :: Running 'sleep 5'
:: [   PASS   ] :: File '/var/log/sssd/sssd_LDAP.log' should contain 'host2,ami10,example.com'
deleting entry "cn=netgrp_art,ou=Netgroup,dc=example,dc=com"

:: [   PASS   ] :: Running 'ldapmodify -x -D "cn=Directory Manager" -w Secret123 -H ldap://hubcap.lab.eng.pnq.redhat.com -f /tmp/delnetgrp.ldif'

Comment 24 errata-xmlrpc 2013-02-21 09:41:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0508.html