Bug 1665867

Summary:	proxy provider is not working with enumerate=true when trying to fetch all groups
Product:	Red Hat Enterprise Linux 8	Reporter:	Madhuri <mupadhye>
Component:	sssd	Assignee:	Alexey Tikhonov <atikhono>
Status:	CLOSED ERRATA	QA Contact:	sssd-qe <sssd-qe>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	8.0	CC:	atikhono, blc, dbula, grajaiya, jhrozek, lslebodn, mupadhye, mzidek, pbrezina, sbose, tscherf
Target Milestone:	rc	Flags:	dbula: mirror+
Target Release:	8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	sssd-2.2.0-1.el8	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-11-05 22:34:01 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1682305
Bug Blocks:

Description Madhuri 2019-01-14 10:05:57 UTC

Description of problem:
The proxy provider is not working with enumerate=true when trying to fetch all groups

Version-Release number of selected component (if applicable):
sssd-2.0.0-36.el8.x86_64

How reproducible:
always


Steps to Reproduce:
1. Configure sssd with a proxy provider
2. Fetch all groups using '# getent group' 


Actual results:
Fetching all groups take too much time(approx 80 to 90 seconds)

Expected results:
Should not take that much time to fetch all groups.

Additional info:
# cat /etc/sssd/sssd.conf
[sssd]
config_file_version = 2
domains = proxy, ldap
sbus_timeout = 30
services = nss, pam

[domain/proxy]
auth_provider = proxy
cache_credentials = True
enumerate = TRUE
id_provider = proxy
debug_level = 0xFFF0
proxy_lib_name = ldap
proxy_pam_target = sssdproxyldap
filter_users = puser10
use_fully_qualified_names = True

[domain/ldap]
id_provider = ldap
auth_provider = ldap
cache_credentials = FALSE
ldap_search_base = dc=bos,dc=redhat,dc=com
chpass_provider = ldap
ldap_id_use_start_tls = True
debug_level = 0xFFF0
min_id  = 1000
enumerate = TRUE
ldap_uri = ldaps://server.example.com:636
ldap_tls_cacert = /etc/openldap/certs/cacert2.pem

[nss]
filter_groups = root
filter_users = root
debug_level = 9

[pam]

Comment 1 Sumit Bose 2019-01-14 10:21:43 UTC

Enumeration caused a crash in the proxy provider:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa0b6875c95 in __GI_abort () at abort.c:79
#2  0x00007fa0b6e6ed31 in talloc_abort (reason=0x7fa0b6e7c838 "Bad talloc magic value - access after free") at ../talloc.c:500
#3  0x00007fa0b6e6f46d in talloc_abort_access_after_free () at ../talloc.c:525
#4  talloc_chunk_from_ptr (ptr=0x55abd058df10) at ../talloc.c:525
#5  _talloc_steal_loc (new_ctx=0x55abd05d37c0, ptr=0x55abd058df10, location=<optimized out>) at ../talloc.c:1329
#6  0x00007fa0a2730159 in remove_duplicate_group_members (_grp=<synthetic pointer>, orig_grp=0x55abd058df10, mem_ctx=0x55abd05d37c0) at src/providers/proxy/proxy_id.c:708
#7  save_group (sysdb=sysdb@entry=0x55abd0571770, dom=dom@entry=0x55abd05890f0, grp=grp@entry=0x55abd058df10, real_name=0x55abd05c6f20 "Group2@proxy", alias=alias@entry=0x0)
    at src/providers/proxy/proxy_id.c:738
#8  0x00007fa0a2732602 in enum_groups (dom=0x55abd05890f0, sysdb=0x55abd0571770, ctx=0x55abd05a36d0, mem_ctx=0x55abd058de40) at src/providers/proxy/proxy_id.c:1298
#9  proxy_account_info (domain=0x55abd05890f0, be_ctx=<optimized out>, data=<optimized out>, ctx=0x55abd05a36d0, mem_ctx=0x55abd058de40) at src/providers/proxy/proxy_id.c:1632
#10 proxy_account_info_handler_send (mem_ctx=<optimized out>, id_ctx=0x55abd05a36d0, data=<optimized out>, params=0x55abd05bb3e0) at src/providers/proxy/proxy_id.c:1763
#11 0x000055abcec1abf9 in file_dp_request (_dp_req=<synthetic pointer>, req=0x55abd05ad880, request_data=0x55abd059bb40, dp_flags=1, method=DPM_ACCOUNT_HANDLER, target=DPT_ID, 
    name=<optimized out>, domainname=0x55abd05badd0 "proxy", provider=0x55abd0571bf0, mem_ctx=<optimized out>) at src/providers/data_provider/dp_request.c:250
#12 dp_req_send (mem_ctx=0x55abd059baa0, provider=provider@entry=0x55abd0571bf0, domain=domain@entry=0x55abd05badd0 "proxy", name=<optimized out>, target=target@entry=DPT_ID, 
    method=method@entry=DPM_ACCOUNT_HANDLER, dp_flags=1, request_data=0x55abd059bb40, _request_name=0x55abd059baa0) at src/providers/data_provider/dp_request.c:295
#13 0x000055abcec1d90e in dp_get_account_info_send (mem_ctx=<optimized out>, ev=0x55abd0561ab0, sbus_req=<optimized out>, provider=0x55abd0571bf0, dp_flags=1, entry_type=<optimized out>, 
    filter=<optimized out>, domain=0x55abd05badd0 "proxy", extra=0x55abd05bae40 "") at src/providers/data_provider/dp_target_id.c:528
#14 0x00007fa0b76e57f2 in _sbus_sss_invoke_in_uusss_out_qus_step (ev=0x55abd0561ab0, te=<optimized out>, tv=..., private_data=<optimized out>) at src/sss_iface/sbus_sss_invokers.c:2837
#15 0x00007fa0b708bbd9 in tevent_common_invoke_timer_handler (te=te@entry=0x55abd0570370, current_time=..., removed=removed@entry=0x0) at ../tevent_timed.c:369
#16 0x00007fa0b708bd7e in tevent_common_loop_timer_delay (ev=ev@entry=0x55abd0561ab0) at ../tevent_timed.c:441
#17 0x00007fa0b708cf2b in epoll_event_loop_once (ev=0x55abd0561ab0, location=<optimized out>) at ../tevent_epoll.c:922
#18 0x00007fa0b708b1bb in std_event_loop_once (ev=0x55abd0561ab0, location=0x7fa0ba3489d9 "src/util/server.c:724") at ../tevent_standard.c:110
#19 0x00007fa0b7086395 in _tevent_loop_once (ev=ev@entry=0x55abd0561ab0, location=location@entry=0x7fa0ba3489d9 "src/util/server.c:724") at ../tevent.c:772
#20 0x00007fa0b708663b in tevent_common_loop_wait (ev=0x55abd0561ab0, location=0x7fa0ba3489d9 "src/util/server.c:724") at ../tevent.c:895
#21 0x00007fa0b708b14b in std_event_loop_wait (ev=0x55abd0561ab0, location=0x7fa0ba3489d9 "src/util/server.c:724") at ../tevent_standard.c:141
#22 0x00007fa0ba327a07 in server_loop (main_ctx=0x55abd0562f80) at src/util/server.c:724
#23 0x000055abcec0d38b in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:699


I think the reason is the talloc_steal() in the done-block of remove_duplicate_group_members().

 704 done:
 705     talloc_zfree(tmp_ctx);
 706 
 707     if (ret == ENOENT) {
 708         *_grp = talloc_steal(mem_ctx, orig_grp);
 709         ret = EOK;
 710     }
 711 
 712     return ret;
 713 }

because the address pointed to by orig_grp is reused in the do-loop of enum_groups() but due to the talloc_steal() it will be freed in save_group() when the temporary talloc context is freed. To fix this '*_grp = orig_grp;' should be sufficient, in the worst case (if _grp is somewhere freed explicitly) copying the memory should help.

Comment 2 Alexey Tikhonov 2019-01-28 18:50:32 UTC

(In reply to Sumit Bose from comment #1)

> To fix this '*_grp = orig_grp;' should be sufficient

 Technically - yes. But I feel it is a bad idea since it breaks "promise" that one could expect looking at signature of `remove_duplicate_group_members(mem_ctx, orig_group, new_group)`
It is not documented anywhere, but I would rather prefer function to behave consistently and to always return copy of group in given mem context.


> in the worst case (if _grp is somewhere freed explicitly) copying the memory should help.

I would go that way. There is no much overhead (especially taking in account how ineffective code around is anyway).

Comment 3 Sumit Bose 2019-01-29 08:00:49 UTC

(In reply to Alexey Tikhonov from comment #2)
> (In reply to Sumit Bose from comment #1)
> 
> > To fix this '*_grp = orig_grp;' should be sufficient
> 
>  Technically - yes. But I feel it is a bad idea since it breaks "promise"
> that one could expect looking at signature of
> `remove_duplicate_group_members(mem_ctx, orig_group, new_group)`
> It is not documented anywhere, but I would rather prefer function to behave
> consistently and to always return copy of group in given mem context.
> 
> 
> > in the worst case (if _grp is somewhere freed explicitly) copying the memory should help.
> 
> I would go that way. There is no much overhead (especially taking in account
> how ineffective code around is anyway).

I agree, thank you for taking care of this issue.

bye,
Sumit

Comment 4 Alexey Tikhonov 2019-01-29 17:07:57 UTC

Upstream ticket: https://pagure.io/SSSD/sssd/issue/3931

Comment 8 Alexey Tikhonov 2019-02-14 12:51:48 UTC

Upstream PR: https://github.com/SSSD/sssd/pull/737

Comment 9 Jakub Hrozek 2019-03-28 21:46:06 UTC

Fixed as part of:
 * 8efa202
 * cd1538b
 * 29ac739
 * cc9f0f4
 * 0f62cc9
 * feb0832

Comment 12 Madhuri 2019-08-22 07:25:54 UTC

Verified with
[root@ci-vm-10-0-146-233 ~]# rpm -qa sssd
sssd-2.2.0-16.el8.x86_64

Verification steps:

1. Configure sssd with a proxy provider

2. Add enumerate = TRUE option in proxy domain section

[root@ci-vm-10-0-146-233 ~]# cat /etc/sssd/sssd.conf

[sssd]
config_file_version = 2
sbus_timeout = 30
services = pam, nss
domains = proxy, ldap2

[domain/proxy]
auth_provider = proxy
enumerate = True
id_provider = proxy
debug_level = 0xFFF0
proxy_lib_name = ldap
proxy_pam_target = sssdproxyldap
filter_users = puser10

[domain/ldap2]
id_provider = ldap
auth_provider = ldap
chpass_provider = ldap
ldap_id_use_start_tls = True
debug_level = 0xFFF0
enumerate = True
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
ldap_uri = ldaps://server.example.com
ldap_search_base = dc=example1,dc=test

3. Stop sssd, Remove the cache and start it again

# systemctl stop sssd; rm -rf /var/lib/sss/db/*; rm -rf /var/log/sssd/*; systemctl start sssd

4. Fetch all groups using '# getent group' and calculate the time

[root@ci-vm-10-0-146-233 ~]# time getent group
pgroup12:*:2012:
pgroup3:*:2003:
pgroup5:*:2005:
pgroup15:*:2015:
duplicate:*:2019:
pgroup10:*:2010:
pgroup11:*:2011:
pgroup0:*:2000:
pgroup7:*:2007:
pgroup2:*:2002:
pgroup9:*:2009:
pgroup18:*:2018:
pgroup14:*:2014:
pgroup4:*:2004:
pgroup6:*:2006:
pgroup17:*:2017:
pgroup1:*:2001:
pgroup13:*:2013:
pgroup16:*:2016:
pgroup8:*:2008:
qgroup13:*:3013:
qgroup8:*:3008:
qgroup16:*:3016:
qgroup5:*:3005:
qgroup11:*:3011:
qgroup0:*:3000:
qgroup7:*:3007:
qgroup2:*:3002:
qgroup9:*:3009:
qgroup15:*:3015:
qgroup10:*:3010:
qgroup4:*:3004:
duplicate:*:3019:
qgroup6:*:3006:
qgroup18:*:3018:
qgroup14:*:3014:
.....
.....
.....

real	0m0.050s
user	0m0.002s
sys	0m0.002s

5. Repeat step 4. to calculate average time,

i. Time taken to fetch all groups, first iteration after deletion cache,
real	0m0.042s
user	0m0.002s
sys	0m0.001s

ii. Time taken to fetch all groups, second iteration after deletion cache,
real	0m0.043s
user	0m0.003s
sys	0m0.000s

iii. Time taken to fetch all groups, third iteration after deletion cache,
real	0m0.040s
user	0m0.000s
sys	0m0.003s

iv.  Time taken to fetch all groups, forth iteration after deletion cache,
real	0m0.044s
user	0m0.001s
sys	0m0.002s

v. Time taken to fetch all groups, fifth iteration after deletion cache,
real	0m0.045s
user	0m0.003s
sys	0m0.001s


To fetch all groups took less time,
From above observations, marking this bug as verified.

Comment 14 errata-xmlrpc 2019-11-05 22:34:01 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3651