Bug 731666

Summary: Regression in sssd 1.5.12 when working with AD as LDAP provider
Product: [Fedora] Fedora Reporter: Bojan Smojver <bojan>
Component: sssdAssignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 15CC: jhrozek, sbose, sgallagh, ssorce
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: sssd-1.5.13-1.fc15.2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-09-09 01:30:36 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:

Description Bojan Smojver 2011-08-18 05:31:24 EDT
Description of problem:
I have AD configured as my ldap provider. With 1.5.7, if I issue:

$ id bsmojver

I get:

uid=XXXXX(bsmojver) gid=YYYY(yyy) groups=YYYY(yyy),10(wheel),ZZZZ(zzz),<more group ids and names here>

If I then upgrade to 1.5.12 and issue as root:

# service sssd stop; rm -f /var/lib/sss/db/*; service sssd start

And then, again as myself:

$ id bsmojver

I get:

uid=XXXXX(bsmojver) gid=YYYY groups=YYYY,10(wheel),ZZZZ,<more group ids here>

Version-Release number of selected component (if applicable):
sssd-1.5.12-1.fc15.i686

How reproducible:
Always.

Steps to Reproduce:
1. Install above mentioned version of sssd.
2. Blow away the local DB, start sssd.
3. id <username> for someone in AD.
  
Actual results:
No group names, only group ids.

Expected results:
Should show group names as well, just like 1.5.7. Old version also appears a lot faster doing the lookup.

Additional info:
The problem may be present in 1.5.11 as well, but caching may have hidden it from me. Not too sure.

Relevant part of the config:
------------------
id_provider = ldap
ldap_uri = <bunch of LDAP URLs>
ldap_schema = rfc2307bis
ldap_default_bind_dn = <fancy account>
ldap_default_authtok_type = password
ldap_default_authtok = <something secret>
ldap_user_object_class = user
ldap_user_name = msSFU30Name
ldap_user_gecos = displayName
ldap_user_uid_number = msSFU30UidNumber
ldap_user_gid_number = msSFU30GidNumber
ldap_user_home_directory = msSFU30HomeDirectory
ldap_user_shell = msSFU30LoginShell
ldap_user_principal = userPrincipalName
ldap_group_object_class = group
ldap_group_member = member
ldap_group_gid_number = msSFU30GidNumber
ldap_force_upper_case_realm = True
ldap_search_base = dc=some,dc=domain,dc=org
------------------

Authentication is via krb5 against AD, but it should not matter.
Comment 1 Jakub Hrozek 2011-08-18 12:18:16 EDT
This seems like an issue we fixed upstream the other day -- when non-standard attribute names were used, entries were still saved those non-standard attribute names into our cache, instead of our cache-specific attribute names.

I have built a scratch build for you that includes the fix. Would you mind testing it out? Thank you!

http://koji.fedoraproject.org/koji/taskinfo?taskID=3283243
Comment 2 Bojan Smojver 2011-08-18 19:06:37 EDT
Yes, those packages work. Thanks for the quick response!

PS. It seems slower than 1.5.7 though, but that's a completely different issue, of course, and not the one we're concerned with here.
Comment 3 Jakub Hrozek 2011-08-19 02:47:22 EDT
(In reply to comment #2)
> PS. It seems slower than 1.5.7 though, but that's a completely different issue,
> of course, and not the one we're concerned with here.

How much slower? We implemented some performance improvements in 1.6.0 and more are planned for 1.6.1, but I think it would be nice to get a ballpark figure.

Can you do the following comparison for me? With clear caches:
1) getent passwd there-is-no-such-user # to connect to LDAP
2) time id bsmojver

Ideally at least two times to rule out any anomalies.
Comment 4 Bojan Smojver 2011-08-19 03:22:05 EDT
Two runs with the scratch build (each time sssd was stopped, database files removed, sssd started):
---------------
real	0m30.559s
user	0m0.000s
sys	0m0.023s

real	0m28.191s
user	0m0.001s
sys	0m0.019s
---------------

Two runs with 1.5.7 (same deal, stop, clean, start):
---------------
real	0m3.746s
user	0m0.000s
sys	0m0.026s

real	0m3.876s
user	0m0.000s
sys	0m0.028s
---------------
Comment 5 Jakub Hrozek 2011-08-19 03:40:38 EDT
OK, that seems like a serious slowdown. 

Can you open a separate bugzilla so we don't mix the two issues? It would be very helpful if you could attach log file from the domain (/var/log/sssd/sssd_$yourdomain.log)
Comment 6 Bojan Smojver 2011-08-19 03:46:07 EDT
Sure. I'll kick the debug level up to 9, sanitize the files and attach to a new bug. Just busy with some work stuff right now and over the weekend, so it may be next week.
Comment 7 Stephen Gallagher 2011-08-19 10:43:44 EDT
This response belongs in the new bug, but since I haven't seen one created yet, I'm going to reply here for now.

There were two changes between 1.5.7 and 1.5.12 that could have contributed to this.

1) We added support for the paging control. If on 1.5.7 it was possible that you were trying to retrieve more than 1000 entries at the same time, we were only processing the first thousand results. So if this user is a member of some groups with very large memberships, this could have resulted in a slow-down (since we're now actually processing ALL of them, possibly with multiple round-trips to LDAP for any group with > 1000 members).

2) We also added support for storing non-POSIX groups in the local cache (for handling complex nestings) so this can have had a few possible effects.
 a. There are now more groups being saved to the cache, which takes more time
 b. Because we can now handle POSIX groups that are in the middle of non-POSIX nesting chains, it may be that you will be seeing more groups in the list now.
 c. Given the severity of the performance hit, I also suspect that we may not have been careful enough with our transaction logic, so we may be doing more filesystem writes to the cache than we need to.

These all need to be investigated, and the requested logs will be very helpful in tracking this down.
Comment 8 Bojan Smojver 2011-08-21 00:28:18 EDT
(In reply to comment #7)

> I'm going to reply here for now.

Thanks.

> 1) We added support for the paging control.

The Active Directory I'm doing this against is absolutely massive.

> 2) We also added support for storing non-POSIX groups in the local cache (for
> handling complex nestings) so this can have had a few possible effects.

Yeah, as above, huge and complex tree.

Do you guys mind if I send you the logs privately once I collect them? There will huge amount of data to sanitize, so this way it's easier (and you get the real picture). If you can point me to your GPG public keys, I can have the logs encrypted and sent over.
Comment 9 Fedora Update System 2011-08-29 17:29:25 EDT
sssd-1.5.13-1.fc15.1,evolution-mapi-3.0.2-2.fc15.1,certmonger-0.45-1.fc15.1,openchange-0.9-18.fc15.1,samba4-4.0.0-25.alpha11.fc15.4,libldb-1.0.0-3.fc15,libtevent-0.9.13-1.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/sssd-1.5.13-1.fc15.1,evolution-mapi-3.0.2-2.fc15.1,certmonger-0.45-1.fc15.1,openchange-0.9-18.fc15.1,samba4-4.0.0-25.alpha11.fc15.4,libldb-1.0.0-3.fc15,libtevent-0.9.13-1.fc15
Comment 10 Fedora Update System 2011-08-30 21:40:04 EDT
Package sssd-1.5.13-1.fc15.1, evolution-mapi-3.0.2-2.fc15.1, certmonger-0.45-1.fc15.1, openchange-0.9-18.fc15.1, libldb-1.0.0-3.fc15, libtevent-0.9.13-1.fc15, samba4-4.0.0-25.alpha11.fc15.5:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.5.13-1.fc15.1 evolution-mapi-3.0.2-2.fc15.1 certmonger-0.45-1.fc15.1 openchange-0.9-18.fc15.1 libldb-1.0.0-3.fc15 libtevent-0.9.13-1.fc15 samba4-4.0.0-25.alpha11.fc15.5'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/sssd-1.5.13-1.fc15.1,evolution-mapi-3.0.2-2.fc15.1,certmonger-0.45-1.fc15.1,openchange-0.9-18.fc15.1,samba4-4.0.0-25.alpha11.fc15.5,libldb-1.0.0-3.fc15,libtevent-0.9.13-1.fc15
then log in and leave karma (feedback).
Comment 11 Fedora Update System 2011-09-06 20:27:14 EDT
Package sssd-1.5.13-1.fc15.2, openchange-0.9-18.fc15.2, evolution-mapi-3.0.2-2.fc15.1, certmonger-0.45-1.fc15.1, libldb-1.0.0-3.fc15, libtevent-0.9.13-1.fc15, samba4-4.0.0-25.alpha11.fc15.5:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.5.13-1.fc15.2 openchange-0.9-18.fc15.2 evolution-mapi-3.0.2-2.fc15.1 certmonger-0.45-1.fc15.1 libldb-1.0.0-3.fc15 libtevent-0.9.13-1.fc15 samba4-4.0.0-25.alpha11.fc15.5'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/sssd-1.5.13-1.fc15.2,evolution-mapi-3.0.2-2.fc15.1,certmonger-0.45-1.fc15.1,openchange-0.9-18.fc15.2,samba4-4.0.0-25.alpha11.fc15.5,libldb-1.0.0-3.fc15,libtevent-0.9.13-1.fc15
then log in and leave karma (feedback).
Comment 12 Fedora Update System 2011-09-09 01:30:15 EDT
sssd-1.5.13-1.fc15.2, openchange-0.9-18.fc15.2, certmonger-0.45-1.fc15.1, libldb-1.0.0-3.fc15, libtevent-0.9.13-1.fc15, samba4-4.0.0-25.alpha11.fc15.5, evolution-mapi-3.0.3-2.fc15, evolution-exchange-3.0.3-1.fc15, evolution-3.0.3-1.fc15, evolution-data-server-3.0.3-1.fc15, gtkhtml3-4.0.2-1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.