Bug 743133
Summary: | Performance regression with Kerberos authentication against AD | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Bojan Smojver <bojan> |
Component: | sssd | Assignee: | Stephen Gallagher <sgallagh> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | low | Docs Contact: | |
Priority: | unspecified | ||
Version: | 16 | CC: | jgibson, jhrozek, sbose, sgallagh, ssorce |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | sssd-1.8.1-7.fc16 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2012-03-17 23:43:38 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Bojan Smojver
2011-10-03 23:35:43 UTC
I think once initgroups is done (which should be since id(1) is very fast for you), there shouldn't be a huge delay. Can you provide logs? SSSD always performs an online initgroups() request prior to performing authentication (regardless of the state of the cache) in order to guarantee that access-control changes (such as being dropped from a group) are up-to-date. So regardless that an 'id' request at the local console comes from the cache, authentication is ALWAYS* done from the server. So if your server layout is extremely complex and has a lot of nestings, it might be very slow. We have some pretty extensive performance enhancements in the 1.6.x series. I'm going to be releasing 1.6.2 upstream soon and then I will ship it in Fedora 16 and Fedora 15. With the addition of the deref/asq searches, I think you'll see your performance increase substantially. *ALWAYS is not strictly true. We have a five second timeout to handle cases where the same user auth is being performed multiple times simultaneously. Thanks for the info. Good to know 1.6.x is coming - happy to test! I just tried 1.5.14 from updates-testing of F-15. Don't see much difference. With F-16 coming, I guess I'll be switching to 1.6.x, so we'll see if that makes things better. I built a scratch-build of 1.6.2 for Fedora 15, if you want to try that out before upgrading. Let me know how it works out for you. http://koji.fedoraproject.org/koji/taskinfo?taskID=3449727 (In reply to comment #5) > I built a scratch-build of 1.6.2 for Fedora 15, if you want to try that out > before upgrading. > > Let me know how it works out for you. > > http://koji.fedoraproject.org/koji/taskinfo?taskID=3449727 It takes a similar amount of time to authenticate with 1.6.2 as it does with 1.5.14. I think I'll need to collect those debug logs and send them over to you privately, so that you can have a look into what is being down by sssd_be. Bojan, would you mind trying something? Please set pam_id_timeout = 60 in the [pam] section of sssd.conf. Restart SSSD and then retry your login. From the logs you sent privately, it looks like what's happening is that (due to the size of your groups) it's taking more than five seconds to perform the initgroups request as part of the login. By increasing the pam_id_timeout, we are telling SSSD that it's okay to trust the group list of the user for sixty seconds (instead of the default 5s). I suspect you'll see at least a 33% speed increase here. (In reply to comment #7) > I suspect you'll see at least a 33% speed increase here. Before I forget, an unrelated comment. If I execute (as myself): id <myuserid> I only get my primary group. If I then execute just id, I get the lot. If I reverse the order here, both commands give full list of groups. Just FYI. Maybe some kind of fluke - no idea. Anyhow, on performance, with pam_id_timeout = 60, I had: -------------------------------- $ time ssh localhost : real 2m33.507s user 0m0.013s sys 0m0.011s -------------------------------- Without that, I had: -------------------------------- $ time ssh localhost : real 2m45.691s user 0m0.015s sys 0m0.008s -------------------------------- So, just a small difference, really. Obviously, this may depend on what AD machines are doing etc. as well. I also tried with pam_id_timeout = 300. This gave me: -------------------------------- $ time ssh localhost : real 2m53.094s user 0m0.014s sys 0m0.010s -------------------------------- So, I think this didn't have the effect we were hoping for. (In reply to comment #8) > (In reply to comment #7) > > > I suspect you'll see at least a 33% speed increase here. > > Before I forget, an unrelated comment. If I execute (as myself): > > id <myuserid> > > I only get my primary group. > > If I then execute just id, I get the lot. If I reverse the order here, both > commands give full list of groups. Just FYI. Maybe some kind of fluke - no > idea. > This shouldn't be the case, is it reproducable? I couldn't reproduce this myself. > Anyhow, on performance, with pam_id_timeout = 60, I had: > -------------------------------- > $ time ssh localhost : > real 2m33.507s > user 0m0.013s > sys 0m0.011s > -------------------------------- > > Without that, I had: > -------------------------------- > $ time ssh localhost : > real 2m45.691s > user 0m0.015s > sys 0m0.008s > -------------------------------- > > So, just a small difference, really. Obviously, this may depend on what AD > machines are doing etc. as well. > We debugged this issue further with Stephen yesterday. It seems that SSH always performs two transactions (as I learned today, it's also two different processes). As far as SSSD is concerned, that's two different clients and two online initgroups operations. I'm pretty sure that if you do "su - <userid>" instead of ssh, login will be about 30% faster because su acts as one client. Also, the following two new tickets might be of interest to you: https://fedorahosted.org/sssd/ticket/1062 https://fedorahosted.org/sssd/ticket/1063 (In reply to comment #9) > This shouldn't be the case, is it reproducable? I couldn't reproduce this > myself. I think not always (I saw it a couple of times). May have been some kind of fluke. > We debugged this issue further with Stephen yesterday. It seems that SSH always > performs two transactions (as I learned today, it's also two different > processes). > > As far as SSSD is concerned, that's two different clients and two online > initgroups operations. > > I'm pretty sure that if you do "su - <userid>" instead of ssh, login will be > about 30% faster because su acts as one client. OK, thanks for that info. (In reply to comment #10) > Also, the following two new tickets might be of interest to you: > https://fedorahosted.org/sssd/ticket/1062 > https://fedorahosted.org/sssd/ticket/1063 Excellent, thank you. Bojan, would you please also give SSSD 1.6.3 a spin? We discovered a pretty serious performance glitch in the cache code that could very easily have an impact on you with your large deployment. Please let me know if you see any benefit there. (In reply to comment #13) > Bojan, would you please also give SSSD 1.6.3 a spin? We discovered a pretty > serious performance glitch in the cache code that could very easily have an > impact on you with your large deployment. Please let me know if you see any > benefit there. Sure. I see a build for F-16 as well (I upgraded all my machines), so I'll try that. Thanks for keeping me up to date. That "time ssh localhost :" command now takes 35 secs (compared to over 2 mins). Login to my Gnome session over xrdp takes about 20 seconds, which also seems faster, although I never actually measured it before. So, an improvement for sure. :-) PS. All this on F-16, of course. sssd-1.7.0-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/sssd-1.7.0-1.fc16 Upstream ticket: https://fedorahosted.org/sssd/ticket/1063 Package sssd-1.7.0-1.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.7.0-1.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0237/sssd-1.7.0-1.fc16 then log in and leave karma (feedback). Package sssd-1.7.0-1.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.7.0-1.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0237/sssd-1.7.0-1.fc16 then log in and leave karma (feedback). Package sssd-1.7.0-5.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.7.0-5.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0237/sssd-1.7.0-5.fc16 then log in and leave karma (feedback). sssd-1.8.0-5.fc17.beta3 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/sssd-1.8.0-5.fc17.beta3 Package sssd-1.8.0-5.fc17.beta3: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-5.fc17.beta3' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-2196/sssd-1.8.0-5.fc17.beta3 then log in and leave karma (feedback). Package sssd-1.8.0-5.fc17.beta3.1: * should fix your issue, * was pushed to the Fedora 17 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-5.fc17.beta3.1' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-2196/sssd-1.8.0-5.fc17.beta3.1 then log in and leave karma (feedback). sssd-1.8.0-5.fc17.beta3.1 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/FEDORA-2012-2196/sssd-1.8.0-5.fc17.beta3.1 sssd-1.8.0-6.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/sssd-1.8.0-6.fc16 Package sssd-1.8.0-6.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-6.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.0-6.fc16 then log in and leave karma (feedback). sssd-1.8.0-6.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report. With cleaned database in /var/lib/sss/db, on first request I get (that's on F-16): ---------------------------- Program received signal SIGSEGV, Segmentation fault. sysdb_attrs_get_el_int (attrs=0x64616f4c, name=0xe5d8d0 "name", alloc=true, el=0xbfcb54b0) at src/db/sysdb.c:309 309 for (i = 0; i < attrs->num; i++) { (gdb) t a a bt Thread 1 (Thread 0xb76f86c0 (LWP 14167)): #0 sysdb_attrs_get_el_int (attrs=0x64616f4c, name=0xe5d8d0 "name", alloc=true, el=0xbfcb54b0) at src/db/sysdb.c:309 #1 0x00e232a4 in sysdb_attrs_primary_name (sysdb=0x220146e8, attrs=0x64616f4c, ldap_attr=0x22021ea8 "cn", _primary=0xbfcb5544) at src/db/sysdb.c:1579 #2 0x00e24322 in sysdb_attrs_primary_name_list (sysdb=0x220146e8, mem_ctx=0x2208d910, attr_list=0x22049798, attr_count=3, ldap_attr=0x22021ea8 "cn", name_list=0xbfcb55d0) at src/db/sysdb.c:1771 #3 0x08e912e6 in rfc2307bis_group_memberships_build (item=0x22098c40, user_data=0x2222f810) at src/providers/ldap/sdap_async_initgroups.c:1851 #4 0x00a24180 in hash_iterate (table=0x22048c08, callback=0x8e91040 <rfc2307bis_group_memberships_build>, user_data=0x2222f810) at dhash/dhash.c:697 #5 0x08e9b17c in save_rfc2307bis_group_memberships (state=0x22049f58) at src/providers/ldap/sdap_async_initgroups.c:1773 #6 sdap_initgr_rfc2307bis_done (subreq=0x0) at src/providers/ldap/sdap_async_initgroups.c:1647 #7 0x00be00bb in _tevent_req_notify_callback (req=0x22049dc8, location=0x8f33a6c "src/providers/ldap/sdap_async_initgroups.c:2353") at ../tevent_req.c:101 #8 0x00be02ab in tevent_req_finish ( location=0x8f33a6c "src/providers/ldap/sdap_async_initgroups.c:2353", state=TEVENT_REQ_DONE, req=0x22049dc8) at ../tevent_req.c:110 #9 _tevent_req_done (req=0x22049dc8, location=0x8f33a6c "src/providers/ldap/sdap_async_initgroups.c:2353") at ../tevent_req.c:116 #10 0x08e922b2 in rfc2307bis_nested_groups_done (subreq=0x0) at src/providers/ldap/sdap_async_initgroups.c:2353 #11 0x00be00bb in _tevent_req_notify_callback (req=0x221ffd28, location=0x8f352bc "src/providers/ldap/sdap_async_initgroups.c:2042") at ../tevent_req.c:101 #12 0x00be0185 in tevent_req_finish (location=<optimized out>, state=<optimized out>, req=<optimized out>) at ../tevent_req.c:110 #13 tevent_req_trigger (ev=0x2200cb40, im=0x221ffd98, private_data=0x221ffd28) at ../tevent_req.c:166 #14 0x00bdf856 in tevent_common_loop_immediate (ev=0x2200cb40) at ../tevent_immediate.c:135 #15 0x00be2416 in std_event_loop_once (ev=0x2200cb40, location=0xe63c03 "src/util/server.c:572") at ../tevent_standard.c:554 #16 0x00bdefc8 in _tevent_loop_once (ev=0x2200cb40, location=0xe63c03 "src/util/server.c:572") at ../tevent.c:504 #17 0x00bdf19c in tevent_common_loop_wait (ev=0x2200cb40, location=0xe63c03 "src/util/server.c:572") at ../tevent.c:605 #18 0x00bdf238 in _tevent_loop_wait (ev=0x2200cb40, location=0xe63c03 "src/util/server.c:572") at ../tevent.c:624 #19 0x00e3b22f in server_loop (main_ctx=0x2200d6c8) at src/util/server.c:572 #20 0x00df9123 in main (argc=4, argv=0xbfcb5b34) at src/providers/data_provider_be.c:2001 ---------------------------- (In reply to comment #28) > With cleaned database in /var/lib/sss/db, on first request I get (that's on > F-16): > ---------------------------- > Program received signal SIGSEGV, Segmentation fault. Thank for for reporting this. Can you include more details, however? A core file would be very helpful, so would be log files and more details about the user and the groups he is a member of (is he a member of any groups? Are there nestings?) Also, I'd like to track this crash as a separate issue. The original bug was tracking a performance problem, not a crash. *** Bug 799141 has been marked as a duplicate of this bug. *** (In reply to comment #29) > (In reply to comment #28) > > With cleaned database in /var/lib/sss/db, on first request I get (that's on > > F-16): > > ---------------------------- > > Program received signal SIGSEGV, Segmentation fault. > > Thank for for reporting this. Can you include more details, however? A core > file would be very helpful, so would be log files and more details about the > user and the groups he is a member of (is he a member of any groups? Are there > nestings?) > > Also, I'd like to track this crash as a separate issue. The original bug was > tracking a performance problem, not a crash. Stephen has my configs, but if you need more data I can send this privately. There is just too much stuff to sanitize, so I am not going to risk posting here. Note that I am the original reporter of the bug and many of the packages were in response to slowness I experienced, so just wanted to note here that with this particular version, like with some others before it, I cannot even complete the testing. Package sssd-1.8.0-6.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-6.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.0-6.fc16 then log in and leave karma (feedback). Sorry, please disregard the above. This bug was not supposed to be associated with that bodhi update. It was necessary to re-push that version of the package to updates-testing so that testing could also occur against FreeIPA (which depends on SSSD 1.8.0+) I have disabled autokarma, so that package won't go to stable until we've resolved this regression. I was able to reproduce the crash with help of extra debugging information Bojan sent me in a private mail. To reproduce, configure a multi-level nesting (at least 3 levels) with several groups on each level. One group on each level is not enough, even if there are loops in the nesting structure. user --- A -- BA --- BAA \ / \ \ / \ --- B - -- BAB \ \ -- BB --- BBA \ \ -- BBB sssd-1.8.1-7.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.1-7.fc16 Package sssd-1.8.1-7.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing sssd-1.8.1-7.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.1-7.fc16 then log in and leave karma (feedback). I see a marginal speed improvement with the latest build. Authentication used to take about 25 secs, now it takes around 20 secs. Note, this is against a massive AD structure. No crashes any more. PS. Note that Windows clients do not take nearly as long to log in. So, still some work to be done, I guess. Performance improvements are always an ongoing process. One thing you might try is setting 'ldap_referrals = false'. Active Directory performs a lot of unnecessary* referrals (three per group nesting level) and disabling this feature can speed things up dramatically. * They are not always unnecessary. If your network environment takes advantage of partial replication, disabling referrals might lead to being unable to log in with users who are part of the global system but not replicated locally. If you are NOT using partial replication, turning off the referrals is MUCH faster. We're working on ways to improve our referral processing, but that's a very complicated topic and well out of scope for this bug. Thanks for the tip. Will try. sssd-1.8.1-7.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report. (In reply to comment #38) > One thing you might try is setting 'ldap_referrals = false'. Bingo! 1.6 secs with cache blown away, 0.16 secs with cache. Only about two orders of magnitude faster. :-) Thank you. Much appreciated! |