Bug 743133

Summary: Performance regression with Kerberos authentication against AD
Product: Fedora
Component: sssd
Version: 16
Hardware: i686
OS: Linux
Status: CLOSED ERRATA
Severity: low
Priority: unspecified
Keywords: Reopened
Reporter: Bojan Smojver <bojan>
Assignee: Stephen Gallagher <sgallagh>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: jgibson, jhrozek, sbose, sgallagh, ssorce
Fixed In Version: sssd-1.8.1-7.fc16
Doc Type: Bug Fix
Last Closed: 2012-03-17 23:43:38 UTC

Description Bojan Smojver 2011-10-03 23:35:43 UTC
Description of problem:
I have Kerberos authentication configured against AD. In bug #731666 we discovered that the LDAP provider takes a significant amount of time to return info about me (probably because the AD I'm doing this against is massive).

However, that should not be the problem here, because my info is already cached (I'm already logged on as myself and id <me> returns quickly). So I would expect authentication to proceed rather quickly, since it's just Kerberos in play here. Not so: it takes the better part of a minute to get through, with sssd_be piling up on system calls.

Version-Release number of selected component (if applicable):
sssd-1.5.13-1.fc15.2.i686

How reproducible:
Always.

Steps to Reproduce:
1. ssh localhost from a Gnome session where I'm already logged in.
  
Actual results:
Takes a long time to get in.

Expected results:
Should be quick.

Additional info:
Authentication config is simple:
-----------------
auth_provider = krb5
chpass_provider = krb5
krb5_realm = SOME.DOMAIN.ORG
krb5_server = some.domain.org
krb5_kpasswd = some.domain.org
krb5_store_password_if_offline = true
cache_credentials = True
-----------------

The rest of the config is like in bug #731666.

Comment 1 Jakub Hrozek 2011-10-04 11:48:07 UTC
I think once initgroups is done (which it should be, since id(1) is very fast for you), there shouldn't be a huge delay.

Can you provide logs?

Comment 2 Stephen Gallagher 2011-10-04 12:38:20 UTC
SSSD always performs an online initgroups() request prior to performing authentication (regardless of the state of the cache) in order to guarantee that access-control changes (such as being dropped from a group) are up-to-date.

So even though an 'id' request at the local console is served from the cache, authentication is ALWAYS* done against the server. So if your directory layout is extremely complex and has a lot of nested groups, it can be very slow.

We have some pretty extensive performance enhancements in the 1.6.x series. I'm going to be releasing 1.6.2 upstream soon and then I will ship it in Fedora 16 and Fedora 15. With the addition of the deref/asq searches, I think you'll see your performance increase substantially.


*ALWAYS is not strictly true. We have a five-second timeout to handle cases where the same user's authentication is performed multiple times simultaneously.

Comment 3 Bojan Smojver 2011-10-04 13:03:25 UTC
Thanks for the info. Good to know 1.6.x is coming - happy to test!

Comment 4 Bojan Smojver 2011-10-20 22:21:19 UTC
I just tried 1.5.14 from updates-testing of F-15. Don't see much difference.

With F-16 coming, I guess I'll be switching to 1.6.x, so we'll see if that makes things better.

Comment 5 Stephen Gallagher 2011-10-21 15:00:19 UTC
I built a scratch-build of 1.6.2 for Fedora 15, if you want to try that out before upgrading.

Let me know how it works out for you.

http://koji.fedoraproject.org/koji/taskinfo?taskID=3449727

Comment 6 Bojan Smojver 2011-10-23 21:53:54 UTC
(In reply to comment #5)
> I built a scratch-build of 1.6.2 for Fedora 15, if you want to try that out
> before upgrading.
> 
> Let me know how it works out for you.
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3449727

It takes a similar amount of time to authenticate with 1.6.2 as it does with 1.5.14.

I think I'll need to collect those debug logs and send them over to you privately, so that you can have a look into what is being done by sssd_be.

Comment 7 Stephen Gallagher 2011-10-26 13:40:11 UTC
Bojan, would you mind trying something?

Please set

pam_id_timeout = 60

in the [pam] section of sssd.conf.
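
That is, something like this (a minimal sketch; any other sections of your sssd.conf stay as they are):

-----------------
[pam]
pam_id_timeout = 60
-----------------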

Restart SSSD and then retry your login.


From the logs you sent privately, it looks like what's happening is that (due to the size of your groups) it's taking more than five seconds to perform the initgroups request as part of the login. By increasing the pam_id_timeout, we are telling SSSD that it's okay to trust the group list of the user for sixty seconds (instead of the default 5s).

I suspect you'll see at least a 33% speed increase here.

Comment 8 Bojan Smojver 2011-10-26 22:46:11 UTC
(In reply to comment #7)
 
> I suspect you'll see at least a 33% speed increase here.

Before I forget, an unrelated comment. If I execute (as myself):

id <myuserid>

I only get my primary group.

If I then execute just id, I get the lot. If I reverse the order, both commands give the full list of groups. Just FYI - maybe some kind of fluke, no idea.

Anyhow, on performance, with pam_id_timeout = 60, I had:
--------------------------------
$ time ssh localhost :
real	2m33.507s
user	0m0.013s
sys	0m0.011s
--------------------------------

Without that, I had:
--------------------------------
$ time ssh localhost :
real	2m45.691s
user	0m0.015s
sys	0m0.008s
--------------------------------

So, just a small difference, really. Obviously, this may depend on what AD machines are doing etc. as well.

I also tried with pam_id_timeout = 300. This gave me:
--------------------------------
$ time ssh localhost :
real	2m53.094s
user	0m0.014s
sys	0m0.010s
--------------------------------

So, I think this didn't have the effect we were hoping for.

Comment 9 Jakub Hrozek 2011-10-27 07:30:50 UTC
(In reply to comment #8)
> (In reply to comment #7)
> 
> > I suspect you'll see at least a 33% speed increase here.
> 
> Before I forget, an unrelated comment. If I execute (as myself):
> 
> id <myuserid>
> 
> I only get my primary group.
> 
> If I then execute just id, I get the lot. If I reverse the order here, both
> commands give full list of groups. Just FYI. Maybe some kind of fluke - no
> idea.
> 

This shouldn't be the case; is it reproducible? I couldn't reproduce it myself.

> Anyhow, on performance, with pam_id_timeout = 60, I had:
> --------------------------------
> $ time ssh localhost :
> real 2m33.507s
> user 0m0.013s
> sys 0m0.011s
> --------------------------------
> 
> Without that, I had:
> --------------------------------
> $ time ssh localhost :
> real 2m45.691s
> user 0m0.015s
> sys 0m0.008s
> --------------------------------
> 
> So, just a small difference, really. Obviously, this may depend on what AD
> machines are doing etc. as well.
> 

We debugged this issue further with Stephen yesterday. It seems that SSH always performs two transactions (as I learned today, it's also two different processes). 

As far as SSSD is concerned, that's two different clients and two online initgroups operations.

I'm pretty sure that if you do "su - <userid>" instead of ssh, login will be about 30% faster because su acts as a single client.
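
As a rough way to compare the two paths yourself (just a sketch; substitute your own user for <userid>):

--------------------------------
$ time ssh localhost :
$ time su - <userid> -c :
--------------------------------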

Comment 10 Jakub Hrozek 2011-10-27 11:19:50 UTC
Also, the following two new tickets might be of interest to you:
https://fedorahosted.org/sssd/ticket/1062
https://fedorahosted.org/sssd/ticket/1063

Comment 11 Bojan Smojver 2011-10-27 23:18:39 UTC
(In reply to comment #9)
 
> This shouldn't be the case; is it reproducible? I couldn't reproduce it
> myself.

I think not always (I saw it a couple of times). May have been some kind of fluke.

> We debugged this issue further with Stephen yesterday. It seems that SSH always
> performs two transactions (as I learned today, it's also two different
> processes). 
> 
> As far as SSSD is concerned, that's two different clients and two online
> initgroups operations.
> 
> I'm pretty sure that if you do "su - <userid>" instead of ssh, login will be
> about 30% faster because su acts as one client.

OK, thanks for that info.

Comment 12 Bojan Smojver 2011-10-27 23:18:59 UTC
(In reply to comment #10)
> Also, the following two new tickets might be of interest to you:
> https://fedorahosted.org/sssd/ticket/1062
> https://fedorahosted.org/sssd/ticket/1063

Excellent, thank you.

Comment 13 Stephen Gallagher 2011-11-04 18:01:58 UTC
Bojan, would you please also give SSSD 1.6.3 a spin? We discovered a pretty serious performance glitch in the cache code that could very easily have an impact on you with your large deployment. Please let me know if you see any benefit there.

Comment 14 Bojan Smojver 2011-11-05 00:03:50 UTC
(In reply to comment #13)
> Bojan, would you please also give SSSD 1.6.3 a spin? We discovered a pretty
> serious performance glitch in the cache code that could very easily have an
> impact on you with your large deployment. Please let me know if you see any
> benefit there.

Sure. I see a build for F-16 as well (I upgraded all my machines), so I'll try that. Thanks for keeping me up to date.

Comment 15 Bojan Smojver 2011-11-05 07:01:27 UTC
That "time ssh localhost :" command now takes 35 secs (compared to over 2 mins). Login to my Gnome session over xrdp takes about 20 seconds, which also seems faster, although I never actually measured it before.

So, an improvement for sure. :-)

PS. All this on F-16, of course.

Comment 16 Fedora Update System 2012-01-05 18:12:55 UTC
sssd-1.7.0-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/sssd-1.7.0-1.fc16

Comment 17 Stephen Gallagher 2012-01-05 18:17:15 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/1063

Comment 18 Fedora Update System 2012-01-07 23:11:10 UTC
Package sssd-1.7.0-1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.7.0-1.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0237/sssd-1.7.0-1.fc16
then log in and leave karma (feedback).

Comment 19 Fedora Update System 2012-01-30 21:03:53 UTC
Package sssd-1.7.0-1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.7.0-1.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0237/sssd-1.7.0-1.fc16
then log in and leave karma (feedback).

Comment 20 Fedora Update System 2012-02-05 21:51:08 UTC
Package sssd-1.7.0-5.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.7.0-5.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0237/sssd-1.7.0-5.fc16
then log in and leave karma (feedback).

Comment 21 Fedora Update System 2012-02-22 14:36:30 UTC
sssd-1.8.0-5.fc17.beta3 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/sssd-1.8.0-5.fc17.beta3

Comment 22 Fedora Update System 2012-02-22 17:45:29 UTC
Package sssd-1.8.0-5.fc17.beta3:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-5.fc17.beta3'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2196/sssd-1.8.0-5.fc17.beta3
then log in and leave karma (feedback).

Comment 23 Fedora Update System 2012-02-23 22:28:34 UTC
Package sssd-1.8.0-5.fc17.beta3.1:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-5.fc17.beta3.1'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2196/sssd-1.8.0-5.fc17.beta3.1
then log in and leave karma (feedback).

Comment 24 Fedora Update System 2012-02-24 00:34:17 UTC
sssd-1.8.0-5.fc17.beta3.1 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/FEDORA-2012-2196/sssd-1.8.0-5.fc17.beta3.1

Comment 25 Fedora Update System 2012-02-28 21:09:47 UTC
sssd-1.8.0-6.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/sssd-1.8.0-6.fc16

Comment 26 Fedora Update System 2012-03-01 09:22:18 UTC
Package sssd-1.8.0-6.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-6.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.0-6.fc16
then log in and leave karma (feedback).

Comment 27 Fedora Update System 2012-03-02 01:16:59 UTC
sssd-1.8.0-6.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 28 Bojan Smojver 2012-03-02 04:58:39 UTC
With cleaned database in /var/lib/sss/db, on first request I get (that's on F-16):
----------------------------
Program received signal SIGSEGV, Segmentation fault.
sysdb_attrs_get_el_int (attrs=0x64616f4c, name=0xe5d8d0 "name", alloc=true, 
    el=0xbfcb54b0) at src/db/sysdb.c:309
309	    for (i = 0; i < attrs->num; i++) {
(gdb) t a a bt

Thread 1 (Thread 0xb76f86c0 (LWP 14167)):
#0  sysdb_attrs_get_el_int (attrs=0x64616f4c, name=0xe5d8d0 "name", 
    alloc=true, el=0xbfcb54b0) at src/db/sysdb.c:309
#1  0x00e232a4 in sysdb_attrs_primary_name (sysdb=0x220146e8, 
    attrs=0x64616f4c, ldap_attr=0x22021ea8 "cn", _primary=0xbfcb5544)
    at src/db/sysdb.c:1579
#2  0x00e24322 in sysdb_attrs_primary_name_list (sysdb=0x220146e8, 
    mem_ctx=0x2208d910, attr_list=0x22049798, attr_count=3, 
    ldap_attr=0x22021ea8 "cn", name_list=0xbfcb55d0) at src/db/sysdb.c:1771
#3  0x08e912e6 in rfc2307bis_group_memberships_build (item=0x22098c40, 
    user_data=0x2222f810) at src/providers/ldap/sdap_async_initgroups.c:1851
#4  0x00a24180 in hash_iterate (table=0x22048c08, 
    callback=0x8e91040 <rfc2307bis_group_memberships_build>, 
    user_data=0x2222f810) at dhash/dhash.c:697
#5  0x08e9b17c in save_rfc2307bis_group_memberships (state=0x22049f58)
    at src/providers/ldap/sdap_async_initgroups.c:1773
#6  sdap_initgr_rfc2307bis_done (subreq=0x0)
    at src/providers/ldap/sdap_async_initgroups.c:1647
#7  0x00be00bb in _tevent_req_notify_callback (req=0x22049dc8, 
    location=0x8f33a6c "src/providers/ldap/sdap_async_initgroups.c:2353")
    at ../tevent_req.c:101
#8  0x00be02ab in tevent_req_finish (
    location=0x8f33a6c "src/providers/ldap/sdap_async_initgroups.c:2353", 
    state=TEVENT_REQ_DONE, req=0x22049dc8) at ../tevent_req.c:110
#9  _tevent_req_done (req=0x22049dc8, 
    location=0x8f33a6c "src/providers/ldap/sdap_async_initgroups.c:2353")
    at ../tevent_req.c:116
#10 0x08e922b2 in rfc2307bis_nested_groups_done (subreq=0x0)
    at src/providers/ldap/sdap_async_initgroups.c:2353
#11 0x00be00bb in _tevent_req_notify_callback (req=0x221ffd28, 
    location=0x8f352bc "src/providers/ldap/sdap_async_initgroups.c:2042")
    at ../tevent_req.c:101
#12 0x00be0185 in tevent_req_finish (location=<optimized out>, 
    state=<optimized out>, req=<optimized out>) at ../tevent_req.c:110
#13 tevent_req_trigger (ev=0x2200cb40, im=0x221ffd98, private_data=0x221ffd28)
    at ../tevent_req.c:166
#14 0x00bdf856 in tevent_common_loop_immediate (ev=0x2200cb40)
    at ../tevent_immediate.c:135
#15 0x00be2416 in std_event_loop_once (ev=0x2200cb40, 
    location=0xe63c03 "src/util/server.c:572") at ../tevent_standard.c:554
#16 0x00bdefc8 in _tevent_loop_once (ev=0x2200cb40, 
    location=0xe63c03 "src/util/server.c:572") at ../tevent.c:504
#17 0x00bdf19c in tevent_common_loop_wait (ev=0x2200cb40, 
    location=0xe63c03 "src/util/server.c:572") at ../tevent.c:605
#18 0x00bdf238 in _tevent_loop_wait (ev=0x2200cb40, 
    location=0xe63c03 "src/util/server.c:572") at ../tevent.c:624
#19 0x00e3b22f in server_loop (main_ctx=0x2200d6c8) at src/util/server.c:572
#20 0x00df9123 in main (argc=4, argv=0xbfcb5b34)
    at src/providers/data_provider_be.c:2001
----------------------------

Comment 29 Jakub Hrozek 2012-03-02 09:59:47 UTC
(In reply to comment #28)
> With cleaned database in /var/lib/sss/db, on first request I get (that's on
> F-16):
> ----------------------------
> Program received signal SIGSEGV, Segmentation fault.

Thanks for reporting this. Can you include more details, though? A core file would be very helpful, as would log files and more details about the user and the groups he is a member of (is he a member of any groups? Is there nesting?)

Also, I'd like to track this crash as a separate issue. The original bug was tracking a performance problem, not a crash.

Comment 30 Jakub Hrozek 2012-03-02 10:33:32 UTC
*** Bug 799141 has been marked as a duplicate of this bug. ***

Comment 31 Bojan Smojver 2012-03-02 10:46:40 UTC
(In reply to comment #29)
> (In reply to comment #28)
> > With cleaned database in /var/lib/sss/db, on first request I get (that's on
> > F-16):
> > ----------------------------
> > Program received signal SIGSEGV, Segmentation fault.
> 
> Thanks for reporting this. Can you include more details, though? A core file
> would be very helpful, as would log files and more details about the user and
> the groups he is a member of (is he a member of any groups? Is there
> nesting?)
> 
> Also, I'd like to track this crash as a separate issue. The original bug was
> tracking a performance problem, not a crash.

Stephen has my configs, but if you need more data I can send it privately. There is just too much stuff to sanitize, so I am not going to risk posting it here.

Note that I am the original reporter of this bug, and many of the package updates were in response to the slowness I experienced, so I just wanted to note that with this particular version, like with some others before it, I cannot even complete the testing.

Comment 32 Fedora Update System 2012-03-06 19:27:10 UTC
Package sssd-1.8.0-6.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.8.0-6.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.0-6.fc16
then log in and leave karma (feedback).

Comment 33 Stephen Gallagher 2012-03-06 19:37:01 UTC
Sorry, please disregard the above. This bug was not supposed to be associated with that Bodhi update. It was necessary to re-push that version of the package to updates-testing so that testing could also occur against FreeIPA (which depends on SSSD 1.8.0+).

I have disabled autokarma, so that package won't go to stable until we've resolved this regression.

Comment 34 Jakub Hrozek 2012-03-08 13:19:35 UTC
I was able to reproduce the crash with the help of extra debugging information Bojan sent me in a private mail.

To reproduce, configure multi-level group nesting (at least 3 levels) with several groups on each level. One group per level is not enough, even if there are loops in the nesting structure.

user --- A    -- BA --- BAA
  \          /     \
   \        /       \
     --- B -         -- BAB
            \
             \
              -- BB --- BBA
                   \
                    \
                     -- BBB
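
One way this kind of nesting might be laid out as rfc2307bis-style entries, purely as an illustration (the DNs, gidNumbers and user name below are assumed, not taken from the real directory):

-----------------
# hypothetical entries showing groups nested via the member attribute
dn: cn=A,ou=groups,dc=example,dc=org
objectClass: posixGroup
objectClass: groupOfNames
cn: A
gidNumber: 10001
member: uid=someuser,ou=people,dc=example,dc=org

dn: cn=BA,ou=groups,dc=example,dc=org
objectClass: posixGroup
objectClass: groupOfNames
cn: BA
gidNumber: 10002
member: cn=A,ou=groups,dc=example,dc=org

dn: cn=BAA,ou=groups,dc=example,dc=org
objectClass: posixGroup
objectClass: groupOfNames
cn: BAA
gidNumber: 10003
member: cn=BA,ou=groups,dc=example,dc=org
-----------------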

Comment 35 Fedora Update System 2012-03-12 23:43:14 UTC
sssd-1.8.1-7.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.1-7.fc16

Comment 36 Fedora Update System 2012-03-15 02:32:20 UTC
Package sssd-1.8.1-7.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sssd-1.8.1-7.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-2725/sssd-1.8.1-7.fc16
then log in and leave karma (feedback).

Comment 37 Bojan Smojver 2012-03-16 01:50:56 UTC
I see a marginal speed improvement with the latest build. Authentication used to take about 25 secs, now it takes around 20 secs. Note, this is against a massive AD structure.

No crashes any more.

PS. Note that Windows clients do not take nearly as long to log in. So, still some work to be done, I guess.

Comment 38 Stephen Gallagher 2012-03-16 11:33:48 UTC
Performance improvements are always an ongoing process.

One thing you might try is setting 'ldap_referrals = false'. Active Directory returns a lot of unnecessary* referrals (three per group nesting level), and disabling referral chasing can speed things up dramatically.

* They are not always unnecessary. If your network environment takes advantage of partial replication, disabling referrals might leave you unable to log in as users who are part of the global system but not replicated locally. If you are NOT using partial replication, turning off referrals is MUCH faster.
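
For reference, the option goes into the domain section of sssd.conf, something like this (a sketch; use whatever domain section name you already have):

-----------------
[domain/some.domain.org]
ldap_referrals = false
-----------------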

We're working on ways to improve our referral processing, but that's a very complicated topic and well out of scope for this bug.

Comment 39 Bojan Smojver 2012-03-16 12:58:44 UTC
Thanks for the tip. Will try.

Comment 40 Fedora Update System 2012-03-17 23:43:38 UTC
sssd-1.8.1-7.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 41 Bojan Smojver 2012-03-18 04:50:01 UTC
(In reply to comment #38)

> One thing you might try is setting 'ldap_referrals = false'.

Bingo! 1.6 secs with cache blown away, 0.16 secs with cache. Only about two orders of magnitude faster. :-)

Thank you. Much appreciated!