Red Hat Bugzilla – Bug 645832
sssd crashes AD
Last modified: 2011-01-10 16:30:04 EST
Description of problem:
We have recently upgraded our test environment to Fedora 14 / sssd 1.4.0-2 and encountered massive problems on AD LDAP connections.
A single LDAP query crashes the relevant 2003 controller and does not return a result on 2008r2 (I guess 2008r2 blocks these kinds of queries).
Please attach your /etc/sssd/sssd.conf.
Also helpful would be to set debug_level=9 in your sssd.conf in the [domain/<domainname>] section and attach /var/log/sssd/sssd_<domainname>.log to this bug. (Sanitized, of course).
I'm not aware of anything that SSSD could possibly do to crash Active Directory, but if this is happening it is likely an Active Directory bug that should be reported to Microsoft.
SSSD might be sending bad data - that needs to be determined - but AD crashing because it receives bad data is a serious DoS vulnerability that Microsoft needs to be made aware of.
Of course, it's also worth noting that Active Directory 2003 is EOL (however, AD 2003R2 is still kicking, if that's what you're using).
Created attachment 455139 [details]
sssd log file while trying to access 2003dc
Of course I was talking about 2003r2
Note that sssd is not able to crash a 2008r2 server. These seem to drop the bad data.
Created attachment 455140 [details]
log file containing information about unresolvable groups
The logs give the answer:
"LdapErr: DSID-0C090627, comment: In order to perform this operation a successful bind must be completed on the connection."
It would seem to suggest that SSSD tried to talk to AD and was refused because the conversation hadn't established a bind DN.
Since the GSSAPI SASL bind worked fine, my guess is that it was following an internal referral and due to https://fedorahosted.org/sssd/ticket/495 the referral lookup was done anonymously.
Can you try the following command:
ldapsearch -H -R ldap://2003dc.domain -b dc=d,dc=ethz,dc=ch \
If I'm right, this will return a referral object. If that's the case, then it's definitely https://fedorahosted.org/sssd/ticket/495 causing this behaviour.
Also, please report this issue to Microsoft, as any crash bug like this is serious.
I forgot to mention. If the ldapsearch command above does prove that there is a referral causing this problem, referral-following can be disabled by setting
ldap_referrals = False
in the [domain/DOMAIN] section of sssd.conf.
This should work around the problem, but if you depend on referrals they will no longer work either.
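As a rough illustration of the workaround above, the following Python sketch sets an option inside the [domain/...] section of an sssd.conf-style file using the standard configparser module. The sample file contents, the EXAMPLE domain name, and the set_domain_option helper are assumptions made up for this example; only the ldap_referrals = False setting comes from this bug.

```python
import configparser
import io


def set_domain_option(conf_text: str, domain: str, key: str, value: str) -> str:
    """Return conf_text with `key = value` set in the [domain/<domain>] section."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    section = f"domain/{domain}"
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical minimal sssd.conf; EXAMPLE is a placeholder domain name.
SAMPLE = """\
[sssd]
domains = EXAMPLE

[domain/EXAMPLE]
id_provider = ldap
"""

updated = set_domain_option(SAMPLE, "EXAMPLE", "ldap_referrals", "False")
print(updated)
```

Note that configparser drops comments when it rewrites a file, so for a real sssd.conf editing the file by hand is probably the safer route. SSSD has to be restarted for the change to take effect.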
Long-term, we need to fix our bug (currently scheduled for inclusion in SSSD 1.5.0, targeted at early January), but Microsoft needs to fix their crash when they get a request outside of a bind.
Just another little side note. This problem does not exist on 1.3 with exactly the same setup.
With referrals disabled, connection is established correctly but server still crashes.
On client side this is just noticed through a ping timeout.
(In reply to comment #8)
> With referrals disabled, connection is established correctly but server still
> crashes.
> On client side this is just noticed through a ping timeout.
The crashing of AD is very clearly an issue on Microsoft's side, not SSSD. My suspicion at this point is that you have a bug in your referral setup that AD is crashing on. SSSD is merely the trigger. We haven't changed anything in the referral code in SSSD 1.4.0. However, openldap has been updated in Fedora 14 to 2.4.22, which includes among other things a conversion to Mozilla-NSS for crypto support.
Can you confirm for me that your comment #7 was referring to SSSD 1.3 on Fedora 14 with openldap 2.4.22?
Also, have you verified with logs that the crash is actually caused by SSSD? Please check your Active Directory logs to be certain that they are happening at the same time, and only then.
And please report the output of the ldapsearch command I gave you above (and determine whether that action causes the crash as well). Please try it both with and without the (-R) command (which tells ldapsearch whether to try following referrals or not).
We have verified that the crash is caused by sssd, because the last connection before the crash was made with the keytab used by sssd.
But maybe we should take a closer look at the 2008 DC behavior (which is nearly the same, except that the server does not crash). If LDAP referrals are enabled, the LDAP connection is lost after a few queries (maybe when it reaches the first referral).
ldapsearch -H is for the URI and -R is for a given realm, so -H -R URI won't work.
Sorry about the false information about ldapsearch. In older versions of ldapsearch, -R blocked the automatic traversal of referrals. It does not appear that a similar argument exists now.
Can you please at least try to verify whether the Active Directory server crashes when running the command:
ldapsearch -Y gssapi -H ldap://2003dc.domain -b dc=d,dc=ethz,dc=ch \
(You will need to be kinited before doing this)
ldapsearch works fine
Would you mind including a log with ldap_referrals = False? I want to see if the log looks different.
We have different setups; in one of them we are using a round-robin DNS name called 2008dc.domain (or 2003dc.domain).
For testing purposes I have now created a config which points directly to one 2008 DC.
In both setups, ldap_referrals needs to be disabled with sssd 1.4 and openldap 2.4.22.
I am planning to use service discovery in the future, but this is not possible at the moment as the setup crashes our 2003 dcs DNS (which will be obsoleted soon).
Which log should I provide?
Please include the log for SSSD that is crashing the AD 2003 server while it has ldap_referrals = False.
I have to discuss this with our AD admins first, because they do not like this kind of interruption on their live environment :/
Hopefully I will get back to you soon.
(In reply to comment #14)
> I am planning to use service discovery in the future, but this is not possible
> at the moment as the setup crashes our 2003 dcs DNS (which will be obsoleted
> soon).
Sorry, this is not related to this bug but it caught my attention. Are you saying that any of the DNS SRV queries we are sending are causing problems with the DNS servers you are using?
Created attachment 455954 [details]
sssd.log while ldap_referrals are disabled
note the unexpected result from ldap: size limit exceeded.
It is not the DNS queries that lead to the 2003 DC crashes; rather, the LDAP queries lead to a crash of the DNS services on the 2003 DCs.
(In reply to comment #18)
> Created attachment 455954 [details]
> sssd.log while ldap_referrals are disabled
> note the unexpected result from ldap: size limit exceeded.
This is a server-side configuration issue. In this situation, it's really not a good idea to be running with enumerate=True. We don't currently have any mechanism for running lookups in size-based chunks. It looks like your server is configured to allow only 1000 entries to be returned in a single request.
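To make the size-limit problem concrete, here is a small Python simulation of the difference between one unpaged request and a chunked (paged) lookup. It is only a sketch: ToyDirectory, SizeLimitExceeded and enumerate_paged are invented names, the "cookie" is a plain integer offset rather than the opaque cookie a real LDAP server returns, and nothing here reflects how SSSD implements or will implement paging.

```python
from typing import List, Optional, Tuple

SIZE_LIMIT = 1000  # per-request limit, as the log in this bug suggests


class SizeLimitExceeded(Exception):
    """Stands in for the LDAP 'size limit exceeded' result."""


class ToyDirectory:
    """In-memory stand-in for a directory server; not a real LDAP API."""

    def __init__(self, entries: List[str]) -> None:
        self._entries = entries

    def search(self) -> List[str]:
        # Unpaged search: fails once the result set exceeds the limit.
        if len(self._entries) > SIZE_LIMIT:
            raise SizeLimitExceeded(f"more than {SIZE_LIMIT} entries")
        return list(self._entries)

    def search_page(self, page_size: int, cookie: int = 0) -> Tuple[List[str], Optional[int]]:
        # Paged search: return one chunk plus a cookie for the next chunk,
        # or None as the cookie when the result set is exhausted.
        page_size = min(page_size, SIZE_LIMIT)
        chunk = self._entries[cookie:cookie + page_size]
        next_cookie = cookie + len(chunk)
        return chunk, (next_cookie if next_cookie < len(self._entries) else None)


def enumerate_paged(directory: ToyDirectory, page_size: int = 500) -> List[str]:
    """Collect every entry by requesting one page at a time."""
    results: List[str] = []
    cookie: Optional[int] = 0
    while cookie is not None:
        chunk, cookie = directory.search_page(page_size, cookie)
        results.extend(chunk)
    return results


directory = ToyDirectory([f"user{i}" for i in range(2500)])
try:
    directory.search()
    unpaged_ok = True
except SizeLimitExceeded:
    unpaged_ok = False

all_users = enumerate_paged(directory)
print(unpaged_ok, len(all_users))  # prints "False 2500"
```

The unpaged request fails as soon as the directory holds more entries than the limit, while the paged loop stays under the limit on every request and still collects everything.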
With that limitation on your server, I'd recommend strongly against running with enumerate=True enabled. Is there some reason that you absolutely have to have getpwent() running, or is looking up individual entries sufficient?
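For readers unsure about the distinction, the two access patterns look roughly like this from an application's point of view. This is a sketch using Python's standard pwd module on a Unix system; lookup_user and count_enumerated_users are hypothetical helper names, and the user name is a placeholder that is assumed not to exist.

```python
import pwd
from typing import Optional


def lookup_user(name: str) -> Optional[pwd.struct_passwd]:
    """Single-entry lookup (getpwnam); this does not need enumeration."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        return None


def count_enumerated_users() -> int:
    """Full enumeration; pwd.getpwall() walks the whole passwd database."""
    return len(pwd.getpwall())


# Placeholder name, assumed not to exist on the machine running this.
print(lookup_user("no-such-user-hopefully"))
print(count_enumerated_users())
```

With enumeration turned off, only the second pattern should be affected: directory users would no longer show up in the full listing, while lookups by name or UID would still be answered on demand.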
Okay, that's what I thought (but this message never occurred on 1.3 even with enumeration activated).
Now this is what I've done:
- I have set up exactly the same configuration (with referrals enabled) on F13 with sssd 1.3: everything works fine.
- backported openldap 2.4.22 to this setup: still everything works fine.
So in summary these are the things that must have changed in F14, sssd 1.4:
- ldap_referrals no longer work, or they were disabled by default in versions < 1.4 (so I might not have noticed it)
- lookup limitations seem to have been ignored somehow before 1.4 and are now detected correctly (which is of course the right behavior and does not bother me that much as I do not need enumeration)
- groups are not escaped correctly (which is quite annoying as sudo does not work that way)
Ok, I think I've sorted out what each of the bugs here really are.
(In reply to comment #21)
> ldap_referrals do no longer work or have been disabled by default in versions <
> 1.4 (so I might not noticed it)
Starting with 1.2.2, we changed the way we performed group lookups and initgroups calls. There was a bug in this implementation that meant that users and group relationships were not correct in many situations. We resolved this in SSSD 1.4.0. One of the big changes here was that we now have nested group memberships working correctly in RFC2307bis.
I believe that what happened here is that the entry or entries in your AD server that were responding with a referral exist in the nested hierarchy in a place that wasn't being reached in 1.3.0. As a result, when we fixed the nesting problem, this new problem stopped being hidden.
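The effect described above can be sketched with a toy nested-group resolver in Python. The DNs, the GROUP_MEMBERS table and the resolve_users function are made up for illustration and are not SSSD code; the point is only that walking the full nesting touches entries that a shallow lookup never reads.

```python
from typing import Dict, List, Set

# Hypothetical data: each group DN maps to the DNs in its "member" attribute.
GROUP_MEMBERS: Dict[str, List[str]] = {
    "cn=admins,dc=example,dc=com": [
        "cn=alice,dc=example,dc=com",
        "cn=ops,dc=example,dc=com",
    ],
    "cn=ops,dc=example,dc=com": [
        "cn=bob,dc=example,dc=com",
        "cn=oncall,dc=example,dc=com",
    ],
    "cn=oncall,dc=example,dc=com": [
        "cn=carol,dc=example,dc=com",
        "cn=admins,dc=example,dc=com",  # membership loop, must not hang
    ],
}


def resolve_users(group_dn: str, groups: Dict[str, List[str]]) -> Set[str]:
    """Collect all user DNs reachable from group_dn through nested groups."""
    users: Set[str] = set()
    seen: Set[str] = set()
    stack = [group_dn]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already expanded; this also breaks membership loops
        seen.add(current)
        for member in groups.get(current, []):
            if member in groups:
                stack.append(member)  # nested group: expand it later
            else:
                users.add(member)  # plain user entry
    return users


top = "cn=admins,dc=example,dc=com"
direct_only = {m for m in GROUP_MEMBERS[top] if m not in GROUP_MEMBERS}
print(sorted(direct_only))
print(sorted(resolve_users(top, GROUP_MEMBERS)))
```

A shallow lookup of the admins group sees only alice, whereas the nested walk also reads the ops and oncall entries. If one of those deeper entries answered with a referral, only the nested walk would ever run into it.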
A patch for the rebind issue on referrals has been committed upstream and will be included in the next release. Hopefully this will resolve this part of your problems.
> lookup limitations seems to be ignored somehow before 1.4 and now detected
> correctly (which is of course the right behavior and does not bother me that
> much as I do not need enumeration)
https://fedorahosted.org/sssd/ticket/658 has been opened to enable paging support.
> groups are not escaped correctly (which is quite annoying as sudo does not work
> that way)
I am actively working on fixing https://fedorahosted.org/sssd/ticket/639 right now to solve this issue.
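As background, and assuming the escaping problem concerns names that contain characters with a special meaning in LDAP search filters, the escaping rule from RFC 4515 can be sketched in Python as follows. The escape_filter_value helper, the group name and the objectClass used in the filter are illustrative assumptions; this is not the SSSD patch.

```python
def escape_filter_value(value: str) -> str:
    """Escape a string for use as an assertion value in an LDAP search filter.

    RFC 4515 requires backslash, '*', '(', ')' and NUL to be written as a
    backslash followed by the two-digit hex code of the character.
    """
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    # Characters are replaced one by one, so an inserted backslash is never
    # escaped a second time.
    return "".join(replacements.get(ch, ch) for ch in value)


group_name = "Staff (IT) *temp*"  # hypothetical group name
search_filter = f"(&(objectClass=group)(cn={escape_filter_value(group_name)}))"
print(search_filter)
# prints "(&(objectClass=group)(cn=Staff \28IT\29 \2atemp\2a))"
```

Without the escaping, the parentheses in the name would end the filter component early and the asterisks would act as wildcards, so the search would either be rejected or match the wrong entries.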
Currently, most of these fixes are targeted at the upstream release 1.5.0, but I will probably backport some of these patches into 1.4.x in the meantime.
sssd-1.5.0-1.fc14 has been submitted as an update for Fedora 14.
sssd-1.5.0-1.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update sssd'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/sssd-1.5.0-1.fc14
sssd-1.5.0-1.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.