Bug 981477 - [RFE] Negative Caching for GSSAPI
Summary: [RFE] Negative Caching for GSSAPI
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: krb5
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Robbie Harwood
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-07-04 21:37 UTC by David Woodhouse
Modified: 2020-09-10 17:33 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-10 17:33:50 UTC
Type: Bug
Embargoed:




Links
Mozilla Foundation bug 890908 (Status: RESOLVED): Move Negotiate auth off main thread. Last updated 2020-09-10 17:29:48 UTC

Description David Woodhouse 2013-07-04 21:37:50 UTC
While Kerberos authentication is being attempted, Firefox stops even redrawing itself, and I can't switch to any other tab or interact with it in any way, not even to cancel the connection. Is it calling the GSSAPI functions from its main thread?

This can take a *very* long time, in a large Active Directory forest with a large number of servers.

The problem is exacerbated by the fact that it seems to try, and fail, over and over and over again to obtain a ticket for the same server when loading lots of images, etc.

So the result is that I can sniff the traffic on the VPN and see the *same* DNS SRV request for _kerberos._udp.$domain, and then the A and AAAA lookups for each of the *dozens* of servers listed in that SRV record. Over and over and over again, before firefox starts talking to me again.

This makes Firefox with Kerberos pretty much unusable unless we also use the samba-winbind-krb5-locator and a local caching nameserver to try to speed things up, and even then it's barely tolerable.

Comment 1 David Woodhouse 2013-07-08 13:07:31 UTC
We have an internal host which has broken reverse DNS, so Kerberos fails thus:

$ time kvno -S http servicedesk.intel.com
kvno: Server not found in Kerberos database while getting credentials for http/asktech.cps.intel.com@

This failure takes up to five seconds, although it's usually only about a third of a second.

I ran firefox with a breakpoint on gss_init_sec_context(), and pointed it at this web site. It loads, eventually, but only by invoking /usr/bin/ntlm_auth to authenticate with NTLM.

On the way *there*, however, it invokes gss_init_sec_context(), for the *same* SPN, 140 times in quick succession. Where the word "quick" is probably used ill-advisedly.

It even continues to call gss_init_sec_context() *after* it's already fallen back to NTLM and is talking to /usr/bin/ntlm_auth.


Perhaps if it were to only try once, it wouldn't matter so much that it's doing all this from its main thread and is failing even to redraw itself in the meantime.

Comment 2 David Woodhouse 2013-07-08 15:24:51 UTC
Some of this is definitely a firefox bug (doing blocking network stuff from main thread). However, the majority of the problem is probably best mitigated in the GSSAPI library with some kind of negative caching. Assigning to krb5 accordingly.

(I also have a live-http-headers trace here where in about 24 of 120 cases, gss_init_sec_context() returns *success* and firefox does send a large Authorization: Negotiate YIINrgYGKwYBBQUCoIINojCCDZ6gCjAIBg... header to the server. I have no idea what's going on there; what would it be sending?)

Comment 3 Nalin Dahyabhai 2013-07-08 15:33:00 UTC
(In reply to David Woodhouse from comment #2)
> (I also have a live-http-headers trace here where in about 24 of 120 cases,
> gss_init_sec_context() returns *success* and firefox does send a large
> Authorization: Negotiate YIINrgYGKwYBBQUCoIINojCCDZ6gCjAIBg... header to
> the server. I have no idea what's going on there; what would it be sending?)

The browser's apparently sending an initiator token to the server - do you have a complete one we can decode and examine?

Comment 4 Aleksandar Kostadinov 2013-09-06 06:32:35 UTC
FYI, with recent FF versions it no longer hangs completely for me (FF22 at least was fine; I'm not sure about FF18+). Kerberos auth is much slower than basic auth, but that only affects the particular tab. I'm not sure why some users report that it still hangs the whole browser; it might also depend on some configuration option.

Comment 5 Aleksandar Kostadinov 2014-09-12 14:37:32 UTC
btw, I think that slowdown might be related to generic FF auth code. When a site requires basic auth, I can't use other tabs until I enter credentials, which is sub-optimal IMO. Maybe there's some global block that's triggered while Kerberos auth is taking place.

Comment 6 Fedora Admin XMLRPC Client 2014-10-06 16:37:47 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 7 Fedora End Of Life 2015-01-09 18:40:32 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Fedora End Of Life 2015-02-17 15:51:36 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 9 Fedora Admin XMLRPC Client 2015-09-01 21:35:37 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 10 Robbie Harwood 2015-09-09 22:27:01 UTC
This really sounds to me like firefox is calling GSSAPI incorrectly.  One does not simply call gss_init_sec_context() one hundred times and change.  Unless you have a good argument why this is a krb5 bug, it sounds to me like firefox is doing their auth in a blocking fashion (which they shouldn't) and this should be reassigned there.

Comment 11 David Woodhouse 2015-09-10 09:28:27 UTC
Yes, Firefox sucks. See comment #2 for the reason this was assigned to krb5:

(In reply to David Woodhouse from comment #2)
> Some of this is definitely a firefox bug (doing blocking network stuff from
> main thread). However, the majority of the problem is probably best
> mitigated in the GSSAPI library with some kind of negative caching.
> Assigning to krb5 accordingly.

See also the discussion at
http://krbdev.mit.narkive.com/wNu53vy3/negative-caching-of-unknown-principals

Comment 12 Martin Stransky 2015-09-10 09:28:43 UTC
Please be more specific: what do you mean by incorrect GSSAPI calling? Where exactly is the problem? I'm not a Kerberos expert, so we need good instructions on how to fix it.

Comment 13 David Woodhouse 2015-09-10 09:59:57 UTC
The other reason is that Firefox is basically unmaintained. Note the upstream bug, untouched for over two years. Chrome's auth code is almost as bad; there have been patches outstanding to fix its lack of NTLM single-sign-on for even *more* years, with little sign of progress.

So fixing this with a negative cache in krb5 is *extremely* attractive.

Comment 14 David Woodhouse 2015-09-10 10:11:07 UTC
(In reply to Martin Stransky from comment #12)
> Please be more specific: what do you mean by incorrect GSSAPI calling?
> Where exactly is the problem? I'm not a Kerberos expert, so we need good
> instructions on how to fix it.

Firstly: Do not call it from your main thread. It does blocking network operations which may take *seconds*. And your UI is entirely unresponsive during that time. That one is purely a Firefox issue.
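Here is a minimal sketch of the first point, assuming POSIX threads. The blocking call is handed to a worker thread, and the UI loop only polls for completion, so it can keep redrawing in the meantime. All names are hypothetical and this is not Firefox's actual code: `auth_job`, `auth_job_start()`, `auth_job_poll()`, `auth_job_finish()` and the stand-in `fake_gss_attempt()`. In a real client the work function would wrap gss_init_sec_context().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical job handed to a worker thread. In a real client, work()
 * would wrap gss_init_sec_context(); here it is any blocking function. */
struct auth_job {
    const char *spn;              /* service name, e.g. "HTTP@intranet.example.com" */
    int (*work)(const char *spn); /* blocking call kept off the UI thread */
    int result;                   /* written by the worker under lock */
    bool done;                    /* set by the worker under lock */
    pthread_mutex_t lock;
};

/* Stand-in for a slow GSSAPI call (sleeps 50 ms), for demonstration only. */
static int fake_gss_attempt(const char *spn)
{
    struct timespec delay = {0, 50L * 1000 * 1000};
    nanosleep(&delay, NULL);
    return spn[0] != '\0' ? 0 : -1;
}

static void *auth_worker(void *arg)
{
    struct auth_job *job = arg;
    int r = job->work(job->spn);  /* may block for seconds on DNS/KDC traffic */
    pthread_mutex_lock(&job->lock);
    job->result = r;
    job->done = true;
    pthread_mutex_unlock(&job->lock);
    return NULL;
}

/* Start the job in the background; returns 0 on success. */
static int auth_job_start(struct auth_job *job, pthread_t *tid)
{
    assert(job != NULL && job->work != NULL);
    job->result = 0;
    job->done = false;
    pthread_mutex_init(&job->lock, NULL);
    int rc = pthread_create(tid, NULL, auth_worker, job);
    if (rc != 0)
        pthread_mutex_destroy(&job->lock);
    return rc;
}

/* Non-blocking check the UI loop can call between redraws. */
static bool auth_job_poll(struct auth_job *job)
{
    pthread_mutex_lock(&job->lock);
    bool done = job->done;
    pthread_mutex_unlock(&job->lock);
    return done;
}

/* Reap the worker once auth_job_poll() reports completion; returns its result. */
static int auth_job_finish(struct auth_job *job, pthread_t tid)
{
    pthread_join(tid, NULL);      /* join also makes job->result safely visible */
    pthread_mutex_destroy(&job->lock);
    return job->result;
}
```

The UI loop would call auth_job_poll() between redraws and call auth_job_finish() only after it returns true. That way the window keeps repainting while the DNS SRV lookups and KDC requests are in flight.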

The second issue is that we could really do with some negative caching. If a given SPN doesn't exist, and GSSAPI fails and we fall back to NTLM auth, then *somebody* ought to remember that rather than just trying again and getting that same multi-second delay over and over again for every HTTP request we make (GSSAPI auth is per-request, isn't it? Not per-connection like NTLM auth IIRC). 
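Here is a minimal sketch of such a negative cache. It assumes the caller can tell a "principal unknown" failure (like the kvno error in comment 1) apart from transient network errors. Failures are remembered per SPN for a fixed TTL. A hit lets the client go straight to its fallback instead of repeating the multi-second DNS and KDC round trips. The table layout, the 300-second TTL and the names `neg_cache_add()` and `neg_cache_hit()` are hypothetical illustrations, not part of the krb5 API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define NEG_CACHE_SLOTS 64
#define NEG_CACHE_TTL   300   /* seconds; hypothetical value */
#define SPN_MAX         256

/* One remembered failure. expires == 0 marks an empty slot. */
struct neg_entry {
    char spn[SPN_MAX];
    time_t expires;
};

static struct neg_entry neg_cache[NEG_CACHE_SLOTS];

/* 32-bit FNV-1a hash of the SPN, used to pick a slot. */
static size_t neg_slot(const char *spn)
{
    uint32_t h = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)spn; *p != '\0'; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h % NEG_CACHE_SLOTS;
}

/* Remember that this SPN is unknown to the KDC, e.g. after
 * "Server not found in Kerberos database". Overlong SPNs are not cached. */
static void neg_cache_add(const char *spn, time_t now)
{
    assert(spn != NULL);
    if (strlen(spn) >= SPN_MAX)
        return;
    struct neg_entry *e = &neg_cache[neg_slot(spn)];
    strcpy(e->spn, spn);          /* length checked above */
    e->expires = now + NEG_CACHE_TTL;
}

/* True while a recent failure for this SPN is cached, so the caller can skip
 * asking the KDC again and go straight to its fallback. */
static bool neg_cache_hit(const char *spn, time_t now)
{
    const struct neg_entry *e = &neg_cache[neg_slot(spn)];
    return e->expires != 0 && now < e->expires && strcmp(e->spn, spn) == 0;
}
```

A direct-mapped table keeps the sketch short. A colliding SPN simply evicts the older entry, which at worst costs one extra KDC lookup. The TTL bounds how long a newly registered principal would be ignored.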

I actually think the negative caching belongs in krb5, *not* in all the applications that use it. Partly so it can be implemented just once and the cache can be shared, and also because I can't quite see *how* an application would sanely implement it in the case where it *does* want to use GSSAPI, but krb5 fails and we fall back to GSS-NTLMSSP (within SPNEGO), rather than abandoning SPNEGO completely and falling back to something different like 'WWW-Authenticate: NTLM'. In that case we'd be asking the application to notice that it ended up falling back to GSS-NTLMSSP last time, and to deliberately exclude krb5 from the permitted GSSAPI mechanisms on subsequent attempts to the same host. Please no! The negative cache lives in krb5, surely!

Comment 15 Simo Sorce 2015-10-22 17:43:14 UTC
The negative caching should also be done in Firefox IMO, I do not think the library should get into that business.

Comment 16 David Woodhouse 2015-10-22 20:56:57 UTC
When Firefox does SPNEGO, and krb5 fails and GSSAPI falls back to NTLMSSP... how would you recommend that Firefox determine that, and make it use NTLMSSP only next time?

Doing the negative caching at the krb5 level as discussed at krbdev.mit.narkive.com/wNu53vy3/negative-caching-of-unknown-principals seems much better to me.

Comment 17 Simo Sorce 2015-10-23 22:56:29 UTC
When the context is established, Firefox can query it to find out which mechanism it used, and simply store that information. The next time it tries to authenticate to the same URI, it will set the allowed negotiation mechanisms to NTLMSSP only.
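Here is a minimal sketch of the application-side bookkeeping described in this comment. After a context is established, the client records which mechanism SPNEGO ended up using for that URI. With GSSAPI, that would come from the `mech_type` output of gss_inquire_context(). On the next attempt the client offers only NTLMSSP if that is what was used last time. The `enum mech`, the `ALLOW_*` bit flags, `remember_mech()` and `allowed_mechs()` are hypothetical stand-ins for real mechanism OIDs. Actually restricting SPNEGO's mechanism list would need a call such as MIT krb5's gss_set_neg_mechs() extension on the credential.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the mechanism OIDs SPNEGO can negotiate. */
enum mech { MECH_UNKNOWN = 0, MECH_KRB5, MECH_NTLMSSP };

/* Bit flags for the mechanisms to offer on the next attempt. */
#define ALLOW_KRB5    1u
#define ALLOW_NTLMSSP 2u

#define MECH_SLOTS 32
#define KEY_MAX    256

struct mech_memo {
    char key[KEY_MAX];   /* URI the context was established for */
    enum mech last;      /* mechanism SPNEGO actually used last time */
};

static struct mech_memo memos[MECH_SLOTS];
static size_t memo_count;

static struct mech_memo *memo_find(const char *key)
{
    for (size_t i = 0; i < memo_count; i++)
        if (strcmp(memos[i].key, key) == 0)
            return &memos[i];
    return NULL;
}

/* Call once a context is established, with the mechanism it reported. */
static void remember_mech(const char *key, enum mech used)
{
    assert(key != NULL);
    struct mech_memo *m = memo_find(key);
    if (m == NULL) {
        if (memo_count == MECH_SLOTS || strlen(key) >= KEY_MAX)
            return;      /* table full or key too long: just don't remember */
        m = &memos[memo_count++];
        strcpy(m->key, key);
    }
    m->last = used;
}

/* Mechanisms to offer next time: NTLMSSP only if it was used last time. */
static unsigned allowed_mechs(const char *key)
{
    const struct mech_memo *m = memo_find(key);
    if (m != NULL && m->last == MECH_NTLMSSP)
        return ALLOW_NTLMSSP;
    return ALLOW_KRB5 | ALLOW_NTLMSSP;
}
```

This state lives in each application rather than in a shared library cache. That is the trade-off debated in comments 14 and 16.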

HTH.

Comment 18 Fedora End Of Life 2015-11-04 10:02:58 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 19 Fedora End Of Life 2015-12-02 02:49:15 UTC
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 20 Robbie Harwood 2020-09-10 17:33:50 UTC
I haven't gotten to this in the years it's been open, so realistically it's probably unlikely.  If it's still a big problem, we can open a ticket upstream, but I don't want to hold out false hope by keeping this open.

