Bug 486006 - dovecot-auth sometimes gets 'stuck' during Kerberos auth
dovecot-auth sometimes gets 'stuck' during Kerberos auth
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: pam_krb5 (Show other bugs)
10
i686 Linux
low Severity high
: ---
: ---
Assigned To: Nalin Dahyabhai
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2009-02-17 15:59 EST by Ben Webb
Modified: 2009-06-26 22:40 EDT (History)
3 users (show)

See Also:
Fixed In Version: 2.3.5-1.fc10
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-06-26 22:40:38 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)
dovecot.conf (47.61 KB, text/plain)
2009-06-03 13:51 EDT, Ben Webb
no flags Details

  None (edit)
Description Ben Webb 2009-02-17 15:59:31 EST
Description of problem:
Occasionally the dovecot-auth process gets stuck during PAM auth; after this happens our users are unable to check their mail. Restarting dovecot fixes the problem, although the dovecot-auth process will not die on SIGTERM (SIGKILL must be sent). This appears to happen randomly, and appears Kerberos related (although we have had no problem with sshd, httpd, or any other Kerberos-related service on our machines).


Version-Release number of selected component (if applicable):
dovecot-1.1.10-1.fc10.i386
pam_krb5-2.3.2-1.fc10.i386


How reproducible:
Happens randomly.

Steps to Reproduce:
1. Start dovecot with passdb pam, and using pam_krb5.
2. Wait...
  
Actual results:
Occasionally dovecot-auth will fail with an error like the following in /var/log/secure:
Feb 17 07:13:52 myserver dovecot-auth: pam_krb5[19950]: authentication fails for 'user' (user@REALM): Authentication service cannot retrieve authentication info (Cannot contact any KDC for requested realm)

Subsequent authentications will then fail until dovecot is restarted. There are, however, no problems with our KDCs as far as we can tell (other Kerberos services work just fine, and we have 3 KDCs for pam_krb5 to choose from).

In fact, on our primary KDC we see the following in the logs, which indicates that the request is coming in just fine - just that the reply seems to be being ignored:
Feb 17 07:13:27 mykdc krb5kdc[1701](info): AS_REQ (7 etypes {18 17 16 23 1 3 2}) myserverIP: ISSUE: authtime 1234883607, etypes {rep=16 tkt=16 ses=16}, user@REALM for krbtgt/REALM@REALM
Feb 17 07:13:27 mykdc krb5kdc[1701](info): DISPATCH: repeated (retransmitted?) request from myserverIP, resending previous response
Feb 17 07:13:27 mykdc krb5kdc[1701](info): DISPATCH: repeated (retransmitted?) request from myserverIP, resending previous response
Feb 17 07:13:34 mykdc krb5kdc[1701](info): DISPATCH: repeated (retransmitted?) request from myserverIP, resending previous response
Feb 17 07:13:41 mykdc krb5kdc[1701](info): DISPATCH: repeated (retransmitted?) request from myserverIP, resending previous response

(Almost identical logs are seen on the secondary and tertiary KDCs at the same time.)


Expected results:
Authentication succeeds like the other 99% of seemingly identical requests (which do succeed).

Additional info:
I will turn on extra debug output from pam_krb5 to see if I get more information next time this problem occurs. But any other ideas in the meantime would be much appreciated!
Comment 1 Ben Webb 2009-02-19 13:33:09 EST
For reference, here is the output from a successful authentication:

Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: default/local realm 'REALM'
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: configured realm 'REALM'
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: debug
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flags: forwardable
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: no ignore_afs
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: no null_afs
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: user_check
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: no krb4_convert
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: krb4_convert_524
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: krb4_use_as_req
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: will try previously set password first 
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: will let libkrb5 ask questions
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: no use_shmem
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: no external
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: flag: warn
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: ticket lifetime: 36000s (0d,10h,0m,0s)
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: renewable lifetime: 36000s (0d,10h,0m,0s)
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: banner: Kerberos 5
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: ccache dir: /tmp
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: ccname template: FILE:%d/krb5cc_%U_XXXXXX
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: keytab: FILE:/etc/krb5.keytab
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: token strategy: v4,524,2b,rxk5
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: called to authenticate 'user', realm 'REALM'
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: authenticating 'user@REALM'
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: trying previously-entered password for 'user', allowing libkrb5 to prompt for more
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: authenticating 'user@REALM' to 'krbtgt/REALM@REALM'
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: krb5_get_init_creds_password(krbtgt/REALM@REALM) returned 0 (Success)
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: got result 0 (Success)
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: 'user@REALM' passes .k5login check for 'user'
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: authentication succeeds for 'user' (user@REALM)
Feb 19 09:24:01 myserver dovecot-auth: pam_krb5[20326]: pam_authenticate returning 0 (Success)

An unsuccessful auth looks exactly the same, up until the point where it fails:
Feb 19 09:54:34 myserver dovecot-auth: pam_krb5[20326]: authenticating 'user@REALM' to 'krbtgt/REALM@REALM'
Feb 19 09:55:00 myserver dovecot-auth: pam_krb5[20326]: krb5_get_init_creds_password(krbtgt/REALM@REALM) returned -1765328228 (Cannot contact any KDC for requested realm)
Feb 19 09:55:00 myserver dovecot-auth: pam_krb5[20326]: got result -1765328228 (Cannot contact any KDC for requested realm)
Feb 19 09:55:00 myserver dovecot-auth: pam_krb5[20326]: authentication fails for 'user' (user@REALM): Authentication service cannot retrieve authentication info (Cannot contact any KDC for requested realm)
Feb 19 09:55:00 myserver dovecot-auth: pam_krb5[20326]: pam_authenticate returning 9 (Authentication service cannot retrieve authentication info)

Again, restarting dovecot fixes the problem. I noticed that the dovecot-auth process (PID 20326) has a lot of files open (1039). Perhaps there is some problem with it exceeding 1024 file descriptors? (I already upped the 'ulimit -n' limit to 2048 for dovecot.) A quick strace on the dovecot-auth process shows the suspicious-looking output:
select(1039, [1033 1034 1035 1036 1037 1038], [9 10 11 12 13 14 1036 1037 1038], [12 13 14 1024], {1, 691408}) = 9 (except [1024], left {1, 691408})
gettimeofday({1235066390, 521449}, NULL) = 0
select(1039, [1033 1034 1035 1036 1037 1038], [9 10 11 12 13 14 1036 1037 1038], [12 13 14 1024], {1, 690711}) = 9 (except [1024], left {1, 690711})
gettimeofday({1235066390, 522149}, NULL) = 0
select(1039, [1033 1034 1035 1036 1037 1038], [9 10 11 12 13 14 1036 1037 1038], [12 13 14 1024], {1, 690011}) = 9 (except [1024], left {1, 690011})
gettimeofday({1235066390, 522849}, NULL) = 0
(repeated a large number of times)
Comment 2 Fedora Admin XMLRPC Client 2009-05-21 09:17:08 EDT
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.
Comment 3 Michal Hlavinka 2009-05-25 11:23:46 EDT
Hi,
thanks for reporting this. Could you please try to reproduce this with new dovecot version? You can find new dovecot packages here:

http://koji.fedoraproject.org/koji/buildinfo?buildID=100213

Please also add auth_debug=yes in /etc/dovecot.conf

If you can still reproduce:
1) check number of opened files
2) change ulimit significantly and try to reproduce again to see if number of opened files is different, when it stops working

Thanks
Comment 4 Ben Webb 2009-05-29 12:23:59 EDT
Yes, it still happens with dovecot-1.1.14-1.fc10.i386.

One thing that appears clear from /var/log/secure: once a given dovecot-auth process encounters the "cannot contact any KDC" error any subsequent authentication attempts fail with the same error. Only that single dovecot-auth process is affected, however - other dovecot-auth processes on the same machine continue to work fine. So the problem will clear itself up eventually (without needing to restart all of dovecot) when/if the problematic dovecot-auth process terminates (at least, until it happens again). Unfortunately that means that by the time I see the problem, typically the dovecot-auth process is no longer around to look at. ;(

This is with ulimit -n 2048. I will up it to 4096 and see if that changes things. I've also turned on auth_verbose and auth_debug.
Comment 5 Ben Webb 2009-06-02 18:14:57 EDT
OK, with ulimit -n 4096, I see the following in /var/log/secure:

Jun  1 17:50:27 server dovecot-auth: pam_krb5[16280]: authentication succeeds for 'user1' (user1@REALM)
[hundreds more successful auths with the same pid and similar logs]
Jun  2 09:59:24 server dovecot-auth: pam_krb5[16280]: authentication succeeds for 'user2' (user2@REALM)
Jun  2 10:00:01 server dovecot-auth: pam_krb5[16280]: authentication fails for 'user3' (user3@REALM): Authentication service cannot retrieve authentication info (Cannot contact any KDC for requested realm)
Jun  2 10:00:29 server dovecot-auth: pam_krb5[16280]: authentication fails for 'user4' (user4@REALM): Authentication service cannot retrieve authentication info (Cannot contact any KDC for requested realm)
Jun  2 10:00:59 server dovecot-auth: pam_krb5[16280]: authentication fails for 'user4' (user4@REALM): Authentication service cannot retrieve authentication info (Cannot contact any KDC for requested realm)
Jun  2 10:06:29 server dovecot-auth: pam_krb5[16280]: authentication fails for 'user5' (user5@REALM): Authentication service cannot retrieve authentication info (Cannot contact any KDC for requested realm)

Looking for a match for that pid in /var/log/maillog I see (together with the 3 lines before it):
Jun  2 10:12:24 server dovecot: auth(default): new auth connection: pid=32611
Jun  2 10:12:27 server dovecot: auth(default): worker-server(user4,aa.bb.cc.dd): Aborted: Worker process died unexpectedly
Jun  2 10:12:27 server dovecot: auth(default): worker-server(user6,AA.BB.CC.DD): Aborted: Worker process died unexpectedly
Jun  2 10:12:27 server dovecot: child 16280 (auth-worker) killed with signal 11 (core dumps disabled)

(I have removed IP addresses, our realm, user names and server names, but the pids and dates are unchanged.)
Comment 6 Michal Hlavinka 2009-06-03 10:25:24 EDT
I've been looking into this, but didn't find anything useful yet. Could you attach your dovecot.conf ? Thanks
Comment 7 Ben Webb 2009-06-03 13:51:11 EDT
Created attachment 346436 [details]
dovecot.conf

Our dovecot configuration, as requested. Nothing too unusual in there, as far as I can tell.
Comment 8 Michal Hlavinka 2009-06-10 16:11:04 EDT
I didn't find anything useful yet, but I was talking with pam_krb5 maintainer and it would be great if you could try this with new pam_krb5-2.3.5

This version is in updates testing now, so you can update it using:
yum update --enablerepo=updates-testing pam_krb5

Could you please try if you can reproduce this with updated version? Thanks
Comment 9 Ben Webb 2009-06-11 11:54:25 EDT
I would be happy to try it, but I don't see it in updates-testing at http://download.fedoraproject.org/pub/fedora/linux/updates/testing/10/i386/
Comment 10 Michal Hlavinka 2009-06-12 02:49:59 EDT
It's seems the package is still pending:

https://admin.fedoraproject.org/updates/pam_krb5-2.3.5-1.fc10

and because this update is marked as security, I don't know if it lands in updates or updates-testing.

You can:
a)wait when package is pushed into updates or updates-testing (this information will occur in the first link), than you can update it using yum update pam_krb5 (or with --enablerepo=updates-testing if it is in update-testing)

or

b) use package directly from koji (they are not gpg signed): download package from 

http://kojipkgs.fedoraproject.org/packages/pam_krb5/2.3.5/1.fc10/i386/

and update it: yum localupdate --nogpgcheck path_to_the_package
Comment 11 Nalin Dahyabhai 2009-06-16 13:34:12 EDT
The package should be in updates-testing as of today (for reasons I don't completely follow, updates weren't being successfully pushed for the last while); I'll give it a week and if nobody screams in agony I intend to request that it be moved to updates.  Thanks!
Comment 12 Ben Webb 2009-06-16 14:13:58 EDT
OK, I'll give that a try. In the meantime, I managed to extract a core file from one of the auth-worker processes that died with signal 11. I'd rather not attach that core itself here since it contains unencrypted passwords, but here's the backtrace (and I can run other tests on it if you like). Perhaps this is a bug fixed in the pam_krb5 package you pointed me to?

gdb /usr/libexec/dovecot/dovecot-auth core.18972 
GNU gdb Fedora (6.8-30.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libcrypt-2.9.so...Reading symbols from /usr/lib/debug/lib/libcrypt-2.9.so.debug...done.
done.
Loaded symbols for /lib/libcrypt-2.9.so
Reading symbols from /lib/libpam.so.0.81.13...Reading symbols from /usr/lib/debug/lib/libpam.so.0.81.13.debug...done.
done.
Loaded symbols for /lib/libpam.so.0.81.13
Reading symbols from /lib/libdl-2.9.so...Reading symbols from /usr/lib/debug/lib/libdl-2.9.so.debug...done.
done.
Loaded symbols for /lib/libdl-2.9.so
Reading symbols from /lib/libc-2.9.so...Reading symbols from /usr/lib/debug/lib/libc-2.9.so.debug...done.
done.
Loaded symbols for /lib/libc-2.9.so
Reading symbols from /lib/libaudit.so.0.0.0...Reading symbols from /usr/lib/debug/lib/libaudit.so.0.0.0.debug...done.
done.
Loaded symbols for /lib/libaudit.so.0.0.0
Reading symbols from /lib/ld-2.9.so...Reading symbols from /usr/lib/debug/lib/ld-2.9.so.debug...done.
done.
Loaded symbols for /lib/ld-2.9.so
Reading symbols from /usr/lib/dovecot/auth/libmech_gssapi.so...Reading symbols from /usr/lib/debug/usr/lib/dovecot/auth/libmech_gssapi.so.debug...done.
done.
Loaded symbols for /usr/lib/dovecot/auth/libmech_gssapi.so
Reading symbols from /usr/lib/libgssapi_krb5.so.2.2...Reading symbols from /usr/lib/debug/usr/lib/libgssapi_krb5.so.2.2.debug...done.
done.
Loaded symbols for /usr/lib/libgssapi_krb5.so.2.2
Reading symbols from /usr/lib/libkrb5.so.3.3...Reading symbols from /usr/lib/debug/usr/lib/libkrb5.so.3.3.debug...done.
done.
Loaded symbols for /usr/lib/libkrb5.so.3.3
Reading symbols from /usr/lib/libk5crypto.so.3.1...Reading symbols from /usr/lib/debug/usr/lib/libk5crypto.so.3.1.debug...done.
done.
Loaded symbols for /usr/lib/libk5crypto.so.3.1
Reading symbols from /lib/libcom_err.so.2.1...Reading symbols from /usr/lib/debug/lib/libcom_err.so.2.1.debug...done.
done.
Loaded symbols for /lib/libcom_err.so.2.1
Reading symbols from /usr/lib/libkrb5support.so.0.1...Reading symbols from /usr/lib/debug/usr/lib/libkrb5support.so.0.1.debug...done.
done.
Loaded symbols for /usr/lib/libkrb5support.so.0.1
Reading symbols from /lib/libkeyutils-1.2.so...Reading symbols from /usr/lib/debug/lib/libkeyutils-1.2.so.debug...done.
done.
Loaded symbols for /lib/libkeyutils-1.2.so
Reading symbols from /lib/libresolv-2.9.so...Reading symbols from /usr/lib/debug/lib/libresolv-2.9.so.debug...done.
done.
Loaded symbols for /lib/libresolv-2.9.so
Reading symbols from /lib/libpthread-2.9.so...Reading symbols from /usr/lib/debug/lib/libpthread-2.9.so.debug...done.
done.
Loaded symbols for /lib/libpthread-2.9.so
Reading symbols from /lib/libselinux.so.1...Reading symbols from /usr/lib/debug/lib/libselinux.so.1.debug...done.
done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /lib/libnss_files-2.9.so...Reading symbols from /usr/lib/debug/lib/libnss_files-2.9.so.debug...done.
done.
Loaded symbols for /lib/libnss_files-2.9.so
Reading symbols from /lib/security/pam_nologin.so...Reading symbols from /usr/lib/debug/lib/security/pam_nologin.so.debug...done.
done.
Loaded symbols for /lib/security/pam_nologin.so
Reading symbols from /lib/security/pam_env.so...Reading symbols from /usr/lib/debug/lib/security/pam_env.so.debug...done.
done.
Loaded symbols for /lib/security/pam_env.so
Reading symbols from /lib/security/pam_unix.so...Reading symbols from /usr/lib/debug/lib/security/pam_unix.so.debug...done.
done.
Loaded symbols for /lib/security/pam_unix.so
Reading symbols from /lib/libnsl-2.9.so...Reading symbols from /usr/lib/debug/lib/libnsl-2.9.so.debug...done.
done.
Loaded symbols for /lib/libnsl-2.9.so
Reading symbols from /lib/security/pam_krb5.so...Reading symbols from /usr/lib/debug/lib/security/pam_krb5.so.debug...done.
done.
Loaded symbols for /lib/security/pam_krb5.so
Reading symbols from /usr/lib/libkrb4.so.2.0...Reading symbols from /usr/lib/debug/usr/lib/libkrb4.so.2.0.debug...done.
done.
Loaded symbols for /usr/lib/libkrb4.so.2.0
Reading symbols from /usr/lib/libdes425.so.3.0...Reading symbols from /usr/lib/debug/usr/lib/libdes425.so.3.0.debug...done.
done.
Loaded symbols for /usr/lib/libdes425.so.3.0
Reading symbols from /lib/security/pam_deny.so...Reading symbols from /usr/lib/debug/lib/security/pam_deny.so.debug...done.
done.
Loaded symbols for /lib/security/pam_deny.so
Reading symbols from /lib/security/pam_limits.so...Reading symbols from /usr/lib/debug/lib/security/pam_limits.so.debug...done.
done.
Loaded symbols for /lib/security/pam_limits.so
Core was generated by `dovecot-auth -w'.
Program terminated with signal 11, Segmentation fault.
[New process 18972]
#0  0x00180059 in service_fds (context=0x97a4ea0, selstate=0x97a4a20, 
    conns=0x97a25f0, n_conns=7, winning_conn=0xbfdcfd18, seltemp=0x97a4bb0, 
    msg_handler=0x181170 <check_for_svc_unavailable>, 
    msg_handler_data=0xbfdcfdb4) at sendto_kdc.c:1084
1084		    if (FD_ISSET(conns[i].fd, &seltemp->rfds))
(gdb) bt
#0  0x00180059 in service_fds (context=0x97a4ea0, selstate=0x97a4a20, 
    conns=0x97a25f0, n_conns=7, winning_conn=0xbfdcfd18, seltemp=0x97a4bb0, 
    msg_handler=0x181170 <check_for_svc_unavailable>, 
    msg_handler_data=0xbfdcfdb4) at sendto_kdc.c:1084
#1  0x0018082b in krb5int_sendto (context=0x97a4ea0, message=0x97896b8, 
    addrs=0xbfdcfda8, callback_info=0x0, reply=0xbfdcfe00, localaddr=0x0, 
    localaddrlen=0x0, remoteaddr=0x0, remoteaddrlen=0x0, addr_used=0xbfdcfdb8, 
    msg_handler=0x181170 <check_for_svc_unavailable>, 
    msg_handler_data=0xbfdcfdb4) at sendto_kdc.c:1267
#2  0x00180faa in krb5_sendto_kdc (context=0x97a4ea0, message=0x97896b8, 
    realm=0x979a864, reply=0xbfdcfe00, use_master=0xbfdd00b4, tcp_only=0)
    at sendto_kdc.c:410
#3  0x0015e2ce in send_as_request (context=0x97a4ea0, request=0xbfdcfe98, 
    ret_err_reply=0xbfdcff28, ret_as_reply=0xbfdcff24, use_master=0xbfdd00b4)
    at get_in_tkt.c:138
#4  0x0015f5d6 in krb5_get_init_creds (context=0x97a4ea0, creds=0x97a5344, 
    client=0x979a860, prompter=0x1f5840 <_pam_krb5_normal_prompter>, 
    prompter_data=0xbfdd0ec4, start_time=0, 
    in_tkt_service=0xbfdd0ef8 "krbtgt/REALM@REALM", 
    options=0x97a0ea0, gak_fct=0x160ca0 <krb5_get_as_key_password>, 
    gak_data=0xbfdd008c, use_master=0xbfdd00b4, as_reply=0xbfdd00b0)
    at get_in_tkt.c:1157
#5  0x0016174b in krb5_get_init_creds_password (context=0x97a4ea0, 
    creds=0x97a5344, client=0x979a860, password=0x97a1358 "PASSWORD", 
    prompter=0x1f5840 <_pam_krb5_normal_prompter>, data=0xbfdd0ec4, 
    start_time=0, in_tkt_service=0xbfdd0ef8 "krbtgt/REALM@REALM", 
    options=0x97a0ea0) at gic_pwd.c:139
#6  0x001fc11f in v5_get_creds (ctx=0x97a4ea0, pamh=0x978b4e8, 
    creds=0x97a5344, user=0x9773718 "USER", userinfo=0x979a908, 
    options=0x979d5c8, service=0x200fd4 "krbtgt", 
    password=0x97a1358 "PASSWORD", gic_options=0x97a0ea0, 
    prompter=0x1f5840 <_pam_krb5_normal_prompter>, result=0x97a533c)
    at v5.c:1012
#7  0x001f0c2a in pam_sm_authenticate (pamh=0x978b4e8, 
    flags=<value optimized out>, argc=1, argv=0x97977e8) at auth.c:228
#8  0x004ca28f in _pam_dispatch_aux () at pam_dispatch.c:108
#9  _pam_dispatch (pamh=0x978b4e8, flags=0, choice=1) at pam_dispatch.c:407
#10 0x004c9a91 in pam_authenticate (pamh=0x978b4e8, flags=0) at pam_auth.c:34
#11 0x08061c44 in try_pam_auth () at passdb-pam.c:163
#12 pam_verify_plain_call () at passdb-pam.c:276
#13 pam_verify_plain (request=0x9787458, password=0x9764040 "PASSWORD", 
    callback=0x8058400 <verify_plain_callback>) at passdb-pam.c:323
#14 0x0805800a in auth_worker_handle_passv (args=<value optimized out>)
    at auth-worker-client.c:169
#15 auth_worker_handle_line (line=<value optimized out>)
    at auth-worker-client.c:409
#16 auth_worker_input (client=0x9771690) at auth-worker-client.c:445
#17 0x0806f3b0 in io_loop_handler_run (ioloop=0x976c1a8) at ioloop-epoll.c:209
#18 0x0806e810 in io_loop_run (ioloop=0x976c1a8) at ioloop.c:336
#19 0x0805a98e in main (argc=Cannot access memory at address 0x1
) at main.c:349
(gdb)
Comment 13 Nalin Dahyabhai 2009-06-16 15:19:26 EDT
This doesn't look like the problem that 2.3.3 fixed, but then the tipoff there was an SELinux denial when the end of the unclosed pipe was passed along to a child process that didn't need it, so we got an AVC denial.

If you still have the core around, can you ask gdb to 'print i' and 'print conns[i].fd'?  If the value of conns[i].fd is high enough, that might put the bit being checked out-of-bounds, but the default size matches the default limit on open descriptors, so that'd be unusual.  Is there any chance that you've upped that limit?
Comment 14 Ben Webb 2009-06-16 15:53:58 EDT
Sure, I still have the core:
(gdb) print i
$1 = 6
(gdb) print conns[i].fd
$2 = 1380426625

That second value looks out of bounds to me. ;)

As for the default limits, I run dovecot with ulimit -n 2048 - see comment #1 on this bug. (If I don't, I run out of file descriptors pretty quickly.)
Comment 15 Nalin Dahyabhai 2009-06-16 17:09:38 EDT
(In reply to comment #14)
> Sure, I still have the core:
> (gdb) print i
> $1 = 6
> (gdb) print conns[i].fd
> $2 = 1380426625
> 
> That second value looks out of bounds to me. ;)
> 
> As for the default limits, I run dovecot with ulimit -n 2048 - see comment #1
> on this bug. (If I don't, I run out of file descriptors pretty quickly.)  

My bad.  Yeah, that'd definitely trigger a bug -- the sendto code doesn't resize the fd_set, and the default size only handles 1024 (unless I'm reading it wrong).

Hopefully the update fixes it so that you don't exercise handles that high.  That krb5 doesn't handle descriptors that high is probably a different-but-related bug.
Comment 16 Ben Webb 2009-06-17 10:38:08 EDT
(In reply to comment #15)
> Hopefully the update fixes it so that you don't exercise handles that high. 

OK, I am trying now with pam_krb5-2.3.5-1.fc10.i386 and will keep you updated. Meanwhile I collected a few more core files - each crash is in exactly the same place as the one I already showed you. So I'm pretty sure we know where the problem is now.

> That krb5 doesn't handle descriptors that high is probably a
> different-but-related bug.  

Perhaps it makes sense to reassign this bug to pam_krb5 then? Or I could certainly open a new bug for this "descriptors exceeding hard-coded limit" problem, although I'm not sure how I could reproduce it easily outside of this dovecot environment already described.
Comment 17 Nalin Dahyabhai 2009-06-17 13:21:00 EDT
(In reply to comment #16)
> > That krb5 doesn't handle descriptors that high is probably a
> > different-but-related bug.  
> 
> Perhaps it makes sense to reassign this bug to pam_krb5 then?

If this indeed fixes it, then definitely, yes.  (It should be closable as soon as the update gets moved from testing to stable.)

>                                                               Or I could
> certainly open a new bug for this "descriptors exceeding hard-coded limit"
> problem, although I'm not sure how I could reproduce it easily outside of this
> dovecot environment already described.  

We can contrive a test case if need be.

Thanks!
Comment 18 Ben Webb 2009-06-22 16:48:21 EDT
The problem with dovecot appears to be fixed with the latest pam_krb5. I have been running our IMAP server for several days and have not seen any "cannot contact KDC" errors or worker segfaults. Furthermore, looking in /proc/<PID>/fd/, where <PID> is the pid of each dovecot-auth process, I see only 10-20 fds open at any given time (previously I was typically seeing more than 1000, hence the need for the 'ulimit -n'). So as far as I can see, this bug can be closed once the pam_krb5 update makes it to stable.

For the krb5 segfault, I opened a separate bug, bug 507462.
Comment 19 Fedora Update System 2009-06-26 22:40:26 EDT
pam_krb5-2.3.5-1.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.