Escalated to Bugzilla from IssueTracker
While authenticating against Kerberos through the mod_auth_kerb.so module, Apache opens /var/tmp/rc_HTTP_{UID} for reading and writing and never closes the file descriptor. Every authentication attempt opens this file again on a different fd, and all those fds remain open until httpd is restarted. Eventually the server runs out of file descriptors, Apache fails to authenticate, and other services on the system fail.

How to reproduce (in short):

Involves three systems: KDC, Apache and Client.

- Configure /etc/krb5.conf on all three systems appropriately (authconfig).
- Set up a Linux server as the KDC.
- Configure a RHEL4 system as the Apache server. Install the mod_auth_kerb package.
- Protect /var/www/html/krb with Kerberos authentication. An example entry from /etc/httpd/conf.d/auth_kerb.conf on my test system:

    <Directory /var/www/html/krb>
      AuthType Kerberos
      AuthName "Kerberos Login"
      KrbMethodNegotiate On
      KrbMethodK5Passwd Off
      KrbAuthRealms PNQ.REDHAT.COM
      Krb5KeyTab /etc/krb5.keytab
      require valid-user
    </Directory>

- Create a service principal for HTTP.

    addprinc -randkey HTTP/dhcp6-122.pnq.redhat.com.COM

  Extract it to the keytab.

    ktadd HTTP/dhcp6-122.pnq.redhat.com

- Restart httpd.
- Log in as a Kerberos user on the Client. Open Firefox and configure it as explained at http://www.grolmsnet.de/kerbtut/firefox.html
- Access "http://dhcp6-122.pnq.redhat.com/krb" on the client. No password is asked for; Apache authenticates the user using the ticket in the client and server cache.
- Once this is done, close the Firefox session and check /proc/apache-pid/fd for fds that have not been closed.

Below is what I see on my test system after three authentications.

    # for i in `pidof httpd`; do ls -l /proc/$i/fd | grep rc_ ; done
    lrwx------ 1 root root 64 Apr 9 18:09 13 -> /var/tmp/rc_HTTP_48
    lrwx------ 1 root root 64 Apr 9 18:09 13 -> /var/tmp/rc_HTTP_48
    lrwx------ 1 root root 64 Apr 9 18:09 13 -> /var/tmp/rc_HTTP_48

Looking at an strace, it seems the fd is never closed. Below is the strace from the customer's system.

    30383 stat("/var/tmp/rc_HTTP_10001", {st_dev=makedev(8, 18), st_ino=7292061, st_mode=S_IFREG|0600, st_nlink=1, st_uid=10001, st_gid=25, st_blksize=4096, st_blocks=8, st_size=906, st_atime=2009/04/08-07:04:06, st_mtime=2009/04/08-07:04:06, st_ctime=2009/04/08-07:04:06}) = 0
        /* stat() on /var/tmp/rc_HTTP_10001 */
    30383 geteuid() = 10001
        /* geteuid() to get the effective uid of user apache */
    30383 open("/var/tmp/rc_HTTP_10001", O_RDWR) = 1235
        /* opened /var/tmp/rc_HTTP_10001 and got file descriptor 1235 */
    30383 read(1235, "\5\1", 2) = 2
        /* reading from the file */
    ............
    30383 read(1235, "", 4) = 0
        /* last read from rc_HTTP_10001 */
    30383 lseek(1235, 906, SEEK_SET) = 906
        /* last lseek on rc_HTTP_10001 */
    30383 write(1235, "\25\0\0\0vsedlack.COM\0&\0\0\0HTTP/kenlx038.us.schp.com.COM\0X\4\0\0\246\204\334I", 75) = 75
        /* writing to /var/tmp/rc_HTTP_10001 */
    30383 fsync(1235)
        /* fsync to flush the writes to rc_HTTP_10001 to disk immediately */

After this it never issues a close(1235), so the fd is kept open. This looks like a bug, and I am currently going through the code to find where it needs to be patched. This needs to be resolved as it affects the customer's production environment.

This problem does not happen on RHEL5, which does not open the rc_HTTP_uid file from disk; it looks like it is implemented differently there, maybe using MEMORY: instead of FILE:.

--Sadique

This event sent from IssueTracker by mpoole [Support Engineering Group] issue 284502
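For anyone who wants to check the leak from inside a process rather than through ls on /proc, here is a minimal C sketch. It is not taken from mod_auth_kerb or krb5; the helper name count_fds_for_path is made up for illustration, and it assumes Linux with /proc mounted. It counts how many descriptors of the current process resolve to a given path by walking /proc/self/fd, which is the same information the ls | grep rc_ loop above shows per httpd pid.

```c
#define _DEFAULT_SOURCE /* for readlink() under strict -std= modes */
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the descriptors of the calling process whose
 * /proc/self/fd symlink resolves exactly to `path`.
 * Returns -1 if /proc/self/fd cannot be opened.
 */
static int count_fds_for_path(const char *path)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((ent = readdir(dir)) != NULL) {
        char link[PATH_MAX];
        char target[PATH_MAX];
        ssize_t n;

        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */

        snprintf(link, sizeof(link), "/proc/self/fd/%s", ent->d_name);
        n = readlink(link, target, sizeof(target) - 1);
        if (n < 0)
            continue; /* descriptor vanished or is not readable */
        target[n] = '\0';

        if (strcmp(target, path) == 0)
            count++;
    }

    closedir(dir);
    return count;
}
```

Calling this before and after a batch of operations that should be fd-neutral gives a quick leak check: if the count for /var/tmp/rc_HTTP_{UID} grows by one per authentication, the descriptor is not being closed.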
Going through the code, this does not seem to be a problem with mod_auth_kerb, but with the krb5 code. See the strace attached.

The geteuid() in the strace happens here, in krb5-1.3.4/src/lib/krb5/krb/srv_rcache.c:

    unsigned long uid = geteuid();

Then it builds the cache name, possibly depending on how it is called.

    strcpy(cachename, "rc_");
    p = 3;
    for (i = 0; i < piece->length; i++) {
        if (piece->data[i] == '-') {
            cachename[p++] = '-';
            cachename[p++] = '-';
            continue;
        }
        if (!isvalidrcname((int) piece->data[i])) {
            sprintf(tmp, "%03o", piece->data[i]);
            cachename[p++] = '-';
            cachename[p++] = tmp[0];
            cachename[p++] = tmp[1];
            cachename[p++] = tmp[2];
            continue;
        }
        cachename[p++] = piece->data[i];
    }

    #ifdef HAVE_GETEUID
    cachename[p++] = '_';
    while (tens) {
        cachename[p++] = '0' + ((uid / tens) % 10);
        tens /= 10;
    }
    #endif

    cachename[p++] = '\0';

Now cachename holds rc_HTTP_uid_of_httpd.

Then it calls krb5_rc_resolve() to resolve the replay cache.

    if ((retval = krb5_rc_resolve(context, rcache, cachename)))

krb5_rc_resolve() is defined in krb5-1.3.4/src/lib/krb5/rcache/rcfns.c:

    krb5_rc_resolve (krb5_context context, krb5_rcache id, char *name)
    {
        return krb5_x((id)->ops->resolve,(context, id, name));
    }

krb5_x is a macro defined in include/krb5.hin:

    include/krb5.hin:#define krb5_x(ptr,args) ((ptr)?((*(ptr)) args):(abort(),1))
    include/krb5.hin:#define krb5_xc(ptr,args) ((ptr)?((*(ptr)) args):(abort(),(char*)0))
    include/krb5.hin:#define krb5_x(ptr,args) ((*(ptr)) args)
    include/krb5.hin:#define krb5_xc(ptr,args) ((*(ptr)) args)

Hmm, I'm not sure where to go from here to find where the file descriptor needs to be closed. SEG, any idea?

Upstream and RHEL5 do not use krb5_rc_resolve() in srv_rcache.c. They use krb5_rc_resolve_full() as below.

    if ((retval = krb5_rc_resolve_full(context, &rcache, cachename)))

krb5_rc_resolve_full() is available in RHEL4 as well, but that does not work for me. Maybe the way I call it in my testing is not correct.

--Sadique
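To make the name construction above easier to follow, here is a standalone C sketch of the same steps. It is a rewrite for illustration, not the krb5 source: the function name build_rcache_name, the explicit buffer-size check, and the way tens is initialised are my own assumptions, and is_valid_rc_char is assumed to behave like the isvalidrcname macro (graphic, non-punctuation characters).

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: mirrors krb5's isvalidrcname (graphic and not punctuation). */
static int is_valid_rc_char(int c)
{
    return !ispunct(c) && isgraph(c);
}

/*
 * Hypothetical helper: build "rc_<escaped piece>_<uid>" into out.
 * '-' is doubled, other invalid characters become '-' plus three octal
 * digits. Returns 0 on success, -1 if out may be too small.
 */
static int build_rcache_name(const char *piece, size_t len,
                             unsigned long uid, char *out, size_t outsz)
{
    size_t p = 0, i;
    unsigned long tens = 1;

    /* Worst case: "rc_" + 4 chars per input byte + '_' + 20 digits + NUL. */
    if (outsz < 3 + 4 * len + 1 + 20 + 1)
        return -1;

    memcpy(out, "rc_", 3);
    p = 3;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) piece[i];

        if (c == '-') {
            out[p++] = '-';
            out[p++] = '-';
            continue;
        }
        if (!is_valid_rc_char(c)) {
            char tmp[4];

            snprintf(tmp, sizeof(tmp), "%03o", (unsigned int) c);
            out[p++] = '-';
            out[p++] = tmp[0];
            out[p++] = tmp[1];
            out[p++] = tmp[2];
            continue;
        }
        out[p++] = (char) c;
    }

    /* Append the effective uid in decimal, most significant digit first. */
    out[p++] = '_';
    while (uid / tens >= 10)
        tens *= 10;
    while (tens) {
        out[p++] = (char) ('0' + (uid / tens) % 10);
        tens /= 10;
    }
    out[p] = '\0';
    return 0;
}
```

With the piece "HTTP" and uid 48 this produces rc_HTTP_48, which matches the file name seen under /proc on the test system; with uid 10001 it produces rc_HTTP_10001, as in the customer's strace.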
Attachments:

80-httpd-1880432.tar.bz2: sosreport from the Apache server. See the customer's /etc/httpd/conf.d/ssl.conf for the Kerberos authentication configuration.
30383.strace: strace which clearly shows that the file descriptor opened on rc_HTTP_10001 is never closed.
kenlx038.121608.txt: shows 5576 open file descriptors on the system.

I have a reproducer set up in the Pune lab which can be accessed.

    KDC    - 10.65.6.214 root/redkrb
    Server - 10.65.6.122 root/redhat
    Client - 10.65.7.135 root/redhat

--Sadique
I used the script below to collect the strace from the httpd processes while authentication happens.

    #!/bin/bash
    mkdir -p /tmp/httpd-1880432; rm -f /tmp/httpd-1880432/*
    for i in `pidof httpd`; do
        strace -fvvv -s 1024 -o /tmp/httpd-1880432/$i.strace -p $i &
        #ltrace -fS -n 4 -s 1024 -o /tmp/httpd-1880432/$i.ltrace -p $i &
    done
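Once the strace files are collected, the "open without a matching close" check can be automated. The following C sketch is only a rough aid and rests on several assumptions: the function name unclosed_fds is made up, it looks only at open() and close() lines in the "PID syscall(...) = result" layout shown in the strace excerpt earlier, it ignores dup(), openat() and similar calls, and it treats the whole file as a single descriptor table.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRACKED_FD 65536

/*
 * Hypothetical helper: scan strace output from `in` and store in `out`
 * (up to max_out entries, ascending) the descriptors that were returned
 * by open() and never passed to a successful close().
 * Returns the number of entries stored.
 */
static int unclosed_fds(FILE *in, int *out, int max_out)
{
    static unsigned char is_open[MAX_TRACKED_FD];
    char line[8192];
    int fd, count = 0;

    memset(is_open, 0, sizeof(is_open));

    while (fgets(line, sizeof(line), in) != NULL) {
        const char *p = line;
        /* The syscall result follows the last '=' on the line. */
        const char *eq = strrchr(line, '=');

        if (eq == NULL)
            continue;

        /* Skip the leading PID column written by "strace -f -o". */
        while (*p >= '0' && *p <= '9')
            p++;
        while (*p == ' ')
            p++;

        if (strncmp(p, "open(", 5) == 0) {
            fd = atoi(eq + 1); /* -1 on a failed open */
            if (fd >= 0 && fd < MAX_TRACKED_FD)
                is_open[fd] = 1;
        } else if (strncmp(p, "close(", 6) == 0) {
            fd = atoi(p + 6); /* first argument of close() */
            if (atoi(eq + 1) == 0 && fd >= 0 && fd < MAX_TRACKED_FD)
                is_open[fd] = 0;
        }
    }

    for (fd = 0; fd < MAX_TRACKED_FD && count < max_out; fd++) {
        if (is_open[fd])
            out[count++] = fd;
    }
    return count;
}
```

Run against a file such as 30383.strace (opened with fopen), a descriptor like 1235 showing up in the result is consistent with the missing close() described above. Because descriptor numbers are reused, a clean result does not prove there is no leak; it only flags the obvious cases.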
The sosreport attachment has not come across from IT - can you attach that so I can look at the configuration? Also, can I log into those test boxes and try some debugging, sometime this week?
Created attachment 340534: patch to makefile for ld parameters
Created attachment 340535: patch to ensure compatible rcache setting is used on RHEL4
After much testing of variations we can confirm that mod_auth_kerb-5.1 from RHEL5, when tweaked a little, does indeed fix the problem on RHEL4. I have attached the two required patches. The ld tweak is required because the linker in RHEL4 does not understand "-Wl,-export-symbols-regex -Wl,auth_kerb_module". The rcache tweak is needed because the RHEL4 Kerberos libraries cannot cope with a cache type of "none", even though this is what the upstream code uses.
Will this fix be applied in 4.9, or is there a test RPM already built with the patches applied that we could try in the meantime? I'm currently affected by this bug on RHEL4.8. Thanks!