Description of problem:
After switching to Kerberos authentication (via mod_auth_kerb) we observed huge CPU load caused by httpd workers (example from top is below). It appears randomly, but quite often, and httpd needs to be restarted to fix it. There is probably some deadlock, as strace shows a worker waiting on a futex:

futex(0x43a339d0, FUTEX_WAIT, 6054, NULL

Not sure if it's related, but in the log we found the following error messages:

[Wed Sep 12 03:26:23 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:27:45 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Wed Sep 12 03:27:45 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:36:43 2012] [error] [client 10.34.3.225] krb5_get_init_creds_password() failed: KDC reply did not match expectations, referer: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/soa-6.0/30/

The problem appeared once we started to use mod_auth_kerb, so it is very likely a bug in mod_auth_kerb (and may be related to the bugfix for BZ #734098).

Version-Release number of selected component (if applicable):
httpd-2.2.3-65.el5_8
mod_auth_kerb-5.1-3.el5_7.1

How reproducible:
Appears often, but randomly

Steps to Reproduce:
Cannot reproduce reliably

Actual results:
Huge CPU load caused by httpd workers

Expected results:
httpd workers don't consume a lot of CPU

Additional info:
Example of CPU load from top:

  PID USER     PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 6050 apache   15  0  730m  33m 4116 S 506.5  0.1   4817:30 httpd.worker
16956 apache   18  0  730m  32m 4104 S 383.9  0.1   3513:04 httpd.worker
23840 apache   16  0  730m  35m 4100 S 327.6  0.1   3514:37 httpd.worker
 8926 apache   15  0  646m  32m 4100 S 248.1  0.1   2618:09 httpd.worker
 6052 apache   15  0  690m  33m 4116 S  84.1  0.1 874:47.38 httpd.worker
mod_auth_kerb-5.1-3.el5_7.1 should include the fix for the known threading problem. Can you get a backtrace from the thread which is hung, or the strace output from one of the workers consuming lots of CPU time?
Also, it would be useful to know whether switching httpd to prefork solves the problem.
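For reference, a rough sketch of how the data could be collected (assuming gdb and, ideally, the httpd debuginfo packages are installed on the box; the PID is a placeholder):

# find a busy worker and its threads
top -H -u apache

# attach gdb and dump all thread stacks
gdb -p <pid-of-busy-httpd.worker>
(gdb) thread apply all bt
(gdb) detach
(gdb) quit

For the prefork test: on RHEL 5 the worker MPM is normally enabled via the HTTPD=/usr/sbin/httpd.worker line in /etc/sysconfig/httpd, so commenting that line out and restarting httpd should switch back to the prefork binary (worth double-checking against the local setup).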
Hi, this is what I got from eng-ops:

[root@jenkins ~]# strace -p 22409
Process 22409 attached - interrupt to quit
futex(0x47b3a9d0, FUTEX_WAIT, 22419, NULL

Any idea how to get more detailed information about what is happening there? Thanks
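One way to get more out of strace here (a sketch; the PID/TIDs below are placeholders): strace -p 22409 appears to have attached only to the main thread, which is just parked in the futex wait, so the threads actually burning CPU are not shown. Listing the per-thread IDs and tracing those directly should be more telling:

# list the worker's threads and spot the busy ones
top -H -p 22409
ps -eLf | grep 22409

# trace a specific busy thread, or summarize its syscalls
strace -p <busy-tid>
strace -c -p <busy-tid>

A gdb "thread apply all bt" dump from the same process (as suggested above) would also show where the spinning threads are stuck.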