Bug 856532

Summary: Huge CPU load caused by httpd workers
Product: Red Hat Enterprise Linux 5
Component: mod_auth_kerb
Version: 5.8
Hardware: x86_64
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: urgent
Priority: unspecified
Target Milestone: rc
Reporter: Vojtech Juranek <vjuranek>
Assignee: Luboš Uhliarik <luhliari>
QA Contact: BaseOS QE Security Team <qe-baseos-security>
CC: jorton
Type: Bug
Doc Type: Bug Fix
Last Closed: 2013-03-13 17:02:53 UTC

Description Vojtech Juranek 2012-09-12 09:04:41 UTC
Description of problem:
After switching to Kerberos authentication (via mod_auth_kerb) we observed a huge CPU load caused by httpd workers (an example from top is below). It appears randomly but quite often, and httpd needs to be restarted to fix it. There is probably a deadlock, as strace shows a wait on a futex:

futex(0x43a339d0, FUTEX_WAIT, 6054, NULL

Not sure if it's related, but we found the following error messages in the log:

[Wed Sep 12 03:26:23 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:27:45 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Wed Sep 12 03:27:45 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:36:43 2012] [error] [client 10.34.3.225] krb5_get_init_creds_password() failed: KDC reply did not match expectations, referer: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/soa-6.0/30/

The problem appeared once we started to use mod_auth_kerb, so it is very likely a bug in mod_auth_kerb (and may be related to the bugfix for BZ #734098).

Version-Release number of selected component (if applicable):
httpd-2.2.3-65.el5_8
mod_auth_kerb-5.1-3.el5_7.1

How reproducible:
Appears often, but randomly

Steps to Reproduce:
Cannot reproduce reliably
  
Actual results:
Huge CPU load caused by httpd workers

Expected results:
httpd workers don't consume a lot of CPU

Additional info:
Example of CPU load from top:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 6050 apache    15   0  730m  33m 4116 S 506.5  0.1   4817:30 httpd.worker
16956 apache    18   0  730m  32m 4104 S 383.9  0.1   3513:04 httpd.worker
23840 apache    16   0  730m  35m 4100 S 327.6  0.1   3514:37 httpd.worker
 8926 apache    15   0  646m  32m 4100 S 248.1  0.1   2618:09 httpd.worker
 6052 apache    15   0  690m  33m 4116 S  84.1  0.1 874:47.38 httpd.worker

Comment 1 Joe Orton 2012-09-27 13:45:29 UTC
mod_auth_kerb-5.1-3.el5_7.1 should include the fix for the known threading problem.

Can you get a backtrace from the thread which is hung, or the strace output from one of the workers consuming lots of CPU time?
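
One way to collect both pieces of data is sketched below. It assumes gdb and strace are installed on the host, that the commands are run as root, and that the PID passed in is one of the httpd.worker processes shown in top. The function names and the output file name are made up for illustration. Without the debuginfo packages for httpd and mod_auth_kerb, the backtrace will probably show mostly raw addresses.

```shell
# Dump a backtrace of every thread in the process, then detach.
dump_backtraces() {
    gdb -p "$1" -batch -ex 'thread apply all bt'
}

# Trace the syscalls of all threads (-f), with timestamps (-tt), into a file.
# Stop it with Ctrl-C after a few seconds.
trace_all_threads() {
    strace -f -tt -o "strace.$1.out" -p "$1"
}

# Usage (6050 is the busiest PID in the top output from the description):
#   dump_backtraces 6050 > backtrace.6050.txt 2>&1
#   trace_all_threads 6050
```

Both commands pause the process while attached, so on a production server they are best kept short.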

Comment 2 Joe Orton 2012-09-27 13:47:16 UTC
Also, it would be useful to know whether switching httpd to prefork solves the problem.
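
For reference, a sketch of how the MPM is usually selected on RHEL 5, assuming the stock init script and sysconfig file are in use. The httpd.worker processes in the top output suggest the HTTPD variable currently points at the worker binary; commenting it out should make the init script fall back to the default prefork binary.

```shell
# /etc/sysconfig/httpd
# With this line commented out, the init script starts the default
# prefork binary (/usr/sbin/httpd) instead of the worker MPM.
#HTTPD=/usr/sbin/httpd.worker
```

The change takes effect after a full restart (`service httpd restart`), not a reload, because a different binary has to be started.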

Comment 3 Vojtech Juranek 2012-10-04 09:02:55 UTC
Hi,
this is what I got from eng-ops:

[root@jenkins ~]# strace -p 22409 
Process 22409 attached - interrupt to quit

futex(0x47b3a9d0, FUTEX_WAIT, 22419, NULL
22409.pid (END)

Any idea how to get more detailed information about what is happening there?
Thanks
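
A note on the strace output above: without -f, `strace -p` attaches only to the single thread whose ID is given, and for a process PID that is the main thread. A blocked futex call there therefore does not show what the CPU-consuming threads are doing. A minimal sketch for finding and tracing the busiest thread follows, assuming the procps version of ps. The function names are hypothetical.

```shell
# Sort "thread-id %cpu ..." lines read from stdin, highest %CPU first.
rank_by_cpu() {
    sort -k2,2nr
}

# List the threads (LWPs) of a process, busiest first.
# Columns: thread ID, %CPU, state, kernel wait channel.
list_threads() {
    ps -L -o lwp= -o pcpu= -o stat= -o wchan= -p "$1" | rank_by_cpu
}

# Print only the ID of the busiest thread.
busiest_thread() {
    list_threads "$1" | awk 'NR == 1 { print $1 }'
}

# Usage, with 22409 being the PID from the strace attempt above:
#   list_threads 22409
#   strace -tt -p "$(busiest_thread 22409)"
```

The %CPU value reported by ps is an average over the thread's lifetime, so `top -H -p 22409` may give a better picture of which threads are spinning right now.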