Bug 856532 - Huge CPU load caused by httpd workers
Summary: Huge CPU load caused by httpd workers
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mod_auth_kerb
Version: 5.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Luboš Uhliarik
QA Contact: BaseOS QE Security Team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-09-12 09:04 UTC by Vojtech Juranek
Modified: 2021-01-14 09:21 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-13 17:02:53 UTC
Target Upstream Version:
Embargoed:



Description Vojtech Juranek 2012-09-12 09:04:41 UTC
Description of problem:
After switching to Kerberos authentication (via mod_auth_kerb), we observed huge CPU load caused by httpd workers (example output from top is below). It appears randomly but quite often, and httpd needs to be restarted to fix it. There is probably a deadlock, as strace shows the process waiting on a futex:

futex(0x43a339d0, FUTEX_WAIT, 6054, NULL

I'm not sure whether it's related, but we found the following error messages in the log:

[Wed Sep 12 03:26:23 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:27:45 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Wed Sep 12 03:27:45 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:36:43 2012] [error] [client 10.34.3.225] krb5_get_init_creds_password() failed: KDC reply did not match expectations, referer: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/soa-6.0/30/

The problem appeared once we started to use mod_auth_kerb, so it is very likely a bug in mod_auth_kerb (and may be related to the bugfix for BZ #734098).

Version-Release number of selected component (if applicable):
httpd-2.2.3-65.el5_8
mod_auth_kerb-5.1-3.el5_7.1

How reproducible:
Appears often, but randomly

Steps to Reproduce:
Cannot reproduce reliably
  
Actual results:
Huge CPU load caused by httpd workers

Expected results:
httpd workers don't consume a lot of CPU

Additional info:
Example of CPU load from top:
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 6050 apache    15   0  730m  33m 4116 S 506.5  0.1   4817:30 httpd.worker
16956 apache    18   0  730m  32m 4104 S 383.9  0.1   3513:04 httpd.worker
23840 apache    16   0  730m  35m 4100 S 327.6  0.1   3514:37 httpd.worker
 8926 apache    15   0  646m  32m 4100 S 248.1  0.1   2618:09 httpd.worker
 6052 apache    15   0  690m  33m 4116 S  84.1  0.1 874:47.38 httpd.worker

Comment 1 Joe Orton 2012-09-27 13:45:29 UTC
mod_auth_kerb-5.1-3.el5_7.1 should include the fix for the known threading problem.

Can you get a backtrace from the thread which is hung, or the strace output from one of the workers consuming lots of CPU time?

Comment 2 Joe Orton 2012-09-27 13:47:16 UTC
Also, it would be useful to know whether switching httpd to prefork solves the problem.
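
For reference, a sketch of how the MPM is usually selected on RHEL 5, assuming the stock httpd package layout (the file path and binary names below are that assumption, not something confirmed in this report):

```shell
# /etc/sysconfig/httpd (assumed stock RHEL 5 layout)
# The worker MPM is selected by pointing HTTPD at the worker binary:
#HTTPD=/usr/sbin/httpd.worker
# With that line commented out or removed, the init script starts the
# default prefork binary, /usr/sbin/httpd. Restart httpd afterwards:
#   service httpd restart
```

If the CPU spikes stop under prefork, that would point toward a threading issue rather than a problem in the Kerberos exchange itself.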

Comment 3 Vojtech Juranek 2012-10-04 09:02:55 UTC
Hi,
this is what I got from eng-ops:

[root@jenkins ~]# strace -p 22409 
Process 22409 attached - interrupt to quit

futex(0x47b3a9d0, FUTEX_WAIT, 22419, NULL
22409.pid (END)

Any idea how to get more detailed information about what is happening there?
Thanks
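
One possible way to get more detail, sketched under assumptions: `strace -p PID` without `-f` attaches only to the single thread whose ID equals the PID, and the trace above is consistent with that thread simply waiting for another thread (22419) to finish, so it may not be one of the threads burning CPU. The helper below, `thread_cpu` (a hypothetical name), reads only `/proc` and prints each thread's ID, state, and accumulated CPU ticks so that the busy threads can be picked out.

```shell
#!/bin/sh
# thread_cpu PID: print "TID STATE CPU_TICKS" for every thread of PID.
# Hypothetical helper; relies only on the Linux /proc/PID/task layout.
thread_cpu() {
    pid=$1
    for stat in /proc/"$pid"/task/*/stat; do
        [ -r "$stat" ] || continue
        tid=${stat%/stat}
        tid=${tid##*/}
        # Drop "pid (comm) " so the remaining fields start at the state.
        # After that cut: field 1 = state, 12 = utime, 13 = stime.
        rest=$(sed 's/^.*) //' "$stat" 2>/dev/null)
        [ -n "$rest" ] || continue
        set -- $rest
        echo "$tid $1 $(( ${12} + ${13} ))"
    done
}

# Possible usage (PIDs are taken from this report; commands are a sketch):
#   thread_cpu 22409 | sort -k3 -n -r | head    # busiest threads first
#   strace -f -p 22409                          # -f also traces the other threads
#   gdb -p 22409 -batch -ex 'thread apply all bt'   # backtrace of every thread
```

Running the helper twice a few seconds apart and comparing the tick counts shows which threads are actually spinning; `top -H -p 22409` gives a similar per-thread view interactively. The gdb backtrace is most useful with the httpd and mod_auth_kerb debuginfo packages installed.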

