Description of problem:
After switching to Kerberos authentication (via mod_auth_kerb) we observed huge CPU load caused by httpd workers (example from top is below). It appears randomly, but quite often, and httpd needs to be restarted to fix it. There is probably some deadlock, as strace shows a worker waiting on a futex:

futex(0x43a339d0, FUTEX_WAIT, 6054, NULL

Not sure if it's related, but in the log we found the following error messages:

[Wed Sep 12 03:26:23 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:27:45 2012] [error] ajp_read_header: ajp_ilink_receive failed
[Wed Sep 12 03:27:45 2012] [error] (120006)APR does not understand this error code: proxy: read response failed from 127.0.0.1:8009 (localhost)
[Wed Sep 12 03:36:43 2012] [error] [client 10.34.3.225] krb5_get_init_creds_password() failed: KDC reply did not match expectations, referer: https://jenkins.mw.lab.eng.bos.redhat.com/hudson/job/soa-6.0/30/

The problem appeared once we started to use mod_auth_kerb, so it is very likely a bug in mod_auth_kerb (and may be related to the bugfix for BZ #734098).

Version-Release number of selected component (if applicable):
httpd-2.2.3-65.el5_8
mod_auth_kerb-5.1-3.el5_7.1

How reproducible:
Appears often, but randomly

Steps to Reproduce:
Cannot reproduce reliably

Actual results:
Huge CPU load caused by httpd workers

Expected results:
httpd workers don't consume a lot of CPU

Additional info:
Example of CPU load from top:

  PID USER     PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 6050 apache   15  0  730m  33m 4116 S 506.5  0.1   4817:30 httpd.worker
16956 apache   18  0  730m  32m 4104 S 383.9  0.1   3513:04 httpd.worker
23840 apache   16  0  730m  35m 4100 S 327.6  0.1   3514:37 httpd.worker
 8926 apache   15  0  646m  32m 4100 S 248.1  0.1   2618:09 httpd.worker
 6052 apache   15  0  690m  33m 4116 S  84.1  0.1 874:47.38 httpd.worker
mod_auth_kerb-5.1-3.el5_7.1 should include the fix for the known threading problem. Can you get a backtrace from the thread which is hung, or the strace output from one of the workers consuming lots of CPU time?
Also, it would be useful to know whether switching httpd to prefork solves the problem.
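For reference, a rough sketch of how the data could be collected (assuming gdb and, ideally, the httpd debuginfo packages are installed on the box; the PID is a placeholder):

# find a busy worker and its threads
top -H -u apache

# attach gdb and dump all thread stacks
gdb -p <pid-of-busy-httpd.worker>
(gdb) thread apply all bt
(gdb) detach
(gdb) quit

For the prefork test: on RHEL 5 the worker MPM is normally enabled via the HTTPD=/usr/sbin/httpd.worker line in /etc/sysconfig/httpd, so commenting that line out and restarting httpd should switch back to the prefork binary (worth double-checking against the local setup).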
Hi, this is what I got from eng-ops:

[root@jenkins ~]# strace -p 22409
Process 22409 attached - interrupt to quit
futex(0x47b3a9d0, FUTEX_WAIT, 22419, NULL

Any idea how to get more detailed information about what is happening there? Thanks
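One way to get more out of strace here (a sketch; the PID/TIDs below are placeholders): strace -p 22409 appears to have attached only to the main thread, which is just parked in the futex wait, so the threads actually burning CPU are not shown. Listing the per-thread IDs and tracing those directly should be more telling:

# list the worker's threads and spot the busy ones
top -H -p 22409
ps -eLf | grep 22409

# trace a specific busy thread, or summarize its syscalls
strace -p <busy-tid>
strace -c -p <busy-tid>

A gdb "thread apply all bt" dump from the same process (as suggested above) would also show where the spinning threads are stuck.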