Bug 117832 - httpd.worker consumes 100% CPU
Summary: httpd.worker consumes 100% CPU
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: httpd
Version: 3.0
Hardware: i686
OS: Linux
medium
high
Target Milestone: ---
Assignee: Joe Orton
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2004-03-09 00:42 UTC by Franky Wong
Modified: 2007-11-30 22:07 UTC (History)
1 user (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-03-09 19:45:55 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Franky Wong 2004-03-09 00:42:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6)
Gecko/20040113

Description of problem:
I am using the worker MPM on Apache.  I start Apache with 4 children,
each maintaining a minimum of 50 threads.  After running for a few
hours, apache (e.g. one of the threads) starts to consume 100% CPU. 
Given a few more hours, there will be enough threads to take up all
four CPU in the webserver.

When the problem happens, I stop the network load balancer from
sending any more HTTP requests to the webserver.  With no requests
coming in, apache continue to consuming 100% CPU.

The problem happens on both the Apache 2.0.46 (/usr/sbin/httpd.worker)
that comes with RHEL 3.0, and Apache 2.0.48 that I compile from source.

Other webservers on Apache 2.0.48 and the same configuration, but
running RHEL 2.1, do not have this problem.

When the problem happens, I could use "top", "ps -fmHC httpd.worker"
or "ps -fmHC httpd" to find out the thread which is taking up the CPU.

Then I use gdb and attach the offending thread, which seems to be
spending its time in malloc_consolidate () from /lib/tls/libc.so.6. 
Other threads of the same parent process are in in _dl_sysinfo_int80
() from /lib/ld-linux.so.2.  The parent process itself is in
pthread_join () from /lib/tls/libpthread.so.0.  I think the parent
process and other threads are doing fine.  It is just the
malloc_consolidate but I cannot tell for sure.

Version-Release number of selected component (if applicable):
httpd-2.0.46-26.ent

How reproducible:
Always

Steps to Reproduce:
1. Start apache
2. Wait for an hour or two

Actual Results:  One of the apache threads starts to consume 100% CPU.
 Given enough time, there will be enough threads consuming all CPU on
the system.  They continue to consume CPU even when there is no http
traffic to the system.

Expected Results:  All HTTP requests require little or no server
processing.  Apache with the same configuration on other RHEL 2.1
webservers, and serving the same HTTP requests, take up less than 10%
CPU time.

Additional info:

# uname -a
Linux paprika 2.4.21-9.0.1.ELsmp #1 SMP Mon Feb 9 22:26:51 EST 2004
i686 i686 i386 GNU/Linux

RHEL 3.0 was installed with all package collections.

I have these in /etc/sysctl.conf.

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
kernel.sysrq = 0
kernel.core_uses_pid = 1
fs.file-max = 65536
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=600
net.ipv4.tcp_syncookies=1
net.ipv4.icmp_ignore_bogus_error_responses=1

I use "ulimit -H -n 8192" in the script that starts apache.

/lib/tls/libc.so.6 is a sym link to libc-2.3.2.so, also in /lib/tls.

/lib/tls/libpthread.so.0 is a sym link to libpthread-0.60.so, also in
/lib/tls.

Comment 1 Joe Orton 2004-03-09 08:52:24 UTC
Hello, thanks for the report.  Can you give an idea of what modules
are in use here: is the server handling mostly static content, or is 
mod_include being exercised etc?

Can you get a backtrace from gdb out of a thread hung in
malloc_consolidate?

Comment 2 Joe Orton 2004-03-09 09:47:52 UTC
It would also be useful to know whether SSL is in use.

Comment 3 Franky Wong 2004-03-09 17:35:13 UTC
SSL is used, mod_include is not.  Also ColdFusion MX 6.1 is used and 
the module is compiled from source.  The webserver is handling static 
content, and sending dynamic content (.cfm files) to the local JRun 
server via the ColdFusion jrun module.

LoadModule access_module modules/mod_access.so
LoadModule auth_module modules/mod_auth.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule ssl_module modules/mod_ssl.so
LoadModule mime_module modules/mod_mime.so
LoadModule status_module modules/mod_status.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule info_module modules/mod_info.so
LoadModule dir_module modules/mod_dir.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule 
jrun_module /raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so

The webserver was restarted this morning and I guess it has 
not "accumulated" enough time / traffic for any of the httpd 
processes to consume the CPU yet.  I will send in the backtrace when 
that happens.

The same problem happens on Xvnc on the same system (bug 117885), and 
apache 1.3.27 on another RHEL 3 system that has the same 
configuration (bug 117886).

Comment 4 Franky Wong 2004-03-09 19:07:54 UTC
# ps -fmHC httpd
UID        PID  PPID  C STIME TTY          TIME CMD
root     25041     1  0 Mar08 ?        00:00:04 /raid/apache/bin/httpd
-k start
apache   30029 25041  0 10:20 ?        00:00:00  
/raid/apache/bin/httpd -k start
apache   30031 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
apache   30033 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
apache   30038 30029 91 10:20 ?        00:44:05    
/raid/apache/bin/httpd -k start
apache   30053 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
apache   30055 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
apache   30063 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
apache   30064 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
apache   30068 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
apache   30073 30029  0 10:20 ?        00:00:00    
/raid/apache/bin/httpd -k start
...

# gdb
GNU gdb Red Hat Linux (6.0post-0.20031117.6rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-redhat-linux-gnu".
(gdb) attach 30029
Attaching to process 30029
Reading symbols from /raid/apache/bin/httpd...(no debugging symbols
found)...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libssl.so.4...(no debugging symbols
found)...done.
Loaded symbols for /lib/libssl.so.4
Reading symbols from /lib/libcrypto.so.4...(no debugging symbols
found)...done.
Loaded symbols for /lib/libcrypto.so.4
Reading symbols from /raid/apache/lib/libaprutil-0.so.0...(no
debugging symbols found)...done.
Loaded symbols for /raid/apache/lib/libaprutil-0.so.0
Reading symbols from /usr/lib/libgdbm.so.2...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libgdbm.so.2
Reading symbols from /lib/libdb-4.1.so...(no debugging symbols
found)...done.
Loaded symbols for /lib/libdb-4.1.so
Reading symbols from /lib/tls/libpthread.so.0...(no debugging symbols
found)...done.
[Thread debugging using libthread_db enabled]
[New Thread -1223506176 (LWP 30029)]
[New Thread -1675289680 (LWP 30073)]
[New Thread -1622840400 (LWP 30068)]
[New Thread -1580880976 (LWP 30064)]
[New Thread -1570391120 (LWP 30063)]
[New Thread -1486472272 (LWP 30055)]
[New Thread -1465492560 (LWP 30053)]
[New Thread -1308144720 (LWP 30038)]
[New Thread -1255695440 (LWP 30033)]
[New Thread -1234715728 (LWP 30031)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /usr/lib/libexpat.so.0...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libexpat.so.0
Reading symbols from /raid/apache/lib/libapr-0.so.0...(no debugging
symbols found)...done.
Loaded symbols for /raid/apache/lib/libapr-0.so.0
Reading symbols from /lib/tls/librt.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib/tls/librt.so.1
Reading symbols from /lib/tls/libm.so.6...(no debugging symbols
found)...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libnsl.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libc.so.6...(no debugging symbols
found)...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /usr/kerberos/lib/libgssapi_krb5.so.2...(no
debugging symbols found)...done.
Loaded symbols for /usr/kerberos/lib/libgssapi_krb5.so.2
Reading symbols from /usr/kerberos/lib/libkrb5.so.3...(no debugging
symbols found)...done.
Loaded symbols for /usr/kerberos/lib/libkrb5.so.3
Reading symbols from /usr/kerberos/lib/libcom_err.so.3...(no debugging
symbols found)...done.
Loaded symbols for /usr/kerberos/lib/libcom_err.so.3
Reading symbols from /usr/kerberos/lib/libk5crypto.so.3...(no
debugging symbols found)...done.
Loaded symbols for /usr/kerberos/lib/libk5crypto.so.3
Reading symbols from /lib/libresolv.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /raid/apache/modules/mod_access.so...(no
debugging symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_access.so
Reading symbols from /raid/apache/modules/mod_auth.so...(no debugging
symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_auth.so
Reading symbols from /raid/apache/modules/mod_log_config.so...(no
debugging symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_log_config.so
Reading symbols from /raid/apache/modules/mod_setenvif.so...(no
debugging symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_setenvif.so
Reading symbols from /raid/apache/modules/mod_ssl.so...(no debugging
symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_ssl.so
Reading symbols from /raid/apache/modules/mod_mime.so...(no debugging
symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_mime.so
Reading symbols from /raid/apache/modules/mod_status.so...(no
debugging symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_status.so
Reading symbols from /raid/apache/modules/mod_autoindex.so...(no
debugging symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_autoindex.so
Reading symbols from /raid/apache/modules/mod_info.so...(no debugging
symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_info.so
Reading symbols from /raid/apache/modules/mod_dir.so...(no debugging
symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_dir.so
Reading symbols from /raid/apache/modules/mod_alias.so...(no debugging
symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_alias.so
Reading symbols from /raid/apache/modules/mod_rewrite.so...(no
debugging symbols found)...done.
Loaded symbols for /raid/apache/modules/mod_rewrite.so
Reading symbols from
/raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so...(no debugging
symbols found)...done.
Loaded symbols for /raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so
Reading symbols from /lib/libgcc_s.so.1...(no debugging symbols
found)...done.
Loaded symbols for /lib/libgcc_s.so.1
0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2

(gdb) info threads
  10 Thread -1234715728 (LWP 30031)  0xb75ebc32 in _dl_sysinfo_int80
() from /lib/ld-linux.so.2
  9 Thread -1255695440 (LWP 30033)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
  8 Thread -1308144720 (LWP 30038)  0xb7235123 in malloc_consolidate
() from /lib/tls/libc.so.6
  7 Thread -1465492560 (LWP 30053)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
  6 Thread -1486472272 (LWP 30055)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
  5 Thread -1570391120 (LWP 30063)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
  4 Thread -1580880976 (LWP 30064)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
  3 Thread -1622840400 (LWP 30068)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
  2 Thread -1675289680 (LWP 30073)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2
  1 Thread -1223506176 (LWP 30029)  0xb75ebc32 in _dl_sysinfo_int80 ()
from /lib/ld-linux.so.2

(gdb) thread 8
[Switching to thread 8 (Thread -1308144720 (LWP 30038))]#0  0xb7235123
in malloc_consolidate ()
   from /lib/tls/libc.so.6

(gdb) bt
#0  0xb7235123 in malloc_consolidate () from /lib/tls/libc.so.6
#1  0xb7234769 in _int_malloc () from /lib/tls/libc.so.6
#2  0xb7233b0d in malloc () from /lib/tls/libc.so.6
#3  0xb70d42dd in tvToMillis () from
/raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so
#4  0xb70d13cf in ?? () from
/raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so
#5  0xb2074810 in ?? ()
#6  0x000002df in ?? ()
#7  0x00000004 in ?? ()
#8  0x00000026 in ?? ()
#9  0x08203490 in ?? ()
#10 0x00000018 in ?? ()
#11 0xb2074268 in ?? ()
#12 0x000002de in ?? ()
#13 0xb2074810 in ?? ()
#14 0x082016f0 in ?? ()
#15 0xb2074268 in ?? ()
#16 0xb70d2d54 in longerThan () from
/raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so
Previous frame identical to this frame (corrupt stack?)

(gdb) detach
Detaching from program: /raid/apache/bin/httpd, process 30029


Comment 5 Joe Orton 2004-03-09 19:45:55 UTC
We can only deal with bug reports for bugs in software which is
included in Red Hat Enterprise Linux; it's not clear what
"/raid/apache/bin/httpd" is in your backtraces above.

Backtraces into malloc are usually some kind of memory corruption bugs
(such as a double-free); since the module calling malloc here seems to
be the third-party ColdFusion software, that would be the first
suspect of the cause of the bug; you should report this issue to the
module vendor in the first place.

#0  0xb7235123 in malloc_consolidate () from /lib/tls/libc.so.6
#1  0xb7234769 in _int_malloc () from /lib/tls/libc.so.6
#2  0xb7233b0d in malloc () from /lib/tls/libc.so.6
#3  0xb70d42dd in tvToMillis () from
/raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so
#4  0xb70d13cf in ?? () from
/raid/coldfusion/runtime/lib/wsconfig/1/mod_jrun20.so

To track this down further, you could use debugging tricks such as
running httpd.worker under ElectricFence:

 # ef /usr/sbin/httpd.worker

or enabling extra malloc checking in glibc:

 # MALLOC_CHECK_=2 /usr/sbin/httpd.worker

There is one known obscure memory handling bug in httpd.worker but it
looks unlikely that it is the cause this bug; you can try the test
update packages at:

  http://people.redhat.com/jorton/Taroon-httpd/

to eliminate that possibility.

Please re-open this report if a likely issue in the httpd package is
identified.



Note You need to log in before you can comment on or make changes to this bug.