848182 – Runaway site processes are consuming 100% cpu

Summary: Runaway site processes are consuming 100% cpu

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	OKD
Classification:	Red Hat
Component:	Website
Sub Component:
Version:	2.x
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Clayton Coleman
QA Contact:	libra bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-08-14 20:35 UTC by Thomas Wiest
Modified:	2015-05-15 01:13 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2012-08-21 15:12:17 UTC
Target Upstream Version:
Embargoed:

Description Thomas Wiest 2012-08-14 20:35:52 UTC

Description of problem:
Some site processes are consuming a lot of CPU but do not appear to be doing anything, and they never seem to finish.

If I restart Apache, things go back to normal for a while, but after a few hours these processes start popping up again.


Here's what they look like in top (note the unusually large TIME+ values):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17598 libra_pa  20   0  213m  99m 3652 R 66.7  1.4 659:24.89 Rails: /var/www/stickshift/site            
19645 libra_pa  20   0  206m  92m 1984 R 62.5  1.3 653:30.31 Rails: /var/www/stickshift/site
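The TIME+ column in top is cumulative CPU time, printed as minutes:seconds.hundredths. As a minimal sketch (assuming that plain format; top switches to coarser layouts for very large values, which this does not handle), the two values above convert to seconds like this:

```python
def time_plus_to_seconds(value: str) -> float:
    """Convert a top TIME+ value such as "659:24.89" to seconds.

    Assumes the minutes:seconds.hundredths form; coarser formats that
    top uses for very large values are not handled in this sketch.
    """
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


# TIME+ values copied from the top output in this report.
for pid, time_plus in [("17598", "659:24.89"), ("19645", "653:30.31")]:
    secs = time_plus_to_seconds(time_plus)
    print(f"{pid}: {secs:.2f} s of CPU (~{secs / 3600:.1f} h)")
```

That works out to about 39,565 s and 39,210 s, i.e. close to 11 hours of CPU time for each process.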


I can't figure out what they're doing, because strace prints nothing (note that I let it run for about two minutes on each process before interrupting it).

# time strace -f -s600 -p 17598
Process 17598 attached - interrupt to quit
^CProcess 17598 detached

real    2m6.192s
user    0m0.000s
sys     0m0.011s
# time strace -f -s600 -p 19645
Process 19645 attached - interrupt to quit
^CProcess 19645 detached

real    2m19.966s
user    0m0.001s
sys     0m0.007s
#
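strace only reports system calls and signals, so a process that burns CPU while strace stays silent is most likely spinning in user space (for example, looping inside Ruby code) without entering the kernel; a high system share would instead point at time spent inside the kernel. One way to check, sketched here assuming a Linux /proc filesystem, is to sample the utime and stime counters (fields 14 and 15 of /proc/<pid>/stat, in clock ticks) twice and compare them. The 5-second interval is an arbitrary choice:

```python
import os
import sys
import time


def cpu_ticks(stat_line: str) -> tuple[int, int]:
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may contain
    spaces or ')', so split after the LAST ')' instead of on whitespace.
    """
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    return int(rest[11]), int(rest[12])  # fields 14 (utime) and 15 (stime)


def sample(pid: int, interval: float = 5.0) -> tuple[float, float]:
    """Read the counters twice and return (user %, system %) of one CPU."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    with open(f"/proc/{pid}/stat") as f:
        u0, s0 = cpu_ticks(f.read())
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as f:
        u1, s1 = cpu_ticks(f.read())
    scale = 100.0 / (hz * interval)
    return (u1 - u0) * scale, (s1 - s0) * scale


if __name__ == "__main__":
    # PIDs are passed on the command line; nothing runs without them.
    for arg in sys.argv[1:]:
        user_pct, sys_pct = sample(int(arg))
        print(f"{arg}: user {user_pct:.1f}%  system {sys_pct:.1f}%")
```

For example, `python3 cpu_split.py 17598 19645` (the file name is hypothetical). A user share near the top figures with a system share near zero would fit the silent strace output above.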


Passenger still lists both as active, each with one open session:
/var/www/stickshift/site:
  App root: /var/www/stickshift/site
  * PID: 28694   Sessions: 0    Processed: 3598    Uptime: 6m 46s
  * PID: 8410    Sessions: 0    Processed: 5568    Uptime: 18m 2s
  * PID: 28708   Sessions: 0    Processed: 13292   Uptime: 41m 30s
  * PID: 28384   Sessions: 0    Processed: 8214    Uptime: 43m 10s
  * PID: 28678   Sessions: 0    Processed: 213     Uptime: 6m 47s
  * PID: 28949   Sessions: 0    Processed: 156     Uptime: 6m 34s
  * PID: 28718   Sessions: 0    Processed: 8636    Uptime: 41m 22s
  * PID: 17598   Sessions: 1    Processed: 31      Uptime: 14h 19m 27s
  * PID: 19645   Sessions: 1    Processed: 5       Uptime: 14h 8m 20s
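The two runaway PIDs stand out in the Passenger listing: each holds an open session, has processed very few requests, and has been up for over 14 hours, while every other worker has been up for under an hour. A small filter along those lines might look like the sketch below. It assumes the listing comes from `passenger-status` in exactly the layout shown (the `d` day unit is an assumption), and the one-hour threshold is arbitrary:

```python
import re
import sys

# Matches worker lines such as:
#   * PID: 17598   Sessions: 1    Processed: 31      Uptime: 14h 19m 27s
WORKER = re.compile(
    r"\*\s+PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+Processed:\s*(\d+)\s+Uptime:\s*(.+)"
)
# "d" (days) is assumed; the report itself only shows h, m and s.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def uptime_seconds(text: str) -> int:
    """Convert an uptime such as "14h 19m 27s" to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", text))


def suspicious(status: str, min_uptime: int = 3600):
    """Yield (pid, sessions, processed, uptime_s) for long-lived workers holding a session."""
    for match in WORKER.finditer(status):
        pid, sessions, processed = (int(g) for g in match.groups()[:3])
        uptime = uptime_seconds(match.group(4))
        if sessions > 0 and uptime >= min_uptime:
            yield pid, sessions, processed, uptime


if __name__ == "__main__":
    # Each argument is a file containing saved passenger-status output.
    for path in sys.argv[1:]:
        with open(path) as f:
            for pid, sessions, processed, uptime in suspicious(f.read()):
                print(f"PID {pid}: {sessions} session(s), {processed} processed, up {uptime} s")
```

For example, `passenger-status | python3 passenger_filter.py /dev/stdin` (the file name is hypothetical). Against the listing above it reports only PIDs 17598 and 19645.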



Version-Release number of selected component (if applicable):
rhc-site-0.96.8-1.el6_3.noarch


How reproducible:
Sporadic, but it recurs reliably a few hours after an httpd restart.


Steps to Reproduce:
1. Unknown. I restart httpd, and after a few hours these processes appear and start using a lot of CPU.

  
Actual results:
Processes take up a lot of CPU but do not appear to be doing anything.


Expected results:
Processes use CPU only when they are doing actual work.

Comment 1 Clayton Coleman 2012-08-15 15:28:09 UTC

Does this happen in the broker as well?

Comment 2 Thomas Wiest 2012-08-15 17:17:10 UTC

It doesn't seem to.

Comment 3 Clayton Coleman 2012-08-21 00:49:01 UTC

Thomas, any updates on getting repro info from production?

Agreed in defect triage that this can miss the sprint while we debug it.

Comment 4 Thomas Wiest 2012-08-21 15:12:17 UTC

We saw it twice in one day. I restarted httpd both times, and we haven't seen the problem since the second restart.

I'm going to close this bug and if it happens again, I'll re-open.