Bug 848182 - Runaway site processes are consuming 100% cpu
Status: CLOSED WORKSFORME
Product: OpenShift Origin
Classification: Red Hat
Component: Website
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Clayton Coleman
QA Contact: libra bugs
Keywords: FutureFeature, Triaged
Depends On:
Blocks:
Reported: 2012-08-14 16:35 EDT by Thomas Wiest
Modified: 2015-05-14 21:13 EDT

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-08-21 11:12:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Thomas Wiest 2012-08-14 16:35:52 EDT
Description of problem:
Some site processes are consuming a lot of CPU and appear to be doing no useful work. They also never seem to finish.

If I restart Apache, things go back to normal for a while, but after a few hours these processes start popping up again.


Here's what they look like in top (note the unusually high TIME+):
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17598 libra_pa  20   0  213m  99m 3652 R 66.7  1.4 659:24.89 Rails: /var/www/stickshift/site            
19645 libra_pa  20   0  206m  92m 1984 R 62.5  1.3 653:30.31 Rails: /var/www/stickshift/site


I can't figure out what they're doing, because strace prints nothing (note that I let each strace sit for a couple of minutes before killing it).

# time strace -f -s600 -p 17598
Process 17598 attached - interrupt to quit
^CProcess 17598 detached

real    2m6.192s
user    0m0.000s
sys     0m0.011s
# time strace -f -s600 -p 19645
Process 19645 attached - interrupt to quit
^CProcess 19645 detached

real    2m19.966s
user    0m0.001s
sys     0m0.007s
#
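A silent strace on a process that top reports in state R usually means the process is making no system calls at all, i.e. it is spinning in user space. One way to check that is to sample the utime and stime counters in /proc/<pid>/stat twice and compare the deltas. The following is a minimal sketch of that idea, not something that was run for this bug; the function names and the sampling interval are assumptions made for illustration.

```python
import time


def cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The comm field (field 2) may contain spaces or parentheses, so split
    # after the LAST ')' and index from there. rest[0] is field 3 (state),
    # which makes utime (field 14) rest[11] and stime (field 15) rest[12].
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]), int(rest[12])


def user_share(before, after):
    """Fraction of the CPU time consumed between two samples spent in user mode."""
    d_user = after[0] - before[0]
    d_sys = after[1] - before[1]
    total = d_user + d_sys
    return d_user / total if total else 0.0


def sample_user_share(pid, interval=5.0):
    """Sample /proc/<pid>/stat twice, `interval` seconds apart (Linux only)."""
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return user_share(before, after)
```

A result close to 1.0 for a busy process, for example from sample_user_share(17598), would be consistent with a user-space busy loop and with strace printing nothing.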


Passenger still shows them as active:
/var/www/stickshift/site:
  App root: /var/www/stickshift/site
  * PID: 28694   Sessions: 0    Processed: 3598    Uptime: 6m 46s
  * PID: 8410    Sessions: 0    Processed: 5568    Uptime: 18m 2s
  * PID: 28708   Sessions: 0    Processed: 13292   Uptime: 41m 30s
  * PID: 28384   Sessions: 0    Processed: 8214    Uptime: 43m 10s
  * PID: 28678   Sessions: 0    Processed: 213     Uptime: 6m 47s
  * PID: 28949   Sessions: 0    Processed: 156     Uptime: 6m 34s
  * PID: 28718   Sessions: 0    Processed: 8636    Uptime: 41m 22s
  * PID: 17598   Sessions: 1    Processed: 31      Uptime: 14h 19m 27s
  * PID: 19645   Sessions: 1    Processed: 5       Uptime: 14h 8m 20s
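The two runaway workers stand out in this listing: each holds an open session, has been up for many hours, and has processed almost no requests. A small script can pick such workers out of the status text automatically. The sketch below assumes the line format shown above; the function names and the thresholds (min_uptime, max_rate) are hypothetical values chosen for illustration, not Passenger settings.

```python
import re

# Matches lines such as:
#   * PID: 17598   Sessions: 1    Processed: 31      Uptime: 14h 19m 27s
LINE_RE = re.compile(
    r"PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+Processed:\s*(\d+)\s+Uptime:\s*(.+?)\s*$"
)

# The listing above only shows h/m/s; "d" is accepted here as an assumption.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def uptime_seconds(text):
    """Convert an uptime string like '14h 19m 27s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", text))


def suspect_workers(status_text, min_uptime=3600, max_rate=1.0 / 60):
    """Return PIDs of workers that look stuck.

    A worker is flagged when it has at least one open session, has been up
    for at least min_uptime seconds, and has averaged no more than max_rate
    processed requests per second over its lifetime.
    """
    suspects = []
    for line in status_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        pid, sessions, processed = (int(match.group(i)) for i in (1, 2, 3))
        uptime = uptime_seconds(match.group(4))
        if sessions > 0 and uptime >= min_uptime and processed / uptime <= max_rate:
            suspects.append(pid)
    return suspects
```

Fed the listing above, this flags PIDs 17598 and 19645 and none of the healthy workers, since those all report Sessions: 0.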



Version-Release number of selected component (if applicable):
rhc-site-0.96.8-1.el6_3.noarch


How reproducible:
Sporadic, but it reliably recurs within a few hours of restarting httpd.


Steps to Reproduce:
1. Unknown. I just restart httpd, and after a few hours I see these processes taking up a lot of CPU.

  
Actual results:
Processes that are taking up a lot of CPU, but seem to not be doing anything.


Expected results:
Processes that are only taking up CPU power to actually do something.
Comment 1 Clayton Coleman 2012-08-15 11:28:09 EDT
Does this happen in the broker as well?
Comment 2 Thomas Wiest 2012-08-15 13:17:10 EDT
It doesn't seem to.
Comment 3 Clayton Coleman 2012-08-20 20:49:01 EDT
Thomas, any updates on getting repro info from production?

Agreed in defect triage that this can miss the sprint while we debug.
Comment 4 Thomas Wiest 2012-08-21 11:12:17 EDT
We saw it twice in one day. I restarted httpd both times, and since the second restart we haven't seen the problem.

I'm going to close this bug and if it happens again, I'll re-open.
