Description of problem:
Some site processes are consuming a lot of CPU but appear not to be doing anything, and they never seem to finish. If I restart Apache, things go back to normal for a while, but after a few hours these processes start popping up again. Here's what they look like in top (note the unusual TIME+ values):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
17598 libra_pa  20   0  213m  99m 3652 R 66.7  1.4 659:24.89 Rails: /var/www/stickshift/site
19645 libra_pa  20   0  206m  92m 1984 R 62.5  1.3 653:30.31 Rails: /var/www/stickshift/site

I can't figure out what they're doing, because strace prints nothing (I let each strace run for a couple of minutes before killing it):

# time strace -f -s600 -p 17598
Process 17598 attached - interrupt to quit
^CProcess 17598 detached

real    2m6.192s
user    0m0.000s
sys     0m0.011s

# time strace -f -s600 -p 19645
Process 19645 attached - interrupt to quit
^CProcess 19645 detached

real    2m19.966s
user    0m0.001s
sys     0m0.007s

Passenger still shows them as active:

/var/www/stickshift/site:
  App root: /var/www/stickshift/site
  * PID: 28694   Sessions: 0   Processed: 3598    Uptime: 6m 46s
  * PID: 8410    Sessions: 0   Processed: 5568    Uptime: 18m 2s
  * PID: 28708   Sessions: 0   Processed: 13292   Uptime: 41m 30s
  * PID: 28384   Sessions: 0   Processed: 8214    Uptime: 43m 10s
  * PID: 28678   Sessions: 0   Processed: 213     Uptime: 6m 47s
  * PID: 28949   Sessions: 0   Processed: 156     Uptime: 6m 34s
  * PID: 28718   Sessions: 0   Processed: 8636    Uptime: 41m 22s
  * PID: 17598   Sessions: 1   Processed: 31      Uptime: 14h 19m 27s
  * PID: 19645   Sessions: 1   Processed: 5       Uptime: 14h 8m 20s

Version-Release number of selected component (if applicable):
rhc-site-0.96.8-1.el6_3.noarch

How reproducible:
Sporadic, but reliably reproducible.

Steps to Reproduce:
1. Unknown. I just restart httpd, and after a few hours I see these processes taking up a lot of CPU.

Actual results:
Processes that take up a lot of CPU but don't seem to be doing anything.
Expected results:
Processes use CPU only when they are actually doing work.
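The Passenger listing above hints at a rough way to spot the stuck workers: each one holds an open session (Sessions: 1) with an uptime of many hours, while the other workers sit at Sessions: 0 with uptimes under an hour. Below is a minimal Python sketch that flags such workers from passenger-status text. It assumes the line format quoted in this report; `suspect_workers`, `uptime_seconds`, and the one-hour threshold are hypothetical names and values chosen for illustration, and support for a days (`d`) unit is an assumption not seen in this output.

```python
import re
from typing import List

# Matches one worker entry from the passenger-status listing, e.g.
# "* PID: 17598 Sessions: 1 Processed: 31 Uptime: 14h 19m 27s".
# The format is assumed from the output quoted in this report.
WORKER_RE = re.compile(
    r"\*\s*PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+"
    r"Processed:\s*(\d+)\s+Uptime:\s*([\ddhms ]+)"
)

# Seconds per uptime unit; "d" (days) is an assumption, not seen above.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def uptime_seconds(text: str) -> int:
    """Convert an uptime string such as '14h 19m 27s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([dhms])", text))


def suspect_workers(status_text: str, min_uptime: int = 3600) -> List[int]:
    """Return PIDs that hold an open session and have been up at least min_uptime seconds."""
    suspects = []
    for pid, sessions, _processed, uptime in WORKER_RE.findall(status_text):
        if int(sessions) > 0 and uptime_seconds(uptime) >= min_uptime:
            suspects.append(int(pid))
    return suspects


if __name__ == "__main__":
    sample = """
    * PID: 28694   Sessions: 0   Processed: 3598   Uptime: 6m 46s
    * PID: 17598   Sessions: 1   Processed: 31     Uptime: 14h 19m 27s
    * PID: 19645   Sessions: 1   Processed: 5      Uptime: 14h 8m 20s
    """
    print(suspect_workers(sample))  # [17598, 19645]
```

The threshold is a judgment call: a long-running but legitimate request would also hold a session, so a flagged PID is only a candidate for closer inspection, not proof that it is stuck.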
Does this happen in the broker as well?
It doesn't seem to.
Thomas, any updates on getting repro info from production? We agreed in defect triage that this can miss the sprint while we debug it.
We saw it twice in one day. I restarted httpd both times, and since the second restart we haven't seen the problem. I'm going to close this bug; if it happens again, I'll reopen it.