Bug 821836
| Summary: | top stops showing the most cpu consuming process after a while |
|---|---|
| Product: | [Fedora] Fedora |
| Component: | kernel |
| Version: | 16 |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED INSUFFICIENT_DATA |
| Severity: | medium |
| Priority: | unspecified |
| Reporter: | Jussi Eloranta <eloranta> |
| Assignee: | Kernel Maintainer List <kernel-maint> |
| QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| CC: | fedora, gansalmon, itamar, jcapik, jforbes, jonathan, kernel-maint, madhu.chinakonda |
| Type: | Bug |
| Doc Type: | Bug Fix |
| Last Closed: | 2012-11-14 15:29:02 UTC |
Description

Jussi Eloranta
2012-05-15 14:48:39 UTC

Hello Jussi. May I ask you to collect some debug data? Once top starts showing the 0% load, please run the following command (replacing the `<pid>` placeholder with the PID of the "classical" process), wait approximately 10 seconds, and then stop it by pressing CTRL+C:

```shell
while : ; do cat /proc/<pid>/stat; sleep 1; done > process-stat.txt
```

And please don't stop the "classical" process once you have the result; I might need more information after analysing it. Attach the process-stat.txt here, please. Thank you. Regards, Jaromir.

Maybe one more note. It would be nice to have a similar file recording the process state shortly before it falls down to 0%. It doesn't have to be shortly before it happens; let's say one day before. Just don't do it immediately after the process starts. So, if possible, provide me with two files: one recording the stats before the issue appears and one after. Or, if you have enough disk space, you can start the recording shortly after starting the "classical" process and stop it when the issue appears. The recording takes nearly 200 bytes per second (about 16 MB per day); I believe your drive can handle that. Please let me know. Regards, Jaromir.

Created attachment 584982 [details]
requested stat file for the currently running program
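Each line the recording loop produces is one snapshot of `/proc/<pid>/stat`. To check whether the CPU-time counters are advancing between snapshots, the relevant fields are `utime` and `stime` (fields 14 and 15 per proc(5)). A minimal parsing sketch, assuming the standard Linux stat format; the sample line below is fabricated for illustration:

```python
# Sketch: extract the CPU-time counters (utime, stime) from one line of the
# recorded /proc/<pid>/stat output. The process name (comm) may contain
# spaces, so we split after the closing parenthesis rather than on whitespace.

def parse_stat(line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    rest = line.rsplit(')', 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are overall fields 14 and
    # 15, i.e. indices 11 and 12 after pid and comm are stripped.
    return int(rest[11]), int(rest[12])

# Fabricated sample line for illustration:
sample = "1234 (classical) R 1 1234 1234 0 -1 4194304 500 0 0 0 9876 321 0 0 20 0 32 0 100 0 0"
print(parse_stat(sample))  # -> (9876, 321)
```

If these two numbers are identical on every recorded line, the kernel is not accruing CPU time to the process, which is exactly what the analysis below concluded.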
Hi Jussi. And that's it: the counters are not changing. So, if you're sure the process takes the whole CPU, then this is very likely a kernel issue. Would you like me to change the component to kernel? Thanks and have a nice day. Regards, Jaromir.

Yes, it is running fine, as evidenced by its output: it outputs data every iteration and prints the wall clock time per iteration. The load average also stays at 32 (the process uses 32 CPUs). Yes, change this to kernel; it would be nice to get this sorted out.

Ok, thank you. Changing to kernel then.

What kernel version was this with? Do the recent 3.4 or 3.5 updates fix it?

Mass update to all open bugs: Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug; a fresh bug referencing this one in a comment is sufficient.)

With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.
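The "counters are not changing" observation explains the symptom directly: top derives per-process CPU% from the delta of `utime + stime` between two samples, so a process whose counters freeze is shown at 0% even while it keeps the CPU busy. A minimal sketch of that calculation, assuming USER_HZ is 100 (the typical value; query it with `getconf CLK_TCK`):

```python
# Sketch of how a tool like top computes per-process CPU% between two
# /proc/<pid>/stat samples: delta of (utime + stime) clock ticks divided
# by the wall-clock interval expressed in ticks.

USER_HZ = 100  # clock ticks per second (assumed; typical on Linux/x86)

def cpu_percent(utime1, stime1, utime2, stime2, interval_s):
    """CPU usage in percent between two samples taken interval_s apart."""
    delta_ticks = (utime2 + stime2) - (utime1 + stime1)
    return 100.0 * delta_ticks / (USER_HZ * interval_s)

# A process that accrued 950 ticks over 10 s of wall clock used ~95% of one CPU:
print(cpu_percent(1000, 200, 1900, 250, 10.0))  # -> 95.0

# Frozen counters, as seen in the attached stat file, yield 0% regardless of
# actual CPU activity:
print(cpu_percent(1000, 200, 1000, 200, 10.0))  # -> 0.0
```

This is consistent with the report: the load average still reflected 32 busy CPUs (load is computed from the run queue, not from these counters), while top's per-process figure dropped to 0%.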