Red Hat Bugzilla – Bug 600465
Strange performance degradation modality (some relation to udev suspected)
Last modified: 2018-04-11 08:05:56 EDT
Created attachment 421329 [details]
X server log
Description of problem:
Short version: if CPU usage spikes, it never comes back down
Long version: I can get my machine into a state where CPU usage is high, sometimes by watching a flash video in my browser (perhaps in conjunction with some other activity), sometimes by other means. Naturally, at such times the load average climbs and the X server becomes less responsive. Normally the responsiveness should return to normal when the CPU hogs complete (or are killed), but this now reliably fails to happen. At such times, "top" shows Xorg using > 70%, a couple of udevd's using between 30% and 60% each, and for some reason apcupsd is always near the top of the list too.
Here's the weird part: I can ctrl-alt-f2 into a (textual) virtual terminal and my machine is just as snappy as always. The CPU usage and load average drop to near zero. But if I alt-f1 back to the X session, I'm immediately in performance hell again.
I've tried killing off offending processes, but sooner or later the performance problem returns, usually without any obvious trigger (unlike when it first appears). The only solution is a restart, and on one occasion only a cold restart worked!
Version-Release number of selected component (if applicable):
See this thread about udevd sucking up CPU cycles under Ubuntu: http://ubuntuforums.org/showthread.php?t=1361018
At the moment my desktop is totally unusable because of this bug, despite several warm and cold restarts. Its recurrence has gotten worse since I described it above.
Not sure whether the X server really is the culprit, but it's still the case that switching to a textual virtual terminal makes the system immediately responsive again.
Very likely to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=528312
Well, bug 528312 is temporary ... could you wait a minute or two? Does the system settle down after some time?
The system definitely doesn't settle down after waiting even as many as six or ten hours or so, which is about the longest I've observed the symptoms happening continuously. However, by killing off this or that process -- never the same ones, seemingly -- it's possible to get ordinary performance back for a short time, but after a few minutes the symptoms spontaneously return.
I've confirmed, via "udevadm monitor", that a "udev flood" relating to the Intel DRM system is happening at these times of performance degradation.
It's worth noting that this problem was recurring a few times each day when I first reported it, but in the past few days it has happened less than once per day. The only difference I can think of between then and now has been the ambient temperature -- it was very hot, but it cooled off. This weekend is supposed to be very hot again, so we'll see if the problem worsens...
BTW, if you read through all the comments of bug 528312 you'll see that though the symptoms are brief for some, they're persistent for others (like me), so "temporary" isn't really a good description of the problem.
Thanks for the bug report. We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.
Please add drm.debug=0x04 to the kernel command line, restart the computer, wait until the freeze happens, and then collect the following information via ssh:
* your X server config file (/etc/X11/xorg.conf, if available),
* X server log file (/var/log/Xorg.*.log)
* output of the dmesg command, and
* system log (/var/log/messages)
and attach to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.
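If it helps, the files listed above can be gathered into one directory before attaching them. What follows is only a minimal Python sketch under stated assumptions, not an official Fedora tool: the function names collect_diagnostics and save_dmesg are hypothetical, the default paths are the ones named in the list, and reading /var/log/messages usually requires root.

```python
import glob
import shutil
import subprocess
from pathlib import Path

# Paths taken from the list above; the Xorg entry is a glob pattern.
DEFAULT_PATTERNS = (
    "/etc/X11/xorg.conf",
    "/var/log/Xorg.*.log",
    "/var/log/messages",
)


def collect_diagnostics(dest, patterns=DEFAULT_PATTERNS):
    """Copy every existing file matching patterns into dest.

    Returns the list of file names that were copied. Patterns that
    match nothing (for example a missing xorg.conf) are skipped.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            name = Path(path).name
            shutil.copy(path, dest / name)  # files stay uncompressed
            copied.append(name)
    return copied


def save_dmesg(dest):
    """Write the current dmesg output to dest/dmesg.txt (hypothetical name)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    (Path(dest) / "dmesg.txt").write_text(result.stdout)
```

Running collect_diagnostics("/tmp/bug600465") followed by save_dmesg("/tmp/bug600465") over ssh while the problem is occurring would leave the individual files ready to attach; the directory name is just an example.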
We will review this issue again once you've had a chance to attach this information.
Thanks in advance.
(In reply to comment #5)
> BTW, if you read through all the comments of bug 528312 you'll see that though
> the symptoms are brief for some, they're persistent for others (like me), so
> "temporary" isn't really a good description of the problem.
Yes, I have suspected for some time that we have two bugs meshed together in bug 528312. Thank you for confirming my suspicion. Aside from the information requested in comment 6, could we also get a reasonable sample of the stderr/stdout output from udevadm when the issue happens, please?
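A long capture can be boiled down to counts before attaching it. The following is a minimal Python sketch, not part of udev, that tallies events per source, action, and subsystem from saved "udevadm monitor" output. It assumes the usual line layout (source, bracketed timestamp, action, device path, subsystem in parentheses); the function name summarize_udev_events and the device path in the comment are hypothetical illustrations, not taken from the attachments.

```python
import re
from collections import Counter

# Assumed layout of one "udevadm monitor" event line, for example:
#   KERNEL[1276.123456] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
#   UDEV  [1276.125000] change   /devices/pci0000:00/0000:00:02.0/drm/card0 (drm)
EVENT_RE = re.compile(
    r"^(?P<source>KERNEL|UDEV)\s*\[(?P<ts>[\d.]+)\]\s+"
    r"(?P<action>\S+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>[^)]*)\)$"
)


def summarize_udev_events(lines):
    """Count events per (source, action, subsystem).

    Lines that do not look like events (the banner that udevadm
    prints at startup, blank lines) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            key = (match["source"], match["action"], match["subsystem"])
            counts[key] += 1
    return counts
```

Passing an open file of saved output to summarize_udev_events and printing the result of most_common() shows at a glance which subsystem dominates a flood.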
Created attachment 423508 [details]
X server log from while the bug is happening
Created attachment 423509 [details]
dmesg output from when the bug is happening (with kernel param drm.debug=0x04)
Created attachment 423510 [details]
/var/log/messages from reboot to when the bug happens
Created attachment 423511 [details]
Very short excerpt of very repetitive "udevadm monitor" output
This is interesting: last night the problem recurred, but without any apparent udev activity. The X server was sluggish to the point of paralysis, as usual; switching to a text VT restored responsiveness; but there were no udevd's near the top of the "top" output, and no output from "udevadm monitor."
I've now seen the symptoms in this bug a couple of dozen times (sigh) but AFAIK this is the first time udev has been missing from the picture.
Update: I built a kernel with the patch from https://bugzilla.redhat.com/show_bug.cgi?id=528312#c118 (and ran it with the appropriate flags) and it DID NOT HELP. The symptoms appeared after a warm and a cold reboot.
So I tried disabling the uevent patch in xorg-x11-drv-intel as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=528312#c70 and it DID HELP. That is, the udev storms continued to happen, but they did not slow X to a crawl. In fact, a udev storm is happening right now as I type this, but it's monopolizing just one of my four cores, which I can live with for now.
(Cross-posting this update to bug #528312.)
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '12'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 12's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 12 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.