From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
Installed RHL9 with no problems and ran it for a couple of weeks with no problem. Upgraded kernel to 2.4.20-9 with no problem, and ran for a couple of weeks with no problem. Upgraded kernel last Thursday to 2.4.20-13.9 and the system seemed to hang every so often. Telnetting into the system I could see X running at 100% cpu. Ran the previous kernel (2.4.20-9) and the same thing happens on that now too. I use KDE (upgraded via up2date).

Example:

 00:32:17 up 1 day, 7:21, 0 users, load average: 1.41, 1.21, 0.93
66 processes: 64 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 34.4% user 0.4% system 0.0% nice 0.0% iowait 65.0% idle
Mem: 513596k av, 489628k used, 23968k free, 0k shrd, 107512k buff
     279032k actv, 0k in_d, 3212k in_c
Swap: 1044208k av, 2532k used, 1041676k free 199732k cached

  PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM  TIME CPU COMMAND
22486 root  25  0 267M 10M  4052 R    99.5  2.1 22:01   0 X

Uname shows:
Linux jhorne 2.4.20-9custom #4 Fri May 2 14:17:52 BST 2003 i686 i686 i386 GNU/Linux

'custom' is a rebuilt kernel with the NTFS module and the correct (pentium 4) cpu set. The problem also exists with the unmodified versions of both kernels. No errors seen in dmesg, messages or xfree86.0.log.

Graphics card is nvidia (ugh!) - problem exists with both redhat 'nv' and nvidia's own 'nvidia' drivers. Disabled dri and glx - no change. Mouse is ps/2; standard 102-key keyboard.

This is my work PC and I have already lost one day (Friday) just trying to get the thing stable. Runs okay in command-line (init 3) mode, but no good as a desktop PC. PC is an RM accelerator 2GHz P4 Xeon.

Version-Release number of selected component (if applicable):
2.4.20-13.9 and now 2.4.20-9

How reproducible:
Always

Steps to Reproduce:
1. Just reboot :-(
2.
3.

Actual Results:
X runs at 100% cpu utilisation.

Expected Results:
X should run at low cpu utilisation.

Additional info:
Created attachment 91763
strace of X process

PC was running at 100% cpu. Note that the 'top' command's 'size' column is up to about 270MB of memory as well! This is usually a low value - even for X. I ran strace on the X process to see why it was cpu-bound; the attachment shows a tight loop of some sort involving the ALRM signal.
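For anyone unfamiliar with what a stream of ALRM signals means: below is a minimal Python sketch, assuming the signals come from a periodic interval timer (that is an assumption on my part, the attachment alone does not confirm it). The function name `count_alarms` and its parameters are hypothetical and purely illustrative; nothing here is taken from the XFree86 sources.

```python
import signal
import time


def count_alarms(interval=0.01, duration=0.2):
    """Count SIGALRM deliveries from a periodic ITIMER_REAL timer.

    Illustrative only: 'interval' and 'duration' (in seconds) are
    made-up values, not settings used by any X server.
    """
    ticks = 0

    def on_alarm(signum, frame):
        # Runs each time the kernel delivers SIGALRM to this process.
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGALRM, on_alarm)
    # Arm a repeating real-time timer: first expiry after 'interval',
    # then again every 'interval' seconds.
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            time.sleep(0.001)
    finally:
        # Disarm the timer and restore the original handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
    return ticks


if __name__ == "__main__":
    print("SIGALRM deliveries:", count_alarms())
```

A process armed like this receives ALRM over and over for as long as the timer stays active, so seeing the signal repeat in an strace is not by itself abnormal; what matters is what the process does between deliveries.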
From the debian mailing list (via google) I found a reference to this: http://marc.theaimsgroup.com/?l=xfree86&m=104395921006480&w=2 I don't understand all the techie stuff but I may see if I can rebuild xfree without the 'SMART_SCHEDULE' to see if that gets around the problem for me. Note - other messages from others who have had this problem seemed to indicate that it is not a kernel problem nor an nvidia driver problem but an xfree server problem which gets triggered by something in the nvidia drivers.
The 'XFree86' X server seems to have an undocumented option, '-dumbSched' (from the src rpm, in /usr/src/redhat/SOURCES/xfree*/xc/programs/Xserver/utils.c; check the path though!). This seems to disable the SMART_SCHEDULE stuff. Tried using this but with no luck. By default it seems gdm is used and I can't see how/where it starts the X server. Tried switching to the xdm server (putting DISPLAYMANAGER in /etc/sysconfig/desktop) but X startup fails - error log says:

May 1 14:56:43 jhorne gdm[4460]: Failed to start X server several times in a short time period; disabling display :0

Sigh. I'm now using the on-board i810 graphics port, and have been allowed to put in an order for an ATI card (I've used these at home with no problems at all).
Moved to 'XFree86' component (from 'kernel') since this is an XFree issue and not directly a kernel one. I hope this is okay. I note from other bug reports that since I am (was) using an nvidia card, the problem is complicated by Red Hat being unable to support the nvidia drivers.
I can't see how a kernel upgrade alone will cause *only* the "nv" and "nvidia" drivers to make X go to 100% CPU usage. If your system has had the nvidia binary modules loaded at all since boot, it is unsupported. If you can reboot your system and never load the nvidia kernel module at all, and reproduce this using only the "nv" driver as shipped with Red Hat Linux, then please report this to http://bugs.xfree86.org and the Nvidia "nv" driver maintainer (who works at Nvidia) will likely investigate the problem. Red Hat has no knowledge of the operation of Nvidia hardware, and no access to the documentation of that hardware.

*** This bug has been marked as a duplicate of 73733 ***
Nvidia's installer now lets you recompile their module, so the REDHAT practice of calling a 9.0 bug the same thing as the 7.3 bug is MOOT. Please try to address the bug and/or contact Nvidia. My rh9 system at home, with the same driver and card and the same kernel version, just a single cpu and not smp, is FINE.....so I think that the issue is with RedHat software, not Nvidia software. Everything is current.

For the person who tried to help and suggested I needed kdeartwork, I know I need that. I appreciate the help. rpm -qui kdeartwork gets a full listing of the kdeartwork build's stats, etc. It is installed.

I just wish that RedHat would put some effort into this and not just blame others. Mandrake doesn't have this problem...it makes me wonder if the hacks redhat has added to the kernel are "good."

jb
Why would we contact Nvidia to fix a bug YOU encounter with THEIR binary-only kernel module?
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.