91098 – Upgrade to 2.4.20-13.9 now X runs at 100% cpu

Summary: Upgrade to 2.4.20-13.9 now X runs at 100% cpu

Keywords:
Status:	CLOSED DUPLICATE of bug 73733
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	XFree86
Sub Component:
Version:	9
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Arjan van de Ven
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2003-05-18 00:01 UTC by John Horne
Modified:	2007-04-18 16:53 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2006-02-21 18:53:08 UTC
Embargoed:

Attachments:
strace of X process (473 bytes, text/plain) 2003-05-18 01:23 UTC, John Horne

Description John Horne 2003-05-18 00:01:20 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
Installed RHL9 and ran it for a couple of weeks with no problems. Upgraded
the kernel to 2.4.20-9 and ran that for a couple of weeks, again with no
problems. Upgraded the kernel last Thursday to 2.4.20-13.9 and the system
seemed to hang every so often. Telnetting into the system, I could see X
running at 100% CPU. Booted the previous kernel (2.4.20-9) and the same thing
now happens on that too. I use KDE (upgraded via up2date). Example:

 00:32:17  up 1 day,  7:21,  0 users,  load average: 1.41, 1.21, 0.93
66 processes: 64 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  34.4% user   0.4% system   0.0% nice   0.0% iowait  65.0% idle
Mem:   513596k av,  489628k used,   23968k free,       0k shrd,  107512k buff
                    279032k actv,       0k in_d,    3212k in_c
Swap: 1044208k av,    2532k used, 1041676k free                  199732k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
22486 root      25   0  267M  10M  4052 R    99.5  2.1  22:01   0 X
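
For anyone watching for the same symptom over a remote login, here is a minimal sketch (not part of the original report) that picks CPU-bound processes out of 'top' rows in the column layout shown above. The names parse_top_row and cpu_hogs and the 90% threshold are illustrative assumptions, and rows whose STAT field contains a space are not handled.

```python
from typing import NamedTuple, Optional


class TopRow(NamedTuple):
    pid: int
    user: str
    cpu_percent: float
    command: str


def parse_top_row(line: str) -> Optional[TopRow]:
    """Parse one process row in the layout shown in the report.

    Assumed columns:
    PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
    Returns None for header or summary lines that do not fit that layout.
    """
    fields = line.split()
    # A process row has at least 13 fields and starts with a numeric PID.
    if len(fields) < 13 or not fields[0].isdigit():
        return None
    try:
        cpu = float(fields[8])  # %CPU is the ninth column
    except ValueError:
        return None
    return TopRow(int(fields[0]), fields[1], cpu, " ".join(fields[12:]))


def cpu_hogs(lines, threshold=90.0):
    """Return the parsed rows whose %CPU is at or above the threshold."""
    rows = (parse_top_row(line) for line in lines)
    return [row for row in rows if row is not None and row.cpu_percent >= threshold]


if __name__ == "__main__":
    # The two lines below are copied from the 'top' output in the report.
    sample = [
        "  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND",
        "22486 root      25   0  267M  10M  4052 R    99.5  2.1  22:01   0 X",
    ]
    for row in cpu_hogs(sample):
        print(f"{row.command} (pid {row.pid}) is using {row.cpu_percent}% CPU")
```

Feeding it the output of a batch-mode 'top' run would flag the X process above; the exact column layout differs between top versions, so the field positions would need checking first.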

Uname shows:
Linux jhorne 2.4.20-9custom #4 Fri May 2 14:17:52 BST 2003 i686 i686 i386 GNU/Linux

'custom' is a rebuilt kernel with the NTFS module and the correct (Pentium 4)
CPU type set. The problem also occurs with the unmodified builds of both kernels.

No errors seen in dmesg, messages or XFree86.0.log.

Graphics card is nvidia (ugh!) - problem exists with both redhat 'nv' and
nvidia's own 'nvidia' drivers. Disabled dri and glx - no change. Mouse is ps/2;
standard 102-key keyboard.

This is my work PC and I have already lost one day (Friday) just trying to get
the thing stable. Runs okay in command-line (init 3) mode, but no good as a
desktop PC.

PC is an RM accelerator 2GHz P4 Xeon.

Version-Release number of selected component (if applicable):
2.4.20-13.9 and now 2.4.20-9

How reproducible:
Always

Steps to Reproduce:
1. Just reboot :-(

Actual Results:  X runs at 100% cpu utilisation.

Expected Results:  X should run at low cpu utilisation.

Additional info:

Comment 1 John Horne 2003-05-18 01:23:03 UTC

Created attachment 91763
strace of X process

PC was running at 100% CPU. Note that the SIZE column in 'top' is up to about
270MB of memory as well! This is usually a low value, even for X. I ran
strace on the X process to see why it was CPU-bound; the attachment shows a
tight loop of some sort involving the ALRM signal.
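
To make a loop like this easier to spot in a saved trace, here is a minimal sketch (not part of the original report) that counts syscalls and signal deliveries in strace text. The SAMPLE lines are hypothetical and only illustrate the format; they are not the contents of the attachment. The function name summarize_strace is likewise an assumption.

```python
import re
from collections import Counter

# Optional "[pid N] " or "N " prefix that strace adds when following children.
PREFIX = r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"

# A syscall line looks like:  name(args...) = result
SYSCALL_RE = re.compile(PREFIX + r"([A-Za-z_][A-Za-z0-9_]*)\(")

# A signal delivery line looks like:  --- SIGALRM (Alarm clock) ---
# Newer strace versions print:        --- SIGALRM {si_signo=SIGALRM, ...} ---
SIGNAL_RE = re.compile(PREFIX + r"---\s+(SIG[A-Z0-9]+)\b")

# Hypothetical trace lines, for illustration only.
SAMPLE = [
    "--- SIGALRM (Alarm clock) ---",
    "sigreturn() = ? (mask now [])",
    "--- SIGALRM (Alarm clock) ---",
    "sigreturn() = ? (mask now [])",
    "select(256, [1 3 4], NULL, NULL, {0, 0}) = 0 (Timeout)",
]


def summarize_strace(lines):
    """Return (syscall_counts, signal_counts) for an iterable of strace lines."""
    syscalls, signals = Counter(), Counter()
    for raw in lines:
        line = raw.strip()
        match = SIGNAL_RE.match(line)
        if match:
            signals[match.group(1)] += 1
            continue
        match = SYSCALL_RE.match(line)
        if match:
            syscalls[match.group(1)] += 1
    return syscalls, signals


if __name__ == "__main__":
    syscalls, signals = summarize_strace(SAMPLE)
    for name, count in signals.most_common():
        print(f"signal  {name}: {count}")
    for name, count in syscalls.most_common():
        print(f"syscall {name}: {count}")
```

A trace saved with 'strace -p <pid> -o <file>' could be passed in line by line; a signal count far above the count of useful syscalls would be consistent with the tight ALRM loop described above.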

Comment 2 John Horne 2003-05-18 19:10:35 UTC

From the debian mailing list (via google) I found a reference to this:

  http://marc.theaimsgroup.com/?l=xfree86&m=104395921006480&w=2

I don't understand all the techie stuff but I may see if I can rebuild xfree
without the 'SMART_SCHEDULE' to see if that gets around the problem for me.

Note - other messages from others who have had this problem seemed to indicate
that it is not a kernel problem nor an nvidia driver problem but an xfree server
problem which gets triggered by something in the nvidia drivers.

Comment 3 John Horne 2003-05-20 15:30:36 UTC

The 'XFree86' X server seems to have an undocumented option, '-dumbSched'
(found in the src rpm under
/usr/src/redhat/SOURCES/xfree*/xc/programs/Xserver/utils.c; check the path
though!). This seems to disable the SMART_SCHEDULE stuff.

Tried using this, but with no luck. By default it seems gdm is used, and I
can't see how or where it starts the X server. Tried switching to xdm (by
putting DISPLAYMANAGER in /etc/sysconfig/desktop) but X startup fails; the
error log says:

May  1 14:56:43 jhorne gdm[4460]: Failed to start X server several times in a short time period; disabling display :0

Sigh. I'm now using the on-board i810 graphics port, and have been allowed to
put in an order for an ATI card (I've used these at home with no problems at all).

Comment 4 John Horne 2003-05-27 10:02:31 UTC

Moved to 'XFree86' component (from 'kernel') since this is an XFree issue and
not directly the kernel. I hope this is okay.

I note from other bug reports that since I am (was) using an nvidia card, the
problem is complicated by Red Hat being unable to support the nvidia drivers.

Comment 5 Mike A. Harris 2003-05-27 10:35:32 UTC

I can't see how a kernel upgrade alone would cause *only* the "nv" and "nvidia"
drivers to make X go to 100% CPU usage.

If your system has had the nvidia binary modules loaded at all since boot,
it is unsupported.  If you can reboot your system and never load the nvidia
kernel module at all, and reproduce this using only the "nv" driver as shipped
with Red Hat Linux, then please report this to http://bugs.xfree86.org and the
Nvidia "nv" driver maintainer (who works at Nvidia) will likely investigate
the problem.

Red Hat has no knowledge of the operation of Nvidia hardware, and no access
to the documentation of that hardware.



*** This bug has been marked as a duplicate of 73733 ***

Comment 6 jeffrey.buchsbaum 2003-06-02 14:06:33 UTC

Nvidia's installer now lets you recompile their module, so the REDHAT practice 
of calling a 9.0 bug the same thing as the 7.3 bug is MOOT. 

Please try to address the bug and/or contact Nvidia. My rh9 system with the 
same driver and card, same kernel version, just a single cpu and not smp is 
FINE at home.....so I think that the issue is with RedHat software not Nvidia 
software.

Everything is current.

For the person who tried to help and suggested I needed kdeartwork, I know I 
need that.  I appreciate the help.

rpm -qui kdeartwork gets a full listing of the kdeartwork build's stats, etc. 
It is installed.

I just wish that RedHat would put some effort into this and not just blame 
others.

Mandrake doesn't have this problem...it makes me wonder if the hacks redhat 
has added to the kernel are "good."

jb

Comment 7 Arjan van de Ven 2003-06-02 14:10:02 UTC

Why would we contact NVidia to fix a bug YOU encounter with THEIR binary-only
kernel module?

Comment 8 Red Hat Bugzilla 2006-02-21 18:53:08 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
