Bug 90014

Summary: gdm failure and automatic restart
Product: [Retired] Red Hat Linux
Reporter: Simon Perreault <nomis80>
Component: XFree86
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: David Lawrence <dkl>
Severity: high
Priority: medium
Version: 9
CC: mharris
Keywords: Triaged
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-29 22:31:07 UTC
Bug Blocks: 100644
Attachments (description, flags):
/var/log/messages, none
/etc/X11/XF86Config, none
/var/log/XFree86.0.log, none
lspci -vvn, none

Description Simon Perreault 2003-05-01 02:52:55 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3) Gecko/20030312

Description of problem:
When I first boot up, gdm works fine. When I log out, gdm does not come back.

When I log out, X shuts down, X comes back up (I see the animated hourglass) and
then X shuts down automatically. gdm does not appear. X restarts several times
and then I get a text message saying that the X server could not be started
several times and so gdm has given up.

This is not really a gdm bug since kdm exhibits the exact same behavior.
However, kdm seems to be less intelligent than gdm since it loops forever
without ever saying that it has tried enough times and is giving up.

This is also clearly not an X bug since I can start X fine from the command
line. I have tried with the most basic X drivers, and X still works fine but gdm
or kdm won't start.

Since I had installed RH9 as an update to RH8, I tried formatting and
reinstalling from scratch. I did *not* play with the default system settings.
After about a week, the problem came back.

I looked at everything in /var/log but the problem is not logged anywhere.

I tried starting gdm or kdm by hand (i.e. not via "init 5") and the problem
still happens.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install RH9.
2. Do magic step?
3. Boot computer.
4. Log in.
5. Log out.
    

Actual Results:  gdm kept restarting X.

Expected Results:  gdm should have started normally.

Additional info:

I tried with RH8 RPMs for gdm, kdm and pam. The bug was still there.

Comment 1 Havoc Pennington 2003-05-01 05:08:11 UTC
It could well be an X server bug, just one that only happens the second time you 
start the server or something.

What video hardware do you have?

Comment 2 Mike A. Harris 2003-05-01 06:02:14 UTC
Attach your X server log file and config file as bugzilla file attachments,
as well as /var/log/messages.



Comment 3 Simon Perreault 2003-05-01 14:04:58 UTC
Created attachment 91437 [details]
/var/log/messages

As you may see in the log, I did not have to login/logout. The display manager
came up fine, and then I logged in as root in a text console. I did init 3 ;
init 5 and the problem reappeared.

Comment 4 Simon Perreault 2003-05-01 14:05:29 UTC
Created attachment 91438 [details]
/etc/X11/XF86Config

Comment 5 Simon Perreault 2003-05-01 14:07:14 UTC
Created attachment 91439 [details]
/var/log/XFree86.0.log

Sorry, I forgot to mention that I'm running RH9 on a ThinkPad R32, which has an
ATI Radeon Mobility M6.

I also tried with the ati driver, which doesn't use dri, and that didn't fix
it.

Comment 6 Simon Perreault 2003-05-01 14:43:41 UTC
I want to stress that this is *not* an X bug. When I use startx, KDE loads
fine and everything works. I have tried with rawhide (4.3.0-6) X rpms but
gdm still doesn't work.

Also, the problem has gotten worse (but better from a debugging
standpoint): gdm won't let me log in even once. It always restarts the X
server once the animated hourglass has been seen.

Comment 7 Havoc Pennington 2003-05-01 14:51:26 UTC
Understood that startx works; it is very possible, though, to have X bugs that
only happen in certain contexts or hardware states.

mharris: didn't gafton say M6 was messed up?



Comment 8 Simon Perreault 2003-05-01 14:57:06 UTC
Ok, I feel stupid now. I must be in the majority of bug reporters here eh?

In /etc/fonts/fonts.conf, I had uncommented the part labeled "Enable
sub-pixel rendering" so that it would do sub-pixel antialiasing on my
LCD. When I commented that part out again, gdm worked fine again.

There are a few strange things though:
1) Why did that setting affect gdm and not the rest of X? Why did it work
with startx?
2) Why was that setting problematic at all? If it comes commented out by
default, one would expect it to work once uncommented.
3) If X was at fault, why wasn't anything logged?
4) And strangest of all, why do I still get sub-pixel antialiasing after I
commented that part of fonts.conf out again?
  
Anyway, thanks for your time, keep up the good work!  

Comment 9 Havoc Pennington 2003-05-01 15:00:03 UTC
The fonts.conf subpixel setting probably gets overwritten by gnome/kde
GUI-managed settings once you log in, but conceivably it changes how gdm
does subpixel rendering.

If it caused this problem, I have no idea how. ;-) Perhaps it triggers
a different alpha blending codepath in the X server. 

Comment 10 George Lebl 2003-07-14 18:49:10 UTC
What likely happened is that the X server crashed and left its lock file
around. It is conceivable that a process with the pid from the lock file
exists, or some such thing. X would then think that it's already running on
that display. The fix would be to add the following code to
gdm_server_wipe_cookies in server.c:


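	    /* Remove the lock file a crashed X server may have left behind. */
	    g_snprintf (buf, sizeof (buf), "/tmp/.X%d-lock", disp->dispnum);
	    unlink (buf);
	    /* Remove the matching Unix domain socket as well. */
	    g_snprintf (buf, sizeof (buf), "/tmp/.X11-unix/X%d", disp->dispnum);
	    unlink (buf);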
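
The stale-lock condition described above could also be checked before
unlinking anything. What follows is only a minimal sketch, not gdm code: it
assumes the lock file /tmp/.X<n>-lock holds the server's pid as decimal text
(possibly space-padded, followed by a newline), and the function names
parse_lock_pid, pid_is_alive and x_lock_is_held are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse the pid stored in the text of a lock file.
 * Assumption: one decimal pid, optionally padded with leading spaces.
 * Returns -1 if no valid pid is found. */
pid_t parse_lock_pid(const char *text)
{
    char *end = NULL;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);   /* strtol skips leading whitespace */
    if (errno != 0 || end == text || value <= 0)
        return (pid_t)-1;
    return (pid_t)value;
}

/* Return 1 if a process with this pid exists, 0 otherwise. */
int pid_is_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;
    /* Signal 0 sends nothing; it only performs the existence check. */
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;            /* exists, but owned by someone else */
}

/* Return 1 if the lock file for display `dispnum` names a live process,
 * 0 if the lock is missing, unreadable, or stale. */
int x_lock_is_held(int dispnum)
{
    char path[64];
    char text[32];
    FILE *fp;
    size_t n;

    snprintf(path, sizeof(path), "/tmp/.X%d-lock", dispnum);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    n = fread(text, 1, sizeof(text) - 1, fp);
    fclose(fp);
    text[n] = '\0';
    return pid_is_alive(parse_lock_pid(text));
}
```

Note that a live pid does not prove the process is an X server: after a crash
the pid may have been reused by something unrelated, which is the case the
comment above is worried about.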


Comment 11 George Lebl 2003-07-15 15:51:28 UTC
Just to clarify my comment above: that wouldn't fix it if X was crashing
constantly, of course; gdm would then just tell you that your X is crashing
constantly rather than give the confusing message about the display being
busy. It would "fix" the problem where the X server crashes on SIGTERM, to
allow a second login. The best way to fix it is how it's "fixed" in CVS, which
is to turn AlwaysRestartServer back to false; however, that has security
implications for the current stable release unless you reset rlimits in the
code somewhere. If I got a nickel for every time pam or X is broken in some
respect I'd be rich. (Of course if I had a penny for every time gdm is broken,
I'd be even richer :)

Comment 12 Alexander Larsson 2003-09-04 11:48:07 UTC
We upgraded to the new stable from CVS now, so this should be "fixed" for gdm
then. Still, XFree86 shouldn't crash like that.

Comment 13 Mike A. Harris 2003-09-04 17:46:06 UTC
(II) RADEON(0): Video RAM override, using 16384 kB instead of 16384 kB


Don't ever use the VideoRAM setting in your config file.  The driver will
autodetect video RAM properly on its own.  Using this setting when it is not
absolutely required will cause serious side effects and driver instability.


Comment 14 Mike A. Harris 2003-09-04 17:49:03 UTC
May  1 09:57:07 localhost kernel: PCI: Found IRQ 11 for device 01:00.0
May  1 09:57:07 localhost kernel: PCI: Sharing IRQ 11 with 00:1d.0

Can you please provide the complete output of the following command in
a text file and attach it:

    lspci -vvn

Comment 15 Simon Perreault 2003-09-04 18:04:26 UTC
Created attachment 94209 [details]
lspci -vvn

Here is `lspci -vvn`.

Comment 16 Mike A. Harris 2003-09-04 18:50:44 UTC
Your kernel log above shows the video card using IRQ 11, but your lspci
output shows it using IRQ 10.  What is worse, most of the hardware in your
system is sharing IRQ 10, including the video card.

Very scary.  ;o)

If any hardware in your system that has registered an interrupt handler in
its driver has a problem with shared IRQs, it can cause instability in every
other device driver that has hooked the same shared IRQ.

One common example of this problem is a sound card and video card sharing
the same IRQ, with startup sounds being played while X starts.  If the sound
driver is unstable or has any issues, it can cause a complete system lockup
similar to what is described in this report.

One suggestion is thus to disable all non-essential hardware, including the
sound card, network card, etc., and see whether X starts up OK in that state.
If it does, it is likely that shared IRQ contention is occurring between
your device drivers.
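
Which IRQs are actually shared can also be read from /proc/interrupts on the
running kernel. The following is a rough sketch only: it assumes each numbered
line ends with the registered handler names separated by commas, which may not
hold on every kernel version. The function name count_irq_handlers is
hypothetical, and the device names in the usage note are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Count the handlers listed on one line of /proc/interrupts.
 * Returns 0 for lines that do not start with a numeric IRQ followed by
 * a colon (the CPU header, NMI, ERR and similar rows).
 * Assumption: a numbered line names at least one handler, and handler
 * names are separated by commas.
 * If irq_out is not NULL, the IRQ number is stored there. */
int count_irq_handlers(const char *line, int *irq_out)
{
    char *end = NULL;
    const char *p;
    long irq;
    int handlers = 1;

    assert(line != NULL);
    irq = strtol(line, &end, 10);     /* strtol skips leading spaces */
    if (end == line || *end != ':' || irq < 0)
        return 0;
    if (irq_out != NULL)
        *irq_out = (int)irq;
    /* Each comma after the colon separates two handler names. */
    for (p = end + 1; *p != '\0'; p++) {
        if (*p == ',')
            handlers++;
    }
    return handlers;
}
```

For a line such as " 11:  12345  XT-PIC  usb-uhci, eth0" the function reports
two handlers on IRQ 11; any result above one means the IRQ is shared.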



Comment 17 Simon Perreault 2003-09-04 19:09:47 UTC
One thing I should have mentioned is that I recently upgraded to
linux-2.6.0-test4, which could explain things like IRQ numbers moving around.
As for all devices showing the same IRQ number, it may be that lspci needs to
be recompiled to take the new kernel into account.

This is a stock ThinkPad R32, so it can't be an unstable system. The only
IRQ-related problem I've had is the keyboard generating duplicated signals or
forgetting to send a "release" signal, but this was fixed by enabling ACPI in
the kernel.

BTW, see the message I posted on 2003-05-01 10:57. gdm works fine now, even
though I use kdm. There were still some open questions, but as a user my
problem is fixed. I can still help gather information to answer the remaining
open questions if you want.

Comment 18 Mike A. Harris 2003-10-06 11:13:34 UTC
I don't have ATI Radeon Mobility hardware, so I can't personally investigate
this issue on real hardware to troubleshoot the problem.  We do not support
running the 2.6.x kernel on our OS products, as it has not been officially
released yet.

Are you still able to reproduce this problem using a stock RHL 9 system plus
official errata updates, including the latest official Red Hat kernel erratum?

Comment 19 Mike A. Harris 2004-09-29 22:31:07 UTC
Since this bugzilla report was filed, there have been several major
updates to the X Window System, which may resolve this issue.  We
encourage you to upgrade to the latest version of Fedora Core
(http://fedora.redhat.com).

If this issue turns out to still be reproducible in the latest
version of Fedora Core, please file a bug report in the X.Org
bugzilla located at http://bugs.freedesktop.org in the "xorg"
component.

Once you've filed your bug report to X.Org, if you paste the new
bug URL here, Red Hat will continue to track the issue in the
centralized X.Org bug tracker, and will review any bug fixes
that become available for consideration in future updates.