79678 – Radeon hangs on logout

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	XFree86
Sub Component:
Version:	9
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	X/OpenGL Maintenance List
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	80690 81702
Depends On:	80968
Blocks:	79578 82776

Reported:	2002-12-14 23:27 UTC by Kjartan Maraas
Modified:	2008-06-14 14:01 UTC
CC List:	7 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2008-06-14 14:01:49 UTC
Embargoed:

Attachments
XFree logfile (28.43 KB, text/plain) 2002-12-15 23:26 UTC, Kjartan Maraas	no flags
This is /var/log/messages (17.62 KB, text/plain) 2002-12-15 23:28 UTC, Kjartan Maraas	no flags
djoo's XF86 config file (3.14 KB, application/octet-stream) 2003-01-15 23:19 UTC, David Joo	no flags
djoo's xsession-errors (1.29 KB, application/octet-stream) 2003-01-15 23:21 UTC, David Joo	no flags
djoo's var/log/messages (226.88 KB, application/octet-stream) 2003-01-15 23:22 UTC, David Joo	no flags

Description Kjartan Maraas 2002-12-14 23:27:53 UTC

Description of problem:
I see a hard hang when logging out using the latest XFree from rawhide. This is
with a Compaq laptop that has a Radeon mobility card.

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M6 LY (prog-if 00 [VGA])
        Subsystem: Compaq Computer Corporation: Unknown device b111
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 66 (2000ns min), cache line size 08
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at 48000000 (32-bit, prefetchable) [size=128M]
        Region 1: I/O ports at 3000 [size=256]
        Region 2: Memory at 40200000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at <unassigned> [disabled] [size=128K]
        Capabilities: [58] AGP version 2.0
                Status: RQ=47 SBA+ 64bit- FW- Rate=x1,x2,x4
                Command: RQ=31 SBA+ AGP+ 64bit- FW- Rate=x1
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
    
Actual results:


Expected results:


Additional info:

Comment 1 Mike A. Harris 2002-12-15 08:25:46 UTC

Please supply XFree86 log file and config file, in all XFree86 bug reports
regardless of problem nature.  If it is a problem causing a crash/hang,
also attach /var/log/messages.

Comment 2 Kjartan Maraas 2002-12-15 23:26:47 UTC

Created attachment 88747
XFree logfile

Comment 3 Kjartan Maraas 2002-12-15 23:28:17 UTC

Created attachment 88748
This is /var/log/messages

Comment 4 Mike A. Harris 2002-12-17 22:17:08 UTC

Does this occur in beta2 as well?

Comment 5 Kjartan Maraas 2002-12-18 19:16:28 UTC

I'll see if I can downgrade to that version. The rawhide version is newer than
the one in beta2 I think. Hmm. There is no XFree in rawhide at the moment...

Comment 6 Mike A. Harris 2002-12-18 21:12:12 UTC

I meant more "does this occur in current XFree86" than in beta2 specifically,
sorry for the confusion.

Comment 7 Kjartan Maraas 2002-12-18 21:19:17 UTC

Yes, it happens with the latest XFree I found in rawhide a couple of days ago,
20021210-something I think. I also see the keyboard/pointer stop responding when
the screen blanking sets in when running on battery. If I do alt+ctrl+backspace
I don't see the hang though...

Comment 8 Mike A. Harris 2002-12-20 11:22:59 UTC

Dec 15 16:19:49 localhost cardmgr[682]: starting, version is 3.1.31
des 15 16:19:49 localhost rc: Starting pcmcia:  succeeded
Dec 15 16:19:49 localhost cardmgr[682]: config error, file 'config' line 1053: syntax error
Dec 15 16:19:49 localhost cardmgr[682]: config error, file 'config' line 2129: no function bindings
Dec 15 16:19:49 localhost cardmgr[682]: watching 2 sockets
Dec 15 16:19:49 localhost cardmgr[682]: Card Services release does not match

That looks suspicious.

Comment 9 Mike A. Harris 2002-12-20 11:25:21 UTC

Can you paste more of the messages file, perhaps over a few crash/reboot
cycles?  I think this may be a non-XFree86 problem perhaps.

Comment 10 Mike A. Harris 2003-01-14 20:46:16 UTC

By the way...  In my last comment, I meant that the cardmgr errors in the
logfile snippet above were not X related problems.  I didn't mean that
this bug report isn't X related..  just to clarify, since when I reread
what I said above it sounded wrong.

Comment 11 Kjartan Maraas 2003-01-14 21:29:50 UTC

The problem goes away if I disable DRI.

Comment 12 Mike A. Harris 2003-01-15 07:44:17 UTC

Djoo, can you add your data here too?  Attach log+config.

Could both of you also attach your /var/log/messages and make sure it's big
enough to contain useful info from boot time onward (or attach the logrotated
ones also if need be).

I believe this is a kernel DRM issue.

Comment 13 Mike A. Harris 2003-01-15 07:48:22 UTC

Changed bug to be for public-beta for duping dupes against.

Comment 14 Mike A. Harris 2003-01-15 07:50:45 UTC

*** Bug 80690 has been marked as a duplicate of this bug. ***

Comment 15 Mike A. Harris 2003-01-15 08:00:38 UTC

Added djoo to CC, as he has the same problem.
 
Since this is DRI related, and since it only happens when using ?dm, I suspect
there is a race condition of some kind in the kernel DRM.  When using startx,
it seems to not crash, but when using ?dm it does.

In addition to supplying all of your kernel messages logs showing the kernel
crash (hopefully), could some of you try the following:

Make sure DRI is enabled first.  Run startx, run some 3D apps, quit X.  Wait
10 seconds, repeat.  Do this 3 or 4 times and see if you get a hang.  If not,
then proceed.

Create this script, and run it:

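#!/bin/bash
# Start X five times in a row; each startx returns when X exits,
# so the next session begins immediately with no delay.
startx "$@"
startx "$@"
startx "$@"
startx "$@"
startx "$@"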

Run the script, and it should start X.  Run some 3D apps, then quit X, and
the script should restart X immediately.  I want to see if we
can get the machine to lock up merely by using startx with no time delay,
in order to test the theory that it is a DRI related race condition and that
the choice of startx or xdm/kdm/gdm doesn't matter.

Please update the bug report with the results of this testing.
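
The back-to-back restart test described above could also be written as a loop, which makes it easier to change the number of sessions. This is only a sketch: the function name restart_loop and its argument order are hypothetical and not part of any XFree86 tooling, and it assumes the command passed in (startx in this test) returns when the session ends.

```shell
#!/bin/bash
# Hypothetical helper: run a command N times back to back, with no delay
# between runs. Progress messages go to stderr.
restart_loop() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "session $i of $count" >&2
        # Run the command; report a non-zero exit but keep looping.
        "$@" || echo "session $i exited with status $?" >&2
        i=$((i + 1))
    done
}

# Example invocation (commented out so that sourcing this file starts nothing):
# restart_loop 5 startx
```

Running `restart_loop 5 startx` would be equivalent to the five-line script above; if the machine hangs, the last "session N of M" line printed shows how many restarts it survived.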

Comment 16 Mike A. Harris 2003-01-15 08:06:06 UTC

If the above test does what I think it will, we've got more data to go on
for a proper fix.  In case we don't find one however, we can probably insert
a couple second delay somewhere to bandaid over the hypothetical race.

Comment 17 Mike A. Harris 2003-01-15 12:11:20 UTC

*** Bug 81702 has been marked as a duplicate of this bug. ***

Comment 18 David Joo 2003-01-15 23:19:04 UTC

Created attachment 89393
djoo's XF86 config file

Comment 19 David Joo 2003-01-15 23:21:11 UTC

Created attachment 89394
djoo's xsession-errors

Comment 20 David Joo 2003-01-15 23:22:13 UTC

Created attachment 89395
djoo's var/log/messages

Comment 21 ctm 2003-01-15 23:59:52 UTC

As I mentioned in 80690, although I didn't do it enough times to be
statistically significant, doing a "sync" before I logged out seemed to lessen
the likelihood of the machine locking up.  In my lockup case, all I needed to do
most times was boot the machine, log in and log out, without running any 3D apps.

I hope you'll reconsider and not add a couple second delay to decrease the
likelihood of the machine locking up.  If the problem is related to a race
condition between other system I/O and something to do with DRI, then such a
delay may indeed help some people, but is likely to burn people who have their
machines doing something (perhaps important) when they log out.

I've been bitten so many times by goofy delays being added to software that I
can't keep track of them.  One example is the sleep that was added to
/etc/rc.d/init.d/postgresql after starting the postgresql server before calling
pidof.  On two machines I administered the delay wasn't always enough, so
sometimes postgresql just wouldn't come up.  I can think of two more off the top
of my head, but it seems that every engineer who puts in a sleep recognizes the
other sleeps as bad but sees his own as justified by his special circumstances.
Meanwhile, software gets slower and more flaky.

Comment 22 Mike A. Harris 2003-01-16 00:24:06 UTC

ctm:

Well, let me perhaps put it a different way then.  Considering there are
over 200 open X bug reports assigned to one single engineer, in all
likelihood, some of these 200 bug reports are not going to get fixed in
time for the final release of the operating system.  In such case, one
wants to fix as many bugs as possible - be it by actually "fixing" the
real problem, or by providing a temporary "workaround" that eliminates
the problem behaviour.

There are 2 possible outcomes in the case I propose above:

1) The problem can be directly identified and nailed down and a proper
   fix can be had.

2) The problem does not get found in time, and so would end up not being
   fixed with a "proper" fix.

Let's assume for all intents and purposes that this bug ends up being
one of the #2 types.  That isn't at all unreasonable with 200 open bug
reports to deal with.  In such a case, there are 2 alternatives that
I can see:

1) Provide a workaround that allows the user to use their computer and not
   experience the problem being reported in the report, even though it
   may be a temporary workaround such as a time delay.

2) Do nothing, and let the user's computer crash hard, possibly losing
   data, and requiring a hard reboot.

If faced with these choices, what would you choose?  Of course, you'd
choose the first one, which is the "proper" fix.  Let's say that that
does not happen for one reason or another.  Do you choose #1 above, which
allows you, as well as other people having the same problem, to use your
computer, or #2, which causes your machine to crash?

It is not possible to fix 200 bugs.  It is possible to fix some of them
however, and it's possible to provide workarounds for many of the remaining
ones.  If a workaround is simple enough in lieu of a proper fix, it is
ridiculous to not provide it, simply due to the reasons you've outlined
above.

So, while I appreciate you providing valid and useful data that can
contribute to this bug report receiving a proper fix, it is entirely
just as feasible that it will not.  Either way, I will be the judge
of whether A, B, or C happens, and I'll only get to one of those
resolutions by having full co-operation of the people having the problems,
and without negative commentary.

Also, the software is likely to be much much less flaky, if more people
volunteer to debug and troubleshoot it, and contribute patches to fix
bugs also.  If you're interested in helping do so, I'd be more than
willing to help you, or anyone else learn how to debug the X server.

Comment 23 Need Real Name 2003-01-16 04:43:53 UTC

mharris

Now let me chime in. It's perfectly logical that a fix as proposed could get
done as Mike has so wonderfully put. On the other hand it's not like the end of
the world for us poor radeon users,
we would just have to bite the bullet and buy some other functional card for
3D... hmmm guess that leaves only nvidia. 

I don't like that scenario. 

Mike what do you want us to do to help you out?

Comment 24 Kjartan Maraas 2003-01-16 09:06:28 UTC

I tried the script above and couldn't get it to hang that way, so maybe it's
related to the display manager instead? I'll try using xdm or kdm and see what
happens. As to having a choice to swap the card...that's kind of hard for us
laptop users ;-)

Comment 25 Need Real Name 2003-01-17 01:10:40 UTC

Please consider my interest in any XFree bug done and done. I'm outta here.
If it works great, if it doesn't to hell with it.

Comment 26 Mike A. Harris 2003-01-17 01:59:34 UTC

I'm glad I picked the "friendly" bug report to close all the duplicates
against.  It will help significantly to help solve the problem.  ;o)

Now that the rude people are done making sarcastic and unnecessary
commentary that doesn't do anything to help find a solution....  let's
continue troubleshooting where we left off.  ;o)

Comment 27 Mike A. Harris 2003-01-17 03:15:16 UTC

djoo:  Please make sure when attaching files that are text files that
you set the mimetype to text/plain

Comment 28 Kjartan Maraas 2003-01-18 11:14:13 UTC

I managed to log out successfully yesterday after doing 'sync' a couple of
times, but I can't reproduce it today :-/

Definitely seems like there are timing/race issues here.

Comment 29 ctm 2003-01-19 04:33:35 UTC

If I change /etc/X11/gdm/gdm.conf's AlwaysRestartServer variable from false to
true, the problem is masked and the machine doesn't lock up.

This appears to work even when the machine is quite busy.
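
For anyone who wants to try the same workaround, the change could be scripted roughly as follows. This is a sketch under assumptions: the function name set_always_restart is hypothetical, and it assumes the option appears in the file as an uncommented line of the form AlwaysRestartServer=value. If no such line exists, the file is left unchanged.

```shell
#!/bin/bash
# Hypothetical helper: set AlwaysRestartServer to true or false in a
# gdm.conf-style file, keeping a backup of the original as <file>.bak.
set_always_restart() {
    conf=$1
    value=$2
    case "$value" in
        true|false) ;;
        *) echo "value must be true or false" >&2; return 2 ;;
    esac
    cp "$conf" "$conf.bak" || return 1
    # Rewrite only an existing, uncommented AlwaysRestartServer line.
    sed "s/^AlwaysRestartServer[[:space:]]*=.*/AlwaysRestartServer=$value/" \
        "$conf.bak" > "$conf"
}

# Example invocation (needs root; commented out on purpose):
# set_always_restart /etc/X11/gdm/gdm.conf true
```

The new value would presumably only take effect once gdm itself has been restarted.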

Two things that don't prevent the lockup are using the drm code from
kernel-2.4.20-2.21 and the drm code from XFree86-4.2.99.3-20030115.0 (which
requires radeon_irq.o and radeon_mem.o to be added to radeon-objs in order to link).

My comment on January 15th wasn't meant to say that there shouldn't be a
bandaid, only that using sleeps to bandaid race conditions is generally bad.  If
doing Y while doing Z locks up the machine hard, it may be possible to prevent Y
from happening at all.  In the case of gdm, for example, there may be something
at logout time that can be done to avoid the race totally.  Restarting the X
server (via the changed gdm.conf) appears to work from my tiny test case, but I
don't know the gdm, X and kernel code paths to know that I really am avoiding Y;
perhaps I too am just delaying Y.

Comment 30 Pawel Salek 2003-01-22 14:40:31 UTC

Observed similar symptoms (frequent freeze on logout) on ATI8500LE.

Comment 31 Gerry Tool 2003-01-22 14:50:13 UTC

This behaviour is still in Phoebe 2 with my Radeon 7500 card.  I have Load "dri"
commented out to avoid the lockups until there is a fix.
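
The same workaround could be applied with a small script along these lines. It is a sketch only: the function name disable_dri is hypothetical, the config path in the example (/etc/X11/XF86Config) is an assumption about where the file lives on a given install, and the pattern assumes the module line is written exactly as Load "dri".

```shell
#!/bin/bash
# Hypothetical helper: comment out the Load "dri" line in an
# XF86Config-style file, keeping a backup as <file>.bak.
disable_dri() {
    conf=$1
    cp "$conf" "$conf.bak" || return 1
    # Keep the leading indentation and put a '#' in front of the Load keyword.
    sed 's/^\([[:space:]]*\)\(Load[[:space:]][[:space:]]*"dri"\)/\1#\2/' \
        "$conf.bak" > "$conf"
}

# Example invocation (needs root; the path is an assumption):
# disable_dri /etc/X11/XF86Config
```

Restoring the .bak copy re-enables DRI once a fixed package is available.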

Comment 32 Don Hardaway 2003-01-22 19:25:35 UTC

I have been battling problems in my Dell C800 laptop for a long time.  mharris,
I am taking you up on your offer mentioned above:

"If you're interested in helping do so, I'd be more than
willing to help you, or anyone else learn how to debug the X server."

I have the ATI Mobility M4 with 32MB memory and a Sharp screen (1600x1200).  My
lockups, about which you have plenty of feedback from me in the past, involve my
machine's screen going white and then locking up.

I have the latest beta 2 and still get white screens of death.  I am ready to
learn how to assist you in debugging this monster.

Comment 33 Pawel Salek 2003-01-22 21:29:59 UTC

AlwaysRestartServer=false in gdm.conf gets rid of the hang for me.

Comment 34 Kjartan Maraas 2003-01-22 21:37:09 UTC

And the latest gdm from rawhide gets rid of the hang for me and that still has
AlwaysRestartServer=true in gdm.conf. Go get gdm-2.4.1.1-1 and test it please.

Comment 35 Pawel Salek 2003-01-22 21:44:09 UTC

I meant "AlwaysRestartServer=true works for me". Sorry for the confusion.

Comment 36 Don Hardaway 2003-01-23 14:28:47 UTC

OK, I installed the latest gdm from rawhide and everything seemed to be humming along
fine.  I switched back and forth between console and the desktop, opened mail,
browsed and everything seemed to be solid until this morning.  I had the Mozilla
browser open and switched to the console--bingo, the screen went white and my
laptop froze.  I logged in again and immediately switched to console, and it locked
up again with the screen fading into a completely white screen.

What files do you need from me?  And what can I do to get more info--strace or
something?

Comment 38 Kjartan Maraas 2003-02-05 20:32:11 UTC

This bug is really about hangs when logging out, no? I think hanging/freezing
when going to the console and back is a different class of bug, right? Maybe
/var/log/messages and /var/log/XFree86.0.log will give Mike or someone a clue?

Does it fail at different resolutions, depths, etc.?

Comment 39 Don Hardaway 2003-02-05 20:47:44 UTC

Mine also hangs sometimes when logging out. I also cannot get 24-bit color, only
16-bit.

Comment 40 Mike Chambers 2003-02-11 03:21:17 UTC

For anyone that is CC'd on this bug, are you still experiencing problems with 
this bug?  If so and you either haven't done it already, or have upgraded to 
newer (rawhide/up2date?) packages, please submit your xfree log, xfree config 
file, output of dmesg and /var/log/messages.  Make sure they are all attached as
plain text files please, thanks.
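
Gathering the requested files could be scripted roughly as below. This is a sketch, not an official procedure: the function name collect_logs is hypothetical, and the paths in the example other than /var/log/messages are assumptions about a typical install. Each readable file is copied with a .txt suffix so it is easy to attach as plain text, and the current dmesg output is saved alongside.

```shell
#!/bin/bash
# Hypothetical helper: copy the given files into a directory as .txt
# files and save the kernel ring buffer (dmesg) next to them.
collect_logs() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -r "$f" ]; then
            cp "$f" "$dest/$(basename "$f").txt"
        else
            echo "skipping unreadable file: $f" >&2
        fi
    done
    # dmesg may be missing or restricted; note that and carry on.
    dmesg > "$dest/dmesg.txt" 2>/dev/null || echo "dmesg not available" >&2
}

# Example invocation (the first two paths are assumptions; commented out):
# collect_logs ~/bug79678 /var/log/XFree86.0.log /etc/X11/XF86Config /var/log/messages
```

Reading /var/log/messages normally requires root, so the helper would have to run as root to pick it up.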

Comment 41 Mike A. Harris 2003-02-24 10:09:47 UTC

I believe this problem may be fixed in XFree86-4.2.99.902-20030223.0
with CVS checkin:

 939. Check pScrn->vtSema before calling xf86SetCursor() from
      xf86CursorCloseScreen().  This avoids a segfault at exit with some
      drivers (Alan Hourihane).

That assumes it is the same issue of course.

Setting to MODIFIED pending testing of the fix, please set to RAWHIDE if the
problem is no longer present, or to ASSIGNED if still affected.

Thanks.

Comment 42 Bill Nottingham 2003-07-28 21:52:35 UTC

Closing some bugs that have been in MODIFIED for a while. Please reopen if the
problem persists.

Comment 43 Mike A. Harris 2008-06-14 14:01:20 UTC

Aside from it generating unnecessary bug spam, and being useless...  Why was
this bug updated to change the priority and severity, even though it is a CLOSED
bug?  I've been getting far too much bugzilla email for nonsensical
bug updates to closed bugs lately.  If it were only a couple a year or something
I'd just ignore them and delete them, but I'm getting a multitude of them every
month.  Since there is no way to reassign a closed bug to de-spam oneself, it
doesn't make sense to change the priority/severity of a closed bug either.

Here is the bug spam:


Please do not reply directly to this email. All additional
comments should be made in the comments box of this bug report.

Summary: Radeon hangs on logout


https://bugzilla.redhat.com/show_bug.cgi?id=79678


bugzilla changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |medium
           Priority|normal                      |medium
