Red Hat Bugzilla – Bug 79678
Radeon hangs on logout
Last modified: 2008-06-14 10:01:49 EDT
Description of problem:
I see a hard hang when logging out using the latest XFree from rawhide. This is
with a Compaq laptop that has a Radeon mobility card.
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M6 LY (prog-if 00 [VGA])
    Subsystem: Compaq Computer Corporation: Unknown device b111
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
    Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 66 (2000ns min), cache line size 08
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at 48000000 (32-bit, prefetchable) [size=128M]
    Region 1: I/O ports at 3000 [size=256]
    Region 2: Memory at 40200000 (32-bit, non-prefetchable) [size=64K]
    Expansion ROM at <unassigned> [disabled] [size=128K]
    Capabilities: AGP version 2.0
        Status: RQ=47 SBA+ 64bit- FW- Rate=x1,x2,x4
        Command: RQ=31 SBA+ AGP+ 64bit- FW- Rate=x1
    Capabilities: Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Please supply XFree86 log file and config file, in all XFree86 bug reports
regardless of problem nature. If it is a problem causing a crash/hang,
also attach /var/log/messages.
Created attachment 88747
Created attachment 88748
This is /var/log/messages
Does this occur in beta2 as well?
I'll see if I can downgrade to that version. The rawhide version is newer than
the one in beta2 I think. Hmm. There is no XFree in rawhide at the moment...
I meant more "does this occur in current XFree86" than in beta2 specifically,
sorry for the confusion.
Yes, it happens with the latest XFree I found in rawhide a couple of days ago,
20021210-something I think. I also see the keyboard/pointer stop responding when
the screen blanking sets in when running on battery. If I do alt+ctrl+backspace
I don't see the hang though...
Dec 15 16:19:49 localhost cardmgr: starting, version is 3.1.31
des 15 16:19:49 localhost rc: Starting pcmcia: succeeded
Dec 15 16:19:49 localhost cardmgr: config error, file 'config' line 1053:
Dec 15 16:19:49 localhost cardmgr: config error, file 'config' line 2129: no function bindings
Dec 15 16:19:49 localhost cardmgr: watching 2 sockets
Dec 15 16:19:49 localhost cardmgr: Card Services release does not match
That looks suspicious.
Can you paste more of the messages file, perhaps over a few crash/reboot
cycles? I think this may not be an XFree86 problem.
By the way, in my last comment I meant that the cardmgr errors in the
log file snippet above are not X-related problems. I did not mean that
this bug report isn't X related. I'm clarifying because what I wrote
above sounded wrong when I reread it.
The problem goes away if I disable DRI.
Djoo, can you add your data here too? Attach log+config.
Could both of you also attach your /var/log/messages and make sure it's big
enough to contain useful info from boot time onward (or attach the logrotated
ones also if need be).
I believe this is a kernel DRM issue.
Changed the bug to be filed against public-beta so that duplicates can be marked against it.
*** Bug 80690 has been marked as a duplicate of this bug. ***
Added djoo to CC, as he has the same problem.
Since this is DRI related, and since it only happens when using ?dm, I suspect
there is a race condition of some kind in the kernel DRM. When using startx,
it seems to not crash, but when using ?dm it does.
In addition to supplying all of your kernel messages logs showing the kernel
crash (hopefully), could some of you try the following:
Make sure DRI is enabled first. Run startx, run some 3D apps, quit X. Wait
10 seconds, repeat. Do this 3 or 4 times and see if you get a hang. If not,
create this script and run it:
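(The script itself is missing from this copy of the report. As a minimal sketch of the loop described here, assuming it simply reran startx back to back with no pause, something like the following could be used. The function name run_x_loop and its arguments are hypothetical and do not come from the original comment.)

```shell
#!/bin/sh
# Sketch only: run an X session command several times in a row with no
# delay between sessions. run_x_loop is a hypothetical name.
run_x_loop() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        echo "session $i of $count"
        # Run the session command (for example: startx). As soon as it
        # exits, the loop moves straight on to the next session.
        "$@" || echo "session $i exited with status $?"
    done
}
```

For example, run_x_loop 5 startx would start five sessions one after another; in each session, run some 3D apps and then quit X by hand.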
Run the script. It should start X through startx; run some 3D apps, then quit X,
and the script should restart X immediately. I want to see if we can get the
machine to lock up merely by using startx with no time delay, in order to test
the theory that this is a DRI-related race condition and that it does not matter
whether startx or xdm/kdm/gdm is used.
Please update the bug report with the results of this testing.
If the above test does what I think it will, we've got more data to go on
for a proper fix. In case we don't find one, however, we can probably insert
a delay of a couple of seconds somewhere to band-aid over the hypothetical race.
*** Bug 81702 has been marked as a duplicate of this bug. ***
Created attachment 89393
djoo's XF86 config file
Created attachment 89394
Created attachment 89395
As I mentioned in 80690, although I didn't do it enough times to be
statistically significant, doing a "sync" before I logged out seemed to lessen
the likelihood of the machine locking up. In my lockup case, all I needed to do
most times was boot the machine, log in and log out, without running any 3D apps.
I hope you'll reconsider and not add a couple second delay to decrease the
likelihood of the machine locking up. If the problem is related to a race
condition between other system I/O and something to do with DRI, then such a
delay may indeed help some people, but is likely to burn people who have their
machines doing something (perhaps important) when they log out.
I've been bitten so many times by goofy delays being added to software that I
can't keep track of them. One example is the sleep that was added to
/etc/rc.d/init.d/postgresql after starting the postgresql server before calling
pidof. On two machines I administered the delay wasn't always enough, so
sometimes postgresql just wouldn't come up. I can think of two more off the top
of my head, but it seems that every engineer who puts in a sleep recognizes the
other sleeps as bad but sees his own as justified by his special circumstances.
Meanwhile, software gets slower and more flaky.
Well, let me perhaps put it a different way then. Considering there are
over 200 open X bug reports assigned to one single engineer, in all
likelihood, some of these 200 bug reports are not going to get fixed in
time for the final release of the operating system. In such a case, one
wants to fix as many bugs as possible - be it by actually "fixing" the
real problem, or by providing a temporary "workaround" that eliminates
the problem behaviour.
There are 2 possible outcomes in the case I propose above:
1) The problem can be directly identified and nailed down and a proper
fix can be had.
2) The problem does not get found in time, and so would end up not being
fixed with a "proper" fix.
Let's assume for all intents and purposes that this bug ends up being
one of the #2 types. That isn't at all unreasonable with 200 open bug
reports to deal with. In such a case, there are 2 alternatives that
I can see:
1) Provide a workaround that allows the user to use their computer and not
experience the problem being reported in the report, even though it
may be a temporary workaround such as a time delay.
2) Do nothing, and leave the user's computer to crash hard, possibly losing
data, and requiring a hard reboot.
If faced with these choices, what would you choose? Of course, you'd
choose the first outcome, which is the "proper" fix. Let's say that that
does not happen for one reason or another. Do you choose #1 above, which
allows you, as well as other people having the same problem, to use your
computer, or #2, which causes your machine to crash?
It is not possible to fix 200 bugs. It is possible to fix some of them
however, and it's possible to provide workarounds for many of the remaining
ones. If a workaround is simple enough in lieu of a proper fix, it is
ridiculous not to provide it simply because of the reasons you've outlined.
So, while I appreciate you providing valid and useful data that can
contribute to this bug report receiving a proper fix, it is
just as feasible that it will not. Either way, I will be the judge
of whether A, B, or C happens, and I'll only get to one of those
resolutions by having full co-operation of the people having the problems,
and without negative commentary.
Also, the software is likely to be much much less flaky, if more people
volunteer to debug and troubleshoot it, and contribute patches to fix
bugs also. If you're interested in helping do so, I'd be more than
willing to help you, or anyone else learn how to debug the X server.
Now let me chime in. It's perfectly logical that a fix as proposed could get
done, as Mike has so wonderfully put it. On the other hand, it's not like the end
of the world for us poor Radeon users: we would just have to bite the bullet and
buy some other functional card for 3D... hmmm, guess that leaves only nvidia.
I don't like that scenario.
Mike, what do you want us to do to help you out?
I tried the script above and couldn't get it to hang that way, so maybe it's
related to the display manager instead? I'll try using xdm or kdm and see what
happens. As to having a choice to swap the card... that's kind of hard for us
laptop users ;-)
Please consider my interest in any XFree bug done and done. I'm outta here.
If it works, great; if it doesn't, to hell with it.
I'm glad I picked the "friendly" bug report to close all the duplicates
against. It will help significantly to help solve the problem. ;o)
Now that the rude people are done making sarcastic and unnecessary
commentary that doesn't do anything to help find a solution.... let's
continue troubleshooting where we left off. ;o)
djoo: Please make sure when attaching files that are text files that
you set the mimetype to text/plain
I managed to log out successfully yesterday after doing 'sync' a couple of
times, but I can't reproduce it today :-/
Definitely seems like there are timing/race issues here.
If I change /etc/X11/gdm/gdm.conf's AlwaysRestartServer variable from false to
true, the problem is masked and the machine doesn't lock up.
This appears to work even when the machine is quite busy.
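As a sketch of the gdm.conf change described above, the edit could be scripted as follows. This assumes the key sits on its own line in the form AlwaysRestartServer=..., which is how the report quotes it. The helper name set_always_restart is hypothetical, and a backup copy is kept next to the file.

```shell
#!/bin/sh
# Sketch only: set AlwaysRestartServer=true in a gdm.conf-style file.
# set_always_restart is a hypothetical helper; the default path is the
# one named in this report.
set_always_restart() {
    conf=${1:-/etc/X11/gdm/gdm.conf}
    # Keep the original file as <name>.bak before rewriting it.
    cp "$conf" "$conf.bak" || return 1
    sed 's/^AlwaysRestartServer=.*/AlwaysRestartServer=true/' "$conf.bak" > "$conf"
}
```

Calling set_always_restart with no argument edits /etc/X11/gdm/gdm.conf, which normally requires root.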
Two things that don't prevent the lockup are using the drm code from
kernel-2.4.20-2.21 and the drm code from XFree86-126.96.36.199-20030115.0 (which
requires radeon_irq.o and radeon_mem.o to be added to radeon-objs in order to link).
My comment on January 15th wasn't meant to say that there shouldn't be a
bandaid, only that using sleeps to bandaid race conditions is generally bad. If
doing Y while doing Z locks up the machine hard, it may be possible to prevent Y
from happening at all. In the case of gdm, for example, there may be something
at logout time that can be done to avoid the race totally. Restarting the X
server (via the changed gdm.conf) appears to work in my tiny test case, but I
don't know the gdm, X and kernel code paths well enough to know that I really am
avoiding Y; perhaps I too am just delaying Y.
Observed similar symptoms (frequent freeze on logout) on an ATI 8500LE.
This behaviour is still in Phoebe 2 with my Radeon 7500 card. I have Load "dri"
commented out to avoid the lockups until there is a fix.
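A minimal sketch of the workaround described here, assuming the module is loaded by a line of the form Load "dri" in the X config file. The helper name disable_dri is hypothetical, and the config path is passed in as an argument because the report does not name it.

```shell
#!/bin/sh
# Sketch only: comment out the Load "dri" line in an XFree86 config file.
# disable_dri is a hypothetical helper; pass the config path explicitly.
disable_dri() {
    conf=$1
    # Keep the original file as <name>.bak before rewriting it.
    cp "$conf" "$conf.bak" || return 1
    # Preserve the indentation and put a '#' in front of Load "dri".
    sed 's/^\([[:space:]]*\)Load\([[:space:]]*\)"dri"/\1#Load\2"dri"/' "$conf.bak" > "$conf"
}
```

Other Load lines are left untouched, and the change can be undone by copying the .bak file back.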
I have been battling problems on my Dell C800 laptop for a long time. mharris,
I am taking you up on your offer mentioned above:
"If you're interested in helping do so, I'd be more than
willing to help you, or anyone else learn how to debug the X server."
I have the ATI Mobility M4 with 32MB memory and a Sharp screen (1600x1200). As
you know from my plentiful feedback in the past, my lockups involve my machine's
screen going white and then the machine locking up.
I have the latest beta 2 and still get white screens of death. I am ready to
learn how to assist you in debugging this monster.
AlwaysRestartServer=false in gdm.conf gets rid of the hang for me.
And the latest gdm from rawhide gets rid of the hang for me and that still has
AlwaysRestartServer=true in gdm.conf. Go get gdm-188.8.131.52-1 and test it please.
I meant "AlwaysRestartServer=true works for me". Sorry for the confusion.
OK, I installed the latest gdm from rawhide and everything seemed to be humming
along fine. I switched back and forth between console and the desktop, opened mail,
browsed, and everything seemed solid until this morning. I had the Mozilla
browser open and switched to the console--bingo, the screen went white and my
laptop froze. I logged in again and immediately switched to the console, and it
locked up again with the screen fading to completely white.
What files do you need from me? And what can I do to get more info--strace or
This bug is really about hangs when logging out, no? I think hanging/freezing
when going to the console and back is a different class of bug, right? Maybe
/var/log/messages and /var/log/XFree86.0.log will give Mike or someone a clue?
Does it fail at different resolutions, depths, etc.?
Mine also hangs sometimes when logging out. I also cannot get 24 bit color only
For anyone that is CC'd on this bug, are you still experiencing problems with
this bug? If so and you either haven't done it already, or have upgraded to
newer (rawhide/up2date?) packages, please submit your xfree log, xfree config
file, output of dmesg and /var/log/messages. Please make sure they are all
attached as plain text files, thanks.
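A minimal sketch for gathering these files as plain text. It assumes the usual locations /var/log/XFree86.0.log and /etc/X11/XF86Config, which may differ on a given setup, and uses a hypothetical helper name, collect_x_logs.

```shell
#!/bin/sh
# Sketch only: copy the requested logs into one directory as .txt files.
# collect_x_logs is a hypothetical helper; the X log and config paths
# are assumptions about a typical install.
collect_x_logs() {
    out=${1:-./xfree-bug-report}
    mkdir -p "$out" || return 1
    # Save the kernel ring buffer; ignore failure if dmesg is unavailable.
    dmesg > "$out/dmesg.txt" 2>&1 || true
    for f in /var/log/XFree86.0.log /var/log/messages /etc/X11/XF86Config; do
        if [ -r "$f" ]; then
            cp "$f" "$out/$(basename "$f").txt"
        else
            # Record what could not be read instead of failing.
            echo "not readable: $f" >> "$out/MISSING.txt"
        fi
    done
}
```

Reading /var/log/messages usually requires root. Any file that cannot be read is listed in MISSING.txt in the output directory.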
I believe this problem may be fixed in XFree86-184.108.40.2062-20030223.0
with CVS checkin:
939. Check pScrn->vtSema before calling xf86SetCursor() from
xf86CursorCloseScreen(). This avoids a segfault at exit with some
drivers (Alan Hourihane).
That assumes it is the same issue of course.
Setting to MODIFIED pending testing of the fix, please set to RAWHIDE if the
problem is no longer present, or to ASSIGNED if still affected.
Closing some bugs that have been in MODIFIED for a while. Please reopen if the
Aside from it generating unnecessary bug spam, and being useless... Why was
this bug updated to change the priority and severity, even though it is a CLOSED
bug? I've been getting far too much bugzilla email for nonsensical
bug updates to closed bugs lately. If it were only a couple a year or something
I'd just ignore them and delete them, but I'm getting a multitude of them every
month. Since there is no way to reassign a closed bug to de-spam oneself, it
doesn't make sense to change the priority/severity of a closed bug either.
Here is the bug spam:
Please do not reply directly to this email. All additional
comments should be made in the comments box of this bug report.
Summary: Radeon hangs on logout
What |Removed |Added