Bug 79678
Summary: | Radeon hangs on logout | |
---|---|---|---
Product: | [Retired] Red Hat Linux | Reporter: | Kjartan Maraas <kmaraas>
Component: | XFree86 | Assignee: | X/OpenGL Maintenance List <xgl-maint>
Status: | CLOSED RAWHIDE | QA Contact: |
Severity: | medium | Docs Contact: |
Priority: | medium | |
Version: | 9 | CC: | djoo, ericb, hardawayd, mike, pawsa, sheep, wtogami
Target Milestone: | --- | Keywords: | Reopened
Target Release: | --- | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2008-06-14 14:01:49 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 80968 | |
Bug Blocks: | 79578, 82776 | |
Attachments: | | |
Description
Kjartan Maraas
2002-12-14 23:27:53 UTC
Please supply XFree86 log file and config file, in all XFree86 bug reports regardless of problem nature. If it is a problem causing a crash/hang, also attach /var/log/messages.

Created attachment 88747 [details]
XFree logfile
Created attachment 88748 [details]
This is /var/log/messages
Does this occur in beta2 as well?

I'll see if I can downgrade to that version. The rawhide version is newer than the one in beta2 I think. Hmm. There is no XFree in rawhide at the moment...

I meant more "does this occur in current XFree86" than in beta2 specifically, sorry for the confusion.

Yes, it happens with the latest XFree I found in rawhide a couple of days ago, 20021210-something I think. I also see the keyboard/pointer stop responding when the screen blanking sets in when running on battery. If I do alt+ctrl+backspace I don't see the hang though...

Dec 15 16:19:49 localhost cardmgr[682]: starting, version is 3.1.31
des 15 16:19:49 localhost rc: Starting pcmcia: succeeded
Dec 15 16:19:49 localhost cardmgr[682]: config error, file 'config' line 1053: syntax error
Dec 15 16:19:49 localhost cardmgr[682]: config error, file 'config' line 2129: no function bindings
Dec 15 16:19:49 localhost cardmgr[682]: watching 2 sockets
Dec 15 16:19:49 localhost cardmgr[682]: Card Services release does not match

That looks suspicious. Can you paste more of the messages file, perhaps over a few crash/reboot cycles? I think this may be a non-XFree86 problem perhaps.

By the way... In my last comment, I meant that the cardmgr errors in the logfile snippet above were not X related problems. I didn't mean that this bug report isn't X related. Just to clarify, since when I reread what I said above it sounded wrong.

The problem goes away if I disable DRI.

Djoo, can you add your data here too? Attach log+config. Could both of you also attach your /var/log/messages and make sure it's big enough to contain useful info from boot time onward (or attach the logrotated ones also if need be). I believe this is a kernel DRM issue.

Changed bug to be for public-beta for duping dupes against.

*** Bug 80690 has been marked as a duplicate of this bug. ***

Added djoo to CC, as he has the same problem.
Since this is DRI related, and since it only happens when using ?dm, I suspect there is a race condition of some kind in the kernel DRM. When using startx, it seems to not crash, but when using ?dm it does.

In addition to supplying all of your kernel messages logs showing the kernel crash (hopefully), could some of you try the following:

Make sure DRI is enabled first. Run startx, run some 3D apps, quit X. Wait 10 seconds, repeat. Do this 3 or 4 times and see if you get a hang. If not, then proceed.

Create this script, and run it:

    #!/bin/bash
    startx "$@"
    startx "$@"
    startx "$@"
    startx "$@"
    startx "$@"

Run the script and X should start up. Run some 3D apps, then quit X, and the script should restart X immediately. I want to see if we can get the machine to lock up merely by using startx with no time delay, in order to test the theory that it is a DRI related race condition and that startx or xdm/kdm/gdm doesn't matter.

Please update the bug report with the results of this testing. If the above test does what I think it will, we've got more data to go on for a proper fix. In case we don't find one however, we can probably insert a couple second delay somewhere to bandaid over the hypothetical race.

*** Bug 81702 has been marked as a duplicate of this bug. ***

Created attachment 89393 [details]
djoo's XF86 config file
Created attachment 89394 [details]
djoo's xsession-errors
Created attachment 89395 [details]
djoo's var/log/messages
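
The back-to-back startx test requested above could also be written as a small loop, so that the same helper covers both the "wait 10 seconds" runs and the "no delay" runs. This is only a sketch: the function name run_x_cycles and the XCMD override are illustrative assumptions, not part of XFree86 or of the script posted in this report.

```shell
#!/bin/bash
# Sketch of the repeated X start/stop test described in this report.
# run_x_cycles and XCMD are hypothetical names used for illustration only.

# Usage: run_x_cycles RUNS DELAY_SECONDS [arguments passed on to startx]
run_x_cycles() {
    runs="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "cycle $i of $runs"
        # XCMD defaults to startx; it can be overridden to dry-run the loop.
        "${XCMD:-startx}" "$@" || echo "cycle $i: X exited with status $?"
        # Sleep only between cycles, and only when a delay was requested.
        if [ "$i" -lt "$runs" ] && [ "$delay" -gt 0 ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}
```

For example, `run_x_cycles 4 10` would correspond to the first step (quit X, wait 10 seconds, repeat), and `run_x_cycles 5 0` to the five immediate restarts. Setting `XCMD=true` exercises the loop without starting X at all.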
As I mentioned in 80690, although I didn't do it enough times to be statistically significant, doing a "sync" before I logged out seemed to lessen the likelihood of the machine locking up. In my lockup case, all I needed to do most times was boot the machine, log in and log out, without running any 3D apps.

I hope you'll reconsider and not add a couple second delay to decrease the likelihood of the machine locking up. If the problem is related to a race condition between other system I/O and something to do with DRI, then such a delay may indeed help some people, but is likely to burn people who have their machines doing something (perhaps important) when they log out.

I've been bitten so many times by goofy delays being added to software that I can't keep track of them. One example is the sleep that was added to /etc/rc.d/init.d/postgresql after starting the postgresql server before calling pidof. On two machines I administered the delay wasn't always enough, so sometimes postgresql just wouldn't come up. I can think of two more off the top of my head, but it seems that every engineer who puts in a sleep recognizes the other sleeps as bad but sees his own as justified by his special circumstances. Meanwhile, software gets slower and more flaky.

ctm: Well, let me perhaps put it a different way then. Considering there are over 200 open X bug reports assigned to one single engineer, in all likelihood some of these 200 bug reports are not going to get fixed in time for the final release of the operating system. In such a case, one wants to fix as many bugs as possible, be it by actually "fixing" the real problem, or by providing a temporary "workaround" that eliminates the problem behaviour. There are 2 possible outcomes in the case I propose above:

1) The problem can be directly identified and nailed down and a proper fix can be had.
2) The problem does not get found in time, and so would end up not being fixed with a "proper" fix.
Let's assume for all intents and purposes that this bug ends up being one of the #2 types. That isn't at all unreasonable with 200 open bug reports to deal with. In such a case, there are 2 alternatives that I can see:

1) Provide a workaround that allows the user to use their computer and not experience the problem being reported in the report, even though it may be a temporary workaround such as a time delay.
2) Do nothing, and leave the user's computer to crash hard, possibly losing data, and requiring a hard reboot.

If faced with these choices, what would you choose? Of course, you'd choose the first one, which is the "proper" fix. Let's say that that does not happen for one reason or another. Do you choose #1 above, which allows you, as well as other people having the same problem, to use your computer, or #2, which causes your machine to crash?

It is not possible to fix 200 bugs. It is possible to fix some of them however, and it's possible to provide workarounds for many of the remaining ones. If a workaround is simple enough in lieu of a proper fix, it is ridiculous to not provide it, simply due to the reasons you've outlined above.

So, while I appreciate you providing valid and useful data that can contribute to this bug report receiving a proper fix, it is just as feasible that it will not. Either way, I will be the judge of whether A, B, or C happens, and I'll only get to one of those resolutions by having full co-operation of the people having the problems, and without negative commentary.

Also, the software is likely to be much much less flaky if more people volunteer to debug and troubleshoot it, and contribute patches to fix bugs also. If you're interested in helping do so, I'd be more than willing to help you, or anyone else, learn how to debug the X server.

mharris

Now let me chime in. It's perfectly logical that a fix as proposed could get done as Mike has so wonderfully put.
On the other hand it's not like the end of the world for us poor radeon users, we would just have to bite the bullet and buy some other functional card for 3D... hmmm guess that leaves only nvidia. I don't like that scenario. Mike, what do you want us to do to help you out?

I tried the script above and couldn't get it to hang that way, so maybe it's related to the display manager instead? I'll try using xdm or kdm and see what happens. As to having a choice to swap the card... that's kind of hard for us laptop users ;-)

Please consider my interest in any XFree bug done and done. I'm outta here. If it works great, if it doesn't to hell with it.

I'm glad I picked the "friendly" bug report to close all the duplicates against. It will help significantly to help solve the problem. ;o) Now that the rude people are done making sarcastic and unnecessary commentary that doesn't do anything to help find a solution.... let's continue troubleshooting where we left off. ;o)

djoo: Please make sure when attaching files that are text files that you set the mimetype to text/plain

I managed to log out successfully yesterday after doing 'sync' a couple of times, but I can't reproduce it today :-/ Definitely seems like there are timing/race issues here.

If I change /etc/X11/gdm/gdm.conf's AlwaysRestartServer variable from false to true, the problem is masked and the machine doesn't lock up. This appears to work even when the machine is quite busy. Two things that don't prevent the lockup are using the drm code from kernel-2.4.20-2.21 and the drm code from XFree86-4.2.99.3-20030115.0 (which requires radeon_irq.o and radeon_mem.o to be added to radeon-objs in order to link).

My comment on January 15th wasn't meant to say that there shouldn't be a bandaid, only that using sleeps to bandaid race conditions is generally bad. If doing Y while doing Z locks up the machine hard, it may be possible to prevent Y from happening at all.
In the case of gdm, for example, there may be something at logout time that can be done to avoid the race totally. Restarting the X server (via the changed gdm.conf) appears to work from my tiny test case, but I don't know the gdm, X and kernel code paths well enough to know that I really am avoiding Y; perhaps I too am just delaying Y.

Observed similar symptoms (frequent freeze on logout) on ATI8500LE.

This behaviour is still in Phoebe 2 with my Radeon 7500 card. I have Load "dri" commented out to avoid the lockups until there is a fix.

I have been battling problems in my dell c800 laptop for a long time. mharris, I am taking you up on your offer mentioned above: "If you're interested in helping do so, I'd be more than willing to help you, or anyone else learn how to debug the X server." I have the ATI Mobility M4 with 32MB memory and a Sharp screen (1600x1200). My lockups, as you have plenty of feedback from me in the past, involve my machine's screen going white and then locking up. I have the latest beta 2 and still get white screens of death. I am ready to learn how to assist you in debugging this monster.

AlwaysRestartServer=false in gdm.conf gets rid of the hang for me. And the latest gdm from rawhide gets rid of the hang for me and that still has AlwaysRestartServer=true in gdm.conf. Go get gdm-2.4.1.1-1 and test it please.

I meant "AlwaysRestartServer=true works for me". Sorry for the confusion.

Ok, I installed the latest gdm from rawhide and everything seemed to be humming along fine. I switched back and forth between console and the desktop, opened mail, browsed and everything seemed to be solid until this morning. I had the mozilla browser open and switched to the console--bingo, the screen went white and my laptop froze. Logged in again and immediately switched to console and it locked up again with the screen fading into a completely white screen. What files do you need from me? And what can I do to get more info--strace or something?
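
The gdm.conf workaround reported in the comments above (switching AlwaysRestartServer from false to true) can be sketched as a small shell helper. The helper name set_always_restart is hypothetical and not part of gdm; the only details taken from this report are the key name and the /etc/X11/gdm/gdm.conf path. It is probably safest to try it on a copy of the file first.

```shell
#!/bin/bash
# Sketch: set AlwaysRestartServer in a gdm.conf-style file.
# set_always_restart is an illustrative helper, not a gdm tool.

# Usage: set_always_restart /path/to/gdm.conf true|false
set_always_restart() {
    conf="$1"
    value="$2"
    tmp="$conf.tmp.$$"
    # Rewrite only the AlwaysRestartServer= line; other lines are copied
    # unchanged. If the key is absent, the file content stays the same.
    sed "s/^AlwaysRestartServer=.*/AlwaysRestartServer=$value/" "$conf" > "$tmp" || return 1
    # Write back through the original file so its mode and owner are kept.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

A cautious way to use it would be `cp /etc/X11/gdm/gdm.conf /tmp/gdm.conf` followed by `set_always_restart /tmp/gdm.conf true`, then comparing the two files with diff before touching the real one.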
This bug is really about hangs when logging out, no? I think hanging/freezing when going to the console and back is a different class of bug, right? Maybe /var/log/messages and /var/log/XFree86.log.0 will give Mike or someone a clue? Does it fail at different resolutions, depths, etc.?

Mine also hangs sometimes when logging out. I also cannot get 24 bit color, only 16 bit.

For anyone that is CC'd on this bug, are you still experiencing problems with this bug? If so and you either haven't done it already, or have upgraded to newer (rawhide/up2date?) packages, please submit your xfree log, xfree config file, output of dmesg and /var/log/messages. Make sure they are all attached as plain text files please, thanks.

I believe this problem may be fixed perhaps in XFree86-4.2.99.902-20030223.0 with CVS checkin:

939. Check pScrn->vtSema before calling xf86SetCursor() from xf86CursorCloseScreen(). This avoids a segfault at exit with some drivers (Alan Hourihane).

That assumes it is the same issue of course. Setting to MODIFIED pending testing of the fix, please set to RAWHIDE if the problem is no longer present, or to ASSIGNED if still affected. Thanks.

Closing some bugs that have been in MODIFIED for a while. Please reopen if the problem persists.

Aside from it generating unnecessary bug spam, and being useless... Why was this bug updated to change the priority and severity, even though it is a CLOSED bug?
I've been getting way too much excessive bugzilla email for nonsensical bug updates to closed bugs lately. If it were only a couple a year or something I'd just ignore them and delete it, but I'm getting a multitude of them every month. Since there is no way to reassign a closed bug to de-spam oneself, it doesn't make sense to change the priority/severity of a closed bug either.

Here is the bug spam:

Please do not reply directly to this email. All additional comments should be made in the comments box of this bug report.

Summary: Radeon hangs on logout
https://bugzilla.redhat.com/show_bug.cgi?id=79678

bugzilla changed:

What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |medium
Priority|normal |medium