Bug 152648
Summary: | 6.8.2 regression - radeon 7000 hang
---|---
Product: | [Fedora] Fedora
Component: | xorg-x11
Version: | 3
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | Ronald Kuetemeier <bugzilla>
Assignee: | Kristian Høgsberg <krh>
QA Contact: | David Lawrence <dkl>
CC: | bogado, dwmw2, jason.wilson, jay, morenstein, olivier.baudron, pasky, robert, wtogami, xgl-maint
Fixed In Version: | 6.8.2-1.FC3.45.2
Doc Type: | Bug Fix
Last Closed: | 2005-09-22 19:59:30 UTC
Description
Ronald Kuetemeier
2005-03-30 22:44:07 UTC
Created attachment 112493
file includes xorg.conf Xorg.log for 6.8.1 and xorg.log for 6.8.2
Ok, I think I've identified the regression, and I've built a set of new 6.8.2 RPMs with the fix. Could you please try to update to the xorg-x11 RPMs here: http://people.redhat.com/krh/fdo1912 and see if they fix the problem you see?
Thanks,
Kristian

Kristian,
very cool. Or should I say it works, problem fixed, since I write this from a 6.8.2.
Thanks,
Ronald

Kristian,
I'm terribly sorry, but try again. I had this great idea of booting, just for a test. Before, I had just stopped X and restarted, looked into the log file, saw that it showed 6.8.2, and was happy. Dang, hardware not reset, so I did what I did. And since I got into trouble reverting the patched system ... it took me a while to get back to a working system since my file system was not cooperative after the hard stop.
Anyhow sorry for the confusion this has caused.
Ronald

Btw I still managed to save the log file after the soft restart and after the boot (which didn't work). Just if you need it, let me know. Take a look at the last lines from the restart file (last lines in the diff too).
Diff (boot restart) shows this:
17c17
< (==) Log file: "/var/log/Xorg.0.log", Time: Thu Mar 31 15:57:48 2005
---
> (==) Log file: "/var/log/Xorg.0.log", Time: Thu Mar 31 15:34:03 2005
878c878
< (II) RADEON(0): [drm] mapped SAREA 0xf8919000 to 0xb6e48000
---
> (II) RADEON(0): [drm] mapped SAREA 0xf8919000 to 0xb6f08000
883c883
< (II) RADEON(0): [pci] Ring mapped at 0xb6d47000
---
> (II) RADEON(0): [pci] Ring mapped at 0xb6e07000
886c886
< (II) RADEON(0): [pci] Ring read ptr mapped at 0xb6d46000
---
> (II) RADEON(0): [pci] Ring read ptr mapped at 0xb6e06000
889c889
< (II) RADEON(0): [pci] Vertex/indirect buffers mapped at 0xb6b46000
---
> (II) RADEON(0): [pci] Vertex/indirect buffers mapped at 0xb6c06000
892c892
< (II) RADEON(0): [pci] GART Texture map mapped at 0xb6666000
---
> (II) RADEON(0): [pci] GART Texture map mapped at 0xb6726000
977a978,979
> (II) RADEON(0): [drm] removed 1 reserved context for kernel
> (II) RADEON(0): [drm] unmapping 8192 bytes of SAREA 0xf8919000 at 0xb6f08000

(In reply to comment #4)
> Kristian,
> I'm terribly sorry, but try again. I had this great idea of booting, just for
> a test. Before, I had just stopped X and restarted, looked into the log file,
> saw that it showed 6.8.2, and was happy. Dang, hardware not reset, so I did
> what I did. And since I got into trouble reverting the patched system ... it
> took me a while to get back to a working system since my file system was not
> cooperative after the hard stop.
I don't quite understand the situation you're describing. Did the new 6.8.2 not work for you after all or what was the problem? In comment #3 it sounds like the patch worked, what happened after that?
> Anyhow sorry for the confusion this has caused.
No problem.

Kristian,
let me try again, still working on my coffee so bear with me. The system worked after a restart without booting. I got into the habit of starting X from cli when it came out, since it was the only way to start the GUI anyway at that time. Anyhow, while it worked it didn't "feel" right, I thought this is not the same test as I did before. I didn't reboot and start. So I reluctantly rebooted the machine, don't know, there is something about booting a Unix/Linux box. Starting X _after_ the reboot it immediately hung again. No input possible and the machine indicated a hardware problem, so I did a hard stop (turn it off). Which got me into trouble (not an X problem), even though I run a raid controller and lvm installed with ext3.
Long story, short version: it ran after installing and restarting X without a reboot. It broke/hung after a reboot.
Hope this helps,
Ronald

Kristian,
I finally had the time to run some more tests. Installed the new version (xorg-x11-6.8.2-1.FC3.13.fdo1912) and disabled the kernel radeon module -> reboot -> start X -> crash. Reboot -> disable dri from loading -> start X -> works.
Just in case: Rebooted again, didn't change anything (dri and radeon module still disabled) -> works.
Hope this helps,
Ronald

I also had this problem with the initial release of Xorg-X11 that was shipped with Fedora Core 3, but all went ok when I updated to xorg-x11-6.8.1-12.FC3.21. The problem came back with the release of xorg-x11-6.8.2-1.FC3.13.
My system is a Toshiba Satellite A65 S1066 and it has a Radeon 7000.

(In reply to comment #9)
> I also had this problem with the initial release of Xorg-X11 that was shipped
> with Fedora Core 3, but all went ok when I updated to
> xorg-x11-6.8.1-12.FC3.21. The problem came back with the release of
> xorg-x11-6.8.2-1.FC3.13.
>
> My system is a Toshiba Satellite A65 S1066 and it has a Radeon 7000.
Did you also test the RPMs I posted in comment #2? Do they fix your problem or does it still crash? Does disabling DRI fix the problem?

Kristian,
I am having the same problem with a Dell 2800. The problem only occurs with an smp kernel. I do not have the problem with a non-smp kernel. I tried the FC3.13.fdo1912 RPMs and the crash still occurs almost immediately after I enter my user name and password into the graphics console (level 5). With the above RPMs and the smp kernel, commenting out the Load "dri" line in /etc/X11/xorg.conf is a workaround.

I have tested the http://people.redhat.com/krh/fdo1912 RPMs and they seem to work for me. Like morenstein, I am using the smp kernel, since my CPU has hyper-threading capabilities.

FYI, this bug is in "MODIFIED" state currently, which means it's considered fixed and closed. If anyone has tested the proposed fix and it does not solve the problem, you have to change the bug state back to "ASSIGNED" in order for the issue to be reopened. Thought I'd mention that, as people sometimes update a MODIFIED bug to indicate the problem still exists, but we never see it again because "MODIFIED" means "CLOSED".
(I get all bug emails and just happened to spot this one)

- CAN-2005-0767 : drm race in radeon in new (smp):
Product : Fedora Core 3
Name : kernel
Version : 2.6.11
Release : 1.14_FC3
Summary : The Linux kernel (the core of the Linux operating system)
Did not fix the problem. Still need to disable dri.

Ronald, based on the Mike Harris comment above, I think that you need to change the bug state. I tried, but could not change it.

Sorry, can't: no option to change it. From my understanding of these kinds of systems, without looking at bugzilla, only the developers and their management should be able to change it. I'm just a user, and therefore what do I know. Kristian, are you there :-)
Ronald

A couple of notes. First, Bug 141295 may be related to this bug, though the submission date on it seems to predate the xorg-x11-6.8.2-1 release.
I am experiencing frequent (within a couple seconds or minutes of login) hangs with a PCI Radeon 7000, xorg-x11-6.8.2-1-FC3.13, kernel-2.6.10-1.770_FC3smp, Intel D865PERL motherboard, P4 w/HT enabled. I do not have a 100% reliable method of reproducing the hangs, but they are quite frequent, usually within seconds of logging in, and happen during normal desktop use. The system does not require a 3d-acceleration-using application to be running to freeze.
Hangs do NOT seem to occur in the following situations:
* If I use kernel-2.6.10-1.770_FC3 (no smp)
* [Very well tested] If I downgrade to xorg-x11-6.8.1-12.i386.rpm (and the associated 6.8.1 packages).
* If I comment out 'Load "dri"' in my xorg.conf
* [Limited testing] Once I have booted into xorg-x11-6.8.1, telinit 3, upgrade to xorg-x11-6.8.2, and telinit 5 does not produce hangs. I have also tried rmmodding radeon while in runlevel 3, and still do not see hangs once xorg-x11-6.8.2 starts.
* [Limited testing] Once I have booted into xorg-x11-6.8.2, stopped it before it freezes, downgraded to xorg-x11-6.8.1, and restarted with xorg-x11-6.8.1 (if I have not rmmodded radeon -- I have not tested with rmmodding radeon in this situation).
The RPMs linked to in comment #2 do *not* resolve the problem for me either. This bug needs to be reopened.

Bug back in ASSIGNED. Could you guys try the 6.8.1 driver with the 6.8.2 RPM? That is, install the 6.8.1 known good xorg-x11 RPM, then copy these two files:
/usr/X11R6/lib/modules/drivers/ati_drv.o
/usr/X11R6/lib/modules/drivers/radeon_drv.o
to a temporary location (for example, your home directory), then upgrade to 6.8.2 and copy them back into the directory /usr/X11R6/lib/modules/drivers, overwriting the 6.8.2 files there. Then run through the same tests and see if it still crashes. Remember to re-enable DRI if you disabled it before.
Thanks,
Kristian

Kristian,
that seems to work, i.e. old driver on new install. I had the drivers installed on a different partition, so it was easy to just copy them over. I still have the apps installed from No. 2 but have dri enabled. Just glanced through the log file and didn't see anything scary. BTW I didn't forget to reboot this time around. Let's see what the others have to say.
Ronald

Kristian,
while we are on it: I just looked with strings at the modules, found the same version for the ati_drv.o driver and decided what the heck. Copied the 6.8.2 ati driver from #2 back and rebooted. Still works. Hope some elimination helps. In other words, the new radeon driver is the problem.
Ronald

Using the 6.8.1 radeon_drv.o file under 6.8.2 also works for me.

Thanks for the quick responses. I've been looking through the radeon driver changes from 6.8.1 to 6.8.2 and found another change that might be the cause of the lock-ups you're seeing.
I built another set of RPMs with this change reverted: http://people.redhat.com/krh/fdo1912.1
Please give them a try, and remember to re-enable DRI and revert other workarounds you might have added. Thanks!

Hi Kristian,
works for me. Quick look at the log file also didn't show anything unusual, as far as I can see. Except that you are still building on a 2.6.9 kernel :-) Thanks, looks great, let's see what the others have to say.
Ronald

The kernel version installed on a build machine makes no difference. The machine that is compiling something is not the machine that will end up running the compiled result, so any software that compiles differently depending on what version of the OS kernel is "running" on the machine when it is compiled is broken. X.Org compilation does not depend on any kernel version running on the machine. The "acecad" driver needs to have certain kernel header files available to build with, but the version of the running kernel makes no difference.
The kernel version of the build system is put into the Xorg log file in order for X developers to be able to diagnose problems such as the one in the acecad driver.
Basically, it makes no difference what version of the kernel is running on the machine you compile code on, or if it does, the code you're compiling is buggy.
Hope this demystifies things a bit. ;o)

(In reply to comment #24)
I installed the RPMs this morning, reactivated the Load "dri" line in xorg.conf and rebooted the smp kernel (2.6.10-1.770_FC3smp). I logged onto the graphics console and played for about five minutes and it didn't crash. This is on a new Dell PowerEdge 2800 dual processor. I believe you have the problem resolved.

Good morning Kristian,
about #26: Sorry, Sorry, Sorry, I know, I know. Just kidding, my weird humor. Didn't want to push a button, that was just what I noticed at first glance, and I thought that was funny. Since I had to jump through hoops to get FC3 installed on this machine.
lvm, Dell SATA raid controller and kernel 2.6 don't mix that well on a PE 1800, and the original FC3 kernel won't even run an install. As Alan Cox pointed out somewhere, that bug was fixed in, I believe, 2.6.11 and backported in FC3 to 2.6.10 ...
Sorry again,
Ronald

Ronald, that was me in #26, rather than Kristian. ;o) You didn't push any buttons however... Occasionally someone notices the kernel version of the buildsystem printed in a build log, or runtime log file for the X server or something, and thinks that this is a problem for some odd reason. I believe that they just misunderstand what the intention of the message is. The message is more or less useless for the end user and means nothing, but is occasionally important for developers, such as the issue I mentioned about the acecad driver above.
I responded because I am the one who originally patched X to include this information, as I thought it would be useful in troubleshooting certain types of rare and obscure problems reported. It's come in handy a couple of times when X did have a broken dependency on the runtime kernel of the system; however, with this message, we generally will catch any bugs of that nature that get added to the upstream source now. ;o)
My intention in pointing this out was merely to clarify that that particular piece of information is not important to the end user generally speaking, and that it does not mean what one might think it means. ;o)
Hope this clears things up.
Thanks.

Using the RPMs in Comment #24, I have been unable to hang the machine in a day of use. Given that the hangs normally occur within a minute of login to X, I think that I can safely say that the problem has been resolved for me. Thanks for the fast turnaround on the code, Kristian. Your fixing had better latency than my testing. :-)

Created attachment 113300
ltrace output of glxinfo command
This is the output of 'ltrace glxinfo' with DRI enabled in /etc/X11/xorg.conf.
With the DRI module not loaded, the command completes successfully.
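
The DRI workaround that several commenters describe (commenting out the Load "dri" line in xorg.conf) can be sketched as a small shell function. This is only an illustration and was not posted in the bug: the function name disable_dri and the .bak backup suffix are made up here, and the file path is supplied by the caller, so it is worth trying on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): comment out every active
# Load "dri" line in an xorg.conf-style file, keeping a backup next to it.
# Usage: disable_dri /path/to/xorg.conf
disable_dri() {
    conf="$1"
    # Keep the untouched original as <file>.bak before rewriting.
    cp -p "$conf" "$conf.bak" || return 1
    # Match optional leading whitespace, Load (or load), whitespace, "dri",
    # and put a '#' in front of the keyword. Already commented lines do not
    # match because '#' would sit between the whitespace and the keyword.
    sed 's/^\([[:space:]]*\)\([Ll]oad[[:space:]]*"dri"\)/\1#\2/' "$conf.bak" > "$conf"
}
```

Run as root against /etc/X11/xorg.conf and followed by a restart of X, this should have the same effect as the manual edit described in the comments. Copying the .bak file back re-enables DRI.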
(Apologies - comments got lost before I attached the attachment in the previous item)
I have been having similar problems (see Bug #141295 for more details).
Installing the RPMs from Comment #24 resolved the problems with the machine hanging on login.
Unfortunately the new RPMs cause all GL based programs to fail with a SEGV - including glxinfo. Attachment #113300 has an ltrace output.

Jason, do you have ALL FC3 updates including the latest udev installed, and have you rebooted since?

*** Bug 141295 has been marked as a duplicate of this bug. ***

I have all FC3 updates installed as of today and have rebooted.
udev is udev-039-10.FC3.7
The problem with GL programs not running has been around for quite a while (3 months?). It is not something that particularly worries me as I don't need 3D - I would just like the screensavers to work. I need the SMP functionality more.

Created attachment 113309
gdb output from run of glxinfo
A script output of a run of glxinfo through gdb.
Gives more information on function calls and in which libraries the error
occurs.
(In reply to comment #36)
> Created an attachment (id=113309)
> gdb output from run of glxinfo
>
> A script output of a run of glxinfo through gdb.
> Gives more information on function calls and in which libraries the error
> occurs.
The errors you're seeing in gdb and ltrace are normally caught by libGL internally and are used to detect whether the processor supports SSE or MMX. If GL programs also crash outside gdb, you need to continue (use the 'c' command) past this exception to see the real bug.
If you're able to get a backtrace from the real bug, please open a new bug with that attached. I'm closing this bug now, since the lockup issue seems to be closed.
Thanks,
Kristian

xorg-x11-6.8.1-ati-radeon-dynamic-clocks-fix-2.patch causes a regression for me when dynamic clocks are disabled (which appears to be the default).
The radeon_drv.o from 6.8.2-23 works fine with rhpl-0.161 on my Apple PowerBook5,3 but the one in 6.8.2-29 locks the machine hard after disabling dynamic clocks, which the kernel had enabled.

(In reply to comment #38)
> xorg-x11-6.8.1-ati-radeon-dynamic-clocks-fix-2.patch causes a regression for me
> when dynamic clocks are disabled (which appears to be the default).
Did any of the 6.8.1 RPMs work for you? The patch basically restores the 6.8.1 behavior which we had pretty good results with.

I'm not sure -- I had manually configured X with DynamicClocks enabled at that point. Are the 6.8.1 RPMs still available somewhere so I can try them?

The original problem may well be fixed by Ben's radeon patches at http://gate.crashing.org/~benh/xorg/
Could we build a package with those present and xorg-x11-6.8.1-ati-radeon-dynamic-clocks-fix-2.patch removed, for the original reporter to test?
All those who reported lockups, please could you test the RPMs from ftp://zeniv.uk.linux.org/pub/people/dwmw2/x-radeon/

In response to comment #39: Even with dynamic clocks disabled, FC3's xorg-x11-6.8.1-12 works correctly, as does FC4's 6.8.2-23 and my own 6.8.2-30.radeon.1 package. The lockup I observed with 6.8.2-29 appears to be a new phenomenon which is _not_ a restoration of 6.8.1 behaviour.

I am observing the same behaviour on some dual x86_64 machines I have recently installed, although it is not *necessarily* connected with the Radeon 7000. The machines are
[root@callista ~]# uname -a
Linux callista.physics.nat 2.6.11-1.14_FC3smp #1 SMP Thu Apr 7 19:36:23 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
[root@callista ~]# lspci | grep VGA
10:0d.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
[root@callista ~]# rpm -q xorg-x11
xorg-x11-6.8.1-12
[root@attila ~]# uname -a
Linux attila 2.6.11-1.14_FC3smp #1 SMP Thu Apr 7 19:36:23 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
[root@attila ~]# lspci | grep VGA
05:00.0 VGA compatible controller: nVidia Corporation NV37GL [Quadro FX 330] (rev a2)
[root@attila ~]# rpm -q xorg-x11
xorg-x11-6.8.1-12
Both machines were installed recently, and after performing a yum update from (a mirror of) the official fedora repos, the graphical environment froze several seconds after login. I eventually traced it to the xorg*-6.8.2-1.FC3.13.x86_64 packages, which are now excluded in yum.conf.
However, callista's display (with the Radeon 7000) froze during normal use today, despite using 6.8.1. I have commented out 'load "dri"' from xorg.conf and am betting on this preventing another freeze. I'm hoping it won't be necessary to disable one processor just to stabilise the machine!

I've updated the patch to a newer one dwmw2 sent me via email. This is now in rawhide CVS (but not built yet). It is also in FC3/FC4 CVS branches and will be in future updates.
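
The yum.conf exclusion mentioned in the x86_64 report above could look roughly like the following. It is a sketch under assumptions: the function name add_yum_exclude, the .bak suffix and the glob used in the usage line are illustrative and are not taken from the reporter's configuration. yum's exclude option takes a space-separated list of package globs.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): add a package glob to the
# exclude list of a yum.conf-style file, keeping a backup next to it.
# Usage: add_yum_exclude /path/to/yum.conf 'xorg-x11*'
# Limitations of this sketch: the glob must not contain '/', '&' or '\',
# and every existing exclude= line in the file is extended, not only the
# one under [main].
add_yum_exclude() {
    conf="$1"
    pattern="$2"
    cp -p "$conf" "$conf.bak" || return 1
    if grep -q '^exclude=' "$conf.bak"; then
        # An exclude list already exists: append the new glob to it.
        sed "s/^exclude=.*/& $pattern/" "$conf.bak" > "$conf"
    else
        # No exclude list yet: add one directly under the [main] header.
        awk -v pat="$pattern" '{ print } /^\[main\]/ { print "exclude=" pat }' \
            "$conf.bak" > "$conf"
    fi
}
```

Called as add_yum_exclude /etc/yum.conf 'xorg-x11*', this would keep yum from pulling in newer xorg-x11 packages until the line is removed again, which matters once a fixed update is released.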
I've just encountered this problem on Fedora Core 3 with all current stable updates to the kernel and xorg. Given that the patch has been entered into CVS (today - what luck!), how long will it be before I can run up2date and get a working RPM? Will installing the test RPMs posted by David interfere with the official fix?

We initially did not plan on releasing another FC3 update until there was another security issue, or critical flaw discovered (using our own decision metrics). Due to bug #168844 having a larger impact than we initially perceived however, we have reassessed the situation and decided that we should release another update for FC3 right away that includes a fix for bug #168844, even though users will still get hit by the problem as documented in that bug report.
Since the PPC fix is already in CVS, and is restricted to PPC, it does not represent a risk of regression on other architectures so I am planning on including it in the upcoming FC3 update as well. Just a lucky side effect for FC3/PPC users. ;o)
FC4 however will not receive this fix until there is another security issue or major/critical issue to resolve.
I'm building the FC3 update right now, and plan on releasing it tomorrow or the next day. I'll update the bug then.

FC3 update built and submitted to release team. xorg-x11-6.8.2-1.FC3.45.2 should be available as erratum sometime tomorrow.

From User-Agent: XML-RPC
xorg-x11-6.8.2-1.FC3.45.2 has been pushed for FC3, which should resolve this issue. If these problems are still present in this version, then please make note of it in this bug report.

Prior to the xorg-x11-6.8.2-1.FC3.45.2 update, I could boot successfully, log in, etc. After a certain amount of graphics activity (scrolling, VNC connection, etc) the system would lock up hard.
Now I get a corrupted display during the graphical boot (purplish screen with broken diagonal magenta lines on it), and somewhere it crashes to a black screen and is unresponsive (might be when the resolution changes for gdm). X is eating 99.9% of CPU; even SSH'ing in slows to a crawl. The last thing that managed to open seems to be X and gdm, but my screen is black: "/usr/X11R6/bin/X :0 -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7".
Removing DRI from my xorg.conf resolves the problem.
Should I file a new bug or should this one be reopened?

Please test the xorg-x11-/6.8.2-37.FC4.49.3.radeon.1 packages from ftp://zeniv.uk.linux.org/pub/people/dwmw2/x-radeon

David, thank you! These packages are marked as FC4 - can they be installed on FC3 without issue, and will I be able to upgrade from these to the "official" fix whenever we reach that point?

I'm not sure if the packages will be entirely suited to FC3 -- you'll almost certainly be able to just take the radeon_drv.o and use it though if you have problems with the full package. Failing both of those, let me know and I'll try to build FC3 packages for you.
I don't think there's a huge likelihood that the fixes (which are from xorg-x11 CVS HEAD) are going to end up in an official FC3 erratum; I've encountered enough resistance just asking for them to be in FC4. But that's up to the X11 team. Let's see if it actually fixes your problem first, before we talk about whether we'll ship the fix for real.
With the current (official) FC4 packages on i386, I see symptoms similar to what you describe -- after a while the console locks up. I can log in and see that X is waiting for something DRM-related, so I can well believe that disabling DRI will fix it. I haven't yet tested my i386 machine with my new packages -- I'll be doing that too.

(In reply to comment #49)
> From User-Agent: XML-RPC
>
> xorg-x11-6.8.2-1.FC3.45.2 has been pushed for FC3, which should resolve this
> issue. If these problems are still present in this version, then please make
> note of it in this bug report.
What doesn't work with the official FC3 update I just released? Not sure why you'd want to use the FC4 packages on FC3 when we just released an update for FC3. ;o)
Or did you guys miss that comment above? ;)

Mike, my comment above (#50) was in relation to the newly released update. Prior to the update I had working video, but it would lock after a short period; now I have corrupted video during boot and then no video, with X apparently going haywire. I'd be glad to check logs for anything that may help. I'm using kernel 2.6.12-1.1376_FC3smp, and xorg-x11-6.8.2-1.FC3.45.2. In both cases, disabling DRI resolves the problem.
David, I'll try extracting the proper radeon_drv.o file later and I'll report back.

Well, some mild success with David's driver. Now the graphical boot works just fine. If I switch my resolution down to 800x600@16, then I can log in. As soon as I tried a VNC connection it started writing furiously to the hard drive, and then after about 20 seconds of that, it switched off.
If I use my normal resolution of 1600x1200@32, then X seems to die (or use 99.9% of one processor) once it finishes boot and the screen goes black. I don't think it made it to the final login screen, as typing in my user name and password blind did not work.

My comment #55 may be a completely different bug. I installed a fresh copy of Fedora Core 4 and had no problems. Upgrading all packages to current levels also caused no problems - until I enabled the current 2.6.12-based kernel. On doing that, the problems I described in #55 with the screen going black and X going nuts occurred. Reverting to 2.6.11 allows me to use DRI and everything.

Bug appears to be resolved on Fedora Core 4 with current packages.
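
Both Kristian's earlier driver-swap test (keep the 6.8.1 ati_drv.o and radeon_drv.o, upgrade, copy them back) and David's suggestion to take just the radeon_drv.o come down to saving driver modules and copying them over the installed ones. A minimal sketch of that procedure follows. The function names save_drivers and restore_drivers and the backup directory in the usage lines are hypothetical; only the two file names and the drivers directory come from the thread.

```shell
#!/bin/sh
# Driver modules named in the thread; edit the list to swap only one of them.
DRIVERS="ati_drv.o radeon_drv.o"

# Hypothetical helper: copy the known-good modules from drv_dir to backup_dir.
# Usage: save_drivers <drv_dir> <backup_dir>
save_drivers() {
    drv_dir="$1"
    backup_dir="$2"
    mkdir -p "$backup_dir" || return 1
    for f in $DRIVERS; do
        cp -p "$drv_dir/$f" "$backup_dir/$f" || return 1
    done
}

# Hypothetical helper: copy the saved modules back, overwriting whatever the
# upgrade installed in drv_dir.
# Usage: restore_drivers <backup_dir> <drv_dir>
restore_drivers() {
    backup_dir="$1"
    drv_dir="$2"
    for f in $DRIVERS; do
        cp -p "$backup_dir/$f" "$drv_dir/$f" || return 1
    done
}

# Example flow (as root), with an assumed backup location:
#   save_drivers /usr/X11R6/lib/modules/drivers "$HOME/xorg-6.8.1-drivers"
#   ... upgrade the xorg-x11 packages ...
#   restore_drivers "$HOME/xorg-6.8.1-drivers" /usr/X11R6/lib/modules/drivers
```

As Ronald found out earlier in the thread, a full reboot after the swap gives a more trustworthy result than only restarting X.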