Bug 462157 - X hang on ATI Radeon r500
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: rawhide
Hardware: i386 Linux
Priority: urgent
Severity: urgent
Assigned To: Dave Airlie
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks: F10Blocker/F10FinalBlocker
Reported: 2008-09-12 22:27 EDT by Peng Huang
Modified: 2013-01-09 23:48 EST (History)

Doc Type: Bug Fix
Last Closed: 2008-11-18 17:38:19 EST


Attachments
I recoded the screen with camera (2.36 MB, video/mpeg)
2008-09-12 22:30 EDT, Peng Huang
my xorg.conf (688 bytes, text/plain)
2008-09-12 22:31 EDT, Peng Huang
Xorg.0.log with fc9's kernel (94.26 KB, text/plain)
2008-09-12 22:32 EDT, Peng Huang
Xorg log (305.75 KB, text/plain)
2008-09-25 12:17 EDT, Gian Paolo Mureddu
dmesg output (42.48 KB, text/plain)
2008-09-25 12:18 EDT, Gian Paolo Mureddu
The xorg.conf I'm using (1.05 KB, text/plain)
2008-09-25 12:20 EDT, Gian Paolo Mureddu
RadeonHD xorg.conf (1.05 KB, application/octet-stream)
2008-10-07 02:21 EDT, Gian Paolo Mureddu
VESA xorg.conf (1010 bytes, text/plain)
2008-10-29 15:53 EDT, Michael Tughan
New xorg_x11_drv_ati xorg.conf (1.06 KB, text/plain)
2008-10-29 15:55 EDT, Michael Tughan
Xorg's log (54.05 KB, text/plain)
2008-10-29 15:57 EDT, Michael Tughan
kernel oops message, Xorg.conf, Xorg.0.log tgz'ed (10.79 KB, application/x-gzip)
2008-10-30 18:54 EDT, Hin-Tak Leung
gdb backtrace while it hungs (12.43 KB, text/plain)
2008-11-01 12:11 EDT, Hin-Tak Leung
another gdb, x server stuck in a differen place. (2.38 KB, text/plain)
2008-11-01 18:17 EDT, Hin-Tak Leung
gdb backtrace 3 (2.56 KB, application/octet-stream)
2008-11-01 19:29 EDT, Hin-Tak Leung
gdb backtrace #4 (1.86 KB, application/octet-stream)
2008-11-01 19:30 EDT, Hin-Tak Leung
gdb backtrace of my first hang under XAA (2.14 KB, application/octet-stream)
2008-11-04 14:17 EST, Hin-Tak Leung
another gdb strace under XAA, with more recent drv-ati (2.04 KB, text/plain)
2008-11-11 09:19 EST, Hin-Tak Leung
xorg log, showing a backtrace under kernel 2.6.27.5-101.fc10.x86_64 (50.20 KB, text/plain)
2008-11-12 23:01 EST, Hin-Tak Leung
gdb backtrace with kernel-2.6.27.5-104 and XAA ati-6.9.0-44 (4.03 KB, text/plain)
2008-11-14 19:32 EST, Hin-Tak Leung
Screenshot with compiz (142.93 KB, image/png)
2008-11-18 21:50 EST, Peng Huang
another gdb backtrace - sorry, it hung... (2.45 KB, text/plain)
2008-11-19 01:43 EST, Hin-Tak Leung
another gdb backtrace under kernel-2.6.27.5-120.fc10, xorg-x11-drv-ati-6.9.0-54.fc10 (2.46 KB, application/octet-stream)
2008-11-19 13:02 EST, Hin-Tak Leung
gdb backtrace when xserver stop responding with EXA and latest drv_ati, etc (2.50 KB, application/octet-stream)
2008-11-20 12:27 EST, Hin-Tak Leung
gdb backtrace, 2nd stuck with EXA within a few hours. (2.65 KB, application/octet-stream)
2008-11-20 13:03 EST, Hin-Tak Leung

Description Peng Huang 2008-09-12 22:27:49 EDT
Description of problem:


Version-Release number of selected component (if applicable):
xorg-x11-drv-ati-6.9.0-14.fc10.i386
xorg-x11-drv-r128-6.8.0-1.fc10.i386
xorg-x11-server-common-1.5.0-6.fc10.i386
xorg-x11-server-devel-1.5.0-6.fc10.i386
xorg-x11-server-utils-7.4-3.fc10.i386
xorg-x11-server-Xephyr-1.5.0-6.fc10.i386
xorg-x11-server-Xorg-1.5.0-6.fc10.i386

kernel-2.6.26.3-29.fc9.i686
kernel-2.6.27-0.322.rc6.fc10.i686
kernel-2.6.27-0.323.rc6.fc10.i686

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.
  
Actual results:
1. If I use fc10's kernels, Fedora boots with a new graphical screen (with a Fedora logo on it; I think it is plymouth), and X always hangs after login. The mouse cursor can still be moved, but the screen does not update anymore, and Ctrl + Alt + Backspace does not work. When I press the power button of my laptop, the computer shuts down within one or two minutes. (So I think the kernel is OK and only X hangs.)

2. If I use fc9's kernels, Fedora boots without the graphical screen (plymouth), and X does not hang. But the display is abnormal: if I move the cursor or type something in the terminal, a small rectangle of something appears from time to time on the right edge of the screen. If the screen is static, the abnormal display does not occur.

Expected results:
X does not hang up.
X displays correctly.

Additional info:
Comment 1 Peng Huang 2008-09-12 22:30:49 EDT
Created attachment 316642 [details]
I recoded the screen with camera
Comment 2 Peng Huang 2008-09-12 22:31:33 EDT
Created attachment 316643 [details]
my xorg.conf
Comment 3 Peng Huang 2008-09-12 22:32:30 EDT
Created attachment 316644 [details]
Xorg.0.log with fc9's kernel
Comment 4 Gian Paolo Mureddu 2008-09-25 12:16:44 EDT
I was about to post a new bug report when I found this; hopefully my problem qualifies as a duplicate of it. It would appear that with the latest batch of updates this problem is not as severe as with the previous Xorg update; however, a combination of Xorg + x11-drv-ati-6.8.0-19 may be the culprit.

Here's what I originally intended to report:

Summary: "Computer unresponsive under graphics stress"

Viewing or working with large images (raster/vector) causes the system to slow down severely, to the point of X freezing (gkrellm no longer updates the krells, the mouse cursor moves around VERY slowly and "jumpy", and I don't seem able to input any key combination to snap out of X). Through an SSH session I am able to run top (although the system also feels very sluggish), and all I'm able to see is that Xorg, hald-addon-storage and top are consuming pretty much all the available CPU. After killing the offending processes I'm unable to restart an X session, and a hard reboot is required (pressing the power button for more than 5 secs on my laptop).

Hardware profile:
http://www.smolts.org/client/show/?uuid=pub_f3889c13-ed5b-44b9-a6ab-72db21a4aaa5

Before the latest update, I was able to reliably reproduce this problem with this image:

http://upload.wikimedia.org/wikipedia/commons/e/e2/Full_moon.png

**One interesting thing: while I was testing this URL, the machine "stuttered" for a bit while loading this image, Xorg CPU usage spiked to 100% on one core, and PulseAudio skipped sound (I had Rhythmbox playing some tunes). When I opened a terminal to check the smolt profile, the system froze when I moved the terminal window (with fake transparent background).

The offending processes are:

Xorg - 100% - 120% CPU usage during problem.
hald-addon-storage ~50% CPU usage during problem (interestingly enough it seems to have problems to poll the DVD drive)
Top - it eats about 45% trying to poll the resources.

At this point I'm no longer sure what the problem is or where it lies. I rebooted into the previous kernel: same problem. I tried the GIT ati drivers, which had been working just fine with the previous kernel: same problem. I reverted to Fedora's provided driver: same problem. I reverted to Xorg 1.4.99, and while it seemed to be more stable, in the end I ran into the same problem. Before updates started rolling out again, I had the following configuration working _just_fine_:

Composited Desktop through Metacity.
ATi driver from GIT
Had to add AccelMethod EXA to Xorg in order for composite to be fast enough without tearing.

* Monitoring the system from the SSH terminal for any changes in /var/log/messages and triggering the problem does not show anything of special interest.

* I've got a few APIC error messages on dmesg

* All the evidence thus far points to an X or driver problem.

* The Xorg.0.log.old is flooded with:

[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

I don't think I can debug this further (I lack the skill, and probably the means to)
Comment 5 Gian Paolo Mureddu 2008-09-25 12:17:57 EDT
Created attachment 317704 [details]
Xorg log
Comment 6 Gian Paolo Mureddu 2008-09-25 12:18:38 EDT
Created attachment 317705 [details]
dmesg output
Comment 7 Gian Paolo Mureddu 2008-09-25 12:20:45 EDT
Created attachment 317706 [details]
The xorg.conf I'm using
Comment 8 Matěj Cepl 2008-10-02 12:34:30 EDT
Yes, the first log is also full of

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
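To get a rough idea of how badly a given log is flooded, the overflow messages can simply be counted. This is only a sketch: count_eq_overflow is a made-up helper name, and the default log path is an assumption.

```shell
# Count "EQ overflowing" messages in an X server log.
# count_eq_overflow is a hypothetical helper; the default path is an assumption.
count_eq_overflow() {
    log=${1:-/var/log/Xorg.0.log.old}
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'EQ overflowing' "$log"
}
```

Calling it as count_eq_overflow /var/log/Xorg.0.log prints a single number. A large count only says the event queue kept overflowing; it does not say which code path the server is stuck in.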
Comment 9 Gian Paolo Mureddu 2008-10-03 03:08:52 EDT
Little update:

I installed and tried out a build from GIT of the xorg-x11-drv-ati driver (checked out Sept. 30th) and X still hangs. I have not been able to capture anything in Xorg.0.log.old or any other X log (after first deleting all the X logs, by the way), even when triggering the problem (viewing a 1600x1200 image at full size and panning it in EoG). Updating to the latest kernel (2.6.26.5-45.fc9) does not help either (in case this had something to do with the DRM module, radeon or some such). Also, I am using a framebuffer VT (vga=803), if that might have something to do with it. The APIC errors go away when I add the noapic command line option.
Comment 10 Matěj Cepl 2008-10-03 16:35:07 EDT
Does putting nomodeset on the kernel command line help? If yes, this is duplicate of bug 464896.
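When reporting results for this test, it helps to confirm that the running kernel really was booted with the parameter. A minimal sketch, assuming a Linux system with /proc/cmdline; has_kernel_param is a hypothetical helper, and its second argument exists only so the function can be tried against a plain file.

```shell
# has_kernel_param is a hypothetical helper: it succeeds if the given word
# appears on the kernel command line of the running system.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}   # override only for testing
    # Put each parameter on its own line and look for an exact, literal match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example, has_kernel_param nomodeset && echo "kernel modesetting is disabled" prints the message only when the parameter is present.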
Comment 11 Peng Huang 2008-10-03 21:32:10 EDT
(In reply to comment #10)
> Does putting nomodeset on the kernel command line help? If yes, this is
> duplicate of bug 464896.
X does not crash with nomodeset, but X still displays some graphics abnormally, the same as with the kernel from FC9. Please look at the video https://bugzilla.redhat.com/attachment.cgi?id=316642 . It was captured with the FC9 kernel.
Comment 12 Gian Paolo Mureddu 2008-10-04 23:05:23 EDT
(In reply to comment #10)
> Does putting nomodeset on the kernel command line help? If yes, this is
> duplicate of bug 464896.

Will try this. I'll report back when I have run some tests.
Comment 13 Gian Paolo Mureddu 2008-10-05 01:44:32 EDT
Well, I have tried and can consistently cause the issue even when running with nomodeset. Reliably reproducing the problem with a large image viewed in EoG.
Comment 14 Gian Paolo Mureddu 2008-10-05 22:02:43 EDT
Ok, been trying to give it a shot and try to debug this, but I don't seem able to install xorg-x11-server-debuginfo, there seems to be some missing deps which yum does not seem able to solve.:

su -c 'yum --enablerepo=fedora-debuginfo -y install xorg-x11-server-debuginfo'
Contraseña:
Loaded plugins: fastestmirror, refresh-packagekit
Loading mirror speeds from cached hostfile
 * livna: wftp.tu-chemnitz.de
 * fedora: mirror.newnanutilities.org
 * updates-newkey: mirror.cogentco.com
 * fedora-debuginfo: mirror.cogentco.com
 * updates: mirror.cogentco.com
Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package xorg-x11-server-debuginfo.x86_64 0:1.4.99.901-29.20080415.fc9 set to be updated
--> Processing Dependency: libGLcore.so()(64bit) for package: xorg-x11-server-debuginfo
--> Processing Dependency: libxtrap.so()(64bit) for package: xorg-x11-server-debuginfo
--> Processing Dependency: libdri2.so()(64bit) for package: xorg-x11-server-debuginfo
--> Finished Dependency Resolution
xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 from fedora-debuginfo has depsolving problems
  --> Missing Dependency: libdri2.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 from fedora-debuginfo has depsolving problems
  --> Missing Dependency: libGLcore.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 from fedora-debuginfo has depsolving problems
  --> Missing Dependency: libxtrap.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
Error: Missing Dependency: libxtrap.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
Error: Missing Dependency: libGLcore.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
Error: Missing Dependency: libdri2.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)

I'm trying to follow this Debug how to for the Xserver:
http://www.x.org/wiki/Development/Documentation/ServerDebugging
Comment 15 Gian Paolo Mureddu 2008-10-05 22:07:29 EDT
Just noticed that Yum is trying to pull the debuginfo package for Xorg 1.4.99 and not 1.5.2, does this have anything to do with the newkeys repos? If so how do I install the correct debuginfo?
Comment 16 José Matos 2008-10-06 14:10:20 EDT
I have the same problem running rawhide

http://www.smolts.org/client/show/pub_6824ca5e-57e4-4026-b764-c5dc475eb220

The only way to get a running X is to pass nomodeset as a kernel parameter and even in that case I get artifacts in the 40% rightmost part of the screen.
Comment 17 Gian Paolo Mureddu 2008-10-07 02:20:23 EDT
Well, the same type of problem I am running into with the xorg-x11-drv-ati driver is also present with the xorg-x11-drv-radeonhd from Koji, however the log says a couple of interesting things, like it doesn't recognize the card I am using it with (my laptop's) and that AtomBIOS actually reports the right type of card, the chipset is obviously supported... But a couple things caught my eye:

(II) RADEONHD(0): Unknown card detected: 0x791F:0x1179:0xFF1A.
	If - and only if - your card does not work or does not work optimally
	please contact radeonhd@opensuse.org to help rectify this.
	Use the subject: 0x791F:0x1179:0xFF1A: <name of board>
	and *please* describe the problems you are seeing
	in your message.

(II) RADEONHD(0): ATOM BIOS Rom: 
	SubsystemVendorID: 0x1179 SubsystemID: 0xff1a
	IOBaseAddress: 0x9000
	Filename: br26107b.bin
	BIOS Bootup Message: 

ATI Radeon Xpress ?1250? for MW10A

At any rate, I'm starting to suspect EXA to be the culprit. Will have to perform a couple of tests (with driver ati [radeon] and radeonhd, see how it goes). One thing was kind of different, though: With radeonhd the particular problem I am having takes a little bit longer than with the radeon driver, but still occurs (instead of immediately after starting panning an image in EoG, it takes several "passes" for it to happen [anywhere from two to four]), I'll attach the Xorg.0.log of this session anyway, for completeness sake.
Comment 18 Gian Paolo Mureddu 2008-10-07 02:21:35 EDT
Created attachment 319620 [details]
RadeonHD xorg.conf
Comment 19 Gian Paolo Mureddu 2008-10-07 10:57:11 EDT
Well, I tried to disable Option "Accel Method" "EXA" from my xorg.conf and guess what? X stopped hanging or entering any infinite loops. So this indeed relates to EXA, I'll try to report this upstream on their bug tracker, hopefully it'll get some attention from upstream devs.
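For anyone comparing setups in this bug, it is worth checking which acceleration method the configuration file actually requests. The sketch below assumes the usual /etc/X11/xorg.conf location; accel_method is a hypothetical helper, and it only reads the file, it does not tell you what the driver ended up using.

```shell
# accel_method is a hypothetical helper: print the value of the last
# uncommented AccelMethod option in an xorg.conf, or nothing if it is unset.
# It accepts both spellings seen in this bug:
#   Option "AccelMethod" "EXA"
#   Option "Accel Method" "EXA"
accel_method() {
    conf=${1:-/etc/X11/xorg.conf}   # default path is an assumption
    sed -n 's/^[[:space:]]*[Oo]ption[[:space:]]*"[Aa]ccel[[:space:]]*[Mm]ethod"[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" | tail -n 1
}
```

An empty result means the option is absent or commented out, in which case the driver default applies.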
Comment 20 Hin-Tak Leung 2008-10-28 23:05:04 EDT
I also experience the occasional X hangs with the xorg-x11-drv-ati radeon 
driver, for
 
01:05.0 VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series]

ssh from another box still works - so the symptom is same as comment 4. I tried debuginfo-install, etc and got all the debuginfo packages, but running

gdb /usr/bin/Xorg <pid> (as root)

gives a strange message about ptrace not permitted.

If somebody can explain what that gdb message means I can give gdb a try next time it happens...
Comment 21 Gian Paolo Mureddu 2008-10-29 01:47:00 EDT
During the weekend I was able to test a number of distributions and found that the particular problem I was experiencing in Fedora 9 is no longer present in distributions with the 1.5.2 X server (Ubuntu 8.10 and F10).
Comment 22 Matěj Cepl 2008-10-29 11:23:56 EDT
(In reply to comment #20)
> gdb /usr/bin/Xorg <pid> (as root)
> 
> gives a strange message about ptrace not permitted.

That's SELinux -- run (as root) setenforce 0 before running gdb.
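Put together, the procedure could look roughly like the following when run as root from an ssh session. This is a sketch, not a tested recipe for this bug: xorg_backtrace is a hypothetical helper, the output path is an assumption, and it presumes the relevant debuginfo packages are already installed.

```shell
# xorg_backtrace is a hypothetical helper; run it as root over ssh while X is stuck.
xorg_backtrace() {
    out=${1:-/tmp/xorg-backtrace.txt}   # output path is an assumption
    pid=$(pidof -s Xorg) || { echo "Xorg is not running" >&2; return 1; }
    # Remember the SELinux mode so it can be restored afterwards.
    prev=$(getenforce 2>/dev/null)
    if [ "$prev" = "Enforcing" ]; then setenforce 0; fi
    # -batch runs the given commands and exits; "bt full" includes local variables.
    gdb -p "$pid" -batch -ex 'thread apply all bt full' > "$out" 2>&1
    status=$?
    if [ "$prev" = "Enforcing" ]; then setenforce 1; fi
    return "$status"
}
```

The function switches SELinux to permissive only if it was enforcing, and puts it back once gdb has finished, so the machine is not left in permissive mode by accident.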
Comment 23 Hin-Tak Leung 2008-10-29 11:56:22 EDT
(In reply to comment #22)
Argh, thanks for the SElinux tips. Next time when the x server gets stuck, I will know exactly what to do :-).
Comment 24 Michael Tughan 2008-10-29 15:38:37 EDT
Not sure if this is my same problem but a lot of the symptoms seem to be the same. However, I'm running into this problem with both xorg-x11-drv-ati and the vesa driver for Xorg. This laptop is running an ATI Radeon HD3470. When X starts up, the screen fades in from black, but once that's done, the login box is blank and keeps flashing. I'll get my xorg.conf and Xorg.0.log up as soon as I can. Running the Fedora 10 beta x86_64 with a couple updates, including Xorg 1.5.2-10.fc10 and kernel 2.6.27.4-58.fc10.
Comment 25 Michael Tughan 2008-10-29 15:53:18 EDT
Created attachment 321862 [details]
VESA xorg.conf

The VESA configured xorg.conf. This is also the file I used to roll back to if graphics problems happened and I couldn't debug immediately.
Comment 26 Michael Tughan 2008-10-29 15:55:36 EDT
Created attachment 321863 [details]
New xorg_x11_drv_ati xorg.conf

Customized by looking over this bug before. Exhibits the same symptoms, with the exclusion that it runs at the native res of 1440x900; hardly any use in this situation.
Comment 27 Michael Tughan 2008-10-29 15:57:21 EDT
Created attachment 321864 [details]
Xorg's log

My Xorg.0.log. Probably contains information from both the VESA and xorg_x11_drv_ati configurations, as well as from running with no xorg.conf.
Comment 28 Hin-Tak Leung 2008-10-30 18:54:24 EDT
Created attachment 321993 [details]
kernel oops message, Xorg.conf, Xorg.0.log tgz'ed

I have managed to debuginfo-install all the required stuff, and do setenforce 0,
and ran gdb on the running process. However, it takes forever at the stage of "Attaching ..." and Xorg still uses 100% of CPU, and gdb never managed to attach completely.

So I kill -9'ed the X server, then typed "reboot" in the ssh session as root, at which point the kernel oops'ed. (I think it also oops'ed in the past when I tried to reboot from ssh.) 

So this is the oops message, plus a bit of context around it, extracted from my /var/log/messages (it also shows my debuginfo-install, so it shows which versions are installed), my xorg.conf (I added the Accel EXA line recently, but the hang occurs regardless of this line), and also Xorg.0.log.old (the log from before the reboot, of course). 

Oh, I am running a koji kernel, 2.6.26.7-86.fc9.x86_64 . The hardware is (from lspci):
VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series]


I think the oops message says that it is stuck in the kernel doing drm stuff - any opinion? I have been eyeing drv-ati-6.9.0 from fc10 for some time now, and it seems to have some relevant stuff. I am a bit reluctant to upgrade *both* the kernel and the bulk of X for a problem which may or may not be fixed, though...
Comment 29 Matěj Cepl 2008-10-31 08:49:29 EDT
Well, unfortunately, I have to ask you to upgrade -- there were so many changes in both DRM and userspace drivers (not to mention numerous changes in the X server itself) that testing your old versions doesn't make much sense.
Comment 30 Hin-Tak Leung 2008-10-31 12:51:13 EDT
fair enough... I see quite a few radeon-related changes in rawhide, and some looks relevant. I just like to keep the number of rawhide packages I use to a minimum and only because I need it or am testing something on it - I am using a koji kernel because I am involved with one of the wireless drivers, and having another bleeding-edge piece is going to hurt a little.

I see a 2.6.27.x kernel has just hit f9-candidate, so what is the minimum I need to be up to date without going wholesale rawhide? As far as I can see I need at least 3 pieces: a 2.6.27.x kernel, libdrm* and drv-ati; anything else?
Comment 31 Hin-Tak Leung 2008-10-31 14:24:24 EDT
So I grafted a whole bunch of rawhide rpms onto my fedora 9 system,
and I don't know if I'll get a X server hang yet (it only happens about once
every couple of days), but I have a boot-up problem - without nomodeset, it hangs at the end of the progress bar and never gets to the GDM login screen.
Also, I am observing tears/flickers when the screen scrolls - before and after 
the upgrade, but seems more frequent after. So the upgrade has not been good so far.

-----------------
xorg-x11-server-Xorg-1.5.2-10.fc10            Fri 31 Oct 2008 18:00:13 GMT
xorg-x11-server-common-1.5.2-10.fc10          Fri 31 Oct 2008 18:00:11 GMT
mesa-libGL-devel-7.2-0.13.fc10                Fri 31 Oct 2008 17:37:41 GMT
xorg-x11-drv-ati-6.9.0-38.fc10                Fri 31 Oct 2008 17:37:36 GMT
mesa-libGL-7.2-0.13.fc10                      Fri 31 Oct 2008 17:37:36 GMT
kernel-2.6.27.4-69.fc10                       Fri 31 Oct 2008 17:37:01 GMT
libdrm-2.4.0-0.21.fc10                        Fri 31 Oct 2008 17:36:49 GMT
kernel-devel-2.6.27.4-69.fc10                 Fri 31 Oct 2008 17:34:48 GMT
kernel-doc-2.6.27.4-69.fc10                   Fri 31 Oct 2008 17:34:02 GMT
libdrm-devel-2.4.0-0.21.fc10                  Fri 31 Oct 2008 17:33:22 GMT
kernel-headers-2.6.27.4-69.fc10               Fri 31 Oct 2008 17:33:16 GMT
mkinitrd-6.0.68-1.fc10                        Fri 31 Oct 2008 17:33:13 GMT
plymouth-0.6.0-0.2008.10.27.7.fc10            Fri 31 Oct 2008 17:33:11 GMT
plymouth-plugin-solar-0.6.0-0.2008.10.27.7.fc10 Fri 31 Oct 2008 17:33:09 GMT
fedora-logos-10.0.0-2.fc10                    Fri 31 Oct 2008 17:32:45 GMT
plymouth-scripts-0.6.0-0.2008.10.27.7.fc10    Fri 31 Oct 2008 17:32:44 GMT
plymouth-plugin-label-0.6.0-0.2008.10.27.7.fc10 Fri 31 Oct 2008 17:32:43 GMT
nash-6.0.68-1.fc10                            Fri 31 Oct 2008 17:32:41 GMT
plymouth-libs-0.6.0-0.2008.10.27.7.fc10       Fri 31 Oct 2008 17:32:37 GMT
mesa-libGL-7.2-0.13.fc10                      Fri 31 Oct 2008 17:32:36 GMT
mesa-dri-drivers-7.2-0.13.fc10                Fri 31 Oct 2008 17:32:35 GMT
initscripts-8.84-1                            Fri 31 Oct 2008 17:32:29 GMT
libdrm-2.4.0-0.21.fc10                        Fri 31 Oct 2008 17:32:04 GMT
kernel-firmware-2.6.27.4-69.fc10              Fri 31 Oct 2008 17:24:47 GMT
----------------
Comment 32 Hin-Tak Leung 2008-10-31 15:43:05 EDT
Okay, further to my previous comment, I have a hang, so the upgrade did not help.
Also, unfortunately with the new kernel, I did not have the wireless-related bleeding-edge update I mentioned, so my ssh session died during debuginfo-install. 

All in all, the effort that went into the upgrade was fruitless, and it seems to bring more problems (the more frequent flicker/tear and the need for nomodeset) than it solves...
Comment 33 Hin-Tak Leung 2008-11-01 12:11:41 EDT
Created attachment 322174 [details]
gdb backtrace while it  hungs 

okay, I have a gdb backtrace with most of the debug info - 
this time it seems to be stuck from ioctl () calling from drmDMA()
calling from RADEONCPGetBuffer(). I was launching firefox recovering a crashed session - i.e. launching a lot of windows at the same time. This may or may not be related - after all, firefox is one of the most used applications. 

I have drv-ati, libdrm and kernel from koji, and drv-ati-debuginfo from koji also, and the rest from rawhide, but debuginfo seems to pick up libdrm-debug info from rawhide.  

---
xorg-x11-drv-ati-debuginfo-6.9.0-38.fc10      Fri 31 Oct 2008 19:39:21 GMT
openssl-debuginfo-0.9.8g-9.fc9                Fri 31 Oct 2008 19:38:27 GMT
libpciaccess-debuginfo-0.10.3-2.fc9           Fri 31 Oct 2008 19:38:23 GMT
libdrm-debuginfo-2.4.0-0.21.fc10              Fri 31 Oct 2008 19:38:22 GMT
pixman-debuginfo-0.10.0-1.fc9                 Fri 31 Oct 2008 19:38:21 GMT
dbus-debuginfo-1.2.4-1.fc9                    Fri 31 Oct 2008 19:38:18 GMT
libselinux-debuginfo-2.0.67-4.fc9             Fri 31 Oct 2008 19:38:15 GMT
audit-debuginfo-1.7.5-1.fc9                   Fri 31 Oct 2008 19:38:11 GMT
libXau-debuginfo-1.0.3-5.fc9                  Fri 31 Oct 2008 19:38:07 GMT
hal-debuginfo-0.5.11-2.fc9                    Fri 31 Oct 2008 19:38:04 GMT
glibc-debuginfo-2.8-8                         Fri 31 Oct 2008 19:37:28 GMT
xorg-x11-server-debuginfo-1.5.2-10.fc10       Fri 31 Oct 2008 19:36:48 GMT
libfontenc-debuginfo-1.0.4-5.fc9              Fri 31 Oct 2008 19:36:35 GMT
libXfont-debuginfo-1.3.2-1.fc9                Fri 31 Oct 2008 19:36:31 GMT
libXdmcp-debuginfo-1.0.2-5.fc9                Fri 31 Oct 2008 19:36:27 GMT
---------

This should be useful to somebody...
Comment 34 Hin-Tak Leung 2008-11-01 18:17:33 EDT
Created attachment 322189 [details]
another gdb, x server stuck in a differen place.

gdb backtrace with the latest koji (xorg-x11-drv-ati-6.9.0-41 , newer than my
last gdb collected with -38, please note), where the X server shoots up to 100 to 150% CPU (dual core). Mouse *movement* still works, but neither clicking nor the keyboard works, and some apps no longer refresh. I interpret it as a hang when the gnome System Monitor applet no longer moves.

I was just opening a bookmark in firefox at the time (i.e. one browser window, or maybe another behind).

still booting with nomodeset (without it, my machine won't ever go into GDM).
Comment 35 Hin-Tak Leung 2008-11-01 19:29:01 EDT
Created attachment 322195 [details]
gdb backtrace 3

3rd backtrace, yet another one, also while firefox restore sessions.
Comment 36 Hin-Tak Leung 2008-11-01 19:30:14 EDT
Created attachment 322196 [details]
gdb backtrace #4

yet another, also while firefox restores session.
Comment 37 Hin-Tak Leung 2008-11-01 20:15:06 EDT
Argh, I feel really stupid not reading comment 19 carefully. All 4 of my gdb backtraces involve exa* routines. I think I was confused by comment 4 suggesting 
EXA is the way to go, and also that in Xorg.0.log, when XAA (default) is used,
there is a warning and suggestion to use EXA instead - therefore I have been using EXA quite soon (maybe the 2nd reboot, etc after I saw the warning in Xorg.0.log) after I switched from fglrx.

In any case, I have switched back to XAA, and voila, I can restore my firefox sessions, and the flicker and tearing is gone.

So - can somebody either fix EXA (based on the gdb backtrace) or remove the XAA-related warning in the Xorg.0.log? It is embarrassing that, despite the message
"(II) RADEON(0): XAA Render acceleration unsupported on Radeon 9500/9700 and newer. Please use EXA instead." 

the unsupported acceleration works better than the supported one...

Anyway, I'll wait and see if I ever get a hang with XAA. It is a bit stupid that I had EXA enabled very soon after I switched from fglrx, based on that message in Xorg.0.log.
Comment 38 Andy Lutomirski 2008-11-02 11:18:47 EST
I can reproduce this with no xorg.conf on Fedora 10.  This is with nomodeset on the kernel command line (I can't boot at all without that).  Reproducing takes about five seconds of anything in KDE (even opening Konsole).  Switching to XAA seems to fix it.

killall -9 X doesn't kill X when this happens.

I got dmesg after a hang with drm debug=1 and radeon dynclks=0:

[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 
[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 
[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 
[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 
[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 
[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 
[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 
[drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100
[drm:drm_ioctl] ret = fffffff0
[drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1
[drm:radeon_cp_idle] 
[drm:radeon_do_cp_idle] 

and lots more.
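The numbers in these lines can be decoded by hand, which may help when comparing against other reports. The sketch below uses only shell arithmetic; the helper names are made up, and the interpretation in the note afterwards is my reading of the Linux ioctl encoding and the DRM headers rather than anything stated in this log.

```shell
# Interpret a 32-bit hex value (with or without a 0x prefix) as a signed integer.
to_signed32() {
    v=$(( 0x${1#0x} ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( $v - 4294967296 ))
    fi
    echo "$v"
}

# Fields of an ioctl command number in the Linux encoding:
# bits 0-7 are the command number, bits 8-15 the type byte.
ioctl_nr()   { echo $(( $1 & 0xff )); }
ioctl_type() { printf '0x%02x\n' $(( ($1 >> 8) & 0xff )); }

# DRM reserves command numbers from 0x40 upwards for driver-private ioctls.
drm_driver_index() { echo $(( ($1 & 0xff) - 0x40 )); }
```

With the values above, to_signed32 fffffff0 gives -16, which on Linux is -EBUSY. ioctl_type 0x6444 gives 0x64 (ASCII 'd', the DRM type byte), ioctl_nr 0x6444 gives 68 (0x44, matching nr=0x44 in the log), and drm_driver_index 0x6444 gives 4, which as far as I can tell is the radeon CP idle command. That agrees with the radeon_cp_idle lines that follow each ioctl: the server keeps asking the command processor to go idle and keeps being told it is busy.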

My package versions are:

xorg-x11-drv-ati-6.9.0-38.fc10.x86_64
kernel-2.6.27.4-68.fc10.x86_64
xorg-x11-server-Xorg-1.5.2-10.fc10.x86_64

lspci says:

01:05.0 VGA compatible controller: ATI Technologies Inc Radeon 2100 (prog-if 00 [VGA controller])
        Subsystem: Giga-byte Technology Device d000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32, Cache Line Size: 4 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at d8000000 (64-bit, prefetchable) [size=128M]
        Region 2: Memory at fdef0000 (64-bit, non-prefetchable) [size=64K]
        Region 4: I/O ports at ee00 [size=256]
        Region 5: Memory at fdd00000 (32-bit, non-prefetchable) [size=1M]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable-
                Address: 0000000000000000  Data: 0000
        Kernel modules: radeon
00: 02 10 6e 79 07 00 10 00 00 00 00 03 01 20 00 00
10: 0c 00 00 d8 00 00 00 00 04 00 ef fd 00 00 00 00
20: 01 ee 00 00 00 00 d0 fd 00 00 00 00 58 14 00 d0
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 00 d0
50: 01 80 02 06 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Comment 39 Hin-Tak Leung 2008-11-04 14:17:20 EST
Created attachment 322470 [details]
gdb backtrace of my first hang under XAA

since switching to XAA, this is my first hang, so I have to say XAA seems to work better...

I had a kernel upgrade since, and these are the relevant parts:

kernel-2.6.27.4-73.fc10                       Mon 03 Nov 2008 21:30:27 GMT
kernel-headers-2.6.27.4-73.fc10               Mon 03 Nov 2008 21:30:09 GMT
kernel-devel-2.6.27.4-73.fc10                 Mon 03 Nov 2008 21:25:27 GMT
kernel-doc-2.6.27.4-73.fc10                   Mon 03 Nov 2008 21:22:41 GMT
kernel-firmware-2.6.27.4-73.fc10              Mon 03 Nov 2008 21:21:57 GMT
xorg-x11-drv-ati-debuginfo-6.9.0-41.fc10      Sat 01 Nov 2008 21:21:31 GMT
xorg-x11-drv-ati-6.9.0-41.fc10                Sat 01 Nov 2008 21:21:12 GMT
xorg-x11-server-Xorg-1.5.2-10.fc10            Fri 31 Oct 2008 18:00:13 GMT
xorg-x11-server-common-1.5.2-10.fc10          Fri 31 Oct 2008 18:00:11 GMT
libdrm-2.4.0-0.21.fc10                        Fri 31 Oct 2008 17:36:49 GMT

kernel 2.6.27.4-73.fc10.x86_64
Comment 40 Mace Moneta 2008-11-07 13:30:20 EST
I'm having the same problem with Intel G45 X4500HD.  Xorg.0.log shows:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg(xorg_backtrace+0x26) [0x4e7746]
1: /usr/bin/Xorg(mieqEnqueue+0x291) [0x4c82b1]
2: /usr/bin/Xorg(xf86PostMotionEventP+0xc4) [0x490be4]
3: /usr/bin/Xorg(xf86PostMotionEvent+0xa9) [0x490db9]
4: /usr/lib64/xorg/modules/input//evdev_drv.so [0x1445126]
5: /usr/bin/Xorg [0x47d495]
6: /usr/bin/Xorg [0x468f67]
7: /lib64/libc.so.6 [0x312b033100]
8: /lib64/libc.so.6(ioctl+0x7) [0x312b0de207]
9: /usr/lib64/libdrm.so.2 [0x3142e03023]
10: /usr/lib64/libdrm.so.2(drmWaitVBlank+0x20) [0x3142e036c0]
11: /usr/lib64/dri/i965_dri.so [0x61fd092]
12: /usr/lib64/dri/i965_dri.so(driWaitForVBlank+0xcb) [0x61fd293]
13: /usr/lib64/dri/i965_dri.so(intelSwapBuffers+0x23f) [0x6202d7a]
14: /usr/lib64/dri/i965_dri.so [0x61fd3d6]
15: /usr/lib64/xorg/modules/extensions//libglx.so [0xc0572f]
16: /usr/lib64/xorg/modules/extensions//libglx.so [0xbf9656]
17: /usr/lib64/xorg/modules/extensions//libglx.so [0xbfc8f2]
18: /usr/bin/Xorg(Dispatch+0x364) [0x446894]
19: /usr/bin/Xorg(main+0x45d) [0x42ccdd]
20: /lib64/libc.so.6(__libc_start_main+0xe6) [0x312b01e546]
21: /usr/bin/Xorg [0x42c0b9]
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
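
In case it helps others compare logs, here is a rough Python sketch for pulling the named frames out of a backtrace like the one above. The regular expression and helper names are my own assumptions about the format as pasted here, not anything provided by Xorg. The sample is a few lines copied from the log above.

```python
import re

# Matches lines such as
#   "10: /usr/lib64/libdrm.so.2(drmWaitVBlank+0x20) [0x3142e036c0]"
#   "9: /usr/lib64/libdrm.so.2 [0x3142e03023]"          (no symbol)
FRAME_RE = re.compile(
    r"^(\d+): (\S+?)(?:\((\w+)\+0x[0-9a-f]+\))? \[0x[0-9a-f]+\]$"
)

SAMPLE = """\
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
8: /lib64/libc.so.6(ioctl+0x7) [0x312b0de207]
9: /usr/lib64/libdrm.so.2 [0x3142e03023]
10: /usr/lib64/libdrm.so.2(drmWaitVBlank+0x20) [0x3142e036c0]
11: /usr/lib64/dri/i965_dri.so [0x61fd092]
12: /usr/lib64/dri/i965_dri.so(driWaitForVBlank+0xcb) [0x61fd293]
13: /usr/lib64/dri/i965_dri.so(intelSwapBuffers+0x23f) [0x6202d7a]
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
"""

def parse_frames(log_text):
    """Return (index, module, symbol-or-None) for each backtrace line."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3)))
    return frames

def named_frames_in(frames, needle):
    """Symbols of frames whose module path contains `needle`."""
    return [sym for _, module, sym in frames if needle in module and sym]

def count_eq_overflows(log_text):
    """Number of 'EQ overflowing' messages in the log text."""
    return sum("EQ overflowing" in line for line in log_text.splitlines())

frames = parse_frames(SAMPLE)
print(count_eq_overflows(SAMPLE), named_frames_in(frames, "libdrm"))
```

On the sample this reports the libdrm frame as drmWaitVBlank, i.e. the server was sitting in a vblank-wait ioctl when the input event queue overflowed.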
Comment 41 Samuel Sieb 2008-11-10 22:54:53 EST
I'm getting the same thing now on F9 fully up-to-date with current packages.  The X server hangs with 100% CPU and I see those [mi] messages in the X log.  gdb never seems to be able to attach to the process.  But I can ssh in and reboot the computer.  I have an on-board RS690.
Comment 42 Hin-Tak Leung 2008-11-11 09:19:25 EST
Created attachment 323173 [details]
another gdb strace under XAA, with more recent drv-ati

This is very similar to attachment 322470 [details], comment 39, except I am using a more recent kernel-2.6.27.5-92.fc10 and xorg-x11-drv-ati-6.9.0-44.fc10 from koji.

Despite the log entries of both kernel-2.6.27.5-92.fc10 and xorg-x11-drv-ati-6.9.0-44.fc10 saying some work has gone into radeon-modesetting, it is still not working reliably... While the frequency of hangs since I switched to XAA has dropped dramatically (only once every few days now, unlike under EXA, where it can be as frequent as a few times per day), it is nonetheless frustrating to have a hang and have to find a different machine to ssh in from to run gdb and reboot it...
Comment 43 Gian Paolo Mureddu 2008-11-11 11:13:41 EST
This is definitely a problem within xorg-XServer 1.5.0.2 rather than the driver. I've switched to rawhide, and with XServer 1.5.2 and Xorg-drv-ati 6.9.0 things are now working much, MUCH better; EXA no longer freezes the computer. To rule out any Fedora-specific changes, I built Xorg-drv-ati from source straight from the release tarball, just as I did previously in Fedora 9, but with XServer 1.5.2 instead. It was already suggested to me in the Xorg bugzilla that this problem might be a regression introduced in some Fedora patch to Xorg, which led me to try a more recent XServer. I don't know if XServer 1.5.2 from F10 will ever cascade down to F9, or if the offending bits will be removed from it (the problem being identifying such bits).
Comment 44 Hin-Tak Leung 2008-11-11 11:42:40 EST
I am already using xserver 1.5.2 from rawhide/f10. (see my previous post about specific partial upgrades). 

# rpm -qa | grep -i x11-server
xorg-x11-server-debuginfo-1.5.2-10.fc10.x86_64
xorg-x11-server-devel-1.5.0-2.fc9.x86_64
xorg-x11-server-Xorg-1.5.2-10.fc10.x86_64
xorg-x11-server-utils-7.4-1.fc9.x86_64
xorg-x11-server-common-1.5.2-10.fc10.x86_64

In any case, this issue and the modeset boot-hang problem (which is not related, but also has to do with changes introduced to the interaction with display hardware) need to be addressed before f10, I think.
Comment 45 Hin-Tak Leung 2008-11-12 23:01:33 EST
Created attachment 323420 [details]
xorg log, showing a backtrace under kernel 2.6.27.5-101.fc10.x86_64

A new regression with kernel 2.6.27.5-101.fc10.x86_64 from koji. X server backtraces while trying to start.

With nomodeset (won't boot without), boot to the point where GDM tries to start but one only gets the blue star background plus the battery icon - no other GDM session stuff, and no log-in box. Mouse pointer still moves very slowly. Since there is no log-in box, and no GDM reboot buttons, etc, and networking isn't functional yet (no network manager), one has to power-cycle the box. Tried it twice.

Booting an older kernel, 2.6.27.5-94.fc10.x86_64, gets around this. So it looks like one of these three changes is sh*t:

* Wed Nov 12 2008 Dave Airlie <airlied@redhat.com> 2.6.27.5-101
- drm/intel: further interrupt fixes

* Tue Nov 11 2008 Chuck Ebbert <cebbert@redhat.com> 2.6.27.5-100
- Check for additional ATI chipset timer bugs (#470939, #470723)

* Tue Nov 11 2008 Dave Airlie <airlied@redhat.com> 2.6.27.5-99
- drm rebase patches against latest upstream tree.
Comment 46 Mace Moneta 2008-11-12 23:19:46 EST
I had similar problems with -101; getting black screens after plymouth as gdm was about to start.  I'd have to reboot multiple times before it worked.  Dropped back to -100, and it's working again.  That makes the only change in -101 the culprit:

* Wed Nov 12 2008 Dave Airlie <airlied@redhat.com> 2.6.27.5-101
- drm/intel: further interrupt fixes

I'm on a Supermicro C2SEA motherboard, G45 X4500HD.
Comment 47 Hin-Tak Leung 2008-11-13 00:09:32 EST
I am on a toshiba laptop with 
ATI Technologies Inc RS690M [Radeon X1200 Series].

To be honest I am a bit disappointed with the lack of progress on the nomodeset issue and this hang issue, considering how important both are and how close fedora 10's supposed release date is now (10 days?). With the nomodeset flag, all the boot-up eye-candy is gone, worse than pre-plymouth times and back to win 3.1 level, and this hang (particularly the latest one, which affects GDM) makes long-session X usage rather hazardous.

I am hoping to try out koji builds as often as possible, and I hope this gets fixed before fc10 is released.
Comment 48 Gian Paolo Mureddu 2008-11-13 01:22:57 EST
I'm really starting to wonder where this problem lies: is it in Xorg, or the driver? I lean towards a problem in Xorg, since this happens with either Xorg-ati-drv or Xorg-radeonhd-drv, even though both share quite a bit of code (in fact, DRI support for R500 in the radeonhd driver is imported straight from the ati driver). Is there a way to check this against an unpatched version of the XServer? Maybe the focus should be on determining where the regression occurred.
Comment 49 Dave Airlie 2008-11-13 02:10:52 EST
this bug is a bit all over the place, 48 comments and nothing on?

Please try -104 when it finished to fix any regression since -94.

However can the ati person open a new bug so this remains the intel persons bug. X hanging isn't always the same reason, and certainly not when using different hw.
Comment 50 Hin-Tak Leung 2008-11-13 03:21:46 EST
(In reply to comment #49)
> this bug is a bit all over the place, 48 comments and nothing on?
> 
> Please try -104 when it finished to fix any regression since -94.
> 
> However can the ati person open a new bug so this remains the intel persons
> bug. X hanging isn't always the same reason, and certainly not when using
> different hw.

I'll give -104 a try when it finishes building. Is there a mistake somewhere? The original poster listed drv-ati in his report, and explicitly used "radeon" in his xorg.conf in comment 3... and even the component field of the bug report itself says "ati-drv". I thought this was the drv-ati/radeon bug entry, i.e. intel or even radeonhd users should go elsewhere?

Perhaps I was being a bit harsh in comment 47. This seems to be a driver <-> kernel-drm interaction problem, which is difficult to debug. About 1/4 of the 48 comments are mine, but half of mine are detailed gdb backtraces including debugging line-numbers with various koji builds, and should be useful...
Comment 51 Gian Paolo Mureddu 2008-11-13 03:53:56 EST
(In reply to comment #49)
> this bug is a bit all over the place, 48 comments and nothing on?
> 
> Please try -104 when it finished to fix any regression since -94.
> 
> However can the ati person open a new bug so this remains the intel persons
> bug. X hanging isn't always the same reason, and certainly not when using
> different hw.

I thought the Bug reporter was using ATi hardware? Certainly the packages stated at the beginning of the bug report clearly state ATi hardware (Xorg modules r128 [for Rage 128, I assume] and ati 6.9.0 are listed), so when did this become an Intel-related bug?
Comment 52 Dave Airlie 2008-11-13 04:10:31 EST
ah the bug just got messy in the middle then.

okay so it's an ati problem on an r500. So G45 people go to another bug. ATI people stay.

However the bug is now impossible to follow, so I've no idea which people are important to consider it fixed or how I can ever close it.


In any case -104 should fix one bunch of dumb regressions, and I'm hoping to push some more fixes in for -ati before GA.
Comment 53 Gian Paolo Mureddu 2008-11-13 04:17:59 EST
Ok, so let me get this straight, kernel -104 contains a number of regression fixes that may be relevant to the people experiencing this problem (ATi hardware)?

Will try with it and do some more testing then report back either success or failure. In my experience, however, since I "updated to rawhide" a few weeks ago, I have not seen this particular issue as frequently as it used to be (if anything has happened to me on counted occasions), but it does still happen (especially with composite enabled, EXA and has been reproducible with Cairo-dock).
Comment 54 Hin-Tak Leung 2008-11-13 05:39:26 EST
I think only comment 40 and comment 46 (same person) have intel hardware, but comment 46 is relevant...

The way I see the *summary* of this bug is relatively simple: most people using rawhide have the occasional symptom that mouse movement still works but nothing else in the GUI does, while ssh from outside still works; same with f9 (I was one of them but I have upgraded). At least two of us found that switching over from EXA to XAA helped substantially; I have provided a few gdb backtraces with line numbers against koji builds, under both EXA and XAA. The backtraces show that the hang happens when trying to copy a data buffer between userland and kernel through drm's ioctl. Under EXA, I can quite reliably trigger it by restoring a firefox session with a dozen windows totalling 100+ tabs.

If I were more familiar with how the X server works (which unfortunately I am not, but I am reasonably competent with "general debugging") and its code base, this is how I would try to fix this: 

- the X server should give up and go back to unaccelerated mode after a certain number of unsuccessful tries of copying data through the kernel, rather than keep hammering the kernel repeatedly and quickly; 

- drm retries should have a sleep/pause occasionally to let other human-interface events, such as switching VT by keystrokes, through, so that it is possible to reset the X server with ctrl-alt-backspace or ctrl-alt-f<n> and reboot when everything fails...

- the kernel drm should process ioctl correctly, check parameters, and quickly; if there is anything wrong with the incoming ioctl, emit some debug info through dmesg for debugging. I know there is "modprobe drm debug=1", but that's not useful because most people let the Xserver do the modprobe and never have debug on, until the problem happens... so *some* of what debug=1 does should be the default until this is fixed...

Does this sound like a reasonable plan?
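
To make the first two points concrete, here is a small Python sketch of a bounded retry with a growing pause between attempts. It is only an illustration of the idea, not the X server's or the drm library's actual code; `submit`, `max_tries` and `base_delay` are hypothetical names, and the real thing would live in C inside the driver.

```python
import time

def submit_with_backoff(submit, max_tries=5, base_delay=0.001, sleep=time.sleep):
    """Call submit() until it returns True, pausing between attempts.

    Returns True on success. Returns False once max_tries attempts have
    failed, at which point the caller could fall back to an unaccelerated
    path instead of hammering the kernel forever.
    """
    delay = base_delay
    for attempt in range(max_tries):
        if submit():
            return True
        if attempt < max_tries - 1:
            sleep(delay)   # yield so input/VT-switch events can be handled
            delay *= 2     # back off: wait longer after each failure
    return False

# Usage: a stand-in for a transfer that fails twice, then succeeds.
calls = {"n": 0}
def flaky_submit():
    calls["n"] += 1
    return calls["n"] >= 3

if not submit_with_backoff(flaky_submit):
    print("giving up, falling back to unaccelerated path")
```

The point of the cap is that a stuck transfer costs a bounded amount of time, and the pauses leave room for a VT switch or ctrl-alt-backspace to get through.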
Comment 55 Gian Paolo Mureddu 2008-11-13 06:21:35 EST
The way you describe this, and the way this problem is presenting (it seems to occur on more hardware than that covered by this bug), it would seem as if the XServer and the DRM API were clashing at some point (that's the way I understand what you are saying). So if I understand correctly, the problem *seems* to be the XServer pushing data through the DRM module, and when the DRM module cannot react to these XServer requests, the XServer simply keeps on hammering the DRM module, bringing all I/O to a halt. Is that somehow it? If so, this seems to be clearly an upstream problem; the issue is "whose?". The only thing I can think of on Fedora that might be different from other systems that I've tested (with Xorg 7.4/ati drv 6.9.0) are patches to both the XServer and kernel (involving DRM)... Does this sound like a reasonable "conclusion"?
Comment 56 Hin-Tak Leung 2008-11-13 07:08:05 EST
(In reply to comment #55)
I have done a bit of homework some weeks ago and AFAIK, unfortunately, there is no "upstream". Dave Airlie, the assignee of this bug, is also a substantial contributor to drv-ati and kernel drm . ( See: 
http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/log/
http://cgit.freedesktop.org/~airlied/drm/log/ )
So nobody should be making any comments about patches being rawhide-specific, because it is simply that changes land in rawhide *first*, and he
should be one of the most qualified people to deal with this.
Comment 57 Gian Paolo Mureddu 2008-11-13 12:18:58 EST
I didn't mean the comment to sound like I was placing the "blame" on anyone; rather, I was following what was suggested in the Xorg tracker (where I also posted this bug), and IIRC it was Alex Deucher who suggested that the root of the problem *could* be a regression in Fedora's XServer (maybe due to a patch). I know many of the Fedora developers are also "upstream" developers, which means that a LOT of changes land in Fedora's Rawhide before any other distribution. I apologize if my comment somehow sounded like I was "seeking a culprit"; I was not, I was just following leads.
Comment 58 Hin-Tak Leung 2008-11-14 19:32:59 EST
Created attachment 323671 [details]
gdb backtrace with kernel-2.6.27.5-104 and XAA ati-6.9.0-44

-104 indeed fixes the -101 regression. Thanks.

I have another hang with -104. I also noticed that where the X server got stuck is a little different (still drm/ioctl-related), so I hope this new info is useful.
I ran gdb three times to see if the X server is merely cycling through a few states very fast, so this shows three backtraces (which seem to be identical). This is still with 6.9.0-44. I just upgraded to 6.9.0-46 before rebooting (I suppose I'll look for a koji upgrade whenever I have a hang, just hoping it will go away eventually).
Comment 59 Bill Nottingham 2008-11-17 13:46:18 EST
Can you test the 113 kernel available at http://kojipkgs.fedoraproject.org/packages/kernel/2.6.27.5/113.fc10/?
Comment 60 Will Woods 2008-11-17 13:56:08 EST
Changing bug summary to be a bit more specific, following airlied's lead from
comment #52.

(In reply to comment #58)
> I have another hang with -104. I also noticed that where the X server got stuck
> is a little different (still drm/ioctl-related). So hope this new info is
> useful.

You said "another hang with -104" - when does your system hang? How do you know
the other regression is fixed if it's still hanging at startup?
Comment 61 Hin-Tak Leung 2008-11-17 14:24:47 EST
(In reply to comment #60)
> You said "another hang with -104" - when does your system hang? How do you know
> the other regression is fixed if it's still hanging at startup?

I meant that the specific regression with -101 (GDM won't start) is gone, but -104 still has the general problem of the older -94/-91 releases, namely that once in a while, mouse movement still works but clicking has no effect, ssh inwards works, and Xorg consumes 100% CPU and is found to be in drm/ioctl routines (the specific code location seems to be slightly different). Sorry about the imprecise wording.
Comment 62 Will Woods 2008-11-17 14:38:17 EST
Okay, so we've fixed the "X hang at startup" problem, which was the original bug report here. I'd suggest we close this bug now, but there's a lot of good debugging info, so I'm just changing the subject of the bug to reflect the current status.

So now we're back on "EQ overflowing. The server is probably stuck in an infinite loop.", as seen in numerous other bugs - bug 444449 (i945 / Radeon M6 LY), bug 464866 (i945), etc. Specifically with r500 chips. Right?

Does this happen randomly, or only when running compiz / switching terminals / resuming from suspend / etc?
Comment 63 Hin-Tak Leung 2008-11-17 14:57:32 EST
Even the initial report wasn't about a "hang at start-up": it is about the mouse pointer moving but clicks having no effect "right after logging in" (which in real terms is quite a lot later than when the X server starts).

My "EQ overflowing" tends to happen when I am doing something with firefox (maybe it is just because it is a frequently used application), e.g. scrolling or switching tabs, or when dragging gnome-terminal windows around. I have an RS690M (X1200) - I read somewhere that RS690M is technically an r300 rather than a r500/r600. (very confusing).

I can think of one way where the initial poster's report fits into the later pattern: if the initial poster has his session preference configured to automatically restore favourite applications, and one of them is firefox, for example.
Comment 64 Samuel Sieb 2008-11-17 15:10:35 EST
I have an RS690 and the problem was occurring on facebook with selecting someone to send a message to, where it pops up a list.  This is consistent with all other hangs I've had where it's been something overlaid that fades in (or possibly out).  In the past it's been compiz, but in this case, there was no compiz involved.
Comment 65 Dave Airlie 2008-11-17 16:10:44 EST
can you try the latest X server? 1.5.3-5 is in koji.

I've fixed some dodgy EXA paths that affect radeon
Comment 66 Hin-Tak Leung 2008-11-17 18:09:17 EST
(In reply to comment #65)
> can you try the latest X server? 1.5.3-5 is in koji.
> 
> I've fixed some dodgy EXA paths that affect radeon

Yes, EXA seems to work a bit better now.

kernel-2.6.27.5-113.fc10                      Mon 17 Nov 2008 21:40:42 GMT
xorg-x11-drv-ati-6.9.0-48.fc10                Mon 17 Nov 2008 21:38:30 GMT
xorg-x11-server-Xorg-1.5.3-5.fc10             Mon 17 Nov 2008 21:38:21 GMT
xorg-x11-server-common-1.5.3-5.fc10           Mon 17 Nov 2008 21:38:19 GMT
Comment 67 Matěj Cepl 2008-11-18 08:23:36 EST
(In reply to comment #66)
> Yes, EXA seems to work a bit better now.

Ehm, sorry, "a bit better"? Is this bug fixed or not?
Comment 68 Hin-Tak Leung 2008-11-18 08:43:00 EST
(In reply to comment #67)
> (In reply to comment #66)
> > Yes, EXA seems to work a bit better now.
> 
> Ehm, sorry, "a bit better"? Is this bug fixed or not?

Under EXA, scrolling gnome-terminals shows tearing (which isn't visible with XAA).
Probably unrelated. 

The thing is, nobody has a "reliable" way of getting "EQ overflowing". Unless somebody comes along with an identification of where/how that can happen in code paths, we'll just have to wait and see...

BTW, I have just got onto drv-ati-6.9.0-51.fc10 and kernel-2.6.27.5-113.fc10 .
(and will continue to look at koji). If I get a "EQ overflowing", you will hear from me...
Comment 69 Jesse Keating 2008-11-18 17:38:19 EST
We believe that the problems reported by the original poster of this bug have been fixed with the latest X server.  If you have other specific problems, please file those as new bug reports.
Comment 70 Peng Huang 2008-11-18 21:48:01 EST
X does not hang now. But the ati driver still has two problems.
1. Like Hin-Tak Leung said: scrolling gnome-terminals shows tearing
2. When compiz is running, X server will be very slow, and the shadow around the window can not be displayed correctly.

Version of kernel & ati driver:
kernel-2.6.27.5-116.fc10.i686
xorg-x11-drv-ati-6.9.0-53.fc10.i386
Comment 71 Peng Huang 2008-11-18 21:50:48 EST
Created attachment 323992 [details]
Screenshot with compiz
Comment 72 Hin-Tak Leung 2008-11-19 01:43:52 EST
Created attachment 324005 [details]
another gdb backtrace - sorry, it hung...

Sorry guys, can somebody re-open the bug?

It hung with the latest(?), and here is the gdb backtrace. Still stuck at drm/ioctl, so I say it is the same bug. Please re-open.

xorg-x11-drv-ati-6.9.0-51.fc10                Tue 18 Nov 2008 12:22:18 GMT
kernel-2.6.27.5-116.fc10                      Tue 18 Nov 2008 12:21:38 GMT
xorg-x11-server-Xorg-1.5.3-5.fc10             Mon 17 Nov 2008 21:38:21 GMT

This time I was re-positioning a gnome-terminal window when it happened. (firefox at the back but it basically is there most of the time anyway).

I see kernel-2.6.27.5-120.fc10, xorg-x11-drv-ati-6.9.0-54.fc10 are out in koji in the last few hours... so I'll upgrade, I guess.
Comment 73 Hin-Tak Leung 2008-11-19 13:02:05 EST
Created attachment 324080 [details]
another gdb backtrace under kernel-2.6.27.5-120.fc10, xorg-x11-drv-ati-6.9.0-54.fc10

So I have upgraded to an even newer kernel drm and drv-ati since I posted attachment 324005, and have had another one of those "mouse moves but nothing else" moments with kernel-2.6.27.5-120.fc10, xorg-x11-drv-ati-6.9.0-54.fc10.

This time it happened while I was dragging a view-page-source-window of firefox
around.
Comment 74 Dave Airlie 2008-11-19 16:19:41 EST
This is an XAA backtrace.... so you are booting with nomodeset and XAA accel?

so not the same problem at all. open a new bug clearly stating nomodeset + XAA
Comment 75 Hin-Tak Leung 2008-11-19 16:52:04 EST
(In reply to comment #74)
> This is an XAA backtrace.... so you are booting with nomodeset and XAA accel?
> 
> so not the same problem at all. open a new bug clearly stating nomodeset + XAA

Both are correct - I have tried removing nomodeset every time I reboot to a new koji kernel. So far, every one of them hangs at the end of the blue-start backgrounded progress bar just before the X server starts. Is removing nomodeset supposed to work now? It isn't.

As for XAA... I am sorry, but the tearing while scrolling under EXA is very unsightly. I'll be happy to switch to EXA for general use (or general debugging, for that matter) if the tearing goes away... Besides, the last time I checked, XAA was the default (i.e. if neither is specified in xorg.conf), despite subsequently emitting a warning in Xorg.0.log. So given the risk of getting stuck under either, I would opt for XAA instead of EXA, just because it works better (despite it saying "unsupported"). What I mean is, EXA is supported but no good. :-(.

This is a bit curious - according to xorg.conf in comment 2, the original poster did not specify an Accel method, so I would have thought XAA is used, but his log shows EXA.
Comment 76 Hin-Tak Leung 2008-11-19 17:16:38 EST
Okay, so I have removed Accel from xorg.conf and rebooted. The default is EXA now, so I'll let it run that way until it gets stuck(!). Still needs nomodeset to boot.

The tearing on scroll is quite noticeable, e.g. with the default blue background, start one gnome-terminal, run dmesg to have some text, then scroll back with the scroll-bar (the same tearing happens in firefox as well). There is also occasional flicker, which looks like windows moving sideways by a few pixels and back, so EXA seems buggy (neither the tearing nor the flicker happens under XAA).

I'll file a separate bug for XAA then...

I just found a "reliable" way of seeing the flicker - moving the mouse pointer quickly up-and-down across the bottom edge of the comment text-box! (saw it when I move the mouse just before pressing "commit").
Comment 77 Hin-Tak Leung 2008-11-20 12:27:43 EST
Created attachment 324210 [details]
gdb backtrace when  xserver stop responding with EXA and latest drv_ati, etc

Please re-open bug. This is another "mouse movement still works, 100% CPU" instance with the latest everything, under EXA. See gdb backtrace. The versions are below:

xorg-x11-drv-ati-6.9.0-55.fc10 
kernel-2.6.27.5-120.fc10
xorg-x11-server-Xorg-1.5.3-5.fc10
Comment 78 Hin-Tak Leung 2008-11-20 13:03:55 EST
Created attachment 324216 [details]
gdb backtrace, 2nd stuck with EXA within a few hours.

This is the backtrace of another hang within a few hours of the earlier one, both under EXA.

Please re-open bug. 

I am sorry, EXA just seems rather worse than XAA, so I am switching back.
Comment 79 Hin-Tak Leung 2008-11-30 17:11:21 EST
Recently the nomodeset situation seems to have changed, so I tried various
combinations. There are no winners; each has its own
problems:

radeon + EXA: tearing while scrolling, and most recently, font/screen
corruption when firefox is used. (bug 473815)

radeon + XAA: need to boot with nomodeset, or it goes into a black screen
instead of GDM. (bug 464896 ?)

radeonhd : no xvideo (bug 473819)

For general use, radeon + XAA is the best at the moment. 
(RS690M [Radeon X1200 Series])
