Bug 462157
Description
Peng Huang
2008-09-13 02:27:49 UTC
Created attachment 316642 [details]
I recorded the screen with a camera
Created attachment 316643 [details]
my xorg.conf
Created attachment 316644 [details]
Xorg.0.log with fc9's kernel
I was about to post a new bug report when I found this one; hopefully my problem qualifies as a duplicate. With the latest batch of updates the problem appears less severe than with the previous Xorg update, but the combination of Xorg and x11-drv-ati-6.8.0-19 may be the culprit. Here's what I originally intended to report:

Summary: "Computer unresponsive under graphics stress"

Viewing or working with large images (raster/vector) slows the system down severely, to the point of X freezing (gkrellm no longer updates the krells, the mouse cursor moves around VERY slowly and "jumpy", and I don't seem able to input any key combination to snap out of X). Through an SSH session I am able to run top (although the system also feels very sluggish there), and all I can see is that Xorg, hald-addon-storage and top are consuming pretty much all the available CPU. After killing the offending processes I'm unable to restart an X session, and a hard reboot is required (pressing the power button for more than 5 seconds on my laptop).

Hardware profile: http://www.smolts.org/client/show/?uuid=pub_f3889c13-ed5b-44b9-a6ab-72db21a4aaa5

Before the latest update, I was able to reliably reproduce this problem with this image: http://upload.wikimedia.org/wikipedia/commons/e/e2/Full_moon.png

**One interesting thing: while I was testing this URL, the machine "stuttered" for a bit while loading the image, Xorg CPU usage spiked to 100% on one core, and PulseAudio skipped sound (I had Rhythmbox playing some tunes). When I opened a terminal to check the smolt profile, the system froze as I moved the terminal window (with fake transparent background).

The offending processes are:
Xorg - 100% - 120% CPU usage during problem.
hald-addon-storage - ~50% CPU usage during problem (interestingly enough, it seems to have problems polling the DVD drive)
top - it eats about 45% trying to poll the resources.

At this point I'm no longer sure what the problem is or where it lies.
I rebooted into the previous kernel: same problem. I tried the GIT ati drivers, which worked just fine with the previous kernel: same problem. I reverted to Fedora's provided driver: same problem. I reverted to Xorg 1.4.99, and while it seemed more stable, in the end I ran into the same problem.

Before updates started rolling out again, I had the following configuration working _just_fine_:
Composited desktop through Metacity.
ATi driver from GIT.
Had to add AccelMethod EXA to Xorg in order for composite to be fast enough without tearing.

* Monitoring /var/log/messages from the SSH terminal while triggering the problem does not show anything of special interest.
* I've got a few APIC error messages in dmesg.
* All the evidence thus far points to an X or driver problem.
* The Xorg.0.log.old is flooded with:
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

I don't think I can debug this further (I lack the skill, and probably the means).
Created attachment 317704 [details]
Xorg log
Created attachment 317705 [details]
dmesg output
Created attachment 317706 [details]
The xorg.conf I'm using
Yes, the first log is also full of

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.

Little update: I installed and tried a GIT build of the xorg-x11-drv-ati driver (checked out Sept. 30th) and X still hangs. I have not been able to capture anything in Xorg.0.log.old or any other X log (after cleaning [deleting] all X logs, by the way), even when triggering the problem (viewing a 1600x1200 image at full size and panning it in EoG). Updating to the latest kernel (2.6.26.5-45.fc9) does not help either (in case this has something to do with the DRM module, radeon or some such). Also, I am using a framebuffer VT (vga=803), if that might have something to do with it. The APIC errors go away when I add the noapic command line option.

Does putting nomodeset on the kernel command line help? If yes, this is duplicate of bug 464896.

(In reply to comment #10)
> Does putting nomodeset on the kernel command line help? If yes, this is
> duplicate of bug 464896.
X does not crash with nomodeset, but it still displays some graphics abnormally. It is the same with the kernel from FC9. Please look at the video https://bugzilla.redhat.com/attachment.cgi?id=316642 . It was captured with the FC9 kernel.

(In reply to comment #10)
> Does putting nomodeset on the kernel command line help? If yes, this is
> duplicate of bug 464896.
Will try this. I'll report back when I have run some tests.

Well, I have tried, and I can consistently cause the issue even when running with nomodeset. The problem reproduces reliably with a large image viewed in EoG.
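Since several comments rely on spotting the two [mi] messages in Xorg.0.log or Xorg.0.log.old, here is a minimal Python sketch for counting them in a log's text. The function name count_mi_events and the pattern labels are made up for illustration, and matching on these two substrings is only an assumption about what a hung server writes; it is not part of any Xorg tooling.

```python
import re
from collections import Counter

# Message fragments quoted in the comments above. Treating these two
# substrings as the hang signature is an assumption, not an Xorg API.
PATTERNS = {
    "eq_overflow": re.compile(r"\[mi\] EQ overflowing"),
    "out_of_order": re.compile(r"\[mi\] mieqEnequeue: out-of-order valuator event"),
}


def count_mi_events(log_text):
    """Count how many lines of log_text contain each known [mi] message."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To use it, read the log file into a string (for example the contents of /var/log/Xorg.0.log.old, if that is where your distribution keeps it) and pass it to count_mi_events; a large eq_overflow count suggests the server was spinning when the log was written.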
OK, I've been trying to debug this, but I can't install xorg-x11-server-debuginfo; there seem to be some missing deps which yum is not able to resolve:

su -c 'yum --enablerepo=fedora-debuginfo -y install xorg-x11-server-debuginfo'
Password:
Loaded plugins: fastestmirror, refresh-packagekit
Loading mirror speeds from cached hostfile
 * livna: wftp.tu-chemnitz.de
 * fedora: mirror.newnanutilities.org
 * updates-newkey: mirror.cogentco.com
 * fedora-debuginfo: mirror.cogentco.com
 * updates: mirror.cogentco.com
Setting up Install Process
Parsing package install arguments
Resolving Dependencies
--> Running transaction check
---> Package xorg-x11-server-debuginfo.x86_64 0:1.4.99.901-29.20080415.fc9 set to be updated
--> Processing Dependency: libGLcore.so()(64bit) for package: xorg-x11-server-debuginfo
--> Processing Dependency: libxtrap.so()(64bit) for package: xorg-x11-server-debuginfo
--> Processing Dependency: libdri2.so()(64bit) for package: xorg-x11-server-debuginfo
--> Finished Dependency Resolution
xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 from fedora-debuginfo has depsolving problems
  --> Missing Dependency: libdri2.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 from fedora-debuginfo has depsolving problems
  --> Missing Dependency: libGLcore.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 from fedora-debuginfo has depsolving problems
  --> Missing Dependency: libxtrap.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
Error: Missing Dependency: libxtrap.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
Error: Missing Dependency: libGLcore.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)
Error: Missing Dependency: libdri2.so()(64bit) is needed by package xorg-x11-server-debuginfo-1.4.99.901-29.20080415.fc9.x86_64 (fedora-debuginfo)

I'm trying to follow this debugging how-to for the X server: http://www.x.org/wiki/Development/Documentation/ServerDebugging

Just noticed that yum is trying to pull the debuginfo package for Xorg 1.4.99 and not 1.5.2. Does this have anything to do with the newkey repos? If so, how do I install the correct debuginfo?

I have the same problem running rawhide
http://www.smolts.org/client/show/pub_6824ca5e-57e4-4026-b764-c5dc475eb220
The only way to get a running X is to pass nomodeset as a kernel parameter, and even in that case I get artifacts in the rightmost 40% of the screen.

Well, the same type of problem I am running into with the xorg-x11-drv-ati driver is also present with xorg-x11-drv-radeonhd from Koji. The log says a couple of interesting things: the driver doesn't recognize the card I am using (my laptop's), yet AtomBIOS reports the right type of card, and the chipset is obviously supported... A couple of things caught my eye:

(II) RADEONHD(0): Unknown card detected: 0x791F:0x1179:0xFF1A.
If - and only if - your card does not work or does not work optimally
please contact radeonhd to help rectify this.
Use the subject: 0x791F:0x1179:0xFF1A: <name of board>
and *please* describe the problems you are seeing in your message.
(II) RADEONHD(0): ATOM BIOS Rom:
SubsystemVendorID: 0x1179 SubsystemID: 0xff1a
IOBaseAddress: 0x9000
Filename: br26107b.bin
BIOS Bootup Message: ATI Radeon Xpress ?1250? for MW10A

At any rate, I'm starting to suspect EXA to be the culprit. I will have to perform a couple of tests (with the ati [radeon] and radeonhd drivers) and see how it goes.
One thing was kind of different, though: With radeonhd the particular problem I am having takes a little bit longer than with the radeon driver, but still occurs (instead of immediately after starting panning an image in EoG, it takes several "passes" for it to happen [anywhere from two to four]), I'll attach the Xorg.0.log of this session anyway, for completeness sake. Created attachment 319620 [details]
RadeonHD xorg.conf
Well, I disabled Option "AccelMethod" "EXA" in my xorg.conf and guess what? X stopped hanging or entering any infinite loops. So this does relate to EXA. I'll report it on the upstream bug tracker; hopefully it'll get some attention from the upstream devs.

I also experience occasional X hangs with the xorg-x11-drv-ati radeon driver, on
01:05.0 VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series]
ssh from another box still works, so the symptom is the same as in comment 4. I tried debuginfo-install, etc. and got all the debuginfo packages, but running

gdb /usr/bin/Xorg <pid> (as root)

gives a strange message about ptrace not permitted. If somebody can explain what that gdb message means, I can give gdb a try the next time it happens...

During the weekend I was able to test a number of distributions and found that the particular problem I was experiencing in Fedora 9 is no longer present in distributions with the 1.5.2 XServer (Ubuntu 8.10 and F10

(In reply to comment #20)
> gdb /usr/bin/Xorg <pid> (as root)
>
> gives a strange message about ptrace not permitted.
That's SELinux -- run (as root)
setenforce 0
before running gdb.

(In reply to comment #22)
Argh, thanks for the SELinux tip. The next time the X server gets stuck, I will know exactly what to do :-).

Not sure if this is the same problem, but a lot of the symptoms seem to be the same. However, I'm running into it with both xorg-x11-drv-ati and the vesa driver for Xorg. This laptop has an ATI Radeon HD3470. When X starts up, the screen fades in from black, but once that's done, the login box is blank and keeps flashing. I'll get my xorg.conf and Xorg.0.log up as soon as I can. I'm running the Fedora 10 beta x86_64 with a couple of updates, including Xorg 1.5.2-10.fc10 and kernel 2.6.27.4-58.fc10.
Created attachment 321862 [details]
VESA xorg.conf
The VESA configured xorg.conf. This is also the file I used to roll back to if graphics problems happened and I couldn't debug immediately.
Created attachment 321863 [details]
New xorg_x11_drv_ati xorg.conf
Customized after looking over this bug. Exhibits the same symptoms, with the exception that it runs at the native res of 1440x900; hardly any use in this situation.
Created attachment 321864 [details]
Xorg's log
My Xorg.0.log. Probably contains information from both the VESA and xorg_x11_drv_ati configurations, as well as from running with no xorg.conf.
Created attachment 321993 [details]
kernel oops message, Xorg.conf, Xorg.0.log tgz'ed
I have managed to debuginfo-install all the required packages, ran setenforce 0,
and ran gdb on the running process. However, it takes forever at the "Attaching ..." stage while Xorg still uses 100% of the CPU, and gdb never managed to attach completely.
So I killed the X server with kill -9, then typed "reboot" in the ssh session as root, at which point the kernel oops'ed. (I think it also oops'ed in the past when I tried to reboot from ssh.)
So the attachment contains: the oops message plus some context before and after it, extracted from my /var/log/messages (it also shows my debuginfo-install, and therefore which versions are installed); my xorg.conf (I added the Accel EXA line recently, but the hang occurs regardless of this line); and Xorg.0.log.old (the log from before the reboot, of course).
Oh, I am running a koji kernel, 2.6.26.7-86.fc9.x86_64 . The hardware is (from lspci):
VGA compatible controller: ATI Technologies Inc RS690M [Radeon X1200 Series]
I think the oops message says that it is stuck in the kernel doing drm stuff - any opinion? I have been eyeing drv-ati-6.9.0 from fc10 for some time now, and it seems to have some relevant changes. I am a bit reluctant to upgrade *both* the kernel and the bulk of X for a problem which may or may not be fixed, though...
Well, unfortunately, I have to ask you to upgrade. There were so many changes in both the DRM and the userspace drivers (not to mention numerous changes in the X server itself) that testing your old versions doesn't make much sense.

Fair enough... I see quite a few radeon-related changes in rawhide, and some look relevant. I just like to keep the number of rawhide packages I use to a minimum, and only where I need them or am testing something on them. I am using a koji kernel because I am involved with one of the wireless drivers, and having another bleeding-edge piece is going to hurt a little. I see a 2.6.27.x kernel has just hit f9-candidate, so what is the minimum I need to be up to date without going wholesale rawhide? As far as I can see I need at least 3 pieces: a 2.6.27.x kernel, libdrm* and drv-ati. Anything else?

So I grafted a whole bunch of rawhide rpms onto my Fedora 9 system. I don't know yet whether I'll get an X server hang (it only happens about once every couple of days), but I now have a boot-up problem: without nomodeset, boot hangs at the end of the progress bar and never gets to the GDM login screen. Also, I am observing tears/flickers when the screen scrolls; this happened both before and after the upgrade, but seems more frequent after. So the upgrade has not been good so far.
-----------------
xorg-x11-server-Xorg-1.5.2-10.fc10 Fri 31 Oct 2008 18:00:13 GMT
xorg-x11-server-common-1.5.2-10.fc10 Fri 31 Oct 2008 18:00:11 GMT
mesa-libGL-devel-7.2-0.13.fc10 Fri 31 Oct 2008 17:37:41 GMT
xorg-x11-drv-ati-6.9.0-38.fc10 Fri 31 Oct 2008 17:37:36 GMT
mesa-libGL-7.2-0.13.fc10 Fri 31 Oct 2008 17:37:36 GMT
kernel-2.6.27.4-69.fc10 Fri 31 Oct 2008 17:37:01 GMT
libdrm-2.4.0-0.21.fc10 Fri 31 Oct 2008 17:36:49 GMT
kernel-devel-2.6.27.4-69.fc10 Fri 31 Oct 2008 17:34:48 GMT
kernel-doc-2.6.27.4-69.fc10 Fri 31 Oct 2008 17:34:02 GMT
libdrm-devel-2.4.0-0.21.fc10 Fri 31 Oct 2008 17:33:22 GMT
kernel-headers-2.6.27.4-69.fc10 Fri 31 Oct 2008 17:33:16 GMT
mkinitrd-6.0.68-1.fc10 Fri 31 Oct 2008 17:33:13 GMT
plymouth-0.6.0-0.2008.10.27.7.fc10 Fri 31 Oct 2008 17:33:11 GMT
plymouth-plugin-solar-0.6.0-0.2008.10.27.7.fc10 Fri 31 Oct 2008 17:33:09 GMT
fedora-logos-10.0.0-2.fc10 Fri 31 Oct 2008 17:32:45 GMT
plymouth-scripts-0.6.0-0.2008.10.27.7.fc10 Fri 31 Oct 2008 17:32:44 GMT
plymouth-plugin-label-0.6.0-0.2008.10.27.7.fc10 Fri 31 Oct 2008 17:32:43 GMT
nash-6.0.68-1.fc10 Fri 31 Oct 2008 17:32:41 GMT
plymouth-libs-0.6.0-0.2008.10.27.7.fc10 Fri 31 Oct 2008 17:32:37 GMT
mesa-libGL-7.2-0.13.fc10 Fri 31 Oct 2008 17:32:36 GMT
mesa-dri-drivers-7.2-0.13.fc10 Fri 31 Oct 2008 17:32:35 GMT
initscripts-8.84-1 Fri 31 Oct 2008 17:32:29 GMT
libdrm-2.4.0-0.21.fc10 Fri 31 Oct 2008 17:32:04 GMT
kernel-firmware-2.6.27.4-69.fc10 Fri 31 Oct 2008 17:24:47 GMT
----------------

Okay, further to my previous comment: I have had a hang, so the upgrade did not help. Also, unfortunately, the new kernel did not have the wireless-related bleeding-edge update I mentioned, so my ssh session died during debuginfo-install. All in all, the effort that went into the upgrade was fruitless, and it seems to bring more problems (the more frequent flicker/tear, and requiring nomodeset) than it solves...
Created attachment 322174 [details]
gdb backtrace while it hangs
okay, I have a gdb backtrace with most of the debug info -
this time it seems to be stuck from ioctl () calling from drmDMA()
calling from RADEONCPGetBuffer(). I was launching firefox to recover a crashed session, i.e. launching a lot of windows at the same time. This may or may not be related - after all, firefox is one of the most used applications.
I have drv-ati, libdrm and kernel from koji, and drv-ati-debuginfo from koji also, and the rest from rawhide, but debuginfo seems to pick up libdrm-debug info from rawhide.
---
xorg-x11-drv-ati-debuginfo-6.9.0-38.fc10 Fri 31 Oct 2008 19:39:21 GMT
openssl-debuginfo-0.9.8g-9.fc9 Fri 31 Oct 2008 19:38:27 GMT
libpciaccess-debuginfo-0.10.3-2.fc9 Fri 31 Oct 2008 19:38:23 GMT
libdrm-debuginfo-2.4.0-0.21.fc10 Fri 31 Oct 2008 19:38:22 GMT
pixman-debuginfo-0.10.0-1.fc9 Fri 31 Oct 2008 19:38:21 GMT
dbus-debuginfo-1.2.4-1.fc9 Fri 31 Oct 2008 19:38:18 GMT
libselinux-debuginfo-2.0.67-4.fc9 Fri 31 Oct 2008 19:38:15 GMT
audit-debuginfo-1.7.5-1.fc9 Fri 31 Oct 2008 19:38:11 GMT
libXau-debuginfo-1.0.3-5.fc9 Fri 31 Oct 2008 19:38:07 GMT
hal-debuginfo-0.5.11-2.fc9 Fri 31 Oct 2008 19:38:04 GMT
glibc-debuginfo-2.8-8 Fri 31 Oct 2008 19:37:28 GMT
xorg-x11-server-debuginfo-1.5.2-10.fc10 Fri 31 Oct 2008 19:36:48 GMT
libfontenc-debuginfo-1.0.4-5.fc9 Fri 31 Oct 2008 19:36:35 GMT
libXfont-debuginfo-1.3.2-1.fc9 Fri 31 Oct 2008 19:36:31 GMT
libXdmcp-debuginfo-1.0.2-5.fc9 Fri 31 Oct 2008 19:36:27 GMT
---------
This should be useful to somebody...
Created attachment 322189 [details]
another gdb, x server stuck in a different place.
gdb backtrace with the latest koji (xorg-x11-drv-ati-6.9.0-41 , newer than my
last gdb collected with -38, please note), where the X server shoots up to 100 to 150% CPU (dual core). Mouse *movement* still works, but neither clicking nor the keyboard works, and some apps no longer refresh. I take it to be a hang when the GNOME System Monitor applet stops updating.
I was just opening a bookmark in firefox at the time (i.e. one browser window, or maybe another behind).
still booting with nomodeset (without it, my machine won't ever go into GDM).
Created attachment 322195 [details]
gdb backtrace 3
3rd backtrace, yet another one, also while firefox restore sessions.
Created attachment 322196 [details]
gdb backtrace #4
yet another, also while firefox restores session.
Argh, I feel really stupid for not reading comment 19 carefully. All 4 of my gdb backtraces involve exa* routines. I think I was confused by comment 4 suggesting that EXA is the way to go, and also by Xorg.0.log: when XAA (the default) is used, there is a warning and a suggestion to use EXA instead. I therefore started using EXA quite soon after I switched from fglrx (maybe at the 2nd reboot or so, after I saw the warning in Xorg.0.log). In any case, I have switched back to XAA, and voila, I can restore my firefox sessions, and the flicker and tearing are gone.

So - can somebody either fix EXA (based on the gdb backtraces) or remove the XAA-related warning from Xorg.0.log? It is embarrassing that, despite

"(II) RADEON(0): XAA Render acceleration unsupported on Radeon 9500/9700 and newer. Please use EXA instead."

the unsupported acceleration works better than the supported one...

Anyway, I'll wait and see if I ever get a hang with XAA. It is a bit stupid that I enabled EXA so soon after switching from fglrx, based on that message in Xorg.0.log.

I can reproduce this with no xorg.conf on Fedora 10. This is with nomodeset on the kernel command line (I can't boot at all without that). Reproducing takes about five seconds of doing anything in KDE (even opening Konsole). Switching to XAA seems to fix it. killall -9 X doesn't kill X when this happens.
I got dmesg after a hang with drm debug=1 and radeon dynclks=0: [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] [drm:radeon_do_wait_for_fifo] wait for fifo failed status : 0x9003C100 0x00080100 [drm:drm_ioctl] ret = fffffff0 [drm:drm_ioctl] pid=2749, cmd=0x6444, nr=0x44, dev 0xe200, auth=1 [drm:radeon_cp_idle] [drm:radeon_do_cp_idle] and lots more. 
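To make the repeated drm debug output above easier to read at a glance, here is a small Python sketch that counts the "wait for fifo failed" entries and decodes the "ret = fffffff0" value. The function names are made up for illustration. Reading ret as a 32-bit two's-complement number is an assumption about how the DRM debug print formats the ioctl return value; under that assumption fffffff0 is -16, which matches EBUSY (errno 16) on Linux.

```python
import re

# Patterns copied from the dmesg excerpt above.
FIFO_RE = re.compile(
    r"\[drm:radeon_do_wait_for_fifo\] wait for fifo failed status : "
    r"(0x[0-9A-Fa-f]+) (0x[0-9A-Fa-f]+)"
)
RET_RE = re.compile(r"\[drm:drm_ioctl\] ret = ([0-9a-fA-F]+)")


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def summarize(dmesg_text):
    """Summarize fifo failures and ioctl return codes found in dmesg_text."""
    fifo = FIFO_RE.findall(dmesg_text)
    # Assumption: "ret" is a negative errno printed as 32-bit hex.
    rets = [to_signed32(int(h, 16)) for h in RET_RE.findall(dmesg_text)]
    return {
        "fifo_failures": len(fifo),
        "distinct_status": sorted(set(fifo)),
        "ret_values": sorted(set(rets)),
    }
```

Feeding the saved dmesg text to summarize shows whether every failure carries the same status pair and return code, which is what the excerpt above suggests.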
My package versions are: xorg-x11-drv-ati-6.9.0-38.fc10.x86_64 kernel-2.6.27.4-68.fc10.x86_64 xorg-x11-server-Xorg-1.5.2-10.fc10.x86_64 lspci says: 01:05.0 VGA compatible controller: ATI Technologies Inc Radeon 2100 (prog-if 00 [VGA controller]) Subsystem: Giga-byte Technology Device d000 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32, Cache Line Size: 4 bytes Interrupt: pin A routed to IRQ 18 Region 0: Memory at d8000000 (64-bit, prefetchable) [size=128M] Region 2: Memory at fdef0000 (64-bit, non-prefetchable) [size=64K] Region 4: I/O ports at ee00 [size=256] Region 5: Memory at fdd00000 (32-bit, non-prefetchable) [size=1M] Capabilities: [50] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [80] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable- Address: 0000000000000000 Data: 0000 Kernel modules: radeon 00: 02 10 6e 79 07 00 10 00 00 00 00 03 01 20 00 00 10: 0c 00 00 d8 00 00 00 00 04 00 ef fd 00 00 00 00 20: 01 ee 00 00 00 00 d0 fd 00 00 00 00 58 14 00 d0 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 00 00 40: 00 00 00 00 00 00 00 00 00 00 00 00 58 14 00 d0 50: 01 80 02 06 00 00 00 00 00 00 00 00 00 00 00 00 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80: 05 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Created attachment 322470 [details]
gdb backtrace of my first hang under XAA
since switching to XAA, this is my first hang, so I have to say XAA seems to work better...
I had a kernel upgrade since, and these are the relevant parts:
kernel-2.6.27.4-73.fc10 Mon 03 Nov 2008 21:30:27 GMT
kernel-headers-2.6.27.4-73.fc10 Mon 03 Nov 2008 21:30:09 GMT
kernel-devel-2.6.27.4-73.fc10 Mon 03 Nov 2008 21:25:27 GMT
kernel-doc-2.6.27.4-73.fc10 Mon 03 Nov 2008 21:22:41 GMT
kernel-firmware-2.6.27.4-73.fc10 Mon 03 Nov 2008 21:21:57 GMT
xorg-x11-drv-ati-debuginfo-6.9.0-41.fc10 Sat 01 Nov 2008 21:21:31 GMT
xorg-x11-drv-ati-6.9.0-41.fc10 Sat 01 Nov 2008 21:21:12 GMT
xorg-x11-server-Xorg-1.5.2-10.fc10 Fri 31 Oct 2008 18:00:13 GMT
xorg-x11-server-common-1.5.2-10.fc10 Fri 31 Oct 2008 18:00:11 GMT
libdrm-2.4.0-0.21.fc10 Fri 31 Oct 2008 17:36:49 GMT
kernel 2.6.27.4-73.fc10.x86_64
I'm having the same problem with Intel G45 X4500HD. Xorg.0.log shows:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg(xorg_backtrace+0x26) [0x4e7746]
1: /usr/bin/Xorg(mieqEnqueue+0x291) [0x4c82b1]
2: /usr/bin/Xorg(xf86PostMotionEventP+0xc4) [0x490be4]
3: /usr/bin/Xorg(xf86PostMotionEvent+0xa9) [0x490db9]
4: /usr/lib64/xorg/modules/input//evdev_drv.so [0x1445126]
5: /usr/bin/Xorg [0x47d495]
6: /usr/bin/Xorg [0x468f67]
7: /lib64/libc.so.6 [0x312b033100]
8: /lib64/libc.so.6(ioctl+0x7) [0x312b0de207]
9: /usr/lib64/libdrm.so.2 [0x3142e03023]
10: /usr/lib64/libdrm.so.2(drmWaitVBlank+0x20) [0x3142e036c0]
11: /usr/lib64/dri/i965_dri.so [0x61fd092]
12: /usr/lib64/dri/i965_dri.so(driWaitForVBlank+0xcb) [0x61fd293]
13: /usr/lib64/dri/i965_dri.so(intelSwapBuffers+0x23f) [0x6202d7a]
14: /usr/lib64/dri/i965_dri.so [0x61fd3d6]
15: /usr/lib64/xorg/modules/extensions//libglx.so [0xc0572f]
16: /usr/lib64/xorg/modules/extensions//libglx.so [0xbf9656]
17: /usr/lib64/xorg/modules/extensions//libglx.so [0xbfc8f2]
18: /usr/bin/Xorg(Dispatch+0x364) [0x446894]
19: /usr/bin/Xorg(main+0x45d) [0x42ccdd]
20: /lib64/libc.so.6(__libc_start_main+0xe6) [0x312b01e546]
21: /usr/bin/Xorg [0x42c0b9]
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

I'm getting the same thing now on F9, fully up to date with current packages. The X server hangs with 100% CPU and I see those [mi] messages in the X log. gdb never seems to be able to attach to the process, but I can ssh in and reboot the computer. I have an on-board RS690.

Created attachment 323173 [details]
another gdb strace under XAA, with more recent drv-ati
This is very similar to attachment 322470 [details], comment 39, except I am using a more recent kernel-2.6.27.5-92.fc10 and xorg-x11-drv-ati-6.9.0-44.fc10 from koji.
Despite the log entries of both kernel-2.6.27.5-92.fc10 and xorg-x11-drv-ati-6.9.0-44.fc10 saying some work has gone into radeon-modesetting, it is still not working reliably... While the frequency of hangs has dropped dramatically since I switched to XAA (only once every few days now, whereas under EXA it can be as frequent as a few times per day), it is nonetheless frustrating to have a hang and to have to find a different machine to ssh in from to run gdb and reboot...

This is definitely a problem within xorg-XServer 1.5.0.2 rather than the driver. I've switched to rawhide, and with XServer 1.5.2 and Xorg-drv-ati 6.9.0 things are working much, MUCH better now; EXA no longer freezes the computer. To rule out any Fedora-specific changes, I built Xorg-drv-ati from source straight from the release tarball, just as I did previously in Fedora 9, but with XServer 1.5.2 instead. It was already suggested to me in the Xorg bugzilla that this problem might be a regression introduced by some Fedora patch to Xorg, which led me to try a more recent XServer. I don't know if XServer 1.5.2 from F10 will ever cascade down to F9, or if the offending bits will be removed from it (the problem being identifying such bits).

I am already using xserver 1.5.2 from rawhide/f10 (see my previous post about specific partial upgrades).

# rpm -qa | grep -i x11-server
xorg-x11-server-debuginfo-1.5.2-10.fc10.x86_64
xorg-x11-server-devel-1.5.0-2.fc9.x86_64
xorg-x11-server-Xorg-1.5.2-10.fc10.x86_64
xorg-x11-server-utils-7.4-1.fc9.x86_64
xorg-x11-server-common-1.5.2-10.fc10.x86_64

In any case, this issue and the modeset boot-hang problem (which is not related, but also has to do with changes to the interaction with display hardware) need to be addressed before f10, I think.
Created attachment 323420 [details]
xorg log, showing a backtrace under kernel 2.6.27.5-101.fc10.x86_64
A new regression with kernel 2.6.27.5-101.fc10.x86_64 from koji. X server backtraces while trying to start.
With nomodeset (won't boot without), boot to the point where GDM tries to start but one only gets the blue star background plus the battery icon - no other GDM session stuff, and no log-in box. Mouse pointer still moves very slowly. Since there is no log-in box, and no GDM reboot buttons, etc, and networking isn't functional yet (no network manager), one has to power-cycle the box. Tried it twice.
Booting an older kernel, 2.6.27.5-94.fc10.x86_64, gets around this. So it looks like one of these three changes is sh*t:
* Wed Nov 12 2008 Dave Airlie <airlied> 2.6.27.5-101
- drm/intel: further interrupt fixes
* Tue Nov 11 2008 Chuck Ebbert <cebbert> 2.6.27.5-100
- Check for additional ATI chipset timer bugs (#470939, #470723)
* Tue Nov 11 2008 Dave Airlie <airlied> 2.6.27.5-99
- drm rebase patches against latest upstream tree.
I had similar problems with -101: black screens after plymouth as gdm was about to start. I'd have to reboot multiple times before it worked. I dropped back to -100, and it's working again. That makes the only change in -101 the culprit:

* Wed Nov 12 2008 Dave Airlie <airlied> 2.6.27.5-101
- drm/intel: further interrupt fixes

I'm on a Supermicro C2SEA motherboard, G45 X4500HD.

I am on a Toshiba laptop with ATI Technologies Inc RS690M [Radeon X1200 Series]. To be honest, I am a bit disappointed with the lack of progress on the nomodeset issue and this hang issue, considering how important both are and how close Fedora 10's supposed release date (10 days?) is. With the nomodeset flag, all the boot-up eye candy is gone, which is worse than pre-plymouth times and back to Win 3.1 level, and this hang (particularly the latest one, which affects GDM) makes long-session X usage rather hazardous. I am hoping to try out koji builds as often as possible, and I hope this gets fixed before fc10 is released.

I'm really starting to wonder where this problem lies: is it in Xorg, or in the driver? I lean towards a problem in Xorg, since this happens with either Xorg-ati-drv or Xorg-radeonhd-drv, even though both share quite a bit of code (in fact, DRI support for R500 in the radeonhd driver is imported straight from the ati driver). Is there a way to check this against an unpatched version of the XServer? Maybe the focus should be on determining where the regression occurred.

this bug is a bit all over the place, 48 comments and nothing on?

Please try -104 when it finished to fix any regression since -94.

However can the ati person open a new bug so this remains the intel persons bug. X hanging isn't always the same reason, and certainly not when using different hw.

(In reply to comment #49)
> this bug is a bit all over the place, 48 comments and nothing on?
>
> Please try -104 when it finished to fix any regression since -94.
>
> However can the ati person open a new bug so this remains the intel persons
> bug. X hanging isn't always the same reason, and certainly not when using
> different hw.
I'll give -104 a try when it finishes building.

Is there a mistake somewhere? The original poster listed drv-ati in his report and explicitly uses "radeon" in his xorg.conf in comment 3, and even the component field of the bug report itself says "ati-drv". I thought this was the drv-ati/radeon bug entry, i.e. intel or even radeonhd users should go elsewhere?

Perhaps I was being a bit harsh in comment 47. This seems to be a driver <-> kernel-drm interaction problem, which is difficult to debug. About 1/4 of the 48 comments are mine, but half of mine are detailed gdb backtraces, including debugging line numbers, with various koji builds, and should be useful...

(In reply to comment #49)
> this bug is a bit all over the place, 48 comments and nothing on?
>
> Please try -104 when it finished to fix any regression since -94.
>
> However can the ati person open a new bug so this remains the intel persons
> bug. X hanging isn't always the same reason, and certainly not when using
> different hw.
I thought the bug reporter was using ATi hardware? Certainly the packages listed at the beginning of the bug report clearly indicate ATi hardware (Xorg modules r128 [for Rage 128, I assume] and ati 6.9.0 are listed), so when did this become an Intel-related bug?

Ah, the bug just got messy in the middle then. Okay, so it's an ati problem on an r500. So G45 people go to another bug; ATI people stay. However, the bug is now impossible to follow, so I've no idea which people matter for considering it fixed, or how I can ever close it. In any case -104 should fix one bunch of dumb regressions, and I'm hoping to push some more fixes in for -ati before GA.

Ok, so let me get this straight: kernel -104 contains a number of regression fixes that may be relevant to the people experiencing this problem (ATi hardware)?
Will try with it and do some more testing, then report back either success or failure. In my experience, however, since I "updated to rawhide" a few weeks ago, I have not seen this particular issue as frequently as before (it has happened to me only on a handful of occasions), but it does still happen (especially with composite enabled and EXA, and it has been reproducible with Cairo-dock).

I think only comment 40 and comment 46 (same person) involve intel hardware, but comment 46 is relevant...

The way I see it, the *summary* of this bug is relatively simple: most people using rawhide see the occasional symptom where mouse movement still works but nothing else in the GUI does, while ssh from outside still works; same with f9 (I was one of them but I have upgraded). At least two of us found that switching over from EXA to XAA helped substantially; I have provided a few gdb backtraces with line numbers against koji builds, under both EXA and XAA. The backtraces show that the hang happens when trying to copy a data buffer between userland and kernel through drm's ioctl. Under EXA, I can quite reliably trigger it by restoring a firefox session with a dozen windows totalling 100+ tabs.

If I were more familiar with how the X server works and with its code base (unfortunately I am not, but I am reasonably competent at "general debugging"), this is how I would try to fix this:

- the X server should give up and go back to unaccelerated mode after a certain number of unsuccessful tries at copying data through the kernel, rather than keep hammering the kernel repeatedly and quickly;
- drm retries should sleep/pause occasionally to let other human-interface events, such as switching VT by key strokes, through, so that it is possible to reset the X server with ctrl-alt-backspace, or to switch VT with ctrl-alt-f<n> and reboot, when everything fails...
- the kernel drm should process ioctls correctly and quickly, checking parameters; if there is anything wrong with an incoming ioctl, it should emit some debug info through dmesg for debugging. I know there is "modprobe drm debug=1", but that's not useful because most people let the X server do the modprobe and never have debug on until the problem happens... so *some* of what debug=1 does should be the default until this is fixed...

Does this sound like a reasonable plan?

The way you describe this, and the way this problem is presenting (it seems to occur on more hardware than the one this bug pertains to), it would seem as if the XServer and the DRM API were clashing at some point (that's the way I understand what you are saying). So, if I understand correctly, this problem *seems* to be the XServer pushing data through the DRM module, and when the DRM module cannot react to these XServer requests, the XServer simply keeps on hammering the DRM module, bringing all I/O to a halt. Is that somehow it? If so, this seems to be clearly an upstream problem; the question is "whose?". The only thing I can think of in Fedora that might be different from other systems that I've tested (with Xorg 7.4/ati drv 6.9.0) are the patches to both the XServer and the kernel (involving DRM)... Does this sound like a reasonable "conclusion"?

(In reply to comment #55)
I have done a bit of homework some weeks ago and AFAIK, unfortunately, there is no "upstream". Dave Airlie, the assignee of this bug, is also a substantial contributor to drv-ati and kernel drm. (See:
http://cgit.freedesktop.org/xorg/driver/xf86-video-ati/log/
http://cgit.freedesktop.org/~airlied/drm/log/ )
So nobody should be making any comments about patches being rawhide-specific, because it is simply that changes land in rawhide *first*, and he should be one of the most qualified people to deal with this.
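
For anyone who wants to reason about the retry policy proposed earlier in this thread (a bounded number of tries, a pause between tries, then a fall-back to the unaccelerated path), here is a minimal sketch in Python. It is only an illustration of the idea: `submit`, `fallback`, `max_tries` and `base_delay` are hypothetical names, and nothing here is taken from the X server, the ati driver or the kernel drm code.

```python
import time


def submit_with_fallback(submit, fallback, max_tries=5, base_delay=0.001,
                         sleep=time.sleep):
    """Try an accelerated operation a bounded number of times, then give up.

    submit   -- hypothetical callable; returns True if the operation succeeded
    fallback -- hypothetical callable; performs the unaccelerated path
    sleep    -- injectable so the pauses can be observed in tests
    """
    delay = base_delay
    for _ in range(max_tries):
        if submit():
            return "accelerated"
        # Pause instead of retrying immediately, so that other work
        # (e.g. handling a VT-switch key press) gets a chance to run.
        sleep(delay)
        # Back off: wait twice as long before the next attempt.
        delay *= 2
    # Every attempt failed: stop hammering and use the slow path.
    fallback()
    return "fallback"
```

The point of the sketch is the shape of the loop: the number of retries is finite, each failure is followed by a growing pause, and there is always an exit to a slower path that does not depend on the failing call.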
I didn't mean the comment to sound like I was placing the "blame" on anyone; rather, I was following what was suggested in the Xorg tracker (where I also posted this bug), and IIRC it was Alex Deucher who suggested that the root of the problem *could* be a regression in Fedora's XServer (maybe due to a patch). I know many of the Fedora developers are also "upstream" developers, which means that a LOT of changes land in Fedora's Rawhide first (before any other distribution). I apologize if my comment somehow sounded like I was "seeking a culprit"; I was not, I was rather following leads.

Created attachment 323671 [details]
gdb backtrace with kernel-2.6.27.5-104 and XAA ati-6.9.0-44
-104 indeed fixes the -101 regression. Thanks.
I have another hang with -104. I also noticed that where the X server got stuck is a little different (still drm/ioctl-related). So hope this new info is useful.
I ran gdb three times to see whether the X server is merely cycling through a few states very fast, so this attachment shows three backtraces (which seem to be identical). This is still with 6.9.0-44. I just upgraded to 6.9.0-46 before rebooting (I suppose I'll look for a koji upgrade whenever I have a hang, just hoping it will go away eventually).
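
Comparing several backtraces taken a few seconds apart can be done mechanically. The following is a rough Python sketch that extracts the function names from gdb "bt" output and reports the frames, counted from the outermost one, that all samples share. The regular expression is an assumption about the usual "#N  0xADDR in func (args)" layout of gdb frames, and the helper names `frames` and `common_outer_frames` are made up for this illustration.

```python
import re

# Assumed gdb frame layout: "#N  [0xADDR in ]function (args) ..."
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_][\w:~]*)\s*\("
)


def frames(backtrace_text):
    """Return function names from gdb 'bt' output, innermost frame first."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    return names


def common_outer_frames(samples):
    """Frames shared by every sample, starting from the outermost frame."""
    stacks = [list(reversed(frames(text))) for text in samples]
    shared = []
    for column in zip(*stacks):
        if len(set(column)) != 1:
            break  # the samples diverge at this depth
        shared.append(column[0])
    return shared
```

If the shared list reaches all the way down to the innermost frame in every sample, the server is probably sitting in the same call each time; if the samples diverge a few frames in, it is more likely looping through several states.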
Can you test the 113 kernel available at http://kojipkgs.fedoraproject.org/packages/kernel/2.6.27.5/113.fc10/?

Changing bug summary to be a bit more specific, following airlied's lead from comment #52.

(In reply to comment #58)
> I have another hang with -104. I also noticed that where the X server got stuck
> is a little different (still drm/ioctl-related). So hope this new info is
> useful.

You said "another hang with -104" - when does your system hang? How do you know the other regression is fixed if it's still hanging at startup?

(In reply to comment #60)
> You said "another hang with -104" - when does your system hang? How do you know
> the other regression is fixed if it's still hanging at startup?

I meant that the specific regression with -101 (GDM won't start) is gone, but -104 still has the general problem of the older -94/-91 releases, namely that once in a while mouse movement still works but clicking has no effect, ssh inwards works, and Xorg consumes 100% CPU and is found to be in drm/ioctl routines (the specific code location seems to be slightly different). Sorry about the imprecise wording.

Okay, so we've fixed the "X hang at startup" problem, which was the original bug report here. I'd suggest we close this bug now, but there's a lot of good debugging info, so I'm just changing the subject of the bug to reflect the current status. So now we're back on "EQ overflowing. The server is probably stuck in an infinite loop.", as seen in numerous other bugs - bug 444449 (i945 / Radeon M6 LY), bug 464866 (i945), etc. Specifically with r500 chips. Right? Does this happen randomly, or only when running compiz / switching terminals / resuming from suspend / etc?

Even the initial report wasn't about "hang at start-up" - it was about the mouse pointer moving but no click actions "right after login-in" (which in real terms is quite a lot later than when the X server starts).
My "EQ overflowing" tends to happen when I am doing something with firefox (maybe just because it is a frequently used application), e.g. scrolling or switching tabs, or when dragging gnome-terminal windows around. I have an RS690M (X1200) - I read somewhere that the RS690M is technically an r300 rather than an r500/r600 (very confusing). I can think of one way in which the initial poster's report fits into the later pattern: if the initial poster has his session preferences configured to automatically restore favourite applications, and one of them is firefox, for example.

I have an RS690 and the problem was occurring on facebook when selecting someone to send a message to, where it pops up a list. This is consistent with all other hangs I've had, where it's been something overlaid that fades in (or possibly out). In the past it's been compiz, but in this case, there was no compiz involved.

can you try the latest X server? 1.5.3-5 is in koji.

I've fixed some dodgy EXA paths that affect radeon

(In reply to comment #65)
> can you try the latest X server? 1.5.3-5 is in koji.
>
> I've fixed some dodgy EXA paths that affect radeon

Yes, EXA seems to work a bit better now.

kernel-2.6.27.5-113.fc10 Mon 17 Nov 2008 21:40:42 GMT
xorg-x11-drv-ati-6.9.0-48.fc10 Mon 17 Nov 2008 21:38:30 GMT
xorg-x11-server-Xorg-1.5.3-5.fc10 Mon 17 Nov 2008 21:38:21 GMT
xorg-x11-server-common-1.5.3-5.fc10 Mon 17 Nov 2008 21:38:19 GMT

(In reply to comment #66)
> Yes, EXA seems to work a bit better now.

Ehm, sorry, "a bit better"? Is this bug fixed or not?

(In reply to comment #67)
> (In reply to comment #66)
> > Yes, EXA seems to work a bit better now.
>
> Ehm, sorry, "a bit better"? Is this bug fixed or not?

Under EXA, scrolling gnome-terminals shows tearing (which isn't visible with XAA). Probably unrelated. The thing is, nobody has a "reliable" way of getting "EQ overflowing". Unless somebody comes along with an identification of where/how that can happen in the code paths, we'll just have to wait and see... BTW, I have just got onto drv-ati-6.9.0-51.fc10 and kernel-2.6.27.5-113.fc10 (and will continue to look at koji). If I get an "EQ overflowing", you will hear from me...

We believe that the problems reported by the original poster of this bug have been fixed with the latest X server. If you have other specific problems, please file those as new bug reports.

X does not hang now. But the ati driver still has two problems.
1. As Hin-Tak Leung said: scrolling gnome-terminals shows tearing.
2. When compiz is running, the X server is very slow, and the shadow around the window is not displayed correctly.

Version of kernel & ati driver:
kernel-2.6.27.5-116.fc10.i686
xorg-x11-drv-ati-6.9.0-53.fc10.i386

Created attachment 323992 [details]
Screenshot with compiz
Created attachment 324005 [details]
another gdb backtrace - sorry, it hung...
Sorry guys, can somebody re-open the bug?
It hung with the latest(?), and here is the gdb backtrace. It is still stuck at drm/ioctl, so I say it is the same bug. Please re-open.
xorg-x11-drv-ati-6.9.0-51.fc10 Tue 18 Nov 2008 12:22:18 GMT
kernel-2.6.27.5-116.fc10 Tue 18 Nov 2008 12:21:38 GMT
xorg-x11-server-Xorg-1.5.3-5.fc10 Mon 17 Nov 2008 21:38:21 GMT
This time I was re-positioning a gnome-terminal window when it happened. (firefox at the back but it basically is there most of the time anyway).
I see kernel-2.6.27.5-120.fc10, xorg-x11-drv-ati-6.9.0-54.fc10 are out in koji in the last few hours... so I'll upgrade, I guess.
Created attachment 324080 [details]
another gdb backtrace under kernel-2.6.27.5-120.fc10, xorg-x11-drv-ati-6.9.0-54.fc10
So I have upgraded to an even newer kernel drm and drv-ati since I posted attachment 324005, and have had another one of those "mouse moves but nothing else" moments with kernel-2.6.27.5-120.fc10, xorg-x11-drv-ati-6.9.0-54.fc10.
This time it happened while I was dragging a firefox view-page-source window around.
This an XAA backtrace.... so you are booting with nomodeset and XAA accel?

so not the same problem at all. open a new bug clearly stating nomodeset + XAA

(In reply to comment #74)
> This an XAA backtrace.... so you are booting with nomodeset and XAA accel?
>
> so not the same problem at all. open a new bug clearly stating nomodeset + XAA

Both are correct - I have tried removing nomodeset every time I reboot into a new koji kernel. So far, every one of them hangs at the end of the blue-background start-up progress bar, just before the X server starts. Is removing nomodeset supposed to work now? It doesn't.

As for XAA... I am sorry, but the tearing while scrolling under EXA is very unsightly. I'll be happy to switch to EXA for general use (or general debugging, for that matter) if the tearing goes away... Besides, the last time I checked, XAA was the default (i.e. if neither is specified in xorg.conf), even though it subsequently emits a warning in Xorg.0.log. So, given that either may get stuck, I would opt for XAA instead of EXA, just because it works better (despite it saying "unsupported"). What I mean is, EXA is supported but no good. :-(

This is a bit curious - according to the xorg.conf in comment 2, the original poster did not specify an Accel method, so I would have thought XAA was used, but his log shows EXA.

okay, so I have removed Accel from xorg.conf and rebooted. The default is EXA now, so I'll let it run that way until it gets stuck(!). It still needs nomodeset to boot.

The tearing on scroll is quite noticeable - e.g. with the default blue background, start one gnome-terminal, run dmesg to have some text, then scroll back with the scroll-bar (the same tearing happens in firefox as well). There is also occasional flicker, which looks like windows moving sideways by a few pixels and back - so EXA seems buggy (neither the tearing nor the flicker happens under XAA). I'll file a separate bug for XAA then...

I just found a "reliable" way of seeing the flicker - moving the mouse pointer quickly up and down across the bottom edge of the comment text-box! (I saw it when I moved the mouse just before pressing "commit".)

Created attachment 324210 [details]
gdb backtrace when xserver stop responding with EXA and latest drv_ati, etc
Please re-open bug. This is another "mouse movement still works, 100% CPU" instance with the latest everything, under EXA. See the gdb backtrace. The versions are below:
xorg-x11-drv-ati-6.9.0-55.fc10
kernel-2.6.27.5-120.fc10
xorg-x11-server-Xorg-1.5.3-5.fc10
Created attachment 324216 [details]
gdb backtrace, 2nd stuck with EXA within a few hours.
This is the backtrace of another hang within a few hours of the earlier one, both under EXA.
Please re-open bug.
I am sorry, EXA just seems rather worse than XAA, so I am switching back.
Recently the nomodeset situation seems to have changed, so I tried various combinations. There are no winners; each has its own problems:

radeon + EXA: tearing while scrolling and, most recently, font/screen corruption when firefox is used (bug 473815).
radeon + XAA: needs to boot with nomodeset, or it goes into a black screen instead of GDM (bug 464896 ?).
radeonhd: no xvideo (bug 473819).

For general use, radeon + XAA is the best at the moment (RS690M [Radeon X1200 Series]).