Bug 528593

Summary: Near total lockup shortly after logging in with KMS enabled (r600+ and ICH8+ combination)
Product: [Fedora] Fedora
Reporter: Rodd Clarkson <rodd>
Component: xorg-x11-drv-ati
Assignee: Dave Airlie <airlied>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: low
Version: rawhide
CC: amrlima, awilliam, bostjan.lah, ceski, chkr, dgoodwin, fedora, gene-redhat, jglisse, joshuacov, kdekorte, lars, leifer, maxim, mcepl, m.menheere, mursusoft, redhat, vedran, vpvainio, xgl-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard: card_r600
Doc Type: Bug Fix
Last Closed: 2009-11-04 02:26:03 UTC
Bug Blocks: 530341    
Attachments:
Description                                         Flags
/var/log/dmesg                                      none
/var/log/Xorg.0.log                                 none
/var/log/Xorg.2.log                                 none
/var/log/Xorg.3.log                                 none
/var/log/Xorg.4.log                                 none
/var/log/Xorg.5.log                                 none
/var/log/Xorg.setup.log                             none
xserver-crash-backtrace                             none
backtrace bug at xf86drm.c:187                      none
boot failure photo with radeon-20091028-x86_64.iso  none
My lspci -v.                                        none
dmidecode output                                    none
My lspci -v output                                  none
lspci                                               none

Description Rodd Clarkson 2009-10-13 02:06:24 UTC
Description of problem:

When I log in and start to use X, it locks up shortly after.  The keyboard doesn't respond, but the mouse is able to move around.  Pressing the power button (with the exception of holding down the button until the system stops) does nothing either.

If the desktop has managed to connect to the wireless router, then I can ssh in and have managed to get a backtrace using gdb.  Also, I can run top after the freeze, which shows that Xorg is consuming 100% of CPU.

Also, running init 3 doesn't result in X shutting down and a VT appearing, but I can then press the power button to shut the system down.  Presumably, the switch to run level 3 has worked, but the screen doesn't reflect this.

If I run gdb I get the following:

#0  0x00000032c16d9717 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x00000032dec03203 in drmIoctl (fd=9, request=3221775460, 
    arg=0x7fff192a1ab0) at xf86drm.c:188
#2  0x00000032dec0344c in drmCommandWriteRead (fd=<value optimized out>, 
    drmCommandIndex=<value optimized out>, data=<value optimized out>, 
    size=<value optimized out>) at xf86drm.c:2394
#3  0x00007f7cc9f81f59 in bo_wait (bo=0x1cdc780) at radeon_bo_gem.c:206
#4  0x00007f7cc9f82035 in bo_map (bo=0x1cdc780, write=<value optimized out>)
    at radeon_bo_gem.c:181
#5  0x00007f7cca24f36d in _radeon_bo_map (line=2320, 
    func=<value optimized out>, file=0x1 <Address 0x1 out of bounds>, write=0, 
    bo=<value optimized out>) at /usr/include/drm/radeon_bo.h:151
#6  R600DownloadFromScreenCS (line=2320, func=<value optimized out>, 
    file=0x1 <Address 0x1 out of bounds>, write=0, bo=<value optimized out>)
    at r600_exa.c:2320
#7  0x00007f7cc9545100 in exaGetImage (pDrawable=0x1b37dc0, x=1536, y=704, 
    w=256, h=64, format=<value optimized out>, 
    planeMask=<value optimized out>, d=<value optimized out>)
    at exa_accel.c:1283
#8  0x0000000000552a94 in miSpriteGetImage (pDrawable=0x1b37dc0, sx=1536, 
    sy=704, w=256, h=64, format=<value optimized out>, 
    planemask=<value optimized out>, pdstLine=<value optimized out>)
    at misprite.c:425
#9  0x000000000042dec0 in DoGetImage (planemask=<value optimized out>, 
    height=<value optimized out>, width=<value optimized out>, 
    y=<value optimized out>, x=<value optimized out>, 
    drawable=<value optimized out>, format=<value optimized out>, 
    client=0x1d2f5f0, im_return=<value optimized out>) at dispatch.c:2244
#10 ProcGetImage (planemask=<value optimized out>, 
    height=<value optimized out>, width=<value optimized out>, 
    y=<value optimized out>, x=<value optimized out>, 
    drawable=<value optimized out>, format=<value optimized out>, 
    client=0x1d2f5f0, im_return=<value optimized out>) at dispatch.c:2331
#11 0x000000000042c60c in Dispatch () at dispatch.c:445
#12 0x0000000000421c9a in main (argc=<value optimized out>, 
    argv=<value optimized out>, envp=<value optimized out>) at main.c:285

gdb doesn't report a segfault, but running bt gives the above output.
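
The check described above (logging in over ssh and confirming which process is pegging the CPU before taking a backtrace) can be sketched as a small shell filter. This is only an illustration: the function name busy_procs and the 90% threshold are assumptions, not something taken from this report. Note also that ps reports CPU usage averaged over the life of the process, so top remains the better view of a process that has only just started spinning.

```shell
# busy_procs [THRESHOLD]
# Reads "PID %CPU COMMAND" lines on stdin (the layout printed by
# `ps -eo pid,pcpu,comm`), skips the header line, and prints the PID and
# command of every process at or above THRESHOLD percent CPU.
# The function name and the default of 90 are illustrative only.
busy_procs() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 && $2 + 0 >= t + 0 { print $1, $3 }'
}

# Possible use over ssh on the hung machine:
#   ps -eo pid,pcpu,comm | busy_procs 90
# A backtrace of the reported PID can then be taken non-interactively:
#   gdb -p <pid> -batch -ex bt
```

Keeping the filter separate from the ps call means it can be tried on saved output as well as on a live system.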

lspci (using f11) shows:

[rodd@moose ~]$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility HD 3670
01:00.1 Audio device: ATI Technologies Inc RV635 Audio device [Radeon HD 3600 Series]
04:00.0 Network controller: Intel Corporation PRO/Wireless 5300 AGN [Shiloh] Network Connection
08:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5784M Gigabit Ethernet PCIe (rev 10)
09:01.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
09:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
09:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
09:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
09:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev ff)




Version-Release number of selected component (if applicable):

Everything is up to date with f12 as of Oct 13, 2009.  I can get you more on this if you need, but since f12 is very flakey, this will have to do to start.


How reproducible:

Very. I start the system and log in.  If I run firefox and then gnome-terminal (both on my menus) then it happens, but it sometimes happens before I even do anything (while waiting for the network to connect); if I don't run the above apps, it happens at some other time soon after.  It will usually lock up within 5 minutes of logging in, and more than likely within a minute.


What else would you like to know?

Comment 1 Matěj Cepl 2009-10-14 13:50:19 UTC
From looking at your backtrace it looks like the problem happened somewhere in the kernel. In order to continue, we could use the following information.

Please attach your X server config file (/etc/X11/xorg.conf, if available), /var/log/dmesg, and X server log file (/var/log/Xorg.*.log) to the bug report as individual uncompressed file attachments using the bugzilla file attachment link below.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.

Comment 2 Rodd Clarkson 2009-10-14 22:42:00 UTC
Created attachment 364824 [details]
/var/log/dmesg

Comment 3 Rodd Clarkson 2009-10-14 22:42:41 UTC
Created attachment 364826 [details]
/var/log/Xorg.0.log

Comment 4 Rodd Clarkson 2009-10-14 22:43:54 UTC
Created attachment 364827 [details]
/var/log/Xorg.2.log

Comment 5 Rodd Clarkson 2009-10-14 22:44:35 UTC
Created attachment 364828 [details]
/var/log/Xorg.3.log

Comment 6 Rodd Clarkson 2009-10-14 22:44:58 UTC
Created attachment 364829 [details]
/var/log/Xorg.4.log

Comment 7 Rodd Clarkson 2009-10-14 22:45:17 UTC
Created attachment 364830 [details]
/var/log/Xorg.5.log

Comment 8 Rodd Clarkson 2009-10-14 22:45:42 UTC
Created attachment 364831 [details]
/var/log/Xorg.setup.log

Comment 9 Rodd Clarkson 2009-10-14 22:46:44 UTC
I don't have a /etc/X11/xorg.conf file.
Also, /var/log/Xorg.1.log is a zero size file, so I can't copy it up.

Comment 10 Joshua Covington 2009-10-15 08:19:20 UTC
I have a very similar issue (if not the same) here: https://bugzilla.redhat.com/show_bug.cgi?id=527452. Maybe mine should be marked as a duplicate of this one?

Comment 11 Adam Williamson 2009-10-15 15:36:19 UTC
joshua: I'm not expert enough at reading backtraces to say for sure. It does seem similar, though. matej, dave?

rodd: have you tested with KMS disabled, to see if that changes the behaviour? kernel parameter 'nomodeset'. thanks!

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 12 Joshua Covington 2009-10-15 18:29:22 UTC
I assume both are very similar and therefore tried 'nomodeset'. Here is what happened.

The pc booted normally and I managed to check my email. After closing firefox, I decided to reopen it and "clean all history". The result was a crash of the Xserver. This is the gdb output:

Program received signal SIGABRT, Aborted.
0x0000003cc3c332f5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003cc3c332f5 in raise () from /lib64/libc.so.6
#1  0x0000003cc3c34b20 in abort () from /lib64/libc.so.6
#2  0x0000003cc3c2c2fa in __assert_fail () from /lib64/libc.so.6
#3  0x00007fd7f84d4a78 in radeon_validate_bo (radeon=0x32ff940, bo=0x3018660, read_domains=2, write_domain=0) at radeon_common.c:1008
#4  0x00007fd7f84d6cab in radeonRefillCurrentDmaRegion (rmesa=0x32ff940, size=65536) at radeon_dma.c:201
#5  0x00007fd7f84d6d2b in rcommonAllocDmaLowVerts (rmesa=0x32ff940, nverts=6, vsize=6) at radeon_dma.c:296
#6  0x00007fd7f84d0d6b in r300_quad (rmesa=0x103c, v0=0x396bec0, v1=0x396bed8, v2=0xffffffffffffffff, v3=<value optimized out>)
    at ../../../../../src/mesa/tnl_dd/t_dd_triemit.h:69
#7  0x00007fd7f84d124c in r300_render_quads_verts (ctx=<value optimized out>, start=<value optimized out>, count=<value optimized out>,
    flags=<value optimized out>) at ../../../../../src/mesa/tnl/t_vb_rendertmp.h:338
#8  0x00007fd7f81922e0 in run_render (ctx=0x336ff90, stage=<value optimized out>) at tnl/t_vb_render.c:320
#9  0x00007fd7f8189da4 in _tnl_run_pipeline (ctx=0x336ff90) at tnl/t_pipeline.c:158
#10 0x00007fd7f818a8b6 in _tnl_draw_prims (ctx=0x336ff90, arrays=<value optimized out>, prim=<value optimized out>, nr_prims=<value optimized out>,
    ib=<value optimized out>, min_index=1667198818, max_index=3) at tnl/t_draw.c:431
#11 0x00007fd7f81829ea in vbo_exec_vtx_flush (exec=0x338d520, unmap=<value optimized out>) at vbo/vbo_exec_draw.c:365
#12 0x00007fd7f817f579 in vbo_exec_FlushVertices_internal (ctx=<value optimized out>, unmap=60 '<') at vbo/vbo_exec_api.c:778
#13 0x00007fd7f817f5e0 in vbo_exec_FlushVertices (ctx=0x103c, flags=1) at vbo/vbo_exec_api.c:800
#14 0x00007fd7f80ed09b in _mesa_PopAttrib () at main/attrib.c:933
#15 0x00007fd804673283 in __glXDisp_Render (cl=<value optimized out>, pc=0x4d54110 "\4") at glxcmds.c:1823
#16 0x00007fd804677309 in __glXDispatch (client=0x290c6a0) at glxext.c:568
#17 0x00000000004471d4 in Dispatch () at dispatch.c:456
#18 0x000000000042d205 in main (argc=<value optimized out>, argv=0x7ffffe54dd48, envp=<value optimized out>) at main.c:397

After logging back in I decided to try "clean all history" again. And I got exactly the same result: the xserver crashed with the above from gdb.

Comment 13 Adam Williamson 2009-10-15 18:45:29 UTC
interesting. btw - are you guys using compiz or the compositing in metacity?

Comment 14 Joshua Covington 2009-10-15 18:58:14 UTC
I have kde with enabled desktop effects. Everything is set to 'default'. I've not installed compiz separately. I think the theme is noduko or something like this.

Comment 15 Adam Williamson 2009-10-15 19:11:35 UTC
It would be interesting to know if this happens with desktop effects disabled.

Comment 16 Joshua Covington 2009-10-15 19:52:53 UTC
With nomodeset and no desktop effects everything is working fine. I couldn't even reproduce this with more than 30 open tabs in firefox (every one with flash content). Maybe this is just luck?

Comment 17 Joshua Covington 2009-10-15 20:01:41 UTC
Created attachment 364984 [details]
xserver-crash-backtrace

I got this just after clicking on "Apply" after enabling the desktop effects. I think it's the same backtrace as before. The xserver crashed once again (after enabling the effects).

Comment 18 Adam Williamson 2009-10-15 20:04:13 UTC
it sounds rather like it's related to desktop effects (hence the 3D stuff in the driver) to me. can you reproduce *with* modesetting (so take out nomodeset) but *without* desktop effects?

Comment 19 Joshua Covington 2009-10-15 20:46:35 UTC
Created attachment 364989 [details]
backtrace bug at xf86drm.c:187

With modeset and no desktop effects. The server crashes again and it looks similar as before (https://bugzilla.redhat.com/show_bug.cgi?id=527452). This is the backtrace.

Comment 20 Rodd Clarkson 2009-10-16 00:33:07 UTC
Adam, how would I tell if I have compositing on in metacity?

To be honest, I never turned metacity 'on'.  I was using compiz (with my last laptop which had an nvidia chipset) and I just copied my home partition over to the new one and continued on.  Since the new laptop doesn't do 3D, it just reverted back to metacity.

Comment 21 Rodd Clarkson 2009-10-16 00:35:09 UTC
Also, while I welcome Joshua's comments and suspect that he and I have the 'same' issue, until this is confirmed by someone who knows (matej, dave) then it might be best if Joshua's comments remain part of his bug so that there isn't any confusion about outcomes if they aren't the same.

Comment 22 Rodd Clarkson 2009-10-16 06:11:46 UTC
finally got a chance to test nomodeset and what a difference it makes.

I'm able to do things.  The system hasn't crashed yet (and I've been logged in for about 10 minutes now, which is telling) and I'm unable to trigger it the way I usually do (which is nice for testing).

Obviously I shouldn't have to run nomodeset (whatever that does) each time, so how do we isolate the problem and fix it?  What more information can I provide?


Rodd

Comment 23 Rodd Clarkson 2009-10-16 06:13:51 UTC
Oh, and I'm not sure how Joshua 'cleared' his history (and I'm not super keen to clean mine) but I did clear the last hour and I didn't get a crash.

Comment 24 Adam Williamson 2009-10-16 18:39:40 UTC
You can put 'nomodeset' into /boot/grub/menu.lst so you don't have to manually enter it at each boot, but it is indeed a workaround and not a fix. It does help us somewhat to pin down the issue, though.

I think there's probably enough info for Dave to take a shot at this, now...
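
As a rough sketch of the menu.lst change described above, the following shell function appends a parameter such as nomodeset to every kernel line of a GRUB legacy config that does not already carry it. The function name add_kernel_param is hypothetical, and the sketch assumes the usual GRUB legacy layout where each boot entry has a line beginning with the word kernel. It prints the result to standard output rather than editing the file, so the change can be reviewed first.

```shell
# add_kernel_param PARAM FILE
# Prints FILE with PARAM appended to each GRUB legacy "kernel" line that
# does not already contain it. Nothing is modified in place.
# The function name is illustrative, not part of any real tool.
add_kernel_param() {
    param="$1"
    file="$2"
    awk -v p="$param" '
        # Field splitting ignores the leading tab or spaces of the entry.
        $1 == "kernel" {
            present = 0
            for (i = 2; i <= NF; i++) if ($i == p) present = 1
            if (!present) $0 = $0 " " p
        }
        { print }
    ' "$file"
}

# Possible use (review the output before replacing the real file):
#   add_kernel_param nomodeset /boot/grub/menu.lst > /tmp/menu.lst.new
```

Running it a second time on its own output leaves the file unchanged, which makes it safe to re-run after a kernel update adds new entries.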

Comment 25 Adam Williamson 2009-10-16 19:51:17 UTC
rodd: check for drop-shadows on windows. More scientifically, check the compositing_manager value in gconf apps/metacity/general .

Comment 26 Rodd Clarkson 2009-10-17 01:20:46 UTC
No drop shadows, and .gconf/apps/metacity/general/\%gconf.xml includes:

<entry name="compositing_manager" mtime="1190346649" type="bool" value="false"/>
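
For anyone wanting to repeat this check, a minimal sketch is to pull the value attribute out of the saved gconf XML with sed. The function name gconf_entry_value is made up for this example, and it assumes the entry sits on a single line, as in the excerpt above.

```shell
# gconf_entry_value NAME FILE
# Prints the value="..." attribute of the <entry name="NAME" ...> element
# found in FILE. Assumes the whole entry is on one line.
# The function name is illustrative only.
gconf_entry_value() {
    name="$1"
    file="$2"
    sed -n "s/.*<entry name=\"$name\"[^>]* value=\"\([^\"]*\)\".*/\1/p" "$file"
}

# Possible use against the file quoted in this comment:
#   gconf_entry_value compositing_manager ~/.gconf/apps/metacity/general/%gconf.xml
# On a running GNOME 2 session the live setting can also be queried with:
#   gconftool-2 --get /apps/metacity/general/compositing_manager
```

A result of false means metacity's compositor is off, which matches the absence of drop shadows.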

Comment 27 Maxim Burgerhout 2009-10-19 21:23:47 UTC
Seeing the same behaviour on my Radeon HD3470. I have mesa-dri-drivers-experimental installed.

On one of the first occasions of the crash, I remember seeing this in Xorg.0.log, but I haven't saved the log...

[mi] EQ overflowing. The server is probably stuck in an infinite loop. 

Anyway, I can attach some more backtraces, if that is deemed useful.

Comment 28 Adam Williamson 2009-10-19 21:52:41 UTC
maxim: can you check if the same workaround (nomodeset and disable desktop effects) is good for you?

Comment 29 Maxim Burgerhout 2009-10-20 06:10:10 UTC
Adam: Yes it does. Should have put that in my comment, sorry. Disappointing though: had been looking forward to open source DRI on this thing...

Comment 30 Maxim Burgerhout 2009-10-20 10:20:33 UTC
Just found that waking up from suspend does not work: screen stays black, though the cursor is visible and movable. I don't see the logs showing anything in the way of problems after reboot. I'm mentioning this here, because it might be related: cursor moves, screen

Comment 31 Rodd Clarkson 2009-10-20 10:22:36 UTC
As an aside, I'm having issues with suspend and resume.  Is this something I can report now (as a separate bug) including the nomodeset issue in the description, or should it wait until this bug is addressed?

I should note that suspend and resume works some times, but fails others.

Comment 32 Joshua Covington 2009-10-20 15:38:07 UTC
Can you try the latest F12 beta? Maybe this is already fixed in it. 

I think it works for me. My problem is (maybe) slightly different though. It looked fine with f12-snap3 but I had other issues that prevented me from properly testing this.

Maybe this bug is the same: https://bugzilla.redhat.com/show_bug.cgi?id=507720 but those packages aren't pushed to f11.

Comment 33 Maxim Burgerhout 2009-10-20 19:08:40 UTC
I am currently running an up-to-date rawhide system (or at least as up-to-date as I can get, due to some broken package dependencies) and still have these issues. Afaict there is no fix in rawhide for this problem.

As for bug 507220, it's hard to say whether it is the same, because there are no clear indicators in the logs what the exact problem is (i.e. a segfault or something) and the other bug does not have backtraces attached to it. The reporter in 507220 does mention he doesn't get to see a graphical login *at all*, which makes me guess the problems are not the same, because we experience freezes after login. But then again, I am not an Xorg developer / maintainer / whatever.

Comment 34 Joshua Covington 2009-10-20 20:27:26 UTC
(In reply to comment #33)
> I am currently running an up-to-date rawhide system (or at least as up-to-date
> as I can get, due to some broken package dependencies) and still have these
> issues. Afaict there is no fix in rawhide for this problem.
> 
> As for bug 507220, it's hard to say whether it is the same, because there are
> no clear indicators in the logs what the exact problem is (i.e. a segfault or
> something) and the other bug does not have backtraces attached to it. The
> reporter in 507220 does mention he doesn't get to see a graphical login *at
> all*, which makes me guess the problems are not the same, because we experience
> freezes after login. But then again, I am not an Xorg developer / maintainer /
> whatever.  

It's sad to hear this but it proves that my issue is different from yours. I'm now with f12-Kde-live_x86_64 and couldn't lock up my machine. Desktop effects are enabled and with more than 30 open tabs (with flash content) in firefox, it still works "fine". In f11 this always resulted in a lockup.

Of course, my rs482 is quite old compared to your r600. I hope this gets fixed soon.

Comment 35 Joshua Covington 2009-10-21 12:25:44 UTC
Can anyone of you try the following?

According to the xorg-x11-server change log:
* Fri Oct 09 2009 Ben Skeggs <bskeggs> 1.7.0-3 - xserver-1.7.0-exa-looping-forever-is-evil.patch: Fix rendercheck hang

Maybe if this patch is ported back to the xserver in f11, it "can" fix the problem. I don't know if this is the culprit, but the backtrace shows problems in the render process.

Comment 36 mursusoft 2009-10-21 13:58:01 UTC
It freezes here too with Radeon HD 4850 immediately after login when I tried Fedora 12 beta KDE live with or without KMS (nomodeset). Renders the OS unusable. Glad to see I'm not the only one with this issue :)

Comment 37 Maxim Burgerhout 2009-10-21 18:41:50 UTC
(In reply to comment #35)
> Can anyone of you try the following?
> 
> According to the xorg-x11-server change log:
> * Fri Oct 09 2009 Ben Skeggs <bskeggs> 1.7.0-3 -
> xserver-1.7.0-exa-looping-forever-is-evil.patch: Fix rendercheck hang
> 
> Maybe if this patch is ported back to the xserver in f11, it "can" fix the
> problem. I don't know if this is the culprit but the backtrace shows problems
> in the render process.  

I'd like to try, but I don't think there is an xorg-x11-server-Xorg-1.7.0-3 RPM on any of the mirrors yet. I'll post back if I see it scrolling by during a yum update.

Comment 38 Jérôme Glisse 2009-10-21 21:26:08 UTC
Affected users, can you try testing with KMS enabled and pcie_aspm=off added to your kernel boot parameters, and report whether it helps? This might be a duplicate of:
https://bugzilla.redhat.com/show_bug.cgi?id=517625
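
Before reporting a result it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a Linux system that exposes the boot command line in /proc/cmdline; the function name has_kernel_param is hypothetical, and the optional second argument exists only so the check can be run against a saved copy.

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds (exit status 0) if PARAM appears as a whole word on the kernel
# command line. CMDLINE_FILE defaults to /proc/cmdline.
# The function name is illustrative only.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each whitespace-separated word on its own line, then look for
    # an exact, literal match of the whole line.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Possible use after rebooting:
#   has_kernel_param pcie_aspm=off && echo "pcie_aspm=off is active"
#   has_kernel_param nomodeset     && echo "KMS is disabled (nomodeset)"
```

Matching whole words avoids false positives from parameters that merely contain the same text.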

Comment 39 Maxim Burgerhout 2009-10-22 07:10:52 UTC
Doesn't work for me: tried it just now, but system still hangs after a couple of minutes in X. After reboot, Xorg.0.log.old shows nothing after last (II) line. No (EE) lines and no (WW) lines, except for some messages about falling back to an old method for fbdev and vesa at the top of the file. 
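
The log check described here (looking for (EE) and (WW) lines, plus the "EQ overflowing" message seen earlier in this bug) can be sketched as a single grep. The function name xorg_log_problems is an assumption made for this example. The marker legend near the top of a real Xorg log mentions (WW) and (EE) itself, so expect that line to show up in the output as well.

```shell
# xorg_log_problems [LOGFILE]
# Prints the lines of an Xorg log that carry an error (EE) or warning (WW)
# marker, or that report the input event queue overflowing.
# LOGFILE defaults to /var/log/Xorg.0.log. The function name is illustrative.
xorg_log_problems() {
    log="${1:-/var/log/Xorg.0.log}"
    grep -E '\((EE|WW)\)|EQ overflowing' "$log"
}

# Possible use after a forced reboot, against the previous session's log:
#   xorg_log_problems /var/log/Xorg.0.log.old
```

An empty result is consistent with what is reported in this comment: the server hangs without logging anything.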

My laptop is not connected to anything atm, so I cannot log in over ssh to do a backtrace, but I'm guessing the problem is the same as above.

Sidenote: I'm still not seeing the xorg-x11-server-Xorg-1.7.0-3 RPM (mentioned by Joshua above) in rawhide. Am I missing something here?

(In reply to comment #38)
> Affected user can you try testing with KMS enabled and adding pcie_aspm=off to
> your kernel boot parameter and report if it helps. Might be a duplicate of :
> https://bugzilla.redhat.com/show_bug.cgi?id=517625

Comment 40 Joshua Covington 2009-10-22 07:46:45 UTC
(In reply to comment #39)
> 
> Sidenote: I'm still not seeing the xorg-x11-server-Xorg-1.7.0-3 RPM (mentioned
> by Joshua above) in rawhide. Am I missing something here?
>

I meant to port back the patch to f11-server. However, it's not working in my case and the lockup still occurs. pcie_aspm=off doesn't help here, either.

Comment 41 Adam Williamson 2009-10-22 21:35:59 UTC
if everyone who's commented on this bug can try the workaround (pcie_aspm=off), that'd be good: if it works, your bug is likely 517625, please direct further comments there. if not, keep posting in this bug.

Comment 42 Gene Stuckey 2009-10-23 17:30:05 UTC
Same problem as original reporter.

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Mobility Radeon
HD 3650 [1002:9591]

Hangs shortly after logging in. Xorg is consuming 100% of CPU.
Using pcie_aspm=off does not help.
Using nomodeset works.

Comment 43 Rodd Clarkson 2009-10-24 01:35:02 UTC
adam, finally got around to testing pcie_aspm=off.  (I've been writing a job application and it's a big task, so haven't had a chance to reboot for a while)

when I added it, the system didn't boot.  I got a cursor in the top left corner of the screen that blinked and nothing else seemed to happen.

so it's a big step backward for me ;-]

Comment 44 Rodd Clarkson 2009-10-24 01:37:25 UTC
how bad does this have to be to be a BLOCKER?

Comment 45 Rodd Clarkson 2009-10-24 02:41:32 UTC
okay, I also gave mesa-dri-drivers-experimental.x86_64 a try without nomodeset.

The screen goes blank after logging in and I had to hard reboot.

I've got the Xorg.0.log for it with backtrace and reports of an infinite loop, but I'm not sure if it's helpful to this bug.

I've removed the above package, because it's not helping.

Comment 46 Maxim Burgerhout 2009-10-24 08:55:14 UTC
Is it possible that pressing F2 on the screen with the cursor in the top left reveals a LUKS encryption prompt?

(In reply to comment #43)
> adam, finally got around to testing pcie_aspm=off.  (I've been writing a job
> application and it's a big task, so haven't had a chance to reboot for a while)
> 
> when I added it, the system didn't boot.  I got a cursor in the top left corner
> of the screen that blinked and nothing else seemed to happen.
> 
> so it's a big step backward for me ;-]

Comment 47 Joshua Covington 2009-10-24 10:54:05 UTC
(In reply to comment #44)
> how bad does this have to be to be a BLOCKER?  

I also want it marked as a blocker. F12 is to be released in a month and then most efforts will be pointed toward fixing bugs in it.

On the other hand, I had to use Windows because of all the lockups. F12 has its own problems. I think the severity is enough for a blocker status.

Comment 48 Jérôme Glisse 2009-10-24 11:02:50 UTC
I don't think we accept single GPU issue as a blocker bug.

Comment 49 Joshua Covington 2009-10-24 11:26:32 UTC
(In reply to comment #48)
> I don't think we accept single GPU issue as a blocker bug.  

OK, but think about those experiencing total lockup just after logging in. Their systems are practically useless in a GUI environment and most users do need a GUI.

I have an R400, people here have r600 and in the other bug they have r700 cards. pcie_aspm=off isn't a valid fix for everyone.

I know that currently there's a lot of work going on in the xserver/mesa/xf86ati-driver projects but the severity is enough to trigger a higher (blocker) status for this.

Comment 50 Adam Williamson 2009-10-26 21:07:15 UTC
jerome: it's rather a judgment call; depends how common the chip is, and how much other work we have to do before release. I don't think you have a lot of other F12 blockers on your plate, so it'd be good if you could prioritise this one.

We may have a system with an affected card somewhere within Red Hat, which we could get you access to, if that would help. If that may be useful, poke James Laska (jlaska on irc or jlaska@ for email), he can check on whether that's possible if you give him a PCI ID.

Comment 51 Vedran Miletić 2009-10-27 10:08:58 UTC
Adding some keywords. To people reading this: bug with kms on r700 _might_ be the same thing as this, but bug on RS480 and RS482 certainly isn't the same thing as this bug.

Comment 52 Jérôme Glisse 2009-10-27 17:07:48 UTC
It's among the highest priorities for me (short list is suspend/resume, AGP corruption, R600/R700 lockup). I will check if I have the exact same card, but last time I checked I had no issue with any of my R600 or R700.

Comment 53 mursusoft 2009-10-27 19:36:56 UTC
Haha, for me the freeze wasn't caused by the radeon driver but by the pulseaudio+audigy combination. I finally removed my unused audigy card and voilà, everything works now with the defaults... One hell of a surprise fix :D

Comment 54 António Lima 2009-10-27 22:43:05 UTC
I have the same issue reported initially with fedora 12 beta livecd: total Lock-up except for mouse.

Adding "nomodeset" to kernel args fixes the issue. I can't get a trace since I'm using a livecd and I don't want to install fedora 12 now since this is my laptop for work.

lspci

01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3400 Series
01:00.1 Audio device: ATI Technologies Inc RV620 Audio device [Radeon HD 34xx Series]

Comment 55 Gene Stuckey 2009-10-27 23:49:27 UTC
(In reply to comment #42)
> 
> 01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Mobility Radeon
> HD 3650 [1002:9591]
> 

This is working for me now with the latest rawhide updates, specifically, the updated kernel-2.6.31.5-96.fc12.i686.

I no longer need to use nomodeset.

Comment 56 Rodd Clarkson 2009-10-28 10:59:04 UTC
Using kernel-2.6.31.5-96.fc12.x86_64 did NOT solve this for me.

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility HD 3670
01:00.1 Audio device: ATI Technologies Inc RV635 Audio device [Radeon HD 3600 Series]

I noticed the audigy card comment, and noticed that the guy in comment #54 had a similar audio card to me and wondered whether there was something in this.

Comment 57 Rodd Clarkson 2009-10-28 11:01:42 UTC
So, I guess the question is, can you stop pulseaudio running to test this?

I'm just curious, because I spent some time the other day beating on GDM to try and crash my system and had no luck.  Isn't GDM running the same X as I do when I log in?  And if so, maybe it's something user related that causes the lock up.

Comment 58 Jérôme Glisse 2009-10-28 19:00:21 UTC
It seems this bug only happens on Intel motherboards; so far I think the ICH8, ICH9 and ICH10 families are linked to this issue. I think I had the issue on an ICH8 motherboard, but I did an update to the latest packages (kernel, ddx, xserver) and I can't seem to trigger it again. Please update your installation and test again; if it still doesn't work, provide the kernel and xorg-x11-drv-ati package versions. In the meantime I will reinstall from the beta CD, which I believe should have a non-working version.
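
A minimal sketch of how the requested version information could be gathered in one go. The function name report_versions is hypothetical; it assumes an RPM-based system such as Fedora and falls back to a plain message where rpm is not available. The two package names are the ones mentioned in this bug.

```shell
# report_versions
# Prints the running kernel release, then the installed versions of the
# ATI X driver and the X server packages if rpm is available.
# The function name is illustrative only.
report_versions() {
    echo "kernel: $(uname -r)"
    if command -v rpm >/dev/null 2>&1; then
        # rpm -q exits non-zero for packages that are not installed;
        # keep going so the rest of the report is still printed.
        rpm -q xorg-x11-drv-ati xorg-x11-server-Xorg || true
    else
        echo "rpm not available; package versions not collected"
    fi
}

# Possible use: paste the output of
#   report_versions
# into a comment on this bug.
```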

Also, I think the following bugs are all duplicates, as they share the motherboard + R600/R700 lockup:

https://bugzilla.redhat.com/show_bug.cgi?id=531147
https://bugzilla.redhat.com/show_bug.cgi?id=517625
https://bugzilla.redhat.com/show_bug.cgi?id=522177
https://bugzilla.redhat.com/show_bug.cgi?id=522260
https://bugzilla.redhat.com/show_bug.cgi?id=522929
https://bugzilla.redhat.com/show_bug.cgi?id=525821

Note this is just a feeling; in lockup cases we can't be certain until we find a fix for at least someone (and even then the fix might just hide or delay the root issue).
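
To help sort reports by the hardware combination suspected above, a rough heuristic is to scan lspci output for an ICH8, ICH9 or ICH10 device name together with an ATI VGA controller. This is only a sketch: the function name ich_radeon_combo is made up, it matches on device name strings rather than PCI IDs, and it cannot tell an r600 or r700 part from an older ATI GPU.

```shell
# ich_radeon_combo
# Reads lspci output on stdin. Succeeds (exit status 0) only if it sees
# both an Intel ICH8/ICH9/ICH10 device name and an ATI VGA controller.
# The function name is illustrative; matching is by name string only.
ich_radeon_combo() {
    awk '
        /ICH(8|9|10)/ { ich = 1 }
        /VGA compatible controller/ && /ATI Technologies/ { ati = 1 }
        END { exit !(ich && ati) }
    '
}

# Possible use:
#   lspci | ich_radeon_combo && echo "ICH8+ chipset with ATI graphics"
```

The lspci listing in the original description, for example, would match through its ICH9 Family entries and the Radeon Mobility HD 3670 line.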

Comment 59 Adam Williamson 2009-10-28 19:46:15 UTC
rodd: that 'audio device' isn't your sound card, it's the audio component of your video card. Modern video cards have basic audio capability so they can send audio over HDMI when you use an HDMI cable. If you think pulseaudio is contributing somehow, you can forcibly uninstall pulseaudio, it'll take some other components with it but the system will still run.

The sound card issue sounds like an oddball one, but just in case, who of those experiencing this bug has a separate sound card (i.e. an actual PCI sound card, not motherboard audio)? Those who do, could you try disconnecting the card and see if it affects your situation?

Comment 60 Adam Williamson 2009-10-28 19:47:51 UTC
jerome, this bug is separated from 517625 because 517625 is specifically for the pcie_aspm issue. I'm trying to keep 517625 for reporters for whom 'pcie_aspm=off' works around the issue (and kernel 97 'solves' it). For the reporters in this bug, pcie_aspm=off does not resolve the problem.

Not sure about the others, like you I'm struggling to find the 'corner pieces' on this issue/issues at present :/

Comment 61 Maxim Burgerhout 2009-10-28 20:02:58 UTC
Updated packages don't work for me. Still experience a locked system. They do change something though, because after installing them, starting Firefox does not leave me with a frozen X, but with a black screen.

I have an integrated sound card, so I cannot disconnect it; I can't provide help there.

Comment 62 Maxim Burgerhout 2009-10-28 20:36:44 UTC
For a second, I thought setting SELinux to permissive mode did it: I was able to actually do something in my session. But after coming back from suspend, the crash was back again :-(

I was able to actually do something though, when I put SELinux in permissive mode: I could enable desktop effects, drag some windows, start Firefox. Previously, those actions directly froze my X session...

It might be a coincidence, but I have noticed more programs having trouble with SELinux's (new?) policy that completely disallows marking a stack executable. Maybe worth investigating a bit more? It might not be the direct cause, but I reckon it could be a step in the direction of the solution...

Comment 63 Adam Williamson 2009-10-28 21:27:26 UTC
When Jerome says the 'latest package' he's not just referring to the latest you get from a regular F12 upgrade, but this:

http://koji.fedoraproject.org/koji/buildinfo?buildID=138707

if you have the ability to try that kernel build it would be good if you could. I am currently trying to spin a live build with that kernel included, but that'll only be x86-64 (I can't do a 32-bit live build).

Comment 64 Adam Williamson 2009-10-28 23:25:37 UTC
Live build is currently uploading to:

http://adamwill.fedorapeople.org/radeon-20091028-x86_64.iso

it should be complete in around 2-3 hours.

That's for x86-64 only.

Comment 65 Rodd Clarkson 2009-10-29 08:48:03 UTC
adam, this spin is great.  there are so many things about this spin that are different to my f12 experience on my installed box.

For example I can't get it to crash, which is related to this bug.

Also, bluetooth works (and hasn't the whole way through testing rawhide)

Hmmm, this makes me wonder what config files are causing grief with my system.

Comment 66 Rodd Clarkson 2009-10-29 08:51:15 UTC
And I've just cycled through suspend and resume three times and it works. The record so far was once and then a crash on the second resume, so this is a vast, vast improvement.

What do we need to look for to find out why config files don't like f12?

I use the same /home partition for f11 and f12 and clearly something is amiss in this.

Comment 67 Rodd Clarkson 2009-10-29 08:52:35 UTC
can I get the live spin to mount my home directory and then start X to see if this is the problem?

Comment 68 Bostjan 2009-10-29 12:32:27 UTC
I'm afraid I still get the lockup with either the live image from #64 or all the latest updates (including kernel-PAE-2.6.31.5-104.fc12.i686). It's in general good enough to just log in, open a terminal and quickly move it around the screen to trigger it.

This is my hardware:
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:03.0 Communication controller: Intel Corporation Mobile PM965/GM965 MEI Controller (rev 0c)
00:03.2 IDE interface: Intel Corporation Mobile PM965/GM965 PT IDER Controller (rev 0c)
00:03.3 Serial controller: Intel Corporation Mobile PM965/GM965 KT Controller (rev 0c)
00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HBM (ICH8M-E) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
01:00.0 VGA compatible controller: ATI Technologies Inc M76 [Radeon Mobility HD 2600 Series]
01:00.1 Audio device: ATI Technologies Inc RV630/M76 audio device [Radeon HD 2600 Series]
02:06.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b9)
02:06.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev b9)
10:00.0 Network controller: Intel Corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61)

Comment 69 Jérôme Glisse 2009-10-29 13:09:14 UTC
Bostjan, I think I have the exact same motherboard, but my R600/R700 GPU doesn't hang with the latest package mentioned in the previous comment. I will try to grab an HD2600.

Comment 70 Leif Gruenwoldt 2009-10-29 15:21:59 UTC
Created attachment 366642 [details]
boot failure photo with radeon-20091028-x86_64.iso

Adam, I tried your liveusb on my HD2600 and it fails before getting to gdm.


$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub [8086:2a00] (rev 0c)
00:01.0 PCI bridge [0604]: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port [8086:2a01] (rev 0c)
00:19.0 Ethernet controller [0200]: Intel Corporation 82566MM Gigabit Network Connection [8086:1049] (rev 03)
00:1a.0 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 [8086:2834] (rev 03)
00:1a.1 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 [8086:2835] (rev 03)
00:1a.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 [8086:283a] (rev 03)
00:1b.0 Audio device [0403]: Intel Corporation 82801H (ICH8 Family) HD Audio Controller [8086:284b] (rev 03)
00:1c.0 PCI bridge [0604]: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 [8086:283f] (rev 03)
00:1c.1 PCI bridge [0604]: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 [8086:2841] (rev 03)
00:1c.4 PCI bridge [0604]: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 [8086:2847] (rev 03)
00:1d.0 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 [8086:2830] (rev 03)
00:1d.1 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 [8086:2831] (rev 03)
00:1d.2 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 [8086:2832] (rev 03)
00:1d.7 USB Controller [0c03]: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 [8086:2836] (rev 03)
00:1e.0 PCI bridge [0604]: Intel Corporation 82801 Mobile PCI Bridge [8086:2448] (rev f3)
00:1f.0 ISA bridge [0601]: Intel Corporation 82801HBM (ICH8M-E) LPC Interface Controller [8086:2811] (rev 03)
00:1f.1 IDE interface [0101]: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller [8086:2850] (rev 03)
00:1f.2 SATA controller [0106]: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller [8086:2829] (rev 03)
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc M76 [Radeon Mobility HD 2600 Series] [1002:9581]
01:00.1 Audio device [0403]: ATI Technologies Inc RV630/M76 audio device [Radeon HD 2600 Series] [1002:aa08]
02:06.0 CardBus bridge [0607]: Ricoh Co Ltd RL5c476 II [1180:0476] (rev b9)
02:06.1 CardBus bridge [0607]: Ricoh Co Ltd RL5c476 II [1180:0476] (rev b9)
02:06.2 FireWire (IEEE 1394) [0c00]: Ricoh Co Ltd R5C832 IEEE 1394 Controller [1180:0832] (rev 03)
02:06.3 SD Host controller [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter [1180:0822] (rev 20)
02:06.4 System peripheral [0880]: Ricoh Co Ltd R5C843 MMC Host Controller [1180:0843] (rev ff)
10:00.0 Network controller [0280]: Broadcom Corporation BCM4312 802.11a/b/g [14e4:4312] (rev 02)

Comment 71 Adam Williamson 2009-10-29 17:03:06 UTC
leif: 'no root device found' is different to this X bug and indicates some kind of problem with the image. check the sha256sum:

[adamw@adam live_build]$ sha256sum radeon-20091028-x86_64.iso 
8308014ae7e931bc6d237dc5cd2f2e59d9353f7e1a69071fecc2f96a7a325608  radeon-20091028-x86_64.iso

is what you want. Maybe re-burn it slower, or try it as a USB stick with livecd-iso-to-disk or something.
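
For anyone who would rather check the image from a script, here is a minimal Python sketch of the same comparison that sha256sum performs. The function names are made up for illustration; the expected value to pass in is the checksum quoted above.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks so a
    large ISO does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected_hex):
    """Compare a file's digest with a published checksum string
    (hypothetical helper; ignores case and surrounding whitespace)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If `matches_published_checksum("radeon-20091028-x86_64.iso", "8308014ae7e931bc6d237dc5cd2f2e59d9353f7e1a69071fecc2f96a7a325608")` returns False, the download or the burn is probably bad.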

rodd: I'd recommend you create a new user on your installed system and see if that user also works without problems. If so, the problem would definitely seem to be associated with your initial user account, and hence your problem would seem to be different from Bostjan's.

Jerome, just a misc. note: I'll leave this image up on my space, so feel free to link to it for anyone you ask to test the 104 kernel build. If you do a newer kernel or ati driver build later that you want some testing for, let me know and I can update the live spin.

Comment 72 Leif Gruenwoldt 2009-10-29 17:36:51 UTC
Adam: ok i remade a live usb stick. This time it booted. However within a minute of logging into the desktop I still experience the original freezing problem.

Comment 73 Jérôme Glisse 2009-10-30 21:02:08 UTC
Rodd can you give the exact reference of your motherboard ? Brand ... i will try to get one.

Comment 74 Rodd Clarkson 2009-10-30 22:56:18 UTC
I'm using a Dell Studio XPS 16.  So I'm not sure I can tell you what mobo is in it.  There's an lspci listing up top that's mine.

I'm trying to reinstall f12 at the moment to test with a new user and see if the problems still exist (as I've run Adam's live CD and had a totally positive experience.) however Anaconda keeps crashing in the post install stage so I need to reinstall it again and save the crash data and then ...

Comment 75 Vedran Miletić 2009-10-30 22:57:47 UTC
Rodd, you can get pretty accurate info about the motherboard using 'dmidecode'.

Comment 76 Adam Williamson 2009-10-31 05:23:48 UTC
rodd's lspci shows that he's using an ICH9. i'm going to do some more extensive re-triage on this later when i'm less tired and have had less to drink :)

Comment 77 Maxim Burgerhout 2009-10-31 09:03:45 UTC
Bad news. I downloaded and tried the ISO. Booted it, logged in, set up NetworkManager, started a terminal, started 'Desktop effects', and then my system froze, showing a dialog telling me my hardware doesn't support 3D. Here's my lspci output:

00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:01.0 PCI bridge: Intel Corporation Mobile 4 Series Chipset PCI Express Graphics Port (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
01:00.0 VGA compatible controller: ATI Technologies Inc Mobility Radeon HD 3400 Series
01:00.1 Audio device: ATI Technologies Inc RV620 Audio device [Radeon HD 34xx Series]
06:00.0 Network controller: Intel Corporation Wireless WiFi Link 5100
08:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8055 PCI-E Gigabit Ethernet Controller (rev 13)
0a:03.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
0a:03.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
0a:03.2 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)

The system is a Sony VAIO VGN-FW21E laptop.

Comment 78 Devan Goodwin 2009-10-31 11:48:16 UTC
Running rawhide, exact same symptoms with Radeon 3600. Shortly after logging in X hangs. Disabling KMS with nomodeset allows things to function normally.

Comment 79 Adam Williamson 2009-10-31 19:16:29 UTC
Devan, please provide your 'lspci -v' output. Thanks.

Comment 80 Devan Goodwin 2009-10-31 19:24:31 UTC
Created attachment 366968 [details]
My lspci -v.

Adding my output of lspci -v.

Comment 81 Rodd Clarkson 2009-10-31 23:44:59 UTC
Created attachment 366987 [details]
dmidecode output

Comment 82 Adam Williamson 2009-11-01 22:30:18 UTC
I have done a survey of all the reports of radeon hanging shortly after login in F12. Here's the results as they relate to this bug.

This bug will become the master bug for cases where people with the combination of an r600 or r700 graphics chip and an Intel ICH8, ICH9 or ICH10-based motherboard experience hangs shortly after starting X unless 'nomodeset' (or, in some cases, 'pcie_aspm=off' - we have decided this workaround indicates the same bug) is set as a kernel parameter.

Reporters Rodd Clarkson, Maxim Burgerhout and Devan Goodwin have r600 graphics chipsets and ICH9 motherboards. Reporters Bostjan and Leif Gruenwoldt have r600 graphics chipsets and ICH8 motherboards.

Reporters 'mursusoft' and Gene Stuckey have not provided enough information to identify their motherboard chipsets. However, since mursusoft's bug was avoided by removing his sound card, it seems like a different issue - I have given him further advice in bug #517625. Gene Stuckey's problem being 'fixed' in kernel 96 also indicates it may be a different bug, as other reporters have not seen this. Gene, it would be useful to see your 'lspci' output.

Bugs 522177 and 522260 will be closed as duplicates of this bug, as they show the classic r600+ICH9 combinations. Bugs 522929 and 525821 are also likely dupes of this issue, and will be investigated to clarify this for sure.
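
The triage rule above (r600 or r700 graphics plus an ICH8, ICH9 or ICH10 chipset) can be roughly sketched in Python against plain `lspci` output. This is only a heuristic and not how the triage was actually done: the function names are hypothetical, and mapping Radeon HD 2xxx/3xxx to r600 and HD 4xxx to r700 from the marketing series number is an assumption.

```python
import re

# Intel southbridge generations named in the triage rule.
ICH_PATTERN = re.compile(r"ICH(10|9|8)")
# First digit of a four-digit "HD nnnn" series number, e.g. "HD 2600".
HD_SERIES_PATTERN = re.compile(r"HD\s+(\d)\d{3}")
# Assumption: series number -> GPU family, for illustration only.
SERIES_TO_FAMILY = {"2": "r600", "3": "r600", "4": "r700"}


def classify_lspci(lspci_text):
    """Return (ich_generation, gpu_family); either may be None."""
    ich = None
    gpu_family = None
    for line in lspci_text.splitlines():
        ich_match = ICH_PATTERN.search(line)
        if ich_match and ich is None:
            ich = "ICH" + ich_match.group(1)
        # Only the VGA controller line counts; the HDMI audio function
        # of the same card also mentions "Radeon HD".
        if "VGA compatible controller" in line and "ATI" in line:
            series_match = HD_SERIES_PATTERN.search(line)
            if series_match:
                gpu_family = SERIES_TO_FAMILY.get(series_match.group(1))
    return ich, gpu_family


def is_suspect_combination(lspci_text):
    """True when both halves of the suspected combination are present."""
    ich, gpu_family = classify_lspci(lspci_text)
    return ich is not None and gpu_family is not None
```

Cards that lspci lists without an "HD nnnn" series number would be missed by this sketch, so a negative result means little.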

This bug is currently considered an F12 release blocker, but it is hard for our engineers to fix until they have direct access to an affected system. They're working to try and put together affected combinations of hardware, but if anyone with an affected system happens to be physically located close to either of our developers, we may be able to try and put together a meeting to get physical access to the system. Dave Airlie is in Brisbane, Australia. I believe Jerome Glisse is in France, I'm not sure exactly where. If any person with a system affected by this bug happens to be in those areas, could they please post a comment to say so? Thanks.

Comment 83 Adam Williamson 2009-11-01 22:32:03 UTC
*** Bug 522177 has been marked as a duplicate of this bug. ***

Comment 84 Adam Williamson 2009-11-01 22:32:45 UTC
*** Bug 522260 has been marked as a duplicate of this bug. ***

Comment 85 Gene Stuckey 2009-11-01 22:57:31 UTC
Created attachment 367049 [details]
My lspci -v output

Apparently I didn't really remove the nomodeset. (Must have hit ESC instead of Enter when editing the boot command.) Sorry for that.


So I still get the X lockup shortly after logging in.

Looks like I have the ICH9-based motherboard and r600 graphics.

Comment 86 Adam Williamson 2009-11-01 23:08:41 UTC
*** Bug 525821 has been marked as a duplicate of this bug. ***

Comment 87 Adam Williamson 2009-11-01 23:09:33 UTC
Gene: indeed you do - thanks for clarifying, and making our record of observed behaviour more consistent and accurate.

Comment 88 António Lima 2009-11-01 23:22:04 UTC
Created attachment 367053 [details]
lspci

As I wrote in a comment above, I have the same lockup. Here is my lspci -v. I seem to have the suspected hardware.

Comment 89 Adam Williamson 2009-11-01 23:37:51 UTC
Antonio has r600+ICH8 combination. Thanks Antonio!

Comment 90 Rodd Clarkson 2009-11-02 01:05:14 UTC
While not exactly in the same place, I'm in the same time zone and continent as Dave Airlie.

I'm in Melbourne, Australia.  I'd be happy to have one-to-one contact with Dave if this would be helpful (either by chat or phone) and Dave is welcome to contact me via email to set this up.  If Dave happens to be coming to Melbourne, I could make my laptop available to him for the day.

Comment 91 Kevin DeKorte 2009-11-02 02:03:34 UTC
I'm willing to allow Dave Airlie to log in to my machine remotely to gather data on the machine.

Comment 92 Martin Ebourne 2009-11-02 02:15:30 UTC
I have the same problem as well with R600/ICH8. My lspci is the same as comment #68 From Bostjan, maybe the same laptop range. Mine is HP Compaq 8510w.

I note that Comment #70 From Leif Gruenwoldt has a very similar list of HW except for wireless card and from the photo I can see it's also an HP Compaq of the same case design.

Comment 93 Leif Gruenwoldt 2009-11-02 02:32:45 UTC
(In reply to comment #92)
> I note that Comment #70 From Leif Gruenwoldt has a very similar list of HW
> except for wireless card and from the photo I can see it's also an HP Compaq of
> the same case design.  

Yes my hardware is very similar. I have the HP Compaq 8510p.

Comment 94 Nick Lamb 2009-11-02 03:07:49 UTC
Maybe I didn't look carefully enough but it seems like all the affected systems are laptops judging from the Smolt profiles and lspci output? ie Radeon Mobility variants. Any counter-examples?

Comment 95 Adam Williamson 2009-11-02 04:01:55 UTC
Only one - the other guy who commented on your bug, Lars Hamann. His is just listed as "ATI Technologies Inc RV670PRO [Radeon HD 3850]". His looks like a desktop from the specs, and that's what smolt calls it. But aside from that, yes, all reporters appear to be on laptops.

Comment 96 Dave Airlie 2009-11-02 04:10:37 UTC
I've got a definite ACPI interaction issue with the Lenovo W500 laptop I have here: it locks up sitting at the console if the blank kicks in under kms.

If I can get past that I suspect I'll find the other issue.

I've put a -106 kernel into koji; not sure if it'll make a difference. It'll show up here when the task finishes churning:

http://kojipkgs.fedoraproject.org/packages/kernel/2.6.31.5/106.fc12/

Comment 97 Kevin DeKorte 2009-11-02 04:26:54 UTC
My machine is not a laptop, but the lockups were solved by the -97 kernel for me.

http://www.smolts.org/client/show/pub_d4e5de0b-c85b-4967-b60f-4ca0fbd8854c

Comment 98 Adam Williamson 2009-11-02 04:45:00 UTC
Kevin: yours is using a laptop adapter, though: your Xorg.0.log tags it as a "ATI Technologies Inc Mobility Radeon HD 3600 Series". The -97 kernel build essentially implemented the 'pcie_aspm=off' workaround within the kernel, that's why it helped you, but it's not a real fix for the problem according to Dave and Jerome.

Comment 99 lars@bistromatic.de 2009-11-02 05:21:14 UTC
-104 kernel solves my problem.

Comment 100 Adam Williamson 2009-11-02 05:35:02 UTC
Thanks for the report. It'd be great if you could run for a while to verify there's no more hangs, I want to be careful with this issue! Please, others also test the same kernel and report.

Comment 101 Rodd Clarkson 2009-11-02 05:49:34 UTC
As I mentioned above, I've had a wonderful experience with the LiveCD that Adam made.  Using that it's all very nice (compiz doesn't work but the desktop is stable) and I've used it for a couple of hours browsing without issue.  It even suspends and resumes without a crash (and this is new too).

However, in the interest of further testing of this with my home partition I tried to install from the LiveCD and it died in the post install phase.  Since then I've had no luck installing f12 (with either Adam's LiveCD or the Beta LiveCD) and I've even tried running yum update and installing newer versions of anaconda from koji.  So I need some help finding an install method so I can help test this out.

Comment 102 Ville-Pekka Vainio 2009-11-02 06:22:40 UTC
Since there's been discussions about the form factor, I have a desktop machine as well, an RV620 chip with an ICH9 based motherboard. The card does identify itself as a "mobility card", though, an "ATI Technologies Inc Mobility Radeon HD 3450". I've bought the card separately, it's passively cooled, which is why I guess it has a laptop chip.

Comment 103 Bostjan 2009-11-02 07:31:06 UTC
I've just tried kernel -106. It doesn't help, sorry.

Comment 104 Rodd Clarkson 2009-11-02 07:36:10 UTC
Okay, I've got the LiveCD adam made installed now and I've been using it for about 1 hour without any issues.

I'm running kernel-2.6.31.5-104.fc12.x86_64

Comment 105 Rodd Clarkson 2009-11-02 07:38:35 UTC
Also, just a thought, but is everyone using x86_64 or are some using i686?

Comment 106 Bostjan 2009-11-02 07:49:39 UTC
I am using i686.

Comment 107 Rodd Clarkson 2009-11-02 09:27:32 UTC
I've just tried kernel-2.6.31.5-106.fc12.x86_64 and it's back to badness.

The outcome wasn't the usual non responding desktop with a mouse that moves.  This time I got a weird color over the screen and all was locked up.  No mouse could be 'discerned' moving.

I really like where -104 is at this stage, but I'm going to try -105 too.

Somewhere between -96 and -106 something was right for a moment.

Comment 108 Maxim Burgerhout 2009-11-02 11:47:56 UTC
Odd. Yesterday I booted the ISO from CD again, and I was able to update packages, run Firefox, install mesa-dri-drivers-experimental, restart X, run compiz, all without problems. I didn't have time to report back, but it all seemed stable until I tried to suspend. Suspend hung the system, as usual.

Today, I tried booting the ISO again after I made a USB boot stick from it, and I could not make it crash: I tried generating CPU load, I tried generating disk load (sha256 of my disk), but the system kept running. Eventually, the system hung when I tried to suspend it. I had not tried to install mesa-dri-drivers-experimental this time.

Intrigued, I tried it a second time - still without networking - and now resizing a Firefox window hangs it. The third boot from the USB stick, it hung on dragging the Firefox window. Neither time had I even got as far as trying to install mesa-dri-drivers-experimental.

Apparently, there are circumstances under which my system is stable, even with KMS. I just can't understand what causes the system to be stable during one run and to crash during another, identical run. CPU / GPU temperature, maybe?

Have other people testing the radeon-20091028-x86_64.iso experienced similar varying stability?

Comment 109 Devan Goodwin 2009-11-02 12:41:51 UTC
I'm experiencing this on a desktop system.

Comment 110 Kevin DeKorte 2009-11-02 14:54:27 UTC
2.6.31.5-107.fc12.x86_64 seems to be ok on my machine, no lockups yet.

[kdekorte@quad ~]$ dmesg | grep drm
[drm] Initialized drm 1.1.0 20060810
[drm] radeon defaulting to kernel modesetting.
[drm] radeon kernel modesetting enabled.
[drm] radeon: Initializing kernel modesetting.
[drm] register mmio base: 0xFE9E0000
[drm] register mmio size: 65536
[drm] Clocks initialized !
[drm] Detected VRAM RAM=256M, BAR=256M
[drm] RAM width 128bits DDR
[drm] radeon: 256M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] Loading RV635 CP Microcode
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] ring test succeeded in 1 usecs
[drm] radeon: ib pool ready.
[drm] ib test succeeded in 0 usecs
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   DVI-I
[drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[drm]   Encoders:
[drm]     DFP1: INTERNAL_UNIPHY
[drm]     CRT2: INTERNAL_KLDSCP_DAC2
[drm] Connector 1:
[drm]   DIN
[drm]   Encoders:
[drm]     TV1: INTERNAL_KLDSCP_DAC2
[drm] Connector 2:
[drm]   DVI-I
[drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[drm]   Encoders:
[drm]     CRT1: INTERNAL_KLDSCP_DAC1
[drm]     DFP2: INTERNAL_KLDSCP_LVTMA
[drm] fb mappable at 0xD0141000
[drm] vram apper at 0xD0000000
[drm] size 7257600
[drm] fb depth is 24
[drm]    pitch is 6912
[drm] TMDS-9: set mode 1680x1050 26
[drm] TMDS-15: set mode 1280x1024 28
fb0: radeondrmfb frame buffer device
[drm] Initialized radeon 2.0.0 20080528 for 0000:01:00.0 on minor 0
[drm:drm_mode_rmfb] *ERROR* tried to remove a fb that we didn't own
[drm:drm_mode_rmfb] *ERROR* tried to remove a fb that we didn't own
[drm:drm_mode_getfb] *ERROR* invalid framebuffer id
[drm] TMDS-9: set mode 1680x1050 2f
[drm] TMDS-15: set mode 1280x1024 33
[drm] TMDS-15: set mode 1280x1024 32
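
The log above is mostly routine initialisation with a few `*ERROR*` lines mixed in. A small Python sketch for pulling just those lines out of saved `dmesg` output follows; the function name is hypothetical, and matching on the literal `[drm` and `*ERROR*` markers is an assumption based on the format shown here.

```python
def drm_error_lines(dmesg_text):
    """Return the lines of dmesg output that are DRM error reports."""
    return [
        line
        for line in dmesg_text.splitlines()
        if "[drm" in line and "*ERROR*" in line
    ]
```

Run against the paste above, this would keep the two `drm_mode_rmfb` lines and the `drm_mode_getfb` line and drop everything else.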

Comment 111 Maxim Burgerhout 2009-11-02 16:32:49 UTC
Following Kevin DeKorte's comment above, I installed -107 too, and had similar good results initially. No lockups, suspend seemed to work, nomodeset seemed no longer needed. Until I rebooted a couple of times during tests: then the locking started again...

Am I missing something blatantly obvious here? 

I am at LinuxWorld in Utrecht in the Netherlands on Wednesday, if someone is there, I'll happily lend them my laptop for a couple of hours for debugging.

Comment 112 Adam Williamson 2009-11-02 17:02:05 UTC
I suspect there could be some sort of hardware/environment link here, especially given that it appears to be affecting only Mobility adapters, which usually have more 'smarts' built in to try and reduce temperature / power usage in the laptop environment. Jerome, Dave, does that sound like something significant?

Comment 113 Bostjan 2009-11-02 17:40:52 UTC
I also have tried -107 however for me it fails immediately like other kernels, i.e. I log in open a terminal window and very quickly move it around the screen. Lockup happens within seconds. All of the tested kernels work without problems with nomodeset.

Comment 114 Adam Williamson 2009-11-02 17:53:31 UTC
jerome, dave, I did have one thought: maybe we should take the aspm workaround from kernel -97 *out* for these test builds, so the fact that the aspm workaround seems to help for some people but not others doesn't confuse the issue? just an idea, I might be wrong but I thought I'd mention it.

Comment 115 Gene Stuckey 2009-11-02 18:04:39 UTC
-106 fails for me like before (X hang shortly after logging in). It works with nomodeset.

Comment 116 Rodd Clarkson 2009-11-02 20:58:36 UTC
Hi All,

Looking at the changelogs for the kernel, it seems Dave tried something in 106 and backed it out in 107, which seems to support my experience.

* Mon Nov 02 2009 Dave Airlie <airlied> 2.6.31.5-107
 - r600: back that out, thanks to yaneti for testing.

* Mon Nov 02 2009 Dave Airlie <airlied> 2.6.31.5-106
 - r600: ring size guesswork fix. 

Can people with issues with 106 try 107?

http://koji.fedoraproject.org/koji/buildinfo?buildID=139379

Comment 117 lars@bistromatic.de 2009-11-02 21:17:06 UTC
I've tested -107 for a few hours now without any problems. Even suspend works as expected.

Comment 118 Rodd Clarkson 2009-11-02 21:54:47 UTC
I'm running -107 with compiz and have been for about 16 mins.  Not conclusive, but it's a great start.

I've banged a little on my system by running glxgears and resizing it and cycling through suspend and resumes and so far it's holding up.

Comment 119 Gene Stuckey 2009-11-02 22:06:53 UTC
I've been using -107 now for about 30 minutes without any issues.

Comment 120 Adam Williamson 2009-11-02 22:22:28 UTC
107 is, I believe, precisely the same as 105. 106 introduced a test fix, 107 backed it out again, so it went right back to 105 state.

Comment 121 Dave Airlie 2009-11-02 22:47:32 UTC
Okay -110 is in the builders, we found a problem on irc last night that fixes an issue someone else had seen, and it might also explain this problem (it also might not).

But please start testing -110 when it lands.

Comment 122 Christian Krause 2009-11-02 22:50:35 UTC
I was suffering from this problem as well:
When using KMS the X server started to hang after a couple of minutes. Mouse movement was still possible, but the screen was completely stuck. Remote login was always possible and revealed that the Xorg process consumed 100% CPU time (#525821).

I've tried kernel -107 (without nomodeset, so KMS was enabled) and it looks good so far. No hanging X anymore, and suspend/resume has started to work, too.

Comment 123 Adam Williamson 2009-11-02 23:51:39 UTC
a quick note: checked with airlied, the only thing that could be making this 'work' in 107 is the workaround which essentially implements pcie_aspm=off in the kernel. 107 does not yet include a proper fix for this bug and there's nothing in it that should make it work on any system where 'pcie_aspm=off' doesn't make it work on kernel 96.

testing of 110 would be useful, but dave expects to need a further patch after 110 to properly address this issue at present.

we may later do a build with the aspm workaround removed, to make the results clearer.

Comment 124 Gene Stuckey 2009-11-03 00:08:37 UTC
(In reply to comment #123)
> a quick note: checked with airlied, the only thing that could be making this
> 'work' in 107 is the workaround which essentially implements pcie_aspm=off in
> the kernel. 107 does not yet include a proper fix for this bug and there's
> nothing in it that should make it work on any system where 'pcie_aspm=off'
> doesn't make it work on kernel 96.

Strange... 96 with 'pcie_aspm=off' still locks up my machine. Can't even ssh in.
But 107 has been working fine (without nomodeset and without 'pcie_aspm=off').

Comment 125 Martin Ebourne 2009-11-03 00:44:51 UTC
110 working for me here using kms/defaults. Over half an hour of active use where it had locked up within a couple of minutes before. Bit hard to test it though with bug #522250 and bug #522271 fighting for a slice of the action.

Comment 126 Adam Williamson 2009-11-03 05:51:25 UTC
there's a kernel 112 building here, thanks to Dave:

http://koji.fedoraproject.org/koji/buildinfo?buildID=139511

we believe this ought to have a real fix for the issue. the aspm workaround is disabled in this build, so if it works for you, then we think we've nailed it.

if everyone could test this build (once it's done) and report, it would be appreciated. Please test with a completely default config - no special kernel parameters. no pcie_aspm, no nomodeset, no nothing - and let us know how it goes. we're on a pretty tight schedule for the final release, so if you could test ASAP it would be really appreciated.

I will spin up an x86-64 live CD for doing a clean test and for those who can't install Rawhide, that should be up in 3-4 hours. will update the bug when it's ready.

thanks all!

Comment 127 Adam Williamson 2009-11-03 07:37:49 UTC
Live image is built and transferring now. It's x86-64 only, I can't build i686 live image here, but any hardware affected by this should be x86-64 capable I believe.

The image will be up at:

http://adamwill.fedorapeople.org/radeon-20091102-x86_64.iso

in around 2 hours from the time this comment is posted. checksum:

[adamw@adam live_build]$ sha256sum radeon-20091102-x86_64.iso 
b8eb1be1feb35b0dd42ee69124e7a2a40d12675ece00aafe337ebfb9ebc3ac47  radeon-20091102-x86_64.iso

Please, test and report. thanks!

Comment 128 Rodd Clarkson 2009-11-03 07:50:36 UTC
Initial testing looks good.

I've used it for less than 5 minutes, but the usual crash methods all work this time (i.e., they start and don't crash the system).

$ uname -a
Linux localhost.localdomain 2.6.31.5-112.fc12.x86_64 #1 SMP Tue Nov 3 00:28:52 EST 2009 x86_64 x86_64 x86_64 GNU/Linux


from grub.conf (to show what parameters were passed)

title Fedora (2.6.31.5-112.fc12.x86_64)
        root (hd0,5)
        kernel /vmlinuz-2.6.31.5-112.fc12.x86_64 ro root=/dev/mapper/vg_moose-LogVol05  LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us rhgb quiet
        initrd /initramfs-2.6.31.5-112.fc12.x86_64.img
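The grub.conf stanza shows what is configured; the parameters the running kernel actually received can be read from /proc/cmdline. A minimal sketch for confirming a "completely default config" test, assuming a POSIX shell. The helper name list_workaround_params is hypothetical, and it only looks for the two parameters discussed in this bug (nomodeset and pcie_aspm=...):

```shell
# list_workaround_params CMDLINE   (hypothetical helper, not part of any package)
# Prints every nomodeset / pcie_aspm=... word found in CMDLINE, one per line.
# Exit status is 0 if at least one was found, 1 if the command line is clean.
list_workaround_params() (
    set -f                      # no filename expansion while splitting words
    found=1
    for word in $1; do
        case $word in
            nomodeset|pcie_aspm=*)
                printf '%s\n' "$word"
                found=0
                ;;
        esac
    done
    exit "$found"
)

# Usage on a running system:
#   list_workaround_params "$(cat /proc/cmdline)" || echo "default config"
```

The command line is passed in as an argument rather than read inside the function, so the same check can be run against a line copied from grub.conf.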

Will test more tonight, but so far it looks good.

Thanks to all for their hard work on this.

Comment 129 Adam Williamson 2009-11-03 08:18:25 UTC
thanks for the update, Rodd. I'm going to bed, be back in 7 hours or so. will check up then.

Comment 130 Bostjan 2009-11-03 08:22:15 UTC
Yes, success. 
kernel-PAE-2.6.31.5-112.fc12.i686 works without a glitch so far on my machine. None of the usual tests have crashed it. I have been running it for about 1 hour now with my typical workload, with no problems so far.

Comment 131 Joshua Covington 2009-11-03 08:45:38 UTC
(In reply to comment #127)
> Live image is built and transferring now. It's x86-64 only, I can't build i686
> live image here, but any hardware affected by this should be x86-64 capable I
> believe.
> 
> The image will be up at:
> 
> http://adamwill.fedorapeople.org/radeon-20091102-x86_64.iso
> 
> in around 2 hours from the time this comment is posted. checksum:
> 
> [adamw@adam live_build]$ sha256sum radeon-20091102-x86_64.iso 
> b8eb1be1feb35b0dd42ee69124e7a2a40d12675ece00aafe337ebfb9ebc3ac47 
> radeon-20091102-x86_64.iso
> 
> Please, test and report. thanks!

I get a wrong checksum on this and the live image is only 224MB???
My sha256sum checksum: B53B48C6A8D00F86AF3BEFE844AE1EA0F1591232A27B1C23F378BA1287A9DF4A

Comment 132 Joshua Covington 2009-11-03 08:47:14 UTC
(In reply to comment #131)
> I get a wrong checksum on this and the live image is only 224MB???
> My sha256sum checksum:
> B53B48C6A8D00F86AF3BEFE844AE1EA0F1591232A27B1C23F378BA1287A9DF4A  

Sorry, obviously the image is still being uploaded right now (292MB as of writing).
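
A partially transferred image will not match the published checksum, which is what happened here. A minimal sketch for checking a finished download against the published digest, assuming GNU coreutils sha256sum is installed. The helper name verify_sha256 is hypothetical. The comparison ignores hex case, since some tools print the digest in upper case:

```shell
# verify_sha256 FILE EXPECTED_HEX   (hypothetical helper)
# Returns 0 if FILE's SHA-256 digest equals EXPECTED_HEX, 1 on mismatch,
# 2 if FILE cannot be read. Hex case is ignored.
verify_sha256() {
    [ -r "$1" ] || return 2
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # Reading from stdin makes sha256sum print "<digest>  -"; keep the digest.
    actual=$(sha256sum < "$1" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Usage, with the digest published in comment 127:
#   verify_sha256 radeon-20091102-x86_64.iso \
#       b8eb1be1feb35b0dd42ee69124e7a2a40d12675ece00aafe337ebfb9ebc3ac47 \
#       && echo "checksum OK" || echo "mismatch or incomplete download"
```

A mismatch only says the local file differs from the published one; comparing the file size against the server's copy helps tell an incomplete transfer from a corrupted one.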

Comment 133 Maxim Burgerhout 2009-11-03 09:43:10 UTC
Great! -112 is actually the first build that seems to improve things drastically here.

Just went through about half an hour of use, a couple of reboots and suspends, and it still has not crashed for me. I'll keep using it during the coming days and will report back if I experience something unexpected.

Comment 134 Davide Cescato 2009-11-03 12:07:54 UTC
I tested both rawhide with kernel-2.6.31.5-112.fc12.x86_64 and the live image radeon-20091102-x86_64.iso on a Lenovo W500. In both cases I did not experience any lockups after several minutes of usage. This is great news!

I have a small note on the live image radeon-20091102-x86_64.iso. Unlike all other live images I have used so far, in this one the root account is password-protected. As a result, I was unable to mount any local drives or to install additional software (I wanted to try playing video files from my hard disk and to test some win32 applications after installing wine), since both operations required the root password. Adam, was there a specific reason for protecting the root account, or was this done by mistake?

I have an additional question. 3D acceleration for r600 is disabled by default and can be enabled by installing mesa-dri-drivers-experimental. Is this correct?

Comment 135 Devan Goodwin 2009-11-03 12:19:31 UTC
Looking very good with 112 on rawhide. I dare say it's fixed for me. 

Nice work guys.

Comment 136 Devan Goodwin 2009-11-03 13:33:05 UTC
Ok so no crashes yet, but performance has gotten noticeably worse in some areas. For example, scrolling in Google Reader has been bad for me for a while, I think since upgrading to rawhide, but it's now gone almost catatonic. Scrolling there barely moves and completely pegs a CPU core.

X in general acts a little strange here and there, and on occasion temporarily locks up, but it does come back within 5-10 seconds. 

Not sure if this is expected or not so just thought I'd mention it. No actual lockups and desktop is generally usable.

Comment 137 Kevin DeKorte 2009-11-03 13:58:34 UTC
the -112 kernel works fine for me as well.

Comment 138 Gene Stuckey 2009-11-03 14:42:45 UTC
112 has been working fine for me for a few hours now.

Comment 139 Adam Williamson 2009-11-03 14:56:50 UTC
joshua: sorry, the transfer took longer than I expected. It's done now.

Thank you for all the testing, guys, that's awesome news!

I will work with Dave and the kernel team to request a tag for the kernel build and then we'll be able to close this puppy. Please do report back if any of you start hitting hangs.

Comment 140 Leif Gruenwoldt 2009-11-03 15:13:41 UTC
-112 kernel stable for me too!

Uptime of 27 minutes so far. That's 26 minutes longer than ever before :)

And just for some added great news... desktop effects using the mesa-dri-drivers-experimental package is also working awesome on my HD 2600 hardware now. Yay F12!

Thanks Adam and everyone else involved!

Comment 141 Adam Williamson 2009-11-03 15:39:26 UTC
please send all credit to Dave and Jerome, all I did was slow them down by bugging them about it every 5 minutes =)

Comment 142 Jérôme Glisse 2009-11-03 17:38:31 UTC
Credit should go to Rafał Miłecki, who helped us debug this issue. If no one reports any further issues by tomorrow, I will close this bug. Of course, if the same problem takes longer to show up, you can reopen it, but I believe that if we ever see more lockups on r600/r700, the root cause will be different from this one.

Comment 143 Rodd Clarkson 2009-11-03 20:33:48 UTC
Alright, I've got an uptime of 12 hours now and I've been beating this one senseless (in my own funny way).

I've plugged in displays and played DVDs, it's using the experimental mesa driver running compiz, and I've suspended and resumed a number of times. It's brilliant.

There are so many people to thank.

Dave and Jerome, thanks for figuring this out, even if you did need a little help from Rafal. ;-]  After having display and kernel issues ever since f12 hit rawhide, it's great to finally have this working really well.  The best I'd hoped for was another f11 where 2D worked and I could suspend and resume, so I got a lot more than I hoped for.

And Adam.  You may claim to have only nagged, but nag you did.  You took this bug seriously, including a request for consideration about it being a blocker.  You spent a lot of time piecing it all together and figuring out some commonality between this bug and others, and you persisted when it might have been easy just to ignore a handful of ATI users.  So thanks for all your 'wrangling' and 'nagging'.  It's truly appreciated.

Comment 144 Adam Williamson 2009-11-03 20:48:50 UTC
thanks for the feedback, Rodd. we'll do our best to break it with an update two weeks after release, we know we'd be failing the standards of the Fedora project if we don't ;)

Comment 145 Christian Krause 2009-11-03 22:10:21 UTC
I can happily confirm that kernel -112 fixes my problem as well. Even 3D acceleration and Suspend/Resume and Hibernate/Resume work without any problems.

Many thanks to all of you!

Comment 146 Maarten 2009-11-03 22:24:16 UTC
Running now from usblive with radeon-20091102-x86_64.iso all is well. Thanks.

Comment 147 Davide Cescato 2009-11-03 22:28:24 UTC
I have been running rawhide with the -112 kernel for several hours, and did not experience a single lockup. Great job! Thanks to everybody involved in solving this nasty bug!

Before this bug gets closed, I would like to throw in a couple of thoughts... 

First, live images. Live images found their way into the test days and now even into bug zapping! Their use is an excellent method for bringing all testers to a common ground, hence significantly improving the quality of the data available to the developers. Live images are the way to go!

Second, the role of a QA person in the bug zapping process. I am aware that the developers Dave and Jerome are ultimately responsible for the fix of the bug and hence deserve the greatest thanks, but I think that Adam's role, as an intermediary between developers and testers, is very important as well: for assisting the developers in gathering, filtering and processing test data, and for maintaining contact with the testers, which is surely a time-consuming process, but worth every second of it. As a tester, receiving the feedback that the data I provided is being evaluated is motivating and encourages me to allocate more of my resources (the most precious one being time) to provide additional data, if needed. Adam is doing a great job, and the Fedora project needs more people like him, who have a direct link with the developers and a good level of insight into the problems to be solved, but who can also take time to deal with the users or testers.

Comment 148 Nick Lamb 2009-11-03 22:34:06 UTC
The new ISO works for me too. 20-30 minutes of miscellaneous activity, plus idling while I fetched dinner did not crash it.

I am indebted to Rafał Miłecki / Jerome / Dave / anyone else involved in tracking this down and getting a working fix in time for Fedora 12.

Comment 149 Adam Williamson 2009-11-03 22:40:24 UTC
thanks for the feedback everyone, and the thoughts davide. spinning up live CDs is still a reasonably intensive process in time, bandwidth and storage space (it takes anywhere from 15 mins to 2 hours to build one, 2 hours to upload it to my fedorapeople space, and however long it takes you guys to download and test), but it certainly seems like a good idea for bugs of this kind and I'm glad you agree :)

Comment 150 Adam Williamson 2009-11-04 02:26:03 UTC
kernel 112 was tagged for tomorrow's Rawhide, so we can close this now. thanks again to all for your testing on this, it's greatly appreciated.
