Bug 525904 - kernel-2.6.31.1-48 causes X to freeze with RADEON:R770:HD4870X2
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
low
high
Target Milestone: ---
Assignee: Dave Airlie
QA Contact: Fedora Extras Quality Assurance
Reported: 2009-09-26 22:23 UTC by Robert Laverick
Modified: 2009-10-28 02:00 UTC

Doc Type: Bug Fix
Last Closed: 2009-10-28 02:00:05 UTC


Attachments
kernel output during fatal error with kernel 2.6.31.1-56.x86_64 & kms (2.71 KB, text/plain)
2009-09-30 22:10 UTC, Robert Laverick
serial console output (44.49 KB, text/plain)
2009-10-05 22:05 UTC, Robert Laverick
Kernel output with radeon.tv=0 and kms enabled (34.58 KB, text/plain)
2009-10-06 21:25 UTC, Robert Laverick
serial console log from another machine with an ATI card (11.27 KB, text/plain)
2009-10-07 20:47 UTC, Robert Laverick
Odd background colour in Nautilus with KMS (5.23 KB, image/png)
2009-10-28 00:34 UTC, Robert Laverick

Description Robert Laverick 2009-09-26 22:23:42 UTC
Description of problem:
When X tries to start after updating to kernel 2.6.31.1-48, the system hard freezes. I can't find any error messages or get any logs out of the system.

Version-Release number of selected component (if applicable):
kernel-2.6.31.1-48.fc12.x86_64
xorg-x11-server-common-1.6.99.902-1.fc12.x86_64
xorg-x11-drv-ati-6.13.0-0.4.20090908git651fe5a47.fc12.x86_64

How reproducible:
Every time

Steps to Reproduce:
1. Boot into runlevel 5 with the latest rawhide kernel
  
Actual results:
System stops responding to all input

Expected results:
Normal X startup

Additional info:
Smolt: http://www.smolts.org/client/show/pub_3885987d-0be7-4cd3-9acf-adb1786960aa

The same occurs if I boot into runlevel 3 and then start X manually with startx from the command line.

Comment 1 Robert Laverick 2009-09-27 09:28:21 UTC
It took me a while, but if I use the nomodeset boot parameter then X still works with this kernel; however, this breaks compiz.  Should this be filed against xorg-x11-drv-ati rather than kernel?  Tho it's clearly the kernel that changed to cause this problem.

Comment 2 Chuck Ebbert 2009-09-27 09:47:03 UTC
Adapter:
  	ATI Technologies Inc R700 [Radeon HD 4870 X2]

Comment 3 Dave Airlie 2009-09-30 04:29:38 UTC
Please try kernel-2.6.31.1-56 and xorg-x11-drv-ati-6.13.0-0.6.20090929git7968e1fb8.fc12.i686 with modesetting enabled.

Comment 4 Robert Laverick 2009-09-30 17:13:40 UTC
I've tested with the updated version of xorg-x11-drv-ati you suggested which didn't help, but the kernel update didn't show up in my yum update yet so I'll wait till it shows up (hopefully tomorrow) and re-test with that.

Comment 5 Robert Laverick 2009-09-30 22:07:33 UTC
I've updated the kernel and the ati drivers; however, with KMS enabled I get a fatal error during kernel init (before the "main" boot process begins).

I've transcribed the output and will attach below.

Comment 6 Robert Laverick 2009-09-30 22:10:50 UTC
Created attachment 363252
kernel output during fatal error with kernel 2.6.31.1-56.x86_64 & kms

This is transcribed manually. I've tried the best I can to get everything right, but if something doesn't make sense let me know and I'll try to double check it.

Comment 7 Robert Laverick 2009-10-05 22:04:25 UTC
(In reply to comment #3)
> please give kernel-2.6.31.1-56 and
> xorg-x11-drv-ati-6.13.0-0.6.20090929git7968e1fb8.fc12.i686
> with modesetting enabled.  

I've been trying to get a proper console log of this crash. However, when I enable the serial console (replacing quiet with console=ttyS0,115200 in the kernel params) the plymouth error doesn't occur; instead I get something very similar with Xorg.
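
For anyone wanting to reproduce the serial console setup, the parameter swap described above can be sketched in Python. This is only an illustration: the function name and the example command line are made up, and it edits a string rather than any real bootloader config.

```python
def enable_serial_console(cmdline, console="ttyS0,115200"):
    """Return a kernel command line with 'quiet' dropped and a serial console added."""
    # Keep every parameter except 'quiet', which hides kernel output.
    params = [p for p in cmdline.split() if p != "quiet"]
    console_param = "console=" + console
    # Avoid adding the same console= parameter twice.
    if console_param not in params:
        params.append(console_param)
    return " ".join(params)


# Hypothetical command line, not taken from the affected machine.
example = "ro root=/dev/mapper/vg-root rhgb quiet"
print(enable_serial_console(example))
```

The result would still have to be pasted into the boot entry by hand (or at the bootloader prompt).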

I've logged the whole console output, and I've attached below.  Relevant section starts at line 718

As you can see, the system hasn't actually frozen; only the keyboard and display have. From the serial console I can still shut down the system normally (once I remember the right command :-) )

Does this indicate it's a drv-ati bug and should therefore be re-categorized to xorg-x11-drv-ati?

Comment 8 Robert Laverick 2009-10-05 22:05:24 UTC
Created attachment 363749
serial console output

Comment 9 Dave Airlie 2009-10-06 06:15:32 UTC
Okay, this is one of those X2 cards by the looks of it. I'm not really sure what's going on there. How many outputs does that card have?

Does booting with radeon.tv=0 help?

Comment 10 Robert Laverick 2009-10-06 07:39:03 UTC
2x DVI out, 1x TV out

I've included a link to the specs of the card in question below. I'll run the test with radeon.tv=0 and let you know how it gets on.

http://www.xfxforce.com/en-gb/products/graphiccards/HD%204000series/4870X2.aspx

Comment 11 Robert Laverick 2009-10-06 21:24:21 UTC
radeon.tv=0 puts the error back into the kernel's init again, with the following oops output from serial console which I've attached below.

Line 510
========

[drm] radeon: kernel modesetting successfully initialized.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000680
IP: [<ffffffffa001d763>] drm_sysfs_connector_add+0x35/0x1d3 [drm]
PGD 252518067 PUD 252d19067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/devices/platform/radeon_cp.0/firmware/radeon_cp.0/loading
CPU 0
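
When comparing several logs like this one, it can help to pull the faulting symbol out of the "IP:" line automatically. Below is a minimal Python sketch, assuming the oops uses the same "IP: [<addr>] symbol+offset/size [module]" layout as the excerpt above; the helper name is hypothetical.

```python
import re

# Matches lines such as:
#   IP: [<ffffffffa001d763>] drm_sysfs_connector_add+0x35/0x1d3 [drm]
OOPS_IP = re.compile(
    r"IP: \[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_oops_ip(log_text):
    """Return a dict describing the faulting instruction, or None if no IP line is found."""
    match = OOPS_IP.search(log_text)
    if match is None:
        return None
    info = match.groupdict()
    # Convert the hex offset and function size to integers for easier comparison.
    info["offset"] = int(info["offset"], 16)
    info["size"] = int(info["size"], 16)
    return info


sample = (
    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000680\n"
    "IP: [<ffffffffa001d763>] drm_sysfs_connector_add+0x35/0x1d3 [drm]\n"
)
info = parse_oops_ip(sample)
print(info["symbol"], hex(info["offset"]), info["module"])
```

For the excerpt above this reports drm_sysfs_connector_add in the drm module, which is the function the NULL pointer dereference happened in.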

Comment 12 Robert Laverick 2009-10-06 21:25:19 UTC
Created attachment 363902
Kernel output with radeon.tv=0 and kms enabled

Comment 13 Robert Laverick 2009-10-07 20:47:00 UTC
Created attachment 364038
serial console log from another machine with an ATI card

I've managed to get my hands on another machine to test on. It uses a slightly different setup: it's an ATI card, tho a much older AGP one, and it's also a Tyan dual Opteron motherboard (tho older and slower), and I get another oops when Xorg starts.

This is with the 20091001.15.iso live CD, which has the same kernel and graphics drivers that I have on the R700 test box (this other machine is my wife's so I've not installed rawhide over her precious Oblivion save games :-) )

It seems interesting that this other machine does it with a much different ATI card. Perhaps it's something related to real multi-processor machines?   I really don't know, but here's the serial console output; I hope this can help you work out what's going on...

Comment 14 Robert Laverick 2009-10-09 20:02:14 UTC
I see there's been quite a significant patch posted for kms for kernel 2.6.32

http://sourceforge.net/mailarchive/forum.php?thread_name=alpine.DEB.2.00.0910080648450.5562%40skynet.skynet.ie&forum_name=dri-devel

which does seem to mention some initialization fixes for the R7xx kms code among a host of others (3000 lines!  wow someone's been busy :-) )

I don't pretend to know anything about such low level programming, but is there a recommended way I can test patchsets like that in rawhide? Or do you consider testing such patchsets a waste of time if they'll be coming downstream at some point as they're merged into 2.6.32 (assuming such a large patch makes it in at the rc stage)?

I really am looking to help here, and don't mind getting my hands dirty (hell I'd even learn C to get this working :-) I'm real excited about open graphics drivers finally hitting Linux)

Rob

Comment 15 Adam Williamson 2009-10-09 20:32:21 UTC
Rawhide's kernel already uses the drm-next tree, which is rather ahead of drm-linus AIUI, and drm-linus is what was getting pulled into main there. So I'm almost sure Rawhide's kernel would have that stuff already. Please correct me if I'm wrong, Dave!

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 16 Robert Laverick 2009-10-11 13:47:37 UTC
Ahh, OK. Well, if there is anything else I can do to help, code to test with more debugging output, or anything experimental that needs testing, feel free to give me a shout. I've been using Fedora for a while and feel it's really time I started to give more back to the community.

Rob

Comment 17 Adam Williamson 2009-10-13 19:47:35 UTC
Robert: don't worry, we (or rather Dave and Chuck) will :)

In the meantime, some references:

https://fedoraproject.org/wiki/QA/Join
http://fedoraproject.org/join-fedora

of course, you're already doing item #1 on the QA/Join page, so thanks for that =)


Comment 18 Jérôme Glisse 2009-10-14 10:45:42 UTC
I don't think 2.6.31-48 had the r600/r700 fixes. Can you try with the latest Fedora 12 kernel, please?

Comment 19 Robert Laverick 2009-10-14 13:32:29 UTC
(In reply to comment #18)
> I don't think 2.6.31-48 did have the r600/r700 fixes can you try with lastest
> fedora 12 kernel please ?  

Hi Jerome, 

I've already updated to kernel-2.6.31.1-56 which is the kernel that has been outputting the oops logs that I've attached (comments 6, 8 and 12)
Sorry I didn't make that clearer, since the problems have been since 2.6.31.1-48 should I leave the title as is, or update it to indicate the latest version I have tried unsuccessfully? I'm kinda new to this bugzilla stuff and haven't got the etiquette down yet so appreciate any pointers!

Comment 20 Jérôme Glisse 2009-10-14 14:16:33 UTC
OK, I didn't notice this is a dual GPU card, which might explain the issue; the kernel likely gets confused. I will look into the issue soon (need to ask for the hw too ;)). You can rename the bug title to start with RADEON:R770:HD4870X2 (but no big deal); I am just discussing with others the opportunity to have such kind of string.

Comment 21 Robert Laverick 2009-10-14 14:46:40 UTC
Jerome: If there's anything I can do/run/download to test with my hardware I'm more than happy to help. The box I'm testing this on is mostly a gaming box and has a serial cable set up to another box, so if it'd be of use I can always set up ssh access for you, tho looking at what comes out of the monitor would probably be problematic for you! (maybe a webcam :-) )

Comment 22 Robert Laverick 2009-10-21 20:16:32 UTC
I've updated to kernel 2.6.31.4-88 from Koji

http://koji.fedoraproject.org/koji/buildinfo?buildID=137561

I no longer get an oops, but it seems not to be working (I get the text mode rhgb, tho in high res text) and I get the following lines in dmesg.

[drm] Initialized drm 1.1.0 20060810
[drm] radeon defaulting to kernel modesetting.
[drm] radeon kernel modesetting enabled.
radeon 0000:04:00.0: PCI INT A -> Link[LNK3] -> GSI 19 (level, high) -> IRQ 19
radeon 0000:04:00.0: setting latency timer to 64
[drm] radeon: Initializing kernel modesetting.
[drm] register mmio base: 0xA0200000
[drm] register mmio size: 65536
ATOM BIOS: R700
[drm] Clocks initialized !
[drm] Detected VRAM RAM=256M, BAR=256M
[drm] RAM width 128bits DDR
[TTM] Zone  kernel: Available graphics memory: 4039144 kiB.
[TTM] Zone   dma32: Available graphics memory: 2097152 kiB.
[drm] radeon: 256M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.
[drm] Loading RV770 CP Microcode
platform radeon_cp.0: firmware: requesting radeon/RV770_pfp.bin
platform radeon_cp.0: firmware: requesting radeon/RV770_me.bin
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] ring test succeeded in 1 usecs
[drm] radeon: ib pool ready.
[drm] ib test succeeded in 0 usecs
[drm] Radeon Display Connectors
[drm] Connector 0:
[drm]   DVI-I
[drm]   DDC: 0x7e60 0x7e60 0x7e64 0x7e64 0x7e68 0x7e68 0x7e6c 0x7e6c
[drm]   Encoders:
[drm]     DFP1: INTERNAL_UNIPHY
[drm]     CRT2: INTERNAL_KLDSCP_DAC2
[drm] Connector 1:
[drm]   DIN
[drm]   Encoders:
[drm]     TV1: INTERNAL_KLDSCP_DAC2
[drm] Connector 2:
[drm]   DVI-I
[drm]   DDC: 0x7e20 0x7e20 0x7e24 0x7e24 0x7e28 0x7e28 0x7e2c 0x7e2c
[drm]   Encoders:
[drm]     CRT1: INTERNAL_KLDSCP_DAC1
[drm]     DFP2: INTERNAL_KLDSCP_LVTMA
[drm] fb mappable at 0xB0141000
[drm] vram apper at 0xB0000000
[drm] size 7680000
[drm] fb depth is 24
[drm]    pitch is 6400
executing set pll
executing set crtc timing
[drm] TMDS-9: set mode 1600x1200 27
Console: switching to colour frame buffer device 200x75
fb0: radeondrmfb frame buffer device
registered panic notifier
[drm] Initialized radeon 2.0.0 20080528 for 0000:04:00.0 on minor 0
work_for_cpu used greatest stack depth: 2864 bytes left
radeon 0000:05:00.0: PCI INT A -> Link[LNK3] -> GSI 19 (level, high) -> IRQ 19
radeon 0000:05:00.0: setting latency timer to 64
[drm] radeon: Initializing kernel modesetting.
[drm] register mmio base: 0xA0300000
[drm] register mmio size: 65536
[drm:radeon_driver_load_kms] *ERROR* Fatal error while trying to initialize radeon.
radeon 0000:05:00.0: PCI INT A disabled
radeon: probe of 0000:05:00.0 failed with error -22


I'll keep trying kernels from koji every so often, if there's a specific build that might be relevant and you want me to try just say the word.
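
In the dmesg output above, the first GPU (0000:04:00.0) initializes while the second (0000:05:00.0) fails to probe with error -22, which is -EINVAL. When retrying new builds, a quick per-device summary makes it easy to spot whether that has changed. A rough Python sketch, assuming the message wording stays the same as in this log; the function name is hypothetical.

```python
import errno
import re

INITIALIZED = re.compile(
    r"\[drm\] Initialized radeon \S+ \S+ for (?P<dev>[0-9a-f:.]+) on minor (?P<minor>\d+)"
)
PROBE_FAILED = re.compile(
    r"radeon: probe of (?P<dev>[0-9a-f:.]+) failed with error (?P<err>-?\d+)"
)


def summarize_radeon_probe(dmesg_text):
    """Map each PCI device seen in the log to a short init status string."""
    status = {}
    for line in dmesg_text.splitlines():
        ok = INITIALIZED.search(line)
        if ok:
            status[ok.group("dev")] = "initialized (minor %s)" % ok.group("minor")
            continue
        failed = PROBE_FAILED.search(line)
        if failed:
            code = int(failed.group("err"))
            # Translate the numeric error into its errno name where known.
            name = errno.errorcode.get(abs(code), "unknown")
            status[failed.group("dev")] = "probe failed (error %d, %s)" % (code, name)
    return status


# Two lines copied from the dmesg output above.
sample = (
    "[drm] Initialized radeon 2.0.0 20080528 for 0000:04:00.0 on minor 0\n"
    "radeon: probe of 0000:05:00.0 failed with error -22\n"
)
status = summarize_radeon_probe(sample)
for dev in sorted(status):
    print(dev, status[dev])
```

It could be fed the saved output of dmesg from each kernel build to compare results side by side.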

Comment 23 Adam Williamson 2009-10-21 23:01:25 UTC
Of course, check for xorg-x11-drv-ati updates as well as kernel updates...


Comment 24 Jérôme Glisse 2009-10-22 08:12:28 UTC
We don't support X2 cards properly yet. I am waiting for hw to fix that; I will ping you once I've had a chance to push a fix that might work for you.

Comment 25 Adam Williamson 2009-10-22 21:37:11 UTC
Jerome: should I add a 'commonbugs' note to use nomodeset if you have an X2 card? Is it a sensible workaround?


Comment 26 Jérôme Glisse 2009-10-23 07:45:57 UTC
Yeah, for now I think it makes sense to list this as a common bug. I don't know when we will have time to look into it.

Comment 27 Edouard Bourguignon 2009-10-24 14:23:21 UTC
I'm not sure it's the same bug (I can't see errors in /var/log/messages), but with my RV770 [Radeon HD 4870] on F12 beta the screen is blank on boot. I have to use the magic SysRq keys to reboot. With nomodeset it works, and likewise if I remove rhgb from the cmdline.

Comment 28 Robert Laverick 2009-10-24 15:56:20 UTC
(In reply to comment #27)
> I'm not sure it's the same bug (can't see errors in /var/log/messages) but with
> my RV770 [Radeon HD 4870] on f12 beta the screen is blank on boot. Have to
> magic syskeys to reboot. With nomodsetting it works, same if I remove rhgb on
> the cmdline.  

The errors were in dmesg for me, also which versions of kernel and the ati driver are you using?

rpm -qa kernel* xorg-x11-drv-ati*

I found that updating to one of the builds from koji that isn't yet in rawhide does at least allow me to get into Xorg with KMS set, but it seems to cause me odd problems with other things (sound via my SB X-Fi doesn't work when KMS is enabled, for example).

You might be able to ssh into the system and get some dmesg output that way, also you want to remove the quiet kernel option if it's hanging early in the process so you can see the kernel output.

Comment 29 Edouard Bourguignon 2009-10-25 14:08:32 UTC
It's true, with the latest kernel update (2.6.31.5-96.fc12.x86_64) I can now boot without the nomodeset option. But the sound card (Creative Labs SB0400 Audigy2 Value) is not working (it was on F11); I don't know if it is related.

Comment 30 Robert Laverick 2009-10-25 21:14:04 UTC
Seems like an interesting coincidence that we're both having the same (or at least similar) problems with the audigy as well as the gfx card, I'm going to try taking the sound card out and see how that changes things.

Comment 31 Robert Laverick 2009-10-25 21:34:42 UTC
I'd be interested to know if you also experience bug 526592 (desktop effects not working when starting with the nomodeset kernel option).

Comment 32 Robert Laverick 2009-10-26 00:10:09 UTC
OK, so removing the sound card changes nothing. I've also noticed that on a fresh install DRI is no longer supported by default, as it's been moved into the package mesa-dri-drivers-experimental, which isn't installed by default.  It also seems that without that package sound does work without needing to add nomodeset to the kernel command line.

I'm seeing quite a lot of corruption in 2D graphics now, which I guess I need to report as another ticket. Since the problems in this one all seem to be related to code which is now marked as experimental, I'll quit fiddling with it for the time being, or at least quit going on about it.  :-)

Comment 33 Edouard Bourguignon 2009-10-26 07:35:35 UTC
Thank you Robert for all the information you've shared. The 2D is quite fast but I also have a lot of corruption (particularly in Firefox). If you open that other ticket I would be pleased to be CC'd on it.

Btw I'd like to test the desktop effects, but do we have to install the Catalyst driver for that? I've had a really bad experience (hard freeze) with fglrx since 9.7.

It's really strange about the sound card, pavumeter shows something, everything seems ok, just have no output to the speakers...

Comment 34 Robert Laverick 2009-10-26 08:16:49 UTC
If you want to play with 3d accel, just install the mesa-dri-drivers-experimental package

yum install mesa-dri-drivers-experimental

It seems to work really well. I've not seen many significant problems with it to be honest, just a few minor things (at least in terms of desktop effects).  As it says on the tin, it's considered experimental and so isn't guaranteed to be problem free, and I guess it's been marked experimental because they don't have time to work all the bugs out before F12 goes live.

I'll try and get a handle on the 2d corruption at some point tonight (GMT) and get a decent ticket posted, tho there is already a ticket in the system specifically for the Firefox issues bug 522985.

Comment 35 Robert Laverick 2009-10-26 18:56:02 UTC
(In reply to comment #33)
> Thank you Robert for all this information you share. The 2d is quite fast but I
> also have a lot of corruption (particularly in firefox). If you open that other
> ticket I would be pleased to CC on it.

https://bugzilla.redhat.com/show_bug.cgi?id=531065

I've added this ticket with some screenshots of the problems I've been seeing. There are probably a few other things I've seen (like some font corruption) that I didn't manage to catch in screenshots; if you have any other examples, feel free to attach them to that ticket.

Comment 36 Adam Williamson 2009-10-26 20:47:56 UTC
Robert: um, so the problematic X2 card actually works OK with modesetting with kernel 96?


Comment 37 Robert Laverick 2009-10-27 01:16:23 UTC
It's better certainly, but it still has a few issues. I'll spend some time tomorrow night testing all the different things I've had trouble with and see if I can work out what's what. I think kms causes problems with my sound card, and it does show some errors in dmesg, but I'll update again tomorrow with a more detailed current status.

Comment 38 Robert Laverick 2009-10-28 00:34:54 UTC
Created attachment 366363
Odd background colour in Nautilus with KMS

ok, I've done some testing with 96 and KMS (and the experimental mesa stuff too) and all seems pretty good.

a few things to mention:

1. Still getting the errors mentioned in Comment 22, also seeing this message during shutdown too, tho during shutdown I don't get anything on screen other than a black screen with a white flashing cursor

[drm:radeon_driver_load_kms] *ERROR* Fatal error while trying to initialize

2. Still getting occasional 2d graphics corruption, tho that seems to be regardless of KMS, so bug 531065 seems to be the best place for that.

3. One thing that does only happen with KMS is that when browsing folders, I occasionally get one that opens with a turquoise background. I can't see any reason for this, and closing/reopening the folder clears it back to the normal white. I'll include a screenshot so you can see what I mean. It seems very odd tho, as if the palette has gone wrong somehow, but only in one window? *shrug*

4. quake3 works, and works well, 120fps with KMS enabled (134fps without KMS) BUT when KMS is enabled fullscreen leaves a 1px gap through which you can see the desktop at the right and bottom, doesn't happen with nomodeset parameter.  (fantastic to see it run with OSS drivers tho!)

5. I'm no longer having any issues with my x-fi card with or without KMS.

I must say I'm impressed with all this to no end.

PS I hope I haven't forgotten anything; I typed something very similar to this, but when trying to enable spellcheck in Firefox I dumbly allowed it to restart and lost it!

Comment 39 Adam Williamson 2009-10-28 02:00:05 UTC
I think we can close this bug then, as it's not freezing any more. :) Thanks a lot for the feedback. Please do file separate bugs for any of the issues you mentioned which you think would benefit from being tracked as bugs.


