Bug 717071

Summary: [RV770] 3.0-0.rc4.git3.1.fc16.x86_64 radeon GPU lockup then crash
Product: [Fedora] Fedora
Reporter: Nicolas Mailhot <nicolas.mailhot>
Component: xorg-x11-drv-ati
Assignee: Jérôme Glisse <jglisse>
Status: CLOSED EOL
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 19
CC: elad, gansalmon, itamar, jeff.raber, jonathan, kernel-maint, madhu.chinakonda, mcepl, xgl-maint
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Whiteboard: [cat:crash]
Doc Type: Bug Fix
Last Closed: 2015-02-17 13:47:30 UTC
Attachments:
Description | Flags
xorg logs | none
system logs | none
dmesg | none
new system logs | none
xorg config | none
system logs | none
dmesg | none
lspci | none
xorg logs | none
working xorg logs with xorg-x11-drv-ati-6.14.0-9.20110316gitcdfc007ec.fc16.x86_64 | none
failing xorg logs with xorg-x11-drv-ati-6.14.1-2.20110525gitfe5c42f51.fc15.x86_64 | none

Description Nicolas Mailhot 2011-06-27 21:07:18 UTC
Created attachment 510167
xorg logs

Something in last weekend's rawhide made this system's HD 4850 really unhappy.

It seems the default mode is now broken:

1. In BIOS/GRUB: either the screen refuses the mode or, if it accepts it, it shows the screens with various ASCII characters moving around in white/green.

2. In the kernel framebuffer: the mode change works, but green (with magenta tinges) horizontal dot traces move in a blocky pattern around the screen.

3. In xorg: trying to launch xorg (either via gdm or startx) locks up the system.

Attaching various system logs, rescued via ssh from a Windows box (so don't expect any filtering, raw logs only).

I don't have the free time to update daily lately, and each crash costs hours of md resync, so the breakage may have been introduced in earlier versions (or maybe it was always there and rawhide gnome-shell just pokes where it should not).

The system is up to date as of now, koji builds included (everything that yum agreed to update).

There has been a cold spell here, so temperatures were below average when the system broke and it was not overheating (sensors report 75°C for the GPU, well below what it can take). memtest86 passes with no errors (single pass yesterday).

The 65.3 kHz EDID mode looks suspicious (the resolution is right).

Jun 27 22:19:32 arekh kernel: [  130.771039] CP stall for more than 10020msec
Jun 27 22:19:32 arekh kernel: [  130.771043] ------------[ cut here ]------------
Jun 27 22:19:32 arekh kernel: [  130.771067] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x296/0x357 [radeon]()
Jun 27 22:19:32 arekh kernel: [  130.771070] Hardware name: EP45-DS5
Jun 27 22:19:32 arekh kernel: [  130.771073] GPU lockup (waiting for 0x00000006 last fence id 0x00000003)
Jun 27 22:19:32 arekh kernel: [  130.771075] Modules linked in: fuse ppdev parport_pc lp parport it87 hwmon_vid coretemp ip6t_REJECT nf_conntrack_ipv6 ip6table_filter xt_state ipt_MASQUERADE ipt_REDIRECT xt_owner iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack xt_TPROXY nf_tproxy_core xt_socket nf_defrag_ipv4 ip6_tables nf_defrag_ipv6 xt_mark iptable_mangle cpufreq_ondemand acpi_cpufreq freq_table mperf snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul tuner_simple tuner_types wm8775 tda9887 tda8290 tuner cx25840 joydev snd_emu10k1 microcode snd_hda_codec_hdmi ivtv cx2341x i2c_i801 snd_rawmidi snd_hda_intel snd_hda_codec snd_ac97_codec v4l2_common serio_raw videodev pcspkr media snd_seq v4l2_compat_ioctl32 tveeprom ac97_bus iTCO_wdt iTCO_vendor_support snd_seq_device snd_util_mem snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc r8169 mii uinput raid1 firewire_ohci pata_acpi firewire_core ata_generic crc_itu_t pata_jmicron radeon ttm drm_kms_helper drm i2c_algo_bit i
Jun 27 22:19:32 arekh kernel: 2c_core [last unloaded: scsi_wait_scan]
Jun 27 22:19:32 arekh kernel: [  130.771168] Pid: 1698, comm: X Not tainted 3.0-0.rc4.git3.1.fc16.x86_64 #1
Jun 27 22:19:32 arekh kernel: [  130.771170] Call Trace:
Jun 27 22:19:32 arekh kernel: [  130.771178]  [<ffffffff81057b14>] warn_slowpath_common+0x83/0x9b
Jun 27 22:19:32 arekh kernel: [  130.771182]  [<ffffffff81057bcf>] warn_slowpath_fmt+0x46/0x48
Jun 27 22:19:32 arekh kernel: [  130.771210]  [<ffffffffa00b6ff3>] ? r600_gpu_is_lockup+0xbd/0xc6 [radeon]
Jun 27 22:19:32 arekh kernel: [  130.771230]  [<ffffffffa0090f45>] radeon_fence_wait+0x296/0x357 [radeon]
Jun 27 22:19:32 arekh kernel: [  130.771235]  [<ffffffff81074d54>] ? __init_waitqueue_head+0x4b/0x4b
Jun 27 22:19:32 arekh kernel: [  130.771258]  [<ffffffffa009155a>] radeon_sync_obj_wait+0x11/0x13 [radeon]
Jun 27 22:19:32 arekh kernel: [  130.771267]  [<ffffffffa004b6a0>] ttm_bo_wait+0xbd/0x179 [ttm]
Jun 27 22:19:32 arekh kernel: [  130.771292]  [<ffffffffa00a1ef1>] radeon_bo_wait+0x7b/0xa4 [radeon]
Jun 27 22:19:32 arekh kernel: [  130.771317]  [<ffffffffa00a245a>] radeon_gem_wait_idle_ioctl+0x3d/0x70 [radeon]
Jun 27 22:19:32 arekh kernel: [  130.771331]  [<ffffffffa001a8e2>] drm_ioctl+0x2a4/0x386 [drm]
Jun 27 22:19:32 arekh kernel: [  130.771355]  [<ffffffffa00a241d>] ? radeon_gem_busy_ioctl+0x86/0x86 [radeon]
Jun 27 22:19:32 arekh kernel: [  130.771361]  [<ffffffff812108ff>] ? inode_has_perm+0x6a/0x77
Jun 27 22:19:32 arekh kernel: [  130.771365]  [<ffffffff812109b3>] ? file_has_perm+0xa7/0xc9
Jun 27 22:19:32 arekh kernel: [  130.771370]  [<ffffffff811463dc>] do_vfs_ioctl+0x47b/0x4bc
Jun 27 22:19:32 arekh kernel: [  130.771374]  [<ffffffff81146473>] sys_ioctl+0x56/0x7b
Jun 27 22:19:32 arekh kernel: [  130.771379]  [<ffffffff814f9e82>] system_call_fastpath+0x16/0x1b
Jun 27 22:19:32 arekh kernel: [  130.771383] ---[ end trace b32a929e3e5ce906 ]---
Jun 27 22:19:32 arekh kernel: [  130.787534] radeon 0000:01:00.0: GPU softreset
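
For anyone sifting through the attached raw logs, the fence numbers in the warning above can be pulled out mechanically. This is only a rough sketch: the regular expression is modelled on the single "GPU lockup" line quoted in this report, `find_lockups` is a hypothetical helper name, and reading "last fence id" as the last sequence number the driver saw completed is my interpretation, not something the log states.

```python
import re

# Pattern modelled on the one warning line quoted in this report (assumption:
# other kernels may word the message differently).
LOCKUP = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)\)"
)


def find_lockups(log_text):
    """Return (waiting, last, gap) tuples for every GPU lockup line found."""
    results = []
    for line in log_text.splitlines():
        match = LOCKUP.search(line)
        if match:
            waiting = int(match.group(1), 16)
            last = int(match.group(2), 16)
            results.append((waiting, last, waiting - last))
    return results


SAMPLE = (
    "Jun 27 22:19:32 arekh kernel: [  130.771073] "
    "GPU lockup (waiting for 0x00000006 last fence id 0x00000003)"
)

if __name__ == "__main__":
    for waiting, last, gap in find_lockups(SAMPLE):
        print(f"waiting for fence {waiting}, last fence id {last}, gap {gap}")
```

On the line quoted above this reports fence 6 awaited against a last fence id of 3.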





Comment 1 Nicolas Mailhot 2011-06-27 21:09:48 UTC
Created attachment 510171
system logs

Comment 2 Elad Alfassa 2011-06-28 05:23:18 UTC
Thanks for the bug report.  We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.

Please add drm.debug=0x04 to the kernel command line, restart computer, and attach

* your X server config file (/etc/X11/xorg.conf, if available),
* X server log file (/var/log/Xorg.*.log)
* output of the dmesg command, and
* system log (/var/log/messages)

to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.
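
Before collecting the files it can save a reboot cycle to confirm that the option really reached the kernel. The sketch below parses a kernel command line for drm.debug and reports which of the requested files are missing. The helper names `drm_debug_value` and `missing_files` are hypothetical, and the script assumes the running command line is readable at /proc/cmdline.

```python
from pathlib import Path


def drm_debug_value(cmdline):
    """Return the integer value of drm.debug on a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "drm.debug" and sep:
            # Base 0 lets int() accept both "4" and "0x04".
            return int(value, 0)
    return None


def missing_files(paths):
    """Return the paths that do not exist (hypothetical helper)."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Assumption: /proc/cmdline holds the command line of the running kernel.
    cmdline = Path("/proc/cmdline").read_text()
    print("drm.debug =", drm_debug_value(cmdline))
    print("missing:", missing_files(["/etc/X11/xorg.conf", "/var/log/messages"]))
```

A result of None means the option was not picked up, so the logs would lack the extra DRM messages.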

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers

Comment 3 Nicolas Mailhot 2011-06-28 06:17:11 UTC
Created attachment 510203
dmesg

Comment 4 Nicolas Mailhot 2011-06-28 06:19:29 UTC
Created attachment 510205
new system logs

Comment 5 Nicolas Mailhot 2011-06-28 06:20:29 UTC
Created attachment 510206
xorg config

Comment 6 Nicolas Mailhot 2011-06-28 06:22:35 UTC
Requested info attached. No new xorg logs, as the system crashed too fast for them to be written to disk. (It wrote "unable to handle NULL pointer dereference" to the console for dbus, with lots of lines about rcu; sadly my card reader is broken right now, so I can't provide a screenshot.)

Comment 7 Nicolas Mailhot 2011-06-28 18:24:56 UTC
With this evening's new kernel and gdm packages, the system boots and can start gdm without locking up, so I have full logs.

However, the framebuffer console is still completely corrupted by green/magenta traces and X is completely unusable (the whole screen is so corrupted by green/magenta artifacts that nothing can be read).

Comment 8 Nicolas Mailhot 2011-06-28 18:25:59 UTC
Created attachment 510333
system logs

Comment 9 Nicolas Mailhot 2011-06-28 18:26:44 UTC
Created attachment 510335
dmesg

Comment 10 Nicolas Mailhot 2011-06-28 18:27:45 UTC
Created attachment 510336
lspci

Comment 11 Nicolas Mailhot 2011-06-28 18:28:27 UTC
Created attachment 510338
xorg logs

Comment 12 Nicolas Mailhot 2011-07-09 12:18:14 UTC
After investigating more, it appears one of the GDDR heatsinks on the gfx card had fallen off, so this GDDR chip was probably cooked and unreliable (that shouldn't crash the kernel, though).

With a brand new gfx card and
xorg-x11-drv-ati-6.14.0-9.20110316gitcdfc007ec.fc16.x86_64
xorg works as it should.

With rawhide's latest xorg-x11-drv-ati, though, xorg still crashes on startup.

Comment 13 Nicolas Mailhot 2011-07-09 12:20:14 UTC
Created attachment 512040
working xorg logs with xorg-x11-drv-ati-6.14.0-9.20110316gitcdfc007ec.fc16.x86_64

Comment 14 Nicolas Mailhot 2011-07-09 12:31:34 UTC
Created attachment 512041
failing xorg logs with xorg-x11-drv-ati-6.14.1-2.20110525gitfe5c42f51.fc15.x86_64

According to rpm rules
xorg-x11-drv-ati-6.14.1-2.20110525gitfe5c42f51.fc15.x86_64 is newer than
xorg-x11-drv-ati-6.14.0-9.20110316gitcdfc007ec.fc16.x86_64
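
To illustrate why the fc15 build sorts above the fc16 one: rpm compares the version field before the release field, so 6.14.1 beats 6.14.0 before the dist tag is ever looked at. The following is a simplified sketch of that ordering, not rpm's actual implementation. It ignores epochs and the tilde/caret rules, and `split_nvra`, `compare_segments` and `compare_nvra` are hypothetical names.

```python
import re

# Split a version or release string into runs of digits or letters.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def split_nvra(nvra):
    """Split 'name-version-release.arch' into its four parts."""
    rest, arch = nvra.rsplit(".", 1)
    name, version, release = rest.rsplit("-", 2)
    return name, version, release, arch


def compare_segments(a, b):
    """Simplified rpmvercmp-style compare: 1 if a is newer, -1 if older, 0 if equal."""
    seg_a, seg_b = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is treated as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return 1 if x > y else -1
    if len(seg_a) == len(seg_b):
        return 0
    # All shared segments equal: the string with segments left over is newer.
    return 1 if len(seg_a) > len(seg_b) else -1


def compare_nvra(a, b):
    """Compare version first; fall back to release only on a version tie."""
    _, ver_a, rel_a, _ = split_nvra(a)
    _, ver_b, rel_b, _ = split_nvra(b)
    return compare_segments(ver_a, ver_b) or compare_segments(rel_a, rel_b)


FAILING = "xorg-x11-drv-ati-6.14.1-2.20110525gitfe5c42f51.fc15.x86_64"
WORKING = "xorg-x11-drv-ati-6.14.0-9.20110316gitcdfc007ec.fc16.x86_64"

if __name__ == "__main__":
    print(compare_nvra(FAILING, WORKING))
```

Under these simplified rules the release fields alone would favour the fc16 build (9 against 2), but they are only consulted when the versions tie.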

Comment 15 Matěj Cepl 2011-08-08 12:12:48 UTC
(In reply to comment #14)
> Created attachment 512041
> failing xorg logs with
> xorg-x11-drv-ati-6.14.1-2.20110525gitfe5c42f51.fc15.x86_64
> 
> According to rpm rules
> xorg-x11-drv-ati-6.14.1-2.20110525gitfe5c42f51.fc15.x86_64 is newer than
> xorg-x11-drv-ati-6.14.0-9.20110316gitcdfc007ec.fc16.x86_64

---------------------------------
In function RADEONScreenInit_KMS:
(from frame 3: /usr/lib64/xorg/modules/drivers/radeon_drv.so (0x7f56558f0000+0xd9e01) [0x7f56559c9e01])
968: 	    return FALSE;
969:     }
970: 
971:     xf86SetBlackWhitePixels(pScreen);
972: 
973:     if (pScrn->bitsPerPixel > 8) {
974: 	VisualPtr  visual;
975: 
976: 	visual = pScreen->visuals + pScreen->numVisuals;
977: 	while (--visual >= pScreen->visuals) {
978: >>>>>>> 	    if ((visual->class | DynamicClass) == DirectColor) {
979: 		visual->offsetRed   = pScrn->offset.red;
980: 		visual->offsetGreen = pScrn->offset.green;
981: 		visual->offsetBlue  = pScrn->offset.blue;
982: 		visual->redMask     = pScrn->mask.red;
983: 		visual->greenMask   = pScrn->mask.green;
984: 		visual->blueMask    = pScrn->mask.blue;
985: 	    }
986: 	}
987:     }
988: 


Frame 4: /usr/bin/Xorg (AddScreen+0x17f) [0x433bdf]
	/usr/src/debug/xorg-server-20110510/dix/dispatch.c:3906
	AddScreen
Frame 5: /usr/bin/Xorg (InitOutput+0x282) [0x479fb2]
	/usr/src/debug/xorg-server-20110510/hw/xfree86/common/xf86Init.c:739
	InitOutput
Frame 6: /usr/bin/Xorg (0x400000+0x22a39) [0x422a39]
	/usr/src/debug/xorg-server-20110510/dix/main.c:208
	main
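
A side note on the marked line in RADEONScreenInit_KMS: the `| DynamicClass` test matches two visual classes, not one. The sketch below reproduces just that bit test. The class numbers are the ones I believe X.h defines, and DynamicClass = 1 is taken from the X server headers as far as I recall, so treat both as assumptions. It says nothing about why the pointer dereference itself faulted.

```python
# Visual class numbers as I understand them from X11's X.h (assumption).
STATIC_GRAY, GRAY_SCALE, STATIC_COLOR, PSEUDO_COLOR, TRUE_COLOR, DIRECT_COLOR = range(6)

# Low bit that distinguishes the writable variant of each class pair (assumption).
DYNAMIC_CLASS = 1

CLASS_NAMES = {
    STATIC_GRAY: "StaticGray",
    GRAY_SCALE: "GrayScale",
    STATIC_COLOR: "StaticColor",
    PSEUDO_COLOR: "PseudoColor",
    TRUE_COLOR: "TrueColor",
    DIRECT_COLOR: "DirectColor",
}


def gets_rgb_masks(visual_class):
    """Mirror of the C test: (visual->class | DynamicClass) == DirectColor."""
    return (visual_class | DYNAMIC_CLASS) == DIRECT_COLOR


def matching_classes():
    """Names of the visual classes for which the test is true."""
    return [CLASS_NAMES[c] for c in sorted(CLASS_NAMES) if gets_rgb_masks(c)]


if __name__ == "__main__":
    print(matching_classes())
```

With these values the offsets and masks are rewritten for both TrueColor and DirectColor visuals, and for no others.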

Comment 16 Matěj Cepl 2011-08-08 12:14:42 UTC
[  3448.481] Caught signal 11 (Segmentation fault). Server aborting

Comment 17 Jeff Raber 2011-08-09 03:43:54 UTC
The patch at: https://bugs.freedesktop.org/show_bug.cgi?id=39572
might help with the GPU lockup issue.




Comment 18 Jérôme Glisse 2012-02-21 21:59:55 UTC
Is this still an issue with f17/rawhide?

Comment 19 Nicolas Mailhot 2012-02-22 12:33:57 UTC
I haven't seen it for quite a while.

Comment 20 Fedora End Of Life 2013-04-03 15:17:20 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and the reason for this action are here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 21 Fedora End Of Life 2015-01-09 16:42:14 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 22 Fedora End Of Life 2015-02-17 13:47:30 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.