Bug 680651

Summary: [Redwood][RV280][RV200] oops Radeon ttm_bo_ref_bug
Product: [Fedora] Fedora
Reporter: John Reiser <jreiser>
Component: xorg-x11-drv-ati
Assignee: Dave Airlie <airlied>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 15
CC: airlied, robert.l.kief, xgl-maint
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: i686
OS: Linux
Whiteboard:
Fixed In Version: kernel-2.6.38.8-32.fc15
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-25 06:52:08 UTC
Type: ---
Attachments:
Description                                     Flags
traceback (photo of text console)               none
Xorg.0.log after reboot                         none
output from 'dmesg' after reboot                none
/var/log/messages after reboot                  none
serial console log                              none
console photo of BUG with traceback and info    none

Description John Reiser 2011-02-26 18:25:10 UTC
Created attachment 481169 [details]
traceback (photo of text console)

Description of problem: kernel oops some time after XFCE screensaver starts


Version-Release number of selected component (if applicable):
xorg-x11-drv-ati-6.14.0-2.20110204gita27b5dbd9.fc15.i686


How reproducible: unknown


Steps to Reproduce:
1. Install F15 alpha XFCE Desktop from Live CD
2. yum update
3. login and wait
  
Actual results: kernel oops implicating ttm_bo_ref; will attach photo of console showing traceback


Expected results: no bug/oops


Additional info: 01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Radeon RV200 QW [Radeon 7500] [1002:5157]

Comment 1 John Reiser 2011-02-26 18:26:57 UTC
Created attachment 481170 [details]
Xorg.0.log after reboot

Comment 2 John Reiser 2011-02-26 18:27:39 UTC
Created attachment 481171 [details]
output from 'dmesg' after reboot

Comment 3 John Reiser 2011-02-26 18:28:08 UTC
Created attachment 481172 [details]
/var/log/messages after reboot

Comment 4 John Reiser 2011-02-27 19:01:24 UTC
Created attachment 481267 [details]
serial console log

This is reproducible:
1. XFCE Desktop > Applications > Preferences > Screensaver.
2. Select Random screensaver (leave the default selections checked), Blank after 2 minutes, Cycle after 1 minute.
3. Advanced tab > Power saver off (the screensaver should loop forever, trying a random screensaver once per minute).
4. File > Blank now.
5. Wait; in this particular case, 7 minutes.

Comment 5 John Reiser 2011-02-27 19:06:51 UTC
Recursive oops, 2 deep; the first one is:
[  346.124012] kernel BUG at drivers/gpu/drm/ttm/ttm_bo.c:272!
[  346.124012] invalid opcode: 0000 [#1] SMP 
[  346.124012] last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0/card0-VGA-1/status
[  346.124012] Modules linked in: sco bnep l2cap bluetooth rfkill asb100 hwmon_vid sunrpc 8021q garp stp llc p4_clockmod ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_cmipci gameport snd_seq snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi microcode snd_seq_device iTCO_wdt e100 snd soundcore mii i2c_i801 iTCO_vendor_support uinput ipv6 uas usb_storage radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: mperf]
[  346.124012] 
[  346.124012] Pid: 288, comm: kworker/0:3 Not tainted 2.6.38-0.rc6.git4.1.fc15.i686 #1 System Manufacturer System Name/P4B266
[  346.124012] EIP: 0060:[<f78de2e7>] EFLAGS: 00010202 CPU: 0
[  346.124012] EIP is at ttm_bo_ref_bug+0x8/0xa [ttm]
[  346.124012] EAX: f23b5254 EBX: 00000001 ECX: f78de2df EDX: 00000054
[  346.124012] ESI: f78de2df EDI: f23b5254 EBP: f44afeec ESP: f44afeec
[  346.124012]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  346.124012] Process kworker/0:3 (pid: 288, ti=f44ae000 task=f66ebed0 task.ti=f44ae000)
[  346.124012] Stack:
[  346.124012]  f44aff00 c05cb8b8 f78de2df f23b522c 00000000 f44aff0c f78dec94 f5167380
[  346.124012]  f44aff2c f78df0a3 00000000 00000000 0000b8fb f23b5200 f66c2440 f5403240
[  346.124012]  f44aff48 f79d7e45 00000000 00000000 00000000 f2323b00 f66c2440 f44aff5c
[  346.124012] Call Trace:
[  346.124012]  [<c05cb8b8>] kref_sub+0x3c/0x46
[  346.124012]  [<f78de2df>] ? ttm_bo_ref_bug+0x0/0xa [ttm]
[  346.124012]  [<f78dec94>] ttm_bo_list_ref_sub+0x22/0x25 [ttm]
[  346.124012]  [<f78df0a3>] ttm_bo_reserve+0x63/0x6d [ttm]

Comment 6 John Reiser 2011-03-01 04:47:54 UTC
This is reproducible on a recent card, too:

01:00.0 VGA compatible controller [0300]: ATI Technologies Inc Redwood PRO [Radeon HD 5500 Series] [1002:68da]
[    19.239] (--) PCI:*(0:1:0:0) 1002:68da:1682:3083 rev 0, Mem @ 0xd0000000/268435456, 0xfdfc0000/131072, I/O @ 0x0000bc00/256, BIOS @ 0x????????/131072
[    19.274] (--) RADEON(0): Chipset: "ATI Radeon HD 5500 Series" (ChipID = 0x68da)

The key is to boot with only 1GB of memory: add " mem=1023m" to the kernel boot command line (note: all lower-case). The kernel was 2.6.38-0.rc6.git4.1.fc15.i686 (not the -PAE variant); the CPU is:
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 107
model name	: AMD Athlon(tm) 64 X2 Dual Core Processor 4800+
stepping	: 2
cpu MHz		: 2499.649
cache size	: 512 KB

It took about 12 minutes to crash.
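
Before re-running such a test it can help to confirm that the memory limit actually took effect. The following is a minimal Python sketch, not part of the original report, that extracts a plain `mem=<n>[k|m|g]` option from a kernel command line string and converts it to bytes. The function name `mem_limit_bytes` and the sample command line (everything except `mem=1023m`) are hypothetical, and the sketch assumes only the simple decimal form used in this comment, not every syntax the kernel accepts.

```python
import re

# Multipliers for the size suffixes handled by this sketch.
_SUFFIX = {"": 1, "k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def mem_limit_bytes(cmdline):
    """Return the byte value of a plain mem=<n>[kmg] option, or None if absent."""
    for token in cmdline.split():
        m = re.fullmatch(r"mem=(\d+)([kmgKMG]?)", token)
        if m:
            return int(m.group(1)) * _SUFFIX[m.group(2).lower()]
    return None


# Hypothetical command line; only "mem=1023m" comes from the report.
sample = "ro root=/dev/sda1 mem=1023m"
limit = mem_limit_bytes(sample)
print(limit, "bytes =", limit // (1 << 20), "MiB")
```

On a running system the same function could be applied to the contents of /proc/cmdline.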

Comment 7 John Reiser 2011-04-09 15:27:35 UTC
Created attachment 490973 [details]
console photo of BUG with traceback and info

Another card, another crash, this time on [RV280]:
[    52.794] (--) RADEON(0): Chipset: "ATI Radeon 9250 5960 (AGP)" (ChipID = 0x5960)

with F15 beta RC2 LiveCD:
Linux version 2.6.38.2-9.fc15.i686 (mockbuild.fedoraproject.org) (gcc version 4.6.0 20110329 (Red Hat 4.6.0-1) (GCC) ) #1 SMP Wed Mar 30 16:54:01 UTC 2011

and this Xserver:
X.Org X Server 1.10.0
Release Date: 2011-2-25
[    52.282] Build Date: 30 March 2011  08:08:44PM
[    52.282] Build ID: xorg-x11-server 1.10.0-7.fc15
[    52.282] Current version of pixman: 0.20.2

and this visible symptom:
BUG: Unable to handle kernel paging request at fffffffc

Comment 8 John Reiser 2011-04-10 17:52:32 UTC
Added [Redwood][RV280] tags adjacent to [RV200] in Summary, reflecting Comment #6 and Comment #7.  Such a wide range of cards suggests that the bug is in [ttm], independent of the card, particularly because one common feature is low kernel RAM (the mem=1023m of Comment #6).

Comment 9 Dave Airlie 2011-05-29 10:21:17 UTC
I've built a scratch kernel at 

it will end up at:
http://kojipkgs.fedoraproject.org/scratch/airlied/task_3098801/

it should be finished in a few hours, though it may disappear in a few days.

Comment 10 John Reiser 2011-05-29 19:44:10 UTC
Good news: kernel-2.6.38.7-30.bz680651.fc15.i686 survived 110 minutes [normal termination and no OOPS] running xscreensaver under XFCE on a 1GB machine with RV280 (PCI 1002:5960:174b:0130).  Also:
xorg-x11-drv-ati-6.14.1-1.20110504gita6d2dba6.fc15.i686
mesa-dri-drivers-7.11-0.11.20110525.0.fc15.i686
xorg-x11-server-Xorg-1.10.1-14.fc15.i686

The previous non-patched kernel-2.6.38.6-27.fc15.i686 was getting an OOPS [Unable to handle kernel paging request at fffffffc] after about half an hour.

Comment 11 Chuck Ebbert 2011-06-04 15:33:55 UTC
Fixed in 2.6.38.8-31

Comment 12 Chuck Ebbert 2011-06-25 06:50:56 UTC
*** Bug 716495 has been marked as a duplicate of this bug. ***