Bug 654634 - Radeon DRM driver hangs
Summary: Radeon DRM driver hangs
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 12
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2010-11-18 13:42 UTC by Zoltan Boszormenyi
Modified: 2010-12-03 12:00 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-12-03 12:00:44 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Zoltan Boszormenyi 2010-11-18 13:42:51 UTC
Description of problem:

I have a system where a PC-based POS software is running 15 hours a day and occasionally I can see two "INFO: task ... blocked for more than 120 seconds." logged for the same second. With kernel version 2.6.32.21-168.fc12.x86_64
and older, the two tasks were my POS software communicating via the serial line
and the other was "task events/0:9" which must the the radeon DRM driver because (wild guess) nothing else uses TTM inside the system. The log entries
below were from the above specified kernel version:

Oct 28 17:32:37 localhost kernel: INFO: task events/0:9 blocked for more than 120 seconds.
Oct 28 17:32:37 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 17:32:37 localhost kernel: events/0      D ffff880001c15740     0     9      2 0x00000000
Oct 28 17:32:37 localhost kernel: ffff88003aabfcb0 0000000000000046 ffff88003aabfc78 ffff88003aabfc74
Oct 28 17:32:37 localhost kernel: 0000000000000000 0000000000000645 ffff88003aabffd8 ffff88003aabffd8
Oct 28 17:32:37 localhost kernel: ffff88003aa96158 000000000000f980 0000000000015740 ffff88003aa96158
Oct 28 17:32:37 localhost kernel: Call Trace:
Oct 28 17:32:37 localhost kernel: [<ffffffffa005dec3>] ttm_bo_wait_unreserved+0x10c/0x128 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffff81074977>] ? autoremove_wake_function+0x0/0x39
Oct 28 17:32:37 localhost kernel: [<ffffffffa005df5b>] ttm_bo_reserve_locked+0x7c/0xce [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005e916>] ttm_bo_cleanup_refs+0xdb/0x229 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005daac>] ? ttm_bo_release_list+0x0/0xcc [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005eaf9>] ttm_bo_delayed_delete+0x95/0x100 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005eb83>] ttm_bo_delayed_workqueue+0x1f/0x35 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffff8107034c>] worker_thread+0x1a9/0x237
Oct 28 17:32:37 localhost kernel: [<ffffffffa005eb64>] ? ttm_bo_delayed_workqueue+0x0/0x35 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffff81074977>] ? autoremove_wake_function+0x0/0x39
Oct 28 17:32:37 localhost kernel: [<ffffffff810701a3>] ? worker_thread+0x0/0x237
Oct 28 17:32:37 localhost kernel: [<ffffffff8107468a>] kthread+0x7f/0x87
Oct 28 17:32:37 localhost kernel: [<ffffffff81012d6a>] child_rip+0xa/0x20
Oct 28 17:32:37 localhost kernel: [<ffffffff8107460b>] ? kthread+0x0/0x87
Oct 28 17:32:37 localhost kernel: [<ffffffff81012d60>] ? child_rip+0x0/0x20
Oct 28 17:32:37 localhost kernel: INFO: task penztargepd:1765 blocked for more than 120 seconds.
Oct 28 17:32:37 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 17:32:37 localhost kernel: penztargepd   D 0000000000000000     0  1765   1717 0x00000080
Oct 28 17:32:37 localhost kernel: ffff8800384e17e8 0000000000000082 0000000000000000 ffff880001c4fc08
Oct 28 17:32:37 localhost kernel: ffff880001c4fc30 0000000281047030 ffff8800384e1fd8 ffff8800384e1fd8
Oct 28 17:32:37 localhost kernel: ffff880038413298 000000000000f980 0000000000015740 ffff880038413298
Oct 28 17:32:37 localhost kernel: Call Trace:
Oct 28 17:32:37 localhost kernel: [<ffffffff814577df>] schedule_timeout+0x36/0xe3
Oct 28 17:32:37 localhost kernel: [<ffffffff8145765f>] wait_for_common+0xd6/0x131
Oct 28 17:32:37 localhost kernel: [<ffffffff81051e2e>] ? default_wake_function+0x0/0x14
Oct 28 17:32:37 localhost kernel: [<ffffffff8145775d>] wait_for_completion+0x1d/0x1f
Oct 28 17:32:37 localhost kernel: [<ffffffff81070b57>] flush_work+0x91/0xa5
Oct 28 17:32:37 localhost kernel: [<ffffffff810706a6>] ? wq_barrier_func+0x0/0x14
Oct 28 17:32:37 localhost kernel: [<ffffffff81070d07>] flush_delayed_work+0x55/0x5a
Oct 28 17:32:37 localhost kernel: [<ffffffff812b2d10>] tty_flush_to_ldisc+0x15/0x17
Oct 28 17:32:37 localhost kernel: [<ffffffff812ae9a8>] n_tty_poll+0x6a/0x153
Oct 28 17:32:37 localhost kernel: [<ffffffff812ab006>] tty_poll+0x63/0x7d
Oct 28 17:32:37 localhost kernel: [<ffffffff8112ce0d>] do_select+0x33a/0x549
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d01c>] ? __pollwait+0x0/0xcc
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d0e8>] ? pollwake+0x0/0x54
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d0e8>] ? pollwake+0x0/0x54
Oct 28 17:32:37 localhost kernel: [<ffffffff81047ea4>] ? pick_next_task_fair+0xdb/0xea
Oct 28 17:32:37 localhost kernel: [<ffffffff8104fc1c>] ? finish_task_switch+0x48/0xb8
Oct 28 17:32:37 localhost kernel: [<ffffffff814573ff>] ? thread_return+0x8c/0xdb
Oct 28 17:32:37 localhost kernel: [<ffffffff8104fc1c>] ? finish_task_switch+0x48/0xb8
Oct 28 17:32:37 localhost kernel: [<ffffffff8106537b>] ? lock_timer_base+0x2c/0x52
Oct 28 17:32:37 localhost kernel: [<ffffffff81065414>] ? try_to_del_timer_sync+0x73/0x81
Oct 28 17:32:37 localhost kernel: [<ffffffff8106543b>] ? del_timer_sync+0x19/0x25
Oct 28 17:32:37 localhost kernel: [<ffffffff8145785c>] ? schedule_timeout+0xb3/0xe3
Oct 28 17:32:37 localhost kernel: [<ffffffff810654c9>] ? process_timeout+0x0/0x10
Oct 28 17:32:37 localhost kernel: [<ffffffff812d48bb>] ? serial8250_tx_empty+0x43/0x54
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d94f>] core_sys_select+0x174/0x20a
Oct 28 17:32:37 localhost kernel: [<ffffffff812ac737>] ? tty_ioctl+0xa5/0x7ce
Oct 28 17:32:37 localhost kernel: [<ffffffff8112afdc>] ? vfs_ioctl+0x22/0x87
Oct 28 17:32:37 localhost kernel: [<ffffffff81017db1>] ? read_tsc+0x9/0x1b
Oct 28 17:32:37 localhost kernel: [<ffffffff8107c96e>] ? ktime_get_ts+0xb2/0xc0
Oct 28 17:32:37 localhost kernel: [<ffffffff8112dc29>] sys_select+0x96/0xc1
Oct 28 17:32:37 localhost kernel: [<ffffffff81011d32>] system_call_fastpath+0x16/0x1b

With kernel version 2.6.32.23-170.fc12.x86_64, today another lockup happened but the offending tasks are now "events/0:9" and Xorg:

Nov 18 13:43:36 localhost kernel: INFO: task events/0:9 blocked for more than 120 seconds.
Nov 18 13:43:36 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 18 13:43:36 localhost kernel: events/0      D 00000000ffffffff     0     9      2 0x00000000
Nov 18 13:43:36 localhost kernel: ffff88003aabfcb0 0000000000000046 ffff88003aabfc78 ffff88003aabfc74
Nov 18 13:43:36 localhost kernel: 0000000000000000 0000000000000400 ffff88003aabffd8 ffff88003aabffd8
Nov 18 13:43:36 localhost kernel: ffff88003aad03d0 000000000000f980 0000000000015740 ffff88003aad03d0
Nov 18 13:43:36 localhost kernel: Call Trace:
Nov 18 13:43:36 localhost kernel: [<ffffffffa005eec3>] ttm_bo_wait_unreserved+0x10c/0x128 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffff810748cf>] ? autoremove_wake_function+0x0/0x39
Nov 18 13:43:36 localhost kernel: [<ffffffffa005ef5b>] ttm_bo_reserve_locked+0x7c/0xce [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffffa005f916>] ttm_bo_cleanup_refs+0xdb/0x229 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffffa005faf9>] ttm_bo_delayed_delete+0x95/0x100 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffffa005fb83>] ttm_bo_delayed_workqueue+0x1f/0x35 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffff810702a4>] worker_thread+0x1a9/0x237
Nov 18 13:43:36 localhost kernel: [<ffffffffa005fb64>] ? ttm_bo_delayed_workqueue+0x0/0x35 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffff810748cf>] ? autoremove_wake_function+0x0/0x39
Nov 18 13:43:36 localhost kernel: [<ffffffff810700fb>] ? worker_thread+0x0/0x237
Nov 18 13:43:36 localhost kernel: [<ffffffff810745e2>] kthread+0x7f/0x87
Nov 18 13:43:36 localhost kernel: [<ffffffff81012d6a>] child_rip+0xa/0x20
Nov 18 13:43:36 localhost kernel: [<ffffffff81074563>] ? kthread+0x0/0x87
Nov 18 13:43:36 localhost kernel: [<ffffffff81012d60>] ? child_rip+0x0/0x20
Nov 18 13:43:36 localhost kernel: INFO: task Xorg:1791 blocked for more than 120 seconds.
Nov 18 13:43:36 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 18 13:43:36 localhost kernel: Xorg          D 0000000000000000     0  1791   1790 0x00400084
Nov 18 13:43:36 localhost kernel: ffff880037cd57e8 0000000000000082 0000000000000000 0000000200000246
Nov 18 13:43:36 localhost kernel: ffff880037cd5900 0000000000000008 ffff880037cd5fd8 ffff880037cd5fd8
Nov 18 13:43:36 localhost kernel: ffff880028741b20 000000000000f980 0000000000015740 ffff880028741b20
Nov 18 13:43:36 localhost kernel: Call Trace:
Nov 18 13:43:36 localhost kernel: [<ffffffff81457bcf>] schedule_timeout+0x36/0xe3
Nov 18 13:43:36 localhost kernel: [<ffffffff81457a4f>] wait_for_common+0xd6/0x131
Nov 18 13:43:36 localhost kernel: [<ffffffff81051e0a>] ? default_wake_function+0x0/0x14
Nov 18 13:43:36 localhost kernel: [<ffffffff81457b4d>] wait_for_completion+0x1d/0x1f
Nov 18 13:43:36 localhost kernel: [<ffffffff81070aaf>] flush_work+0x91/0xa5
Nov 18 13:43:36 localhost kernel: [<ffffffff810705fe>] ? wq_barrier_func+0x0/0x14
Nov 18 13:43:36 localhost kernel: [<ffffffff81070c5f>] flush_delayed_work+0x55/0x5a
Nov 18 13:43:36 localhost kernel: [<ffffffff812b2ee8>] tty_flush_to_ldisc+0x15/0x17
Nov 18 13:43:36 localhost kernel: [<ffffffff812aeb80>] n_tty_poll+0x6a/0x153
Nov 18 13:43:36 localhost kernel: [<ffffffff812ab1de>] tty_poll+0x63/0x7d
Nov 18 13:43:36 localhost kernel: [<ffffffff8112cf41>] do_select+0x33a/0x549
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d150>] ? __pollwait+0x0/0xcc
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112da83>] core_sys_select+0x174/0x20a
Nov 18 13:43:36 localhost kernel: [<ffffffffa00a2223>] ? radeon_gem_busy_ioctl+0x0/0xa1 [radeon]
Nov 18 13:43:36 localhost kernel: [<ffffffff810e6e59>] ? vma_prio_tree_remove+0x47/0xdf
Nov 18 13:43:36 localhost kernel: [<ffffffff8112b110>] ? vfs_ioctl+0x22/0x87
Nov 18 13:43:36 localhost kernel: [<ffffffff81017dce>] ? read_tsc+0x9/0x1b
Nov 18 13:43:36 localhost kernel: [<ffffffff8107c8c2>] ? ktime_get_ts+0xb2/0xc0
Nov 18 13:43:36 localhost kernel: [<ffffffff8112dd5d>] sys_select+0x96/0xc1
Nov 18 13:43:36 localhost kernel: [<ffffffff81011d32>] system_call_fastpath+0x16/0x1b

You can see that the PCIe serial card is on the same interrupt as the built-in Radeon videocard:

# cat /proc/interrupts 
           CPU0       CPU1       
  0:        129          9   IO-APIC-edge      timer
  1:        947        346   IO-APIC-edge      i8042
  4:          0          4   IO-APIC-edge    
  7:          0          0   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 14:       1053      32482   IO-APIC-edge      pata_atiixp
 15:          0          0   IO-APIC-edge      pata_atiixp
 16:      34942      62890   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
 17:          0          3   IO-APIC-fasteoi   ehci_hcd:usb1
 18:      18164      40561   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, radeon, serial
 19:          2         95   IO-APIC-fasteoi   ehci_hcd:usb2, HDA Intel
 27:      10132      65893   PCI-MSI-edge      ahci
 28:      80392         73   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:     464967     439902   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:     292970     235036   Rescheduling interrupts
CAL:       7725      22978   Function call interrupts
TLB:       5185       8082   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          9          9   Machine check polls
ERR:          0
MIS:          0

# lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:05.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 1)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc 760G [Radeon 3000]
01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
02:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
03:00.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)

Mainboard is MSI 760GM-E51.

Plain metacity without compositing is used, no Xv or OpenGL apps are used.

Version-Release number of selected component (if applicable):

Fully updated Fedora 12.

How reproducible:

Very sporadic. The system usually works for almost two weeks between lockups.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Bug Zapper 2010-12-03 12:00:44 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.