Bug 654634

Summary: Radeon DRM driver hangs
Product: [Fedora] Fedora Reporter: Zoltan Boszormenyi <zboszor>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 12CC: dougsland, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-12-03 12:00:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Zoltan Boszormenyi 2010-11-18 13:42:51 UTC
Description of problem:

I have a system where a PC-based POS software is running 15 hours a day and occasionally I can see two "INFO: task ... blocked for more than 120 seconds." logged for the same second. With kernel version 2.6.32.21-168.fc12.x86_64
and older, the two tasks were my POS software communicating via the serial line
and the other was "task events/0:9" which must the the radeon DRM driver because (wild guess) nothing else uses TTM inside the system. The log entries
below were from the above specified kernel version:

Oct 28 17:32:37 localhost kernel: INFO: task events/0:9 blocked for more than 120 seconds.
Oct 28 17:32:37 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 17:32:37 localhost kernel: events/0      D ffff880001c15740     0     9      2 0x00000000
Oct 28 17:32:37 localhost kernel: ffff88003aabfcb0 0000000000000046 ffff88003aabfc78 ffff88003aabfc74
Oct 28 17:32:37 localhost kernel: 0000000000000000 0000000000000645 ffff88003aabffd8 ffff88003aabffd8
Oct 28 17:32:37 localhost kernel: ffff88003aa96158 000000000000f980 0000000000015740 ffff88003aa96158
Oct 28 17:32:37 localhost kernel: Call Trace:
Oct 28 17:32:37 localhost kernel: [<ffffffffa005dec3>] ttm_bo_wait_unreserved+0x10c/0x128 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffff81074977>] ? autoremove_wake_function+0x0/0x39
Oct 28 17:32:37 localhost kernel: [<ffffffffa005df5b>] ttm_bo_reserve_locked+0x7c/0xce [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005e916>] ttm_bo_cleanup_refs+0xdb/0x229 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005daac>] ? ttm_bo_release_list+0x0/0xcc [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005eaf9>] ttm_bo_delayed_delete+0x95/0x100 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffffa005eb83>] ttm_bo_delayed_workqueue+0x1f/0x35 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffff8107034c>] worker_thread+0x1a9/0x237
Oct 28 17:32:37 localhost kernel: [<ffffffffa005eb64>] ? ttm_bo_delayed_workqueue+0x0/0x35 [ttm]
Oct 28 17:32:37 localhost kernel: [<ffffffff81074977>] ? autoremove_wake_function+0x0/0x39
Oct 28 17:32:37 localhost kernel: [<ffffffff810701a3>] ? worker_thread+0x0/0x237
Oct 28 17:32:37 localhost kernel: [<ffffffff8107468a>] kthread+0x7f/0x87
Oct 28 17:32:37 localhost kernel: [<ffffffff81012d6a>] child_rip+0xa/0x20
Oct 28 17:32:37 localhost kernel: [<ffffffff8107460b>] ? kthread+0x0/0x87
Oct 28 17:32:37 localhost kernel: [<ffffffff81012d60>] ? child_rip+0x0/0x20
Oct 28 17:32:37 localhost kernel: INFO: task penztargepd:1765 blocked for more than 120 seconds.
Oct 28 17:32:37 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 28 17:32:37 localhost kernel: penztargepd   D 0000000000000000     0  1765   1717 0x00000080
Oct 28 17:32:37 localhost kernel: ffff8800384e17e8 0000000000000082 0000000000000000 ffff880001c4fc08
Oct 28 17:32:37 localhost kernel: ffff880001c4fc30 0000000281047030 ffff8800384e1fd8 ffff8800384e1fd8
Oct 28 17:32:37 localhost kernel: ffff880038413298 000000000000f980 0000000000015740 ffff880038413298
Oct 28 17:32:37 localhost kernel: Call Trace:
Oct 28 17:32:37 localhost kernel: [<ffffffff814577df>] schedule_timeout+0x36/0xe3
Oct 28 17:32:37 localhost kernel: [<ffffffff8145765f>] wait_for_common+0xd6/0x131
Oct 28 17:32:37 localhost kernel: [<ffffffff81051e2e>] ? default_wake_function+0x0/0x14
Oct 28 17:32:37 localhost kernel: [<ffffffff8145775d>] wait_for_completion+0x1d/0x1f
Oct 28 17:32:37 localhost kernel: [<ffffffff81070b57>] flush_work+0x91/0xa5
Oct 28 17:32:37 localhost kernel: [<ffffffff810706a6>] ? wq_barrier_func+0x0/0x14
Oct 28 17:32:37 localhost kernel: [<ffffffff81070d07>] flush_delayed_work+0x55/0x5a
Oct 28 17:32:37 localhost kernel: [<ffffffff812b2d10>] tty_flush_to_ldisc+0x15/0x17
Oct 28 17:32:37 localhost kernel: [<ffffffff812ae9a8>] n_tty_poll+0x6a/0x153
Oct 28 17:32:37 localhost kernel: [<ffffffff812ab006>] tty_poll+0x63/0x7d
Oct 28 17:32:37 localhost kernel: [<ffffffff8112ce0d>] do_select+0x33a/0x549
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d01c>] ? __pollwait+0x0/0xcc
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d0e8>] ? pollwake+0x0/0x54
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d0e8>] ? pollwake+0x0/0x54
Oct 28 17:32:37 localhost kernel: [<ffffffff81047ea4>] ? pick_next_task_fair+0xdb/0xea
Oct 28 17:32:37 localhost kernel: [<ffffffff8104fc1c>] ? finish_task_switch+0x48/0xb8
Oct 28 17:32:37 localhost kernel: [<ffffffff814573ff>] ? thread_return+0x8c/0xdb
Oct 28 17:32:37 localhost kernel: [<ffffffff8104fc1c>] ? finish_task_switch+0x48/0xb8
Oct 28 17:32:37 localhost kernel: [<ffffffff8106537b>] ? lock_timer_base+0x2c/0x52
Oct 28 17:32:37 localhost kernel: [<ffffffff81065414>] ? try_to_del_timer_sync+0x73/0x81
Oct 28 17:32:37 localhost kernel: [<ffffffff8106543b>] ? del_timer_sync+0x19/0x25
Oct 28 17:32:37 localhost kernel: [<ffffffff8145785c>] ? schedule_timeout+0xb3/0xe3
Oct 28 17:32:37 localhost kernel: [<ffffffff810654c9>] ? process_timeout+0x0/0x10
Oct 28 17:32:37 localhost kernel: [<ffffffff812d48bb>] ? serial8250_tx_empty+0x43/0x54
Oct 28 17:32:37 localhost kernel: [<ffffffff8112d94f>] core_sys_select+0x174/0x20a
Oct 28 17:32:37 localhost kernel: [<ffffffff812ac737>] ? tty_ioctl+0xa5/0x7ce
Oct 28 17:32:37 localhost kernel: [<ffffffff8112afdc>] ? vfs_ioctl+0x22/0x87
Oct 28 17:32:37 localhost kernel: [<ffffffff81017db1>] ? read_tsc+0x9/0x1b
Oct 28 17:32:37 localhost kernel: [<ffffffff8107c96e>] ? ktime_get_ts+0xb2/0xc0
Oct 28 17:32:37 localhost kernel: [<ffffffff8112dc29>] sys_select+0x96/0xc1
Oct 28 17:32:37 localhost kernel: [<ffffffff81011d32>] system_call_fastpath+0x16/0x1b

With kernel version 2.6.32.23-170.fc12.x86_64, today another lockup happened but the offending tasks are now "events/0:9" and Xorg:

Nov 18 13:43:36 localhost kernel: INFO: task events/0:9 blocked for more than 120 seconds.
Nov 18 13:43:36 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 18 13:43:36 localhost kernel: events/0      D 00000000ffffffff     0     9      2 0x00000000
Nov 18 13:43:36 localhost kernel: ffff88003aabfcb0 0000000000000046 ffff88003aabfc78 ffff88003aabfc74
Nov 18 13:43:36 localhost kernel: 0000000000000000 0000000000000400 ffff88003aabffd8 ffff88003aabffd8
Nov 18 13:43:36 localhost kernel: ffff88003aad03d0 000000000000f980 0000000000015740 ffff88003aad03d0
Nov 18 13:43:36 localhost kernel: Call Trace:
Nov 18 13:43:36 localhost kernel: [<ffffffffa005eec3>] ttm_bo_wait_unreserved+0x10c/0x128 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffff810748cf>] ? autoremove_wake_function+0x0/0x39
Nov 18 13:43:36 localhost kernel: [<ffffffffa005ef5b>] ttm_bo_reserve_locked+0x7c/0xce [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffffa005f916>] ttm_bo_cleanup_refs+0xdb/0x229 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffffa005faf9>] ttm_bo_delayed_delete+0x95/0x100 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffffa005fb83>] ttm_bo_delayed_workqueue+0x1f/0x35 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffff810702a4>] worker_thread+0x1a9/0x237
Nov 18 13:43:36 localhost kernel: [<ffffffffa005fb64>] ? ttm_bo_delayed_workqueue+0x0/0x35 [ttm]
Nov 18 13:43:36 localhost kernel: [<ffffffff810748cf>] ? autoremove_wake_function+0x0/0x39
Nov 18 13:43:36 localhost kernel: [<ffffffff810700fb>] ? worker_thread+0x0/0x237
Nov 18 13:43:36 localhost kernel: [<ffffffff810745e2>] kthread+0x7f/0x87
Nov 18 13:43:36 localhost kernel: [<ffffffff81012d6a>] child_rip+0xa/0x20
Nov 18 13:43:36 localhost kernel: [<ffffffff81074563>] ? kthread+0x0/0x87
Nov 18 13:43:36 localhost kernel: [<ffffffff81012d60>] ? child_rip+0x0/0x20
Nov 18 13:43:36 localhost kernel: INFO: task Xorg:1791 blocked for more than 120 seconds.
Nov 18 13:43:36 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 18 13:43:36 localhost kernel: Xorg          D 0000000000000000     0  1791   1790 0x00400084
Nov 18 13:43:36 localhost kernel: ffff880037cd57e8 0000000000000082 0000000000000000 0000000200000246
Nov 18 13:43:36 localhost kernel: ffff880037cd5900 0000000000000008 ffff880037cd5fd8 ffff880037cd5fd8
Nov 18 13:43:36 localhost kernel: ffff880028741b20 000000000000f980 0000000000015740 ffff880028741b20
Nov 18 13:43:36 localhost kernel: Call Trace:
Nov 18 13:43:36 localhost kernel: [<ffffffff81457bcf>] schedule_timeout+0x36/0xe3
Nov 18 13:43:36 localhost kernel: [<ffffffff81457a4f>] wait_for_common+0xd6/0x131
Nov 18 13:43:36 localhost kernel: [<ffffffff81051e0a>] ? default_wake_function+0x0/0x14
Nov 18 13:43:36 localhost kernel: [<ffffffff81457b4d>] wait_for_completion+0x1d/0x1f
Nov 18 13:43:36 localhost kernel: [<ffffffff81070aaf>] flush_work+0x91/0xa5
Nov 18 13:43:36 localhost kernel: [<ffffffff810705fe>] ? wq_barrier_func+0x0/0x14
Nov 18 13:43:36 localhost kernel: [<ffffffff81070c5f>] flush_delayed_work+0x55/0x5a
Nov 18 13:43:36 localhost kernel: [<ffffffff812b2ee8>] tty_flush_to_ldisc+0x15/0x17
Nov 18 13:43:36 localhost kernel: [<ffffffff812aeb80>] n_tty_poll+0x6a/0x153
Nov 18 13:43:36 localhost kernel: [<ffffffff812ab1de>] tty_poll+0x63/0x7d
Nov 18 13:43:36 localhost kernel: [<ffffffff8112cf41>] do_select+0x33a/0x549
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d150>] ? __pollwait+0x0/0xcc
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112d21c>] ? pollwake+0x0/0x54
Nov 18 13:43:36 localhost kernel: [<ffffffff8112da83>] core_sys_select+0x174/0x20a
Nov 18 13:43:36 localhost kernel: [<ffffffffa00a2223>] ? radeon_gem_busy_ioctl+0x0/0xa1 [radeon]
Nov 18 13:43:36 localhost kernel: [<ffffffff810e6e59>] ? vma_prio_tree_remove+0x47/0xdf
Nov 18 13:43:36 localhost kernel: [<ffffffff8112b110>] ? vfs_ioctl+0x22/0x87
Nov 18 13:43:36 localhost kernel: [<ffffffff81017dce>] ? read_tsc+0x9/0x1b
Nov 18 13:43:36 localhost kernel: [<ffffffff8107c8c2>] ? ktime_get_ts+0xb2/0xc0
Nov 18 13:43:36 localhost kernel: [<ffffffff8112dd5d>] sys_select+0x96/0xc1
Nov 18 13:43:36 localhost kernel: [<ffffffff81011d32>] system_call_fastpath+0x16/0x1b

You can see that the PCIe serial card is on the same interrupt as the built-in Radeon videocard:

# cat /proc/interrupts 
           CPU0       CPU1       
  0:        129          9   IO-APIC-edge      timer
  1:        947        346   IO-APIC-edge      i8042
  4:          0          4   IO-APIC-edge    
  7:          0          0   IO-APIC-edge      parport0
  8:          0          1   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 14:       1053      32482   IO-APIC-edge      pata_atiixp
 15:          0          0   IO-APIC-edge      pata_atiixp
 16:      34942      62890   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, HDA Intel
 17:          0          3   IO-APIC-fasteoi   ehci_hcd:usb1
 18:      18164      40561   IO-APIC-fasteoi   ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7, radeon, serial
 19:          2         95   IO-APIC-fasteoi   ehci_hcd:usb2, HDA Intel
 27:      10132      65893   PCI-MSI-edge      ahci
 28:      80392         73   PCI-MSI-edge      eth0
NMI:          0          0   Non-maskable interrupts
LOC:     464967     439902   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
PND:          0          0   Performance pending work
RES:     292970     235036   Rescheduling interrupts
CAL:       7725      22978   Function call interrupts
TLB:       5185       8082   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          9          9   Machine check polls
ERR:          0
MIS:          0

# lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS780 Host Bridge
00:01.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (int gfx)
00:02.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (ext gfx port 0)
00:05.0 PCI bridge: Advanced Micro Devices [AMD] RS780 PCI to PCI bridge (PCIE port 1)
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA Controller [AHCI mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc 760G [Radeon 3000]
01:05.1 Audio device: ATI Technologies Inc RS780 Azalia controller
02:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI Express-to-PCI Bridge (rev aa)
03:00.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)

Mainboard is MSI 760GM-E51.

Plain metacity without compositing is used, no Xv or OpenGL apps are used.

Version-Release number of selected component (if applicable):

Fully updated Fedora 12.

How reproducible:

Very sporadic. The system usually works for almost two weeks between lockups.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Bug Zapper 2010-12-03 12:00:44 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.