Bug 892233

Summary: radeon modeset crashes on A4-3400 HD6410D with kernel 3.7.1
Product: [Fedora] Fedora
Reporter: Mikko Tiihonen <mikko.tiihonen>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 18
CC: gansalmon, itamar, jforbes, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2013-03-13 17:24:43 UTC
Type: Bug
Attachments:
  kernel divide error oops (flags: none)
  Patch that works around the division by zero (flags: none)

Description Mikko Tiihonen 2013-01-06 02:34:18 UTC
Created attachment 673228 [details]
kernel divide error oops

Description of problem:
Enabling radeon modeset on my HD6410D causes a kernel crash.
This is not a regression: modeset has not worked with previous kernel versions either.

Version-Release number of selected component (if applicable):
kernel-3.7.1-2.fc18.x86_64

How reproducible:
always

Steps to Reproduce:
1. boot with nomodeset (seems to use vesafb)
2. rmmod radeon
3. modprobe radeon modeset=1
  
Actual results:
The kernel crashes with a divide error under evergreen_startup (RIP is in r6xx_remap_render_backend) and the machine pretty much locks up.
Call Trace:
 [<ffffffffa047ed02>] evergreen_startup+0x632/0x1660 [radeon]
 [<ffffffffa047feb3>] evergreen_init+0x183/0x2a0 [radeon]
 [<ffffffffa041fbf4>] radeon_device_init+0x554/0x640 [radeon]
 [<ffffffffa042164d>] radeon_driver_load_kms+0x9d/0x1a0 [radeon]
 [<ffffffffa0027d16>] drm_get_pci_dev+0x186/0x2d0 [drm]
 [<ffffffff8117fae9>] ? kfree+0x49/0x170
 [<ffffffffa0496b99>] radeon_pci_probe+0xb1/0xb9 [radeon]
 [<ffffffff81314cb9>] local_pci_probe+0x79/0x100
 [<ffffffff81314e61>] pci_device_probe+0x121/0x130
 [<ffffffff813d2feb>] driver_probe_device+0x8b/0x390
 [<ffffffff813d339b>] __driver_attach+0xab/0xb0
 [<ffffffff813d32f0>] ? driver_probe_device+0x390/0x390
 [<ffffffff813d1075>] bus_for_each_dev+0x55/0x90
 [<ffffffff813d295e>] driver_attach+0x1e/0x20
 [<ffffffff813d2590>] bus_add_driver+0x1a0/0x290
 [<ffffffffa04e6000>] ? 0xffffffffa04e5fff
 [<ffffffffa04e6000>] ? 0xffffffffa04e5fff
 [<ffffffff813d3a67>] driver_register+0x77/0x170
 [<ffffffffa04e6000>] ? 0xffffffffa04e5fff
 [<ffffffff81313be8>] __pci_register_driver+0x48/0x50
 [<ffffffffa0027f7a>] drm_pci_init+0x11a/0x130 [drm]
 [<ffffffffa04e6000>] ? 0xffffffffa04e5fff
 [<ffffffffa04e6000>] ? 0xffffffffa04e5fff
 [<ffffffffa04e60ec>] radeon_init+0xec/0x1000 [radeon]
 [<ffffffff8100216a>] do_one_initcall+0x12a/0x180
 [<ffffffff810c2ac0>] sys_init_module+0xc0/0x220
 [<ffffffff8163d9d9>] system_call_fastpath+0x16/0x1b
Code: 09 c5 45 89 e9 66 0f 1f 44 00 00 44 89 cb 41 d1 e9 83 e3 01 41 01 db 83 ee 01 75 ef 89 d1 44 29 d9 41 39 cf 72 70 31 d2 44 89 f8 <f7> f1 0f af c8 41 89 c0 44 89 f8 29 c8 83 bf c0 00 00 00 27 19 
RIP  [<ffffffffa0464e98>] r6xx_remap_render_backend+0x78/0xf0 [radeon]
 RSP <ffff88020c755b20>

Expected results:
radeon driver initializes itself correctly

Additional info:
lspci: 0300: 1002:9644
full kernel oops attached

Comment 1 Mikko Tiihonen 2013-01-07 20:43:27 UTC
Just tried with 3.8.0-0.rc2.git1.1.fc19.x86_64 - still fails at the same place.

Comment 2 Mikko Tiihonen 2013-01-07 21:34:52 UTC
The attachment seems to point to r600.c:r6xx_remap_render_backend

The function contains only one divide:

u32 r6xx_remap_render_backend(struct radeon_device *rdev,
                              u32 tiling_pipe_num,
                              u32 max_rb_num,
                              u32 total_max_rb_num,
                              u32 disabled_rb_mask)
{
        u32 rendering_pipe_num, rb_num_width, req_rb_num;
...
        /* mask out the RBs that don't exist on that asic */
        disabled_rb_mask |= (0xff << max_rb_num) & 0xff;

        rendering_pipe_num = 1 << tiling_pipe_num;
        req_rb_num = total_max_rb_num - r600_count_pipe_bits(disabled_rb_mask);
        BUG_ON(rendering_pipe_num < req_rb_num);

        pipe_rb_ratio = rendering_pipe_num / req_rb_num;

I added a printk to see what actual parameters are passed in:

r6xx_remap_render_backend: tiling_pipe_num=2, max_rb_num=1, total_max_rb_num=8, disabled_rb_mask=253

With those values, the divide by zero arises as follows:
disabled_rb_mask |= (0xff << 1) & 0xff;  -> 253 | 254 = 255 (all 8 bits set)
req_rb_num = 8 - 8;  -> 0
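
The arithmetic above can be checked outside the kernel. The following is a minimal user-space sketch, not driver code: count_bits32() is a hypothetical stand-in for r600_count_pipe_bits(), and compute_req_rb_num() is a made-up name that only mirrors the two lines of the excerpt that produce req_rb_num.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for r600_count_pipe_bits(): counts set bits. */
static unsigned count_bits32(uint32_t v)
{
    unsigned n = 0;
    for (unsigned i = 0; i < 32; i++)
        n += (v >> i) & 1u;
    return n;
}

/* Mirrors the mask and subtraction steps quoted from the driver excerpt. */
static uint32_t compute_req_rb_num(uint32_t max_rb_num,
                                   uint32_t total_max_rb_num,
                                   uint32_t disabled_rb_mask)
{
    assert(max_rb_num < 32); /* keep the shift well defined */

    /* mask out the RBs that don't exist on that asic */
    disabled_rb_mask |= (0xffu << max_rb_num) & 0xffu;

    return total_max_rb_num - count_bits32(disabled_rb_mask);
}
```

Calling compute_req_rb_num(1, 8, 253) with the values from the printk yields 0, which is the divisor used for pipe_rb_ratio. The incoming mask 253 (0xfd) leaves only RB 1 enabled, while the added mask 254 (0xfe) keeps only RB 0, so together they disable every RB.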

Comment 3 Mikko Tiihonen 2013-01-11 18:35:43 UTC
Created attachment 677052 [details]
Patch that works around the division by zero

The attached patch works and allows me to boot with kernel modeset enabled without errors. The patch is low-risk, since it only changes behavior in the cases that would otherwise have resulted in a division by zero.

I think the proper fix would be to not modify the given disabled_rb_mask unless it has more than max_rb_num zero bits. My guess is that the mask modification has been added as a workaround for some other cases, but it seems that it can disable RBs that should be active - such as in my case the only available RB.
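
As a rough illustration of the idea in the previous paragraph, here is a user-space sketch under stated assumptions. It is not the attached patch and not necessarily what was merged upstream; adjust_disabled_rb_mask() and count_bits32() are hypothetical names, and the 8-bit mask width is taken from the 0xff constant in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: counts set bits. */
static unsigned count_bits32(uint32_t v)
{
    unsigned n = 0;
    for (unsigned i = 0; i < 32; i++)
        n += (v >> i) & 1u;
    return n;
}

/*
 * Hypothetical variant of the mask step: only force-disable the upper RBs
 * when the incoming mask leaves more RBs enabled (zero bits) than the
 * asic can have. Otherwise trust the mask as given.
 */
static uint32_t adjust_disabled_rb_mask(uint32_t max_rb_num,
                                        uint32_t disabled_rb_mask)
{
    assert(max_rb_num < 32); /* keep the shift well defined */

    uint32_t enabled = count_bits32(~disabled_rb_mask & 0xffu);

    if (enabled > max_rb_num)
        disabled_rb_mask |= (0xffu << max_rb_num) & 0xffu;

    return disabled_rb_mask;
}
```

With the values from comment 2, adjust_disabled_rb_mask(1, 253) leaves the mask at 253, so req_rb_num would be 8 - 7 = 1 and the division is well defined. A mask of 0 with max_rb_num=1 is still narrowed to 254, which keeps the original behavior for that case.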

Comment 4 Mikko Tiihonen 2013-02-05 21:16:47 UTC
The fix is now in the stable queue: http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=commitdiff;h=ea8a8e923e4108853590c7d5b6ea6765b4585839

So this bug can be closed when the 3.7.7 kernel is available in Fedora.

Comment 5 Josh Boyer 2013-02-05 21:23:10 UTC
Earlier you tested 3.8.0-0.rc2.git1.1.  This commit you mention should be in 3.8.0-0.rc6.git2.1.  Can you test that to see if it solves the issue for you?

If so, we can bring those patches in before 3.7.7 is released.

Comment 6 Mikko Tiihonen 2013-02-06 16:48:20 UTC
Just tested kernel-3.8.0-0.rc6.git3.1.fc19.x86_64; it works very nicely.

Comment 7 Josh Boyer 2013-02-11 13:11:10 UTC
3.7.7 should be released early this week.  We'll pick those patches up from there.  We wouldn't have been able to get an update ready and pushed before then anyway.