Bug 1045766 - WARNING, at drivers/gpu/drm/radeon/radeon_gart.c:235 radeon_gart_unbind+0xca/0xe0 [radeon]() trying to unbind memory from uninitialized GART !
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 20
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-21 21:10 UTC by Máximo Castañeda
Modified: 2014-02-08 13:18 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-08 13:18:02 UTC
Type: Bug
Embargoed:


Attachments
dmesg (109.81 KB, text/plain)
2013-12-21 21:10 UTC, Máximo Castañeda
test without Xorg (102.66 KB, text/x-vhdl)
2013-12-25 12:56 UTC, Máximo Castañeda
WARNINGs with kernel 3.12.6-300 (61.26 KB, text/plain)
2014-01-04 21:02 UTC, Máximo Castañeda
Later X soft lockup (43.40 KB, text/plain)
2014-01-04 21:04 UTC, Máximo Castañeda
Full journal having told DDIS and then OFF, includes a few minutes of the aftermath (881.20 KB, text/x-vhdl)
2014-01-11 18:14 UTC, Máximo Castañeda
Full journal with emergency.target, IGD -> DIS ok, and issue when DIS -> IGD (100.53 KB, text/x-vhdl)
2014-01-11 18:17 UTC, Máximo Castañeda

Description Máximo Castañeda 2013-12-21 21:10:43 UTC
Created attachment 840153
dmesg

Description of problem:
Several such warnings appear after issuing:
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
with kernel-3.12.5-302.fc20.x86_64.  This did not happen with kernel-3.11.10-301.fc20.x86_64.

Version-Release number of selected component (if applicable):
kernel-3.12.5-302.fc20.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot into kernel-3.12.5-302.fc20.x86_64
2. echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

Actual results:
Several WARNINGs; see the attached dmesg from timestamp 71.110343 onward.

Expected results:
The discrete graphics card is disabled without any warnings.
With 3.11.10-301.fc20.x86_64 all that goes to the log is:
[  138.569631] ALSA sound/pci/hda/hda_intel.c:3079 0000:01:00.1: Disabling via VGA-switcheroo
[  138.681114] ALSA sound/pci/hda/hda_intel.c:3085 0000:01:00.1: Cannot lock devices!
[  138.681120] radeon: switched off

Additional info:
lspci -vnn | grep VGA:
00:02.0 VGA compatible controller [0300]: Intel Corporation Core Processor Integrated Graphics Controller [8086:0046] (rev 02) (prog-if 00 [VGA controller])
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Park [Mobility Radeon HD 5430/5450/5470] [1002:68e0] (rev ff) (prog-if ff)

Comment 1 Máximo Castañeda 2013-12-25 12:55:03 UTC
I don't know how Bugzilla components and assignments work, so I won't change the component myself, but X is not needed for this to happen.  I'm attaching the journal from a session with no X running, booted with systemd.unit=emergency.target.  The steps were:

1.- Boot with systemd.unit=emergency.target
2.- Mount debugfs
3.- Check that Xorg is not running
4.- echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
5.- Watch for WARNING

This time the first WARNING is at
drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]()
and there's a second one at
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c:533 ttm_dma_free_pool+0xea/0xf0 [ttm]()
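
Step 4 of the list above can be wrapped in a small helper. This is only a sketch: the names `send_switch_command` and `KNOWN_COMMANDS` are hypothetical, and the set lists only the commands that appear in this report (OFF, IGD, DIS, DDIS). It checks that the switch file exists before writing to it. Writing to the real file requires root and a mounted debugfs.

```python
from pathlib import Path

# Path used throughout this report; debugfs must be mounted for it to exist.
SWITCH_PATH = Path("/sys/kernel/debug/vgaswitcheroo/switch")

# Assumption: only the commands mentioned in this report are accepted here.
KNOWN_COMMANDS = {"OFF", "IGD", "DIS", "DDIS"}


def send_switch_command(command, switch_path=SWITCH_PATH):
    """Write one vgaswitcheroo command, like `echo CMD > switch` would."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unknown vgaswitcheroo command: {command!r}")
    path = Path(switch_path)
    if not path.exists():
        # The switch file is missing (debugfs not mounted, or it was removed).
        raise FileNotFoundError(f"{path} does not exist")
    # echo appends a newline, so do the same.
    path.write_text(command + "\n")
```

Calling `send_switch_command("OFF")` as root corresponds to step 4; passing a different `switch_path` allows a dry run against an ordinary file.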

Comment 2 Máximo Castañeda 2013-12-25 12:56:13 UTC
Created attachment 841480
test without Xorg

Comment 3 Zhiyuan Ma 2014-01-03 03:32:15 UTC
I second this issue, but I don't see that much detail.

The direct impact of it is I cannot access vgaswitcheroo by

echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

because there is no such file.

Comment 4 Máximo Castañeda 2014-01-04 20:57:44 UTC
(In reply to Zhiyuan Ma from comment #3)
> I second this issue, but I don't see that much detail.
> 
> The direct impact of it is I cannot access vgaswitcheroo by
> 
> echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
> 
> because there is no such file.

The file disappears after the WARNINGs, that is, because of the bug.  Since it is no longer accessible, I cannot re-enable the discrete card, which leads to problems when shutting down.  Not disabling the discrete card means a difference of more than 15 °C in core temperature.  The kernel's tainted flag is also raised, I guess with good reason, and some scary kernel messages appear, such as:
Userspace still has active objects !
trying to unbind memory from uninitialized GART !
Memory manager not clean during takedown.
disabling already-disabled device

So this means no new Fedora-issued binary kernel for me until this is fixed.

Comment 5 Máximo Castañeda 2014-01-04 21:01:25 UTC
This also happens with kernel-3.12.6-300.fc20.x86_64, this time at:
drivers/gpu/drm/radeon/radeon_gart.c:235 radeon_gart_unbind+0xca/0xe0 [radeon]()
drivers/gpu/drm/radeon/radeon_gart.c:235 radeon_gart_unbind+0xca/0xe0 [radeon]()
drivers/gpu/drm/radeon/radeon_gart.c:235 radeon_gart_unbind+0xca/0xe0 [radeon]()
drivers/gpu/drm/radeon/radeon_gart.c:235 radeon_gart_unbind+0xca/0xe0 [radeon]()
drivers/gpu/drm/radeon/radeon_gart.c:235 radeon_gart_unbind+0xca/0xe0 [radeon]()
drivers/gpu/drm/radeon/radeon_gart.c:235 radeon_gart_unbind+0xca/0xe0 [radeon]()
drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]()
drivers/gpu/drm/drm_mm.c:578 drm_mm_takedown+0x2e/0x30 [drm]()
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c:533 ttm_dma_free_pool+0xea/0xf0 [ttm]()
drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90()

I'm attaching those WARNINGs and, separately, the later "BUG: soft lockup" in Xorg and "INFO: rcu_sched self-detected stall" seen when trying to shut down, in case they are of any use.
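
The call sites listed in this comment can be tallied from a saved log with a short script. This is a minimal sketch, assuming each warning line has the shape quoted elsewhere in this report (`WARNING: CPU: ... PID: ... at <file>:<line> <function>+<offset>/<size> [<module>]()`, with the module part optional). The names `WARNING_RE` and `count_warning_sites` are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape, based on the WARNING line quoted in this report:
#   WARNING: CPU: 0 PID: 2602 at drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90()
WARNING_RE = re.compile(
    r"WARNING: .*?\bat (?P<site>\S+:\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\(\)"
)


def count_warning_sites(log_text):
    """Count kernel WARNINGs per (source location, function, module)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            key = (match.group("site"), match.group("func"), match.group("module"))
            counts[key] += 1
    return counts
```

Feeding it the text of a saved dmesg or journal gives one counter entry per distinct call site; the module is `None` for warnings raised from built-in code.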

Comment 6 Máximo Castañeda 2014-01-04 21:02:43 UTC
Created attachment 845557
WARNINGs with kernel 3.12.6-300

Comment 7 Máximo Castañeda 2014-01-04 21:04:38 UTC
Created attachment 845558
Later X soft lockup

Comment 8 Jannik 2014-01-10 15:28:09 UTC
Similar problem here with 3.12.6-300.fc20.x86_64

echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
triggers a kernel panic with reason "WARNING: CPU: 0 PID: 2602 at drivers/pci/pci.c:1430 pci_disable_device+0x84/0x90()".

Maybe this is related to https://bugzilla.redhat.com/show_bug.cgi?id=994438
I attached the abrt stuff there.

Comment 9 Máximo Castañeda 2014-01-11 18:10:59 UTC
JannikV's comment made me realize I hadn't tried disabling the integrated graphics instead of the discrete one.  I've tried it now, and the issue didn't appear.  So, if #994438 is related and the problem is deeper in the kernel (from what I understand, that bug involves attaching/detaching a network driver), it happens with radeon and doesn't happen with i915.

I'm attaching logs from two new test sequences with kernel-3.12.6-300.fc20.x86_64, listed below in the same order as the attachments:

1.- echo DDIS to vgaswitcheroo
2.- log off gnome
3.- I get a black screen, no greeter
4.- Ctrl-Alt-F2, login
5.- echo OFF to vgaswitcheroo
6.- hell breaks loose

1.- boot with systemd.unit=emergency.target
2.- echo DIS to vgaswitcheroo
3.- no issue, no need to echo OFF to vgaswitcheroo, IGD is off and we are using DIS
4.- echo IGD to vgaswitcheroo
5.- feel the pain

Comment 10 Máximo Castañeda 2014-01-11 18:14:52 UTC
Created attachment 848695
Full journal having told DDIS and then OFF, includes a few minutes of the aftermath

Comment 11 Máximo Castañeda 2014-01-11 18:17:00 UTC
Created attachment 848698
Full journal with emergency.target, IGD -> DIS ok, and issue when DIS -> IGD

Comment 12 Máximo Castañeda 2014-01-18 20:34:26 UTC
kernel-3.12.7-300.fc20.x86_64 doesn't seem to trigger this issue.  Since I have no explanation of what happened in the previous versions, I'm hesitant to close this bug just yet, but I'll do so if the next releases don't show it and nobody else says anything.
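
The per-kernel observations reported in this thread can be tabulated to bracket the affected range. This is a sketch only: `kernel_key`, `affected_range` and the `OBSERVATIONS` table are hypothetical names, and the table records just what was reported here, not the outcome of a bisection.

```python
import re

# Matches names such as "kernel-3.12.5-302.fc20.x86_64" (the "kernel-" prefix is optional).
NVR_RE = re.compile(r"^(?:kernel-)?(\d+)\.(\d+)\.(\d+)-(\d+)\.")

# True means the WARNINGs were observed with that kernel in this report.
OBSERVATIONS = {
    "kernel-3.11.10-301.fc20.x86_64": False,
    "kernel-3.12.5-302.fc20.x86_64": True,
    "kernel-3.12.6-300.fc20.x86_64": True,
    "kernel-3.12.7-300.fc20.x86_64": False,
}


def kernel_key(name):
    """Turn a kernel package name into a numerically sortable tuple."""
    match = NVR_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized kernel name: {name!r}")
    return tuple(int(part) for part in match.groups())


def affected_range(observations):
    """Return (first, last) affected kernel keys, or None if none are affected."""
    affected = sorted(kernel_key(name) for name, bad in observations.items() if bad)
    return (affected[0], affected[-1]) if affected else None
```

Comparing the parsed tuples rather than the raw strings keeps the ordering numeric, so a component such as 10 sorts after 9.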

Comment 13 Máximo Castañeda 2014-02-08 13:18:02 UTC
Two more kernels have been released without the bug.  Even though no one has stepped in to say what the bug was and which change fixed it, I'm closing this issue.
[If I haven't used the correct status, could someone who knows better please change it.]

