Bug 1224579 - Radeon Kernel Oops & Lockup when switching on DisplayPort attached monitor
Summary: Radeon Kernel Oops & Lockup when switching on DisplayPort attached monitor
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 22
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-05-25 00:19 UTC by Richard Bradfield
Modified: 2015-07-26 16:09 UTC
CC List: 12 users

Fixed In Version: Kernel 4.1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-07-26 16:09:13 UTC
Type: Bug
Embargoed:


Attachments
Journal of boot containing failure (844.41 KB, text/x-vhdl)
2015-05-25 00:19 UTC, Richard Bradfield
LSPCI (3.37 KB, text/plain)
2015-05-25 00:20 UTC, Richard Bradfield
git-bisect log (1.71 KB, text/plain)
2015-05-26 16:26 UTC, Richard Bradfield
Culprit patch (3.42 KB, patch)
2015-05-26 16:27 UTC, Richard Bradfield
Upstream Radeon patch (fix) (1.81 KB, patch)
2015-07-20 18:06 UTC, Richard Bradfield


Links
System           ID     Private  Priority  Status  Summary  Last Updated
FreeDesktop.org  90681  0        None      None    None     Never

Description Richard Bradfield 2015-05-25 00:19:45 UTC
Created attachment 1029309
Journal of boot containing failure

Description of problem:
Switching my DisplayPort-attached monitor on (or off and then on again) while booted to the multi-user target (no Xorg or Wayland) causes a kernel Oops, followed by soft lockups that render the system unusable.

Doing the same in the full graphical target causes gnome-shell to crash, but the system can be recovered by logging in again.


Version-Release number of selected component (if applicable):
4.0.4-301.fc22.x86_64

How reproducible:
Always


Steps to Reproduce:
Toggle the monitor off and on again, or switch the monitor on after the system has booted.


Actual results:
In text/console mode: kernel Oops followed by a lockup.
In graphical mode: the GNOME session crashes and Xorg keeps running at 100% CPU.


Expected results:
Monitor turns on and is detected by the system.


Additional info:
I have focused on the kernel oops in this bug because it seemed to be the most critical issue. Only limited data is available, because the oops leads to a lockup from which it is impossible to recover (even SysRq is unresponsive).

The oops message (attached) is the last entry in the journal, and the location of the failure is as follows:

>May 24 23:25:28 fedora-workstation.home kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
>May 24 23:25:28 fedora-workstation.home kernel: IP: [<ffffffffa0152055>] radeon_connector_edid+0x5/0x70 [radeon]
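
The IP line uses the kernel's symbol+offset/size notation: the faulting instruction is 0x5 bytes into radeon_connector_edid, a function 0x70 bytes long, inside the radeon module. A minimal sketch of pulling those fields out of the two lines above follows. The regular expressions are an assumption tied to the 4.0-era x86 oops format shown in this journal (newer kernels print the RIP line differently), and OopsLocation and parse_oops are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class OopsLocation(NamedTuple):
    fault_address: int      # address the kernel could not dereference
    ip: int                 # instruction pointer at the time of the fault
    symbol: str             # function containing the faulting instruction
    offset: int             # bytes into that function
    size: int               # total size of that function in bytes
    module: Optional[str]   # module name, if the symbol lives in one


# Patterns assume the 4.0-era x86 oops header format seen in this journal.
_FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
_IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?"
)


def parse_oops(bug_line: str, ip_line: str) -> OopsLocation:
    """Extract the fault address and symbol+offset/size from two oops lines."""
    fault = _FAULT_RE.search(bug_line)
    ip = _IP_RE.search(ip_line)
    if fault is None or ip is None:
        raise ValueError("lines do not look like an x86 oops header")
    return OopsLocation(
        fault_address=int(fault.group(1), 16),
        ip=int(ip.group(1), 16),
        symbol=ip.group(2),
        offset=int(ip.group(3), 16),
        size=int(ip.group(4), 16),
        module=ip.group(5),
    )
```

Applied to this report, the low fault address (0x3b0) together with the very small offset into the function is consistent with an early field read through a NULL pointer.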

Further investigation points to line 280 of radeon_connectors.c: radeon_connector_edid() dereferences radeon_connector without checking it for NULL.
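
A fault at address 0x3b0 is what you would expect if the code read a structure member located 0x3b0 bytes from the start of the structure through a NULL pointer, because the address actually touched is base + member offset. The sketch below models that with ctypes. HypotheticalConnector is invented for illustration (everything before edid is collapsed into padding so that edid lands at 0x3b0) and is not the real struct radeon_connector layout; connector_edid shows one defensive NULL-check pattern, not necessarily what the upstream fix does.

```python
import ctypes
from typing import Optional


class HypotheticalConnector(ctypes.Structure):
    # Hypothetical layout: all members before `edid` are replaced by padding
    # so that `edid` sits at offset 0x3b0, matching the faulting address.
    _fields_ = [
        ("other_fields", ctypes.c_char * 0x3B0),
        ("edid", ctypes.c_void_p),
    ]


def faulting_address(base: int, field_offset: int) -> int:
    """Address the CPU would read for base->field."""
    return base + field_offset


def connector_edid(connector: Optional[HypotheticalConnector]) -> Optional[int]:
    # Guarded accessor: report "no EDID" instead of dereferencing NULL.
    if connector is None:
        return None
    return connector.edid  # ctypes returns None for a zero pointer
```

With a NULL base, faulting_address(0, HypotheticalConnector.edid.offset) gives 0x3b0, the same address reported in the oops.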

The GPU is a Radeon 6950 and the kernel is the latest stable F22 release.

Comment 1 Richard Bradfield 2015-05-25 00:20:40 UTC
Created attachment 1029310
LSPCI

Comment 2 Kamil Páral 2015-05-26 10:26:53 UTC
I can't help you debug this, but as another data point: I see "radeon_audio_dpms" in the stack trace, and since kernel 4.0 the radeon driver has supported audio over DisplayPort, so there's a decent chance this is related.

Comment 3 Richard Bradfield 2015-05-26 12:31:18 UTC
I have confirmed that this isn't an issue in 3.19, and am currently bisecting forwards from 4.0.1 (working) to 4.0.4 (broken).
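
git bisect narrows a known-good..known-bad range by binary search, so the number of kernels that must be built and tested grows roughly with log2 of the number of commits in the range. A minimal sketch of that search over a linear list of commits follows. It assumes commits[0] is good, commits[-1] is bad, and every commit after the first bad one is also bad; real git bisect additionally handles merge history and skipped commits. The commit names and the is_bad predicate are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of commits tested).

    Assumes commits are in history order, commits[0] is good,
    commits[-1] is bad, and once a commit is bad all later ones are too.
    """
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1                 # one build-and-test cycle
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests
```

For a range of 100 commits this needs at most 7 test cycles, which is why bisecting between two point releases such as 4.0.1 and 4.0.4 is practical.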

Comment 4 Richard Bradfield 2015-05-26 16:25:40 UTC
Bisect identifies the bad commit as 016a255b7835ee7e49a3eba3c14ba0bc0221a4f8

>016a255b7835ee7e49a3eba3c14ba0bc0221a4f8 is the first bad commit
>commit 016a255b7835ee7e49a3eba3c14ba0bc0221a4f8
>Author: Alex Deucher <alexander.deucher>
>Date:   Tue Apr 7 09:52:42 2015 -0400
>
>    drm/radeon: only mark audio as connected if the monitor supports it (v3)
>    
>    commit 0f55db36d49d45b80eff0c0a2a498766016f458b upstream.
>    
>    Otherwise the driver may try and send audio which may confuse the
>    monitor.
>    
>    v2: set pin to NULL if no audio
>    v3: avoid crash with analog encoders
>    
>    Signed-off-by: Alex Deucher <alexander.deucher>
>    Signed-off-by: Greg Kroah-Hartman <gregkh>
>
>:040000 040000 de0366a6790f5c91d175bcb89cb34956bbe72b26 bbdb5734961f824558152c7c34a840c78bc3a9a9 M	drivers

I will attach the patch file in question.

Comment 5 Richard Bradfield 2015-05-26 16:26:46 UTC
Created attachment 1030071
git-bisect log

Comment 6 Richard Bradfield 2015-05-26 16:27:36 UTC
Created attachment 1030072
Culprit patch

Comment 7 Kamil Páral 2015-05-27 07:56:29 UTC
Great job finding the commit causing this issue. Alex Deucher can be reached either at https://bugs.freedesktop.org or in the #radeon channel on freenode. Unless the xorg-x11-drv-ati maintainers have better advice, I suggest creating a bug ticket at https://bugs.freedesktop.org (product DRI, component DRM/Radeon), including the info above, and linking it here. It seems to be an upstream issue.

Comment 8 Richard Bradfield 2015-05-27 14:19:01 UTC
Thanks Kamil, bug is created and linked here.

Comment 9 Andreas Tunek 2015-07-20 17:56:44 UTC
Is this fixed upstream?

Comment 10 Richard Bradfield 2015-07-20 18:05:32 UTC
(In reply to Andreas Tunek from comment #9)
> Is this fixed upstream?

I have confirmed with upstream that the suggested fix:

1) Revert 016a255b7835ee7e49a3eba3c14ba0bc0221a4f8
2) Apply patch (attached now).

fixes the issue when I build the kernel myself. As far as I am aware, this has not made it into a kernel release yet, but I don't know whether it has even been submitted for inclusion.
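
The two steps above map onto two git commands run in a kernel source tree that contains the bad commit. A minimal sketch follows, assuming a git checkout of the stable tree and the attached fix saved locally under the hypothetical name radeon-fix.patch. It defaults to a dry run that only returns the commands; fix_commands and apply_fix are hypothetical helpers. git apply is used because it accepts a plain diff; if the attachment is a mailbox-format patch, git am would apply and commit it instead.

```python
import subprocess
from typing import List

BAD_COMMIT = "016a255b7835ee7e49a3eba3c14ba0bc0221a4f8"


def fix_commands(bad_commit: str, patch_path: str) -> List[List[str]]:
    """Commands for step 1 (revert) and step 2 (apply the fix)."""
    return [
        ["git", "revert", "--no-edit", bad_commit],  # step 1: undo the culprit
        ["git", "apply", patch_path],                # step 2: apply the fix diff
    ]


def apply_fix(repo: str, bad_commit: str, patch_path: str,
              dry_run: bool = True) -> List[List[str]]:
    """Run (or, by default, just return) the fix commands inside `repo`."""
    cmds = fix_commands(bad_commit, patch_path)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, cwd=repo, check=True)  # stop on first failure
    return cmds
```

After the commands succeed, the kernel is rebuilt and installed as usual.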

Comment 11 Richard Bradfield 2015-07-20 18:06:18 UTC
Created attachment 1054008
Upstream Radeon patch (fix)

Comment 12 Richard Bradfield 2015-07-22 12:08:03 UTC
Upstream has committed the proposed fixes to the Linux tree, and they should be in from Linux 4.1 onwards.

Comment 13 Andreas Tunek 2015-07-23 18:00:19 UTC
So it is in 4.1.2?

Comment 14 Richard Bradfield 2015-07-23 20:34:07 UTC
The commit can be found in the stable tree here:
http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=fbfd3bc7dfd7efcad2d2e52bf634f84c80a77a35

The next tag after it is 4.1-rc6, so any release from that tag onwards (4.1.2 included) has these commits.
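
The reasoning above relies on kernel version ordering, where X.Y-rcN comes before X.Y, which comes before X.Y.1, X.Y.2 and so on. A minimal sketch of that ordering follows, assuming mainline-style version strings only. It is valid only along the mainline and 4.1.y lineage: older stable branches such as 4.0.y are separate and may receive backports independently, so running git tag --contains on the commit in a kernel clone is the authoritative check. version_key and includes_fix are hypothetical helpers.

```python
import re


def version_key(tag: str) -> tuple:
    """Sort key for mainline-style versions such as 4.1-rc6, 4.1 and 4.1.2."""
    m = re.fullmatch(r"(\d+)\.(\d+)(?:\.(\d+)|-rc(\d+))?", tag)
    if m is None:
        raise ValueError(f"unrecognised version: {tag!r}")
    major, minor, patch, rc = m.groups()
    # An -rcN precedes the final X.Y release, so final releases get a rank
    # larger than any realistic rc number.
    rc_rank = int(rc) if rc is not None else 10**6
    return (int(major), int(minor), rc_rank, int(patch or 0))


def includes_fix(release: str, first_tag: str = "4.1-rc6") -> bool:
    """True if `release` is at or after the first tag containing the fix."""
    return version_key(release) >= version_key(first_tag)
```

With this ordering, 4.1, 4.1.2 and 4.1-rc6 itself include the fix, while 4.1-rc5 and 4.0.4 do not.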

Comment 15 Matthew 2015-07-24 01:41:37 UTC
While this patch appears to have been incorporated upstream, there seems to be another issue (reported in several bugs, including https://bugzilla.redhat.com/show_bug.cgi?id=1240566) that is still present; I've tried 4.1, 4.1.1, 4.1.2, and 4.1.3. Does anyone know of a patch that fixes it?

Comment 16 Andreas Tunek 2015-07-24 20:12:00 UTC
Matthew: This patch does not fix that bug.

Comment 17 Richard Bradfield 2015-07-26 16:09:13 UTC
Marking this as closed as I have tested 4.1 kernels and verified the fix.

