Created attachment 1029309: Journal of boot containing the failure

Description of problem:
Switching my DisplayPort-attached monitor on (or off and then on again) while booted to the multi-user target (no Xorg or Wayland) causes a kernel Oops, followed by soft lockups that render the system unusable. Doing the same at the full graphical target causes gnome-shell to crash, but the system can be recovered by logging in again.

Version-Release number of selected component (if applicable):
4.0.4-301.fc22.x86_64

How reproducible:
Always

Steps to Reproduce:
Toggle the monitor off and on again, or switch the monitor on after the system has booted.

Actual results:
In text/console mode: kernel Oops followed by a lockup.
In graphical mode: the GNOME session crashes, and Xorg keeps running at 100% CPU.

Expected results:
The monitor turns on and is detected by the system.

Additional info:
I have included the kernel Oops information in this bug because it seemed to be the most critical issue. Only limited data is available, as the Oops leads to a lockup from which it is impossible to recover (even SysRq is unresponsive). The Oops message (attached) is the last entry in the journal, and the location of the failure is as follows:

>May 24 23:25:28 fedora-workstation.home kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
>May 24 23:25:28 fedora-workstation.home kernel: IP: [<ffffffffa0152055>] radeon_connector_edid+0x5/0x70 [radeon]

Further investigation points to line 280 of radeon_connectors.c: in radeon_connector_edid(), radeon_connector is dereferenced without any NULL check.

The GPU in use is a Radeon 6950, and the kernel is the latest stable F22 release.
Created attachment 1029310: LSPCI
I can't help you debug this, but as another piece of information, I see "radeon_audio_dpms" in the stack trace and since kernel 4.0 the radeon driver supports audio over DisplayPort. So there's a decent possibility that this is related.
I have confirmed that this isn't an issue in 3.19, and am currently bisecting forwards from 4.0.1 (working) to 4.0.4 (broken).
Bisect identifies the bad commit as 016a255b7835ee7e49a3eba3c14ba0bc0221a4f8:

>016a255b7835ee7e49a3eba3c14ba0bc0221a4f8 is the first bad commit
>commit 016a255b7835ee7e49a3eba3c14ba0bc0221a4f8
>Author: Alex Deucher <alexander.deucher>
>Date:   Tue Apr 7 09:52:42 2015 -0400
>
>    drm/radeon: only mark audio as connected if the monitor supports it (v3)
>
>    commit 0f55db36d49d45b80eff0c0a2a498766016f458b upstream.
>
>    Otherwise the driver may try and send audio which may confuse the
>    monitor.
>
>    v2: set pin to NULL if no audio
>    v3: avoid crash with analog encoders
>
>    Signed-off-by: Alex Deucher <alexander.deucher>
>    Signed-off-by: Greg Kroah-Hartman <gregkh>
>
>:040000 040000 de0366a6790f5c91d175bcb89cb34956bbe72b26 bbdb5734961f824558152c7c34a840c78bc3a9a9 M drivers

I will attach the patch file in question.
Created attachment 1030071: git-bisect log
Created attachment 1030072: Culprit patch
Great job finding the commit causing this issue. I know that Alex Deucher is available either on https://bugs.freedesktop.org or in the #radeon channel on freenode. Unless the xorg-x11-drv-ati maintainers have better advice, I suggest creating a bug at https://bugs.freedesktop.org (product DRI, component DRM/Radeon), including the information above, and linking it here. This seems to be an upstream issue.
Thanks, Kamil. The bug has been created and linked here.
Is this fixed upstream?
(In reply to Andreas Tunek from comment #9)
> Is this fixed upstream?

I have confirmed with upstream that the suggested fix, namely:
1) revert 016a255b7835ee7e49a3eba3c14ba0bc0221a4f8, and
2) apply the patch (now attached),
fixes the issue when I build the kernel with these changes. As far as I am aware, this has not yet made it into a kernel release, but I don't know whether it has even been submitted for approval.
Created attachment 1054008: Upstream Radeon patch (fix)
Upstream has committed the proposed fixes to the Linux tree, and they should be included from Linux 4.1 onwards.
So is it in 4.1.2?
The commit can be found in the stable tree here: http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=fbfd3bc7dfd7efcad2d2e52bf634f84c80a77a35

The first tag after it is 4.1-rc6, so 4.1-rc6 and every later release (4.1.2 included) will have these commits.
Although this patch appears to have been incorporated upstream, there seems to be another issue (reported in several bugs, including https://bugzilla.redhat.com/show_bug.cgi?id=1240566) that is still present; I've tried 4.1, 4.1.1, 4.1.2, and 4.1.3. Does anyone know whether there is a patch to fix it?
Matthew: This patch does not fix that bug.
Marking this as closed, as I have tested 4.1 kernels and verified the fix.