Bug 1697591

Summary: modesetting driver on some Intel hardware fails to start after kernel 4.20.13 update
Product: [Fedora] Fedora
Component: xorg-x11-server
Version: 30
Status: CLOSED EOL
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Keywords: Reopened
Type: Bug
Reporter: Jason Tibbitts <j>
Assignee: X/OpenGL Maintenance List <xgl-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: airlied, ajax, awilliam, bskeggs, caillon+fedoraproject, fabrice, gmarr, jan.kratochvil, jan.public, jforbes, jglisse, john.j5live, j, kenyon, LailahFSF, mihai, mrmazda, ofourdan, rhughes, robatino, rstrode, rstrube, sandmann, xgl-maint
Last Closed: 2020-05-26 17:53:50 UTC

Description Jason Tibbitts 2019-04-08 18:56:54 UTC
So I'm filing this so that there's a place to track this in Fedora and perhaps use our blocker bug process.  I've chosen to file this against xorg because that's where the upstream DRM maintainers have said the problem ultimately lies, but this heavily interacts with the kernel.

The problem is that, due to a patch that went into the stable tree at 4.20.13 and is also in the 5.0 series and beyond, X running on some Intel hardware fails to configure the display and bails with:

[     4.961] (EE) modeset(0): failed to set mode: Invalid argument
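
For anyone checking whether a machine hits this failure, a minimal sketch that scans the text of an Xorg log for the error line above. The function name and the regular expression are illustrative assumptions, not part of any X.org tooling:

```python
import re

# Matches Xorg log lines of the form quoted in this report, e.g.
# [     4.961] (EE) modeset(0): failed to set mode: Invalid argument
MODESET_FAILURE = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+\(EE\)\s+modeset\((?P<screen>\d+)\):\s+"
    r"failed to set mode: (?P<reason>.+)$"
)


def find_modeset_failures(log_text):
    """Return a (timestamp, screen, reason) tuple for each modeset failure line."""
    failures = []
    for line in log_text.splitlines():
        match = MODESET_FAILURE.match(line.strip())
        if match:
            failures.append(
                (float(match["time"]), int(match["screen"]), match["reason"])
            )
    return failures
```

The log text would typically be read from wherever the server writes it on the system in question (for example /var/log/Xorg.0.log, or ~/.local/share/xorg/Xorg.0.log for a rootless server); those paths are assumptions about the local setup, not something stated in this report.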

In Fedora 29 this has been worked around by reverting the problematic patch (https://src.fedoraproject.org/rpms/kernel/blob/f29/f/0001-Revert-drm-i915-fbdev-Actually-configure-untiled-dis.patch).

In Fedora 30, the revert patch hasn't been present for a while.  That might have been an oversight, but I suppose a decision needs to be made on whether the revert should be there or if there's anything else that can be done.
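
The version boundary described above can be sketched as a simple comparison. This is only a rough check under stated assumptions: it considers just the upstream versions named in this report (4.20.13 and later, including 5.0 and beyond), it says nothing about other stable branches, and it cannot tell whether a distribution kernel carries the revert. The function names are hypothetical:

```python
import re

# First upstream stable release named in this report as carrying the change.
FIRST_AFFECTED = (4, 20, 13)


def parse_kernel_version(release):
    """Extract (major, minor, patch) from a string like '5.0.9-301.fc30.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def upstream_has_change(release):
    """True if the upstream version is at or past 4.20.13.

    A Fedora build that carries the revert is not affected even when
    this returns True; the package changelog is the authority there.
    """
    return parse_kernel_version(release) >= FIRST_AFFECTED
```

For example, `upstream_has_change("4.20.12-200.fc29.x86_64")` is false, while a 5.0.x or 5.1.x release string gives true.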

An upstream patch series was mentioned as being some early work on this issue: https://gitlab.freedesktop.org/xorg/xserver/merge_requests/36/commits

Related upstream bug reports, which should have the relevant X and kernel logs:
https://gitlab.freedesktop.org/xorg/xserver/issues/542
https://bugs.freedesktop.org/show_bug.cgi?id=109806
(plus various duplicates and things migrated from other bug trackers which should all be linked from the above)

The bottom line is that either X needs to get fixed soon, the revert needs to be carried in the kernel until X gets fixed, or some other workaround needs to be found.  I don't know if switching to 'nomodeset' for Intel would work or even be reasonable.

Comment 1 Fedora Blocker Bugs Application 2019-04-08 20:13:37 UTC
Proposed as a Blocker for 30-final by Fedora user tibbs using the blocker tracking app because:

 This bug prevents a number of common hardware configurations (some generations of Intel graphics) from booting to a functional X server.  I do not use Wayland so I do not know if that configuration is also broken.

Comment 2 Justin M. Forbes 2019-04-08 21:09:34 UTC
From a kernel standpoint, we saw the revert as the right thing to do in F28 and F29 because it was a stable release, and it was breaking user systems in the field. But upstream has made it clear that they do not intend to revert this patch, and it is an X issue. We certainly do not wish to be in the business of maintaining this patch for eternity.

Comment 3 Robert Strube 2019-04-10 19:16:19 UTC
I've also been impacted by this bug (Dell XPS 9575); I'm unable to boot the Fedora 30 Beta live image because of it.  After doing some research, I discovered that it's a regression in the i915 kernel module related to laptop panels that incorrectly report their specifications.  Intel is aware of this bug and has already created a patch.  Here's the thread:

https://bugs.freedesktop.org/show_bug.cgi?id=109959

Here's the patch (backport patch for 5.0)

https://patchwork.freedesktop.org/patch/296411/

It would be great to get this resolved before release, as it would be very difficult to install Fedora 30 on the affected hardware otherwise.

Comment 4 Jason Tibbitts 2019-04-10 19:21:09 UTC
I'm pretty sure that's not the same bug.

Comment 5 Justin M. Forbes 2019-04-10 19:28:11 UTC
Correct, that is a different bug, and the patch for it should be in the 5.0.7 kernels currently in updates-testing.

Comment 6 Robert Strube 2019-04-10 20:44:01 UTC
My mistake!

Good to know that there is a patch for that already in testing!  Sorry for the confusion.

Comment 9 Geoffrey Marr 2019-04-15 21:02:01 UTC
Discussed during the 2019-04-15 blocker review meeting: [1]

The decision to delay the classification of this as a blocker bug was made as we are very concerned about this bug and quite inclined to accept it, but we would like ajax's input on how wide the scope is and how practical it would be to fix before we make a final decision.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-04-15/f30-blocker-review.2019-04-15-16.03.txt

Comment 10 Adam Williamson 2019-04-15 21:30:17 UTC
ajax, per the above - can we get your opinion on how feasible it is to fix this in xorg-x11-server before F30 Final release? It seems you submitted some patches upstream but the review sort of died:

https://gitlab.freedesktop.org/xorg/xserver/merge_requests/36

can we kick that process? Would you be comfortable updating that change and shipping it as a downstream patch until we can get upstream to merge it? Would it be possible to do that quite quickly?

Also, what's your opinion on how many people this bug will impact if we don't fix it? jforbes thinks it's quite significant and the bug deserves to be a blocker - do you agree?

Thanks a lot!

Comment 11 Adam Williamson 2019-04-22 16:17:46 UTC
airlied: ping? ajax, airlied, we really need input from an expert here.

Comment 12 Geoffrey Marr 2019-04-22 17:55:19 UTC
Discussed during the 2019-04-22 blocker review meeting: [1]

The decision to classify this bug as an "AcceptedFreezeException" and to delay the classification of it as a blocker was made as we are still missing input from the graphics team. However, we think this is at minimum serious enough to rate a freeze exception. We will try to get info from graphics team ASAP and make a blocker decision.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-04-22/f30-blocker-review.2019-04-22-16.00.txt

Comment 13 Sylvia Sánchez 2019-04-22 19:02:56 UTC
Hello,
I'm using the KDE Spin with kernel 5.0.7, Plasma version 5.14.5, Intel graphics (Haswell) and Wayland. Wayland may start but stop working after a few seconds, or fail altogether, and it can't be reset except by a forced shutdown.
So I would say the issue is still present, at least with Wayland. I haven't had any issues with Xorg.

Comment 14 Adam Williamson 2019-04-23 22:13:02 UTC
So ajax showed up earlier today and gave us a scratch build for this:

https://koji.fedoraproject.org/koji/taskinfo?taskID=34344390

can anyone who is definitely affected by the bug please test with that scratch build and see if it works? Thanks a lot!

Comment 15 Adam Williamson 2019-04-23 22:14:24 UTC
Discussion between ajax and tibbs from #fedora-devel earlier, for the record:

https://paste.fedoraproject.org/paste/GuIlya8TFhhpcqgYKJAs8w

Comment 16 Adam Williamson 2019-04-23 22:14:43 UTC
<tibbs> ajax: https://bugzilla.redhat.com/show_bug.cgi?id=1697591 is one.
<tibbs> Looking at the list I'm not sure what the other would be.
<tibbs> Maybe https://bugzilla.redhat.com/show_bug.cgi?id=1693409
<ajax> whee. love to write patches nobody's brave enough to review
<tibbs> The thing with the intel gfx change in 4.20.13 is a terrible situation.
<tibbs> Intel people regress userspace without caring one whit, saying it's an X bug.
<tibbs> Kernel people don't want to carry a revert patch forever.
<tibbs> And there's no fix in the X pipeline that anyone seems to understands besides you.
<ajax> that last sentence is true far more often than i'm happy with
<tibbs> I've no idea what other distros are doing; from looking at the X ticket they appear to be leaving users broken.
<ajax> do not stare too long into the abyss, lest you become a domain expert and etc.
<tibbs> And as before, I'm happy to test a fix, but I couldn't offer anything resembling a review.
<ajax> right on. give me a moment to mutter some magic words at koji
<tibbs> I can certainly build myself, but have to wait a bit until people log out of the known-affected desktops.
<ajax> koij watch-task 34344390
<tibbs> Is this just the server with that "WIP: modesetting: Use atomic more atomically" patchset applied?
<tibbs> If so, any thoughts about how problematic it is that it wouldn't work when resizing or rotating the display?
<tibbs> Certainly that's less common than starting X.
<ajax> according to my last comment on the MR, resize works now
<ajax> but rotate doesn't, but kinda didn't before either
<ajax> this is literally all i remember about that patch series though
<ajax> personally i'd take slightly broken rotation in exchange for X starting
<ajax> (it's just that series)
* smooge sees ajax wearing gray robes and whispering over the palatir he has connected to koji
<ajax> more of a "black concert t-shirt and jeans" kind of vibe today
<ajax> i do have a black cat though
<ajax> have to run to an appointment, bbiab

Comment 17 Jason Tibbitts 2019-04-23 22:55:15 UTC
For what it's worth, I was able to test the patched X version (though on F29, as the only hardware I have that shows this issue is in regular use).  It does appear to fix the issue, or at least the server starts up properly where before it didn't.  This is running the current rawhide kernel (so the reversion patch is not present) and the patched Xorg SRPM rebuilt in mock for F29.  I did not try using RandR.

One caveat is that the patched X is rather chatty with lines like these:
[     9.584] Setting SRC_H
[     9.584] Adding to 0x55d1af80d320: 1e b 4380000

Comment 18 Adam Williamson 2019-04-23 22:57:21 UTC
So after discussion between airlied, labbott, jforbes, tibbs and myself today, for Fedora 30 release purposes the kernel revert workaround for this is being restored:

https://src.fedoraproject.org/rpms/kernel/c/370e7344e36e417de6a6ffbd7708b78110a13eff?branch=f30

(the bug number in the description is wrong, that patch really is for this bug). This still ought to be fixed on the X.org side in the long term, however.

Comment 19 Justin M. Forbes 2019-04-24 00:33:49 UTC
Moving back to assigned. The kernel revert is not the fix for this bug. It is a workaround. The bug shouldn't be closed until the xorg fix is in.

Comment 20 Adam Williamson 2019-04-24 01:22:01 UTC
OK, we now have a kernel build to test:

https://koji.fedoraproject.org/koji/taskinfo?taskID=34371336 (x86_64)
https://koji.fedoraproject.org/koji/taskinfo?taskID=34371339 (i686)

Can affected folks test that *without* the patched X server? Thanks!

Comment 21 Fedora Update System 2019-04-24 05:30:31 UTC
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da

Comment 22 Adam Williamson 2019-04-24 05:32:07 UTC
Justin: sorry for the status change, but we need the bug to be associated with the update for blockerbugs to track it correctly, and I can't set the update to close *some* bugs but not *others*, you can only set an update to close *all* or *none* of the bugs it's associated with (that's a Bodhi limitation). We can re-open this once it's gone through Bodhi.

Comment 23 Fedora Update System 2019-04-24 20:27:44 UTC
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da

Comment 24 Fedora Update System 2019-04-25 19:33:47 UTC
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Adam Williamson 2019-04-25 21:08:26 UTC
Re-opening but dropping blocker status, as Justin wants this open for a 'proper' fix but the kernel workaround means we are no longer blocking F30 release.

Comment 26 Jan Kratochvil 2019-06-15 20:20:31 UTC
The problem was still the same for me on F-30 as on F-29:
        xorg-x11-server-Xorg-1.20.4-3.fc30.x86_64
        kernel-5.1.9-300.fc30.x86_64
It has been fixed (worked around?) by using Driver "intel", as described in Bug 1630367 Comment 18.
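
For readers without access to the referenced comment, a minimal sketch of what selecting the "intel" driver usually looks like: an xorg.conf Device section, generated here as text. This assumes the referenced comment describes the usual Device-section approach; the identifier string and the helper name are hypothetical, and the xorg-x11-drv-intel package must be installed for the driver to load:

```python
def intel_device_section(identifier="Intel Graphics"):
    """Return the text of an xorg.conf Device section selecting the intel driver.

    The identifier is an arbitrary label chosen for illustration.
    """
    return "\n".join(
        [
            'Section "Device"',
            f'    Identifier "{identifier}"',
            '    Driver "intel"',
            "EndSection",
            "",  # trailing newline at end of file
        ]
    )


if __name__ == "__main__":
    print(intel_device_section(), end="")
```

The output would normally be saved as a file under /etc/X11/xorg.conf.d/ (the exact file name is a local choice). This switches X away from the modesetting driver rather than fixing it.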

Comment 27 Fabrice Bellet 2019-06-25 16:23:30 UTC
Since these patches were added to the xorg-x11-server package in Fedora 30:

+# test for https://bugzilla.redhat.com/show_bug.cgi?id=1697591
+# see also https://gitlab.freedesktop.org/xorg/xserver/merge_requests/36
+#Patch21: 0001-modesetting-Weaksauce-atomic-property-debugging.patch
+Patch22: 0002-modesetting-Propagate-more-failure-in-drmmode_set_mo.patch
+Patch23: 0003-modesetting-Factor-out-drmmode_target_output.patch
+Patch24: 0004-modesetting-Use-atomic-instead-of-per-crtc-walks-whe.patch

The following command freezes my gnome-shell session. This is 100% reproducible:

$ xrandr --fb 2560x1440 --output eDP-1 --scale-from 2560x1440 --panning 2560x1440

I tested by rebuilding the package for version 1.20.5-2 with these 3 patches dropped, and confirmed that doing so restores working behavior.
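
To see at a glance which patches in the excerpt above are actually applied (Patch21 is commented out, which is why only three are in effect), here is a minimal sketch that parses spec-file or diff lines. The helper name and regular expression are illustrative assumptions, not part of rpm or any Fedora tooling:

```python
import re

# Accepts plain spec lines and added diff lines (leading "+"),
# with an optional "#" marking a commented-out patch.
PATCH_LINE = re.compile(
    r"^\+?\s*(?P<comment>#)?\s*Patch(?P<num>\d+):\s*(?P<file>\S+)"
)


def list_patches(spec_lines):
    """Return (number, filename, enabled) for each PatchN: line found."""
    patches = []
    for line in spec_lines:
        match = PATCH_LINE.match(line)
        if match:
            enabled = match["comment"] is None
            patches.append((int(match["num"]), match["file"], enabled))
    return patches
```

Feeding it the lines quoted in this comment reports Patch21 as disabled and Patch22 through Patch24 as enabled, while the two plain comment lines with URLs are ignored.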

Comment 28 Ben Cotton 2020-04-30 20:42:10 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 30 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 29 Ben Cotton 2020-05-26 17:53:50 UTC
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.