Bug 1697591 - modesetting driver on some Intel hardware fails to start after kernel 4.20.13 update [NEEDINFO]
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-server
Version: 30
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords: Reopened
Depends On:
Blocks:
Reported: 2019-04-08 18:56 UTC by Jason Tibbitts
Modified: 2019-04-25 21:08 UTC (History)
Clone Of:
Last Closed: 2019-04-25 19:33:47 UTC
awilliam: needinfo? (ajax)



Description Jason Tibbitts 2019-04-08 18:56:54 UTC
So I'm filing this so that there's a place to track this in Fedora and perhaps use our blocker bug process.  I've chosen to file this against xorg because that's where the upstream DRM maintainers have said the problem ultimately lies, but this heavily interacts with the kernel.

The problem is that due to a patch that went into the stable tree at 4.20.13 and is also in the 5.0 series and beyond, X when running on some Intel hardware fails to configure the display and bails with:

[     4.961] (EE) modeset(0): failed to set mode: Invalid argument
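
For anyone checking whether a machine is hitting this, here is a minimal sketch that scans X server log text for the failure line quoted above. The function name is hypothetical, and it assumes the message appears in the same form as in this report, with only the timestamp and the screen index varying.

```python
import re

# Matches the failure message quoted in this report. The leading timestamp
# is ignored and the screen index in "modeset(N)" is allowed to vary.
FAILURE = re.compile(
    r"\(EE\) modeset\(\d+\): failed to set mode: Invalid argument"
)


def find_modeset_failures(log_text):
    """Return the log lines that report the modesetting failure."""
    return [line for line in log_text.splitlines() if FAILURE.search(line)]
```

Read the server log from wherever it is written on the system in question (the location varies by setup) and pass its contents to find_modeset_failures; an empty list means the message was not found in that log.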

In Fedora 29 this has been worked around by reverting the problematic patch (https://src.fedoraproject.org/rpms/kernel/blob/f29/f/0001-Revert-drm-i915-fbdev-Actually-configure-untiled-dis.patch).

In Fedora 30, this patch hasn't been present for a while.  That might have been an oversight but I suppose a decision needs to be made on whether the revert should be there or if there's anything else that can be done.
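
As a rough aid, the sketch below checks whether a kernel release string falls in the version range described above (4.20.13 or later, including 5.0 and beyond). The helper names are hypothetical. It looks at the version number only, so it cannot tell whether a given distribution kernel carries the revert; a Fedora 29 kernel with the revert applied would still be reported as in range.

```python
import re

# First upstream stable release that, per this report, contains the change.
FIRST_AFFECTED = (4, 20, 13)


def parse_kernel_release(release):
    """Extract (major, minor, patch) from e.g. '5.0.7-300.fc30.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. "5.0") is treated as 0.
    return int(major), int(minor), int(patch or 0)


def in_affected_range(release):
    """True if the version number is at or past the first affected release."""
    return parse_kernel_release(release) >= FIRST_AFFECTED
```

On a running system, platform.release() from the Python standard library returns a string in this form that can be passed to in_affected_range.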

An upstream patch series was mentioned as being some early work on this issue: https://gitlab.freedesktop.org/xorg/xserver/merge_requests/36/commits

Related upstream bug reports, which should have the relevant X and kernel logs:
https://gitlab.freedesktop.org/xorg/xserver/issues/542
https://bugs.freedesktop.org/show_bug.cgi?id=109806
(plus various duplicates and things migrated from other bug trackers which should all be linked from the above)

The bottom line is that either X needs to get fixed soon, the revert needs to be carried in the kernel until X gets fixed, or that some other workaround needs to be found.  I don't know if switching to 'nomodeset' for Intel would work or even be reasonable.

Comment 1 Fedora Blocker Bugs Application 2019-04-08 20:13:37 UTC
Proposed as a Blocker for 30-final by Fedora user tibbs using the blocker tracking app because:

 This bug prevents a number of common hardware configurations (some generations of Intel graphics) from booting to a functional X server.  I do not use Wayland so I do not know if that configuration is also broken.

Comment 2 Justin M. Forbes 2019-04-08 21:09:34 UTC
From a kernel standpoint, we saw the revert as the right thing to do in F28 and F29 because they were stable releases and the patch was breaking user systems in the field. But upstream has made it clear that they do not intend to revert this patch and that they consider it an X issue. We certainly do not wish to be in the business of maintaining this patch for eternity.

Comment 3 Robert Strube 2019-04-10 19:16:19 UTC
I've also been impacted by this bug (Dell XPS 9575); I'm unable to boot from the Fedora 30 Beta live image because of it.  After doing some research, I discovered that it's a regression in the i915 kernel module related to laptop panels that incorrectly report their specifications.  Intel is aware of this bug and has already created a patch.  Here's the thread:

https://bugs.freedesktop.org/show_bug.cgi?id=109959

Here's the patch (backport patch for 5.0)

https://patchwork.freedesktop.org/patch/296411/

It would be great to get this resolved before release, as it would be very difficult to install Fedora 30 on the affected hardware otherwise.

Comment 4 Jason Tibbitts 2019-04-10 19:21:09 UTC
I'm pretty sure that's not the same bug.

Comment 5 Justin M. Forbes 2019-04-10 19:28:11 UTC
Correct, that is a different bug, and the patch for it should be in the 5.0.7 kernels currently in updates-testing.

Comment 6 Robert Strube 2019-04-10 20:44:01 UTC
My mistake!

Good to know that there is a patch for that already in testing!  Sorry for the confusion.

Comment 9 Geoffrey Marr 2019-04-15 21:02:01 UTC
Discussed during the 2019-04-15 blocker review meeting: [1]

The decision to delay the classification of this as a blocker bug was made as we are very concerned about this bug and quite inclined to accept it, but we would like ajax's input on how wide the scope is and how practical it would be to fix before we make a final decision.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-04-15/f30-blocker-review.2019-04-15-16.03.txt

Comment 10 Adam Williamson 2019-04-15 21:30:17 UTC
ajax, per the above - can we get your opinion on how feasible it is to fix this in xorg-x11-server before F30 Final release? It seems you submitted some patches upstream but the review sort of died:

https://gitlab.freedesktop.org/xorg/xserver/merge_requests/36

can we kick that process? Would you be comfortable updating that change and shipping it as a downstream patch until we can get upstream to merge it? Would it be possible to do that quite quickly?

Also, what's your opinion on how many people this bug will impact if we don't fix it? jforbes thinks it's quite significant and the bug deserves to be a blocker - do you agree?

Thanks a lot!

Comment 11 Adam Williamson 2019-04-22 16:17:46 UTC
airlied: ping? ajax, airlied, we really need input from an expert here.

Comment 12 Geoffrey Marr 2019-04-22 17:55:19 UTC
Discussed during the 2019-04-22 blocker review meeting: [1]

The decision to classify this bug as an "AcceptedFreezeException" and to delay the classification of it as a blocker was made as we are still missing input from the graphics team. However, we think this is at minimum serious enough to rate a freeze exception. We will try to get info from graphics team ASAP and make a blocker decision.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2019-04-22/f30-blocker-review.2019-04-22-16.00.txt

Comment 13 Sylvia Sánchez 2019-04-22 19:02:56 UTC
Hello,
I'm using the KDE Spin with kernel 5.0.7, Plasma version 5.14.5, Intel Haswell graphics and Wayland. Wayland may start but stop working after a few seconds, or fail altogether, and it can't be reset except by a forced shutdown.
So I would say the issue is still present, at least with Wayland. I haven't had any issues with Xorg.

Comment 14 Adam Williamson 2019-04-23 22:13:02 UTC
So ajax showed up earlier today and gave us a scratch build for this:

https://koji.fedoraproject.org/koji/taskinfo?taskID=34344390

can anyone who is definitely affected by the bug please test with that scratch build and see if it works? Thanks a lot!

Comment 15 Adam Williamson 2019-04-23 22:14:24 UTC
Discussion between ajax and tibbs from #fedora-devel earlier, for the record:

https://paste.fedoraproject.org/paste/GuIlya8TFhhpcqgYKJAs8w

Comment 16 Adam Williamson 2019-04-23 22:14:43 UTC
<tibbs> ajax: https://bugzilla.redhat.com/show_bug.cgi?id=1697591 is one.
<tibbs> Looking at the list I'm not sure what the other would be.
<tibbs> Maybe https://bugzilla.redhat.com/show_bug.cgi?id=1693409
<ajax> whee. love to write patches nobody's brave enough to review
<tibbs> The thing with the intel gfx change in 4.20.13 is a terrible situation.
<tibbs> Intel people regress userspace without caring one whit, saying it's an X bug.
<tibbs> Kernel people don't want to carry a revert patch forever.
<tibbs> And there's no fix in the X pipeline that anyone seems to understand besides you.
<ajax> that last sentence is true far more often than i'm happy with
<tibbs> I've no idea what other distros are doing; from looking at the X ticket they appear to be leaving users broken.
<ajax> do not stare too long into the abyss, lest you become a domain expert and etc.
<tibbs> And as before, I'm happy to test a fix, but I couldn't offer anything resembling a review.
<ajax> right on. give me a moment to mutter some magic words at koji
<tibbs> I can certainly build myself, but have to wait a bit until people log out of the known-affected desktops.
<ajax> koji watch-task 34344390
<tibbs> Is this just the server with that "WIP: modesetting: Use atomic more atomically" patchset applied?
<tibbs> If so, any thoughts about how problematic it is that it wouldn't work when resizing or rotating the display?
<tibbs> Certainly that's less common than starting X.
<ajax> according to my last comment on the MR, resize works now
<ajax> but rotate doesn't, but kinda didn't before either
<ajax> this is literally all i remember about that patch series though
<ajax> personally i'd take slightly broken rotation in exchange for X starting
<ajax> (it's just that series)
* smooge sees ajax wearing gray robes and whispering over the palantir he has connected to koji
<ajax> more of a "black concert t-shirt and jeans" kind of vibe today
<ajax> i do have a black cat though
<ajax> have to run to an appointment, bbiab

Comment 17 Jason Tibbitts 2019-04-23 22:55:15 UTC
For what it's worth, I was able to test the patched X version (though on F29, as the only hardware I have that shows this issue is in regular use).  It does appear to fix the issue, or at least the server starts up properly where before it didn't.  This is running the current rawhide kernel (so the reversion patch is not present) and the patched Xorg SRPM rebuilt in mock for F29.  I did not try using RandR.

One caveat is that the patched X is rather chatty with lines like these:
[     9.584] Setting SRC_H
[     9.584] Adding to 0x55d1af80d320: 1e b 4380000
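
Until that output is quieted, a small filter can make the log easier to read. This is only a sketch: the helper name is hypothetical, and the patterns are guessed from the two sample lines above, so the other debug lines the patched server prints may need extra patterns.

```python
import re

# Leading "[  seconds.millis]" timestamp as it appears in the X server log.
TIMESTAMP = r"^\[\s*\d+\.\d+\]\s+"

# Assumed shapes of the debug chatter, based only on the two samples quoted
# in this comment: "Setting <PROPERTY>" and "Adding to 0x<pointer>: ...".
NOISE = re.compile(
    TIMESTAMP + r"(?:Setting [A-Z_]+\s*$|Adding to 0x[0-9a-f]+: )"
)


def drop_debug_chatter(log_text):
    """Return the log lines with the chatty debug lines removed."""
    return [line for line in log_text.splitlines() if not NOISE.match(line)]
```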

Comment 18 Adam Williamson 2019-04-23 22:57:21 UTC
So after discussion between airlied, labbott, jforbes, tibbs and myself today, for Fedora 30 release purposes the kernel revert workaround for this is being restored:

https://src.fedoraproject.org/rpms/kernel/c/370e7344e36e417de6a6ffbd7708b78110a13eff?branch=f30

(the bug number in the description is wrong, that patch really is for this bug). This still ought to be fixed on the X.org side in the long term, however.

Comment 19 Justin M. Forbes 2019-04-24 00:33:49 UTC
Moving back to assigned. The kernel revert is not the fix for this bug. It is a workaround. The bug shouldn't be closed until the xorg fix is in.

Comment 20 Adam Williamson 2019-04-24 01:22:01 UTC
OK, we now have a kernel build to test:

https://koji.fedoraproject.org/koji/taskinfo?taskID=34371336 (x86_64)
https://koji.fedoraproject.org/koji/taskinfo?taskID=34371339 (i686)

Can affected folks test that *without* the patched X server? Thanks!

Comment 21 Fedora Update System 2019-04-24 05:30:31 UTC
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 have been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da

Comment 22 Adam Williamson 2019-04-24 05:32:07 UTC
Justin: sorry for the status change, but we need the bug to be associated with the update for blockerbugs to track it correctly, and I can't set the update to close *some* bugs but not *others*; an update can only be set to close *all* or *none* of the bugs it's associated with (that's a Bodhi limitation). We can re-open this once it's gone through Bodhi.

Comment 23 Fedora Update System 2019-04-24 20:27:44 UTC
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 have been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-e84f6c34da

Comment 24 Fedora Update System 2019-04-25 19:33:47 UTC
kernel-5.0.9-301.fc30, kernel-headers-5.0.9-300.fc30, kernel-tools-5.0.9-300.fc30 have been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 Adam Williamson 2019-04-25 21:08:26 UTC
Re-opening but dropping blocker status, as Justin wants this open for a 'proper' fix but the kernel workaround means we are no longer blocking F30 release.

