Bug 1931065 - Frequent i915 hangs
Summary: Frequent i915 hangs
Status: CLOSED DUPLICATE of bug 1925346
Product: Fedora
Classification: Fedora
Component: kernel
Version: 33
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2021-02-20 12:11 UTC by Patrick O'Callaghan
Modified: 2021-02-25 11:31 UTC

Last Closed: 2021-02-25 11:31:44 UTC
Type: Bug

Attachments
Dmesg output (109.88 KB, text/plain)
2021-02-20 12:11 UTC, Patrick O'Callaghan
dmesg.txt (103.68 KB, text/plain)
2021-02-23 09:13 UTC, Jonathan Ryshpan
Output of /sys/class/drm/card0/error taken on 2021-02-22 (14.56 KB, text/plain)
2021-02-23 09:19 UTC, Jonathan Ryshpan

Description Patrick O'Callaghan 2021-02-20 12:11:38 UTC
Created attachment 1758425 [details]
Dmesg output

1. Please describe the problem:
Graphic artefacts, video tearing and occasional UI freezes (Plasma) requiring forced session reset or system reboot.

2. What is the Version-Release number of the kernel:
kernel-core-5.10.16-200.fc33.x86_64 (also 5.10.13, 14 and 15)

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
First noticed with kernel-core-5.10.13-200.fc33.x86_64

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Issue appears randomly during normal system usage. Seems to be triggered more often by playing video, but has also happened simply on launching a GUI app (Calibre).

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Unknown.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
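
For reference, a minimal sketch of capturing the kernel log for a chosen boot with the journalctl flags named above. The function name and the default output file are illustrative only; a boot offset of 0 is the current boot and -1 the previous one.

```shell
# save_kernel_log [BOOT_OFFSET] [OUTPUT_FILE]
# Writes the kernel messages of one boot to a file.
# BOOT_OFFSET follows journalctl -b: 0 = current boot, -1 = previous boot.
# The name save_kernel_log and the default file name are illustrative.
save_kernel_log() {
    boot="${1:-0}"
    out="${2:-dmesg.txt}"
    journalctl --no-hostname -k -b "$boot" > "$out"
}

# Example: log of the boot before the current one
#   save_kernel_log -1 dmesg-previous.txt
```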

Comment 1 Hans de Goede 2021-02-20 12:38:29 UTC
Some i915 driver changes implementing mitigations for a CVE landed in 5.10.9 and are known to cause problems for some users; for details see bug 1925346.

There are a couple of things which you can try to debug this:

1. Try installing the 5.10.8 kernel and see if the problem goes away. You can still download that kernel here:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1670578

Here are some generic instructions on directly installing a kernel from koji (the Fedora buildsystem):
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Note that since this is an official build, there is no need to turn off secure-boot in this case.

2. Something else to test is to disable the new mitigations by running:

sudo grubby --update-kernel=ALL --args="i915.mitigations=off"

3. I've prepared a test kernel-build which has some extra fixes for the new mitigations; you can find it here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=61751956

Note this kernel is a test-build and as such is not signed, so to run this one you do need to disable secure-boot.

4. I've prepared a test kernel-build which is based on 5.10.14 (which has the mentioned i915 problem) and reverts the 3 commits which I suspect are causing the issues:

https://koji.fedoraproject.org/koji/taskinfo?taskID=61666542

Note this kernel is a test-build and as such is not signed, so to run this one you do need to disable secure-boot.
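
For workaround 2, one way to confirm after a reboot that the option actually reached the kernel is to look for it in /proc/cmdline. A minimal sketch follows; the helper name has_kernel_arg is made up for illustration, and its second parameter exists only so the check can be pointed at a different file.

```shell
# has_kernel_arg ARG [CMDLINE_FILE]
# Succeeds when ARG appears as a whole word on the kernel command line.
# The helper name and the optional file parameter are illustrative.
has_kernel_arg() {
    arg="$1"
    file="${2:-/proc/cmdline}"
    # Put each space-separated word on its own line, then look for an
    # exact, literal match of the whole line.
    tr ' ' '\n' < "$file" | grep -qxF -- "$arg"
}

# Example, after rebooting with workaround 2 applied:
#   has_kernel_arg i915.mitigations=off && echo "mitigations disabled"
# To undo the workaround later:
#   sudo grubby --update-kernel=ALL --remove-args="i915.mitigations=off"
```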


Note that some users report that they still have some i915 gfx issues with workarounds 2. and 3. from above, so the best test to determine if the new i915 mitigations are causing issues is to run the 5.10.8 kernel. If that helps, it would be good if you could also try the other workarounds, just to gather some more information for the upstream developers to work with.
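
To tell quickly whether a given kernel is at or after 5.10.9 (where the mitigation work landed, per the above), a version comparison along these lines can be used. This is only a sketch: the function names are illustrative, it relies on GNU sort -V, and it is only meaningful within the 5.10.y series discussed here.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on the version ordering of GNU "sort -V".
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# at_or_after_5_10_9 [RELEASE]
# RELEASE defaults to the running kernel (uname -r). Illustrative name.
at_or_after_5_10_9() {
    release="${1:-$(uname -r)}"
    upstream="${release%%-*}"   # 5.10.16-200.fc33.x86_64 -> 5.10.16
    version_ge "$upstream" 5.10.9
}

# Example:
#   at_or_after_5_10_9 && echo "this kernel includes the 5.10.9 i915 changes"
```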

Also, what hardware (specifically which CPU and thus which iGPU) are you using?
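
One way to answer this is to read the CPU model from /proc/cpuinfo. A minimal sketch, with cpu_model as an illustrative helper name; the optional file parameter is only there so the function can be tried on a saved copy.

```shell
# cpu_model [CPUINFO_FILE]
# Prints the CPU model string from the first "model name" line.
# The helper name and the optional file parameter are illustrative.
cpu_model() {
    file="${1:-/proc/cpuinfo}"
    sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' "$file" | head -n 1
}

# Example:
#   cpu_model
# The GPU itself can usually be listed with:
#   lspci -nn | grep -Ei 'vga|display'
```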

Comment 2 Patrick O'Callaghan 2021-02-20 17:18:59 UTC
(In reply to Hans de Goede from comment #1)
[...]
> Also what hardware (specifically which CPU and thus which iGPU) are you
> using?

Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

I'll take a look at the other recommendations when I can. Thanks for the feedback.

Comment 3 Hans de Goede 2021-02-20 19:12:37 UTC
(In reply to Patrick O'Callaghan from comment #2)
> (In reply to Hans de Goede from comment #1)
> [...]
> > Also what hardware (specifically which CPU and thus which iGPU) are you
> > using?
> 
> Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Ok, so that is definitely affected by the recent security mitigation work which landed in 5.10.9.

> I'll take a look at the other recommendations when I can. Thanks for the
> feedback.

Given your CPU I expect workarounds 1. and 4. to definitely help.

I'm curious what the results with 2. and 3. will be. Things will likely be better with those too, but the question is whether things will be fully resolved, or whether you will still see the occasional rendering glitch.

Comment 4 Patrick O'Callaghan 2021-02-21 18:06:14 UTC
(In reply to Hans de Goede from comment #3)
> Given your CPU I expect workaround 1. and 4. to definitely help.
> 
> I'm curious what the results with 2. and 3. will be, things will likely be
> better with those too, but the question is if things will be fully resolved,
> or if you will still see the occasional rendering glitch.

I'm testing No. 3 currently. It's been running for several hours with no problems. I assume no. 2 would also work (5.10.13-200 was the first one where I noticed problems).

I'll post again if anything changes.

Comment 5 Patrick O'Callaghan 2021-02-21 18:09:34 UTC
(In reply to Patrick O'Callaghan from comment #4)
> I'm testing No. 3 currently. It's been running for several hours with no
> problems. I assume no. 2 would also work (5.10.13-200 was the first one
> where I noticed problems).
> 
> I'll post again if anything changes.

Sorry, I meant I'm testing no. 4: 5.10.14-200.bz1925346.fc33.x86_64

Comment 6 Jonathan Ryshpan 2021-02-23 09:13:08 UTC
Created attachment 1758794 [details]
dmesg.txt

Taken on 2021-02-22

Comment 7 Jonathan Ryshpan 2021-02-23 09:19:51 UTC
Created attachment 1758796 [details]
Output of /sys/class/drm/card0/error taken on 2021-02-22

Comment 8 Jonathan Ryshpan 2021-02-23 09:41:49 UTC
1. Please describe the problem:
Graphic artifacts, flashing icons and flashing background wallpaper.  Problem restricted to KDE Plasma.

2. What is the Version-Release number of the kernel:
kernel-core-5.10.16-200.fc33.x86_64 (I think also 5.10.13, 14 and 15)

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear?  Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
First noticed with kernel-core-5.10.13-200.fc33.x86_64 or kernel-core-5.10.14-200.fc33

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Issue appears randomly, generally often, during normal usage of Plasma functions.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Unknown.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No.

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.
Attachments dmesg.txt and .../error/card0 have already been attached.  I am not familiar with the Red Hat BZ and did things backward.

==> The problem seems to be cured by installing the test kernel described in H deG note 3
    https://koji.fedoraproject.org/koji/taskinfo?taskID=61751956

    The CPU is an Intel i5-4460 running at 3.2 GHz

Comment 9 Hans de Goede 2021-02-23 21:16:40 UTC
> The problem seems to be cured by installing the test kernel described in H deG note 3

Thank you for the feedback.

Comment 10 Hans de Goede 2021-02-25 11:31:44 UTC
> The problem seems to be cured by installing the test kernel described in H deG note 3

This clearly indicates that this is a dup of bug 1925346, for which that test kernel was initially built, so I'm going to mark this one as a dup; having 4-5 bugs open for the same issue is not really helpful.

*** This bug has been marked as a duplicate of bug 1925346 ***

