Bug 1727433 - system hangs occasionally: i915 0000:00:02.0: Resetting chip for hang on rcs0
Summary: system hangs occasionally: i915 0000:00:02.0: Resetting chip for hang on rcs0
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-intel
Version: 30
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-06 08:29 UTC by cacheflood
Modified: 2020-06-03 00:03 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-05-26 14:32:27 UTC
Type: Bug
Embargoed:


Attachments
dmesg for gpu hang (4.43 KB, text/plain)
2020-04-18 16:15 UTC, Dale Turner
dmesg for gpu hang 20200422 (942 bytes, text/plain)
2020-04-22 16:08 UTC, Dale Turner
dmesg for gpu hang April 28,2020 (1.26 KB, text/plain)
2020-04-28 16:58 UTC, Dale Turner
gpu error to accompany the dmesg April28,2020 (25.43 KB, text/plain)
2020-04-28 16:59 UTC, Dale Turner

Description cacheflood 2019-07-06 08:29:43 UTC
Description of problem: The system hangs for a few seconds.


uname: 5.1.15-300.fc30.x86_64 #1 SMP Tue Jun 25 14:07:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


Additional info:

dmesg dump:

[ 6933.833583] perf: interrupt took too long (5219 > 5210), lowering kernel.perf_event_max_sample_rate to 38000
[10417.191008] i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, in Xwayland [2033], hang on rcs0
[10417.191010] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[10417.191011] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[10417.191011] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[10417.191012] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[10417.191013] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[10417.191065] i915 0000:00:02.0: Resetting chip for hang on rcs0
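
The driver asks for the crash dump to be attached to any report, so it can help to pull the key fields out of the hang line and copy the dump somewhere permanent. The following is a minimal sketch, not part of the original report: the regular expression is inferred only from the "GPU HANG" lines quoted in this bug, the function names and the `gpu-error.txt` file name are hypothetical, and reading `/sys/class/drm/card0/error` usually requires root.

```python
import re
from pathlib import Path

# Pattern inferred from the "GPU HANG" lines quoted in this report.
# The "in <process> [<pid>]" part is optional because some kernels omit it.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+)"
    r"(?:, in (?P<comm>\S+) \[(?P<pid>\d+)\])?"
    r", hang on (?P<engine>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' line as a dict, or None if no match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    if info["pid"] is not None:
        info["pid"] = int(info["pid"])
    return info


def save_crash_dump(src="/sys/class/drm/card0/error", dest="gpu-error.txt"):
    """Copy the crash dump to a regular file so it can be attached to a bug.

    The default dest name is a hypothetical choice for illustration.
    """
    dest_path = Path(dest)
    dest_path.write_bytes(Path(src).read_bytes())
    return dest_path


if __name__ == "__main__":
    sample = (
        "[10417.191008] i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, "
        "in Xwayland [2033], hang on rcs0"
    )
    print(parse_gpu_hang(sample))
```

Copying the dump soon after the hang is probably the safer habit, since the sysfs file is not a permanent record.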


hwinfo:

H/W path                 Device      Class          Description
===============================================================
                                     system         Aspire S7-191 (Aspire S7-191_0746_2.09)
/0                                   bus            Helium
/0/0                                 memory         128KiB BIOS
/0/4                                 processor      Intel(R) Core(TM) i5-3317U CPU @ 1.70GHz
/0/4/6                               memory         32KiB L1 cache
/0/4/7                               memory         256KiB L2 cache
/0/4/8                               memory         3MiB L3 cache
/0/5                                 memory         32KiB L1 cache
/0/e                                 memory         4GiB System Memory
/0/e/0                               memory         2GiB SODIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/e/1                               memory         2GiB SODIMM DDR3 Synchronous 1333 MHz (0.8 ns)
/0/100                               bridge         3rd Gen Core processor DRAM Controller
/0/100/2                             display        3rd Gen Core processor Graphics Controller
/0/100/14                            bus            7 Series/C210 Series Chipset Family USB xHCI Host Controller
/0/100/14/0              usb2        bus            xHCI Host Controller
/0/100/14/0/1            scsi6       storage        DataTraveler 2.0
/0/100/14/0/1/0.0.0      /dev/sdc    disk           8074MB DataTraveler 2.0
/0/100/14/0/1/0.0.0/0    /dev/sdc    disk           8074MB 
/0/100/14/0/1/0.0.0/0/2              volume         15EiB Windows FAT volume
/0/100/14/0/1/0.0.0/0/3              volume         20MiB Empty partition
/0/100/14/0/2                        bus            USB2.0 Hub
/0/100/14/0/2/3                      input          USB Keyboard
/0/100/14/0/2/4                      input          USB Receiver
/0/100/14/0/4                        input          Touchscreen
/0/100/14/1              usb3        bus            xHCI Host Controller
/0/100/16                            communication  7 Series/C216 Chipset Family MEI Controller #1
/0/100/1b                            multimedia     7 Series/C216 Chipset Family High Definition Audio Controller
/0/100/1c                            bridge         7 Series/C216 Chipset Family PCI Express Root Port 1
/0/100/1c.3                          bridge         7 Series/C216 Chipset Family PCI Express Root Port 4
/0/100/1c.3/0            wlp2s0      network        AR9462 Wireless Network Adapter
/0/100/1d                            bus            7 Series/C216 Chipset Family USB Enhanced Host Controller #1
/0/100/1d/1              usb1        bus            EHCI Host Controller
/0/100/1d/1/1                        bus            Integrated Rate Matching Hub
/0/100/1d/1/1/6                      communication  Bluetooth wireless interface
/0/100/1d/1/1/7                      multimedia     HD WebCam
/0/100/1f                            bridge         HM77 Express Chipset LPC Controller
/0/100/1f.2              scsi0       storage        82801 Mobile SATA Controller [RAID mode]
/0/100/1f.2/0            /dev/sda    disk           64GB LITEONIT CMT-64L
/0/100/1f.2/0/1                      volume         199MiB System partition
/0/100/1f.2/0/2                      volume         1023MiB EFI partition
/0/100/1f.2/0/3                      volume         118GiB LVM Physical Volume
/0/100/1f.2/1            /dev/sdb    disk           64GB LITEONIT CMT-64L
/0/100/1f.3                          bus            7 Series/C216 Chipset Family SMBus Controller
/0/1                                 generic        PnP device ETD0504
/0/2                                 system         PnP device PNP0c02
/0/3                                 system         PnP device PNP0b00
/0/6                                 generic        PnP device INT3f0d
/0/7                                 input          PnP device PNP0303
/0/8                                 system         PnP device PNP0c02
/0/9                                 system         PnP device PNP0c01
/1                       virbr0-nic  network        Ethernet interface
/2                       virbr0      network        Ethernet interface

Comment 1 cacheflood 2019-07-08 06:21:37 UTC
More information: This happens frequently when the CPU is under heavy load (I was using pigz to compress a large number of files).

Comment 2 Christian Kujau 2020-01-14 23:47:57 UTC
From bug 1780800 (opened in December 2019):

 > The patch was submitted to stable and rejected because it doesn't apply to 5.4.  [...]
 > Upstream issue reports that backporting the fix from 5.5 to 5.4 is non-trivial. And now there are a few attempts at reverting the change that introduced the problem, so even the revert is apparently not straightforward. Skylake and Kabylake CPUs are affected, but I'm not sure if it's all or a subset of those.

Comment 3 Anthony Messina 2020-01-20 22:21:03 UTC
This also occurs in 5.4.12-200.fc31.x86_64

Comment 4 Bernie Hoefer 2020-02-03 23:48:01 UTC
(In reply to Anthony Messina from comment #3)

> This also occurs in 5.4.12-200.fc31.x86_64

Since that's a Fedora 31 kernel, you may want to follow the Fedora 31 Bugzilla ticket for it:

  BZ 1794064 - i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
  <https://bugzilla.redhat.com/show_bug.cgi?id=1794064>

Comment 5 bitchecker 2020-04-13 21:44:54 UTC
Same problem on Fedora 31 (5.5.15-200.fc31.x86_64).

I hit the problem when I move huge files over the network (scp/sftp/rsync+ssh).

The only entries in the logs are:

```
kernel: Asynchronous wait on fence i915:xfwm4[1494]:33ad0 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
kernel: Asynchronous wait on fence i915:xfwm4[1494]:33ad0 timed out (hint:intel_atomic_commit_ready+0x0/0x50 [i915])
```

The result is that the computer freezes completely, although it still responds over SSH. If I run a reboot command over SSH, I have to wait about 20 minutes before the machine unblocks and the reboot actually happens.
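
To see how often these timeouts occur and which process they are tied to, the log lines can be counted per process. The following is a rough sketch rather than an established tool: the pattern is inferred only from the two lines quoted in this comment, the names `FENCE_RE` and `count_fence_timeouts` are hypothetical, and treating the last fence field as a hexadecimal sequence number is an assumption.

```python
import re
from collections import Counter

# Inferred from the log lines quoted above. The final fence field is assumed
# (not confirmed here) to be a hexadecimal sequence number.
FENCE_RE = re.compile(
    r"Asynchronous wait on fence "
    r"(?P<driver>[^:\s]+):(?P<comm>[^\[\s]+)\[(?P<pid>\d+)\]:(?P<seqno>[0-9a-f]+)"
    r" timed out \(hint:(?P<hint>[^)]*)\)"
)


def count_fence_timeouts(lines):
    """Count fence-wait timeouts per (process name, pid) pair."""
    counts = Counter()
    for line in lines:
        match = FENCE_RE.search(line)
        if match:
            counts[(match["comm"], int(match["pid"]))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "kernel: Asynchronous wait on fence i915:xfwm4[1494]:33ad0 timed out "
        "(hint:intel_atomic_commit_ready+0x0/0x50 [i915])"
    )
    print(count_fence_timeouts([sample, sample]))
```

Feeding it the saved journal or dmesg output line by line should be enough to tell whether the timeouts cluster around the file transfers.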

Comment 6 Dale Turner 2020-04-18 16:15:20 UTC
Created attachment 1679859 [details]
dmesg for gpu hang

I am getting something similar on Rawhide with the 5.7 kernel series. I kept 5.6.0-0.rc7.git1.1.fc33.x86_64 around which works fine. I can log into Openbox, but then it hangs. Sometimes I can log into Sway, but it quickly hangs. The last time I tried Sway, all I got was alternating black and grey screens.

The attached is "dmesg | grep i915" which shows both Sway and X (Openbox) failing.
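
For anyone who wants to collect the same output from a script, here is a small Python sketch of the `dmesg | grep i915` filter. The function names are hypothetical, and on some systems `dmesg` only works as root.

```python
import subprocess


def filter_lines(lines, needle="i915"):
    """Keep only the lines that contain needle, like `grep needle`."""
    return [line for line in lines if needle in line]


def dmesg_i915():
    """Run dmesg and return only the lines that mention i915."""
    # check=True raises CalledProcessError if dmesg exits with an error,
    # for example when reading the kernel log is restricted to root.
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return filter_lines(result.stdout.splitlines())


if __name__ == "__main__":
    sample = [
        "[10417.191065] i915 0000:00:02.0: Resetting chip for hang on rcs0",
        "[10420.000000] usb 1-1: new high-speed USB device",
    ]
    print("\n".join(filter_lines(sample)))
```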

Thanks.

Comment 7 Dale Turner 2020-04-22 16:08:23 UTC
Created attachment 1680923 [details]
dmesg for gpu hang 20200422

I am still getting this with kernel-5.7.0-0.rc2.1.fc33.x86_64.

Comment 8 bitchecker 2020-04-24 19:03:17 UTC
Still present with kernel 5.5.17-200.fc31.x86_64

I also encountered the problem with file transfers to storage devices as well as network traffic.

Comment 9 bitchecker 2020-04-24 19:04:39 UTC
(In reply to bitchecker from comment #8)
> Still present with kernel 5.5.17-200.fc31.x86_64
> 
> I also encountered the problem with file transfers to storage devices as
> well as network traffic.

The machine always freezes completely, and a reboot takes a long time (about 20 to 30 minutes).

Comment 10 Dale Turner 2020-04-28 16:58:14 UTC
Created attachment 1682555 [details]
dmesg for gpu hang April 28,2020

Still getting this with kernel-5.7.0-0.rc3.1.fc33.x86_64.

Comment 11 Dale Turner 2020-04-28 16:59:32 UTC
Created attachment 1682556 [details]
gpu error to accompany the dmesg April28,2020

Comment 12 Ben Cotton 2020-04-30 20:11:03 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Dale Turner 2020-04-30 20:14:44 UTC
I'm seeing this with Rawhide (would be Fedora 33), so can we change the version for the bug, or should I create a new bug report?

Comment 14 bitchecker 2020-04-30 20:15:50 UTC
The problem also happens with Fedora 31.

I recently upgraded the system from 31 to 32 and tested with a network file transfer.

It is still present.

Please change the version.

Comment 15 bitchecker 2020-05-06 13:12:26 UTC
The problem is still present with a local copy of a large file.

I also tried switching from LightDM to GDM, but the situation did not change at all.

Please prioritize this bug: having to completely kill the machine and restart it again and again just to get work done is not workable.

Comment 16 Ben Cotton 2020-05-26 14:32:27 UTC
Fedora 30 changed to end-of-life (EOL) status on 2020-05-26. Fedora 30 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 17 Dale Turner 2020-06-03 00:03:49 UTC
I filed a new bug as the problem still persists. 
https://bugzilla.redhat.com/show_bug.cgi?id=1843274

