Bug 1562530 - Random Freeze on Ryzen 2500U using amdgpu driver
Summary: Random Freeze on Ryzen 2500U using amdgpu driver
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-ati
Version: 28
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: X/OpenGL Maintenance List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1562444
Depends On:
Blocks:
Reported: 2018-03-31 16:38 UTC by Jerry
Modified: 2019-07-15 07:23 UTC
CC List: 45 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-28 23:55:04 UTC


Attachments
Output from DMESG for basic info on system (88.77 KB, text/plain)
2018-03-31 18:26 UTC, Jerry
dmesg output from crash (5.00 KB, text/plain)
2018-07-14 23:57 UTC, jon
dmesg Fedora 28 Lenovo 330 15-ARR AMD 2700u (226.64 KB, text/plain)
2019-01-28 15:09 UTC, Brent R Brian
Good boot, no freeze, some apps crash (257.00 KB, text/plain)
2019-04-03 11:00 UTC, Brent R Brian
No freeze, some apps crash (359.10 KB, text/plain)
2019-04-03 11:00 UTC, Brent R Brian
No freeze, some apps crash (229.60 KB, text/plain)
2019-04-03 11:01 UTC, Brent R Brian

Description Jerry 2018-03-31 16:38:19 UTC
Description of problem:

Complete system freeze requiring a hard power off/on cycle and reboot.  This is on the latest Fedora 27 distribution running on a Ryzen 2500U with the latest amdgpu from the Fedora 27 distribution. (Nothing custom here.)

Version-Release number of selected component (if applicable):


How reproducible:

Just regular browsing with Firefox, email reading with Thunderbird, and other applications. No clear pattern yet. Sometimes it happens while entering data in, for example, Bugzilla or a new email compose window. It does not seem related to video playback such as on YouTube.

Comment 1 Jerry 2018-03-31 18:26:46 UTC
Created attachment 1415608
Output from DMESG for basic info on system

DMESG output attached. Also, this is not a hardware failure, as the system exhibits no issues whatsoever on Windows.  This unit does have a relatively new wireless chip and GPU. To install F27 I had to use a respin to get started with kernel 4.15.

I would be happy to test later versions of drivers etc if I can get a little guidance.  I suppose I could go to rawhide if this would be helpful.

Comment 2 Jerry 2018-03-31 18:43:30 UTC
*** Bug 1562444 has been marked as a duplicate of this bug. ***

Comment 3 Jerry 2018-03-31 19:43:42 UTC
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15d0
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Device 15d1
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15d3
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15db
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 15dc
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15e8
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15e9
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ea
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15eb
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ec
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ed
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ee
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 15ef
01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a5 (rev 03)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS522A PCI Express Card Reader (rev 01)
03:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8822BE 802.11a/b/g/n/ac WiFi adapter
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev c4)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device 15de
04:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Device 15df
04:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15e0
04:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15e1
04:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Device 15e3
05:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)

Comment 4 Jerry 2018-03-31 19:45:39 UTC
Last Log:

12:37:00 PM kernel: pcieport 0000:00:01.7:    [12] Replay Timer Timeout  
12:35:53 PM bluetoothd: Failed to set mode: Blocked through rfkill (0x12)
12:35:52 PM spice-vdagent: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
12:35:51 PM pulseaudio: [pulseaudio] backend-ofono.c: Failed to register as a handsfree audio agent with ofono: org.freedesktop.DBus.Error.ServiceUnknown: The name org.ofono was not provided by any .service files
12:35:08 PM reporter-system: System encountered a non-fatal error in ??()
12:34:39 PM bluetoothd: Failed to set mode: Blocked through rfkill (0x12)
12:34:39 PM spice-vdagent: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
12:34:37 PM abrtd: '/var/spool/abrt/oops-2018-03-30-11:15:18-906-0' is not a problem directory
12:34:36 PM bluetoothd: Failed to set mode: Blocked through rfkill (0x12)
12:34:35 PM kernel: acer_wmi: Unsupported machine has AMW0_GUID1, unable to load
12:34:35 PM kernel: ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20170831/psparse-550)
12:34:35 PM kernel: sp5100_tco: I/O address 0x0cd6 already in use
12:34:35 PM kernel: tpm_crb MSFT0101:00: can't request region for resource [mem 0xcd579000-0xcd579fff]
12:34:35 PM kernel: lis3lv02d: unknown sensor type 0x0
 5:34:34 AM kernel: [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 1us * 100 tries - tgn10_lock line:566
 5:34:33 AM kernel: [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2!
 5:34:32 AM kernel: AMD-Vi: Unable to write to IOMMU perf counter.
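
For anyone comparing logs across machines, the amdgpu REG_WAIT lines like the one above can be pulled out of a kernel log and reduced to the fields that differ. This is only a sketch: the helper name reg_wait_summary is made up for illustration, and it assumes the message format stays exactly as it appears in this log.

```shell
# Hypothetical helper: read kernel log text on stdin and print, for each
# amdgpu REG_WAIT timeout line, three fields:
#   <waiting function> <per-try timeout in us> <number of tries>
# Lines that do not match the REG_WAIT format are ignored.
reg_wait_summary() {
    sed -n 's/.*REG_WAIT timeout \([0-9][0-9]*\)us \* \([0-9][0-9]*\) tries - \([A-Za-z0-9_]*\) line:[0-9][0-9]*.*/\3 \1 \2/p'
}
```

Possible usage: `dmesg | reg_wait_summary`, or `journalctl -k | reg_wait_summary` on a systemd machine.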

Comment 5 Jerry 2018-03-31 19:51:11 UTC
Interestingly, on this run, just before the freeze I had System Monitor running behind the Firefox window, partially visible, showing the CPU utilization graphs moving. I also had a terminal window shelled into a remote machine running top so I could see it update.

When everything froze, the terminal was frozen, Firefox was frozen, but the System Monitor appeared to be in motion. Mouse and keyboard were dead so I could do nothing but power off.  This is also using Wayland.

I am now going to try without Wayland and see what it does.

pcieport 0000:00:01.7: is suspicious, and I think the kernel was still running, though I could not access it.

Comment 6 Jorge Martínez López 2018-03-31 20:58:39 UTC
Hello Jerry,

Have you tried booting with parameter "rcu_nocbs=0-7"? (As your processor has 8 threads)

I think it is related to this kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=196683 

There is also some advice regarding some BIOS settings related to the power supply.
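
The rcu_nocbs range suggested above covers CPU numbers 0 through (threads - 1). A minimal sketch of deriving the argument from a thread count follows; rcu_nocbs_arg is a hypothetical helper name, and passing it the output of nproc assumes nproc reports the number of hardware threads you want covered.

```shell
# Hypothetical helper: print the rcu_nocbs= boot argument that covers
# CPUs 0..(N-1), where N is the thread count given as $1.
rcu_nocbs_arg() {
    # Reject anything that is not a positive integer.
    [ "$1" -ge 1 ] 2>/dev/null || {
        echo "thread count must be a positive integer" >&2
        return 1
    }
    echo "rcu_nocbs=0-$(( $1 - 1 ))"
}
```

For example, `rcu_nocbs_arg 8` prints `rcu_nocbs=0-7`, and `rcu_nocbs_arg "$(nproc)"` uses whatever the running system reports.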

Comment 7 Jerry 2018-04-01 02:49:27 UTC
Jorge, I will try your suggestions. Spotted this in logs too:

Display server /usr/libexec/Xorg crash in OsLookupColor()

Comment 8 Jerry 2018-04-01 03:43:18 UTC
With rcu_nocbs=0-7, I still get a freeze under heavy use in about 30 minutes.

Comment 9 Chris Siebenmann 2018-04-07 02:27:44 UTC
This may be connected to bug #1478219, which was also an amdgpu-related total system lockup under active use (in my case with a Radeon RX 550 card). However that issue appears to now be gone in current Fedora package versions and I think that those versions predate this bug report. However it may be worth testing (and verifying). If the kernel parameter 'amdgpu.dpm=0' avoids this issue, it's almost certainly related to that bug.

Comment 10 Braulio Oliveira 2018-04-13 21:57:44 UTC
I'm also having the same problem. Posted some details at https://lists.freedesktop.org/archives/amd-gfx/2018-March/020580.html including dmesg.

Using kernel 4.16.2 on archlinux on a HP Envy x360.

What I've tried and didn't help:
- rcu_nocb=0-7 and rcu_nocb=0-16
- disabling ASLR
- amdgpu.dpm=0

To reproduce, open simultaneously:
- vblank_mode=0 glxgears
- https://hangouts.google.com/start

It should crash within a few minutes

Comment 11 Braulio Oliveira 2018-04-13 22:23:22 UTC
It ALWAYS crashes on shader15 of http://www.graphicsfuzz.com/benchmark/android-v1.html

Comment 12 Jerry 2018-04-20 13:37:16 UTC
(In reply to Braulio Oliveira from comment #11)
> It ALWAYS crashes on shader15 of
> http://www.graphicsfuzz.com/benchmark/android-v1.html

These tests are oriented toward Android devices and some fail on non-Ryzen machines as well (I tried with my AMD A8-6410 laptop).

Comment 13 Jerry 2018-04-20 13:45:50 UTC
I have tried Fedora 28 Beta Live Image boot from USB and have not seen the problem yet. However, I have only been able to run for a short time. As soon as Fedora 28 releases I will install it fully, exercise it and report back.

Also note: on the HP laptop, I updated the BIOS to version F16b here, so check for BIOS updates as we track this problem.

Comment 14 Braulio Oliveira 2018-04-21 00:20:54 UTC
(In reply to Jerry from comment #13)
> I have tried Fedora 28 Beta Live Image boot from USB and have not seen the
> problem yet. However, I have only been able to run for a short time. As soon
> as Fedora 28 releases I will install it fully, exercise it and report back.
> 
> Also note on the HP Laptop, Bios Update to version F16b here. So check bios
> updates as we track this problem.

Hi Jerry, I had BIOS F16a and just updated to F17a. Still the problem is present,  with the kernel 4.16.3 or the hand-compiled amd-staging-drm-next. I'm reproducing it with glxgears/hangouts in a few minutes.

Comment 15 Jerry 2018-04-22 16:25:49 UTC
I updated to latest BIOS and confirm it is still present on the F28 Beta Live. I have not tried applying updates to an actual F28 installation to further test yet. (I am running off the USB at the moment)

Comment 16 Jerry 2018-04-25 01:13:22 UTC
I installed F28 Beta and installed all available updates. So far with several hours of using the machine, I have had no hangs.  The previous combination of glxgears and hangouts has run without issue.  I would say I have a few hours of use in now.

Comment 17 Jerry 2018-04-28 14:30:00 UTC
Problem still not resolved, freeze last night while running glxgears and browsing gcc bugzilla.  This is with Fedora 28 Beta and all available updates installed.

Comment 18 Travis 2018-04-28 15:17:48 UTC
This happens on Ryzen 2500u in the HP 15m-bq121dx (ENVY x360) laptop. I encountered the freeze within a few minutes. So far, I've tested kernels 4.16.1, 4.16.2, 4.16.3, 4.16.4, and vanilla 4.17 rc2. I have yet to try 4.16.5.

Comment 19 James Le Cuirot 2018-05-03 13:00:51 UTC
My wife has an Acer SF315-41 (Swift 3) with a Ryzen 2700U and has been experiencing about 3 freezes a day. I know that some of them are actually just the touch pad freezing but that is a different issue and there have definitely been complete system freezes too.

We're currently using OpenSUSE 42.3 with very recent kernel (4.17.0-rc2) and Mesa (20180413 git) builds as well as the latest BIOS (1.04).

I would use netconsole to get more information but there's no ethernet port on this machine and I'm guessing it doesn't work over wireless.

I also have a desktop Ryzen 1600X and I'm familiar with the freezing issues encountered there. There are no advanced options in the Acer BIOS, but I have disabled the C6 package state with zenstates.py to achieve the same effect and rule that out. Sadly it seems to be something else this time.

Comment 20 James Le Cuirot 2018-05-03 13:04:02 UTC
I forgot to add that it sometimes freezes while booting up but I haven't yet seen that with the full kernel log output visible. I also found that booting Fedora from a USB stick would sometimes fail as it would suddenly start reporting errors reading from the device as if it had just been pulled out. May not be related but thought it was worth mentioning.

Comment 21 Braulio Oliveira 2018-05-03 13:45:23 UTC
Nice report James.

Good to know that it affects Fedora/OpenSUSE/ArchLinux, multiple kernels and BIOSes and multiple hardwares (HP/Acer/Desktop).

Hopefully this gets fixed soon!

Comment 22 Jerry 2018-05-04 03:04:08 UTC
Just installed updates on Fedora 28, this included updates to Mesa. Problem still present

Comment 23 James Le Cuirot 2018-05-12 22:56:33 UTC
(In reply to Braulio Oliveira from comment #21)
> Good to know that it affects Fedora/OpenSUSE/ArchLinux, multiple kernels and
> BIOSes and multiple hardwares (HP/Acer/Desktop).

To be clear, I doubt that the desktop issue is related. I ruled it out by applying the same fix on the laptop and it didn't help.

Since I am unable to use netconsole, I tried to use kdump instead. I'm not familiar with it and I haven't been able to get it to work, despite jumping through a few hoops. Even if I trigger a crash manually with /proc/sysrq-trigger, kdump doesn't seem to kick in. It just sits there instead of rebooting and I then find /var/crash is still empty. Perhaps you guys could try it?

Comment 24 Jerry 2018-05-15 01:16:14 UTC
It has been reported that the bug is fixed with mesa-18.0.2. I have not had time to install this and test yet.

Comment 25 James Le Cuirot 2018-05-15 08:46:07 UTC
(In reply to Jerry from comment #24)
> It has been reported that the bug is fixed with mesa-18.0.2. I have not had
> time to install this and test yet.

Where did you hear that? I doubt it as I've been running a very recent Mesa git and have still had freezes every day.

Comment 26 Jerry 2018-05-19 06:07:06 UTC
I heard it from a Mesa developer. However, another developer suggested we need to get more data for them. I found that when the freeze occurs I can still ssh into the machine from another one, which confirms the CPU and the wifi are still running. Doing this, one can then examine logs and capture live information. I have been short on time, but I plan to get set up to redo this and get additional data to the developers.

Comment 27 James Le Cuirot 2018-05-20 12:33:00 UTC
Perhaps you/they meant Mesa 18.2 (unreleased) rather than 18.0.2? Phoronix just reported it has helped a lot with Vega.

I'm not able to SSH but maybe I need to try harder as even when it hasn't frozen, it takes a minute or two of pinging before inbound connections start to work and that's even after disabling power management on the wifi interface. Never seen that before.

Comment 28 Srikrishna Sekhar 2018-05-22 14:11:51 UTC
Running Mesa 18.2 devel does not improve the issue for me - Running games still results in a random system lockup.

I am running kubuntu 18.04 with 4.16.9 mainline kernel, and mesa 18.2.0-devel from the padoka ppa.

Comment 29 James Le Cuirot 2018-05-27 20:15:17 UTC
I've updated my system to OpenSUSE 15.0 in the hope that it might be something other than the kernel or Mesa but it's still the same. I'm now using a Mesa 18.2 build from 20180518.

Comment 30 jon 2018-07-14 23:56:17 UTC
I'm experiencing this as well, on fully patched Fedora 28.

Linux hostname 4.17.5-200.fc28.x86_64 #1 SMP Tue Jul 10 13:39:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Ryzen 1700X
MSI Tomahawk B350
XFX Radeon RX 560 (RX-560P4SFG5) 

I'll attach dmesg output as well.

Comment 31 jon 2018-07-14 23:57:07 UTC
Created attachment 1458906
dmesg output from crash

Comment 32 jon 2018-07-14 23:58:41 UTC
Also I should add the behavior is the same as well.  Very low usage, a few terminals and firefox.  Crash happens randomly, maybe an hour, maybe six.  Freezes completely, mouse still moves around the screen and no input from keyboard/mouse recognized.  Cannot switch virtual terminals and stops responding to icmp.

Comment 33 Braulio Oliveira 2018-07-15 00:03:14 UTC
idle=nomwait kernel parameter fixed all hangs for me (see https://community.amd.com/thread/224000)

Comment 34 jon 2018-07-15 04:37:32 UTC
(In reply to Braulio Oliveira from comment #33)
> idle=nomwait kernel parameter fixed all hangs for me (see
> https://community.amd.com/thread/224000)

I'm testing this out, I'll report back.

Comment 35 James Le Cuirot 2018-07-15 07:35:44 UTC
(In reply to jon from comment #30)
> 
> Ryzen 1700X
> MSI Tomahawk B350
> XFX Radeon RX 560 (RX-560P4SFG5)

Jon, you don't have a mobile Ryzen or Vega graphics. I doubt your issue is related and you should look at this report instead, specifically ensure your BIOS is up to date and look for the Power Supply Idle Control setting.

https://bugzilla.kernel.org/show_bug.cgi?id=196683

Comment 36 jon 2018-07-15 12:55:10 UTC
Thanks, James.  I updated my BIOS approximately one month ago, so it's pretty recent, but I'll check again.  I'm also already using the rcu_nocbs=0-15 in my boot command line.  This is what I have now, but I'll remove the idle parameter now:

From /proc/cmdline:

BOOT_IMAGE=/boot/vmlinuz-4.17.5-200.fc28.x86_64 root=/dev/mapper/fedora_ssd_vg00-root ro resume=/dev/mapper/fedora_ssd_vg00-swap rd.lvm.lv=fedora_ssd_vg00/root rd.lvm.lv=fedora_ssd_vg00/swap idle=nomwait rcu_nocbs=0-15 rhgb quiet LANG=en_US.UTF-8
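
Since several comments hinge on whether a given boot parameter is really active, here is a small sketch for checking a command line like the one above for an exact parameter. The helper name has_kparam is hypothetical, and the check is a plain whole-word match on the string; it does not know anything about how the kernel interprets the parameter.

```shell
# Hypothetical helper: succeed if the kernel command line in $1 contains
# the exact, whitespace-delimited parameter in $2 (e.g. idle=nomwait).
has_kparam() {
    case " $1 " in
        *" $2 "*) return 0 ;;   # found as a whole word
        *)        return 1 ;;   # absent, or only part of another word
    esac
}
```

Possible usage on a running system: `has_kparam "$(cat /proc/cmdline)" idle=nomwait && echo "idle=nomwait is set"`.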

Comment 37 jon 2018-07-15 15:49:50 UTC
Just had another crash.  After rebooting I did confirm that I'm running the latest MSI Tomahawk Arctic BIOS.  7A34vHD from 2018-05-04.

Same error:
Jul 15 10:41:04 pc003 kernel: [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:956

This just started happening maybe one to two weeks ago after updating all packages via dnf.  Prior to that I had weeks of uptime at a time.  Now it locks up after a couple of hours, never even close to a full day.

Comment 38 jon 2018-07-16 15:59:26 UTC
As an update I want to be clear that this is a regression.  This system has been running without issue for months.

Comment 39 Jerry 2018-07-22 03:07:45 UTC
(In reply to jon from comment #38)
> As an update I want to be clear that this is a regression.  This system has
> been running without issue for months.

Try setting kernel parameter 'idle=nomwait'  I found this at some other websites and it is working for me.  The mwait cpu instruction can hang a thread.  This is documented in the AMD errata for certain Ryzen chips.

Comment 40 jon 2018-07-22 03:32:34 UTC
Thanks for the suggestion but I'm already using that option and I've had several freezes since then.

Are you using displayport by any chance?  I'm using three monitors:  one each via dvi, hdmi and displayport.

Comment 41 Jerry 2018-07-22 03:52:05 UTC
(In reply to jon from comment #40)
> Thanks for the suggestion but I'm already using that option and I've had
> several freezes since then.
> 
> Are you using displayport by any chance?  I'm using three monitors:  one
> each via dvi, hdmi and displayport.

No I have only a laptop here with Ryzen 2500U GPU. So your issue is different.

Comment 42 jon 2018-07-22 14:05:16 UTC
(In reply to James Le Cuirot from comment #35)
> (In reply to jon from comment #30)
> > 
> > Ryzen 1700X
> > MSI Tomahawk B350
> > XFX Radeon RX 560 (RX-560P4SFG5)
> 
> Jon, you don't have a mobile Ryzen or Vega graphics. I doubt your issue is
> related and you should look at this report instead, specifically ensure your
> BIOS is up to date and look for the Power Supply Idle Control setting.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=196683

I'm using the rcu_nocbs=0-15 which I think would fix the soft lockup issue.  I think this issue is different, and it just started a few weeks ago.  I've been using this cpu and gpu for months without issue using rcu_nocbs.

I think Jerry is right, I think my issue is different.  I have filed a kernel bug on it.

Comment 43 Hexawolf 2018-10-13 22:09:48 UTC
Confirming this issue, Ryzen 2500U on HP Notebook 15-db0229ur.
- I am pretty sure this is related to Vega graphics
- Disabling C-State C6 partially solves the problem

Is Ryzen 5 so uncommon that this issue gets such little attention? This makes thousands of systems totally useless.

Comment 44 Jerry 2018-10-13 22:39:45 UTC
(In reply to Hexawolf from comment #43)
> Confirming this issue, Ryzen 2500U on HP Notebook 15-db0229ur.
> - I am pretty sure this is related to Vega graphics
> - Disabling C-State C6 partially solves the problem
> 
> Is Ryzen 5 so uncommon that this issue gets such little attention? This
> makes thousands of systems totally useless.

How did you disable C-State C6?  What OS/kernel are you running?

My current boot parameters:

Kernel command line: BOOT_IMAGE=/vmlinuz-4.18.12-200.fc28.x86_64 root=/dev/mapper/fedora_localhost--live-root ro resume=/dev/mapper/fedora_localhost--live-swap rd.lvm.lv=fedora_localhost-live/root rd.lvm.lv=fedora_localhost-live/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait processor.max_cstate=5

I have not tried dialing back the cstate further and my system still has issues with some suspend attempts, for example if I am on battery and close the lid, it will not come back.
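
When comparing setups like the one above, it can help to read a single key=value setting (for example processor.max_cstate) out of a command line. A minimal sketch, with the hypothetical helper name kparam_value; if a parameter appears more than once it keeps the last occurrence, which is an assumption about precedence rather than something confirmed in this report.

```shell
# Hypothetical helper: print the value of parameter name $2 from the
# kernel command line in $1. Fails (status 1) if the parameter is absent
# or has an empty value. Runs in a subshell so no variables leak out.
kparam_value() (
    set -f                      # no filename expansion while splitting
    value=
    for word in $1; do
        case "$word" in
            "$2="*) value=${word#"$2="} ;;   # keep the last match
        esac
    done
    [ -n "$value" ] && printf '%s\n' "$value"
)
```

Possible usage: `kparam_value "$(cat /proc/cmdline)" processor.max_cstate`.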

Comment 45 Hexawolf 2018-10-21 08:54:58 UTC
(In reply to Jerry from comment #44)
> How did you disable C-State C6?  What OS/kernel are you running?

By editing /dev/cpu/[0-9]*/msr, here's a good utility I believe:
https://github.com/r4m0n/ZenStates-Linux
Was using 4.19-rc7.

> I have not tried dialing back the cstate further and my system still has
> issues with some suspend attempts, for example if I am on battery and close
> the lid, it will not come back.

I had no problems suspending laptop and bringing it back but random freezes are even worse. This is certainly a critical issue as it *already* caused a massive data loss. It is also interesting because I have noticed that sometimes sound keeps playing and probably other system services like SSH are alive, though this must be checked.

By the way, having latest BIOS from HP - F.11 Rev.A in my case

Comment 46 Enno 2018-11-28 16:28:04 UTC
I have found a very surprising way to reproduce this bug.
So far it triggers the soft lockup 100% of the time, but I don't know why.

I booted via a Live USB to scroll through logs and easily edit kernel parameters. While doing so I wanted to change the keyboard layout to German (because I have a German keyboard). When I clicked on the graphical icon for the keyboard layout, the system froze, and after checking the logs I saw that it is the same error message Jerry posted:
drm:construct amdgpu...
I don't know why it does this, and I don't know what that means for the bug, but I wanted to let you know that there is a strange way to replicate this bug, and maybe you have better chances using this information than I have.

Comment 47 Dakota Lambert 2019-01-10 18:53:48 UTC
I encounter this issue almost daily on my Huawei Matebook D with the 2500U on Fedora Silverblue. The bios is currently up to date at version 1.18. I can seemingly trigger it easily when using YouTube in Firefox but it has happened independent of Firefox even being open. 

I can only replicate this issue on gnome-shell but it's happened on Fedora 29 Workstation, Silverblue, and Ubuntu. 

For me these are hard lockups with the sound repeating the last second or two of whatever was playing at the time. On my Desktop system with an AMD GPU I've seen lockups/hangs where the audio/network continues working but I haven't seen that on ryzen/vega mobile.

Comment 48 Jerry 2019-01-13 19:58:00 UTC
This is fixed on my HP laptop with their latest BIOS rev F.19.

I am not seeing any more issues except an error message on boot. Running Fedora 29 now.

[13438.629004] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 1us * 10 tries - optc1_lock line:628
[13438.629089] WARNING: CPU: 7 PID: 1806 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254 generic_reg_wait+0xe7/0x160 [amdgpu]

Comment 49 Zoltan Boszormenyi 2019-01-23 17:10:15 UTC
I bought an Acer A315-41 laptop with Ryzen 3 2200U APU inside.
I have tried to install Fedora 29 using the netinst installer (version 1.2) but booting the installer reliably triggers the CPU soft lockup error in UDEV while still on the console, although in KMS mode.
I tried adding these options as suggested elsewhere:

idle=nomwait
processor.max_cstate=1
rcu_nocbs=0-3 (2 cores, 4 threads on this CPU)
pcie_aspm=off

The BIOS was hopelessly outdated with version 1.02, version 1.11 was available from Acer. I had to install Windows 10 to run the BIOS upgrade program.

But nothing of the above helped, the Fedora 29 installer cannot start its GUI as the soft lockup is still triggered.

Comment 50 Zoltan Boszormenyi 2019-01-23 19:30:34 UTC
I have progressed a little. pci=biosirq helped me to install Fedora 29 in text mode, I still could not get it to start the GUI.
I used the netinst version of the installer and kernel 4.20.3-200.fc29 was installed.
This Acer A315-41 laptop needs "pcie_aspm=off noapic" (nothing more) for this kernel version to boot properly into graphics but the previous installation somehow did finish correctly. Booting into single mode does not accept the root password, saying that the root user is disabled.
I tried the same set of options with the installer but they did not help. Now, if only the netinst installer was using kernel 4.20...

Comment 51 Brent R Brian 2019-01-28 15:06:31 UTC
I have a Lenovo Ideapad 330 15-ARR (16G 2700u) running Fedora 28.

Random Freeze, usually when I hit the "window" key or Activities menu (strike upper left corner).

I have enabled openSSH server so I can log in from another machine, kill gnome-shell and all is good.

dmesg = dmesg_20190128_100000.txt

I can install / run whatever you need ... if Fedora 29 fixed this, I will upgrade and "go away".

B

Comment 52 Brent R Brian 2019-01-28 15:09:56 UTC
Created attachment 1524252
dmesg Fedora 28 Lenovo 330 15-ARR AMD 2700u

gnome-shell freezes periodically on opening the Activities menu, pressing the window key, or striking the upper left corner with the mouse.

Comment 53 Jerry 2019-02-22 17:23:34 UTC
(In reply to Jerry from comment #48)
> This is fixed on my HP laptop with their latest BIOS rev F.19.
> 
> I am not seeing any more issues except an error message on boot. Running
> Fedora 29 now.
> 
> [13438.629004] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 1us
> * 10 tries - optc1_lock line:628
> [13438.629089] WARNING: CPU: 7 PID: 1806 at
> drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254
> generic_reg_wait+0xe7/0x160 [amdgpu]

Not any more. F.19 was pulled by HP, not sure why. One thing F.19 was doing is allocating 1 GB of memory to the GPU. I updated to F.20, which allocates 256 MB as previous BIOSes did. The kernel 4.20 series fails to boot completely due to a firmware load issue (see https://bugs.freedesktop.org/show_bug.cgi?id=109206). Now I am using kernel 4.19.15-300.fc29.x86_64 with idle=nomwait iommu=pt and BIOS F.20, which is the latest, and glxgears locks up the machine rather quickly.

Comment 54 Jerry 2019-03-01 01:36:22 UTC
With kernel 4.20.11 all appears OK. I do get some ACPI errors.

[    0.000000] Linux version 4.20.11-200.fc29.x86_64 (mockbuild@bkernel03.phx2.fedoraproject.org) (gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC)) #1 SMP Wed Feb 20 15:56:08 UTC 2019
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.20.11-200.fc29.x86_64 root=/dev/mapper/fedora-root ro resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8 nomwait iommu=pt

...

I don't know if the following means anything.

[    4.962452] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20181003/dsopcode-201)
[    4.962460] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962469] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962528] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20181003/dsopcode-201)
[    4.962532] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962538] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962594] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20181003/dsopcode-201)
[    4.962598] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962603] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962668] input: HP WMI hotkeys as /devices/virtual/input/input14
[    4.962886] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20181003/dsopcode-201)
[    4.962891] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962897] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962951] ACPI Error: Field [D128] at bit offset/length 128/1024 exceeds size of target Buffer (160 bits) (20181003/dsopcode-201)
[    4.962955] ACPI Error: Method parse/execution failed \HWMC, AE_AML_BUFFER_LIMIT (20181003/psparse-516)
[    4.962961] ACPI Error: Method parse/execution failed \_SB.WMID.WMAA, AE_AML_BUFFER_LIMIT (20181003/psparse-516)

Comment 55 Brent R Brian 2019-03-01 14:01:50 UTC
kernel-4.20.11-200.fc29.x86_64 way more stable

boots up good (no kernel parms)

using zenStates to disable c6

after several days a bit of screen glitching (glitch, redraw) but it "fixes" itself ... 

MUCH BETTER

Comment 56 Jerry 2019-03-09 13:36:49 UTC
(In reply to Jerry from comment #54)
> With kernel 4.20.11 all appears OK. I do get some ACPI errors.
> 

This must have been a fluke or some other dependency. None of the 4.20 kernels work. I get a scrambled line of color at the bottom of a blank screen. I have reverted back to:

kernel=/boot/vmlinuz-4.19.15-300.fc29.x86_64
args="ro resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait iommu=pt"

Which appears to boot well. I have tried closing the lid to suspend once and that worked, but I cannot confirm stability with any suspend. From all my reading, there is an ongoing driver load problem, and I don't know if it's fixed in the kernel 5 series yet.

Comment 57 Jerry 2019-03-27 19:07:59 UTC
As an update:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.0.3-200.fc29.x86_64 root=/dev/mapper/fedora-root ro resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8 idle=nomwait amd_iommu=pt

$ cat /proc/cpuinfo
... snip ...
model		: 17
model name	: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx
stepping	: 0
microcode	: 0x8101007
... snip ...

This boots OK now, and I tried without the idle=nomwait and it also boots fine.

Attempting to suspend the computer used to hang it, requiring a power cycle. Now it does not hang (for which I am very grateful) but it also fails to suspend. Possibly due to the dracut bug here:

https://bugzilla.redhat.com/show_bug.cgi?id=1676357

Which I have not tried the fix for yet. In summary dracut faulted out during install of 5.0.3.  There are many different suspend errors from different components, so I will keep an eye on it.

Comment 58 Talha Khan 2019-04-02 16:21:01 UTC
Jerry, does kernel 5.0.4 boot for you?

I have the same setup as Jerry (AMD Ryzen 5 2500U) with Firmware version F.20 and it still isn't booting for me, even after installing the dracut update and reinstalling Kernel 5.0.4.
I get a black screen with a horizontal line near the bottom of scrambled orange pixels.
The last kernel that's working is 4.19.15-300.fc29.x86_64.

# grubby --info /boot/vmlinuz-5.0.4-200.fc29.x86_64
index=0
kernel=/boot/vmlinuz-5.0.4-200.fc29.x86_64
args="ro rhgb quiet LANG=en_US.UTF-8"
root=UUID=f7548a91-a3d2-4ec2-8573-c7f313417cda
initrd=/boot/initramfs-5.0.4-200.fc29.x86_64.img
title=Fedora (5.0.4-200.fc29.x86_64) 29 (Twenty Nine)

I added the kernel parameter amd_iommu=pt and it still didn't boot.
FYI, my suspend/resume is already working fine so I didn't add the parameter idle=nomwait.

Comment 59 Jerry 2019-04-03 02:09:26 UTC
> Jerry, does kernel 5.0.4 boot for you?

Yes, but to get any of the kernels above 4.19.15 to boot, I had to delete /lib/firmware/amdgpu/raven_dmcu.bin . 

See: https://bugs.freedesktop.org/show_bug.cgi?id=109206

I do have to see if I can still boot 4.19.15 with and without this file.

Comment 60 Brent R Brian 2019-04-03 11:00:08 UTC
Created attachment 1551334 [details]
Good boot, no freeze, some apps crash

for comparison Lenovo IdeaPad 330 15-ARR (Ryzen 7 2700u)

Comment 61 Brent R Brian 2019-04-03 11:00:46 UTC
Created attachment 1551335 [details]
No freeze, some apps crash

for comparison Lenovo IdeaPad 330 15-ARR (Ryzen 7 2700u)

Comment 62 Brent R Brian 2019-04-03 11:01:30 UTC
Created attachment 1551336 [details]
No freeze, some apps crash

for comparison Lenovo IdeaPad 330 15-ARR (Ryzen 7 2700u)

Comment 63 Brent R Brian 2019-04-03 11:03:54 UTC
(In reply to Jerry from comment #5)
> Interestingly on this run just before the freeze I had System Monitor
> running behind firefox window partially viewable show the CPU utilization
> graphs moving. I had another terminal window shelled into a remote machine
> running top so I could see it update.
> 
> When everything froze, the terminal was frozen, firefox was frozen, but the
> system monitor appeared to be in motion. Mouse and keyboard were dead so I
> could do nothing but power off.  This is also using Wayland.
> 
> I am now going to try without Wayland and see what it does.
> 

I had the same thing happen (monitor ran, all else stopped), but I could log in via SSH to shut the system down.

Comment 64 Brent R Brian 2019-04-03 11:07:17 UTC
The only odd issues with 5.0.5 (and previous):

Notifications of "component crash" (says tainted kernel, but VirtualBox was removed ... wish there was a tool to list things that taint the kernel)

ABRT crashes on occasion

The screen (no activity, just idle) will FLASH, more noticed when firefox is up (gmail) ... (screen dim / lock disabled).
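
On the wish for a tool that lists what taints the kernel: the kernel exposes its taint state as a bitmask in /proc/sys/kernel/tainted, and each bit has a documented meaning. A rough sketch that decodes the first 16 bits; decode_taint is a hypothetical name and the descriptions are a paraphrase of the kernel documentation. It shows the reason flags only, not which module set them:

```shell
# decode_taint [VALUE]
# Prints one line per taint bit that is set. VALUE defaults to the
# running kernel's /proc/sys/kernel/tainted.
decode_taint() (
    value=${1:-$(cat /proc/sys/kernel/tainted)}
    if [ "$value" -eq 0 ]; then
        echo "not tainted"
    else
        bit=0
        # Reasons are listed in bit order, starting at bit 0.
        for reason in \
            "P proprietary module loaded" \
            "F module force-loaded" \
            "S kernel running on out-of-spec system" \
            "R module force-unloaded" \
            "M machine check exception" \
            "B bad page referenced" \
            "U taint requested by userspace" \
            "D kernel oops or BUG" \
            "A ACPI table overridden" \
            "W kernel warning issued" \
            "C staging driver loaded" \
            "I firmware bug workaround applied" \
            "O out-of-tree module loaded" \
            "E unsigned module loaded" \
            "L soft lockup occurred" \
            "K kernel live-patched"
        do
            if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
                echo "bit $bit: $reason"
            fi
            bit=$((bit + 1))
        done
    fi
)
```

An out-of-tree module such as VirtualBox's would typically show up as bit 12 (O), and taint flags persist until the next reboot even after the module is removed, which may explain the message still appearing.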

Comment 65 Talha Khan 2019-04-04 14:23:53 UTC
(In reply to Jerry from comment #59)
> > Jerry, does kernel 5.0.4 boot for you?
> 
> Yes, but to get any of the kernels above 4.19.15 to boot, I had to delete
> /lib/firmware/amdgpu/raven_dmcu.bin . 
> 
> See: https://bugs.freedesktop.org/show_bug.cgi?id=109206
> 
> I do have to see if I can still boot 4.19.15 with and without this file.

Thanks. I'll follow up on that specific issue on that bug report.

I am running Fedora 29 KDE, and I would also experience random freezes. It is a bit less common now. However, the logs wouldn't indicate any root cause most of the time.

Comment 66 freqyxin 2019-04-10 03:43:56 UTC
Hey everyone, I just recently bought a Dell Inspiron 7375 running a Ryzen 5 2500U with vega graphics, and have had a hell of a time getting any Linux distro to run. I've encountered almost every Ryzen/Linux issue that is documented, but I think I've finally found a solution.

I've run various boot options from Grub with very little success until booting Linux Mint 19.1 "Tessa" via Cinnamon in compatibility mode. After installing to the internal drive I changed the boot options, and everything still works. Wifi, BT, Touchscreen!!, touchpad, speakers...everything.



Boot options:

replace " quiet splash " with


" noapic noacpi nosplash irqroll -- "


Not sure if all of that is necessary, I literally just got it working after five days of blank screens and processor error loops, so more testing will be required. 

Hope this helps someone end a nightmare.

Comment 67 freqyxin 2019-04-10 05:21:06 UTC
(In reply to freqyxin from comment #66)
> Hey everyone, I just recently bought a Dell Inspiron 7375 running a Ryzen 5
> 2500U with vega graphics, and have had a hell of a time getting any Linux
> distro to run. I've encountered almost every Ryzen/Linux issue that is
> documented, but I think I've finally found a solution.
> 
> I've run various boot options from Grub with very little success until
> booting Linux Mint 19.1 "Tessa" via Cinnamon in compatibility mode. After
> installing to the internal drive I changed the boot options, and everything
> still works. Wifi, BT, Touchscreen!!, touchpad, speakers...everything.
> 
> 
> 
> Boot options:
> 
> replace " quiet splash " with
> 
> 
> " noapic noacpi nosplash irqpoll -- "
> 
> 
> Not sure if all of that is necessary, I literally just got it working after
> five days of blank screens and processor error loops, so more testing will
> be required. 
> 
> Hope this helps someone end a nightmare.

Comment 68 freqyxin 2019-04-10 05:27:31 UTC
Correction to last. 

  noapic noacpi nosplash irqpoll 


Sorry for the confusion, noob here.

Comment 69 Talha Khan 2019-04-16 14:54:38 UTC
I experienced a random freeze last night. Nothing in the logs indicates an issue. Here are the last few entries in the log before rebooting:

Apr 15 23:30:46 infinity.localdomain systemd[1]: Starting system activity accounting tool...
Apr 15 23:30:46 infinity.localdomain systemd[1]: Started system activity accounting tool.
Apr 15 23:30:46 infinity.localdomain audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=>
Apr 15 23:30:46 infinity.localdomain audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sysstat-collect comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?>
-- Reboot --
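
When the tail of the journal shows nothing useful, it may be worth filtering the previous boot's kernel messages for GPU-related lines. A sketch, assuming a persistent journal (otherwise `-b -1` has nothing to show); gpu_lines is a hypothetical helper and the pattern is only a starting point:

```shell
# gpu_lines: read log text on stdin and keep only lines that mention
# the amdgpu driver or the DRM subsystem (case-insensitive).
gpu_lines() {
    grep -iE 'amdgpu|\[drm'
}

# Example use against the kernel log of the previous boot:
#   journalctl -k -b -1 --no-pager | gpu_lines | tail -n 20
```

A hard freeze can still leave nothing behind if the machine locks up before the messages reach disk, so an empty result does not rule out the driver.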

Comment 70 Ben Cotton 2019-05-02 19:40:36 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 71 Ben Cotton 2019-05-28 23:55:04 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 72 Jerry 2019-05-30 02:11:59 UTC
I am currently on Fedora 30 and all is working great as long as one removes raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or kernel updates. This is a driver loading issue for the Vega graphics. Once the boot image is created without the influence of this file, things are very stable.

Comment 73 dac.override 2019-06-30 11:22:36 UTC
(In reply to Jerry from comment #72)
> I am currently on Fedora 30 and all is working great as long as one removes
> raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or
> kernel updates. This is a driver loading issue for the Vega graphics. Once
> the boot image is created without the influence of this file, things are
> very stable.

I tried this. It did not work for me. Still soft-locks. Vega 2500U.

Comment 74 dac.override 2019-07-01 09:16:07 UTC
Added "idle=halt" to the kernel boot line, so far so good. Will report back if this is not the solution after all.
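
For others who want to try the same parameter on Fedora, grubby can add it to every installed kernel entry. A cautious sketch: add_karg and the APPLY variable are hypothetical names for illustration, and by default the function only prints the command so it can be reviewed before running it as root:

```shell
# add_karg ARG
# Prints the grubby command that would add ARG to all kernel entries.
# Set APPLY=1 (and run as root) to execute it instead of printing it.
add_karg() (
    arg=$1
    set -- grubby --update-kernel=ALL --args="$arg"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
)
```

For example, `add_karg idle=halt` prints the command, and `APPLY=1 add_karg idle=halt` runs it. The same change can be undone with grubby's `--remove-args` option.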

Comment 75 kadu 2019-07-06 23:51:09 UTC
(In reply to Jerry from comment #72)
> I am currently on Fedora 30 and all is working great as long as one removes
> raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or
> kernel updates. This is a driver loading issue for the Vega graphics. Once
> the boot image is created without the influence of this file, things are
> very stable.

Didn't work for me either. I'm using fedora 30 with gnome-shell. My cpu: AMD Ryzen 5 PRO 2500U w/ Radeon Vega Mobile Gfx (8)

Comment 76 Jerry 2019-07-07 01:08:15 UTC
(In reply to kadu from comment #75)
> (In reply to Jerry from comment #72)
> > I am currently on Fedora 30 and all is working great as long as one removes
> > raven_dmcu.bin from /lib/firmware/amdgpu/ before installing any kernel or
> > kernel updates. This is a driver loading issue for the Vega graphics. Once
> > the boot image is created without the influence of this file, things are
> > very stable.
> 
> Didn't work for me either. I'm using fedora 30 with gnome-shell. My cpu: AMD
> Ryzen 5 PRO 2500U w/ Radeon Vega Mobile Gfx (8)

If you previously installed the current kernel that does not boot you need to remove that image after you delete the raven_dmcu.bin file.  There are four kernel packages involved:

 kernel
 kernel-core
 kernel-modules
 kernel-modules-extra

This is one approach to rebuilding the kernel without the driver issue, but you have to have a booting kernel that works to do this.

The other approach is to rebuild the kernel boot image. See https://bugs.freedesktop.org/show_bug.cgi?id=109206 for the steps to do this after you have removed the raven_dmcu.bin file.

Also note: I have noticed that the troublesome bin file gets re-installed every time the "firmware" package gets updated, which has required me to redo the removal procedure more than once.
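
Since the removal has to be repeated after firmware updates, it could be wrapped in a small helper. This is only a sketch under assumptions: disable_dmcu is a hypothetical name, the backup location /root/firmware-backup is made up for illustration, and it moves the file aside instead of deleting it so it can be restored. The exact rebuild steps in the linked freedesktop report may differ from the dracut call shown in the comment:

```shell
# disable_dmcu [FIRMWARE_DIR] [BACKUP_DIR]
# Moves raven_dmcu.bin out of the firmware directory if it is present.
disable_dmcu() (
    fw_dir=${1:-/lib/firmware/amdgpu}
    backup_dir=${2:-/root/firmware-backup}   # assumed location, adjust as needed
    fw_file="$fw_dir/raven_dmcu.bin"
    if [ -e "$fw_file" ]; then
        mkdir -p "$backup_dir" && mv "$fw_file" "$backup_dir/" && echo "moved: $fw_file"
    else
        echo "absent: $fw_file"
    fi
)

# After moving the file (as root), the initramfs for the affected kernel
# still has to be regenerated so it no longer carries the firmware, e.g.:
#   dracut --force --kver 5.0.4-200.fc29.x86_64
```

Running `disable_dmcu` with no arguments after each firmware package update, followed by the rebuild, would cover the case described above.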

Comment 77 Brent R Brian 2019-07-08 13:23:41 UTC
I have noticed that when the machine wakes (open laptop lid) the fan will occasionally not run (normally it is very active varying speed and such).

When the fan stops, heavy loads will freeze the graphics (can still log in via ssh).

B

Comment 78 Ed 2019-07-15 07:23:13 UTC
HP x360, 2500u, BIOS F.20: Linux 5.3 from Fedora Rawhide freezes within minutes and shows garbled graphics, even when deleting the raven_dmcu.bin file and adding "idle=halt" to the kernel command line.

I wasn't able to find any logs of the error on my system.

Kernel 5.0.9 works fine with just the raven_dmcu deletion, and requires no command line parameters.

