Bug 1652702

Summary: kernel 4.19.2 won't poweroff or reboot on Ryzen 3 2200U
Product: [Fedora] Fedora
Reporter: Markus Schönhaber <bugzilla-redhat>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified    
Version: 29
CC: airlied, bskeggs, ewk, hdegoede, ichavero, itamar, jarodwilson, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, mchehab, mjg59, steved, y9t7sypezp
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
Last Closed: 2018-11-29 16:32:30 UTC
Type: Bug
Attachments:
Description                                   Flags
log excerpt from a boot that hung             none
video of the screen during shutdown process   none
bisect log                                    none

Description Markus Schönhaber 2018-11-22 16:12:18 UTC
Description of problem:
When trying to reboot or power off the machine, it seems to shut down normally but then hangs with the screen turned off and does not react to keyboard input. It has to be power-cycled by keeping the power button pressed for several seconds before it is usable again.
With the 4.18.18 kernel, powering off or rebooting the machine works fine.

Version-Release number of selected component (if applicable):
4.19.2-301.fc29

How reproducible:
Always.

Steps to Reproduce:
1. Initiate power off or reboot of the machine.

Actual results:
Machine hangs after shutdown.

Expected results:
Machine powers off/reboots.

Additional info:
This happens on an Acer Aspire 3 A315-41 with a Ryzen 3 2200U.
kernel 4.19.2-301.fc29 shows this problem while 4.18.18-300.fc29 does not.

Model info (didn't find the exact same model on the en_US part of the Acer website, sorry):
https://www.acer.com/ac/de/DE/content/model/NX.GY9EG.015

Comment 1 Steve 2018-11-22 17:29:53 UTC
(In reply to bugzilla-redhat from comment #0)
> Description of problem:
> When trying to reboot or power off the machine, ...

In the "power off" case, how do you determine that the machine is hung?

Do suspend and resume work as expected?

Comment 2 Markus Schönhaber 2018-11-22 17:43:24 UTC
(In reply to Steve from comment #1)
> In the "power off" case, how do you determine that the machine is hung?
The power LED still shines blue and the only way to make the machine work again is by pressing and holding the power button for some seconds. Then it's really turned off and can be started normally again.

> Do suspend and resume work as expected?
I hadn't checked that but did now: no, it doesn't. The screen goes dark, the power LED shines blue (it would blink orange if the machine really was suspended) and the only thing that has an effect is power-cycling the laptop.
Suspend and resume are no problem with the 4.18.18 kernel, though.

Comment 3 Steve 2018-11-22 22:06:23 UTC
Thanks for your followup reply.

Could you attach the end of the log file for a boot that ended with a hang?

If you immediately restart, this should give the right log:

$ journalctl -b -1 --no-hostname | tail -100 > journalctl-1.txt

(The number "100" is arbitrary. That's intended to get the shutdown messages.)

Obfuscate any info that you do not want included (hostname, username, etc.)

Comment 4 Markus Schönhaber 2018-11-23 08:17:51 UTC
Created attachment 1508224 [details]
log excerpt from a boot that hung

OK: I booted the 4.19.2 kernel, did "systemctl reboot", power cycled the machine, booted the same kernel again and created the log excerpt using
journalctl -b -1 --no-hostname | tail -100 > journalctl-1.txt

Comment 5 Steve 2018-11-23 15:02:41 UTC
(In reply to bugzilla-redhat from comment #4)
> Created attachment 1508224 [details]
> log excerpt from a boot that hung
> 
> OK: I booted the 4.19.2 kernel, did "systemctl reboot", power cycled the

Could you confirm that the machine hangs when you reboot using the desktop menu?

> machine, booted the same kernel again and created the log excerpt using
> journalctl -b -1 --no-hostname | tail -100 > journalctl-1.txt

Thanks. I am trying to reproduce this with F29 on an Intel laptop. The laptop doesn't hang, but I don't see these lines in my log:

...
Nov 23 09:08:23 kernel: fbcon: Taking over console
...
Nov 23 09:08:23 kernel: Console: switching to colour frame buffer device 240x67
...

Could you post your kernel command-line?

$ cat /proc/cmdline

Comment 6 Markus Schönhaber 2018-11-23 15:24:51 UTC
(In reply to Steve from comment #5)
> Could you confirm that the machine hangs when you reboot using the desktop
> menu?

Yes. It doesn't matter how I initiate the shutdown/reboot: from the login screen, from the desktop menu (I'm using SDDM / KDE Plasma BTW) or on the command line, the machine always hangs in the end.

> Thanks. I am trying to reproduce this with F29 on an Intel laptop.

I have a second laptop (HP ProBook 440 G5, Intel i5-8250U) also running F29 which does not show this problem, regardless of the kernel version. Therefore, to me, it seems unlikely that you'll be able to reproduce it on your Intel machine.

> Could you post your kernel command-line?

# cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-4.19.2-301.fc29.x86_64 root=/dev/mapper/acer-root ro resume=/dev/mapper/acer-swap rd.lvm.lv=acer/root rd.lvm.lv=acer/swap rhgb quiet LANG=de_DE.UTF-8 ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2

Comment 7 Hans de Goede 2018-11-23 15:45:45 UTC
Can you try dropping "rhgb" and "quiet" from your kernel-cmdline ?

On power-off you should then get a switch to text mode with a whole bunch of messages being printed. The last message printed should be "poweroff". If the system keeps running after that, then there is a problem with the poweroff handling in the kernel (it is asking the firmware to power off in a way which the firmware does not understand).

If it does not get to the "poweroff" message it would be interesting to know what the last couple of things which do get printed are.
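
A minimal sketch of the edit being asked for here: removing the words "rhgb" and "quiet" from a kernel command line string. The helper name strip_kernel_args is hypothetical and not part of any tool; it only transforms a string and does not touch the boot configuration. The persistent change would still have to be made in the bootloader configuration, for example with grubby's --remove-args option or by editing the entry in the GRUB menu for a single boot.

```shell
# Hypothetical helper: print a kernel command line with the given words removed.
# Usage: strip_kernel_args "<command line>" word [word ...]
# The body runs in a subshell so that "set -f" (no globbing) stays local;
# globbing must be off because parameters like ivrs_ioapic[4]=... contain
# characters the shell would otherwise treat as patterns.
strip_kernel_args() (
    set -f
    cmdline=$1
    shift
    out=""
    for word in $cmdline; do
        keep=1
        for drop in "$@"; do
            if [ "$word" = "$drop" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="${out:+$out }$word"
        fi
    done
    printf '%s\n' "$out"
)

# Example with a shortened version of the command line from comment 6.
strip_kernel_args "root=/dev/mapper/acer-root ro rhgb quiet LANG=de_DE.UTF-8" rhgb quiet
```

To preview the result for the running system, the same helper could be called as strip_kernel_args "$(cat /proc/cmdline)" rhgb quiet.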

Comment 8 Markus Schönhaber 2018-11-23 17:37:52 UTC
Created attachment 1508302 [details]
video of the screen during shutdown process

(In reply to Hans de Goede from comment #7)
> Can you try dropping "rhgb" and "quiet" from your kernel-cmdline ?

I did that, but unfortunately, even though the machine itself doesn't power off, the screen does and I'm not able to read the (last) messages printed. I managed to make a video of the shutdown process until the screen goes blank, though.

Comment 9 Hans de Goede 2018-11-23 17:46:27 UTC
The video ends with "[sda] Stopping disk", which indicates that userspace has shut down successfully and that this is an issue with the kernel's shutdown code, not some userspace problem which only triggers with 4.19.

Other than coming to the conclusion that this really is a kernel problem, I'm afraid I cannot offer much help.

Comment 10 Markus Schönhaber 2018-11-23 17:54:08 UTC
(In reply to Hans de Goede from comment #9)
> Other than coming to the conclusion that this really is a kernel problem I'm
> afraid I cannot offer much help.

OK, thanks for looking into it.

What can I do to help getting the problem pinned down?

Comment 11 Hans de Goede 2018-11-23 18:01:56 UTC
(In reply to bugzilla-redhat from comment #10)
> What can I do to help getting the problem pinned down?

The only thing I can think of is doing a git bisect. Before you go down that path first make sure that this reproduces with 4.19.0 from here:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1155120

See here for instructions for installing a kernel directly from koji:
https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

If the problem also happens with 4.19.0 then you need to do a git bisect on the kernel between the v4.18 and v4.19 tags. There should be various howtos online on how to do this, although I'm not sure if there is one specifically targeting Fedora. But the process is the same everywhere.

Note that doing a bisect takes a lot of computer time and also a not insignificant amount of human time. One piece of advice: do not wait for the kernel builds to finish, just go and do something else while the kernel is building. This may make the total time it takes a bit longer, but waiting for the builds really is an inefficient use of your time.

Comment 12 Markus Schönhaber 2018-11-23 18:32:16 UTC
(In reply to Hans de Goede from comment #11)
> down that path first make sure that this reproduces with 4.19.0 from here:

It does.
OK, I'll try to do the bisecting...

Comment 13 Hans de Goede 2018-11-23 18:38:45 UTC
(In reply to bugzilla-redhat from comment #12)
> It does.
> OK, I'll try to do the bisecting...

Great. I have one request: if you have time and feel like it, can you document the process a bit? Specifically, which howto you started with and which Fedora-specific steps you needed to take (like installing packages foo and bar to be able to build the kernel, where you got the kernel .config file to start with, etc.). Then perhaps (again, if you feel like it) you can create a wiki page about this on the Fedora wiki.

Hmm, double-checking whether we already have a page for this, I see we do:
https://fedoraproject.org/wiki/User:Ignatenkobrain/Kernel/Bisection

So I guess you can follow that :)  Still if there are any inaccuracies there it would be great if you can write them down and let us know about them.

Comment 14 Markus Schönhaber 2018-11-28 19:23:22 UTC
Created attachment 1509617 [details]
bisect log

I cloned Linus' kernel tree
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
and did the bisecting there (git bisect log attached).
It seems the problem I'm seeing was introduced by commit
5c6ac7112fb2b73a5e4e7ac1648cdaceb558f268
Is there anything else I can do to help getting this fixed?

(In reply to Hans de Goede from comment #13)
> Still if there are any inaccuracies there it would be great if you can write them down and let us know about them.

Maybe I'll get around to writing down what I did sometime next week. But since this bug report is not the place to discuss this stuff: whom should I contact wrt this?

Comment 15 Hans de Goede 2018-11-28 19:58:53 UTC
Thank you for doing the bisect. Unfortunately the AMD gfx devs do not seem to have a clear how-to on filing a bug for this. I think it is best if you send a mail about this to Alex Deucher <alexander.deucher>. He should be able to tell you where and how to file a bug to get this looked into further.

As for doing some work on docs for bisecting, it is probably best if you contact the Fedora Docs team about that, either on IRC or on their mailing list, see:
https://fedoraproject.org/wiki/Docs_Project

Comment 16 Markus Schönhaber 2018-11-29 16:32:30 UTC
I was about to contact Alex Deucher but I thought it might be useful to check beforehand how the current 4.20-rc behaves wrt this issue. And indeed, neither a 4.20-rc4 kernel built from Linus' tree nor the 4.20.0-0.rc4.git1.1.fc30.x86_64 show any problems when powering off / rebooting / suspending the affected laptop.
So it seems the bug has already been fixed and there's probably no point in keeping this report open.

Obviously, I should have thought of checking 4.20-rc before I wasted a lot of my time (and yours, sorry!) on this bug report. At least I have learned something...

Comment 17 Hans de Goede 2018-11-29 21:03:13 UTC
(In reply to Markus Schönhaber from comment #16)
> I was about to contact Alex Deucher but I thought it might be useful to
> check beforehand how the current 4.20-rc behaves wrt this issue. And indeed,
> neither a 4.20-rc4 kernel built from Linus' tree nor the
> 4.20.0-0.rc4.git1.1.fc30.x86_64 show any problems when powering off /
> rebooting / suspending the affected laptop.
> So it seems the bug has already been fixed and there's probably no point in
> keeping this report open.
> 
> Obviously, I should have thought of checking 4.20-rc before I wasted a lot
> of my time (and yours, sorry!) on this bug report. At least I have learned
> something...

No problem, I should have thought of asking you to test 4.20-rc# myself...