Bug 1263762

Summary: Kernel panic when booting 32b images from usb on AMD processors
Product: [Fedora] Fedora
Reporter: Petr Schindler <pschindl>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 23
CC: awilliam, covex, daajjall, fweimer, gansalmon, itamar, jonathan, kernel-maint, kevin, kparal, lbrabec, madhu.chinakonda, mattdm, mchehab, pomidorabelisima, pschindl, robatino, sgallagh
Target Milestone: ---
Keywords: CommonBugs
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: RejectedBlocker https://fedoraproject.org/wiki/Common_F23_bugs#amd-64-32
Fixed In Version: 4.2.1-300.fc23 kernel-4.1.8-200.fc22 kernel-4.1.8-100.fc21
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2015-09-24 05:07:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Attachments:
photo of kernel panic (flags: none)

Description Petr Schindler 2015-09-16 15:46:31 UTC
Created attachment 1074083 [details]
photo of kernel panic

Description of problem:
This happens only with 32-bit images (i686 kernel).

When I boot from a USB stick (written with dd or livecd-iso-to-disk), I get a kernel panic after choosing the install entry in the media's GRUB menu.

This happens on both AMD machines I have available for testing. It works correctly on Intel machines.

When booting on Intel, the message 'Failed to find cpu0 device node' is shown for a few seconds before the system boots normally.

A photo of the kernel panic is attached.

I tested this with Beta RC1 (Server DVD and netinst) and with Workstation Live.

Version-Release number of selected component (if applicable):
4.2.0-300.fc23.i686

How reproducible:
Always, on computers with AMD processors

Steps to Reproduce:
1. Put the installation image on a USB stick with dd or livecd-iso-to-disk
2. Boot from it and choose to start the installation
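
For reference, the dd variant of step 1 can be sketched roughly as follows. This is an illustration only: /dev/sdX is a placeholder device that does not come from this report, the helper name usb_write_cmd is made up, and the function only prints the command instead of running it, since writing to the wrong device destroys its contents.

```shell
# Print (do not run) a dd command that writes an ISO image to a USB device.
# Both arguments are placeholders; confirm the target device (e.g. with lsblk)
# before running the printed command as root.
usb_write_cmd() {
    iso=$1
    dev=$2
    printf 'dd if=%s of=%s bs=4M conv=fsync status=progress\n' "$iso" "$dev"
}

usb_write_cmd Fedora-Server-netinst-i386-23_Beta.iso /dev/sdX
```

The bs=4M, conv=fsync and status=progress options are GNU dd options chosen for this sketch; the report itself does not say which options were used.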

Actual results:
A kernel panic appears

Expected results:
System boots to the installer

Additional info:
I propose this as a Beta blocker, as it violates the Alpha criterion 'The installer must run when launched normally from the release-blocking images.' when run on AMD machines.

Comment 1 Josh Boyer 2015-09-16 15:54:01 UTC
32-bit isn't a priority for the kernel team.  I'm just generally making that observation.

Which CPU(s) do you see this with?

Also, there isn't enough information to know where the issue starts.  If you could boot with pause_on_oops=60 and try and capture the full backtrace, that would be helpful.

Comment 2 Stephen Gallagher 2015-09-16 16:45:24 UTC
Petr, can you be more specific about what constitutes "AMD machines"? Is it one specific CPU family? Only models from 2008 and older, what?

Comment 3 Adam Williamson 2015-09-16 16:46:04 UTC
It's a bit of a shot in the dark, but https://lkml.org/lkml/2015/1/23/562 has a similar-looking trace. Seems like some cache-related patch was submitted, someone hit a trace looking somewhat similar to this one, and the patch was revised to fix it. Just possibly, since the traces look somewhat similar (and at least we can see cacheinfo_sysfs_init in Petr's too), this is related to that change? https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/intel_cacheinfo.c?id=0d55ba46bfbee64fd2b492b87bfe2ec172e7b056

Comment 4 Stephen Gallagher 2015-09-16 16:53:50 UTC
Additionally, from the description it sounds like this is only a problem for the i686 install media? Given that all modern AMD systems support 64-bit, I'm comfortable with making this a Common Bug and telling people to just install from the 64-bit media.
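
A quick way to check whether a given machine can use the 64-bit media is sketched below. It is only an illustration and not part of the original comment: it assumes a Linux system that exposes /proc/cpuinfo, and the helper name is_amd64_capable is made up for this example. The lm (long mode) flag marks 64-bit capable x86 CPUs.

```shell
# Succeed if a cpuinfo file describes a 64-bit capable AMD CPU.
# The file argument defaults to /proc/cpuinfo.
is_amd64_capable() {
    file=${1:-/proc/cpuinfo}
    grep -q '^vendor_id[[:space:]]*:[[:space:]]*AuthenticAMD' "$file" 2>/dev/null &&
        grep -Eq '^flags[[:space:]]*:.*[[:space:]]lm([[:space:]]|$)' "$file" 2>/dev/null
}

if is_amd64_capable; then
    echo "AMD CPU with 64-bit support: the x86_64 media should be usable"
else
    echo "not a 64-bit capable AMD CPU (or cpuinfo unavailable)"
fi
```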

Comment 5 Adam Williamson 2015-09-16 17:36:40 UTC
For the record, the last range of 32-bit AMD processors was the last set of Athlon XPs, in early 2003. I did find a few references with Google to people who are still running F22 and F23 on such processors:

http://lists.opensuse.org/opensuse/2015-08/msg00364.html
https://bugzilla.redhat.com/show_bug.cgi?id=1244515

we don't know yet if this bug affects them, I don't have anything that old to test.

Comment 6 Adam Williamson 2015-09-16 18:20:21 UTC
Reproduced with a Phenom II X4 960T. The screen I get when booting with pause_on_oops=60 is https://www.happyassassin.net/temp/1263762.0.jpg , if that helps.

For the case of 64-bit CPUs I'm certainly fine with just documenting this, at least for Beta. If it also happened to 32-bit CPUs it'd be a bit less clear-cut. I'll send out a mail asking anyone who still has an Athlon XP box lying around to try it.

Comment 7 Kevin Fenzi 2015-09-16 22:18:17 UTC
Might possibly also be related to or a case of: 

https://bugzilla.kernel.org/show_bug.cgi?id=101971

Comment 8 Matthew Miller 2015-09-16 23:24:10 UTC
(In reply to awilliam from comment #5)
> For the record, the last range of 32-bit AMD processors was the last set of
> Athlon XPs, in early 2003. I did find a few references with Google to people
> who are still running F22 and F23 on such processors:
> 
> http://lists.opensuse.org/opensuse/2015-08/msg00364.html
> https://bugzilla.redhat.com/show_bug.cgi?id=1244515
> 
> we don't know yet if this bug affects them, I don't have anything that old
> to test.

Yeah, I think the main reason for 32-bit x86 Fedora on systems newer than that is memory use, not lack of 64-bit capability.

It's my _personal_ opinion that we shouldn't hold up the release for this. I'll work on developing an _official_ opinion by the readiness meetings tomorrow. :)

Comment 9 Josh Boyer 2015-09-16 23:54:46 UTC
Dave Airlie pointed out this was already reported.  It looks like bug 1253566 was the original report, but the reporter closed it out without comment.

I've pinged upstream on it though, as I don't see a fix in Linus' tree yet either.

Comment 10 Kamil Páral 2015-09-17 06:59:50 UTC
One of the affected CPUs is the AMD FX-4100. With maxcpus=1 or maxcpus=0 it boots OK. 64-bit Fedora images boot OK by default; only 32-bit images are affected.

Comment 11 Petr Schindler 2015-09-17 07:04:50 UTC
The second machine we tested is an AMD A10-7870K Radeon R7. It boots with maxcpus=1 or maxcpus=0.

This bug also affects systems upgraded to F23, but users can boot F22's kernel, so it is not a big problem so far.

Comment 12 poma 2015-09-17 11:12:58 UTC
https://bugzilla.kernel.org/show_bug.cgi?id=101971#c8

Tested-by: poma <pomidorabelisima>

Comment 13 Shimon 2015-09-17 11:15:12 UTC
I've got a beautiful Athlon XP 2.2 GHz still running fine :)
Do you still need to run the image on such hardware or is the problem more generic?

BTW, is the image link correct? (x86_64 looks strange)

https://dl.fedoraproject.org/pub/alt/stage/23_Alpha_RC1/Server/x86_64/iso/Fedora-Server-netinst-x86_64-23_Alpha.iso

Comment 14 poma 2015-09-17 11:19:26 UTC
BTW we are waiting for Greg KH, right?
https://lists.manjaro.org/pipermail/manjaro-dev/Week-of-Mon-20150803/000579.html

Mainline 4.3-rc1 is already there, therefore.

Comment 15 Kamil Páral 2015-09-17 12:16:02 UTC
(In reply to Shimon from comment #13)
> I've got a beautiful Athlon XP 2.2 GHz still running fine :)
> Do you still need to run the image on such hardware or is the problem more
> generic?

Yes, it seems to be 32-bit only, it would be great to have your results.

> 
> BTW, is the image link correct? (x86_64 looks strange)
> 
> https://dl.fedoraproject.org/pub/alt/stage/23_Alpha_RC1/Server/x86_64/iso/
> Fedora-Server-netinst-x86_64-23_Alpha.iso

The correct link is http://dl.fedoraproject.org/pub/alt/stage/23_Beta_RC1/Server/i386/iso/Fedora-Server-netinst-i386-23_Beta.iso .

Comment 16 poma 2015-09-17 12:22:50 UTC
(In reply to Josh Boyer from comment #9)
> Dave Airlie pointed out this was already reported.  It looks like bug
> 1253566 was the original report, but the reporter closed it out without
> comment.
> 

Here again, a comment for you:


> I've pinged upstream on it though, as I don't see a fix in Linus' tree yet
> either.

A Ping?

We all live in a yellow submarine
Yellow submarine, yellow submarine

Comment 17 Josh Boyer 2015-09-17 12:55:08 UTC
Please test this scratch build when it completes:

http://koji.fedoraproject.org/koji/taskinfo?taskID=11122367

Comment 18 Florian Weimer 2015-09-17 13:18:25 UTC
(In reply to awilliam from comment #5)
> For the record, the last range of 32-bit AMD processors was the last set of
> Athlon XPs, in early 2003.

I think AMD Geode LX is still being sold, unfortunately.

Comment 19 Eric Hobbs 2015-09-17 13:33:05 UTC
I was able to get it to boot: http://imgur.com/v8SroOj

CPU is an AMD Athlon XP 2800+

Comment 20 Josh Boyer 2015-09-17 14:26:46 UTC
Resetting needinfo flag until we get confirmation on comment #17

Comment 21 Adam Williamson 2015-09-17 15:07:58 UTC
Florian: I believe the LX is a fundamentally different design, though, so it's not particularly likely to be affected by this? If anyone has one it'd be good to test it I guess.

Comment 22 Adam Williamson 2015-09-17 17:09:14 UTC
Discussed at 2015-09-17 Fedora 23 Beta Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/teams/f23_beta_go_no-go_meeting/f23_beta_go_no-go_meeting.2015-09-17-16.00.log.txt . With all the feedback from the mailing list, a few things seem clear:

1) This issue seems to affect more recent 64-bit AMD CPUs.
2) This issue does not seem to affect any 32-bit AMD CPUs.
3) The kernel parameter 'maxcpus=0' or 'maxcpus=1' appears to work around the issue on affected CPUs.
4) It does affect existing 32-bit installs that are updated to F23 (on affected CPUs), but this can be worked around with the kernel parameter, or booting an F22 kernel (upgraded systems will still have older kernels).

Taken together it seems clear the impact of this is not sufficient to block the release, and so this was rejected as a blocker. We can document the workarounds for affected systems on the CommonBugs page.
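
As a rough sketch of the kernel-parameter workaround from point 3: the helper below appends a parameter to a kernel command-line string only if it is not already present. The helper name add_karg is hypothetical. For a one-off boot the parameter can simply be typed onto the kernel line in the boot menu; the commented grubby call is one way to make it persistent on an installed system, and it is left commented out because it changes the boot configuration.

```shell
# Append a kernel parameter to a command-line string unless it is already there.
add_karg() {
    cmdline=$1
    karg=$2
    case " $cmdline " in
        *" $karg "*) printf '%s\n' "$cmdline" ;;
        *)           printf '%s\n' "${cmdline:+$cmdline }$karg" ;;
    esac
}

add_karg "ro quiet" "maxcpus=1"   # prints: ro quiet maxcpus=1

# Persistent variant for an installed system (run as root):
# grubby --update-kernel=ALL --args="maxcpus=1"
```

Once a fixed kernel is installed, the parameter should be removed again, since maxcpus=1 limits the system to a single CPU.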

jwb: I'll try and get around to testing the scratch build later today.

Comment 23 poma 2015-09-17 18:39:57 UTC
CPU op-mode(s):        32-bit, 64-bit
CPU op-mode(s):        32-bit
Vendor ID:             AuthenticAMD

4.2.0-301.fc23.i686+debug
PASSED/BOOTABLE

Comment 24 Shimon 2015-09-17 19:30:02 UTC
I was about to test it myself, but although the image on the device has the correct checksum, it doesn't boot with this mainboard's BIOS. It's not even detected by plopkexec (probably related to the partition type).
For BIOSes that expect USB hard-disk devices, I'm pretty sure dd alone is not enough when the source is an ISO image.

I was able to boot Knoppix via USB in the past, so this release might have more problems than just old AMD processors...

Comment 25 Adam Williamson 2015-09-17 21:56:52 UTC
F23 isn't any different to the last several releases in terms of USB writing, Shimon. It's entirely possible it doesn't work on very old motherboards (I remember USB booting used to be a lot more finicky and less reliable), but then I would expect other recent releases to fail too, and I'm not sure we'd really care much, honestly. We can only support ancient hardware *so* far, and you do have other options (optical disc, network boot).

Comment 26 Shimon 2015-09-17 23:01:42 UTC
No, the correct instructions should have looked like this (you have my permission to use this):

For a completely motherboard-agnostic boot, use a tool such as Universal USB Installer on Windows or UNetbootin on Linux (if your motherboard supports USB-CD emulation, you can dd the image directly to the device), or burn the ISO image to a CD-ROM and boot from an optical drive.

You're welcome.

Comment 27 Adam Williamson 2015-09-17 23:16:19 UTC
Sorry, no, we don't recommend those tools; they cause far more problems than they solve. Just check forum and G+ posts for people saying 'my USB stick doesn't boot!', someone replying "did you use unetbootin? don't do that", and the person replying "oh yeah I did, dd works fine"...

Comment 28 Josh Boyer 2015-09-18 12:19:04 UTC
Has anyone else tested the kernel from comment #17?  It would be really good to get confirmation of the potential fix for the very urgent issue </sarcasm>.

Comment 29 Lukas Brabec 2015-09-18 12:20:39 UTC
(In reply to Josh Boyer from comment #17)
> Please test this scratch build when it completes:
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=11122367

yep, that works, problem fixed

Comment 30 Josh Boyer 2015-09-18 12:27:47 UTC
Thank you.  Patch pushed to Fedora git.

Comment 31 Fedora Update System 2015-09-22 12:05:00 UTC
kernel-4.2.1-300.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16417

Comment 32 Fedora Update System 2015-09-22 16:46:57 UTC
kernel-4.1.8-200.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16440

Comment 33 Fedora Update System 2015-09-22 16:48:56 UTC
kernel-4.1.8-100.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16441

Comment 34 Fedora Update System 2015-09-23 03:55:12 UTC
kernel-4.2.1-300.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16417

Comment 35 Fedora Update System 2015-09-23 05:22:41 UTC
kernel-4.1.8-100.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16441

Comment 36 Fedora Update System 2015-09-23 21:22:00 UTC
kernel-4.1.8-200.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16440

Comment 37 Fedora Update System 2015-09-24 05:07:19 UTC
kernel-4.2.1-300.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

Comment 38 Fedora Update System 2015-10-03 21:14:48 UTC
kernel-4.1.8-200.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

Comment 39 Fedora Update System 2015-10-03 21:51:32 UTC
kernel-4.1.8-100.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.
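
To check whether a running system already has one of the fixed builds listed under "Fixed In Version", a comparison along these lines may help. This is a sketch under assumptions: the helper name kernel_at_least is made up, it relies on the version ordering of GNU sort -V, and the comparison is only meaningful against the fixed build for the same Fedora release.

```shell
# Succeed if version $1 is at least version $2 (GNU sort version ordering).
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example for Fedora 23; substitute 4.1.8-200.fc22 or 4.1.8-100.fc21 as needed.
if kernel_at_least "$(uname -r)" "4.2.1-300.fc23"; then
    echo "running kernel is at or past the fixed F23 build"
else
    echo "running kernel predates the fixed F23 build"
fi
```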