Bug 1263762
Summary: | Kernel panic when booting 32b images from usb on AMD processors | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Petr Schindler <pschindl> | ||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 23 | CC: | awilliam, covex, daajjall, fweimer, gansalmon, itamar, jonathan, kernel-maint, kevin, kparal, lbrabec, madhu.chinakonda, mattdm, mchehab, pomidorabelisima, pschindl, robatino, sgallagh | ||||
Target Milestone: | --- | Keywords: | CommonBugs | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | RejectedBlocker https://fedoraproject.org/wiki/Common_F23_bugs#amd-64-32 | ||||||
Fixed In Version: | 4.2.1-300.fc23 kernel-4.1.8-200.fc22 kernel-4.1.8-100.fc21 | Doc Type: | Bug Fix | ||||
Last Closed: | 2015-09-24 05:07:40 UTC | Type: | Bug | ||||
Attachments: |
|
32-bit isn't a priority for the kernel team. I'm just generally making that observation. Which CPU(s) do you see this with? Also, there isn't enough information to know where the issue starts. If you could boot with pause_on_oops=60 and try and capture the full backtrace, that would be helpful.

Petr, can you be more specific about what constitutes "AMD machines"? Is it one specific CPU family? Only models from 2008 and older, what?

It's a bit of a shot in the dark, but https://lkml.org/lkml/2015/1/23/562 has a similar-looking trace. Seems like some cache-related patch was submitted, someone hit a trace looking somewhat similar to this one, and the patch was revised to fix it. Just possibly, since the traces look somewhat similar (and at least we can see cacheinfo_sysfs_init in Petr's too), this is related to that change? https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/cpu/intel_cacheinfo.c?id=0d55ba46bfbee64fd2b492b87bfe2ec172e7b056

Additionally, from the description it sounds like this is only a problem for the i686 install media? Given that all modern AMD systems support 64-bit, I'm comfortable with making this a Common Bug and telling people to just install from the 64-bit media.

For the record, the last range of 32-bit AMD processors was the last set of Athlon XPs, in early 2003. I did find a few references with Google to people who are still running F22 and F23 on such processors:

http://lists.opensuse.org/opensuse/2015-08/msg00364.html
https://bugzilla.redhat.com/show_bug.cgi?id=1244515

we don't know yet if this bug affects them, I don't have anything that old to test.

Reproduced with a Phenom II X4 960T. The screen I get when booting with pause_on_oops=60 is https://www.happyassassin.net/temp/1263762.0.jpg , if that helps.

For the case of 64-bit CPUs I'm certainly fine with just documenting this, at least for Beta. If it also happened to 32-bit CPUs it'd be a bit less clear-cut.
I'll send out a mail asking anyone who still has an Athlon XP box lying around to try it.

Might possibly also be related to or a case of: https://bugzilla.kernel.org/show_bug.cgi?id=101971

(In reply to awilliam from comment #5)
> For the record, the last range of 32-bit AMD processors was the last set of
> Athlon XPs, in early 2003. I did find a few references with Google to people
> who are still running F22 and F23 on such processors:
>
> http://lists.opensuse.org/opensuse/2015-08/msg00364.html
> https://bugzilla.redhat.com/show_bug.cgi?id=1244515
>
> we don't know yet if this bug affects them, I don't have anything that old
> to test.

Yeah, I think the main reason for 32-bit x86 Fedora on systems newer than that is memory use, not lack of 64-bit capability.

It's my _personal_ opinion that we shouldn't hold up the release for this. I'll work on developing an _official_ opinion by the readiness meetings tomorrow. :)

Dave Airlie pointed out this was already reported. It looks like bug 1253566 was the original report, but the reporter closed it out without comment. I've pinged upstream on it though, as I don't see a fix in Linus' tree yet either.

One of the affected CPUs is AMD FX-4100. With maxcpus=1 or maxcpus=0 it boots OK. 64bit Fedora images boot OK by default, only 32bit images are affected.

The second one we tested this on is AMD A10-7870K Radeon R7. It boots with maxcpus=1 or maxcpus=0.

This bug affects systems upgraded to f23 too, but users can use F22's kernel. So it's not a big trouble so far.

https://bugzilla.kernel.org/show_bug.cgi?id=101971#c8
Tested-by: poma <pomidorabelisima>

I've got a beautiful Athlon XP 2.2 GHz still running fine :)
Do you still need to run the image on such hardware or is the problem more generic?

BTW, is the image link correct? (x86_64 looks strange)
https://dl.fedoraproject.org/pub/alt/stage/23_Alpha_RC1/Server/x86_64/iso/Fedora-Server-netinst-x86_64-23_Alpha.iso

BTW we are waiting for Greg KH, right?
https://lists.manjaro.org/pipermail/manjaro-dev/Week-of-Mon-20150803/000579.html
Mainline 4.3-rc1 is already there, therefore.

(In reply to Shimon from comment #13)
> I've got a beautiful Athlon XP 2.2 GHz still running fine :)
> Do you still need to run the image on such hardware or is the problem more
> generic?

Yes, it seems to be 32-bit only, it would be great to have your results.

>
> BTW, is the image link correct? (x86_64 looks strange)
>
> https://dl.fedoraproject.org/pub/alt/stage/23_Alpha_RC1/Server/x86_64/iso/Fedora-Server-netinst-x86_64-23_Alpha.iso

The correct link is http://dl.fedoraproject.org/pub/alt/stage/23_Beta_RC1/Server/i386/iso/Fedora-Server-netinst-i386-23_Beta.iso .

(In reply to Josh Boyer from comment #9)
> Dave Airlie pointed out this was already reported. It looks like bug
> 1253566 was the original report, but the reporter closed it out without
> comment.
>

Here again, a comment for you:

> I've pinged upstream on it though, as I don't see a fix in Linus' tree yet
> either.

A Ping?
We all live in a yellow submarine
Yellow submarine, yellow submarine

Please test this scratch build when it completes: http://koji.fedoraproject.org/koji/taskinfo?taskID=11122367

(In reply to awilliam from comment #5)
> For the record, the last range of 32-bit AMD processors was the last set of
> Athlon XPs, in early 2003.

I think AMD Geode LX is still being sold, unfortunately.

I was able to get it to boot: http://imgur.com/v8SroOj
CPU is an AMD Athlon XP 2800+

Resetting needinfo flag until we get confirmation on comment #17

Florian: I believe the LX is a fundamentally different design, though, so it's not particularly likely to be affected by this? If anyone has one it'd be good to test it I guess.

Discussed at 2015-09-17 Fedora 23 Beta Go/No-Go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/teams/f23_beta_go_no-go_meeting/f23_beta_go_no-go_meeting.2015-09-17-16.00.log.txt .
With all the feedback from the mailing list, a few things seem clear:

1) This issue seems to affect more recent 64-bit AMD CPUs.
2) This issue does not seem to affect any 32-bit AMD CPUs.
3) The kernel parameter 'maxcpus=0' or 'maxcpus=1' appears to work around the issue on affected CPUs.
4) It does affect existing 32-bit installs that are updated to F23 (on affected CPUs), but this can be worked around with the kernel parameter, or booting an F22 kernel (upgraded systems will still have older kernels).

Taken together it seems clear the impact of this is not sufficient to block the release, and so this was rejected as a blocker. We can document the workarounds for affected systems on the CommonBugs page.

jwb: I'll try and get around to testing the scratch build later today.

CPU op-mode(s): 32-bit, 64-bit
CPU op-mode(s): 32-bit
Vendor ID: AuthenticAMD
4.2.0-301.fc23.i686+debug
PASSED/BOOTABLE

I was about to test it myself but in spite of the image (on-device) having the correct sum it doesn't boot with this mainboard's bios. It's not even detected by plopkexec (probably related to the partition type). In the case of bioses expecting usb hd devices I'm pretty sure using dd is not enough if it's an iso image. I was able to boot Knoppix via usb in the past so this release might have more problems than just old AMD processors...

F23 isn't any different to the last several releases in terms of USB writing, Shimon. It's entirely possible it doesn't work on very old motherboards (I remember USB booting used to be a lot more finicky and less reliable), but then I would expect other recent releases to fail too, and I'm not sure we'd really care much, honestly. We can only support ancient hardware *so* far, and you do have other options (optical disc, network boot).
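
Editorial aside: the pattern in the blocker-review summary above (points 1 to 3) can be sketched in shell. This is only a heuristic under stated assumptions, not an official detection method: the function names `likely_affected` and `with_maxcpus` are hypothetical, and the check simply encodes "AMD vendor, 64-bit capable CPU, 32-bit kernel" as reported in this thread.

```shell
#!/bin/sh
# Heuristic only (hypothetical helper): succeeds when the inputs match the
# pattern reported in this bug: AMD vendor, 64-bit capable CPU, 32-bit kernel.
#   $1 = vendor_id string, $2 = CPU flags string, $3 = machine type (uname -m)
likely_affected() {
    [ "$1" = "AuthenticAMD" ] || return 1
    # The "lm" (long mode) flag marks a 64-bit capable CPU.
    case " $2 " in
        *" lm "*) ;;
        *) return 1 ;;
    esac
    # i386 through i686 means a 32-bit kernel is running.
    case "$3" in
        i?86) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: append the maxcpus=1 workaround to a kernel command
# line unless a maxcpus= setting is already present.
#   $1 = existing kernel command line
with_maxcpus() {
    case " $1 " in
        *" maxcpus="*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1 }maxcpus=1" ;;
    esac
}
```

On a real machine the three inputs would come from the `vendor_id` and `flags` lines of /proc/cpuinfo and from `uname -m`. On an installed system, one way to make the workaround persistent is `grubby --update-kernel=ALL --args="maxcpus=1"`, assuming grubby is available; when booting the install media, the parameter can instead be added by editing the boot entry at the boot menu.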
No, the correct instructions should have looked like that (you have my permission to use this):
For a completely motherboard agnostic boot, a tool like Universal USB Installer on windows or Unetbootin on Linux should be used (if your motherboard supports usb-cd emulation you can dd the image directly to the device) or burn the iso image to a CD-ROM to boot from an optical drive.
You're welcome.

Sorry, no, we don't recommend those tools; they cause far more problems than they solve. Just check forum and G+ posts for people saying 'my USB stick doesn't boot!', someone replying "did you use unetbootin? don't do that", and the person replying "oh yeah I did, dd works fine"...

Has anyone else tested the kernel from comment #17? It would be really good to get confirmation of the potential fix for the very urgent issue </sarcasm>.

(In reply to Josh Boyer from comment #17)
> Please test this scratch build when it completes:
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=11122367

yep, that works, problem fixed

Thank you. Patch pushed to Fedora git.

kernel-4.2.1-300.fc23 has been submitted as an update to Fedora 23. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16417

kernel-4.1.8-200.fc22 has been submitted as an update to Fedora 22. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16440

kernel-4.1.8-100.fc21 has been submitted as an update to Fedora 21. https://bodhi.fedoraproject.org/updates/FEDORA-2015-16441

kernel-4.2.1-300.fc23 has been pushed to the Fedora 23 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16417

kernel-4.1.8-100.fc21 has been pushed to the Fedora 21 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16441

kernel-4.1.8-200.fc22 has been pushed to the Fedora 22 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
$ su -c 'dnf --enablerepo=updates-testing update kernel'
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2015-16440

kernel-4.2.1-300.fc23 has been pushed to the Fedora 23 stable repository. If problems still persist, please make note of it in this bug report.

kernel-4.1.8-200.fc22 has been pushed to the Fedora 22 stable repository. If problems still persist, please make note of it in this bug report.

kernel-4.1.8-100.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report. |
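
Editorial aside: to see whether a given kernel build is at least as new as one of the "Fixed In Version" builds listed in this report (4.2.1-300.fc23, 4.1.8-200.fc22, 4.1.8-100.fc21), a rough comparison can be sketched with GNU `sort -V`. This is an approximation and an assumption on my part: version sort is not the same algorithm RPM uses, and the helper name `version_at_least` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 sorts at or after version $2
# under GNU sort's version ordering (an approximation of RPM ordering).
#   $1 = version to check, $2 = minimum version containing the fix
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example with versions taken from this report: 4.2.0-300.fc23 is the build
# the panic was reported against, 4.2.1-300.fc23 is the fixed Fedora 23 build.
if version_at_least "4.2.0-300.fc23" "4.2.1-300.fc23"; then
    echo "fix expected to be present"
else
    echo "older than the fixed build"
fi
```

To check the running kernel, the architecture suffix would need to be stripped from `uname -r` first (for example with `${release%.*}`). Note that scratch builds such as the 4.2.0-301 build tested in this thread carried the patch despite sorting lower, so a negative result is only a hint.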
Created attachment 1074083 photo of kernel panic

Description of problem:
This happens only with 32-bit images (kernel). When I boot from USB (written with dd or livecd-iso-to-disk), I get a kernel panic after I choose to install in the media's GRUB menu. This happens on both machines with AMD processors that I have here to test. It works correctly on machines with Intel processors. When I boot on Intel, I can see the message 'Failed to find cpu0 device node' for a few seconds before it boots normally. A photo of the kernel panic is attached. I tested this with Beta RC1 (Server DVD and netinst) and with Workstation Live.

Version-Release number of selected component (if applicable):
4.2.0-300.fc23.i686

How reproducible:
Always on computers with an AMD processor

Steps to Reproduce:
1. Put the installation image on USB with dd or livecd-iso-to-disk
2. Boot and choose to start the installation

Actual results:
A kernel panic appears

Expected results:
System boots to the installer

Additional info:
I propose this as a Beta blocker, as it violates the Alpha criterion 'The installer must run when launched normally from the release-blocking images.' when run on AMD machines.
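
Editorial aside: step 1 of the reproduction (writing the image with dd) can be sketched as a dry run. The helper name `usb_write_command`, the block size and the `conv=fsync` option are my assumptions, not the reporter's exact invocation, and /dev/sdX is a placeholder rather than a real device name.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints the dd command that would write
# an ISO image to a USB device, without executing it.
#   $1 = path to the ISO image, $2 = target block device (e.g. /dev/sdX)
usb_write_command() {
    printf 'dd if=%s of=%s bs=4M conv=fsync\n' "$1" "$2"
}

# Example using the Beta RC1 i386 netinst image named in this report.
usb_write_command Fedora-Server-netinst-i386-23_Beta.iso /dev/sdX
```

Running the printed command as root overwrites everything on the target device, so the device name should be confirmed first (for example with `lsblk`).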