Description of problem:
I tried installing F25 on one of the test machines in our office. Installation went fine, but the system does not boot afterwards. Grub shows up, and after selecting a kernel I see a quick flash of initial kernel messages, then the screen goes black and the system reboots.

I tried several install media: Workstation Live, netinst (installing both Workstation and a minimal system), and pxeboot. All methods boot into the installer, but the installed system doesn't boot. This all happened in BIOS mode (since UEFI installation is broken atm).

I have no idea why the kernel reboots immediately, how to prevent that behavior, or how to retrieve the important logs. I tested with enforcing=0 and selinux=0, same issue. I also tried booting the "rescue" kernel, same issue. In the end I started anaconda rescue mode and installed the F24 kernel. With the F24 kernel, the F25 system boots fine.

Version-Release number of selected component (if applicable):
kernel-4.8.0-0.rc1.git3.1.fc25.x86_64
kernel-4.6.6-300.fc24.x86_64 (works)
Fedora-Workstation-Live-x86_64-25-20160815.n.2.iso
Fedora-Everything-netinst-x86_64-25-20160815.n.2.iso

Hardware:
Base Board Information
    Manufacturer: ASUSTeK COMPUTER INC.
    Product Name: M5A97 PRO
Processor Information
    Socket Designation: Socket 942
    Type: Central Processor
    Family: FX
    Manufacturer: AMD
    ID: 12 0F 60 00 FF FB 8B 17
    Signature: Family 21, Model 1, Stepping 2

How reproducible:
always, on this machine

Steps to Reproduce:
1. install F25 by any method, any package set, in BIOS mode
2. try to boot it
3. see the computer reboot immediately after trying to boot

Additional information:
I have seen this exact problem on just this machine, so it's not a universal issue. We have two other bare metal machines in the office: one of them doesn't even boot from the install media (kernel panic, probably a different issue), the other one works fine.
Created attachment 1191141 [details] lshw output
Created attachment 1191144 [details] lspci output
Created attachment 1191145 [details] dmidecode output
Created attachment 1191146 [details] journal (boot with F24 kernel)
Created attachment 1191147 [details] rpm -qa output
Proposing as a blocker, as this violates the following release criterion: "A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop using a user account created during installation or a 'first boot' utility." https://fedoraproject.org/wiki/Fedora_25_Alpha_Release_Criteria#Expected_installed_system_boot_behavior Of course, this only affects specific hardware, so the decision should reflect that.
I'm afraid you are not alone; see the thread below:
# http://www.spinics.net/lists/linux-pci/msg51218.html
Someone there encountered a problem similar to yours.

I suspect a compatibility problem with your RD9x0/RX980 Host Bridge:
# https://en.wikipedia.org/wiki/AMD_900_chipset_series
Enabling multiple MSI vectors for the SATA controller when three or more SATA ports are used results in loss of interrupts and system hang.

I'll keep watching for other issues reported among FC*AND_FX
Long shot: can you try the scratch build in https://bugzilla.redhat.com/show_bug.cgi?id=1365917#c2 which has been linked to bootup issues on F25?
(In reply to ChunYu Wang from comment #7) > I am afraid that you are not alone, you may refer to the webpage below: > > # http://www.spinics.net/lists/linux-pci/msg51218.html The error message referenced in the email title is exactly the same error which I see when running lspci -vvv (on F24 kernel): # lspci -vvv > /dev/null pcilib: sysfs_read_vpd: read failed: Input/output error > I am on doubt about the compatibility problem based on your RD9x0/RX980 Host > Bridge: > > # https://en.wikipedia.org/wiki/AMD_900_chipset_series > Enabling multiple MSI vectors for the SATA controller when three or more > SATA ports are used results in loss of interrupts and system hang. I had 3 SATA ports used (2 disks, 1 DVD drive). I unplugged everything except a single HDD, even reinstalled F25, but the problem didn't disappear.
(In reply to Laura Abbott from comment #8) > Long shot: can you try the scratch build in > https://bugzilla.redhat.com/show_bug.cgi?id=1365917#c2 which has been linked > to bootup issues on F25? Doesn't help, same issue.
Kamil: can you say whether kernel-4.8.0-0.rc1.git0.1.fc25 is affected? That is the current stable F25 kernel.
Can you try removing 'quiet' from the grub kernel command line and adding 'panic=0' to it? This should show the kernel messages and stop an automatic reboot if one is set up. Can you also try the 4.7.0 kernel from https://copr.fedorainfracloud.org/coprs/jforbes/kernel-stabilization/build/428437/ ? This would help narrow the problem down to either 4.7 (stable kernel) or an actual rawhide problem.
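For anyone scripting the command-line edit suggested above, here is a minimal Python sketch of the token manipulation. The function name `edit_cmdline` and the example command line are hypothetical and not taken from the affected machine. 'rhgb' is dropped along with 'quiet' because Fedora's default command line usually carries both. The sketch treats arguments as plain whitespace-separated tokens and does not handle quoted values containing spaces.

```python
def edit_cmdline(cmdline, remove=("rhgb", "quiet"), add=("panic=0",)):
    """Return a kernel command line with some arguments dropped and others appended."""
    # Kernel arguments are treated as whitespace-separated tokens (no quoting support).
    kept = [tok for tok in cmdline.split() if tok not in remove]
    for arg in add:
        # Drop any earlier setting of the same key (e.g. an existing panic=5)
        # so the appended value is the only one present.
        key = arg.split("=", 1)[0]
        kept = [tok for tok in kept if tok.split("=", 1)[0] != key]
        kept.append(arg)
    return " ".join(kept)


# Hypothetical command line, for illustration only.
example = "ro root=/dev/mapper/fedora-root rhgb quiet"
print(edit_cmdline(example))
```

For a one-off test the same change can be made by hand: press 'e' on the GRUB menu entry and edit the line that starts with "linux".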
for blocker / release engineering purposes: labbott states she believes, but cannot be certain, that kernel-4.8.0-0.rc1.git0.1.fc25 - which is the current 'stable' f25 kernel build, i.e. the one in the 'fedora' repo and which is included in composes - *would* be affected by this bug. That would mean that if we decide the bug is a blocker, we must find a fix for it before we can ship Alpha. We will await kamil's confirmation of this. We do not yet have a fix for this issue. labbott also states she'd vote -1 blocker / +1 FE for this bug, given the range of hardware affected. AFAICS the affected hardware looks to be 'some AMD chipsets'. It's slightly hard to make a call, but for now I can probably go with labbott's vote. I appear to have a manual for an 'M5A97 R2.0' lying around here, which means presumably I've got one of those in some system or other. If I can track it down I'll try and reproduce the bug...
Absent evidence that this affects a huge amount of hardware, I'll also vote -1 blocker/+1 FE for now.
(In reply to Adam Williamson from comment #12)
> Kamil: can you say whether kernel-4.8.0-0.rc1.git0.1.fc25 is affected? That
> is the current stable F25 kernel.

Yes, same issue.

(In reply to Laura Abbott from comment #13)
> Can you try removing 'quiet' from the grub kernel command line and add
> 'panic=0' to the kernel command line? This should get kernel messages and
> stop an automatic reboot if it's set up.

I removed 'rhgb quiet' and added 'panic=0' and it still reboots immediately.

> Can you also try the 4.7.0 kernel from
> https://copr.fedorainfracloud.org/coprs/jforbes/kernel-stabilization/build/
> 428437/ ? This would help narrow down the problem to 4.7 (stable kernel) or
> an actual rawhide problems)

That one works OK. I also tested kernel-4.8.0-0.rc0.git1.1.fc25 (the first 4.8 kernel built in Koji) and again it doesn't boot.

I wonder why I'm able to boot into the installer, though? What is different between a LiveCD/pxe boot and the installed system boot? Can it be somehow related to initramfs instead of the kernel?
Discussed at 2016-08-18 go/no-go meeting, functioning as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting/2016-08-18/f25-alpha-go_no_go-meeting.2016-08-18-17.00.html . We agreed to delay the decision on this one, as we don't yet have a clear feel for how much hardware may be affected. We will send out a request for more testing to the public lists.
Tested install with F25-everything-netinst-2016-08-16 on AMD A10-7700K CPU and ASUS A68HM-E FM2+mATX AMD Motherboard. Installed F Wkstn + a few extra groups. 3 disks using ext4 with home and var on their own disks. No issues with install, booted into Gnome on wayland first try. Let me know if more info is needed. What FS was original tester using?
I tested kernel-4.8.0-0.rc2.git2.1.fc25 as the latest kernel built in Koji and the problem still persists. (In reply to zachvatwork from comment #19) > What FS was original tester using? If "FS" means filesystem, it was a default Workstation/Everything install, so lvm+ext4.
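To summarize the results reported so far for the affected machine, here is a small Python sketch that finds the regression window. The table only contains results stated in this bug. Its oldest-to-newest ordering is an assumption based on the version strings, and the helper name `regression_window` is hypothetical. It also assumes a single regression point, i.e. every kernel after the first failing one fails too.

```python
# (kernel, boots) pairs as reported in this bug for the affected machine,
# listed oldest to newest (ordering assumed from the version strings).
RESULTS = [
    ("4.6.6-300.fc24", True),
    ("4.7.0 (copr build)", True),
    ("4.8.0-0.rc0.git1.1.fc25", False),
    ("4.8.0-0.rc1.git0.1.fc25", False),
    ("4.8.0-0.rc1.git3.1.fc25", False),
    ("4.8.0-0.rc2.git2.1.fc25", False),
]


def regression_window(results):
    """Return (last_good, first_bad) from an oldest-to-newest list of results."""
    last_good = None
    for kernel, boots in results:
        if boots:
            last_good = kernel
        else:
            # First failing kernel: the regression lies between last_good and this one.
            return last_good, kernel
    return last_good, None


print(regression_window(RESULTS))
```

On this data the window is between the 4.7.0 copr build and the first 4.8 build in Koji, which suggests the regression arrived somewhere in the 4.8 merge window.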
Appending disable_timer_pin_1 to the kernel cmdline will stop the machine at the breaking point:
..TIMER: vector=...

4.8.0-0.rc2.git2.1.fc26.x86_64
(In reply to ChunYu Wang from comment #7) > I am afraid that you are not alone, you may refer to the webpage below: > > # http://www.spinics.net/lists/linux-pci/msg51218.html > > Someone encounters the similar problem just like your laptop. > > I am on doubt about the compatibility problem based on your RD9x0/RX980 Host > Bridge: > > # https://en.wikipedia.org/wiki/AMD_900_chipset_series > Enabling multiple MSI vectors for the SATA controller when three or more > SATA ports are used results in loss of interrupts and system hang. > > I`ll keep on watch whether other issues happened among FC*AND_FX Also NVIDIA MCP78S chipset https://en.wikipedia.org/wiki/NForce_700
Created attachment 1192117 [details] boot messages stopped with disable_timer_pin_1 (In reply to poma from comment #21) > disable_timer_pin_1 > appended to kernel cmdline gonna stop the machine at the breaking point: > ..TIMER: vector=... Yes it did. Screenshot attached. But does it help in debugging why it auto-reboots?
It seems to be initrd loading related. At least here (AMD) it's an early reboot with all the 4.8.0-rc kernels so far when an initrd is used. Since my rootfs doesn't need the initrd, I tried removing it from the grub config, and at least 4.8.0-0.rc2.git2.2.fc26.x86_64 boots fine without it. Perhaps related: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=88b2f634028f1f38dcc3d412e10ff1f224976daa (merged by Linus 15 hours ago).
That is a very likely culprit, I have kernel-4.8.0-0.rc2.git3.1.fc25.src.rpm building right now, which will contain the patch listed there. Hopefully we can get that tested and verify that it fixes things.
(In reply to Kamil Páral from comment #23)
> Created attachment 1192117 [details]
> boot messages stopped with disable_timer_pin_1
>
> (In reply to poma from comment #21)
> > disable_timer_pin_1
> > appended to kernel cmdline gonna stop the machine at the breaking point:
> > ..TIMER: vector=...
>
> Yes it did. Screenshot attached. But does it help in debugging why it
> auto-reboots?

Given that the "panic=..." directive has no effect ...

panic=  [KNL] Kernel behaviour on panic: delay <timeout>
        timeout > 0: seconds before rebooting
        timeout = 0: wait forever
        timeout < 0: reboot immediately
        Format: <timeout>
https://www.kernel.org/doc/Documentation/kernel-parameters.txt

... this looks more like a hardware reset itself, rather than "auto-reboot".

If you want to actually debug
https://www.kernel.org/doc/Documentation/serial-console.txt
http://www.tldp.org/HOWTO/text/Remote-Serial-Console-HOWTO
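The panic= semantics quoted above can be restated as a short Python sketch. It only models the three documented cases and is not how the kernel implements them. The function names `panic_behaviour` and `panic_timeout_from_cmdline` and the sample command lines are hypothetical. No default is assumed when panic= is absent, since that depends on the kernel configuration.

```python
def panic_behaviour(timeout):
    """Describe what the kernel does on panic for a given panic=<timeout> value."""
    if timeout > 0:
        return f"reboot after {timeout} seconds"
    if timeout == 0:
        return "wait forever"
    return "reboot immediately"


def panic_timeout_from_cmdline(cmdline):
    """Return the last panic=<timeout> value on a command line, or None if absent."""
    timeout = None
    for token in cmdline.split():
        if token.startswith("panic="):
            # If the argument is repeated, the last occurrence is kept.
            timeout = int(token.split("=", 1)[1])
    return timeout


# Hypothetical command line matching the test described in this bug.
cmdline = "ro root=/dev/mapper/fedora-root panic=0"
print(panic_behaviour(panic_timeout_from_cmdline(cmdline)))
```

With panic=0 a real kernel panic should leave the machine halted with the messages on screen, so an immediate reboot despite that setting points away from the panic path, which is the reasoning behind the hardware-reset guess above.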
(In reply to Justin M. Forbes from comment #25) > That is a very likely culprit, I have kernel-4.8.0-0.rc2.git3.1.fc25.src.rpm > building right now, which will contain the patch listed there. Hopefully we > can get that tested and verify that it fixes things. 4.8.0-0.rc2.git3.1.fc26.x86_64 BOOT PASSED
Proposing as an Alpha FE also, since we have a fix now; I'd be +1 to this for FE for sure. Any other votes?
+1 FE
FYI, I have tested 4.8.0-0.rc1.git0.1.fc25.x86_64.

HW:
AMD Phenom II X4 965
MB Gigabyte 890GPA-UD3H (890GX + SB850 chipset)
32 GB RAM
Video: R7-250 Radeon
4 SATA drives

Result: it stops and hangs just after selecting the kernel in GRUB.
I removed quiet and rhgb from the kernel cmd line: same result, but I get boot activity until it hangs.
A picture of the screen where it hangs is attached.

Knud
Created attachment 1192253 [details] picture from hang
kparal: Mind testing the kernels in http://koji.fedoraproject.org/koji/buildinfo?buildID=793063 for us? I think they will fix your issue.
http://koji.fedoraproject.org/koji/buildinfo?buildID=793063 (Comment 32) works for me, whereas before I got the same symptoms as Knud in Comment 31 (hang at less than 1 second into boot, serial console shows "x86: Booting SMP configuration:\n.... node #0, CPUs: #1")

processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 16
model name : AMD A10-5800K APU with Radeon(tm) HD Graphics
stepping : 1
microcode : 0x6001119
cpu MHz : 1400.000
cache size : 2048 KB
physical id : 0
siblings : 4

Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: F2A85-M PRO
Version: Rev X.0x

BIOS Information
Vendor: American Megatrends Inc.
Version: 6105
Release Date: 05/08/2013

Disks: SSD(sata3) as /dev/sda, HD(sata3) as /dev/sdb
UEFI boot using grub2 from SSD.
4.8.0-0.rc2.git3.2.fc26.x86_64 (nodebug) works for me here
Tried the Fedora-25-20160821.n.0 netinstall ISO from https://kojipkgs.fedoraproject.org/compose/branched/Fedora-25-20160821.n.0/compose/Workstation/x86_64/iso/

It seemed to boot and install OK, though I got an SELinux warning.

System details:
AMD FX 8350
Radeon HD 5450 Graphics
1 hard disk through SATA
1 DVD-R through SATA
1 hard disk through USB (Fedora 25 installed on this one)

processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-8350 Eight-Core Processor
stepping : 0
microcode : 0x600084f
cpu MHz : 1400.000
cache size : 2048 KB
physical id : 0
siblings : 8

Base Board information:
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: M5A97 LE R2.0
Version: Rev 1.xx

BIOS information:
Vendor: American Megatrends Inc.
Version: 2202
Release Date: 12/12/2013
I installed a system with kernel-4.8.0-0.rc2.git3.1.fc25 from comment 32 (from a side repo) on Kamil's computer, and it boots successfully.
As discussed above this is addressed by https://admin.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8 , but we couldn't edit the update to get this bug listed.
Discussed during the 2016-08-22 blocker review meeting [1]: the decision was made to accept this as an Alpha AcceptedFreezeException, as boot failures are more difficult to fix with updates. [1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-08-22/f25-blocker-review.2016-08-22-16.00.txt
The update went stable, closing.