Bug 1367321 - system reboots 1 second after selecting a kernel in grub
Summary: system reboots 1 second after selecting a kernel in grub
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 25
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: RejectedBlocker AcceptedFreezeException
Depends On:
Blocks: F25AlphaBlocker F25AlphaFreezeException
Reported: 2016-08-16 08:17 UTC by Kamil Páral
Modified: 2016-08-26 11:33 UTC
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-23 16:19:11 UTC
Type: Bug
Embargoed:


Attachments
lshw output (24.84 KB, text/plain)
2016-08-16 08:23 UTC, Kamil Páral
lspci output (38.53 KB, text/plain)
2016-08-16 08:24 UTC, Kamil Páral
dmidecode output (15.68 KB, text/plain)
2016-08-16 08:25 UTC, Kamil Páral
journal (boot with F24 kernel) (153.14 KB, text/plain)
2016-08-16 08:25 UTC, Kamil Páral
rpm -qa output (9.23 KB, text/plain)
2016-08-16 08:25 UTC, Kamil Páral
boot messages stopped with disable_timer_pin_1 (398.74 KB, image/jpeg)
2016-08-19 11:53 UTC, Kamil Páral
picture from hang (2.20 MB, image/jpeg)
2016-08-19 17:34 UTC, Knud Christiansen

Description Kamil Páral 2016-08-16 08:17:47 UTC
Description of problem:
I tried installing F25 on one of the test machines in our office. Installation went fine, but the system does not boot afterwards. Grub shows up, and after selecting a kernel I see a quick flash of initial kernel messages, then the screen goes black and the system reboots. I tried several install media: Workstation Live, netinst (installing both Workstation and a minimal system), and pxeboot. All methods boot into the installer, but the installed system doesn't boot. This all happened in BIOS mode (since UEFI installation is broken at the moment).

I have no idea why the kernel reboots immediately, how to prevent that behavior, or how to capture the relevant logs.

I tested with enforcing=0 and selinux=0, same issue. I also tried booting the "rescue" kernel, same issue.

In the end I started anaconda rescue mode and installed the F24 kernel. With the F24 kernel, the F25 system boots fine.

Version-Release number of selected component (if applicable):
kernel-4.8.0-0.rc1.git3.1.fc25.x86_64
kernel-4.6.6-300.fc24.x86_64 (works)
Fedora-Workstation-Live-x86_64-25-20160815.n.2.iso
Fedora-Everything-netinst-x86_64-25-20160815.n.2.iso

Hardware:
Base Board Information
	Manufacturer: ASUSTeK COMPUTER INC.
	Product Name: M5A97 PRO
Processor Information
	Socket Designation: Socket 942
	Type: Central Processor
	Family: FX
	Manufacturer: AMD              
	ID: 12 0F 60 00 FF FB 8B 17
	Signature: Family 21, Model 1, Stepping 2

How reproducible:
always, on this machine

Steps to Reproduce:
1. install F25 by any method, any package set, in BIOS mode
2. try to boot it
3. see the computer reboot immediately after trying to boot

Additional information:
I have seen this exact problem only on this machine, so it's not a universal issue. We have two other bare metal machines in the office: one of them doesn't even boot from the install media (kernel panic, probably a different issue), and the other one works fine.

Comment 1 Kamil Páral 2016-08-16 08:23:43 UTC
Created attachment 1191141
lshw output

Comment 2 Kamil Páral 2016-08-16 08:24:54 UTC
Created attachment 1191144
lspci output

Comment 3 Kamil Páral 2016-08-16 08:25:02 UTC
Created attachment 1191145
dmidecode output

Comment 4 Kamil Páral 2016-08-16 08:25:23 UTC
Created attachment 1191146
journal (boot with F24 kernel)

Comment 5 Kamil Páral 2016-08-16 08:25:32 UTC
Created attachment 1191147
rpm -qa output

Comment 6 Kamil Páral 2016-08-16 08:28:20 UTC
Proposing as a blocker, this violates this release criterion:
" A system installed with a release-blocking desktop must boot to a log in screen where it is possible to log in to a working desktop using a user account created during installation or a 'first boot' utility. "
https://fedoraproject.org/wiki/Fedora_25_Alpha_Release_Criteria#Expected_installed_system_boot_behavior

Of course this depends on specific hardware, so the decision should reflect that.

Comment 7 ChunYu Wang 2016-08-16 13:46:09 UTC
I am afraid that you are not alone, you may refer to the webpage below:

# http://www.spinics.net/lists/linux-pci/msg51218.html

Someone there encountered a problem similar to yours.

I suspect a compatibility problem related to your RD9x0/RX980 Host Bridge:

# https://en.wikipedia.org/wiki/AMD_900_chipset_series
Enabling multiple MSI vectors for the SATA controller when three or more SATA ports are used results in loss of interrupts and system hang.

I'll keep watching for other issues among FC*AND_FX.

Comment 8 Laura Abbott 2016-08-16 15:18:35 UTC
Long shot: can you try the scratch build in https://bugzilla.redhat.com/show_bug.cgi?id=1365917#c2 which has been linked to bootup issues on F25?

Comment 9 Kamil Páral 2016-08-17 08:49:04 UTC
(In reply to ChunYu Wang from comment #7)
> I am afraid that you are not alone, you may refer to the webpage below:
> 
> # http://www.spinics.net/lists/linux-pci/msg51218.html

The error message referenced in the email title is exactly the same error which I see when running lspci -vvv (on F24 kernel):

# lspci -vvv > /dev/null
pcilib: sysfs_read_vpd: read failed: Input/output error

> I suspect a compatibility problem related to your RD9x0/RX980 Host
> Bridge:
> 
> # https://en.wikipedia.org/wiki/AMD_900_chipset_series
> Enabling multiple MSI vectors for the SATA controller when three or more
> SATA ports are used results in loss of interrupts and system hang.

I had 3 SATA ports used (2 disks, 1 DVD drive). I unplugged everything except a single HDD, even reinstalled F25, but the problem didn't disappear.

Comment 10 Kamil Páral 2016-08-17 09:03:10 UTC
(In reply to Laura Abbott from comment #8)
> Long shot: can you try the scratch build in
> https://bugzilla.redhat.com/show_bug.cgi?id=1365917#c2 which has been linked
> to bootup issues on F25?

Doesn't help, same issue.

Comment 12 Adam Williamson 2016-08-17 19:08:35 UTC
Kamil: can you say whether kernel-4.8.0-0.rc1.git0.1.fc25 is affected? That is the current stable F25 kernel.

Comment 13 Laura Abbott 2016-08-17 19:48:23 UTC
Can you try removing 'quiet' from the grub kernel command line and adding 'panic=0'? This should show kernel messages and stop an automatic reboot if one is set up.

Can you also try the 4.7.0 kernel from https://copr.fedorainfracloud.org/coprs/jforbes/kernel-stabilization/build/428437/ ? This would help narrow down the problem to 4.7 (stable kernel) or an actual rawhide problem.
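
A minimal sketch of the command-line change suggested above, written in Python purely for illustration. The function name `adjust_cmdline` and the sample command line are hypothetical and do not come from the affected machine; in practice the edit is made by hand in the grub menu before booting.

```python
def adjust_cmdline(cmdline, drop=("quiet",), add=("panic=0",)):
    """Return a kernel command line with some options dropped and others set.

    Hypothetical helper: any existing option with the same key as one in
    `add` (for example an older panic=10) is replaced rather than duplicated.
    """
    add_keys = {option.split("=", 1)[0] for option in add}
    kept = [
        token
        for token in cmdline.split()
        if token not in drop and token.split("=", 1)[0] not in add_keys
    ]
    return " ".join(kept + list(add))


# Illustrative command line only, not taken from the affected system.
sample = "ro root=/dev/mapper/fedora-root rhgb quiet"
print(adjust_cmdline(sample))
# To also drop the graphical boot option:
print(adjust_cmdline(sample, drop=("rhgb", "quiet")))
```

Passing `drop=("rhgb", "quiet")` removes the graphical boot splash as well, so that the text console stays visible.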

Comment 14 Adam Williamson 2016-08-17 20:04:00 UTC
for blocker / release engineering purposes: labbott states she believes, but cannot be certain, that kernel-4.8.0-0.rc1.git0.1.fc25 - which is the current 'stable' f25 kernel build, i.e. the one in the 'fedora' repo and which is included in composes - *would* be affected by this bug. That would mean that if we decide the bug is a blocker, we must find a fix for it before we can ship Alpha. We will await kamil's confirmation of this.

We do not yet have a fix for this issue.

labbott also states she'd vote -1 blocker / +1 FE for this bug, given the range of hardware affected. AFAICS the affected hardware looks to be 'some AMD chipsets'. It's slightly hard to make a call, but for now I can probably go with labbott's vote.

I appear to have a manual for an 'M5A97 R2.0' lying around here, which means presumably I've got one of those in some system or other. If I can track it down I'll try and reproduce the bug...

Comment 15 Stephen Gallagher 2016-08-17 21:19:59 UTC
Absent evidence that this affects a huge amount of hardware, I'll also vote -1 blocker/+1 FE for now.

Comment 16 Kamil Páral 2016-08-18 08:16:15 UTC
(In reply to Adam Williamson from comment #12)
> Kamil: can you say whether kernel-4.8.0-0.rc1.git0.1.fc25 is affected? That
> is the current stable F25 kernel.

Yes, same issue.

(In reply to Laura Abbott from comment #13)
> Can you try removing 'quiet' from the grub kernel command line and adding
> 'panic=0'? This should show kernel messages and stop an automatic reboot
> if one is set up.

I removed 'rhgb quiet' and added 'panic=0' and it still reboots immediately.

> Can you also try the 4.7.0 kernel from
> https://copr.fedorainfracloud.org/coprs/jforbes/kernel-stabilization/build/428437/ ?
> This would help narrow down the problem to 4.7 (stable kernel) or an actual
> rawhide problem.

That one works OK.

I also tested kernel-4.8.0-0.rc0.git1.1.fc25 (the first 4.8 kernel built in Koji) and again it doesn't boot.
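
The results reported so far can be tabulated to narrow the regression range. The following is only a sketch: the helper `kernel_sort_key` is hypothetical, its ordering rule (a final release sorts after every rc snapshot of the same version) is an assumption, and it presumes the breakage was introduced once and not fixed in between.

```python
import re


def kernel_sort_key(version):
    """Sort key for the kernel version strings used in this report (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-0\.rc(\d+)\.git(\d+))?", version)
    major, minor, patch, rc, git = match.groups()
    base = (int(major), int(minor), int(patch))
    if rc is None:
        # Assumption: a final release sorts after all of its rc snapshots.
        return base + (float("inf"), 0)
    return base + (int(rc), int(git))


# True = boots, False = reboots immediately (results reported in this bug so far).
results = {
    "4.6.6-300.fc24": True,
    "4.7.0": True,
    "4.8.0-0.rc0.git1.1.fc25": False,
    "4.8.0-0.rc1.git0.1.fc25": False,
    "4.8.0-0.rc1.git3.1.fc25": False,
}

ordered = sorted(results, key=kernel_sort_key)
last_good = max((v for v in ordered if results[v]), key=kernel_sort_key)
first_bad = min((v for v in ordered if not results[v]), key=kernel_sort_key)

for version in ordered:
    print(f"{version:28} {'boots' if results[version] else 'reboots'}")
print("last good:", last_good)
print("first bad:", first_bad)
```

On this data the regression falls between 4.7.0 and the first 4.8 build in Koji.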

I wonder why I'm able to boot into the installer, though? What is different between a LiveCD/pxe boot and the installed system boot? Can it be somehow related to initramfs instead of the kernel?

Comment 18 Adam Williamson 2016-08-18 18:58:35 UTC
Discussed at 2016-08-18 go/no-go meeting, functioning as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting/2016-08-18/f25-alpha-go_no_go-meeting.2016-08-18-17.00.html . We agreed to delay the decision on this one, as we don't yet have a clear feel for how much hardware may be affected. We will send out a request for more testing to the public lists.

Comment 19 Zach Villers 2016-08-19 03:54:01 UTC
Tested install with F25-everything-netinst-2016-08-16 on an AMD A10-7700K CPU and an ASUS A68HM-E FM2+ mATX AMD motherboard. Installed Fedora Workstation plus a few extra groups. 3 disks using ext4, with home and var on their own disks. No issues with the install; booted into GNOME on Wayland on the first try.

Let me know if more info is needed. What FS was original tester using?

Comment 20 Kamil Páral 2016-08-19 07:29:59 UTC
I tested kernel-4.8.0-0.rc2.git2.1.fc25 as the latest kernel built in Koji and the problem still persists.

(In reply to zachvatwork from comment #19)
> What FS was original tester using?

If "FS" means filesystem, it was a default Workstation/Everything install, so lvm+ext4.

Comment 21 poma 2016-08-19 11:30:38 UTC
disable_timer_pin_1
appended to the kernel cmdline will stop the machine at the breaking point:
..TIMER: vector=...

4.8.0-0.rc2.git2.1.fc26.x86_64

Comment 22 poma 2016-08-19 11:38:21 UTC
(In reply to ChunYu Wang from comment #7)
> I am afraid that you are not alone, you may refer to the webpage below:
> 
> # http://www.spinics.net/lists/linux-pci/msg51218.html
> 
> Someone there encountered a problem similar to yours.
> 
> I suspect a compatibility problem related to your RD9x0/RX980 Host
> Bridge:
> 
> # https://en.wikipedia.org/wiki/AMD_900_chipset_series
> Enabling multiple MSI vectors for the SATA controller when three or more
> SATA ports are used results in loss of interrupts and system hang.
> 
> I'll keep watching for other issues among FC*AND_FX.


Also NVIDIA MCP78S chipset
https://en.wikipedia.org/wiki/NForce_700

Comment 23 Kamil Páral 2016-08-19 11:53:09 UTC
Created attachment 1192117
boot messages stopped with disable_timer_pin_1

(In reply to poma from comment #21)
> disable_timer_pin_1
> appended to the kernel cmdline will stop the machine at the breaking point:
> ..TIMER: vector=...

Yes it did. Screenshot attached. But does it help in debugging why it auto-reboots?

Comment 24 Yanko Kaneti 2016-08-19 12:51:44 UTC
It seems to be related to initrd loading. At least here (AMD) it's an early reboot with all the 4.8.0-rc kernels so far when booting with an initrd. Since my rootfs doesn't need the initrd, I tried removing it from the grub config, and at least 4.8.0-0.rc2.git2.2.fc26.x86_64 boots fine without it.

Perhaps https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=88b2f634028f1f38dcc3d412e10ff1f224976daa 
merged by Linus 15 hours ago..
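
For anyone wanting to repeat the no-initrd experiment, here is a rough Python sketch of the edit. The menu entry text is a made-up example, the helper `strip_initrd` is hypothetical, and the list of directive names is an assumption about what a given grub.cfg uses. This is only viable when the root filesystem can be mounted without an initrd.

```python
# Directive names that load an initrd; which one appears depends on the
# boot mode and grub build (assumption, check your own grub.cfg).
INITRD_COMMANDS = ("initrd", "initrd16", "initrdefi")


def strip_initrd(menuentry_text):
    """Return the menu entry text without any initrd-loading lines."""
    kept = []
    for line in menuentry_text.splitlines():
        words = line.split()
        if words and words[0] in INITRD_COMMANDS:
            continue  # skip the line that loads the initrd
        kept.append(line)
    return "\n".join(kept)


# Hypothetical menu entry, for illustration only.
entry = """menuentry 'Fedora (example entry)' {
    linux16 /vmlinuz-4.8.0-0.rc2.git2.2.fc26.x86_64 root=/dev/sda2 ro
    initrd16 /initramfs-4.8.0-0.rc2.git2.2.fc26.x86_64.img
}"""

print(strip_initrd(entry))
```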

Comment 25 Justin M. Forbes 2016-08-19 13:12:55 UTC
That is a very likely culprit, I have kernel-4.8.0-0.rc2.git3.1.fc25.src.rpm building right now, which will contain the patch listed there.  Hopefully we can get that tested and verify that it fixes things.

Comment 26 poma 2016-08-19 15:22:18 UTC
(In reply to Kamil Páral from comment #23)
> Created attachment 1192117
> boot messages stopped with disable_timer_pin_1
> 
> (In reply to poma from comment #21)
> > disable_timer_pin_1
> > appended to the kernel cmdline will stop the machine at the breaking point:
> > ..TIMER: vector=...
> 
> Yes it did. Screenshot attached. But does it help in debugging why it
> auto-reboots?


Given that the "panic=..." directive has no effect ...

panic=		[KNL] Kernel behaviour on panic: delay <timeout>
			timeout > 0: seconds before rebooting
			timeout = 0: wait forever
			timeout < 0: reboot immediately
			Format: <timeout>
https://www.kernel.org/doc/Documentation/kernel-parameters.txt

... this looks more like a hardware reset itself, rather than "auto-reboot".
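
The documented `panic=` semantics quoted above can be restated as a small Python sketch. Both function names are hypothetical, and the default of 0 when no `panic=` option is present is an assumption (the real default is a kernel build-time setting).

```python
def panic_behaviour(timeout):
    """Describe what the kernel does on panic for a given panic= value."""
    if timeout > 0:
        return f"reboot after {timeout} seconds"
    if timeout == 0:
        return "wait forever"
    return "reboot immediately"


def panic_timeout_from_cmdline(cmdline, default=0):
    """Return the last panic=<timeout> value on a command line.

    `default` stands in for the build-time default (assumed to be 0 here).
    """
    timeout = default
    for token in cmdline.split():
        if token.startswith("panic="):
            timeout = int(token.split("=", 1)[1])
    return timeout


# Illustrative command line, not taken from the affected system.
print(panic_behaviour(panic_timeout_from_cmdline("ro root=/dev/sda2 panic=0")))
```

With `panic=0` a genuine kernel panic should leave the machine halted with the message on screen, which is why an immediate reset despite that setting points away from the panic path.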

If you want to actually debug
https://www.kernel.org/doc/Documentation/serial-console.txt
http://www.tldp.org/HOWTO/text/Remote-Serial-Console-HOWTO

Comment 27 poma 2016-08-19 15:23:20 UTC
(In reply to Justin M. Forbes from comment #25)
> That is a very likely culprit, I have kernel-4.8.0-0.rc2.git3.1.fc25.src.rpm
> building right now, which will contain the patch listed there.  Hopefully we
> can get that tested and verify that it fixes things.

4.8.0-0.rc2.git3.1.fc26.x86_64
BOOT PASSED

Comment 28 Adam Williamson 2016-08-19 15:31:32 UTC
Proposing as an Alpha FE also, since we have a fix now; I'd be +1 to this for FE for sure. Any other votes?

Comment 29 Stephen Gallagher 2016-08-19 16:31:38 UTC
+1 FE

Comment 30 Knud Christiansen 2016-08-19 17:32:52 UTC
FYI

I have tested 4.8.0-0.rc1.git0.1.fc25.x86_64

HW
AMD Phenom II X4 965
MB Gigabyte 890GPA-UD3H  (890GX + SB850 chipset) 32 GB ram
Video:R7-250 Radeon
4 sata drives

Result: stops and hangs just after selecting the kernel in GRUB.
Removed quiet and rhgb from the kernel cmd line:
same result, but I get boot activity until it hangs.
Picture of the screen where it hangs is attached.

Knud

Comment 31 Knud Christiansen 2016-08-19 17:34:56 UTC
Created attachment 1192253
picture from hang

Comment 32 Justin M. Forbes 2016-08-19 21:30:04 UTC
kparal: Mind testing the kernels in http://koji.fedoraproject.org/koji/buildinfo?buildID=793063 for us? I think they will fix your issue.

Comment 33 John Reiser 2016-08-19 23:39:37 UTC
http://koji.fedoraproject.org/koji/buildinfo?buildID=793063 (Comment 32) works for me, whereas before I got the same symptoms as Knud in Comment 31 (hang at less than 1 second into boot, serial console shows "x86: Booting SMP configuration:\n.... node  #0, CPUs:        #1")

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 16
model name	: AMD A10-5800K APU with Radeon(tm) HD Graphics
stepping	: 1
microcode	: 0x6001119
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 4

Base Board Information
	Manufacturer: ASUSTeK COMPUTER INC.
	Product Name: F2A85-M PRO
	Version: Rev X.0x

BIOS Information
	Vendor: American Megatrends Inc.
	Version: 6105
	Release Date: 05/08/2013

Disks: SSD(sata3) as /dev/sda, HD(sata3) as /dev/sdb

UEFI boot using grub2 from SSD.

Comment 35 Yanko Kaneti 2016-08-20 14:58:05 UTC
4.8.0-0.rc2.git3.2.fc26.x86_64 (nodebug) works for me here

Comment 36 Benson Muite 2016-08-21 18:53:45 UTC
Tried Fedora-25-20160821.n.0 netinstall iso from https://kojipkgs.fedoraproject.org/compose/branched/Fedora-25-20160821.n.0/compose/Workstation/x86_64/iso/ 

It seemed to boot and install OK, though I got an SELinux warning. System details:

AMD FX 8350
Radeon HD 5450 Graphics

1 hard disk through sata
1 DVD-R through sata
1 hard disk through USB (Fedora 25 installed on this one)

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8350 Eight-Core Processor
stepping	: 0
microcode	: 0x600084f
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8

Base Board information:
 Manufacturer: ASUSTeK COMPUTER INC.
 Product Name: M5A97 LE R2.0
 Version: Rev 1.xx

BIOS information:
 Vendor: American Megatrends Inc.
 Version: 2202
 Release Date: 12/12/2013

Comment 37 Petr Schindler 2016-08-22 06:41:55 UTC
I installed a system with kernel-4.8.0-0.rc2.git3.1.fc25 from comment 32 (from a side repo) on Kamil's computer, which is affected by this bug, and it boots successfully.

Comment 38 Adam Williamson 2016-08-22 15:37:15 UTC
As discussed above this is addressed by https://admin.fedoraproject.org/updates/FEDORA-2016-0dd1a509c8 , but we couldn't edit the update to get this bug listed.

Comment 39 Geoffrey Marr 2016-08-22 22:00:02 UTC
Discussed during the 2016-08-22 blocker review meeting: [1]

The decision to accept this as an Alpha AcceptedFreezeException was made because boot failures are more difficult to fix with updates.

[1] https://meetbot.fedoraproject.org/fedora-blocker-review/2016-08-22/f25-blocker-review.2016-08-22-16.00.txt

Comment 40 Adam Williamson 2016-08-23 16:19:11 UTC
The update went stable, closing.

