Bug 1769063
Summary: | kernel-5.3.8-200 will not boot Dell Inspiron | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | dc.hart |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED DUPLICATE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | | |
Version: | 31 | CC: | airlied, alxndr13, amdunn, bskeggs, carl, dc.hart, hdegoede, ichavero, itamar, jarodwilson, jeremy, jglisse, john.j5live, jonathan, josef, kernel-maint, linville, masami256, mchehab, mjg59, steved |
Target Milestone: | --- | Keywords: | OpsBlocker |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2020-01-03 18:04:04 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
dc.hart
2019-11-05 20:55:15 UTC
This problem persists through kernel-5.3.11-200.fc30.x86_64. I changed the severity to high (I am not sure if it is now urgent). I don't mean to be a pest, but nearly two weeks have elapsed with no response whatsoever. I am sure that someone needs additional information from me to debug this problem. Here is lscpu if that helps:

    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    Address sizes: 39 bits physical, 48 bits virtual
    CPU(s): 4
    On-line CPU(s) list: 0-3
    Thread(s) per core: 2
    Core(s) per socket: 2
    Socket(s): 1
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 142
    Model name: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
    Stepping: 9
    CPU MHz: 3370.651
    CPU max MHz: 3500.0000
    CPU min MHz: 400.0000
    BogoMIPS: 5808.00
    Virtualization: VT-x
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 4096K
    NUMA node0 CPU(s): 0-3
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

I have exactly the same issue. I am currently running 5.2.18-200.fc30.x86_64. I have attempted to boot into kernel-5.3.11-200.fc30.x86_64, which was automatically installed via dnf. However, none of the journalctl logs show those attempts. My system has an encrypted disk, and I never get to the prompt for the disk key.
So probably those boots were unable to flush the in-memory log to disk.

    for i in {0..-3}; do journalctl -b $i | head -3 | tail -1; done
    Nov 19 08:08:42 localhost.localdomain kernel: Linux version 5.2.18-200.fc30.x86_64 (mockbuild.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Tue Oct 1 13:14:07 UTC 2019
    Nov 18 10:31:13 localhost.localdomain kernel: Linux version 5.2.18-200.fc30.x86_64 (mockbuild.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Tue Oct 1 13:14:07 UTC 2019
    Nov 17 08:50:16 localhost.localdomain kernel: Linux version 5.2.18-200.fc30.x86_64 (mockbuild.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Tue Oct 1 13:14:07 UTC 2019
    Nov 16 07:41:30 localhost.localdomain kernel: Linux version 5.2.18-200.fc30.x86_64 (mockbuild.fedoraproject.org) (gcc version 9.2.1 20190827 (Red Hat 9.2.1-1) (GCC)) #1 SMP Tue Oct 1 13:14:07 UTC 2019

One other issue: this Dell laptop (with 5.2.18-200 and earlier) dumps a few messages about hardware errors to the screen very early in the boot sequence. This is apparently a known issue on these laptops. It does not seem to cause any problems with those earlier kernels.

    Message from syslogd@localhost at Nov 19 08:08:42 ...
     kernel:mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ee2000000040110a
    Message from syslogd@localhost at Nov 19 08:08:42 ...
     kernel:mce: [Hardware Error]: TSC 0 ADDR fef1ff00 MISC 3880010086
    Message from syslogd@localhost at Nov 19 08:08:42 ...
     kernel:mce: [Hardware Error]: PROCESSOR 0:906e9 TIME 1574179720 SOCKET 0 APIC 0 microcode b4

Those messages don't appear on the screen when attempting to boot 5.3.x kernels. Removing "quiet rhgb" from the kernel args, and adding some or all of acpi=off, pci=noacpi, earlyprintk=vga, gives a mostly blank screen with just:

    EFI stub: UEFI secureboot is enabled

After that message, it just hangs. I never get the LUKS prompt for the disk key.

Latest rawhide kernel: same problem.
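The loop above picks the kernel banner by its position (`head -3 | tail -1`), which only works while the banner happens to be the third journal line. A minimal sketch of a sturdier variant, assuming a POSIX shell, systemd's `journalctl` with persistent storage, and a `grep` that supports `-m` (GNU and BusyBox do); the function name `boot_kernels` is hypothetical. It restricts output to kernel messages with `-k` and searches for the `Linux version` banner instead.

```shell
# Hypothetical helper: print the kernel banner for each of the last N recorded boots.
# Assumes systemd-journald with persistent storage. A boot that hung before the
# journal was flushed to disk has no entry at all, so it never gets an offset.
boot_kernels() {
    count=${1:-4}   # number of recorded boots to inspect, newest first
    i=0
    while [ "$i" -gt "-$count" ]; do
        # -k: kernel messages only; -b N: boot offset (0 = current, -1 = previous, ...)
        # grep -m1 stops at the first "Linux version" banner of that boot.
        journalctl -k -b "$i" --no-pager 2>/dev/null | grep -m1 'Linux version' \
            || echo "boot $i: no journal entry"
        i=$((i - 1))
    done
}
```

Run as `boot_kernels 4`, it prints one banner per recorded boot. Because hung boots are never recorded, the absence of any 5.3.x banner is consistent with the "unable to flush the in-memory log" explanation rather than a problem with the query.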
Tried a Secure Boot change in the BIOS. No help. I am wondering if I should try to convert from UEFI to legacy. We NEED to hear from someone at Fedora.

Same error here on an OptiPlex 7050. The error occurred only on 5.3.11; booting with 5.3.8 works. lshw output:

    alpha
        description: Desktop Computer
        product: OptiPlex 7050 (07A1)
        vendor: Dell Inc.
        serial: 8V6QCM2
        width: 64 bits
        capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32
        configuration: boot=normal chassis=desktop family=OptiPlex sku=07A1 uuid=44454C4C-5600-1036-8051-B8C04F434D32
      *-core
           description: Motherboard
           product: 0XHGV1
           vendor: Dell Inc.
           physical id: 0
           version: A00
           serial: /8V6QCM2/CNWS20078B01AC/
         *-firmware
              description: BIOS
              vendor: Dell Inc.
              physical id: 0
              version: 1.6.5
              date: 09/09/2017
              size: 64KiB
              capacity: 16MiB
              capabilities: pci pnp upgrade shadowing cdboot bootselect edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification netboot uefi
         *-memory
              description: System Memory
              physical id: 9
              slot: System board or motherboard
              size: 16GiB

d.c.hart: Have you tried booting with "rhgb quiet" removed from the kernel command line? (You can edit the kernel command line in the GRUB menu.) Maybe that will give some messages or hints as to what is going on. If that does not help, please try adding "nomodeset" to the kernel command line and see if that helps.

No messages or hints. Someone else said it returns something like "uefi secureboot". With nomodeset: same result, straight to a blank screen. I tried changing the BIOS to Secure Boot. Same result. I tried a bare-minimum 5.4 kernel with no modules. Same thing. The F31 live image boots from a USB stick. Something changed from 5.3 onward that is affecting a small number of users. I have the latest BIOS according to Dell.

Hmm, can you check what the exact kernel version is on the livecd? I think it is 5.3.6. You can still boot the machine using 5.2.18, right?
You can download 5.3.6 here: https://koji.fedoraproject.org/koji/buildinfo?buildID=1400114

Here are instructions for installing a kernel directly from koji: https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

If the livecd has a different version than 5.3.6, please try the livecd version; you can find all official kernel builds here: https://koji.fedoraproject.org/koji/packageinfo?packageID=8

If the livecd version does boot when installed on the system, please also test 5.3.7: https://koji.fedoraproject.org/koji/buildinfo?buildID=1402630

That will narrow down the possible causes to the changes in a single 5.3.z version, which should help find the cause.

Same result with 5.3.6 and 5.3.7. I am baffled.

(In reply to dc.hart from comment #10)
> Same result with 5.3.6 and 5.3.7. I am baffled.

Are you using classic BIOS boot or UEFI boot? If you do not know, try running "ls /sys/firmware/efi/efivars": if you get a "No such file or directory" error, your system is booting in classic BIOS mode; if you get a bunch of files, you are running in UEFI mode.

It is possible that you are using one mode for the installed version and another for the livecd. Typically your BIOS's boot menu will let you choose the USB device as boot source twice: once labelled EFI, and once without EFI, which is typically classic BIOS mode. Either way, you can use the same check under the livecd too. If the two boot methods are different, that might explain it; in that case, try to boot the livecd in the same mode as the install to see if that helps.

If both methods are the same and they are both classic BIOS, then this might be an issue with the bootloader: for classic BIOS the livecd uses syslinux, whereas the install uses grub2.
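The `ls /sys/firmware/efi/efivars` check described above can be wrapped so it gives the same one-word answer on the installed system and on the live image. A minimal sketch, assuming only a POSIX shell; `boot_mode` is a hypothetical name, and its optional directory argument exists purely so the check can be tested. It looks for `/sys/firmware/efi`, the parent of the `efivars` directory, which the kernel only creates when it was started by UEFI firmware.

```shell
# Hypothetical helper: report whether the running kernel was booted via UEFI or legacy BIOS.
# /sys/firmware/efi (which contains efivars/) only exists on a UEFI boot.
boot_mode() {
    efi_dir=${1:-/sys/firmware/efi}   # the argument is only an override for testing
    if [ -d "$efi_dir" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}
```

Running `boot_mode` once on the installed system and once from the live USB shows directly whether the two were started in different modes.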
If your install is using classic BIOS, one thing to try is updating the installed GRUB version. With classic BIOS, GRUB gets installed into the MBR, and the version in the MBR stays at the version from installation time, even though the grub package itself may have been updated later. To get the newer version into your MBR you need to reinstall GRUB in the MBR; see: https://fedoraproject.org/wiki/GRUB_2#Updating_GRUB_2_configuration_on_BIOS_systems

Note that /dev/sda there is an example. If you are using a single SATA disk and booting from that disk, it is correct, but your setup might be more complex: you need to specify the disk your system boots from, which might be a different disk than the one with Linux on it if you have multiple disks. Note that you only need to run the grub2-install command; you shouldn't need to run grub2-mkconfig, although giving that a try does not hurt.

Fedora 29 changed to end-of-life (EOL) status on 2019-11-26. Fedora 29 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Version = 30! Please revert the status.

Both are UEFI. BTW (grasping at straws) the EFI System Partition is fat16. Is that correct?

    [dch@reptile ~]$ sudo parted -l
    Model: ATA Samsung SSD 840 (scsi)
    Disk /dev/sda: 500GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags:

    Number  Start   End     Size    File system  Name                  Flags
     1      1049kB  211MB   210MB   fat16        EFI System Partition  boot, esp
     2      211MB   1285MB  1074MB  ext4
     3      1285MB  500GB   499GB                                      lvm

The entry in fstab seems conflicting:

    UUID=88D3-9C76 /boot/efi vfat umask=0077,shortname=winnt 0 2

(In reply to dc.hart from comment #14)
> Both are UEFI. BTW (grasping at straws) the EFI System Partition is fat16. Is that correct?

That should not be a problem. You are on F30, right?

The only thing I can think of is that there is a bug in the somewhat older grub in F30 which triggers on some Dell machines in combination with newer kernels. Mind you, this is just a theory. What you can do is install the F31 grub on F30. Go to: https://koji.fedoraproject.org/koji/buildinfo?buildID=1417369

Then download: grub2-efi-x64-2.02-103.fc31.x86_64.rpm

And then run:

    sudo rpm -Uvh grub2-efi-x64-2.02-103.fc31.x86_64.rpm

rpm might complain about some other grub bits being too old when you do that; in that case, download the other bits and add them to the rpm -Uvh command line.

I'm assuming here that the shim from F30 will also be happy with the signatures on the F31 grub. I'm not familiar with the key management for the keys used for this; I guess they might be per distro. So if you get some Secure Boot related error after this and you cannot load the GRUB menu at all anymore, try disabling Secure Boot in your BIOS settings.

Worked ... and then it didn't. On Wednesday evening I decided to upgrade grub by upgrading from 30 to 31. The system booted from 5.3.12. The system booted on Thursday and Friday. On Saturday the system would not boot from the 5.3.x kernel, but it still boots from 5.2.18-200.fc30.x86_64. I reinstalled grub2\* and efi\* - same immediate blank screen - nothing in the logs. No clues from zapping rhgb quiet.
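For the F31 GRUB test suggested above, the advice is to put every GRUB package that rpm complains about on one `rpm -Uvh` command line, so they are all upgraded in a single transaction. A minimal sketch, assuming the RPMs were downloaded from the koji build page into one directory under file names following the `grub2-*.fc31.*.rpm` pattern of `grub2-efi-x64-2.02-103.fc31.x86_64.rpm`; the function name `install_f31_grub` is hypothetical, and it must be run as root.

```shell
# Hypothetical helper: upgrade all downloaded F31 GRUB RPMs in one rpm transaction.
# Assumes the files were saved in one directory under their koji names,
# e.g. grub2-efi-x64-2.02-103.fc31.x86_64.rpm and grub2-common-2.02-103.fc31.noarch.rpm.
install_f31_grub() {
    dir=${1:-.}
    # Replace the positional parameters with the glob matches (sorted by the shell).
    set -- "$dir"/grub2-*.fc31.*.rpm
    if [ ! -e "$1" ]; then
        # An unmatched glob stays literal, so the first "file" does not exist.
        echo "no grub2 *.fc31 RPMs found in $dir" >&2
        return 1
    fi
    rpm -Uvh "$@"
}
```

Called as `install_f31_grub ~/Downloads`, this expands to a single `rpm -Uvh` with every matching file, so rpm resolves the interdependent GRUB packages together instead of rejecting them one at a time.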
I have no software from any source other than the official repositories. I am extremely busy over the next few days (in spite of being retired). I don't know enough about what happens once a kernel is selected from the grub menu, but I wonder if this has something to do with akmods or akmod-VirtualBox. VB is mission critical for me, as I run a discrete virtual machine (Mint) with a VPN. I am also wondering if I might have a virus. 2019 is my 20th year of using Red Hat or a derivative. This is a first. I will back up and experiment on Friday.

I tried everything and ended up doing a clean install, which is an adventure. The latency of dnf had me chewing my keyboard. But I digress.

After install the machine boots from 5.3.7. After dnf upgrade it will not boot from either 5.3 kernel but will boot from 5.2.18. This leads me to believe that this is NOT a kernel issue but, rather, a problem with grub2. I have not made a change on this report. I will leave that up to someone at Red Hat.

What I do not understand is why this doesn't seem to be affecting a large number of users. Dell, i7, SSD doesn't seem all that unique. Later in the week I might experiment with extlinux. With each new kernel I would create initramfs and vmlinux symlinks.

(In reply to dc.hart from comment #17)
> I tried everything and ended up doing a clean install, which is an adventure.
> The latency of dnf had me chewing my keyboard. But I digress.
>
> After install the machine boots from 5.3.7. After dnf upgrade it will not
> boot from either 5.3 kernel but will boot from 5.2.18. This leads me to
> believe that this is NOT a kernel issue but, rather, a problem with grub2. I
> have not made a change on this report. I will leave that up to someone at
> Red Hat.

Hmm, weird. Did the dnf upgrade also upgrade grub2 perhaps? You could try doing:

    sudo dnf downgrade 'grub2*'

That will give you an older version.
Please try to run it twice: the first time to go from the updates-testing version to the updates one, and then another time to go to the release version. The second run may fail because you may end up at the release version on the first run. If that does not help, you can also try downgrading the shim:

    sudo dnf downgrade 'shim*'

P.S. I take it the original/release F31 kernel, which you can also still select after the dnf upgrade, is also broken after the dnf upgrade?

I just noticed that we also have bug 1779611 opened by another user now, which is about the F31 release version of grub working and the one from the updates repo causing the system to not boot with any 5.3 kernels. I have the feeling these two bugs might be the same issue.

Still an issue with a Dell Inspiron 5567 on kernel 5.3.16-200.fc30.x86_64: exact same issue, standard dnf upgrade on Fedora 30. Boots fine with the 5.2.18 kernel.

I just received a new Dell 5584 with the newer i7. We'll see what happens. More importantly, I can experiment with the older machine. I am going to start with sgdisk -Z /dev/sda and see if that yields a different result.

Andy, d.c.hart, has either of you tried to downgrade grub as I suggested in comment 18? Comments in bug 1779611 suggest that that bug is the same issue and that downgrading grub helps there.

Downgraded GRUB to 1:2.02-84.fc30 (the lowest it will go on F30). Same behavior: still will not boot 5.3.x and still boots 5.2.18.

We've received one more similar bug report. At this time the most likely cause is that this is a grub issue, and most information regarding debugging this from the grub side is located in bug 1779611, so I'm marking this as a duplicate of that bug.

*** This bug has been marked as a duplicate of bug 1779611 ***