Description of problem:
Upgraded the kernel to the current 4.7.9.

Version-Release number of selected component (if applicable):
4.7.9-200

How reproducible:
Install kernel 4.7.9-200 and boot on a Dell Inspiron 500m, also called Latitude D500.

Steps to Reproduce:
1. Install kernel 4.7.9
2. Reboot

Actual results:
After the Grub message "Loading initial ramdisk" nothing happens anymore.

Expected results:
The Linux kernel starts.

Additional info:
I installed 4.7.9 twice, in case the initrd was created incorrectly the first time, but this did not help.
Removing the kernel command line arguments rhgb and quiet changed nothing: no output is written at all, and there is no further hint about what fails.
Ctrl-Alt-Del does not work; I have to press the power button.
noapic or noapm or both do not help.
The CPU is an Intel(R) Pentium(R) M processor 2.10GHz, in case that matters, although I can't imagine it does.
This laptop has run Fedora for many versions, and this is the first time the kernel does not start.
4.7.7-200 was fine. There seems to be no version in between, e.g. 4.7.8.
On a Lenovo T500 4.7.9 starts normally. Problem might be limited to 32 bit kernel ?
Absolutely the same problem.
Fedora24 x86_64 with Nouveau module
CPU Pentium(R) Dual-Core E5200 @ 2.50GHz

lspci
00:00.0 Host bridge: NVIDIA Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: NVIDIA Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: NVIDIA Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.4 RAM memory: NVIDIA Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: NVIDIA Corporation MCP79 Co-processor (rev b1)
00:04.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:06.0 USB controller: NVIDIA Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:06.1 USB controller: NVIDIA Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: NVIDIA Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: NVIDIA Corporation MCP79 PCI Bridge (rev b1)
00:0a.0 Ethernet controller: NVIDIA Corporation MCP79 Ethernet (rev b1)
00:0b.0 SATA controller: NVIDIA Corporation MCP79 AHCI Controller (rev b1)
00:0c.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:10.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:16.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:17.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
00:18.0 PCI bridge: NVIDIA Corporation MCP79 PCI Express Bridge (rev b1)
03:00.0 VGA compatible controller: NVIDIA Corporation C79 [GeForce 9300 / nForce 730i] (rev b1)
No boot on 4.7.9 i686, HP Pavilion zt3000, 512 MB RAM. Kernel 4.7.5 boots OK.
With 4.7.9, after removing quiet from the boot line, I get "Probing EDD (edd=off to disable) ... ok" and then it freezes.
Video: AMD/ATI RV250/M9 GL (Mobility FireGL 9000/Radeon 9000)
CPU: GenuineIntel, CPU family 6, model 9, Intel Pentium M processor 1500 MHz
Disk: 150 GB
can you test http://koji.fedoraproject.org/koji/taskinfo?taskID=16118947 ? There was a similar bootup issue reported in https://bugzilla.redhat.com/show_bug.cgi?id=1384238
4.7.8-200.rhbz1384238.fc24.x86_64 the same fail. After Grub nothing.
Same problem. On Dell C840 (pentium) 4.7.7 (f23) works fine, but after yum update, 4.7.8 failed to boot and after 2nd yum update 4.7.9 fails to boot, both with booting message alone on otherwise black screen.
4.8.4-200.fc24.x86_64 the same fail. After Grub nothing.
I have a similar problem: on a Dell Inspiron 8600 laptop (Pentium M, 32-bit, 2GB RAM) kernel-4.7.9-200.fc24.i686 does not boot. I have to revert to the kernel that came with the install: kernel-4.5.5-300.fc24.i686 (I just installed F24 on this laptop). The processor does not support PAE. Another old laptop with PAE CPU *does* boot fine (with the PAE-version of the new kernel)
Created attachment 1214193 lspci output of the machine with 4.7.9 failing
Regarding #4: I cannot try this build, because it's for x86_64. The laptop where 4.7.9 does not start is an i686. Is there also a build for i686?

I tried 4.8.4-200.fc24.i686: same story, not a single message after the grub output.

Additional info, I don't know if this matters: when I switch off the laptop in this hanging state (because even Ctrl-Alt-Del does not work anymore), the BIOS performs additional checks during the next boot, probably because it assumes that the previous POST did not finish. This seems weird to me, as at least the loader has started. BIOS POST should be over, right? So could this mean the first pieces of the kernel code confuse the BIOS?

Additional info: 4.7.9 starts without issue as a KVM/QEMU guest, and also on a box with an AMD Phenom processor and an Asus M5A88-V EVO board.

I'll attach the lspci and dmidecode output for the different cases I've tested (except for the KVM/QEMU virtual machine) and would like to encourage others to do the same. This may give some clue as to the common denominator of the machines where Linux does not start from version 4.7.8 upward.
Created attachment 1214195 dmidecode output of the machine with 4.7.9 failing
Created attachment 1214196 lspci output of a Lenovo T500 with 4.7.9 working
Created attachment 1214197 dmidecode output of a Lenovo T500 with 4.7.9 working
Created attachment 1214198 lspci output of a box with Asus board and AMD processor (see #10)
Created attachment 1214199 dmidecode output of a box with Asus board and AMD processor (see #10)
This patch may be to blame: https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/stable-queue/+/d0a119106122b5d244a23faf2f43a7b21873d6a3/releases/4.7.8/x86-apic-get-rid-of-apic_version-array.patch
I got the machine to boot by enabling APIC Mode in the BIOS.
I tried to revert the patch mentioned in comment #16, but it did not help. I had already looked at that patch and it looks clean to me. However, it seems incomplete: reverting it with patch -R did not work, and to get a consistent source state I had to do it manually. Unfortunately, as said, still no boot.
I believe I'm affected by this bug. By removing `rhgb quiet` and adding `debug ignore_loglevel earlyprintk=vga,keep` to the kernel command line, I got a panic message: https://i.imgur.com/PhFBU0Y.jpg and with the further addition of `boot_delay=10`, I was able to get a video of the panic (trace starts around 1:20): http://sendvid.com/v3uxqqtl so this does look like it might be related to the APIC change. Here's `lshw -sanitize` under the last working kernel, 4.7.5-200.fc24: https://gist.github.com/slingamn/afac16aea8c17d37f95ebe2388779df2
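For anyone who wants to capture the same kind of trace, the command-line change described above amounts to dropping `rhgb` and `quiet` and appending the debug options. A minimal sketch in Python of that string transformation follows; the function name and the sample command line are made up for illustration, only the option names are taken from the comment above.

```python
# Options named in the comment above; boot_delay=10 slows the output down
# enough that the trace can be filmed.
DEBUG_ARGS = ["debug", "ignore_loglevel", "earlyprintk=vga,keep", "boot_delay=10"]
QUIET_ARGS = {"rhgb", "quiet"}


def debug_cmdline(cmdline: str) -> str:
    """Return cmdline without rhgb/quiet and with the debug options appended."""
    kept = [arg for arg in cmdline.split() if arg not in QUIET_ARGS]
    # Append each debug option only once, even if it is already present.
    for arg in DEBUG_ARGS:
        if arg not in kept:
            kept.append(arg)
    return " ".join(kept)


if __name__ == "__main__":
    # Hypothetical command line, not taken from any machine in this report.
    print(debug_cmdline("ro root=/dev/sda1 rhgb quiet"))
```

The resulting string can be typed in by editing the boot entry at the Grub menu, which changes the arguments for that one boot only.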
Interesting findings. What do you mean by "this apic change"? There are not only the changes from vanilla kernel 4.7.7 to 4.7.8. As far as I have seen, several other APIC-related patches added by Red Hat are pulled in during the build, and they might interfere with upstream changes.

So what I tried to do is to build without these patches. Some time ago it was possible to comment the patches out in the spec file. Today, as usual, everything has changed and commenting them out does not help. There is a macro _with_vanilla in the spec file. When I set this for rpmbuild -bb ..., I get an error message, so this does not work. It seems no one uses this option.

So the next thing I would try is to reverse-engineer this patching mechanism and modify the spec file accordingly. But frankly, after the many hours I have spent on this, I'm really fed up. I wouldn't be surprised if this ended up as a wontfix because our hardware is too old ...
I hope "wontfix" won't follow "justbroke".
Sorry, I spoke carelessly: I don't have any special insight into what APIC-related change might have caused the issue. Do you have the same panic message as I do (with the addition of those command-line options)?
Thank you very much for the arguments for this super-cool slow-motion boot! I didn't know them yet. What I get can be seen here: http://www.muc.de/~af/scr.jpg . I have not watched the video, so I cannot compare now. I see a null pointer memory access early in native_apic_mem_read . I can't check what this means now, because I'm on my way to vacation, away from computers for a few days. It could have to do with the patch abolishing the apic version array, but not necessarily. I will look at the kernel code on Monday, if no one else has done so by then.
(In reply to Albert Flügel from comment #0)
> Description of problem:
> Upgraded kernel to current 4.7.9
>
> Version-Release number of selected component (if applicable):
> 4.7.9-200
>
> How reproducible:
> Install kernel 4.7.9-200 and boot on a Dell Inspiron 500m, also called
> Latitude D500
>
> Steps to Reproduce:
> 1. Install kernel 4.7.9
> 2. reboot
>
>
> Actual results:
> After Grub message "Loading initial ramdisk" nothing happens anymore
>
> Expected results:
> Linux kernel starts

Same problem with an IBM ThinkPad X31 (works with 4.7.6). The next generation (IBM ThinkPad X60) is fine with 4.7.9. Both are 32 bit.
Created attachment 1215884
Patch to make kernel 4.7.9 boot again on certain architectures

Put the patch into SOURCES, add a line like this to kernel.spec:

Patch899: cpu_from_apic_too_early.patch

and rebuild as usual with rpmbuild -bb ...
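The spec file change described for this attachment is a one-line addition. As a rough sketch only, the following Python helper inserts such a line after the last existing `PatchN:` line of a spec file's text. The helper name and the sample spec content are invented for illustration; only the `Patch899:` line itself comes from the attachment description, and the real kernel.spec may need the line somewhere more specific.

```python
import re


def add_patch_line(spec_text: str, number: int, filename: str) -> str:
    """Insert 'Patch<number>: <filename>' after the last existing PatchN: line."""
    lines = spec_text.splitlines()
    new_line = f"Patch{number}: {filename}"
    if new_line in lines:
        return spec_text  # already present, nothing to do
    last_patch = None
    for index, line in enumerate(lines):
        if re.match(r"Patch\d+:", line):
            last_patch = index
    if last_patch is None:
        raise ValueError("no PatchN: line found in spec text")
    lines.insert(last_patch + 1, new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up miniature spec, just enough to show where the line lands.
    sample = "Name: kernel\nPatch848: a.patch\nPatch849: b.patch\n%description\n"
    print(add_patch_line(sample, 899, "cpu_from_apic_too_early.patch"))
```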
The problem is, as can be seen here: http://www.muc.de/~af/scr4.jpg , that hard_smp_processor_id is called by prefill_possible_map at a very early boot stage.

prefill_possible_map collects information about CPUs to fill some data structures. hard_smp_processor_id calls functions that access memory-mapped hardware registers, e.g. the APIC, through the fixmap area. At this early stage of booting this mapping is not established yet, so this cannot work. This causes the message
BUG: unable to handle kernel paging request at ffffc020
visible in http://www.muc.de/~af/scr.jpg . The named address is exactly the one where the APIC status register is expected. It is also present in the EAX register, and the instruction at native_apic_mem_read + 17 is a mov from this address. So this access leads to an oops.

Looking at prefill_possible_map, one can see that the value obtained from hard_smp_processor_id is used just for an informational output. The attached patch comments out the call to hard_smp_processor_id and replaces the CPU identifier in the output with the already available APIC CPU id. This makes the kernel boot again.

The resulting output is probably not what the upstream maintainers want to see. However, for now it is IMO better to have a somewhat inappropriate output than an unbootable Linux.

RPM packages to test on i686 (no PAE!) can be downloaded here: http://www.muc.de/~af/linux

Feedback is welcome, upstream communication requested.
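To make the address argument easier to follow, the faulting address from the oops can be split into a page base and an offset within that page. The small Python sketch below does only that arithmetic. The offset 0x20 is where the kernel's APIC_ID constant places the local APIC ID register, which fits a read issued from hard_smp_processor_id. That the page base is the fixmap slot reserved for the APIC on this configuration is inferred from the oops here, not checked against the kernel source.

```python
PAGE_SIZE = 0x1000          # 4 KiB pages on x86
FAULT_ADDRESS = 0xFFFFC020  # from "unable to handle kernel paging request at ffffc020"


def split_address(address: int):
    """Split an address into its page base and the offset within that page."""
    offset = address & (PAGE_SIZE - 1)
    return address - offset, offset


if __name__ == "__main__":
    page, offset = split_address(FAULT_ADDRESS)
    # prints: page base 0xffffc000, register offset 0x20
    print(f"page base {page:#x}, register offset {offset:#x}")
```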
The patch works also for 4.8.4. Packages for i686 also in http://www.muc.de/~af/linux
The problem came in with commit 2a51fe083eba7f99cbda72f5ef90cdf2f4df882c . However, it's just the call to hard_smp_processor_id that leads to oops.
RPMs with the patch built into kernel 4.8.4 also for i686-PAE and the x86_64 architecture can be found in http://www.muc.de/~af/linux
(In reply to Albert Flügel from comment #25)
> Problem is, as can be seen here: http://www.muc.de/~af/scr4.jpg , that
> hard_smp_processor_id is called by prefill_possible_map in a very early boot
> stage.
> prefill_possible_map collects infos about CPUs to fill some data structures.
> hard_smp_processor_id makes calls to functions in the fixmap area, that make
> accesses to kind of memory mapped hardware registers, e.g. the APIC. The
> problem is, that in this early stage of booting this kind of memory
> management is not established yet, so this cannot work. This causes the
> message
> BUG: unable to handle kernel paging request at ffffc020
> visible in http://www.muc.de/~af/scr.jpg . The named address is exactly the
> one, where the APIC status register is expected. It is also present in the
> EAX register and the instruction at native_apic_mem_read + 17 is a mov from
> this address. So this access leads to an oops.
> Looking at prefill_possible_map one can see, that the value obtained from
> hard_smp_processor_id is used just for an informational output. The attached
> patch comments out the call to hard_smp_processor_id and replaces the cpu
> identifier in the output with the already available apic cpu id. This makes
> the kernel boot again.
> Probably the resulting output is not what the upstream maintainers want to
> see. However, for now it is imo better to have a somewhat unappropriate
> output compared to an unbootable linux.
>
> RPM packages to test on i686 (no PAE !) can be downloaded here:
> http://www.muc.de/~af/linux
>
> Feedback is welcome, upstream communication requested.

Is there an upstream thread for this (on LKML or other?)?

P.
Can you also please test with linux.git commit 1e90a13d0c3d ("x86/smpboot: Init apic mapping before usage")? Thanks, P.
I don't know whether there is any upstream thread. I did not initiate one.

The problem does not show up with 4.8.6-201. Commit 1e90a13d0c3d makes sense to me in this context. 4.8.6-201 seems to fix it differently, by calling hard_smp_processor_id only if boot_cpu_has(X86_FEATURE_APIC) is true. It can be done this way.

Building 4.8.4 with just this commit added would take my laptop another 5 hours, and if it does not work right away it would probably take me more hours to adapt the surrounding code so that it builds.
Kernel 4.8.6-201.fc24.i686 solves the problem for my Dell Inspiron 8600 laptop (non-PAE Pentium M, 32-bit, 2GB RAM): it boots fine again! (see comment 8)
(In reply to Albert Flügel from comment #31)
> I don't know, whether there is any upstream thread. I did not initiate one.
> The problem does not show up with 4.8.6-201. ...

Can confirm: Kernel 4.8.6-201 (32bit) is working on IBM ThinkPad X31.
(didn't try 4.8.4)
+1, 4.8.4-200.fc24.i686 is broken and 4.8.6-201.fc24.i686 is working.
With 'APIC Mode - disabled' in BIOS kernel-4.8.6-201-x86_64 not bootable for me. HW in Comment #2.
Nick, do you have a chance to try the same with the patched 4.8.4 from http://www.muc.de/~af/linux ?
Albert, unfortunately your kernel does not boot either. Please see my boot log: http://imgur.com/a/PvCrO http://imgur.com/a/CvrlA
Interesting. I think I'll really give 4.8.4 plus commit 1e90a13d0c3d a try. Frankly, I can't say that I can really judge which way of structuring the functionality makes more sense here. 1e90a13d0c3d looks clearer to me: initializing the APIC access before using it seems more logical than initializing it later and, before that, skipping certain APIC-related consistency checks under conditions that do not seem clear. Some upstream maintainer familiar with the APIC code should have an eye on this.

However, I'll try a build with 1e90a13d0c3d instead of the way 4.8.6-201 tries to avoid the problem. But this can take some time.
I built a kernel 4.8.4 with "x86/smpboot: Init apic mapping before usage" (i find this as commit 0c524f819683e9f1c165d571256a9023b56f1f0c) included, currently only for x86_64 (and without the other patch that appeared in 4.8.6) as 4.8.4-301 in http://www.muc.de/~af/linux . Builds for i686 and i686-PAE will follow as i find the time. I had to modify the patch a bit and will attach it here. I added it as Patch899 in the SPEC file.
Created attachment 1219816 Patch adding "x86/smpboot: Init apic mapping before usage" to kernel build as Patch899
Albert, I got a normal boot with your kernel-4.8.4-301, with APIC disabled in the BIOS.
My laptop also boots with this source configuration 4.8.4-301 as outlined in comment 39. Packages for i686 are also in http://www.muc.de/~af/linux now. i686-PAE will follow tomorrow. To me this seems the appropriate fix. The check boot_cpu_has(X86_FEATURE_APIC) in 4.8.6 probably yields true, but when for whatever reason (probably making problems) the APIC is switched off, this breaks the boot. The attached patch should be after Patch849 in the spec file.
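The difference between the two fixes discussed in this thread can be shown with a toy model. The Python sketch below is not kernel code: the class, its flags and its method names are invented stand-ins, and the case "the feature check yields true while the mapping is missing" is the hypothesis from the comment above, not something verified against the kernel source.

```python
class PageFault(Exception):
    """Stands in for the 'unable to handle kernel paging request' oops."""


class ToyBoot:
    def __init__(self, cpu_reports_apic: bool):
        # Models the result of boot_cpu_has(X86_FEATURE_APIC).
        self.cpu_reports_apic = cpu_reports_apic
        # Models whether the mapping for the APIC registers exists yet.
        self.apic_mapped = False

    def init_apic_mappings(self) -> None:
        self.apic_mapped = True

    def hard_smp_processor_id(self) -> int:
        if not self.apic_mapped:
            raise PageFault("APIC register read before the mapping exists")
        return 0

    def prefill_unguarded(self) -> str:
        # Behaviour described for the failing kernels: unconditional read.
        return f"boot CPU {self.hard_smp_processor_id()}"

    def prefill_guarded(self) -> str:
        # Approach attributed to 4.8.6-201: read only if the CPU reports an APIC.
        if self.cpu_reports_apic:
            return f"boot CPU {self.hard_smp_processor_id()}"
        return "boot CPU unknown"

    def prefill_after_mapping(self) -> str:
        # Approach of "Init apic mapping before usage": map first, then read.
        self.init_apic_mappings()
        return f"boot CPU {self.hard_smp_processor_id()}"


def boots(step) -> bool:
    """Return True if the given boot step completes without a PageFault."""
    try:
        step()
        return True
    except PageFault:
        return False


if __name__ == "__main__":
    print(boots(ToyBoot(True).prefill_unguarded))      # unconditional read
    print(boots(ToyBoot(False).prefill_guarded))       # guard skips the read
    print(boots(ToyBoot(True).prefill_guarded))        # guard passes, no mapping
    print(boots(ToyBoot(True).prefill_after_mapping))  # mapping set up first
```

In this model the guard only helps when the feature check happens to be false, while setting up the mapping first works regardless of what the check reports. That is the reasoning behind preferring the second approach.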
4.8.7-200.fc24.x86_64 from fedora repo boot ok with APIC disabled in BIOS.
Kernel 4.8.8-100.fc23.i686 #1 SMP boots on my Dell C840! A big THANK-YOU to all who made this happen.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 24 kernel bugs.

Fedora 24 has now been rebased to 4.10.9-100.fc24. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 26, and are still experiencing this issue, please change the version to Fedora 26.

If you experience different issues, please open a new bug report for those.
Everything fine with 4.10.9-100.fc24 , this bug is gone since several versions, thank you.