Bug 1302071 - i686 kernel-4.5.0-0.rc1.git0.1.fc24 does not boot on some systems (KVMs, some metal)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: binutils
Version: 24
Hardware: i686
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Nick Clifton
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: https://fedoraproject.org/wiki/Common...
Duplicates: 1302114 1302525
Depends On:
Blocks: TRACKER-bugs-affecting-libguestfs F24BetaFreezeException
Reported: 2016-01-26 17:21 UTC by Adam Williamson
Modified: 2017-07-31 18:48 UTC (History)
CC List: 24 users

Fixed In Version: binutils-2.26-18.fc24
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-04-23 23:45:01 UTC
Type: Bug
Embargoed:


Attachments
.config used for vanilla kernel build (170.88 KB, text/x-mpsub)
2016-01-28 13:58 UTC, Bruno Wolff III
Rawhide builds from around the time this problem was noticed (2.68 KB, text/plain)
2016-02-10 22:20 UTC, Bruno Wolff III

Description Adam Williamson 2016-01-26 17:21:37 UTC
All 32-bit openQA tests for Rawhide 2016-01-26 failed:

https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=Rawhide&build=Rawhide__20160126&groupid=1

from the video, it looks like the system reboots virtually the instant a grub menu entry is selected. There were no changes to grub or anything else lower level that I can see in the 2016-01-26 Rawhide report, so the most obvious possible culprit is the new kernel build, kernel-4.5.0-0.rc1.git0.1.fc24 . More details as I get them!

Per FESCo, AIUI, 32-bit images are no longer release blocking by policy for Fedora 24, but there is the question of upgrading existing 32-bit installs, so there may still be grounds for considering this a blocker bug. I don't know if anyone's considered that topic yet.

Comment 1 Adam Williamson 2016-01-26 17:28:39 UTC
Hmm, looks like this doesn't happen on all systems. I can reproduce in a KVM with CPU model 'Westmere' here, but it doesn't break the same on my bare metal test system.

In the VM, if I remove 'quiet', I can see that 'loading vmlinuz...' completes, but it reboots during 'loading initrd.img...'

on my bare metal test system the boot doesn't actually succeed, it dies in a fire of ATA exceptions, but I don't know if that's new as I don't boot on that system every day (haven't booted a 32-bit image on it since F23 final).

Comment 2 Adam Williamson 2016-01-26 17:40:50 UTC
Looks like the ATA issue is also new, yesterday's Rawhide installer boots fine on my test box, so I'll file that one too.

Comment 3 Adam Williamson 2016-01-26 18:17:44 UTC
This happens at least with CPU models 'Westmere' and 'qemu32'. openQA is using CPU model 'host', and the host CPU on the worker boxes is a Xeon E5540.

On the other hand, the image booted fine on a second try on my bare metal test box and also boots on one of my laptops, so this definitely seems specific to VMs in some way.
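For anyone wanting to repeat the comparison across virtual CPU models, here is a minimal sketch that builds one QEMU command line per model. The binary name `qemu-system-x86_64`, the memory size, and the `boot.iso` path are assumptions for illustration and are not taken from the openQA setup. Note that `-cpu host` only works with KVM enabled.

```python
import shlex

def qemu_boot_command(iso_path, cpu_model, memory_mb=2048, use_kvm=True,
                      qemu_binary="qemu-system-x86_64"):
    """Build an argv list that boots an ISO under the given QEMU CPU model.

    qemu_binary, memory_mb and iso_path are illustrative assumptions.
    """
    argv = [qemu_binary]
    if use_kvm:
        argv.append("-enable-kvm")  # required for the 'host' CPU model
    argv += ["-cpu", cpu_model, "-m", str(memory_mb),
             "-cdrom", iso_path, "-boot", "d"]  # 'd' = boot from CD-ROM first
    return argv

if __name__ == "__main__":
    # CPU models mentioned in this comment
    for model in ("Westmere", "qemu32", "host"):
        print(shlex.join(qemu_boot_command("boot.iso", model)))
```

Printing the commands instead of running them keeps the sketch free of side effects; each line can be pasted into a shell on a machine with QEMU installed.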

Comment 4 Adam Williamson 2016-01-26 18:21:31 UTC
The same problem (more or less, the system just stops at a blank screen with a cursor, instead of rebooting) occurs in the 32-bit upgrade tests, so this isn't specific to the installer / live images, it affects an upgrade too.

Comment 5 Josh Boyer 2016-01-26 20:18:07 UTC
*** Bug 1302114 has been marked as a duplicate of this bug. ***

Comment 6 Richard W.M. Jones 2016-01-26 20:49:24 UTC
Although bug 1302114 was marked as a duplicate, the symptoms
are different.  My kernel does not print any message at all.
I attached gdb to qemu and it seems as if seabios is either
jumping to an invalid entry point, or else the kernel goes off
the rails very early (ending up at a nonsense address with no
stack frame, so it's hard to tell exactly what is happening).

Comment 7 Adam Williamson 2016-01-26 21:43:16 UTC
well, we're booting somewhat differently, right? that presumably accounts for the difference. You're feeding the kernel direct to qemu (if I'm following your process correctly), I'm booting a Rawhide installer image, which runs through grub.

Comment 8 Richard W.M. Jones 2016-01-26 21:57:00 UTC
Definitely, yes.

Comment 9 Josh Boyer 2016-01-28 12:59:22 UTC
*** Bug 1302525 has been marked as a duplicate of this bug. ***

Comment 10 Bruno Wolff III 2016-01-28 13:54:05 UTC
I am not sure why my bug got marked as a duplicate of this bug, but in my case the machine is real i686 hardware and an almost vanilla (I need a patch for an i915 bug) 4.5 rc1 kernel does boot.

Comment 11 Bruno Wolff III 2016-01-28 13:58:33 UTC
Created attachment 1119146 [details]
.config used for vanilla kernel build

My .config file was not the same as that used for Fedora builds because it was left over from my bisecting an i915 bug rather than copied fresh from the latest Fedora kernel. So that might be an important difference in addition to any patches Fedora is carrying.

Comment 12 Bruno Wolff III 2016-01-28 15:32:16 UTC
I should note my machine is an f23 machine using rawhide kernels, so when I build kernels the tools used may differ from those used to build f24 kernels.

Comment 13 Adam Williamson 2016-01-28 16:51:28 UTC
Well, it's possible that the bug affects both VM CPUs and *some* real ones - it just means there's something in common between your real CPU and all the different virtual CPU 'models' I tried, but not in common with the two real CPUs I booted on. That's perfectly plausible.

Comment 14 Adam Williamson 2016-01-28 16:57:34 UTC
One little tidbit - I notice now in the openQA upgrade tests, the machine stops with some possibly useful info displayed:

Probing EDD (edd=off to disable)... ok

Failed to allocate space for phdrs

-- System halted
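As far as I know, this message is printed by the x86 decompressor's ELF-parsing step (`parse_elf()` in `arch/x86/boot/compressed/misc.c`) when it cannot allocate a table for the program headers of the uncompressed kernel. One sanity check is to confirm that the uncompressed `vmlinux` itself reports a sane program header count, which would point away from the kernel image and toward the decompressor. The sketch below is a hypothetical helper that reads those fields from a little-endian ELF32 header. The 32-byte entry size is the standard `Elf32_Phdr` size.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF32_PHDR_SIZE = 32  # sizeof(Elf32_Phdr)

def read_elf32_phdr_info(data):
    """Return (e_phoff, e_phentsize, e_phnum) from a little-endian ELF32 image."""
    if len(data) < 52 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    if data[4] != 1 or data[5] != 1:  # EI_CLASS=ELFCLASS32, EI_DATA=ELFDATA2LSB
        raise ValueError("expected a 32-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<I", data, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)
    return e_phoff, e_phentsize, e_phnum

def phdr_table_bytes(data):
    """Size of the program header table a loader would need to allocate."""
    _, _, e_phnum = read_elf32_phdr_info(data)
    return ELF32_PHDR_SIZE * e_phnum
```

Calling `phdr_table_bytes(open("vmlinux", "rb").read())` on an uncompressed i686 kernel image should return a small number, typically a few hundred bytes at most. If it does, the allocation is failing for some reason other than the size requested.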

Comment 15 Bruno Wolff III 2016-01-28 19:28:17 UTC
I got:
Failed to allocate space for phdrs

I am not sure about the system halted and not the Probing message, but I have rhgb and quiet set.

I'll try another upstream build with a .config file copied from config-4.5.0-0.rc1.git0.2.fc24.i686+PAE. I won't be able to test that until tomorrow morning. (I can't start the build until late tonight on that machine.) That should remove one more variable. I'll also do an rpmbuild on another f23 i686 machine to see if building on f23 is a difference that matters. (That one I can start sooner, but it might not be finished before I need to sleep.)

Comment 16 Adam Williamson 2016-01-28 20:01:45 UTC
Yeah, that's the message that I find interesting. I just gave the others for context (in case it helps pin down exactly when/where things go sideways).

Comment 17 Felix Miata 2016-01-29 07:00:18 UTC
(In reply to awilliam from comment #14)
> One little tidbit - I notice now in the openQA upgrade tests, the machine
> stops with some possibly useful info displayed:

> Probing EDD (edd=off to disable)... ok

Above is as far as I get on host kt440, an Athlon XP dnf updated to 4.5.0-0.rc1.git0.2.fc24.i686+PAE:
model name	: AMD Athlon(tm) XP 2000+
stepping	: 2
cpu MHz		: 1674.292
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up
 
With video=1blaaa on grub cmdline, not even that much shows up onscreen before complete system lockup.

Comment 18 Bruno Wolff III 2016-01-29 15:52:57 UTC
I tested a matching .config file and the system was bootable, so it doesn't look like a change to .config explains the difference. My rebuild of the source rpm on f23 was still running this morning so I haven't tested that yet.
It is possible that the patch to fix bug 1301374 avoids the issue for me. That patch keeps my system from being treated like some VMs. It was causing a black screen, but perhaps the symptoms have changed since the problem started. To test this I will need to rebuild rc1 without that fix and see how things fail.

Comment 19 Richard W.M. Jones 2016-01-29 16:46:23 UTC
Don't know if this is related or not, but armv7hl has started
hanging now too: bug 1303147.

Comment 20 Bruno Wolff III 2016-01-30 01:27:57 UTC
I rebuilt the rpms for 4.5.0-0.rc1.git0.2 on an f23 instance and when I boot with it, I get the black screen of bug 1301374 rather than the error message about not being able to allocate space for PHDRS. This suggests that the issue is related to a tool difference between f23 and rawhide. This doesn't indicate whether a change in tooling between rc0.git9 and rc1.git0 occurred or if kernel code changed that triggered an existing tool difference to cause problems.

Comment 21 Bruno Wolff III 2016-01-30 01:52:00 UTC
kernel-PAE-core-4.5.0-0.rc1.git1.2.fc24.i686 also results in a black screen. So maybe this bug is fixed there. I can't tell for sure until the other bug gets fixed.

Comment 22 Felix Miata 2016-02-03 22:37:09 UTC
Kernel 4.5.0-0.rc2.git1.1.fc24.i686 made this go away on Sempron host m7ncd here.

Comment 23 Richard W.M. Jones 2016-02-04 08:57:42 UTC
(In reply to Felix Miata from comment #22)
> Kernel 4.5.0-0.rc2.git1.1.fc24.i686 made this go away on Sempron host m7ncd
> here.

4.5.0-0.rc2.git1.1.fc24.i686+PAE fails on qemu, but in a different way.
It causes qemu to exit (with EXIT_SUCCESS!) as soon as SeaBIOS transfers
control to the kernel.

Comment 24 Adam Williamson 2016-02-04 09:35:16 UTC
For openQA, the 32-bit upgrade tests actually *worked* late last night (European time) - I think they may have run late enough that they got the 2016-02-04 packages from the mirrors and thus 4.5.0-0.rc2.git1.1.fc24.i686 . The image tests all failed again, but the images tested were the 2016-02-03 images so they had the previous kernel. We'll find out with tonight's run if the images start working again.

Comment 25 Steven Usdansky 2016-02-04 21:10:39 UTC
Same error message as in comment #15
kernel is 4.5.0-0.rc2.git2.1.fc24.i686

Comment 26 Felix Miata 2016-02-05 20:03:40 UTC
My previous comments here may have been about CRS:
https://bugzilla.kernel.org/show_bug.cgi?id=111901

Comment 27 Felix Miata 2016-02-06 08:49:21 UTC
(In reply to Felix Miata from comment #17)
> (In reply to awilliam from comment #14)
> > One little tidbit - I notice now in the openQA upgrade tests, the machine
> > stops with some possibly useful info displayed:
 
> > Probing EDD (edd=off to disable)... ok
 
> Above is as far as I get on host kt440, an Athlon XP dnf updated to
> 4.5.0-0.rc1.git0.2.fc24.i686+PAE:
> model name	: AMD Athlon(tm) XP 2000+
> stepping	: 2
> cpu MHz		: 1674.292
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up

> With video=1blaaa on grub cmdline, not even that much shows up onscreen
> before complete system lockup.

DejaVu, but on host t2240 with i845G video, ICH4 (PATA), 1G RAM, and kernel 4.5.0-0.rc2.git2.1.fc24.i686.

With default 80x25 video and no Plymouth, the Probing... line starts out on line 9, but moves twice before the halt, ending up on line 4.

Comment 28 Felix Miata 2016-02-08 06:54:35 UTC
DejaVu, but on host gx270 with i865G video, ICH5 (PATA), 1G RAM, and kernel 4.5.0-0.rc2.git3.1.fc24.i686.

Again with default 80x25 video and no Plymouth, the Probing... line starts out on line 9, but moves twice before the halt, ending up on line 4.

Comment 29 Bruno Wolff III 2016-02-09 15:17:45 UTC
I am still seeing this with kernel-4.5.0-0.rc3.git0.1.fc24 which has the fix for bug 1301374. Vanilla 4.5-rc3 with the config file from kernel-4.5.0-0.rc3.git0.1.fc24 boots. I am in the process of rebuilding from the srpm on f23 to see if the important difference is between f23 and f24 or between Fedora and upstream.

Comment 30 Bruno Wolff III 2016-02-10 16:53:00 UTC
I rebuilt 4.5.0-0.rc3.git0.1 on f23 and it worked. So it looks like the problem is due to a change in the build environment rather than a change to the kernel. 
With the mass rebuild done, I'll be starting to move my i686 systems over to f24. That will give me a chance to try rebuilding the rpms on i686/f24 to see whether or not the change is between f23 and f24 or if there is some other change to the build environment.

Comment 31 Bruno Wolff III 2016-02-10 22:20:01 UTC
Created attachment 1122947 [details]
Rawhide builds from around the time this problem was noticed

I went and looked for rawhide builds that could have potentially triggered this problem. I can possibly try some of these that seem more plausible for causing a problem on f23 while waiting for my other i686 machine to be upgraded to f24 and doing a rpm rebuild from the srpm there.

Comment 32 Josh Boyer 2016-02-11 01:17:59 UTC
(In reply to Bruno Wolff III from comment #31)
> Created attachment 1122947 [details]
> Rawhide builds from around the time this problem was noticed
> 
> I went and looked for rawhide builds that could have potentially triggered
> this problem. I can possibly try some of these that seem more plausible for
> causing a problem on f23 while waiting for my other i686 machine to be
> upgraded to f24 and doing a rpm rebuild from the srpm there.

None of those should have an impact on kernel builds.

However, when I looked at this last week I did notice that binutils changed versions from a working build vs a failing build (or I think I noticed that anyway).  It's possible binutils-2.26 introduced an issue, and that would be backed up by it working when built on f23.

IIRC, gcc hadn't been updated to 6.0 yet at the initial failure point.
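The comparison described here (which packages changed between a working and a failing build root) can be sketched as a diff of two package lists, for example the output of `rpm -qa` saved from each environment. The helper names below are hypothetical, and the split assumes the usual `name-version-release.arch` form.

```python
def split_nvr(nvr):
    """Split 'name-version-release.arch' into (name, 'version-release.arch')."""
    name, version, release = nvr.rsplit("-", 2)
    return name, f"{version}-{release}"

def changed_packages(good_list, bad_list):
    """Map package name -> (good version, bad version) for packages that differ."""
    good = dict(split_nvr(p) for p in good_list)
    bad = dict(split_nvr(p) for p in bad_list)
    return {
        name: (good[name], bad[name])
        for name in sorted(good.keys() & bad.keys())
        if good[name] != bad[name]
    }

if __name__ == "__main__":
    # 'examplepkg' is a made-up placeholder; the binutils versions appear in this bug.
    good = ["binutils-2.25-15.fc23.i686", "examplepkg-1.0-1.fc24.i686"]
    bad = ["binutils-2.26-1.fc24.i686", "examplepkg-1.0-1.fc24.i686"]
    for name, (old, new) in changed_packages(good, bad).items():
        print(f"{name}: {old} -> {new}")
```

Packages present in only one of the two lists are ignored by this sketch. Listing those separately would be a natural extension.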

Comment 33 Bruno Wolff III 2016-02-11 15:07:58 UTC
I'll work on testing the binutils theory. Thanks for mentioning that, as I wasn't seeing anything in the list that looked like a candidate for causing trouble either.

Comment 34 Bruno Wolff III 2016-02-12 05:11:41 UTC
I tested building a vanilla kernel with binutils-2.26-1.fc24.i686 and binutils-devel-2.26-1.fc24.i686 and the kernel worked. I'm working on testing this on a second machine, but since the recent rawhide nodebug kernels don't boot, it looks like the binutils upgrade is not the trigger (at least on its own).
The next step is to get an i686 machine completely on rawhide and test a new build.

Comment 35 Bruno Wolff III 2016-02-12 14:05:51 UTC
On the second machine the build using binutils-2.26-1.fc24.i686 resulted in a kernel that rebooted right away. So I think I need to try with the latest binutils as well.

Comment 36 Bruno Wolff III 2016-02-14 03:57:52 UTC
After building a vanilla kernel on i686 f24, it wouldn't build. I tried downgrading binutils on f24, but that broke things and I couldn't test building a vanilla kernel with f24 and the old binutils. Still it does look like binutils may be involved with kernels not working on i686.

Comment 37 Bruno Wolff III 2016-02-14 15:40:56 UTC
That should read that a vanilla kernel built on an i686 f24 system wouldn't boot. I then tried downgrading binutils to binutils-2.25-15.fc23.i686, but then the kernel make failed pretty quickly when trying to build a vanilla kernel. To definitely test this, I suspect we'd want an f24 system with all packages from right after binutils-2.26 went in. Then compare before and after downgrading it back to 2.25. I don't know if there is an easy way to do that though.

Comment 39 Bruno Wolff III 2016-02-15 16:51:50 UTC
I found that nightly compose images are still available from just before the binutils change. I grabbed a couple of days worth of workstation live images and will test them shortly to make sure at least one is bootable. I can use these as a base and pull in whatever else is needed for building kernels by hand without having to do a huge amount of work. This should allow me to confirm whether or not the binutils update to 2.26 is what started breaking kernels for i686.

Comment 40 Bruno Wolff III 2016-02-15 17:45:50 UTC
Fedora-Live-Workstation-i686-rawhide-20160125.iso boots. It has binutils 2.25. It has some issues running terminal, but I can use a VT. I need to install at least a copy of openssl-devel from before the mass rebuild. Git doesn't come with workstation either and that may be nice to have as well. If I try an rpmbuild that might need a bit more than a vanilla kernel build. That takes a lot longer, so I will at least start by testing vanilla kernel builds with this first.

Comment 41 Bruno Wolff III 2016-02-16 03:53:53 UTC
I ended up trying the srpm rebuilds first after I realized testing a vanilla kernel build done on a live image was going to be a pain. I got the extra packages I needed from the correct era installed on the live image and have started running the first srpm rebuild for kernel-4.5.0-0.rc3.git0.1.fc24. (I chose that kernel because it has a fix for a severe video bug on the machine I will be testing the kernel on.) The srpm build should take between .5 and 1.5 days. Hopefully nothing goes wrong with the build. I'll report back on whether or not the first kernel works (I'm expecting it to), upgrade binutils and binutils-devel and start another srpm build.

Comment 42 Justin M. Forbes 2016-02-16 04:42:59 UTC
I really appreciate the effort you are putting into this, thanks!

Comment 43 Felix Miata 2016-02-16 13:19:34 UTC
kernel-PAE 4.5.0-0.rc3.git3.1.fc24.i686 on Athlon XP2100+ VIA KT400 host kt88b is booting normally.

Comment 44 Bruno Wolff III 2016-02-16 16:38:44 UTC
People who wanted i686 were asked to help out, so I am helping out.
The run I started last night ran out of space. I think I have it set up now with enough space for a rebuild.
I am also going to try rebuilding binutils-2.25 from the src rpm on current f24, to see if that fixes the problem with being able to build the kernel. If so, I'll test the built kernel.
If this does turn out to be an issue with binutils, I am not sure how we narrow down the problem(s). So far my thought is we'd need to do a bisect on the upstream source while testing kernel builds after each build. I haven't looked yet to see how practical that is.

Comment 45 Bruno Wolff III 2016-02-16 19:42:48 UTC
Rebuilding binutils 2.25 didn't allow it to be used for building the kernel on current f24.

Comment 46 Bruno Wolff III 2016-02-17 16:08:13 UTC
I tested the kernel built on rawhide circa January 25th, before binutils was updated (so with 2.25) and it worked. I should be able to test a build using 2.26 tonight. The builds take less than half a day, but I ran out of space twice which is why it took this long.

Comment 47 Bruno Wolff III 2016-02-18 01:09:57 UTC
I tested rebuilding the kernel with the same system as above with binutils-2.26-1.fc24 and the resulting kernel didn't boot. This strongly suggests that something is broken in binutils 2.26 with regard to i686.

Should this bug have its component changed?

How do we want to proceed?
Setting up a bisect of binutils seems tricky.

Comment 48 Justin M. Forbes 2016-02-18 14:29:30 UTC
Yes, I have changed the component to binutils, I really appreciate all the effort you have put in here.

Comment 49 Nick Clifton 2016-02-19 10:56:04 UTC
Please could you try:

binutils-2.26-12.fc24

This *might* fix the problem, but without a binutils based testcase I cannot be sure...

Comment 50 Bruno Wolff III 2016-02-19 15:15:12 UTC
Yes. I think I'll be able to start the build from the office and be able to test it tonight.

Comment 51 Bruno Wolff III 2016-02-19 18:26:30 UTC
I have started building new kernel rpms on the circa January 25th system with binutils and binutils-devel upgraded to 2.26-12.fc24. The build seems to be working. If nothing goes wrong, I should be able to test the kernel late tonight.

Comment 52 Bruno Wolff III 2016-02-20 01:52:21 UTC
The kernel built with binutils 2.26-12 did not boot.

Comment 53 Nick Clifton 2016-02-22 11:52:46 UTC
Darn - I was really hoping that that would work.

Do you have any, binutils specific, idea as to why the kernel is not booting ?

One other thing that might be worth trying, if you are able to do so, is to use
a set of binutils sources based on the current latest FSF development sources.
There were some bugs reported and fixed against the 2.26 release which have not
yet made it into any official release anywhere.  If however you are able to 
prove that the upstream FSF sources work, then I can try back-porting patches to
see if we can create a rawhide rpm that works.

Comment 54 Bruno Wolff III 2016-02-22 15:17:38 UTC
OK, I'll look at trying to build a vanilla upstream. I'd still like it in an rpm, as it can't be parallel installed like kernels. Worst case I'll use rpm to remove it without removing dependencies.

The other thing I was wondering about is mixing and matching between 2.25 and 2.26 binaries. For example I think it would be interesting to replace just the 2.26 ld with the 2.25 version and see if builds boot. But I don't know if that has even a chance of working.

Comment 55 Adam Williamson 2016-02-22 16:38:24 UTC
Well, something that someone changed recently fixed something, because the openQA 32-bit tests started booting again last night:

https://openqa.fedoraproject.org/tests/5766

I guess it must've been the binutils?

Comment 56 Bruno Wolff III 2016-02-22 16:54:20 UTC
I'll test that kernel. I should be able to use the live media to start testing on my work desktop, but I'll also want to test it at home tonight.
There could be multiple issues and I might have a different one masking this bug. To some extent that happened earlier with this bug overlapping an i915 bug, which both were preventing booting.

Comment 57 Bruno Wolff III 2016-02-22 19:23:46 UTC
Fedora-Live-Xfce-i386-Rawhide-20160213.iso worked on my work desktop. I will test things on two different i686 only machines tonight. Both of those machines were broken by kernels built after binutils 2.26 landed in rawhide. If there is another kernel bug that affected the one I tested on last time, hopefully it won't apply to the other. It's also possible some other update in rawhide is making 2.26 better and 2.26-12 on an older rawhide wasn't good enough on its own.

Comment 58 Bruno Wolff III 2016-02-23 00:46:26 UTC
I tried Fedora-Live-Xfce-i386-Rawhide-20160213.iso on one machine and it just rebooted right away. I'll try it on the other machine later tonight. I'll also be trying the latest rawhide nodebug kernels out.

Comment 59 Bruno Wolff III 2016-02-23 03:25:14 UTC
I forgot I need a CD to test Fedora-Live-Xfce-i386-Rawhide-20160213.iso on the other machine because it has a really old bios and I don't have any blanks handy.
However I was able to test kernel-PAE-4.5.0-0.rc5.git0.1.fc24.i686 on both machines and it successfully booted on both. So it does look like this was fixed, but I am not sure exactly what changed to fix things.

Comment 60 Nick Clifton 2016-02-23 10:48:27 UTC
(In reply to Bruno Wolff III from comment #54)
> OK, I'll look at trying to build a vanilla upstream. I'd still like it in an
> rpm, as it can't be parallel installed like kernels.

I can create a scratch binutils source rpm for you if that will help.

> The other thing I was wondering is if mix and matching between 2.25 binaries
> and 2.26. For example I think it would be interesting to replace just the
> 2.26 ld with the 2.25 version and see if builds boot.

If you are using static binaries then it should just work.  But if you have dynamic binaries that use the shared libbfd.so then it is unlikely that mixing
and matching will work.


(In reply to Adam Williamson from comment #55)

> Well, something that someone changed recently fixed something, because the 
> openQA 32-bit tests started booting again last night:

Yay.

> I guess it must've been the binutils?

I only made one change to the binutils rpm.  It was a big change, but when Bruno tested it, it did not work for him.  So whilst I would like to think that it was my binutils patch that fixed things, I cannot be sure.

Comment 61 Bruno Wolff III 2016-02-23 15:25:48 UTC
It is possible there was something else about current rawhide affecting the change. For my test I built on rawhide as of January 25th, except for an updated binutils. For example gcc bug fixes might account for the change in combination with updated binutils. Tracking it down will be hard. I have a small worry that the updated binutils might be working by chance and things might break again later. But I don't think it is worth the effort to rule this out right now. If we get a regression later we can come back to this.

Comment 62 Felix Miata 2016-02-24 07:33:56 UTC
On i686 Rawhide on i865G host gx28b last updated >2 months ago, dnf update produced apparently normal booting kernel 4.5.0-0.rc5.git0.1.

Comment 63 Jan Kurik 2016-02-24 15:29:51 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 24 development cycle.
Changing version to '24'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora24#Rawhide_Rebase

Comment 64 Bruno Wolff III 2016-02-27 15:15:47 UTC
Failed to allocate space for phdrs has appeared again with kernel-PAE-core-4.5.0-0.rc5.git0.2.fc24.i686 and is failing to boot. kernel-PAE-core-4.5.0-0.rc5.git0.1.fc24.i686 works fine. My systems are all on branched now. I notice there was a binutils update yesterday, but I don't know yet if kernel-PAE-core-4.5.0-0.rc4.git2.2.fc24.i686 was built with it.

Comment 65 Bruno Wolff III 2016-02-27 15:24:40 UTC
It turns out it was built  with the older version, 2.26-12.fc24. So the binutils update didn't cause the problem. The older version was also installed when the initramfs was built.
It may be that there are multiple problems and some were fixed by changes to binutils, but not all. Or there may be an interaction between binutils and something else that causes a problem.

Comment 66 Bruno Wolff III 2016-03-01 04:04:55 UTC
kernel-PAE-core-4.5.0-0.rc6.git0.1.fc24.i686 is working for me.

Comment 67 Thorsten Leemhuis 2016-03-01 10:41:44 UTC
Wonder if that might be thx to https://git.kernel.org/torvalds/c/bf70e5513dfea29c3682e7eb3dbb45f0723bac09 
FWIW, for me it didn't help afaics: my test script still failed to boot a rc6 vanilla build with a PAE configuration in Qemu-KVM (RHEL7 host). I might be able to try kernel-PAE-core-4.5.0-0.rc6.git0.1.fc24.i686 later

Comment 68 Adam Williamson 2016-03-08 02:01:10 UTC
This or something similar seems to have come back again recently. All openQA 32-bit tests are failing again. Booting today's F24 32-bit boot.iso in a VM, it fails shortly after grub with:

Failed to allocate space for phdrs. ok

 -- System halted

showing on the console. It looks like we had successful boots with Fedora-24-20160303.n.0 but not with Fedora-24-20160304.n.0 . kernel-4.5.0-0.rc6.git2.1.fc24 looks like the suspect.

One thing I notice: the openQA tests started working again right around 2016-02-21, and that's right when we got 4.5.0-0.rc5.git0.1 , with " Disable debugging options."...and they've just broken again, right after we got kernel-4.5.0-0.rc6.git2.1.fc24 , with "- Reenable debugging options." Coincidence?

I see a kernel-4.5.0-0.rc7.git0.1.fc24 got done today which disables debugging again, so I guess we'll see if the 20160308 composes suddenly boot again.

Comment 69 Felix Miata 2016-03-08 02:15:23 UTC
After last night's F24 update, rebooting into new kernel on 32bit host gx27 locks up as soon as Grub menu disappears.

Comment 70 Bruno Wolff III 2016-03-09 13:11:54 UTC
I am seeing this with 4.5.0-0.rc6.git3.1.fc24.i686 and 4.5.0-0.rc7.git0.2.fc25.i686. 4.5.0-0.rc6.git2.2.fc25.i686 works normally.

Comment 71 Bruno Wolff III 2016-03-10 02:33:58 UTC
kernel-PAE-core-4.5.0-0.rc7.git1.2.fc25.i686 also doesn't boot.

Comment 72 Bruno Wolff III 2016-03-11 04:50:41 UTC
kernel-PAE-core-4.5.0-0.rc7.git2.2.fc25.i686 also doesn't boot.

Comment 73 Bruno Wolff III 2016-03-12 01:55:31 UTC
kernel-PAE-core-4.5.0-0.rc7.git3.2.fc25.i686 works.

Comment 74 Felix Miata 2016-03-13 04:36:23 UTC
On host gx28c, kernel-PAE-core-4.5.0-0.rc7.git0.2.fc24.i686 locks up requiring power switch after displaying Probing EDD (edd=off to disable)... ok.

Comment 75 Алексей Смирнов 2016-03-13 10:43:33 UTC
The same problem (Failed to allocate space for phdrs -- System halted) in the kernel-PAE-core-4.5.0-0.rc7.git3.1.fc25.i686 tested in gnome-boxes.

Comment 76 Fabian Vogt 2016-03-14 10:39:24 UTC
FYI, the phdrs issue is caused by a new optimization of R_386_GOT32X (relaxed relocations) in ld, which needs to be disabled to make the kernel's relocation work again: https://sourceware.org/bugzilla/show_bug.cgi?id=19807

Some more info on https://bugzilla.opensuse.org/show_bug.cgi?id=970239
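To check whether an object file actually contains the relaxed relocation type, `readelf -r` lists relocations by name. For illustration only, the sketch below counts relocation types in the raw bytes of an ELF32 `SHT_REL` section. It assumes little-endian 8-byte `Elf32_Rel` entries, and the function name is hypothetical. In the i386 psABI the type is the low byte of `r_info`, with `R_386_GOT32` = 3 and `R_386_GOT32X` = 43.

```python
import struct
from collections import Counter

R_386_GOT32 = 3
R_386_GOT32X = 43  # relaxable variant of R_386_GOT32

def count_rel_types(rel_bytes):
    """Count relocation types in a raw ELF32 SHT_REL section.

    Each Elf32_Rel entry is 8 bytes: r_offset (u32) followed by r_info (u32).
    The relocation type is the low 8 bits of r_info; the symbol index is the rest.
    """
    counts = Counter()
    usable = len(rel_bytes) - len(rel_bytes) % 8  # ignore a trailing partial entry
    for pos in range(0, usable, 8):
        _r_offset, r_info = struct.unpack_from("<II", rel_bytes, pos)
        counts[r_info & 0xFF] += 1
    return counts
```

A non-zero `count_rel_types(data)[R_386_GOT32X]` for an object built with the newer assembler, and zero for the same source built with binutils 2.25, would be consistent with the explanation in this comment.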

Comment 77 Nick Clifton 2016-03-14 17:35:49 UTC
Hi Fabian,

> FYI, the phdrs issue is caused by a new optimization of R_386_GOT32X
> (relaxed relocations) in ld, which needs to be disabled to make the kernel's
> relocation work again: https://sourceware.org/bugzilla/show_bug.cgi?id=19807

Note - the patch in this PR is a kernel patch, not a binutils patch (despite
the fact that the problem is being reported on the FSF binutils bugzilla system).

Also note that at the time of writing (Mon Mar 14, 17:30 UTC) the PR is still open as the patch only appears to be effective when the current mainline development binutils sources are used, not the 2.26 release sources.  So there
may need to be both a binutils patch *and* a kernel patch created before this
issue can be resolved.

Comment 78 Adam Williamson 2016-03-23 18:04:23 UTC
So just as a quick update on this, though I don't have any super solid information...dgilmore points out that a 32-bit Cloud image build for Rawhide from yesterday succeeded:

http://koji.fedoraproject.org/koji/taskinfo?taskID=13422645

Cloud image builds involve booting and running an install, so 32-bit Cloud image builds have mostly been failing on this bug lately. It's possible this is now somehow working in Rawhide.

Rawhide is on 4.6 kernels, F24 still on 4.5. binutils seems to be even for F24 and Rawhide ATM, but the last *stable* binutils for F24 is 2.26-14.fc24 - 2.26-17.fc24 is still in updates-testing and will not be used for package builds or included in nightlies. So the two fixes listed as being for bug #1312507 will not be applied to F24 kernel builds or nightly composes yet.

If someone wants to verify this appears to be fixed in Rawhide and try to figure out why, so the fix can be applied to F24, that'd be great. The updated binutils should hit F24 normally in due course, but F24 is not going to go to the 4.6 kernel series until after GA, so if we need any kernel-side fix it would be good to isolate it for backporting to 4.5.

Comment 79 Richard W.M. Jones 2016-03-23 21:00:59 UTC
I just kicked off a scratch build of libguestfs with the
i686 test enabled, in Rawhide, and it failed in the same way:

https://kojipkgs.fedoraproject.org//work/tasks/9025/13439025/build.log
https://kojipkgs.fedoraproject.org//work/tasks/9025/13439025/root.log

Therefore I think this is not fixed.

Comment 80 Adam Williamson 2016-03-23 21:26:17 UTC
Well, and today's 32-bit cloud composes also failed. But jforbes also said he had success with a test yesterday. So it seems that for some reason, yesterday's 32-bit Rawhide may have been good, but today's not.

Comment 81 Bruno Wolff III 2016-03-24 01:03:39 UTC
Recent nodebug kernels have been booting for me. However there is a USB problem so I need to go back and grab an older kernel that worked if I want to use my headphones.

Comment 82 Ralf Corsepius 2016-03-24 04:44:46 UTC
None of the fc24-kernels boot for me. All fc23-kernels do.

Applying H.J.Lu's patch from https://bugzilla.kernel.org/attachment.cgi?id=209601 to the fc24 kernels, at least lets booting not fail immediately for me. Booting fails later on, seemingly because of i915 related problems.

The version of binutils in use (*-14 or *-17) doesn't seem to make a difference.

Comment 83 Bruno Wolff III 2016-04-07 16:09:48 UTC
Has anyone noticed a pattern to which recent kernels work and which don't? Now that my USB problem is fixed (tested on x86_64) I am waiting for a new 4.6 kernel that boots on i686 to use.

Comment 84 Justin M. Forbes 2016-04-07 16:41:51 UTC
It is not particularly relevant to this bug, but the same kernel that fixed your USB audio should boot on i686. It did on the test systems, so if it doesn't on your system please open a new bug for it.

Comment 85 Ralf Corsepius 2016-04-07 17:20:47 UTC
(In reply to Justin M. Forbes from comment #84)
> It is not particularly relevant to this bug, but the same kernel that fixed
> your USB audio should boot on i686. It did on the test systems, so if it
> doesn't on your system please open a new bug for it.

Thanks, kernel-4.5.0-302.fc24.i686+PAE (in case this is what you are referring to) boots for me, but now I am back to
https://bugzilla.redhat.com/show_bug.cgi?id=1307033

Comment 86 Bruno Wolff III 2016-04-07 19:31:46 UTC
Neither 4.6.0-0.rc2.git2.1.fc25 nor 4.6.0-0.rc2.git2.2.fc25 booted on my laptop, though I think the symptoms were different. One got the pheader message and one instantly rebooted. I'll retest over the weekend. I probably will be too tired by the time I get home tonight to futz with it. But if I am feeling fairly awake, I'll try to document which kernel had which behavior and make sure I didn't boot a different one than I thought when trying things out last night and this morning.

Comment 87 Bruno Wolff III 2016-04-08 04:40:43 UTC
kernel-PAE-core-4.6.0-0.rc2.git2.1.fc25.i686 reboots instantly and kernel-PAE-core-4.6.0-0.rc2.git2.2.fc25.i686 displays the phdr message and then hangs until the power button is used to reset the machine.

Comment 88 Bruno Wolff III 2016-04-08 05:07:15 UTC
kernel-PAE-core-4.6.0-0.rc2.git3.1.fc25.i686 hangs immediately without printing anything to the console.

Comment 89 Bruno Wolff III 2016-04-08 14:04:17 UTC
kernel-PAE-core-4.6.0-0.rc2.git3.2.fc25.i686 also hung. I removed the rhgb and quiet options to see if there was more useful output, but all I saw was a quick flash of a line starting with "probing".
Usually if it is an i915 issue, things would get a bit further before bad stuff happens. Testing gets to be pretty hard if there are multiple problems causing boot failures.

Comment 90 Adam Williamson 2016-04-11 21:59:52 UTC
Update for the openQA case: Rawhide 32-bit tests started booting again with Fedora-Rawhide-20160409.n.0 and have booted with every compose since then (so 0410.n.0 and 0411.n.0). F24 32-bit tests are still all failing to boot.

Comment 91 Bruno Wolff III 2016-04-12 03:34:05 UTC
I am using a fedora 24 base system with rawhide kernels.
kernel-PAE-core-4.6.0-0.rc3.git0.1.fc25.i686 worked for me. So now I can use my headphones and a relatively recent kernel. It still seems rather random which kernels boot and which don't.

Comment 92 Justin M. Forbes 2016-04-13 12:03:18 UTC
Right, so the current status, which has held for a while now: Rawhide 32-bit kernels are booting fine most of the time. We had one fail, but that was during the merge window, so not uncommon. F24 kernels are still failing, and it seems binutils in F24 has not been updated.

Comment 93 Felix Miata 2016-04-13 21:22:34 UTC
Last night I updated a Rawhide Athlon installation previously migrated from F24. It boots normally with the prior 4.5.0.rc3.git3.1.fc24, but locks up as soon as 4.6.0.rc2.git4.1.fc25's initrd loads. Uninstalling 4.6.0.rc2.git4.1.fc25 and repeating dnf update didn't help. I then updated the same machine's F24 installation, from which Rawhide had been derived and which was still on 4.5.0.rc3.git3.1.fc24, and found that 4.5.0-302.fc24 triggers a reboot as soon as the initrd loads.

Comment 94 Bruno Wolff III 2016-04-15 15:49:30 UTC
With kernel-PAE-core-4.6.0-0.rc3.git1.2.fc25.i686 I am getting the phdrs message again with a lock up. This is a rawhide nodebug kernel on an f24 system.

Comment 95 Justin M. Forbes 2016-04-20 14:16:02 UTC
This is an F24 bug at this point, and basically is waiting on the binutils update that has been sitting in Rawhide for a long time now. Please open a different bug against Rawhide for 4.6 32-bit issues. Adding more kernel information to a binutils bug with a known fix that hasn't been pushed for some reason is just confusing the issue.

Comment 96 Adam Williamson 2016-04-20 15:03:52 UTC
Looks like Nick just sent the binutils build for F24:

http://koji.fedoraproject.org/koji/buildinfo?buildID=756276

however, F24 is now in Beta freeze. So I'm proposing this as a Beta freeze exception, justification is that it prevents all the 32-bit images from working (on at least a lot of systems, maybe all - did we ever pin that down?); they're no longer release blocking but we *do* care and would like them to work.

Nick, can you please submit an update for F24 when the build is done and mark it as fixing this bug? Thanks!

Comment 97 Nick Clifton 2016-04-20 15:55:36 UTC
Sorry for the delay.  The patch should now be in F24:

  binutils-2.26-18.fc24

Cheers
  Nick

Comment 98 Adam Williamson 2016-04-20 16:09:39 UTC
That's great, but we do also need an update. Bodhi is in effect for f24 since alpha freeze. So there's two ways we can do this. Justin - you should be able to create a build root override now and do the kernel build. So we can do the kernel build first then create an update with both packages, or create the update with binutils now and edit the kernel in afterwards. Whatever works for you.

Comment 99 Fedora Update System 2016-04-20 23:19:05 UTC
binutils-2.26-18.fc24 kernel-4.5.2-300.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-7f37d42add

Comment 100 Kevin Fenzi 2016-04-20 23:19:29 UTC
+1 FE here.

Comment 101 Fedora Update System 2016-04-21 21:58:13 UTC
binutils-2.26-18.fc24, kernel-4.5.2-300.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-7f37d42add

Comment 102 Dennis Gilmore 2016-04-22 14:18:39 UTC
+1 FE

Comment 103 Jon Disnard 2016-04-22 14:24:10 UTC
+1 Freeze Exception

Comment 104 Adam Williamson 2016-04-22 14:41:18 UTC
that's +4 (counting me), setting acceptedFE.

Comment 105 Fedora Update System 2016-04-23 19:45:31 UTC
binutils-2.26-18.fc24 kernel-4.5.2-301.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-7f37d42add

Comment 106 Fedora Update System 2016-04-23 23:44:15 UTC
binutils-2.26-18.fc24, kernel-4.5.2-301.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report.

Comment 107 Adam Williamson 2016-04-25 05:47:48 UTC
This does indeed seem fixed in the openQA case; the net install tests failed because they got the previous day's kernel due to a timing issue in when the tests are kicked off, but the installer worked, and the server DVD test passed, so this looks good.

Comment 108 Richard W.M. Jones 2016-04-25 10:29:08 UTC
Can confirm the 32 bit kernel boots in qemu too.

Comment 109 Adam Williamson 2016-04-25 14:24:29 UTC
openQA uses qemu :)

Comment 110 Felix Miata 2016-04-25 21:07:22 UTC
Solved on 32 bit P4 host gx260 by dnf upgrading existing F24 installation to 4.5.2-301.
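
For anyone rebuilding an F24 i686 kernel locally, the binutils in the build environment needs to be at least the fixed build named in comment 97 (2.26-14 and 2.26-17 predate it). The following is only a minimal sketch of that check, assuming a simplified comparison on the leading integer of the release field; real RPM version comparison is more involved, and the helper names (parse_nvr, release_number, has_fix) are hypothetical, not part of any Fedora tool.

```python
import re

# Fixed build taken from this bug's "Fixed In Version" field.
FIXED_BINUTILS = "binutils-2.26-18.fc24"


def parse_nvr(nvr):
    """Split 'name-version-release' into (name, version, release)."""
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def release_number(release):
    """Return the leading integer of a release string such as '18.fc24'."""
    match = re.match(r"(\d+)", release)
    if match is None:
        raise ValueError("release does not start with a number: %r" % release)
    return int(match.group(1))


def has_fix(nvr, fixed=FIXED_BINUTILS):
    """True if nvr is at least the fixed build.

    Assumption: both builds share the same name and upstream version,
    so only the numeric part of the release needs comparing.
    """
    name, version, release = parse_nvr(nvr)
    fixed_name, fixed_version, fixed_release = parse_nvr(fixed)
    if (name, version) != (fixed_name, fixed_version):
        raise ValueError("can only compare builds of the same name and version")
    return release_number(release) >= release_number(fixed_release)


if __name__ == "__main__":
    for build in ("binutils-2.26-14.fc24",
                  "binutils-2.26-17.fc24",
                  "binutils-2.26-18.fc24"):
        print(build, "has fix" if has_fix(build) else "lacks fix")
```

The string to test would typically come from something like the output of rpm -q binutils inside the build root, with the architecture suffix removed first. Installed systems only need the rebuilt kernel (4.5.2-301 or later on F24), not the binutils package itself.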

