Bug 1369934

Summary: Fedora 25 cloud images built with syslinux do not boot
Product: [Fedora] Fedora
Reporter: Adam Williamson <awilliam>
Component: syslinux
Assignee: Peter Jones <pjones>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 26
CC: bugzilla, kevin, mattdm, pjones, redhat-bugzilla, robatino
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: syslinux-6.04-0.7.fc27 syslinux-6.04-0.7.fc26
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-10 05:05:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Adam Williamson 2016-08-24 19:14:29 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1369794 , as reported by lbrabec , Fedora 25 cloud images do not boot. They just hang approximately where the bootloader should be, well, booting.

We kind of hijacked 1369794 to be about a few bugs with post-install configuration which certainly were happening and were certainly messing up the images, but it seems that even with these fixed, the produced image still doesn't *boot*. So the boot failure seems to be something else.

Assigning to syslinux because it's always goddamn syslinux. (It did go from 6.03 to 6.04-pre1 since F24. F24 nightly cloud images, with syslinux-6.03-8, still boot OK.)

Comment 1 Adam Williamson 2016-08-24 20:15:21 UTC
Bit more detail here.

https://bugzilla.redhat.com/show_bug.cgi?id=1369794 really *did* cause a bootloader problem: because of the anaconda post-install setup thread crashing, the `dd if=/usr/share/syslinux/mbr.bin of=/dev/vda` in %post which works around https://bugzilla.redhat.com/show_bug.cgi?id=1147998 never got run, so the Alpha images (and all nightlies) are suffering from that bug - they will never reach syslinux at all. That's why they hang at 'Booting from hard disk...' or wherever the firmware tries to hand off to the MBR loader.

So fixing that really does get us down the road...but it seems there's another issue behind that. https://kojipkgs.fedoraproject.org//work/tasks/6654/15366654/Fedora-Cloud-Base-25kevintest4-1.x86_64.qcow2 is an image built with the kickstart workarounds for #1369794 . It still fails to boot, but it doesn't fail to boot *in the same way*. It does actually reach syslinux, now - but syslinux fails to actually boot the kernel.
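
A quick way to tell these two failure modes apart is to check whether the boot code from mbr.bin actually made it onto the disk. What follows is only a minimal sketch: mbr_installed is a made-up helper name, and it assumes a cmp that supports the -n byte-limit option (GNU and BSD cmp both do). It compares only as many bytes as mbr.bin contains, so the partition table that follows the boot code is ignored.

```shell
#!/bin/sh
# mbr_installed DISK [MBR_FILE]
# Hypothetical helper: succeeds (exit 0) when DISK starts with the same bytes
# as MBR_FILE. DISK can be a block device such as /dev/vda or a raw image file.
mbr_installed() {
    disk=$1
    mbr=${2:-/usr/share/syslinux/mbr.bin}
    # Byte count of the boot code; the arithmetic expansion strips any padding
    # that some wc implementations print.
    size=$(( $(wc -c < "$mbr") )) || return 2
    [ "$size" -gt 0 ] || return 2
    # -s: silent, -n: compare at most $size bytes
    cmp -s -n "$size" "$mbr" "$disk"
}

# Usage (reading /dev/vda normally needs root):
#   mbr_installed /dev/vda && echo "syslinux MBR present" || echo "MBR boot code missing"
```

If the analysis above is right, a disk where the %post dd never ran should fail this check, while one built with the kickstart workarounds should pass it and still fail later in syslinux.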

For me anyway, in a VM, it reaches the syslinux 'Welcome to Fedora. Automatic boot in 0 seconds. Press a key for options.' screen, then it gets stuck. If you watch the screen carefully, it actually flickers occasionally and you can very briefly see some kind of error message; it seems like syslinux is trying to load the kernel over and over and failing every time. The error message looks something like:

"Loading /boot/(kernelfilename) failed: Bad file number"

Comment 2 Adam Williamson 2016-08-24 21:58:45 UTC
This is an automatic blocker, per https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Automatic_blockers : "Complete failure of any release-blocking TC/RC image to boot at all under any circumstance - "DOA" image (conditional failure is not an automatic blocker)"

For Alpha-1.2, we are going to try working around it by switching the cloud images back to grub2:

https://pagure.io/fedora-kickstarts/c/3d4d6ddc8d65a779d977f1c94eab73389f944e2f?branch=f25

If that works, we can un-blocker it.

Comment 3 Adam Williamson 2016-08-24 22:06:01 UTC
Also as a note, I tried to reproduce this outside of the Cloud image build scenario, but couldn't yet. I thought https://www.happyassassin.net/ks/extlinux-dd.ks might do it, with these lines:

bootloader --timeout=1 --append="no_timer_check" --extlinux
%post
sfdisk -d /dev/vda > mbr.dump
sfdisk --force /dev/vda < mbr.dump
dd if=/usr/share/syslinux/mbr.bin of=/dev/vda
%end

but, no. If I run an F25 install from the Server DVD with that kickstart, it works OK.

If I change the partitioning lines to:

zerombr
clearpart --all
part / --fstype ext4 --grow

which is what's in fedora-cloud-base.ks , the installed system fails to boot, but not in the same way (syslinux fails to load because it can't find one of its files). I still can't find a way to recreate the exact way the cloud image fails.

Comment 4 Adam Williamson 2016-08-25 00:04:36 UTC
The following is sufficient to 'fix' this and make the disk boot:

mkdir /mnt/temp
mount /dev/vda1 /mnt/temp
cp -a /mnt/temp/boot /mnt/temp/boot2
mv /mnt/temp/boot /mnt/temp/bootold
mv /mnt/temp/boot2 /mnt/temp/boot
umount /mnt/temp

In other words, just cp -a the /boot directory and replace the original with the copy. After doing that, the image boots for me. (Thanks to cmurf, who found this out.)

Comment 5 Adam Williamson 2016-08-25 15:04:59 UTC
Alpha 1.2 passed autocloud:

https://apps.fedoraproject.org/autocloud/jobs/153

so the grub2 switch did work around it. Based on that, dropping the blocker status, but leaving the bug open as the bug does remain.

Comment 6 Adam Williamson 2016-08-29 22:44:07 UTC
Ady from IRC made a good catch here:

http://www.syslinux.org/wiki/index.php/Filesystem#ext

that sounds very much like it could be our issue: "As of Syslinux 6.03, "pure 64-bits", compression and/or encryption are not supported. Quoting part of the release notes of version 1.43 of e2fsprogs (May 17, 2016): 'Mke2fs will now create file systems with the metadata_csum and 64bit features enabled by default.' Users should rather (manually) disable the 64bit feature in the mke2fs command when creating / formatting a boot volume with ext4; otherwise, the bootloader (as of version 6.03) will fail."

so, I guess we can look at somehow causing the cloud image's / partition to not have the '64bit feature' enabled...
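
To confirm whether a given filesystem actually has the feature turned on, the "Filesystem features:" line printed by dumpe2fs -h (or tune2fs -l) lists it. A rough sketch of such a check follows; has_ext_feature is a hypothetical helper name, and /dev/vda1 is just the device name used elsewhere in this bug.

```shell
#!/bin/sh
# has_ext_feature FEATURE
# Hypothetical helper: reads `dumpe2fs -h` (or `tune2fs -l`) output on stdin
# and succeeds when FEATURE appears in the "Filesystem features:" line.
has_ext_feature() {
    feature=$1
    # Keep only the feature list, put one feature per line, then require an
    # exact whole-line match so that e.g. "64" does not match "64bit".
    sed -n 's/^Filesystem features:[[:space:]]*//p' \
        | tr -s ' ' '\n' \
        | grep -x -- "$feature" > /dev/null
}

# Usage (needs read access to the device):
#   dumpe2fs -h /dev/vda1 2>/dev/null | has_ext_feature 64bit \
#       && echo "64bit is enabled; syslinux 6.03 is expected to fail here"
```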

Comment 7 Adam Williamson 2016-08-29 22:44:58 UTC
oh, and yes, the e2fsprogs versioning adds up: Fedora 24 has 1.42.13 , Fedora 25 and Rawhide have 1.43.1.
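
For scripting that version check, here is a small sketch. It assumes GNU sort with -V (version sort), the helper names are invented here, and it only encodes the upstream default described in the release notes quoted in comment 6; a distribution's /etc/mke2fs.conf may set something different.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to B.
# Relies on GNU sort -V to order dotted version strings.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# mke2fs_defaults_to_64bit VERSION
# Hypothetical helper: e2fsprogs 1.43 is the release whose notes say mke2fs
# enables metadata_csum and 64bit by default.
mke2fs_defaults_to_64bit() {
    version_ge "$1" 1.43
}

# Usage:
#   mke2fs_defaults_to_64bit 1.42.13 || echo "F24: 64bit not on by default"
#   mke2fs_defaults_to_64bit 1.43.1  && echo "F25/Rawhide: 64bit on by default"
```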

Comment 8 Kevin Fenzi 2016-08-30 00:23:14 UTC
Good catch!

This is indeed it. I added to the kickstart: 

part / --fstype="ext4" --grow --mkfsoptions="-O ^64bit"

and switched back to syslinux and it boots. ;)
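
For anyone wanting to try the same option outside anaconda, here is a sketch that formats a scratch image file instead of a real disk. make_boot_image is a hypothetical helper name; it assumes e2fsprogs (mke2fs, dumpe2fs) and coreutils truncate are installed. The -O ^64bit is the same string given to --mkfsoptions above.

```shell
#!/bin/sh
# make_boot_image IMAGE [SIZE]
# Hypothetical helper: create a scratch file and format it as ext4 with the
# 64bit feature turned off.
make_boot_image() {
    img=$1
    size=${2:-64M}
    truncate -s "$size" "$img" || return 1
    # -q: quiet; -F: proceed even though the target is a regular file, not a
    # block device. '^64bit' is quoted so that no shell treats ^ specially.
    mke2fs -q -F -t ext4 -O '^64bit' "$img"
}

# Usage:
#   make_boot_image /tmp/boot-test.img
#   dumpe2fs -h /tmp/boot-test.img 2>/dev/null | grep '^Filesystem features:'
```

The feature line printed by the second command should then list the usual ext4 features without 64bit.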

Comment 9 Kevin Fenzi 2016-08-30 16:50:14 UTC
https://pagure.io/fedora-kickstarts/pull-request/56

is the PR to switch back to extlinux and set that option.

Comment 10 Fedora End Of Life 2017-02-28 10:08:53 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 26 development cycle.
Changing version to '26'.

Comment 11 Fedora Update System 2017-11-26 02:29:47 UTC
syslinux-6.04-0.7.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-2e8e9bcb0f

Comment 12 Fedora Update System 2017-11-27 04:33:27 UTC
syslinux-6.04-0.7.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-2e8e9bcb0f

Comment 13 Fedora Update System 2017-11-28 07:23:56 UTC
syslinux-6.04-0.7.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-d9fb12e2aa

Comment 14 Fedora Update System 2017-12-10 05:05:53 UTC
syslinux-6.04-0.7.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Fedora Update System 2017-12-14 09:21:28 UTC
syslinux-6.04-0.7.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.