Bug 1015931
Summary: | extlinux bootloader option doesn't install mbr bootloader code, system not bootable | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Chris Murphy <bugzilla> |
Component: | syslinux | Assignee: | Matthew Miller <mattdm> |
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 20 | CC: | adam.chance10, amulhern, bugzilla, dennis, dlehman, dustymabe, g.kaviyarasu, jonathan, joshua, mattdm, mkolman, pjones, rosti.bsd, sbueno, vanmeeuwen+fedora |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-06-29 12:34:27 UTC | Type: | Bug |
Description
Chris Murphy
2013-10-06 23:12:42 UTC
OK this is interesting. If it's a clean disk, it looks like parted is putting some code in the mbr bootloader region that enables extlinux to work on a clean disk, and the bug isn't triggered.

But if the disk already has some other bootloader in LBA0, e.g. grub legacy or grub2, then the parted code isn't written, nor does anaconda overwrite it, and the system isn't bootable.

The proper command on Fedora is:
cat /usr/share/syslinux/mbr.bin > /dev/XXX

(In reply to Chris Murphy from comment #0)
Thank you for opening this bug report, inspired by my message #51 in bug 986431. I'm copying that message here for convenience.

========
I just tried to reinstall Fedora 19 from the netinst ISO with that secret extlinux kernel parameter. I also chose in Anaconda to install the latest updates, so if there was any syslinux/extlinux update, it was installed. The installation process finished without any error. syslinux and extlinux were installed, and the /boot/extlinux directory with many files (including extlinux.conf) was created. That is the good news.

The bad news is that the MBR code was not updated, so it remained the old grub2 MBR code from my previous Fedora 19 installation (where I installed grub2 on an ext4 partition using --force). The grub2 MBR code doesn't work in this case and only starts the grub2 rescue prompt.

Updating the MBR code would be easy, I mistakenly thought, and I booted from the Fedora ISO again into a rescue mode shell. While in the rescue shell I didn't find any way to update just the MBR code. I probably didn't search hard enough, because I remember from my short Arch Linux experience that it's possible to install syslinux and also update the MBR code.
Now I see that they do it with the following command:
dd bs=440 count=1 conv=notrunc if=/usr/lib/syslinux/bios/mbr.bin of=/dev/sda

In Fedora this command needs to be changed to:
dd bs=440 count=1 conv=notrunc if=/usr/share/syslinux/bios/mbr.bin of=/dev/sda

But since I didn't find this in time, I just moved the boot flag (with fdisk) from /dev/sda2 (Fedora) to /dev/sda1 (Windows), booted from the WinXP install CD and ran fixmbr from its rescue shell. That made my computer bootable into Windows. Then I booted again from the Fedora netinst ISO into its rescue shell and moved the boot flag back from /dev/sda1 to /dev/sda2. From now on my computer boots Fedora 19 with a small Fedora emblem splash, and the boot process starts from the standard Windows version of the MBR code.

Now why did I write this long saga? Just to show the following things:
1. extlinux itself (in the ext4 primary partition) works properly
2. Anaconda creates an extlinux configuration that works (at least for booting Fedora itself)
3. The MBR is not updated during Fedora installation with the extlinux kernel parameter
4. There already is an mbr.bin file, and the dd and fdisk programs can be used to install the syslinux MBR code and set the partition boot flag on the right primary partition
5. grub2 MBR code is a joke. Its behavior is non-standard and it can't co-exist with anything but grub2.
6. extlinux can co-exist with any MBR code that knows how to load the boot sector of a primary partition:
   a) syslinux MBR code
   b) Windows MBR code
   c) FreeBSD boot0 MBR code

By the way, FreeBSD boot0 MBR code would be the best choice for dual-boot machines, because it will still be able to boot one OS after the other OS is deleted. For example, if I deleted my current Fedora, its extlinux would not work and the MBR code would not ask me to boot from another primary partition. Only boot0 would. Alt Linux even has a boot0 RPM.

Anaconda already uses extlinux for ARM by default (commit 39dfa711759b3c53ceff7457cd1518adb51a268b).
This behavior is defined in the following bootloader_by_platform dictionary in bootloader.py:

# every platform that wants a bootloader needs to be in this dict
bootloader_by_platform = {platform.X86: GRUB2,
                          platform.EFI: EFIGRUB,
                          platform.MacEFI: MacEFIGRUB,
                          platform.PPC: GRUB2,
                          platform.IPSeriesPPC: IPSeriesGRUB2,
                          platform.NewWorldPPC: MacYaboot,
                          platform.S390: ZIPL,
                          platform.ARM: EXTLINUX,
                          platform.omapARM: EXTLINUX}

So fixing this bug straight away may break the ARM platform. Maybe you need to make separate EXTLINUX and armEXTLINUX options. I'm not familiar with the ARM architecture, but it looks like it has no MBR.

(In reply to Chris Murphy from comment #1)
> OK this is interesting. If it's a clean disk, it looks like parted is
> putting some code in the mbr bootloader region that enables extlinux to work
> on a clean disk, and the bug isn't triggered.
>
> But if the disk already has some other bootloader in LBA0, e.g. grub legacy
> or grub2, then the parted code isn't written, nor does anaconda overwrite
> it, and the system isn't bootable.

From my experience this is standard behavior for any partitioning software. If the MBR is empty, it is created (or needs to be created before a new partition is created) from scratch, including its code. If the MBR already exists, the MBR code is left untouched and only the partition table is changed. The existence of an MBR can be detected by its two-byte signature at the end of the sector: 0x55, 0xAA.

In any case, one partition needs to be set as active (bootable). Otherwise extlinux will not start. parted probably sets the active flag on the first created partition automatically. But this is not mandatory. An MBR can have no partition set as active, and other partitioning software (I'm almost sure fdisk) allows it. Just consider a secondary hard drive without any OS, used for storage only. It doesn't need the active flag to be set.

fdisk inserts nothing in the first 440 bytes of LBA0.

Follows first is the parted MBR code, then follows the syslinux mbr code.
[root@localhost ~]# dd if=/dev/sda bs=440 count=1 2>/dev/null | hexdump -C
00000000  fa b8 00 10 8e d0 bc 00  b0 b8 00 00 8e d8 8e c0  |................|
00000010  fb be 00 7c bf 00 06 b9  00 02 f3 a4 ea 21 06 00  |...|.........!..|
00000020  00 be be 07 38 04 75 0b  83 c6 10 81 fe fe 07 75  |....8.u........u|
00000030  f3 eb 16 b4 02 b0 01 bb  00 7c b2 80 8a 74 01 8b  |.........|...t..|
00000040  4c 02 cd 13 ea 00 7c 00  00 eb fe 00 00 00 00 00  |L.....|.........|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001b0  00 00 00 00 00 00 00 00                           |........|
000001b8

[root@localhost ~]# hexdump -C /usr/share/syslinux/mbr.bin
00000000  33 c0 fa 8e d8 8e d0 bc  00 7c 89 e6 06 57 8e c0  |3........|...W..|
00000010  fb fc bf 00 06 b9 00 01  f3 a5 ea 1f 06 00 00 52  |...............R|
00000020  52 b4 41 bb aa 55 31 c9  30 f6 f9 cd 13 72 13 81  |R.A..U1.0....r..|
00000030  fb 55 aa 75 0d d1 e9 73  09 66 c7 06 8d 06 b4 42  |.U.u...s.f.....B|
00000040  eb 15 5a b4 08 cd 13 83  e1 3f 51 0f b6 c6 40 f7  |..Z......?Q...@.|
00000050  e1 52 50 66 31 c0 66 99  e8 66 00 e8 21 01 4d 69  |.RPf1.f..f..!.Mi|
00000060  73 73 69 6e 67 20 6f 70  65 72 61 74 69 6e 67 20  |ssing operating |
00000070  73 79 73 74 65 6d 2e 0d  0a 66 60 66 31 d2 bb 00  |system...f`f1...|
00000080  7c 66 52 66 50 06 53 6a  01 6a 10 89 e6 66 f7 36  ||fRfP.Sj.j...f.6|
00000090  f4 7b c0 e4 06 88 e1 88  c5 92 f6 36 f8 7b 88 c6  |.{.........6.{..|
000000a0  08 e1 41 b8 01 02 8a 16  fa 7b cd 13 8d 64 10 66  |..A......{...d.f|
000000b0  61 c3 e8 c4 ff be be 7d  bf be 07 b9 20 00 f3 a5  |a......}.... ...|
000000c0  c3 66 60 89 e5 bb be 07  b9 04 00 31 c0 53 51 f6  |.f`........1.SQ.|
000000d0  07 80 74 03 40 89 de 83  c3 10 e2 f3 48 74 5b 79  |..t.@.......Ht[y|
000000e0  39 59 5b 8a 47 04 3c 0f  74 06 24 7f 3c 05 75 22  |9Y[.G.<.t.$.<.u"|
000000f0  66 8b 47 08 66 8b 56 14  66 01 d0 66 21 d2 75 03  |f.G.f.V.f..f!.u.|
00000100  66 89 c2 e8 ac ff 72 03  e8 b6 ff 66 8b 46 1c e8  |f.....r....f.F..|
00000110  a0 ff 83 c3 10 e2 cc 66  61 c3 e8 62 00 4d 75 6c  |.......fa..b.Mul|
00000120  74 69 70 6c 65 20 61 63  74 69 76 65 20 70 61 72  |tiple active par|
00000130  74 69 74 69 6f 6e 73 2e  0d 0a 66 8b 44 08 66 03  |titions...f.D.f.|
00000140  46 1c 66 89 44 08 e8 30  ff 72 13 81 3e fe 7d 55  |F.f.D..0.r..>.}U|
00000150  aa 0f 85 06 ff bc fa 7b  5a 5f 07 fa ff e4 e8 1e  |.......{Z_......|
00000160  00 4f 70 65 72 61 74 69  6e 67 20 73 79 73 74 65  |.Operating syste|
00000170  6d 20 6c 6f 61 64 20 65  72 72 6f 72 2e 0d 0a 5e  |m load error...^|
00000180  ac b4 0e 8a 3e 62 04 b3  07 cd 10 3c 0a 75 f1 cd  |....>b.....<.u..|
00000190  18 f4 eb fd 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001b0  00 00 00 00 00 00 00 00                           |........|
000001b8

(In reply to Chris Murphy from comment #5)
> fdisk inserts nothing in the first 440 bytes of LBA0.
>
> Follows first is the parted MBR code, then follows the syslinux mbr code.
>
> [root@localhost ~]# dd if=/dev/sda bs=440 count=1 2>/dev/null | hexdump -C
> 00000000  fa b8 00 10 8e d0 bc 00  b0 b8 00 00 8e d8 8e c0  |................|
> 00000010  fb be 00 7c bf 00 06 b9  00 02 f3 a4 ea 21 06 00  |...|.........!..|
> 00000020  00 be be 07 38 04 75 0b  83 c6 10 81 fe fe 07 75  |....8.u........u|
> 00000030  f3 eb 16 b4 02 b0 01 bb  00 7c b2 80 8a 74 01 8b  |.........|...t..|
> 00000040  4c 02 cd 13 ea 00 7c 00  00 eb fe 00 00 00 00 00  |L.....|.........|
> 00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 000001b0  00 00 00 00 00 00 00 00                           |........|
> 000001b8

According to your hexdump fdisk does insert some sensible but small code into beginning of the LBA0 (the MBR). This is the code after disassembling:

00000000: FA            cli
00000001: B80010        mov   ax,01000
00000004: 8ED0          mov   ss,ax
00000006: BC00B0        mov   sp,0B000
00000009: B80000        mov   ax,00000
0000000C: 8ED8          mov   ds,ax
0000000E: 8EC0          mov   es,ax
00000010: FB            sti
00000011: BE007C        mov   si,07C00       ; 0000h:7C00h
00000014: BF0006        mov   di,00600
00000017: B90002        mov   cx,00200
0000001A: F3A4          repe  movsb
0000001C: EA21060000    jmp   00000:00621
00000021: BEBE07        mov   si,007BE
00000024: 3804          cmp   [si],al
00000026: 750B          jne   000000033
00000028: 83C610        add   si,010
0000002B: 81FEFE07      cmp   si,007FE
0000002F: 75F3          jne   000000024
00000031: EB16          jmps  000000049
00000033: B402          mov   ah,002
00000035: B001          mov   al,001
00000037: BB007C        mov   bx,07C00
0000003A: B280          mov   dl,080
0000003C: 8A7401        mov   dh,[si][00001]
0000003F: 8B4C02        mov   cx,[si][00002]
00000042: CD13          int   013
00000044: EA007C0000    jmp   00000:07C00
00000049: EBFE          jmps  000000049

0000h:7C00h is the real mode address BIOS loads the MBR into. As you can see this code copies itself from 0000h:7C00h to 0000h:0600h and then jumps to the new copy right after the JMP instruction and continues execution. Then there is some loop between 0021 and 002F. And so on.

440 is the maximum code size. The actual code might be smaller.

You're mistaken.
I said ***Follows first is the parted MBR code***

I did not post the code fdisk puts into LBA0 because there is none. A new VDI starts with all zeros in LBA 0, and the first 440 bytes are still all zeros after fdisk has partitioned the disk.

(In reply to Chris Murphy from comment #7)
> A new VDI starts
> with all zeros in LBA 0, and the first 440 bytes are still all zeros after
> fdisk has partitioned the disk.

This is strange. It means fdisk can make an MBR without any boot code. Then if the BIOS tries to boot from such a disk, it gets stuck.

(In reply to Chris Murphy from comment #7)
Does your MBR without any boot code, created by Linux fdisk, have the two-byte signature at the end of the sector (0x55, 0xAA)?

Yes of course otherwise the MBR is invalid.

Arguably there is no perfectly good solution for this bug:

- If we always step on the existing bootstrap code without informing the user, it may be wiping out their preferred stage1 bootloader.
- If we never wipe it out, then there's a decent chance the system isn't bootable if the bootstrap code is grub legacy stage1 or grub2 boot.img.
- If we ask the user, we necessarily increase the complexity of the UI/UX when offering extlinux, because 9 times out of 10 (or more) the user has insufficient information to answer the question.

I think the best policy is to wipe it out in favor of syslinux/mbr.bin because it's the least worst option, that still gets them a bootable system. In case they're using something like Windows or BSD, the behavior ends up the same even though the code is different: jump to the start LBA for the partition with an active bit set.

(In reply to Chris Murphy from comment #10)
> Yes of course otherwise the MBR is invalid.

Yes, you're right. I got the same result when I tested fdisk with a file instead of a real disk.
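The two checks discussed above (is the 0x55, 0xAA signature present, and are the first 440 bytes all zeros) can be sketched in a few lines of Python. This is only an illustration: the function name inspect_lba0 and the returned dictionary keys are made up here, and the input is assumed to be the raw 512 bytes of LBA 0 read from a disk or an image file.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 440            # bootstrap area discussed in this thread
SIGNATURE_OFFSET = 510          # the signature is the last two bytes of the sector
MBR_SIGNATURE = b"\x55\xaa"


def inspect_lba0(sector: bytes) -> dict:
    """Report whether LBA 0 carries the MBR signature and any bootstrap code.

    Hypothetical helper for illustration; not part of anaconda, parted or syslinux.
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need the full 512 bytes of LBA 0")
    has_signature = sector[SIGNATURE_OFFSET:SECTOR_SIZE] == MBR_SIGNATURE
    # any() is True as soon as one byte in the bootstrap area is non-zero.
    has_boot_code = any(sector[:BOOT_CODE_SIZE])
    return {"has_signature": has_signature, "has_boot_code": has_boot_code}


# A sector like the one fdisk leaves on a blank image: no code, valid signature.
fdisk_like = bytes(510) + MBR_SIGNATURE
print(inspect_lba0(fdisk_like))
```

A sector that reports a signature but no boot code matches the fdisk case described above; one that reports both is the case where stale code (for example from GRUB) may be left behind.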
> Arguably there is no perfectly good solution for this bug:
>
> - If we always step on the existing bootstrap code without informing the
> user, it may be wiping out their preferred stage1 bootloader.
> - If we never wipe it out, then there's a decent chance the system isn't
> bootable if the bootstrap code is grub legacy stage1 or grub2 boot.img.
> - If we ask the user, we necessarily increase the complexity of the UI/UX
> when offering extlinux, because 9 times out of 10 (or more) the user has
> insufficient information to answer the question.
>
> I think the best policy is to wipe it out in favor of syslinux/mbr.bin
> because it's the least worst option, that still gets them a bootable system.
> In case they're using something like Windows or BSD, the behavior ends up
> the same even though the code is different: jump to the start LBA for the
> partition with an active bit set.

I agree with this policy to always install syslinux/mbr.bin. But there is another issue. If Fedora is installed alongside another OS, that other OS should still be bootable from extlinux. This is how GRUB2 is installed, by the way: if there is also a Windows primary partition with the OS, it adds it to the boot menu. The quick and simple solution for extlinux would be adding the following configuration (in case /dev/sda1 is NTFS and had the active bit set before Fedora installation):

LABEL Windows
    kernel chain.c32
    append hd0 1

But if the user then reinstalls Fedora, this configuration will be lost and not recreated, because /dev/sda1 will not have the active bit set. FreeBSD boot0 (instead of syslinux/mbr.bin) would be the best solution for this issue as well. But I feel it's not welcome in Fedora, is it?

Just in case somebody wants to try the boot0 MBR code, this is the boot0 SRPM from Alt Linux: http://packages.altlinux.org/en/Sisyphus/srpms/boot0

Patches are welcome here. The secret parameter is focused on a use case which inherently means a "clean" disk.
This is why when faced with "Arguably there is no perfectly good solution for this bug" as Chris said above, I decided on the leave-it-alone option. If someone has a better option which works and doesn't cause terrible side effects, I'll be happy to help you in working with the Anaconda team to get it in.

Another option would be to document that if the secret option is used, and you have another bootloader in there, you should prepare the MBR first.

(In reply to Matthew Miller from comment #13)
I think this is a reasonable enough explanation to consider this NOTABUG.

Further, for extlinux to be presented along with GRUB2 in the anaconda UI by default (i.e. without requiring the use of the hidden extlinux kernel parameter to reveal), necessitates first the creation and testing of a coherent advanced bootloader UI, following progressive disclosure UI best practices. GRUB2 remains the default, not requiring the user to use the proposed advanced bootloader UI; anyone wanting to use extlinux would use the advanced UI. And it could further reveal the two extlinux MBR behaviors: "leave-it-alone" vs "overwrite" and also work with docs teams so that there's some basic extlinux documentation to fill in the gaps where the UI simply can't hold it.

This is probably better off as a different bug, as an RFE against Rawhide, that includes a detailed proposal and UI mockup. It'd also presumably need buyoff from the anaconda team.

However, I think it's a lot easier to just document the existing extlinux hidden flag, and not bother with creating and maintaining an advanced bootloader options UI.

cc'ing dlehman and bcl in case they have 2 cents to add to this.

We're not going to add a bootloader UI selection. extlinux is going to stay a cmdline option, possibly with more documentation, but we're not going to confuse users by adding it to the UI.
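For reference, the two MBR behaviors named above ("leave-it-alone" vs "overwrite") can be sketched in Python as a rough equivalent of the dd bs=440 count=1 conv=notrunc command quoted earlier in this report. This is a minimal sketch under stated assumptions, not what anaconda or extlinux does: write_boot_code, its overwrite parameter, and demo are hypothetical names, and the demo only touches a throwaway 512-byte image file, never a real disk.

```python
import os
import tempfile

BOOT_CODE_SIZE = 440  # size of the bootstrap area at the start of LBA 0


def write_boot_code(mbr_bin_path, disk_path, overwrite=True):
    """Copy up to 440 bytes of bootstrap code to the start of disk_path.

    Returns True if bytes were written. With overwrite=False, existing
    non-zero bootstrap code is left alone (the "leave-it-alone" behavior).
    Hypothetical helper for illustration only.
    """
    with open(mbr_bin_path, "rb") as src:
        code = src.read(BOOT_CODE_SIZE)
    # "r+b" opens for update without truncating, similar to dd's conv=notrunc,
    # so the disk ID, partition table and signature after byte 440 are untouched.
    with open(disk_path, "r+b") as disk:
        existing = disk.read(BOOT_CODE_SIZE)
        if not overwrite and any(existing):
            return False
        disk.seek(0)
        disk.write(code)
    return True


def demo():
    """Exercise write_boot_code on a temporary 512-byte image."""
    stale = b"\xeb" * BOOT_CODE_SIZE                    # stand-in for old bootloader code
    tail = b"\x11" * 70 + b"\x55\xaa"                   # stand-in for table and signature
    new_code = b"\x33\xc0" + bytes(BOOT_CODE_SIZE - 2)  # stand-in for mbr.bin contents
    with tempfile.TemporaryDirectory() as tmp:
        disk = os.path.join(tmp, "disk.img")
        mbr = os.path.join(tmp, "mbr.bin")
        with open(disk, "wb") as f:
            f.write(stale + tail)
        with open(mbr, "wb") as f:
            f.write(new_code)
        left_alone = write_boot_code(mbr, disk, overwrite=False)
        written = write_boot_code(mbr, disk, overwrite=True)
        with open(disk, "rb") as f:
            result = f.read()
    return (left_alone, written,
            result[:BOOT_CODE_SIZE] == new_code,
            result[BOOT_CODE_SIZE:] == tail)


print(demo())
```

On a real system the source would be a file such as /usr/share/syslinux/mbr.bin and the target a whole-disk device, which requires root and is destructive if pointed at the wrong device; try it on an image file first.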
In comment 24 of bug 986431 David Lehman wrote: "I've added code to anaconda-20.6 that should issue a warning in the case of a too-small MBR gap when /boot is on a normal partition".

I think extlinux and syslinux/mbr.bin should be installed automatically instead of GRUB2 in case of that warning. This would be the best compromise. The GRUB2 installation updates the MBR code as well, so installing syslinux/mbr.bin will not be anything new from the user experience point of view. On a dual-boot machine the user will be able to do one of the following:
1. Add the other OS to /boot/extlinux/extlinux.conf manually
2. Move the active flag (with Fedora fdisk or a similar program) to the other OS partition and add Fedora to that OS's boot loader (for example ntldr/boot.ini on Windows)

I consider this the minimum necessary first stage of the bug fix.

The second stage of the bug fix would be for Anaconda to add the other OS to extlinux.conf automatically. With GRUB2 on a dual-boot machine, something adds the other OS option to the grub2.cfg configuration file. Is it done by Anaconda or by GRUB2 itself? If it is Anaconda, that code might be used as a basis for the second stage of the bug fix.

There will be no need to make any UI change. You may, and probably would like to, keep support for the "secret" extlinux kernel parameter.

No comment on my proposal in the previous message in more than two months. Has any decision been taken? Is there any progress in resolving this bug?

This probably should be bumped to Rawhide and the summary changed to be more clear, e.g. extlinux bootloader option doesn't install mbr.bin if MBR already contains code, resulting in unbootable system.

Understand that anaconda calls other tools for most things: it uses pyparted, which in turn uses libparted, for partitioning, and extlinux --install for installing the extlinux bootloader, which doesn't ever write anything to LBA 0.
I think this is what's happening: If LBA 0 is blank, parted puts in a tiny bit of jump code that actually will enable an extlinux OS install to boot. But if LBA 0 contains code, parted won't overwrite it. Therefore stale bootstrap code is in the MBR, and if it's GRUB code, the system won't boot from a successful extlinux OS install.

So what's the right thing to do? I think this isn't exactly obvious because in effect there are two meanings for installing extlinux. One is extlinux --install which doesn't install mbr.bin. Another meaning is extlinux --install and also write mbr.bin. Which one does the UI imply? Stepping on the code in LBA 0? Or only doing what extlinux --install itself does? What do the syslinux/extlinux devs think of extlinux --install always writing mbr.bin to LBA 0, and why doesn't that command do that already?

(In reply to Chris Murphy from comment #18)
> So what's the right thing to do? I think this isn't exactly obvious because
> in effect there are two meanings for installing extlinux. One is extlinux
> --install which doesn't install mbr.bin. Another meaning is extlinux
> --install and also write mbr.bin. Which one does the UI imply? Stepping on
> the code in LBA 0? Or only doing what extlinux --install itself does? What
> do the syslinux/extlinux devs think of extlinux --install always writing
> mbr.bin to LBA 0, and why doesn't that command do that already?

GRUB2 installation always rewrites the boot code in the MBR, so the extlinux installation should do the same. If 'extlinux --install' doesn't install mbr.bin, Anaconda needs to do it in addition to that command. This is what I proposed in comment 16.

Or also what I proposed in the original bug description?

This syslinux wiki page describes using dd instead of cat to more reliably write this bootstrap code to LBA 0:
http://www.syslinux.org/wiki/index.php/Mbr

It also describes the different mbr.bin files available, so I'd look that over and see which one is the most general purpose.
And then check git anaconda/pyanaconda/bootloader.py for the extlinux install stuff, which starts at 2208, and see how to add the recommended dd command, and post a patch here.

(In reply to Chris Murphy from comment #20)
> Or also what I proposed in the original bug description?

I do not claim to be the first, and this is not important to me. What is important is making Fedora always bootable straight after the installation. In comment 16 I mentioned a message from David Lehman. According to that message everything is ready to do it. Somebody just needs to do it, and it looks quite simple. There have been too many discussions of this issue in too many bug reports.

reassigning to syslinux

This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is no longer maintained, which means that it will not receive any further security or bug fix updates.
As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Still appears to affect Fedora 24

It probably needs to be taken upstream. And I can bet dollars to donuts they will not implement such an option by default because one of the main points of extlinux is that it does not step on the MBR boot code. So they might come up with an option for 'extlinux --install' like maybe --mbr, --gpt, --hybrid to copy the correct code to the MBR.

parted does include some basic code that will jump to the VBR for the partition that has an active bit set in the MBR, functionally the same as /usr/share/syslinux/mbr.bin. But parted only includes that code in the MBR if LBA 0 is already zeros. If it's populated, it refuses to step on what's there.
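The "jump to the VBR of the active partition" step can be mirrored in Python for inspecting a disk image. Like the loop in the disassembly posted earlier in this report, the sketch below walks the four 16-byte partition entries starting at offset 0x1BE and stops at the first one whose status byte is non-zero. The function name find_active_partition is made up for illustration, and it reports the entry's start LBA field (offset 8 in the standard MBR entry layout) where the parted stub reads the CHS fields instead.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 0x1BE   # first partition entry in LBA 0
ENTRY_SIZE = 16
ENTRY_COUNT = 4


def find_active_partition(sector: bytes):
    """Return (index, start_lba) of the first entry with a non-zero status byte.

    Returns None when no entry is flagged, which is the case where the
    stub code has nothing to jump to. Hypothetical helper for illustration.
    """
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need the full 512 bytes of LBA 0")
    for index in range(ENTRY_COUNT):
        offset = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        status = sector[offset]
        if status != 0:
            # Start LBA is a little-endian 32-bit value at byte 8 of the entry.
            (start_lba,) = struct.unpack_from("<I", sector, offset + 8)
            return index, start_lba
    return None


# Build a sample sector: second entry active (0x80), starting at LBA 2048.
sample = bytearray(SECTOR_SIZE)
entry = PARTITION_TABLE_OFFSET + 1 * ENTRY_SIZE
sample[entry] = 0x80
struct.pack_into("<I", sample, entry + 8, 2048)
sample[510:512] = b"\x55\xaa"
print(find_active_partition(bytes(sample)))
```

If this returns None for a disk that was set up with the extlinux option, no partition carries the active flag, and neither the parted stub nor syslinux mbr.bin would find a partition to boot.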