Bug 1393846

Summary: no Fedora boot menu in Mac OS X dual boot install
Product: [Fedora] Fedora
Reporter: Kamil Páral <kparal>
Component: python-blivet
Assignee: Blivet Maintenance Team <blivet-maint-list>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 25
CC: abdel.g.martinez.l, awilliam, blivet-maint-list, bugzilla, jan.public, jones.peter.busi, mcatanzaro+wrong-account-do-not-cc, mjg59, pbrobinson, pjones, pschindl, randy, robatino, sgallagh
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: AcceptedBlocker
Fixed In Version: python-blivet-2.1.6-4.fc25
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-18 08:24:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1277289
Attachments:
Description            Flags
anaconda.log           none
journal.log            none
program.log            none
storage.log            none
files present on sda1  none
files present on sda4  none

Description Kamil Páral 2016-11-10 12:50:11 UTC
Description of problem:
I tried a dual boot install of OS X 10.7.2 (warning, quite old) and Fedora 25. The installation went fine until the point of installing the bootloader, then anaconda complained that it didn't go well and asked me whether to continue with the installation or quit. I chose to proceed. The final system still boots to OS X, and there's no menu item anywhere to boot into Fedora. The Fedora installation itself looks OK, if I look at it from the LiveCD.

From anaconda logs, the first error seemed to have occurred in mactel-boot-setup:

13:13:55,603 INFO program: Running... efibootmgr -c -w -L Fedora -d /dev/sda -p 2 -l \EFI\fedora\shim.efi
13:13:55,750 INFO program: BootCurrent: 0000
13:13:55,750 INFO program: Timeout: 5 seconds
13:13:55,750 INFO program: BootOrder: 0000,0080
13:13:55,750 INFO program: Boot0080*
13:13:55,750 INFO program: BootFFFF*
13:13:55,750 INFO program: Boot0000* Fedora
13:13:55,750 DEBUG program: Return code: 0
13:13:55,751 INFO program: Running... /usr/libexec/mactel-boot-setup
13:13:55,811 DEBUG program: Return code: 1
13:13:55,829 INFO program: Running... grub2-set-default 0
13:13:56,129 INFO program: /usr/bin/grub2-editenv: error: cannot rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No such file or directory.
13:13:56,130 INFO program: /usr/bin/grub2-editenv: error: cannot rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No such file or directory.
13:13:56,130 INFO program: /usr/bin/grub2-editenv: error: cannot rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No such file or directory.
13:13:56,131 DEBUG program: Return code: 0
13:13:56,131 INFO program: Running... grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
13:13:57,654 INFO program: /usr/bin/grub2-editenv: error: cannot rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No such file or directory.
13:13:57,655 INFO program: /sbin/grub2-mkconfig: line 244: /boot/efi/EFI/fedora/grub.cfg.new: No such file or directory
13:13:57,656 DEBUG program: Return code: 1

And due to that, it seems some necessary files were not created/moved.
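
For anyone triaging a similar program.log, the failing commands can be pulled out with a few lines of Python. This is only a sketch: the regular expressions assume the "Running..." and "Return code:" line shapes seen in the excerpt above, and failed_commands is a made-up helper name, not part of anaconda.

```python
import re

# Line shapes assumed from the excerpt above, not from any anaconda spec.
RUNNING = re.compile(r"^\S+ INFO program: Running\.\.\. (?P<cmd>.+)$")
RETURN_CODE = re.compile(r"^\S+ DEBUG program: Return code: (?P<rc>-?\d+)$")


def failed_commands(lines):
    """Return (command, return_code) pairs for commands that exited non-zero."""
    failures = []
    current = None  # most recent "Running..." command without a return code yet
    for line in lines:
        line = line.rstrip("\n")
        started = RUNNING.match(line)
        if started:
            current = started.group("cmd")
            continue
        finished = RETURN_CODE.match(line)
        if finished and current is not None:
            rc = int(finished.group("rc"))
            if rc != 0:
                failures.append((current, rc))
            current = None
    return failures


# A shortened copy of the log lines quoted above.
SAMPLE = """\
13:13:55,751 INFO program: Running... /usr/libexec/mactel-boot-setup
13:13:55,811 DEBUG program: Return code: 1
13:13:55,829 INFO program: Running... grub2-set-default 0
13:13:56,129 INFO program: /usr/bin/grub2-editenv: error: cannot rename the file /boot/grub2/grubenv.new to /boot/grub2/grubenv: No such file or directory.
13:13:56,131 DEBUG program: Return code: 0
13:13:56,131 INFO program: Running... grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
13:13:57,655 INFO program: /sbin/grub2-mkconfig: line 244: /boot/efi/EFI/fedora/grub.cfg.new: No such file or directory
13:13:57,656 DEBUG program: Return code: 1
"""

if __name__ == "__main__":
    for cmd, rc in failed_commands(SAMPLE.splitlines()):
        print(f"rc={rc}: {cmd}")
```

Note that grub2-set-default returned 0 in the log even though grub2-editenv printed errors, so a return-code scan like this can under-report; the error text still needs reading.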


Version-Release number of selected component (if applicable):
anaconda-25.20.8-1.fc25.x86_64
efibootmgr-14-3.fc25.x86_64
efivar-libs-30-4.fc25.x86_64
grub2-2.02-0.34.fc24.x86_64
grub2-efi-2.02-0.34.fc24.x86_64
grub2-tools-2.02-0.34.fc24.x86_64
mactel-boot-0.9-13.fc24.x86_64

How reproducible:
tried once

Steps to Reproduce:
1. follow https://fedoraproject.org/wiki/QA:Testcase_dualboot_with_OSX

Comment 1 Kamil Páral 2016-11-10 12:51:48 UTC
Created attachment 1219380 [details]
anaconda.log

Comment 2 Kamil Páral 2016-11-10 12:51:56 UTC
Created attachment 1219381 [details]
journal.log

Comment 3 Kamil Páral 2016-11-10 12:51:58 UTC
Created attachment 1219382 [details]
program.log

Comment 4 Kamil Páral 2016-11-10 12:52:01 UTC
Created attachment 1219383 [details]
storage.log

Comment 5 Kamil Páral 2016-11-10 12:58:15 UTC
Here's the current disk layout. EFI files seem to be present on both sda1 (created by OS X) and sda4 (created by anaconda):

NAME            KNAME MAJ:MIN FSTYPE      LABEL        UUID                                   PARTTYPE                             PARTLABEL            PARTUUID                               SIZE
sda             sda     8:0                                                                                                                                                                  465.8G
├─sda4          sda4    8:4   ext4                     49e3c90e-1b3a-4683-b524-f5c1fe4799e5   0fc63daf-8483-4772-8e79-3d69d8477de4                      02929385-0ba4-4ef5-9cdc-346d57644266     1G
├─sda2          sda2    8:2   hfsplus     Macintosh HD 22d105d1-8ca7-3db7-9344-c049f6395d31   48465300-0000-11aa-aa11-00306543ecac Linux HFS+ ESP       00002421-6ff9-0000-8653-0000d3570000 201.6G
├─sda5          sda5    8:5   LVM2_member              oNYsn8-I6rz-UFKi-nugM-Kh93-gF7S-yGFS8B e6d6d379-f507-44c2-a23c-238f2a3df928                      c6adf901-a85f-4fc2-bdeb-d7403e9ab38f 262.4G
│ ├─fedora-home dm-1  253:1   ext4                     068c289e-bb9a-47f1-a39b-f642fae45069                                                                                                  208.6G
│ ├─fedora-root dm-2  253:2   ext4                     3b3727d1-f827-4171-b1bf-789a500caa2d                                                                                                     50G
│ └─fedora-swap dm-0  253:0   swap                     75189c2d-6785-45b1-aff1-1b6cb1348c4e                                                                                                    3.9G
├─sda3          sda3    8:3   hfsplus     Recovery HD  2184e460-197b-373b-8a3f-082bc91e7981   426f6f74-0000-11aa-aa11-00306543ecac Recovery HD          cb168b2b-fe91-4a39-a2d3-56ab52c63ad5 619.9M
└─sda1          sda1    8:1   vfat        EFI          2860-11F4                              c12a7328-f81f-11d2-ba4b-00a0c93ec93b EFI system partition 0000758e-50ea-0000-cd18-00006d180000   200M
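
To make the PARTTYPE column easier to read, here is a small lookup sketch. The dictionary covers only the five GPT type GUIDs that appear in the table, with the names commonly used for them; describe_parttype is a hypothetical helper, not something from lsblk or blivet.

```python
# GPT partition type GUIDs seen in the lsblk output above, with their usual names.
GPT_TYPE_NAMES = {
    "c12a7328-f81f-11d2-ba4b-00a0c93ec93b": "EFI System Partition",
    "48465300-0000-11aa-aa11-00306543ecac": "Apple HFS/HFS+",
    "426f6f74-0000-11aa-aa11-00306543ecac": "Apple Boot (Recovery HD)",
    "0fc63daf-8483-4772-8e79-3d69d8477de4": "Linux filesystem data",
    "e6d6d379-f507-44c2-a23c-238f2a3df928": "Linux LVM",
}


def describe_parttype(guid):
    """Map a GPT type GUID (any letter case) to a readable name."""
    return GPT_TYPE_NAMES.get(guid.lower(), "unknown")


# (device, PARTTYPE) pairs copied from the table above.
LAYOUT = [
    ("sda1", "c12a7328-f81f-11d2-ba4b-00a0c93ec93b"),
    ("sda2", "48465300-0000-11aa-aa11-00306543ecac"),
    ("sda3", "426f6f74-0000-11aa-aa11-00306543ecac"),
    ("sda4", "0fc63daf-8483-4772-8e79-3d69d8477de4"),
    ("sda5", "e6d6d379-f507-44c2-a23c-238f2a3df928"),
]

if __name__ == "__main__":
    for device, guid in LAYOUT:
        print(f"{device}: {describe_parttype(guid)}")
```

Read this way, sda2 is an ordinary Apple HFS+ partition (the 201.6G "Macintosh HD" volume) that nevertheless carries the PARTLABEL "Linux HFS+ ESP".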

Comment 6 Kamil Páral 2016-11-10 12:58:46 UTC
Created attachment 1219384 [details]
files present on sda1

Comment 7 Kamil Páral 2016-11-10 12:58:54 UTC
Created attachment 1219385 [details]
files present on sda4

Comment 8 Kamil Páral 2016-11-10 13:01:08 UTC
Proposing as a blocker:
"The installer must be able to install into free space alongside an existing OS X installation, install and configure a bootloader that will boot Fedora."
https://fedoraproject.org/wiki/Fedora_25_Final_Release_Criteria#OS_X_dual_boot

It would be nice if somebody could try this with a more recent version of OS X.

Comment 9 Adam Williamson 2016-11-10 15:39:28 UTC
Chris, any chance you can try this?

Comment 10 Stephen Gallagher 2016-11-10 15:59:30 UTC
I'd be inclined to vote not a blocker as long as the OSX boot isn't destroyed. It's unfortunate, certainly. But it's not wrecking their existing system.

Comment 11 Adam Williamson 2016-11-10 16:04:12 UTC
I'm only willing to go with that if we either agree to remove the release criterion or get further testing that indicates this is system-specific.

Comment 12 Stephen Gallagher 2016-11-10 16:05:58 UTC
I don't really like that we have a specific criterion for one set of hardware that is actively hostile to Linux, honestly.

Comment 13 Kamil Páral 2016-11-10 16:37:36 UTC
I forgot to mention it was a Mac Mini I tested with. Fairly old, as you can tell by the OS X version (bought during 2011-2012, most probably).

Comment 14 Adam Williamson 2016-11-10 17:12:37 UTC
https://lists.fedoraproject.org/pipermail/test/2014-August/122496.html was the proposal and discussion for the criterion.

Comment 15 Chris Murphy 2016-11-10 18:22:04 UTC
I can't test this, I'm traveling without a Mac. From the attached log:


13:12:57,814 INFO program: rsync: failed to set times on "/mnt/sysimage/boot/efi": Read-only file system (30)

For whatever reason, the file system is being mounted read-only. Also, there is no mkfs.hfsplus call in the log, which means the installer is using a pre-existing hfsplus volume. Can you do something like 'dmesg | grep hfs' so we can see whether it's finding a journal? The kernel doesn't have hfsplus journal support, so by default it will mount hfsplus volumes that have a journal read-only. For a very long time now, Mac OS has only created hfsplus volumes with journals. The installer's mkfs command is supposed to create one without a journal, so that the default mount is read-write.
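
As a complement to the dmesg check, the mount table can be scanned for hfsplus volumes that ended up read-only. A minimal sketch, assuming the usual /proc/mounts field order (device, mount point, type, options); the sample line is made up for illustration and is not taken from the attached logs, and readonly_hfsplus_mounts is a hypothetical helper name.

```python
def readonly_hfsplus_mounts(mounts_text):
    """Return (device, mountpoint) pairs for hfsplus mounts carrying the 'ro' option."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype == "hfsplus" and "ro" in options.split(","):
            found.append((device, mountpoint))
    return found


# Hypothetical sample in /proc/mounts format, NOT copied from the bug's logs.
SAMPLE_MOUNTS = """\
/dev/mapper/fedora-root /mnt/sysimage ext4 rw,relatime 0 0
/dev/sda2 /mnt/sysimage/boot/efi hfsplus ro,relatime 0 0
"""

if __name__ == "__main__":
    # On a live system, the text of /proc/mounts could be passed in instead.
    for device, mountpoint in readonly_hfsplus_mounts(SAMPLE_MOUNTS):
        print(f"{device} is mounted read-only at {mountpoint}")
```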

I'd say it's not entirely clear it's a blocker because the requirement is that we're installing into free space, which could be interpreted as meaning reuse of an existing hfsplus ESP isn't supported - even though that's what the installer does by default.

"The installer must be able to install into free space alongside an existing OS X installation, install and configure a bootloader that will boot Fedora."

Comment 16 Chris Murphy 2016-11-10 18:29:38 UTC
(In reply to Stephen Gallagher from comment #12)
> I don't really like that we have specific criterion for one set of hardware
> that is actively hostile to Linux, honestly.

This logic doesn't work for me. Everything is actively hostile to everything else. Fedora is actively hostile to even its own installations, e.g. bug 825236. So if this is going to be a metric for wiping away the Mac criteria, fine, then wipe away the Windows criteria too; and continue to put our head in the sand when the Fedora installer, without asking or informing the user, obliterates the ability to boot previously installed Linux installations.

Comment 17 Adam Williamson 2016-11-10 18:36:15 UTC
So I *think* I know what the actual bug is here. It was introduced by this commit, and specifically relates to the highlighted line:

https://github.com/rhinstaller/blivet/commit/368a4db6141c7fdcb31ed45fe6be207ccc08ad30#diff-c0cef2bf2f989e2f94b5d1cca9c8115eL1112

That commit changed how we do format type detection. In the previous code it was in one giant handleUdevDeviceFormat() function, whose conditions for marking a device as 'macefi' were:

        elif format_type == "hfsplus":
            if isinstance(device, PartitionDevice):
                macefi = formats.getFormat("macefi")
                if macefi.minSize <= device.size <= macefi.maxSize and \
                   device.partedPartition.name == macefi.name:
                    format_designator = "macefi"

that is, if it's got an HFS+ filesystem and it's bigger than 50MiB (the macefi format has no maxSize) and the partition's GPT 'name' is "Linux HFS+ ESP" (that's the MacEFIFS format's 'name'), then we decide it's a macefi device.

In the new code, we use the 'populator helpers' approach. Code is:

https://github.com/rhinstaller/blivet/blob/2.1-devel/blivet/populator/helpers/boot.py#L53-L55

MacEFIFormatPopulator inherits from BootFormatPopulator, and specifies a _type_specifier and _base_type_specifier for the parent classmethod match(), which is what actually decides if a format populator 'matches' and hence we decide the device is of that format. Note the match() logic:

        return (udev.device_get_format(data) == cls._base_type_specifier and
                isinstance(device, PartitionDevice) and
                (device.bootable or not cls._bootable) and
                fmt.min_size <= device.size <= fmt.max_size)

and note that the MacEFIFormatPopulator class does *not* set _bootable to True. So that means we've still got the 'is it HFS+' and 'is it in the valid size range' conditions, but that's all. Importantly, we've lost the condition about the GPT 'name' of the partition from the old code.

If I'm right, this means blivet will decide absolutely *any* HFS+ partition it sees that's larger than 50MiB is a 'macefi' partition, try to use it as /boot/efi for the install, and go badly wrong.

We're actually very lucky that HFS+ partitions with journals only mount read-only, otherwise we'd be copying files and stuff to people's OS X partitions when they try to install. We'd better hope to God no-one tries an install on a system with an HFS+ partition that happens to have journalling disabled.
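
The difference between the two checks can be shown side by side. This is a simplified sketch, not blivet code: FakePartition and both functions are hypothetical stand-ins, the 50 MiB minimum and the "Linux HFS+ ESP" name are taken from the description above, and the partition sizes and the "Customer" name are only illustrative.

```python
from dataclasses import dataclass

MACEFI_NAME = "Linux HFS+ ESP"   # the MacEFIFS format's 'name', per the text above
MACEFI_MIN_SIZE_MIB = 50         # minimum size mentioned above; no maximum


@dataclass
class FakePartition:
    """Hypothetical stand-in for a partition; not a blivet class."""
    fstype: str
    size_mib: int
    gpt_name: str
    bootable: bool = False


def old_is_macefi(part):
    """Old-style check: HFS+, big enough, and the GPT name matches."""
    return (part.fstype == "hfsplus"
            and part.size_mib >= MACEFI_MIN_SIZE_MIB
            and part.gpt_name == MACEFI_NAME)


def new_is_macefi(part, require_bootable=False):
    """match()-style check as quoted above: the GPT name is never consulted."""
    return (part.fstype == "hfsplus"
            and (part.bootable or not require_bootable)
            and part.size_mib >= MACEFI_MIN_SIZE_MIB)


# Illustrative values only.
macos = FakePartition("hfsplus", 200 * 1024, "Customer")
fake_esp = FakePartition("hfsplus", 200, MACEFI_NAME)

if __name__ == "__main__":
    for label, part in (("macOS volume", macos), ("fake ESP", fake_esp)):
        print(label, "old:", old_is_macefi(part), "new:", new_is_macefi(part))
```

With these stand-ins the old check accepts only the partition named "Linux HFS+ ESP", while the new one also accepts the large macOS volume, which is the misdetection described above.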

Comment 18 Chris Murphy 2016-11-10 18:53:16 UTC
(In reply to Adam Williamson from comment #17)
> So I *think* I know what the actual bug is here. It was introduced by this
> commit, and specifically relates to the highlighted line:
> 
> https://github.com/rhinstaller/blivet/commit/
> 368a4db6141c7fdcb31ed45fe6be207ccc08ad30#diff-
> c0cef2bf2f989e2f94b5d1cca9c8115eL1112
> 
> that commit changed up how we do format type detection.

Why is something this significant changing between beta and final?

 
> We're actually very lucky that HFS+ partitions with journals only mount
> read-only, otherwise we'd be copying files and stuff to people's OS X
> partitions when they try to install. We'd better hope to God no-one tries an
> install on a system with an HFS+ partition that happens to have journalling
> disabled.

mactel-boot writes a bunch of identically named dummy files to trick the firmware into seeing the ESP as if it's actually a macOS installation. It'll overwrite things like the kernel, for example. This seems bad, but journaled HFS Plus is mandatory for a macOS boot volume; for many years now there has been no way to create a non-journaled HFS Plus volume via the GUI, although the journal can be removed via the CLI. I can't think of a reason why anyone would do this, so it'd be a rare case indeed. Nevertheless it would be a data loss bug, however hypothetical - but then we have a real macOS data loss bug that isn't hypothetical, and in that case upstream blames the user for it, so I'm just going to shrug at the concern in this bug.

Comment 19 Adam Williamson 2016-11-10 19:42:15 UTC
"Why is something this significant changing between beta and final?"

It wasn't. It changed between 24 and 25. No-one tested OS X dual boot at all until today, it's been broken since whenever blivet 2.x landed, I think.

Comment 20 Chris Murphy 2016-11-10 20:10:02 UTC
(In reply to Adam Williamson from comment #19)
> "Why is something this significant changing between beta and final?"
> 
> It wasn't. It changed between 24 and 25. No-one tested OS X dual boot at all
> until today, it's been broken since whenever blivet 2.x landed, I think.

I've done multiple installations of Fedora 25 alongside OS X into free space and never ran into this bug, although I can't tell right now what compose I last tried because the evidence is 1000 miles away.

Comment 21 Adam Williamson 2016-11-10 21:37:31 UTC
Huhh. Well. That's interesting, and makes me doubt my assessment of the bug a bit. But it's really hard to tell without logs. When you do have time, could you please see if you can reproduce the bug with RC-1.2 and also see when's the last time it worked, so we can triage? And post some logs, and stuff? You know the drill...

Also, if that's the case, why didn't you file results in the matrix? I checked, and testcase_stats shows zero runs of the OSX dual boot test before RC-1.2.

Comment 22 Chris Murphy 2016-11-10 21:41:03 UTC
(In reply to Adam Williamson from comment #21)
> Huhh. Well. That's interesting, and makes me doubt my assessment of the bug
> a bit. But it's really hard to tell without logs. When you do have time,
> could you please see if you can reproduce the bug with RC-1.2 and also see
> when's the last time it worked, so we can triage? And post some logs, and
> stuff? You know the drill...

I probably won't have a Mac to test on for two or three weeks.

> Also, if that's the case, why didn't you file results in the matrix? I
> checked, and testcase_stats shows zero runs of the OSX dual boot test before
> RC-1.2.

Dunno. Maybe I was being lazy, or possibly I tested with a nightly that wasn't a current test.

Comment 23 Adam Williamson 2016-11-10 21:46:51 UTC
https://www.happyassassin.net/updates/1393846.0.img is my attempt to fix this, assuming I'm right about the problem. PR is https://github.com/rhinstaller/blivet/pull/523

Comment 24 Adam Williamson 2016-11-10 21:47:53 UTC
When you did your attempts, were you starting from a clean OS X install with no previous Fedora install alongside it? Or was there an existing Fedora install?

Comment 25 Adam Williamson 2016-11-10 23:59:46 UTC
Additionally, a couple of images to hopefully make testing this possible on non-Macs, if you fake up a partition layout:

https://www.happyassassin.net/updates/1393846.0.fakemac-nofix.img
https://www.happyassassin.net/updates/1393846.0.fakemac-fix.img

They both make blivet always decide the system is a Mactel (Intel Mac). The former just does that, otherwise it matches RC-1.2. The latter also includes my patch.

Comment 26 Adam Williamson 2016-11-11 00:59:01 UTC
OK, so yeah, I did my best to test this with a 'fake Mac' setup. In a UEFI VM, I completely wiped the hard disk, then did:

fdisk /dev/vda
g # new gpt label
n # new partition
1 # number
<ret> # default first sector
+200M # size
t # set type
1 # EFI system partition
n # new partition
2 # number
<ret> # default first sector
+2G # size
t # set type
2 # partition number (now there's more than one)
38 # Apple HFS/HFS+
w # write
q # quit

Then I did:

mkfs.vfat /dev/vda1
mkfs.hfsplus /dev/vda2

The idea being to fake up something like a macOS install: a regular EFI system partition, then a biggish HFS+ partition to represent the macOS partition.

I booted my fakemac-nofix.img and verified that it detected vda2 as a 'macefi' partition, and ran the install process and indeed it mounted it as /boot/efi and tried to write to it. In fact it succeeded - I guess because I created the filesystem with 'mkfs.hfsplus' it didn't have journalling, so it was mounted read-write.

Then I repeated the entire setup process, and booted my fakemac-fix.img image. Now it detects vda2 as 'hfsplus'. I ran the install process, and this time it left vda2 alone and created a new 200MB hfsplus vda3 and mounted *that* as /boot/efi and wrote to it.

After the install process completed, I tried booting the nofix image and checking the detection again: it detects both HFS+ partitions as macefi. Then I did the same with the fix image: it detects vda2 as hfsplus and vda3 as macefi, as expected.

So, that backs up both my theory and my patch, so far as I can manage.

Comment 27 Kamil Páral 2016-11-11 08:58:13 UTC
Changing component to the identified cause.

The original Mac I reported this from has this parted output:

Model: ATA TOSHIBA MK5065GS (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size    File system  Name                  Flags
 1      20.5kB  210MB  210MB   fat32        EFI system partition  boot, esp
 2      210MB   217GB  216GB   hfs+         Linux HFS+ ESP
 3      217GB   217GB  650MB   hfs+         Recovery HD
 4      217GB   218GB  1074MB  ext4
 5      218GB   500GB  282GB                                      lvm

So the theories about "boot" flag placed on MacOS main partition were not correct, it seems.

Comment 28 Kamil Páral 2016-11-11 09:58:05 UTC
(In reply to Adam Williamson from comment #23)
> https://www.happyassassin.net/updates/1393846.0.img is my attempt to fix this, 

It fixed the problem on my Mac Mini. Installation now succeeds. The final layout is this:

Number  Start   End    Size    File system  Name                  Flags
 1      20.5kB  210MB  210MB   fat32        EFI system partition  boot, esp
 2      210MB   283GB  282GB   hfs+         Customer
 3      283GB   283GB  650MB   hfs+         Recovery HD
 4      283GB   283GB  210MB   hfs+         Linux HFS+ ESP
 5      283GB   285GB  1074MB  ext4
 6      285GB   500GB  216GB                                      lvm

I'm surprised it creates another ESP as sda4 with hfs+, when ESP already existed as sda1 with fat32. But it works. The only problem is that MacOS can't be booted from grub (errors printed), I have to use one-time boot menu to boot into MacOS.

Comment 29 Petr Schindler 2016-11-11 13:35:27 UTC
Discussed at mini blocker review during go/no-go meeting [1].

This bug was accepted as a Final Blocker - by adamw's analysis of the cause, this seems like a clear violation of "The installer must be able to install into free space alongside an existing OS X installation, install and configure a bootloader that will boot Fedora." that will affect all dual-boot OS X installs.

[1] https://meetbot.fedoraproject.org/fedora-meeting-2/2016-11-10/f25-final-gono-go-meeting.2016-11-10-17.00.html

Comment 30 Adam Williamson 2016-11-11 15:49:33 UTC
"So the theories about "boot" flag placed on MacOS main partition were not correct, it seems"

Yeah, I realized that in the middle of the discussion: it was just a mis-read on my part, I was looking in the wrong class :)

"I'm surprised it creates another ESP as sda4 with hfs+, when ESP already existed as sda1 with fat32."

This is actually intended: it's the trick we use to get the Apple firmware to show Fedora in the graphical boot menu. The fact that we label the partition as an 'ESP' is a bit misleading, but I think we have to do that for the Linux tools to work with it.

It's not really an ESP at all. The firmware doesn't boot from it. What it is, basically, is a fake macOS partition. As I understand it (thanks pjones), the Apple firmware has this wacky setup where it looks for HFS+ partitions whose filesystems include the expected directories for an ESP and have some key files in the places where macOS puts them, then it takes the relevant files from them and 'blesses' them into the actual ESP. If we just install our files directly into the *real* ESP, then you can boot Fedora somehow or other (press a special key on boot or something), but you won't see it in the nice graphical boot menu.

So we're basically being great politicians, lying out of both sides of our mouths: we lie to the Linux tools that the partition is an ESP (by marking it as such and mounting it at /boot/efi), and we lie to the firmware that it's a macOS partition (by formatting it as HFS+ and dumping files into all the special locations that it looks for). And by doing that, it works. Isn't technology great?

That's my outsider's understanding of it, btw, I might have it slightly garbled, but that's basically it. It's all as designed.

One thing that would be great if you could check - can you try installing again over the top of the successful install, again with my fix? When you do that, it should re-use the existing fake-ESP, not create another new one.

Comment 31 Kamil Páral 2016-11-14 13:59:37 UTC
(In reply to Adam Williamson from comment #30)
> One thing that would be great if you could check - can you try installing
> again over the top of the successful install, again with my fix? When you do
> that, it should re-use the existing fake-ESP, not create another new one.

Jan Sedlak tried it and it re-used that partition (sda4, from comment 28) and did not create a new one.

Comment 32 Fedora Update System 2016-11-16 08:33:45 UTC
python-blivet-2.1.6-4.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-80f65e5670

Comment 33 Adam Williamson 2016-11-16 08:41:16 UTC
Can you please re-confirm this with RC-1.3? Thanks. (Most importantly from a state *without* Fedora already installed, just OS X, but also testing install over existing Fedora alongside OS X is handy).

Comment 34 Kamil Páral 2016-11-16 11:29:43 UTC
Installed F25 alongside a default Mac OS X installation, worked fine.

Comment 35 Abdel Gadiel Martínez Lassonde 2016-11-16 15:04:33 UTC
I installed F25 in a dual-boot setup alongside macOS Sierra (MacBook Pro Retina, 15-inch, Mid 2014) and it worked fine.

Comment 36 Fedora Update System 2016-11-16 20:26:01 UTC
python-blivet-2.1.6-4.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-80f65e5670

Comment 37 Adam Williamson 2016-11-16 21:12:33 UTC
Setting back to verified. Can people also karma the update, if the fix is good and it still works fine in other ways? Thanks.

Comment 38 Fedora Update System 2016-11-18 08:24:01 UTC
python-blivet-2.1.6-4.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report.