Bug 737508

Summary: warn if grub is not likely to fit in using the current disk layout (e.g. first partition starting on sector 63)
Product: [Fedora] Fedora
Reporter: Andy Burns <fedora>
Component: anaconda
Assignee: Brian Lane <bcl>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 17
CC: anaconda-maint-list, atu, awilliam, bobgus, buhrt, davidben, dennis, derrien, fedora-bugs, fedora, g.kaviyarasu, iglesias, jonathan, kparal, Lcstyle, lkundrak, mads, martin, martin.wilck, mbreuer, milan.kerslager, mishu, mrsam, naveed, nicku, patrick.pichon, paul.lipps, pavel.lisy, pcfe, piskozub, pjones, redhat-bugzilla, robatino, silfreed, tadej.j, tomek, vanmeeuwen+fedora, wd, zeus
Target Milestone: ---
Keywords: CommonBugs
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: https://fedoraproject.org/wiki/Common_F16_bugs#grub2-raid1-sector63 RejectedBlocker
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2012-12-20 10:10:57 EST
Attachments:
tomasz.anacond.program.log (flags: none)
tomasz.anaconda.storage.log (flags: none)

Description Andy Burns 2011-09-12 07:28:48 EDT
Description of problem:

Installing FC16 to a machine that also has Centos 5.x installed in a separate partition, shared /boot partition

Anaconda warned it would not be possible to upgrade as previous O/S was too old, this didn't concern me as I don't want to overwrite or upgrade Centos to Fedora, just to install in parallel

At the end of installation there was a message

"There was an error installing the bootloader.
the system may not be bootable."

Version-Release number of selected component (if applicable):

FC16 Alpha DVD 

How reproducible:

Happened on 1st attempt, not retried yet.
Comment 1 Andy Burns 2011-09-12 07:36:48 EDT
Checking on the ALT-F6 console I see the message

grub2-setup: warn: your core.img is unusually large, it won't fit in the embedding area..

grub2-setup: error: embedding is not possible, but this is required when the root device is on a RAID array or LVM volume.

My root device(s) are indeed within LVs on MD arrays, I suspect this may be related to my /boot being only 100MB rather than the larger size that I think Fedora defaults to nowadays.

I would expect anaconda to do a pre-flight check on disk space before upgrading to grub2 to avoid rendering existing systems unbootable.

I have not yet rebooted this system, so will try to free up some space within /boot and manually repair the grub2 installation, hoping to find some nice docs on this somewhere as I've never used grub2 before.
Comment 2 Andy Burns 2011-09-12 07:41:18 EDT
Changed component from Anaconda to Grub2, though I feel Anaconda should perform a sanity check first ...
Comment 3 Andy Burns 2011-09-12 16:27:04 EDT
Gave up trying to manually repair grub, emptied some space on /boot by removing old kernels/symbols/xen; 43MB now available.

Re-installed from DVD, same error, so it's not /boot filling up.

What *is* the embedding area?  Is it the space between the boot sector and the start of the cylinder containing the first partition?  Do I need to start scrutinising my partition table and moving partitions up to the next cylinder boundary with e.g. gparted?
Comment 4 Andy Burns 2011-09-12 17:21:42 EDT
Just checked my partitions ...

sda1 and sdb1 start at sector 63, end at sector 208844
these form 101MiB /boot on md0 using raid1

sda2 and sdb2 start at sector 208845, end at sector 488392064
these form one 232GiB PV on md1 using raid1
8GiB root and 2GiB swap partitions are LVs within the VG containing this PV.

sdc/d/e/f are USB card reader drives

sdg1/h1/i1/j1/k1/l1/m1/n1 form another PV on md2 using raid5
only data partitions on LVs within the VG containing this PV

So I think I have a normal sized embedding area, how can I see *why* I have a core.img that is too large to fit in this area?
Comment 5 Andy Burns 2011-09-12 17:34:31 EDT
OK, starting to get to grips with grub2 ...

/mnt/sysimage/boot/grub2/core.img is 32446 bytes long, so fractionally larger than 63 sectors, allowing 1 sector for the MBR, I can see this won't fit in the "slack" between the MBR and start of sda1 partition.

Is core.img the same for all machines, or does it differ by arch, or is it generated to be unique for my machine, does xen inflate the size of it, can I shrink it somehow?
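
A rough sketch of that arithmetic, assuming 512-byte sectors and that the embedding area is everything between the MBR (sector 0) and the start of the first partition. The function names are made up for illustration, and grub2-setup's real check may need more room than the raw file size suggests:

```shell
#!/bin/sh
# Sketch only: assumes 512-byte sectors and an MBR (msdos) disk where
# sector 0 holds the MBR and the embedding area runs from sector 1 up to
# the sector just before the first partition.
SECTOR_SIZE=512

# Number of whole sectors needed to hold a file of $1 bytes (rounded up).
sectors_needed() {
    echo $(( ($1 + SECTOR_SIZE - 1) / SECTOR_SIZE ))
}

# Sectors available for embedding when the first partition starts at sector $1.
gap_sectors() {
    echo $(( $1 - 1 ))
}

# Prints "fits" or "too large" for a first-partition start sector ($1)
# and a core.img size in bytes ($2).
check_fit() {
    if [ "$(sectors_needed "$2")" -le "$(gap_sectors "$1")" ]; then
        echo "fits"
    else
        echo "too large"
    fi
}

check_fit 63 32446     # the layout and core.img size reported above
check_fit 2048 32446   # the same core.img with a partition starting at sector 2048
```

With the numbers from this comment, 32446 bytes round up to 64 sectors while a partition starting at sector 63 leaves only 62, so the first call prints "too large" and the second prints "fits".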
Comment 6 Mads Kiilerich 2011-09-12 17:50:17 EDT
(In reply to comment #5)
> Is core.img the same for all machines, or does it differ by arch, or is it
> generated to be unique for my machine, does xen inflate the size of it, can I
> shrink it somehow?

It is created by /sbin/grub2-install before it runs grub2-setup. You can try to run with bash -x and see what it does and if anything can be left out.
Comment 7 Andy Burns 2011-09-13 06:26:32 EDT
OK, I archived off the old contents of my /boot to another device

Deleted the 101MB array and partition

Re-created a 100MB array and partition (I was going to start at sector 127, but fdisk defaulted to 2048 so I stuck with that)

I wasn't sure if grub2 would accept mdraid metadata 1.x so I stuck with metadata 0.9 (besides I might wish to go back to CentOS 5.x without grub2).

Copied back the archived contents of my /boot, re-installed F16alpha and got success :-)

I will check the core.img as kiilerix suggests to see why it was larger than expected.

Presumably there are *lots* of disks out there with only 63 "slack" sectors that Fedora 16 with grub2 should be expected to install to?
Comment 8 Fedora Admin XMLRPC Client 2011-09-16 15:08:05 EDT
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.
Comment 9 David Lehman 2011-09-16 15:33:39 EDT
grub2 can use 1.x metadata FWIW.

If you're not using the /boot arrays because of their size, that means your stage2 (usually /boot) is on lvm-on-md, right? That means grub2 would need to include both lvm and md modules in core.img. I don't know if that's enough to bump it up over the available space or not.

As for anaconda doing a preflight check, that is impossible. Anaconda cannot know how much space grub2 will need. We could try to guess, but that's not reliable. We'll end up with some users complaining that we guessed too conservatively and others complaining that we guessed too optimistically. Not going to play that game.
Comment 10 Andy Burns 2011-09-16 15:48:16 EDT
I am just using /boot as ext3-on-md (root is ext4-on-lvm-on-md) the 100MB size is OK for an install (though I guess it won't be enough for an upgrade without downloading stage2.img after first reboot)

OK, I learned a bit about grub2 as I progressed with this problem, and I can see now that it's not anaconda's job to guess whether grub2 installation will succeed, but I suggest that yet other users will complain that their systems are left unbootable.

However as a more constructive suggestion, perhaps grub2 should have a --dry-run option that tells definitively the size of the core.img and whether or not that will fit, then anaconda could make use of it?
Comment 11 Andy Burns 2011-09-16 15:50:40 EDT
(In reply to comment #9)

> grub2 can use 1.x metadata FWIW.

I thought it could, but I wanted to leave my options open to revert to grub legacy (0.97) and CentOS 5, thanks for the confirmation.
Comment 12 Tomasz Torcz 2011-10-06 07:28:11 EDT
Hi, I also hit this problem after upgrading to F16.

My /boot is MD-device, RAID1 over 4 partitions. LVM is used for rootfs, but not for /boot.
Comment 13 David Lehman 2011-10-06 09:43:36 EDT
(In reply to comment #12)
> Hi, I also hit this problem after upgrading to F16.
> 
> My /boot is MD-device, RAID1 over 4 partitions. LVM is used for rootfs, but not
> for /boot.

Tomasz, you also got the message about an unusually large core.img? Please attach /var/log/anaconda/anaconda.program.log and /var/log/anaconda/anaconda.storage.log to this bug report.
Comment 14 Tomasz Torcz 2011-10-06 09:58:01 EDT
Created attachment 526710 [details]
tomasz.anacond.program.log
Comment 15 Tomasz Torcz 2011-10-06 10:02:25 EDT
Created attachment 526711 [details]
tomasz.anaconda.storage.log

I did a f15→f16 upgrade using Beta DVD dd'ed into USB stick, finished with "Cannot install bootloader". After a reboot, I was greeted with grub1 menu and old kernel. I manually entered kernel and initrd lines and booted to F16 with 3.1 kernel.

Inside F16 I tried to reinstall grub2. I ran grub2-mkconfig -o /boot/grub2/grub.cfg successfully. But the next command failed:


# grub2-install /dev/sda
/sbin/grub2-setup: warn: Your core.img is unusually large.  It won't fit in the embedding area..
/sbin/grub2-setup: error: embedding is not possible, but this is required when the root device is on a RAID array or LVM volume.


This means that I have F16 system with grub1 configuration from F15. System cannot be booted without manual intervention at grub screen.

# findmnt /boot
TARGET SOURCE     FSTYPE OPTIONS
/boot  /dev/md126 ext4 ...

md126 : active raid1 sdb1[0] sda1[3] sdd1[2] sdc1[1]
      979840 blocks [4/4] [UUUU]
Comment 16 Andy Burns 2011-10-06 11:14:31 EDT
(In reply to comment #15)

> This means that I have F16 system with grub1 configuration from F15. System
> cannot be booted without manual intervention at grub screen.

Yep, same boat I was in.  You either need to shrink /boot partition and move its start "down" the disk rather than letting its end move "up" the disk (gparted perhaps) or copy/trash/recreate the /boot partition.

I thought my problem was due to partitions being created by a relatively old O/S with the start of first partition at sector 63, but if F15 also used to start at that sector instead of sector 2048 like F16 seems to, I predict this would/will bite quite a few upgrading users who use RAID.
Comment 17 Mads Kiilerich 2011-10-07 18:25:07 EDT
(In reply to comment #10)
> However as a more constructive suggestion, perhaps grub2 should have a
> --dry-run option that tells definitively the size of the core.img and whether
> or not that will fit, then anaconda could make use of it?

grub2 already has such an option ... kind of. grub2-install can be run with a temporary directory location and a dummy grub2-setup. That will let grub2-install do its magic choice of modules and let grub2-mkimage build a preview of core.img. (The path length might however also have some influence on the size of the img.)
Comment 18 Adam Williamson 2011-10-14 13:36:25 EDT
Discussed at 2011-10-14 blocker review meeting. This is a /boot-on-RAID issue so it hinges on the question of whether anaconda team considers this a 'workable' layout; we need dlehman's input on this one but he's on vacation. We agreed to punt this one to next week's meeting.

David, when you're back, can you give us your opinion on whether this should be a blocker?
Comment 19 Peter Jones 2011-10-21 12:43:58 EDT
Just so it's clear, I can't reproduce this with /boot-on-md-RAID when we're freshly creating the mdraid. This is only a problem when you're keeping an existing partition that's already on sector 63.  So the workaround is to remove the existing partition and create a new one that's properly aligned at 1MB.  Obviously if you're using this to dual-boot, you would need to do a backup beforehand and some sort of restore afterwards.

As for a --dry-run option, that would be nice, and should be pursued upstream.
Comment 20 Adam Williamson 2011-10-21 13:48:33 EDT
Discussed at 2011-10-21 blocker review meeting. Agreed that this is not a blocker issue: the criterion in question is "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above", and this is not a 'workable partition layout': the space between the MBR and the /boot partition simply isn't big enough and we can't do anything about that.

So the bug is essentially now an RFE for the 'pre-flight check' now being discussed, which would be nice, but isn't a blocker issue. Additionally it's not clear that we would be able to do this in a non-intrusive enough way in the F16 time frame, unless upstream adds a simple --preflight-check parameter we can just call. So we aren't taking this as NTH right now, but we leave the possibility open: if someone's able to come up with a small, safe patch to implement a pre-flight check, the bug could be proposed as NTH at that point.
Comment 21 Wolfgang Denk 2011-11-10 16:27:35 EST
It should not be too difficult to check the partitions and test if there is sufficient space to install grub2.  I would much rather have the installation abort early than leaving me with an upgraded but non-booting system.
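
A minimal sketch of what such a check could look like, assuming Linux sysfs, where /sys/block/<disk>/<partition>/start gives a partition's start in 512-byte sectors. The function names and the MIN_START threshold are hypothetical; this is not what anaconda or grub2 actually do, and it only flags a suspiciously small gap rather than computing how much room grub2 really needs:

```shell
#!/bin/sh
# Sketch only: warn when the gap before the first partition looks small.
# MIN_START is an arbitrary illustrative threshold, not a value taken
# from grub2 or anaconda.
MIN_START=2048

# Print the lowest partition start sector on disk $1 (e.g. "sda").
# $2 optionally overrides the sysfs root, which is handy for testing.
lowest_start() {
    disk=$1
    root=${2:-/sys}
    lowest=""
    for f in "$root/block/$disk/$disk"*/start; do
        [ -r "$f" ] || continue          # skip an unmatched glob
        s=$(cat "$f")
        if [ -z "$lowest" ] || [ "$s" -lt "$lowest" ]; then
            lowest=$s
        fi
    done
    echo "$lowest"
}

# Print a warning if the first partition on disk $1 starts before MIN_START.
preflight_warn() {
    start=$(lowest_start "$1" "$2")
    if [ -z "$start" ]; then
        echo "$1: no partitions found"
    elif [ "$start" -lt "$MIN_START" ]; then
        echo "$1: first partition starts at sector $start; grub2 may not fit"
    else
        echo "$1: first partition starts at sector $start; ok"
    fi
}
```

Running `preflight_warn sda` before an upgrade would then at least point at the sector-63 layout instead of failing at the very end.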

I am really disappointed to see that the release team was actually aware of this problem and decided to ignore it. This is - well, ignorant :-(
Comment 22 Nick Urbanik 2011-11-13 20:41:11 EST
Too much of my life has been used up by this bug.  Please can you add a prominent warning to make the large number of others know before their family/work time is sacrificed?
Comment 23 Mads Kiilerich 2011-11-14 12:34:42 EST
anaconda-maint-list has just been removed from CC.

In my opinion the limitation discussed here can't be fixed in grub2 and would have to be handled by anaconda in some way. The component should thus be changed back to anaconda (where it perhaps would be assigned to pjones anyway).
Comment 24 Adam Williamson 2011-11-15 17:06:58 EST
As far as I'm following things, it was assigned to grub2 because the correct way to "fix" this - which can't really be fixed - is to have the proposed 'preflight check' option in grub, and have anaconda just call that.



-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 25 Mads Kiilerich 2011-11-17 19:45:00 EST
*** Bug 753497 has been marked as a duplicate of this bug. ***
Comment 26 Adam Williamson 2011-12-02 16:39:48 EST

Comment 27 Michael Breuer 2011-12-03 22:44:17 EST
work-around for anyone stuck at error 16 (hit this today):
After a failed upgrade (stuck at error 16 after a distro-sync upgrade):

1. Booted an f15 rescue disk (didn't have f16 available... yeah... I keep saying I'll do that next time I upgrade.).
2. Backed up /boot to an alternate directory on the root partition
3. Commented out the raid line for /dev/md0 in /etc/mdadm.conf. My /boot was on /dev/md0; /dev/md0 was raid1 comprising /dev/sda1 and /dev/sdb1 (rest of the system is raid6 using sda2, sdb2 and sdc - sdf).
4. Commented out /boot in /etc/fstab.
5. Rebooted back into the same rescue disk (removed the raid drivers, etc).
6. Used fdisk to change /dev/sda1 to a standard linux partition.
7. mkfs.ext3 /dev/sda1
8. chroot /mnt/sysimage
9. mounted /dev/sda1 to /boot (and added to /etc/fstab)
10. restored the saved /boot directory from earlier
11. redid /sbin/grub2-install /dev/sda ... success (failed originally)
12. then... more errors involving missing grubby template, screwed up grub.conf, etc...
13. after this fail... rebooted back to rescue...
14. removed /boot/grub
15. ran grub2-mkconfig -o /boot/grub2/grub.cfg
    Rebooted successfully into F16 (upgraded from F15)
Comment 28 Kevin R. Page 2011-12-18 06:22:56 EST
(In reply to comment #20)
> Discussed at 2011-10-21 blocker review meeting. Agreed that this is not a
> blocker issue: the criterion in question is "The installer must be able to
> create and install to any workable partition layout using any file system
> offered in a default installer configuration, LVM, software, hardware or BIOS
> RAID, or combination of the above", and this is not a 'workable partition
> layout': the space between the MBR and the /boot partition simply isn't big
> enough and we can't do anything about that.


Have just hit this bug upgrading (using the install DVD, since there were warnings of doom for RAIDed /boot using preupgrade).

I would just note that this machine has never run anything except fedora. i.e. the "incorrect" partitioning must have been set up by a Fedora installer...

(I can see why it's a tough problem -- we'd all love a crystal ball -- but it does seem like there should be a little more care for problems caused by earlier Fedora installers.)
Comment 29 Wolfgang Denk 2011-12-18 07:51:24 EST
(In reply to comment #28)
>
> I would just note that this machine has never run anything except fedora. i.e.
> the "incorrect" partitioning must have been set up by a Fedora installer...

Same here.  It has not been so long ago that the default start sector
of the first partition has been changed from 63 to 2048 - and that was
for completely different reasons.

So it should have been obvious to everybody that there must be a
zillion of systems out there where we have only 62 sectors room before
the start of the first partition - and this has always been perfectly
sufficient before this release.

> (I can see why it's a tough problem -- we'd all love a crystal ball -- but it
> does seem like there should be a little more care for problems caused by earlier
> Fedora installers.)

True.  And especially since

- this is a well known problem, that has been reported (and even
  discussed) before;
- this problem is simply being ignored, with no warning in advance,
  which reliably results in non-booting systems (where the needed
  recovery procedure clearly exceeds the capabilities of the average
  user);
- there are other places in the installer where it appears to be no
  big problem to shrink or grow partitions to make room for the needed
  data - why not here, too?

Best regards,

Wolfgang Denk

Comment 30 Mads Kiilerich 2011-12-19 10:59:08 EST
(In reply to comment #29)
> So it should have been obvious to everybody that there must be a
> zillion of systems out there where we have only 62 sectors room before
> the start of the first partition - and this has always been perfectly
> sufficient before this release.

Yes, but it is only a problem if /boot is non-trivial. Most Fedora installations have a simple ext4fs /boot.

> - this is a well known problem, that has been reported (and even
>   discussed) before;
> - this problem is simply being ignored, with no warning in advance,
>   which reliably results in non-booting systems (where the needed
>   recovery procedure clearly exceeds the capabilities of the average
>   user);
> - there are other places in the installer where it appears to be no
>   big problem to shrink or grow partitions to make room for the needed
>   data - why not here, too?

I'm quite sure that anaconda won't try to move existing partitions with data around anywhere. That would be quite risky.

Unfortunately nobody stepped up with an offer to keep maintaining grub legacy and nobody offered a technical solution or workaround. Unfortunately the only solution to that is to contribute more testing and fixing before the release.
Comment 31 Sam Varshavchik 2011-12-19 12:39:55 EST
Grub blows past the first 63 sectors only if the mdraid module gets inserted into the bootloader image. This bug does not occur on the dominant majority of systems owned by Joe Sixpack, that do not employ mdraid.

Still -- and despite one completely wasted weekend recovering from total brickage, by me trying to fix a botched upgrade due to this bug -- I have to accept that this is one of those cases where there are no viable alternatives, and the only thing that makes sense is to deal with it. Progress is progress.

The only thing I would recommend is some better clarity to some of the error messages -- that would've been helpful. I have some dim recollection of some non-specific complaint about grub that Anaconda threw in my face before it started the doomed upgrade, that I foolishly ignored and paid the price for. It seemed harmless, but making it patently clear that grub cannot be installed, with the current state of things, period, would've been a welcome heads-up.
Comment 32 Jeff Buhrt 2011-12-21 15:23:47 EST
I figured out how to move md0's start sector to make room to install grub2, but grub2 won't show a menu. I have to manually boot from inside grub.
The system was just yum upgraded to F16, it has a mirrored /dev/md0 with an ext3 /boot and md's with LVM for the rest of the system. I have used this type of configuration for maybe 6-8yr+ to handle the piles of drives that have failed during that time. The big upside is this is the 1st (and only) system I have tried to upgrade from F15 -> F16, remote upgrades will be high risk at best...

1) To move the start of /boot (assuming md0 on partition 1 of the disks (sda1 and sdb1) for below, change my example as needed):
This assumes a mirrored /dev/md0.

#backup /boot
tar cvzf  ~/boot.tgz  --exclude '*lost+found*' /boot/

# make a note of where / is, you will need it until my #3 point is solved
df

# confirm the md and the partitions (PV's)
cat /proc/mdstat

# danger below here!!!! Do at your own risk, not mine.

# free the partitions
mdadm --stop /dev/md0 

fdisk /dev/sda
# delete and re-add sda1
(delete, 'n' to add, use the default 2048 and whatever end is available, 't' (type) to fd (RAID), 'a' (active)), 'w' write

# You will most likely get a re-read warning about needing to boot. DON'T REBOOT!
partprobe

# repeat 'fdisk /dev/sdb' and partprobe

# make a new (now smaller) md0
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# create a filesystem
mkfs.ext3 /dev/md0

# make sure mounted
mount /boot
df /boot

# restore /boot (tar stored the paths without the leading '/', so extract relative to /)
tar -xvzf ~/boot.tgz -C / --exclude '*lost+found*'
# (excluded to make sure the empty slots aren't lost)

(assuming it is empty for you too)
grub2-mkconfig -o /boot/grub2/grub.cfg

grub2-install /dev/sda
grub2-install /dev/sdb

2) If you fail to boot and just get a Grub> prompt...
You need to know the root mapper path. From step one above, mine would look like: /dev/mapper/SysVG-RootLV (where Sys is the system name; I'm an ex-IBMer if you wonder about the naming standard).
insmod gzio
insmod part_msdos
insmod ext2
linux /vmlinuz-3.1.5-6.fc16.i686 root=/dev/mapper/SysVG-RootLV
initrd /initramfs-3.1.5-6.fc16.i686.img
boot

3) Problem as described in #2: at boot, grub 1.99 goes to a grub prompt vs presenting a menu.

(I also repeated the grub2-install first running 'rm /etc/grub2.cfg', 'ln -s /boot/grub2/grub.cfg /etc', it still doesn't help)

Ideas how to get past grub2 not showing a menu?
Comment 33 Mads Kiilerich 2011-12-21 19:55:45 EST
(In reply to comment #32)
> grub2-mkconfig -o /boot/grub2/grub.cfg
> 
> grub2-install /dev/sda
> grub2-install /dev/sdb
> 
> 2) If you fail to boot and just get a Grub> prompt...
> You need to know the root mapper path. from step one above, mine would look
> like: /dev/mapper/SysVG-RootLV (where Sys is the system name (I'm an ex-IBMer
> if you wonder about the naming standard).
> insmod gzio
> insmod part_msdos
> insmod ext2
> linux /vmlinuz-3.1.5-6.fc16.i686 root=/dev/mapper/SysVG-RootLV
> initrd /initramfs-3.1.5-6.fc16.i686.img
> boot

It is weird that grub can load the modules and the kernel and the initrd, but at the same time can't read grub.cfg.

(Ok, the kernel modules might cheat a bit because grub2-install probably built them into core.img.)

Can you find and load grub.cfg with the 'configfile' command?

> 3) Problem as described in #2: at boot, grub 1.99 goes to a grub prompt vs
> presenting a menu.
> 
> (I also repeated the grub2-install first running 'rm /etc/grub2.cfg', 'ln -s
> /boot/grub2/grub.cfg /etc', it still doesn't help)

Just to clarify:
1: grub2-install doesn't care about the configuration file at all. (grub2-mkconfig does however look at the output of grub2-install.)
2: Nothing looks for /etc/grub.cfg, and /etc/grub2.cfg is only used to find the active configuration file when a new kernel is installed.
Comment 34 Patrick Pichon 2011-12-25 12:01:50 EST
Very annoying... In your case you made the assumption that /boot starts at sector 63. However, all of my installations have / starting at sector 63 and /boot on the second partition.
Comment 35 Mike Iglesias 2012-01-03 17:01:15 EST
I just ran into this problem.  I used preupgrade to go from F14 -> F15 -> F16 (and yes, preupgrade was up to date on F14/F15, as noted in the common bugs page).  The laptop has Windows Vista on it along with Fedora so I can use either one.  It came pre-installed with Vista, and I shrank the Vista partition so I could put Fedora on it (this was a few years ago).

/dev/sda1 is a Dell utility partition, starting at block 63
/dev/sda2 is Windows Vista
/dev/sda6 is Linux / (I don't use a /boot partition)
/dev/sda7 is Linux swap
/dev/sda8 is Linux /home

The system is unbootable, at least for Fedora (I haven't tried Vista yet).  I got the grub1 boot menu, with the entries for upgrade-to-F16 and Vista, and nothing else.

I booted using the F16 netinstall disk in recovery mode, and tried to install grub2 by hand.  I got a message from grub2-install about core.img being unusually large and it would not install.

So where do I go from here?
Comment 36 Mads Kiilerich 2012-01-03 17:38:22 EST
(In reply to comment #35)
> So where do I go from here?

Your /boot is (apparently?) not md, so your issue is not the one discussed here.

Not having a /boot is not a tested setup and I don't know how supported that is. You might be on your own if you do things your own way.

Preupgrade should have run grub2-install with -f and installed with blocklist like grub legacy did. I don't know why it didn't do that in your case.

But core.img for a system with plain file systems and no md should fit within the 63 blocks, so there is something wrong there too.

You could try to file a new issue and attach the preupgrade logfiles and provide some more exact information, such as the file systems used and exactly which commands you ran and which error you got.
Comment 37 Ron Gonzalez 2012-01-04 00:13:15 EST
Just hit this same bug.

I do not have md devices, and my /boot starts on cyl 63.
My linux box is now dead.
Comment 38 Adam Williamson 2012-01-04 00:34:22 EST
We need more details to know if you're actually hitting the same bug, if you have no RAID devices. Please attach anaconda logs.



Comment 39 Ron Gonzalez 2012-01-04 02:02:22 EST
I won't be able to attach new logs, suffice it to say that I had to reinstall from scratch.  I am unsure why there is such poor Q.C. on upgrades! Users should at the very least receive a warning if a known bug is going to brick their entire system.  Referring to Comment 30, I did have a simple /boot on /dev/sda1 which started at cylinder 63 (parted shows 32KB start.  This /boot partition was 500 mb [more than enough]).  I also had a /secure LUKS encrypted partition. 

I am not sure about mdraid, however I do have an iSCSI target which I mount.  This iSCSI target has a Linux Volume Group stored on it, which I proceed to mount when the system is started.

I had major problems with F15 and that as well, since systemd screwed up everything and caused the target not to mount the vg properly, see: 708574 and 743740.

Unfortunately this time, when I tried to use the DVD to upgrade my install I hit bug #735730.  I had to at that point abandon all efforts and once again reinstall F16 from scratch. 

I am abandoning any future attempts of upgrading fedora using preupgrade.  
Each upgrade from version to version is a catastrophe waiting to happen.
Comment 40 Adam Williamson 2012-01-04 02:15:22 EST
"I am unsure why there is such poor Q.C. on upgrades!"

This is not a discussion forum, but the brief simple version of the answer is that the possible permutations which require testing are mind-boggling in number, and we have about a dozen people and about four weeks in which to perform _the entire set_ of release validation tests (not just upgrade install tests). It is entirely impractical for any project remotely on Fedora's scale to come anywhere close to guaranteeing upgrade functionality. Just isn't going to happen.



Comment 41 Mads Kiilerich 2012-01-04 09:14:08 EST
I would like to add a comment too.
(In reply to comment #39)
> I won't be able to attach new logs, suffice it to say that I had to reinstall
> from scratch.

You didn't _have_ to reinstall. It was your own choice. It might have been a reasonable choice if you just wanted something that worked ... assuming it worked for you. But a consequence of that is that your report is of no use. There is no way we without any information can figure out why it didn't work for you when it works for many others.
Comment 42 derrien 2012-01-06 06:42:26 EST
With grub2-1.99-13 anaconda seems to create bigger core.img than grub2-1.99-12 so what worked with 1.99-12 doesn't work anymore with grub2-1.99-13
Comment 43 Mads Kiilerich 2012-01-06 07:03:31 EST
(In reply to comment #42)

That sounds like a small change that causes a major regression. I think it would be better to file a new bug for that, preferably leaving anaconda out of the equation and comparing the output of grub2-install for -12 and -13.
Comment 44 derrien 2012-01-06 07:20:53 EST
(In reply to comment #43)

with -12
[(Fedora 16) ~]$ ll /boot/grub2/core.img 
-rw-r--r--. 1 root root 29236 Dec 19 18:00 /boot/grub2/core.img

with -13
[(Fedora 16) ~]$ grub2-install /dev/sda
/sbin/grub2-setup: warn: Your embedding area is unusually small.  core.img won't fit in it..
/sbin/grub2-setup: warn: Embedding is not possible.  GRUB can only be installed in this setup by using blocklists.  However, blocklists are UNRELIABLE and their use is discouraged..
/sbin/grub2-setup: error: will not proceed with blocklists.
[(Fedora 16) ~]$ ll /boot/grub2/core.img 
-rw-r--r-- 1 root root 31636 Jan  6 13:15 /boot/grub2/core.img
Comment 45 Adam Williamson 2012-01-06 14:26:30 EST
well, crap. that was the kind of thing i was hoping the updated grub2 wouldn't do, but was afraid it would...

as mads says, can you please open a new bug? we'll have to identify which patch results in the larger core.img and back it out. thanks.



Comment 46 Tobias Mueller 2012-01-10 12:51:35 EST
FWIW: I just hit this issue, too. I have a single disk (read: no RAID) but a LUKS partition along with a boot partition. My boot partition was starting at sector 63, as the others have reported.

After a good while fiddling around, I eventually fixed this issue by booting from a Ubuntu pendrive (installing lvm2) and starting gparted to make it shrink and move my boot partition.

I am surprised that I had to jump through so many hoops. The test, whether the grub2 image is too large or not, is relatively simple, I presume. But still the installer just bailed out during the upgrade with a pretty useless message ("installation of bootloader failed"). I would have expected that anaconda at least tells me, that the image is too large and that moving (and possibly) shrinking the first partition might help. A mere reference to this bug would have been appreciated as well. Bonus points if there is a way from anaconda to actually shrink and move the partition.
Comment 47 Adam Williamson 2012-01-12 09:44:38 EST
We didn't really catch this bug as a high-priority issue until after the freeze for F16, and prettying up the error scenario isn't a useful enough change to be worth the risk of messing with anaconda post-freeze, which is why we left it as-is. The test is done by *grub*, not anaconda; unless we added special code to anaconda to catch this particular failure and treat it specially, it's just like any other bootloader fail as far as anaconda knows, it treats them all generically. For F17 we would be able to treat it as a special error, if anaconda team decide it's worthwhile.
Comment 48 David Benjamin 2012-01-16 11:22:21 EST
(In reply to comment #45)
> well, crap. that was the kind of thing i was hoping the updated grub2 wouldn't
> do, but was afraid it would...
> 
> as mads says, can you please open a new bug? we'll have to identify which patch
> results in the larger core.img and back it out. thanks.

I'm not the original reporter, but I've filed bug 782144 for it as it's also bit me.
Comment 49 Russell Harrison 2012-02-21 21:33:55 EST
(In reply to comment #47)
> We didn't really catch this bug as a high-priority issue until after the freeze
> for F16, and prettying up the error scenario isn't a useful enough change to be
> worth the risk of messing with anaconda post-freeze, which is why we left it
> as-is. The test is done by *grub*, not anaconda; unless we added special code
> to anaconda to catch this particular failure and treat it specially, it's just
> like any other bootloader fail as far as anaconda knows, it treats them all
> generically. For F17 we would be able to treat it as a special error, if
> anaconda team decide it's worthwhile.

Adam, that explanation doesn't hold true at all.  First, describing the issue as "prettying up the error" is grossly understating it.  A problem that leaves systems in an unusable state is most certainly worth changing anaconda for after freeze.  To say I'm disappointed in the team is a bit of an understatement here.  This is an issue that should have pushed the F16 release back until it was solved.
Comment 50 Adam Williamson 2012-02-21 21:54:10 EST
Again: there is nothing we can do to really solve this. If the embedding space is too small, it's too small. The fix is to move your partitions around: that's hardly something we ever want the installer to try and do on its own in order to solve a problem like this. It would only screw things up.
Comment 51 Ron Gonzalez 2012-02-21 22:01:51 EST
Perhaps a warning would be in order then, before the original boot loader is overwritten.  I agree with #49: a release that leaves systems in an unusable state is unacceptable.
Comment 52 David Benjamin 2012-02-21 22:16:30 EST
The disk pressure would be greatly reduced if bug #782144 were fixed. Upstream's -Os got overwritten by RPM_OPT_FLAGS in grub2-1.99-13.fc16.
Comment 53 Russell Harrison 2012-02-21 22:20:35 EST
Simply recognizing that the embedding space may be too small and exiting the installer with an error message before making other changes would be more than acceptable.  While it's still frustrating to the user, it doesn't put them out of action completely.  I do get why anaconda shouldn't be shuffling partitions around; the number of corner cases that would have to be covered is just massive.  But attempting the install and only failing at the end, leaving the system unusable, when there is an easily tested condition likely to cause a failure, isn't really good practice at all.
Comment 54 Alex Tunc 2012-03-18 21:48:17 EDT
I had the same problem when I upgraded to F16. Very annoying.
Comment 55 Milan Kerslager 2012-03-20 06:02:22 EDT
Bug #782144 solves the problem for me.
Comment 56 Kamil Páral 2012-04-15 15:56:19 EDT
I have hit this while testing
https://fedoraproject.org/wiki/QA:Testcase_dualboot_with_windows

NTFS partition usually starts at sector 63, I have shrunk it prior to installation. I placed my /boot partition inside LVM. Anaconda didn't warn me in advance, but failed to install grub (core.img had 33kB).

This has been rejected as F16 Blocker. Re-proposing for F17. The criterion is:
"The installer must be able to install into free space alongside an existing clean single-partition Windows installation and either install a bootloader which can boot into the Windows installation, or leave the Windows bootloader untouched and working"

Rationale for re-proposal:
1. We have a new "Windows dual-boot" criterion now.
2. You don't have to use RAID, placing your /boot inside LVM is enough (that was probably not known before?). Fortunately it is not the default (the default layout would most probably work), but it might be a common thing to change (I did it, and I was not experimenting, that was my private system).
3. After installation your PC is left in completely unbootable state.
4. It's easy to say "shift your partition to the right" in CommonBugs, but in Windows case this leaves Windows unbootable and you have to use rescue CD to fix it (another obstacle for inexperienced users).
5. This bug might be difficult to "fix" (core.img seems to shrink and grow periodically), but it can be mitigated - if anaconda detects your first partition starts at sector 63 (or e.g. <50kB) and you are using RAID or you have placed /boot inside LVM, it can warn you and explain the situation ("don't use RAID, create /boot as a standard partition or make sure you have 1MB of space in front of your first partition, or your system might not boot"). A simple warning would save me so much time and nerve (I'll never again re-install system shortly before a "let's watch movie" date:-)).
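The mitigation suggested in point 5 could be sketched roughly as follows. This is only an illustration in Python, not anaconda code: the function names, parameters and the 50 kB default are hypothetical, chosen to mirror the numbers in this comment, and 512-byte sectors are assumed.

```python
SECTOR_SIZE = 512  # bytes per sector; an assumption that matches the disks in this bug


def embedding_gap_bytes(first_partition_start_sector):
    """Bytes between the MBR (sector 0) and the first partition."""
    # Sector 0 holds the MBR itself, so the usable sectors are 1 .. start-1.
    return max(first_partition_start_sector - 1, 0) * SECTOR_SIZE


def needs_embedding_warning(first_partition_start_sector, boot_on_raid,
                            boot_on_lvm, min_gap_bytes=50 * 1000):
    """Return True when the installer should warn before going any further.

    Hypothetical heuristic: warn only if /boot needs extra grub modules
    (RAID or LVM) and the gap in front of the first partition is small.
    """
    gap = embedding_gap_bytes(first_partition_start_sector)
    return (boot_on_raid or boot_on_lvm) and gap < min_gap_bytes
```

With a first partition at sector 63 the gap is 62 sectors, so a /boot on LVM triggers the warning, while a plain /boot partition or a first partition at sector 2048 does not.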

Re-assigning to anaconda because it seems there's not much that can be done in grub itself.
Comment 57 Mads Kiilerich 2012-04-15 19:35:56 EDT
My 5 cents:

/boot on LVM/RAID/btrfs mostly works with grub2, but with some caveats. One of them is that grub then considers grubenv read-only (for safety reasons) (http://www.gnu.org/software/grub/manual/grub.html#Environment-block ), and grub2-reboot will thus change the default kernel permanently (if it works at all). For example pm suspend-to-disk and preupgrade and GRUB_SAVEDEFAULT might thus not work completely as expected.

/boot on anything but a bare-bone device might thus deserve a warning in general, and it might not be feasible to "support" such setups.
Comment 58 Adam Williamson 2012-04-20 14:01:49 EDT
Discussed at 2012-04-20 blocker review meeting - http://meetbot.fedoraproject.org/fedora-bugzappers/2012-04-20/fedora-bugzappers.2012-04-20-17.01.log.txt . Again rejected as a blocker - we considered Kamil's new reasoning, but we don't think it tips the balance enough. A _default_ install alongside Windows, with a separate /boot, would still be okay, and we still can't _fix_ any other case, only provide some kind of information.

Anaconda team, it would definitely be nice to improve the feedback in this case, though.
Comment 59 Kamil Páral 2012-04-23 12:20:58 EDT
(In reply to comment #57)
> grub2-reboot will thus change the default kernel permanently (if it works at
> all). For example pm suspend-to-disk and preupgrade and GRUB_SAVEDEFAULT might
> thus not work completely as expected.

I can confirm grub's 'savedefault' command doesn't work if /boot is on LVM. Value stored by grub2-reboot is therefore never erased. If it is a boot-once menu item, it doesn't really matter, because on subsequent boot grub ignores nonexistent default boot item and selects the first one instead. But you are right we might get into various trouble here, which I can't yet imagine. A warning in anaconda about such setups would be very proper in this case.
Comment 60 Bob Gustafson 2012-06-19 12:08:10 EDT
Congratulations. I was able to upgrade two systems from F16 to F17 with a minimum of problems.

Previously - going to a new Fedora system, I usually wound up wiping the disks and repartitioning during anaconda install. This time, I just clicked on Upgrade (thinking that I would have a chance to say no later on..). Anaconda just swept me along until I had a fully installed Upgrade over my existing partitions and raid/lvm configuration.

My systems both have raid /boot and raid / One system also has a raid swap, the other has swap inside the / LVM. I also had a USB storage stick on both systems. Anaconda seemed to sort out that wrinkle successfully. Congrats again.

One of the systems has an extra sliver of bios_grub storage, the other does not.

All of the partition oddities are a result of my previous F15, F16 installs.

System info is below:

Bigger system:
[root@hoho6 ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sda2[0] sdb2[1]
      511988 blocks super 1.0 [2/2] [UU]
      
md2 : active raid1 sda4[0] sdb4[1]
      478206840 blocks super 1.2 [2/2] [UU]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md1 : active raid1 sda3[0] sdb3[1]
      9214904 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
[root@hoho6 ~]# 

[root@hoho6 ~]# df
Filesystem                    1K-blocks     Used Available Use% Mounted on
rootfs                        477668904 49947296 403812440  12% /
devtmpfs                        4078208        8   4078200   1% /dev
tmpfs                           4089132      140   4088992   1% /dev/shm
tmpfs                           4089132     1344   4087788   1% /run
/dev/mapper/vg_hoho6-LogVol00 477668904 49947296 403812440  12% /
tmpfs                           4089132        0   4089132   0% /sys/fs/cgroup
tmpfs                           4089132        0   4089132   0% /media
/dev/md0                         508733   109592    373542  23% /boot
/dev/sdd1                      15611920   277456  15334464   2% /run/media/user1/A32A-7758
[root@hoho6 ~]# 

[root@hoho6 ~]# parted
GNU Parted 3.0
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: ATA ST3500418AS (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name           Flags
 1      1049kB  2097kB  1049kB               bios           bios_grub
 2      2097kB  526MB   524MB   ext4         software RAID  boot
 3      526MB   9964MB  9437MB                              raid
 4      9964MB  500GB   490GB                               raid

(parted)q           

[root@hoho6 user1]# free -t -m
             total       used       free     shared    buffers     cached
Mem:          7986       3697       4288          0        327       1702
-/+ buffers/cache:       1668       6318
Swap:         8998          0       8998
Total:       16985       3697      13287
[root@hoho6 user1]# 

================================
Smaller system:
[root@hoho0 ~]# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb1[0] sda1[1]
      511988 blocks super 1.0 [2/2] [UU]
      
md1 : active raid1 sdb2[0] sda2[1]
      487872380 blocks super 1.1 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

unused devices: <none>
[root@hoho0 ~]# df
Filesystem                    1K-blocks     Used Available Use% Mounted on
rootfs                        478094888 12989132 441175288   3% /
devtmpfs                         997328        4    997324   1% /dev
tmpfs                           1008248      104   1008144   1% /dev/shm
tmpfs                           1008248     1172   1007076   1% /run
/dev/mapper/vg_hoho2-LogVol01 478094888 12989132 441175288   3% /
tmpfs                           1008248        0   1008248   0% /sys/fs/cgroup
tmpfs                           1008248        0   1008248   0% /media
/dev/md0                         508733   109298    373836  23% /boot
/dev/sdc1                       1973952  1949824     24128  99% /run/media/user1/TOSTRA2008
[root@hoho0 ~]# parted
GNU Parted 3.0
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p                                                                
Model: ATA ST3500418AS (scsi)
Disk /dev/sda: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags: 

Number  Start   End    Size   Type     File system  Flags
 1      1049kB  525MB  524MB  primary  ext4         boot, raid
 2      525MB   500GB  500GB  primary               raid

(parted)q
                                                                  
[root@hoho0 ~]# free -t -m
             total       used       free     shared    buffers     cached
Mem:          1969       1076        892          0         35        457
-/+ buffers/cache:        583       1385
Swap:         9023          0       9023
Total:       10993       1076       9916
[root@hoho0 ~]# 


I did not do any pre-upgrade action.


The only thing I had to do was do grub2-install on both raid disks sda and sdb. One of the systems rebooted successfully from anaconda before this step, the other did not and I had to use the install DVD to recover.

The partition list does not seem to label swap as swap ??

Any suggestions for post-install cleanup and/or sanity checks?
Comment 61 Martin Wilck 2012-10-14 08:17:27 EDT
I wonder if anything has been done to solve this issue in F18 (or upcoming RHEL7)? Just rejecting this as a blocker isn't enough. 

63 sectors/track has been the standard disk geometry for years. Being able to boot from LVM, MD-RAID, and non-standard file systems has been one of the major advocacy items for GRUB2. It's a legitimate user expectation that this works out-of-the-box on standard systems. If it doesn't, at the very least anaconda must detect that and quit before installing or updating.

It's no fun to see grub2 and anaconda people finger-point at each other. A problem as serious as this one would merit a serious joint solution effort.

On the GRUB2 side, it should be possible to decrease code size further in order to make sure "common" scenarios fit in 63 sectors. (In my case biosdisk+ext2+part_msdos+lvm add up to 33187 bytes, not a lot more than the 32256 bytes that 63 sectors hold).

On the anaconda side, after the bootloader configuration is finished, anaconda could determine the required core.img size by running grub2 tools, e.g. by running "grub2-install --debug --grub-setup=/bin/false" and checking the size of the resulting core.img file. Alternatively, a table of core.img sizes for common combinations of disk/file system/partition table/device abstraction modules (generated with grub2-mkimage) could be built into anaconda, so that setups with core.img size exceeding the available space could be excluded early during installation.
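As a rough illustration of the size comparison proposed here, the Python sketch below checks a core.img size against the sectors in front of the first partition. All names are hypothetical, 512-byte sectors are assumed, and it assumes core.img only has to fit as-is; the real grub2 tools may need additional room.

```python
import math
import os

SECTOR_SIZE = 512  # bytes per sector; an assumption


def sectors_needed(core_img_bytes):
    """Whole sectors required to hold a core.img of the given size."""
    return math.ceil(core_img_bytes / SECTOR_SIZE)


def core_img_fits(core_img_bytes, first_partition_start_sector):
    """True if core.img fits between the MBR and the first partition."""
    # Sector 0 is the MBR, so sectors 1 .. start-1 are available.
    available = max(first_partition_start_sector - 1, 0)
    return sectors_needed(core_img_bytes) <= available


def core_img_file_fits(core_img_path, first_partition_start_sector):
    """Same check, reading the size from a generated core.img file."""
    return core_img_fits(os.path.getsize(core_img_path),
                         first_partition_start_sector)
```

For the 33187-byte image mentioned above, `sectors_needed` gives 65, which does not fit in front of a partition starting at sector 63 but fits easily when the partition starts at sector 2048.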
Comment 62 Adam Williamson 2012-10-15 19:09:59 EDT
if anything is done, it should be noted here.

I think the general anaconda thinking here is that some kind of test/notification would be nice but isn't actually a high priority because it doesn't achieve a whole lot.

let's consider a case where grub2 just won't fit. anaconda can't fix this. the user cannot fix it from within anaconda. so the only options we could give you are 'bail out' or 'continue regardless and you can try to fix up the bootloader later'. What we actually _do_ is the second, so all we are losing by not having a notification, fundamentally, is the opportunity to do the first.

the consequences of completing the fedora install in such a case are not catastrophic: the existing MBR contents won't be overwritten, the operation just fails. so you get a technically-complete Fedora install, and whatever you previously had in the MBR. this isn't a terrible situation so far as I can see. if you just can't get it to boot, you can just wipe the Fedora install and try something else. all you've really lost is however much time it took to do the Fedora install.

is there something I'm missing in this picture which would make a notification more useful?
Comment 63 Wolfgang Denk 2012-10-16 02:57:55 EDT
(In reply to comment #62)
> 
> I think the general anaconda thinking here is that some kind of
> test/notification would be nice but isn't actually a high priority because
> it doesn't achieve a whole lot.

Please just look at the long list of people who try to make clear how
big a problem this is for them.  I cannot really understand why this
is still rated as low priority issue.

> let's consider a case where grub2 just won't fit. anaconda can't fix this.
> the user cannot fix it from within anaconda. so the only options we could
> give you are 'bail out' or 'continue regardless and you can try to fix up
> the bootloader later'. What we actually _do_ is the second, so all we are
> losing by not having a notification, fundamentally, is the opportunity to do
> the first.

We should always bail out with a clear error message _before_ messing
up a working configuration.

> the consequences of completing the fedora install in such a case are not
> catastrophic: the existing MBR contents won't be overwritten, the operation
> just fails. so you get a technically-complete Fedora install, and whatever
> you previously had in the MBR. this isn't a terrible situation so far as I
> can see. if you just can't get it to boot, you can just wipe the Fedora
> install and try something else. all you've really lost is however much time
> it took to do the Fedora install.

You completely ignore that this is not only a problem for installation
from scratch, where indeed you lose only some time for the
installation (which is still a major pain if you run into this again
and again).

This same problem happens when you upgrade from a previous version of
Fedora, i. e. you have a perfectly working system before, and a broken
one after.  Proposing that people will "just wipe the Fedora install
and try something else" indicates that you either did not understand
the scope of the problem or that you ignore the amount of effort this
problem costs a lot of admins.
Comment 64 Martin Wilck 2012-10-16 08:14:44 EDT
(In reply to comment #62)

> let's consider a case where grub2 just won't fit. anaconda can't fix this.
> the user cannot fix it from within anaconda. so the only options we could
> give you are 'bail out' or 'continue regardless and you can try to fix up
> the bootloader later'. What we actually _do_ is the second, so all we are
> losing by not having a notification, fundamentally, is the opportunity to do
> the first.

You overlook the possibility of displaying an error message and sending the user back to the partitioning/boot loader setup, giving her suggestions on how to create a setup that will work (e.g. by using a different root/boot file system, creating a boot partition which isn't on LVM/MD, etc.).
Comment 65 Adam Williamson 2012-10-16 15:29:48 EDT
wolfgang: there's no need to be insulting. I specifically said that I was explaining how the issue looked to me in order for you to add anything I was missing. Adding what you wanted to add would be sufficient, there is no need to question my motivation or say I 'don't understand' or am 'ignoring' things. If I was ignoring things I wouldn't be replying.

I think the important use cases have been explained now, so I don't have anything further to add, it is up to the anaconda team. I'm only trying to help out.
Comment 66 Wolfgang Denk 2012-10-17 05:13:09 EDT
(In reply to comment #65)
> wolfgang: there's no need to be insulting. I specifically said that I was
> explaining how the issue looked to me in order for you to add anything I was
> missing. Adding what you wanted to add would be sufficient, there is no need
> to question my motivation or say I 'don't understand' or am 'ignoring'
> things. If I was ignoring things I wouldn't be replying.

Dear Adam, it has never been my intention to be insulting.  If you
read this out of my words, then I formally apologize.

But I have to admit that I still have problems to understand your
assessment that "the consequences of completing the fedora install in
such a case are not catastrophic" and your suggestion to "just wipe
the Fedora install and try something else" if you were really aware of
the scope of the problem, i. e. how many people were hit by this, and
how much time they lost - and this continues to happen as this issue
is still unfixed.

> I think the important use cases have been explained now, so I don't have
> anything further to add, it is up to the anaconda team. I'm only trying to
> help out.

Thanks.
Comment 67 Mads Kiilerich 2012-11-05 08:44:54 EST
Upstream comment on http://lists.gnu.org/archive/html/grub-devel/2012-11/msg00004.html 

Anaconda should probably just warn if there is less than 1 MB available, giving the user an option for taking a chance continuing.

The problem will go away with UEFI.
Comment 68 Martin Wilck 2012-11-05 09:17:04 EST
Less than *1M* !! That's 2048 sectors, or 32x larger than the usual "first cylinder" that has been sufficient for every boot loader before grub2.

Sincerely, that makes me think that grub2 has a *serious* design flaw.

And there'll be many non-uEFI systems around, for some years to come.
Comment 69 Mads Kiilerich 2012-11-05 10:14:57 EST
1M - that is 0.0001% of a typical modern hard drive. Do you really care?

If you do some research you will find a lot of good reasons to start the first partition at a higher sector than what made sense 20 years ago.

Personally I find it quite impressive that it is possible to bootstrap LVM, raids, zfs, btrfs and so on in less than 100k. Arguably it is a brain design flaw that people expect to be able to have a /boot that doesn't use antique technology ... but that is how things are evolving.

Anyway, no amount of complaints are going to make a positive difference. If you prefer grub 1 then dig it up where the experts left it and start maintaining it.
Comment 70 Adam Williamson 2012-11-05 13:14:30 EST
that comment is spectacularly unhelpful, let's face it. it adds nothing to our understanding of or approach to the issue. can we stop going around in circles here? it's not helping anything.
Comment 71 Ron Gonzalez 2012-11-05 15:48:16 EST
(In reply to comment #70)
> that comment is spectacularly unhelpful, let's face it. it adds nothing to
> our understanding of or approach to the issue. can we stop going around in
> circles here? it's not helping anything.

Do you know what the actual consensus is then on this particular issue?  There seems to be a lot of back and forth, but has an actual decision been made on a potential compromise or an agreeable course of action/fix?
Comment 72 Adam Williamson 2012-11-05 16:49:30 EST
anaconda team's position is that the best they can do for this is some sort of check/warning, they can't make it actually work, so it isn't a high priority issue. wolfgang made some reasonable arguments against this, anaconda team has not yet responded. I do not know what upstream's position is; probably 'we're already making it as small as we can'.

when i said 'that comment', btw, to be clear, i was referring to the comment from upstream that mads linked to. posting a warning any time the embedding space is smaller than 1MB would be completely unworkable.
Comment 73 David Lehman 2012-11-05 17:02:17 EST
I'm not opposed to giving a warning when the first partition starts before 1MiB and the rootfs (or is it /boot?) is on a device type other than partition.
Comment 74 David Lehman 2012-11-05 17:07:54 EST
(In reply to comment #57)
> My 5 cents:
> 
> /boot on LVM/RAID/btrfs mostly works with grub2, but with some caveats. One
> of them is that grub then considers grubenv read-only (for safety reasons)
> (http://www.gnu.org/software/grub/manual/grub.html#Environment-block ), and
> grub2-reboot will thus change the default kernel permanently (if it works at
> all). For example pm suspend-to-disk and preupgrade and GRUB_SAVEDEFAULT
> might thus not work completely as expected.
> 
> /boot on anything but a bare-bone device might thus deserve a warning in
> general, and it might not be feasible to "support" such setups.

Mads, to what extent is the above still true in f18?
Comment 75 Mads Kiilerich 2012-11-05 17:33:58 EST
(In reply to comment #73)
> I'm not opposed to giving a warning when the first partition starts before
> 1MiB and the rootfs (or is it /boot?)

It is /boot that matters - that is all grub uses to locate vmlinuz and initramfs which then are responsible for locating / . (grub.cfg might also reference stuff on / , but in that case it will use modules dynamically loaded from /boot/grub2 - they do not have to be in the first "stage".)

> is on a device type other than
> partition.

I guess that for instance /boot on btrfs requires more than 31k. A better heuristic might be that only plain ext2/3/4 fits in less than 1 MiB.

The sizes are quite stable, so it might be possible to do some testing and get useful numbers from looking at the size of /boot/grub2/i386-pc/core.img .

(In reply to comment #74)
> > /boot on LVM/RAID/btrfs mostly works with grub2, but with some caveats. One
> > of them is that grub then considers grubenv read-only (for safety reasons)
> > (http://www.gnu.org/software/grub/manual/grub.html#Environment-block ), and
> > grub2-reboot will thus change the default kernel permanently (if it works at
> > all). For example pm suspend-to-disk and preupgrade and GRUB_SAVEDEFAULT
> > might thus not work completely as expected.
> > 
> > /boot on anything but a bare-bone device might thus deserve a warning in
> > general, and it might not be feasible to "support" such setups.
> 
> Mads, to what extent is the above still true in f18?

Nothing changed.

I guess the only stable solution would be to implement support for storing grubenv in "the gap" or on ESP.

(grubenv does not work anyway, however, because of Bug 768106 'grubby does not support grub2 set default="${saved_entry}" and replaces with "0"'. And the grubenv concept is incompatible with the concept of grubby --default-kernel ... unless grubby starts wrapping grub2-tools instead of patching grub.cfg directly. It is a mess all the way down...)
Comment 76 Adam Williamson 2012-11-05 19:40:16 EST
do we even give 1MB embedding space *now*? how much space do we leave before the first partition in recent anaconda builds?
Comment 77 David Lehman 2012-11-05 20:55:37 EST
(In reply to comment #76)
> do we even give 1MB embedding space *now*? how much space do we leave before
> the first partition in recent anaconda builds?

If the kernel got hardware-specific alignment data we'll use that. If the kernel doesn't provide any alignment info, parted defaults to a 1MiB grain size. That means that the start sector of 63 gets rounded up to 2048, leaving 2047 sectors for grub2's fat arse.
Comment 78 David Lehman 2012-11-05 20:58:47 EST
Sorry -- 1985 sectors for grub2 to squeeze into.
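The rounding described in comment 77 can be written out as a few lines of Python. This is just the arithmetic, assuming 512-byte sectors; `align_up` is a hypothetical helper, not a parted or anaconda function.

```python
SECTOR_SIZE = 512          # bytes per sector; an assumption
MIB = 1024 * 1024          # bytes in 1 MiB


def align_up(sector, grain_sectors):
    """Round a sector number up to the next multiple of the grain size."""
    return -(-sector // grain_sectors) * grain_sectors


grain = MIB // SECTOR_SIZE            # 1 MiB grain expressed in sectors
aligned_start = align_up(63, grain)   # where a start sector of 63 ends up
gained = aligned_start - 63           # extra sectors compared with starting at 63
```

Here `grain` is 2048, `aligned_start` is 2048, and `gained` is 1985, which is where the corrected figure comes from.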
Comment 79 David Lehman 2012-11-05 21:06:52 EST
(In reply to comment #75)
> (In reply to comment #73)
> > is on a device type other than
> > partition.
> 
> I guess that for instance /boot on btrfs requires more than 31k. A better
> heuristic might be that only plain ext2/3/4 fits in less than 1 MiB.

How about xfs? I actually meant to include btrfs in the "device type other than partition" since in practice it has all the complexity of lvm.
Comment 80 Wolfgang Denk 2012-11-06 02:18:49 EST
(In reply to comment #75)
> (In reply to comment #73)
> > I'm not opposed to giving a warning when the first partition starts before
> > 1MiB and the rootfs (or is it /boot?)
> 
> It is /boot that matters - that is all grub uses to locate vmlinuz and
> initramfs which then are responsible for locating / . (grub.cfg might also
> reference stuff on / , but in that case it will use modules dynamically
> loaded from /boot/grub2 - they do not have to be in the first "stage".)

Actually for this discussion /boot does not matter at all, nor any
other file system on the disk.

The only critical setting here is the start of the first partition on
the disk (whatever this may be used for).


It also appears to me that most people here completely forget that we
are not talking about how to set up a new system - this is easy. When
you partition a disk from scratch recent tools (fdisk, parted, ...)
will default to start sector 2048, leaving nearly 1 MiB free.


The critical situation is when we _update_ an existing, working
system.  In this case the update should either abort with a clear
error message, or complete such that the new system will boot again.

What I'm complaining about is this totally ignorant attitude (and
yes, I mean that, even if some people may be offended by such clear
words) of "come on, let's just try to install the new boot loader",
when we know exactly that for nearly _all_ older systems (with
partition 1 starting at sector 63) this will result in a broken box
that does not boot any more. But hey, what have we lost? ... "just
wipe the Fedora install and try something else"


If we cannot be sure that the update will work, then rather abort it.


If it has been decided that we need at least 1 MB before the first
partition on the boot disk, then just abort any attempt to update
systems which use a partitioning that does not meet such a
requirement.

Yes, this probably means that a large percentage of older systems
cannot be updated any more, but that would be a lot better than
bricking previously working systems without any hint of the risk
or a good way for recovery.
Comment 81 Mads Kiilerich 2012-11-06 05:53:10 EST
(In reply to comment #80)
> Actually for this discussion /boot does not matter at all, nor any
> other file system on the disk.

It does. The first "core.img" part of the boot loader must contain file system drivers for reading the file system on /boot.

> It also appears to me that most people here completely forget that we
> are not talking about how to set up a new system

FWIW, we are also talking about installing a new system on a disk that already is in use - for example by a recovery partition or windows.
Comment 82 Wolfgang Denk 2012-11-06 06:07:28 EST
(In reply to comment #81)
> (In reply to comment #80)
> > Actually for this discussion /boot does not matter at all, nor any
> > other file system on the disk.
> 
> It does. The first "core.img" part of the boot loader must contain file
> system drivers for reading the file system on /boot.

Agreed, but this is totally irrelevant to this bug.

> > It also appears to me that most people here completely forget that we
> > are not talking about how to set up a new system
> 
> FWIW, we are also talking about installing a new system on a disk that
> already is in use - for example by a recovery partition or windows.

If the "disk is already in use" this is not exactly what I call a new
system.  But that's nitpicking.


The question that is relevant to this bug is if there is any existing
partitioning on the disk that cannot be easily changed (like not
without losing data), and where the first used sector of any such
partition is.  If there is not enough free space before the first used
data to successfully install grub2, the installation should abort
with a clear error indication _before_ messing with the existing
boot setup.
Comment 83 Martin Wilck 2012-11-06 06:49:13 EST
Here is a proposal for anaconda:

If the /boot file system is not ext2/3/4, or is on LVM/MD/whatever, and the available embedding space is smaller than X, abort early on (after the partitioning stage this information is available).

From comment #67, X = 2047 sectors.

I personally would prefer to see X somewhat smaller, but if it's consensus that 1MiB is required for the general case, so what.
Comment 84 Ron Gonzalez 2012-11-06 07:42:23 EST
(In reply to comment #80)
> (In reply to comment #75)
> > (In reply to comment #73)
> > > I'm not opposed to giving a warning when the first partition starts before
> > > 1MiB and the rootfs (or is it /boot?)
> > 
> The critical situation is when we _update_ an existing, working
> system.  In this case the update should either abort with a clear
> error message, or complete such that the new system will boot again.

This is actually common sense.  An upgrade should be an upgrade that works: if the upgrade hits a fatal error (which this is) then it should fail and let you know why it failed, not act as if everything is OK and then brick the previously working system.
Comment 85 David Lehman 2012-11-06 10:35:42 EST
If it's a warning, some people will plow on through and then complain that it should have been an error when their box doesn't boot.

If it's an error, some people will complain that it would have worked because in their case grub2 only needed 1500 sectors instead of 1985.

Are there really cases other than 63 and 2048 as the start sector for the first partition?
Comment 86 Ron Gonzalez 2012-11-06 12:31:50 EST
(In reply to comment #85)
> If it's a warning, some people will plow on through and then complain that
> it should have been an error when their box doesn't boot.

These people would have no right to complain.  The product is working as designed, the point is to warn people that the system will not boot before making the changes.
 
> If it's an error, some people will complain that it would have worked
> because in their case grub2 only needed 1500 sectors instead of 1985.

This situation could be handled easily: simply give users the option to continue.  If you know what you're doing, then you know what you're doing.
Comment 87 Adam Williamson 2012-11-07 12:10:00 EST
dlehman: I think we've established as a principle that the answer to 'is there _really_ some idiot out there doing <completely non-standard thing>?' is always 'yes'.
Comment 88 Brian Lane 2012-11-08 13:51:33 EST
I've added a check that limits you to using extX on partitions if the embed space is smaller than 512K. Give it a try and see how it works for you.

http://bcl.fedorapeople.org/updates/737508.img

It is against current Anaconda master, so use the smoke16 images when testing:

https://dl.fedoraproject.org/pub/alt/qa/18/20121107_f17b-smoke16/
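
For reference, the arithmetic behind the 512K threshold looks roughly like this. It is a sketch only, not the code in the updates image: the function names are made up for illustration, and it assumes 512-byte sectors and an MBR disk where the embed space runs from sector 1 up to the start of the first partition.

```python
SECTOR_SIZE = 512                     # bytes, assumed
EMBED_THRESHOLD_BYTES = 512 * 1024    # the 512K limit mentioned above


def embed_bytes(first_partition_start: int, sector_size: int = SECTOR_SIZE) -> int:
    # Sector 0 is the MBR; the gap is sectors 1 .. first_partition_start - 1.
    return max(first_partition_start - 1, 0) * sector_size


def boot_restricted_to_ext(first_partition_start: int) -> bool:
    # True when the gap is below the threshold, i.e. /boot would be
    # limited to extX on a plain partition.
    return embed_bytes(first_partition_start) < EMBED_THRESHOLD_BYTES


for start in (63, 2048):
    print(start, embed_bytes(start), boot_restricted_to_ext(start))
```

A first partition at sector 63 leaves 62 sectors (31744 bytes), so the restriction applies; a first partition at sector 2048 leaves 2047 sectors (1048064 bytes), so it does not.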
Comment 89 Martin Wilck 2012-11-09 04:31:11 EST
(In reply to comment #88)
> I've added a check that limits you to using extX on partitions if the embed
> space is smaller than 512K.

Thanks, but is that really what's needed? This test will exclude many totally valid configurations, while possibly not catching some problematic ones (e.g. ext3 on LVM/MD/DM).
Comment 90 Brian Lane 2012-11-09 09:41:47 EST
As long as you have more than 512K of space you can do whatever you want.

The problem, as has been stated previously, is that we don't want to be in the business of adjusting the size every time grub2 changes. The goal here is to prevent users from trashing their system by doing an install that may not work and to let them know up-front instead of waiting until they have partitioned things.

The *right* way to fix this is to have grub2 tell us how much space it needs. I'll be exploring that for F19.
Comment 91 Milan Kerslager 2012-11-09 12:23:08 EST
Yes, this is a great idea to make them cooperate (Grub2 <-> Anaconda).
But there is something broken in Grub2 development - Grub2 should fit to as little space for basic setup as previous version does (at least).
Comment 92 Fedora Update System 2012-11-09 19:56:16 EST
anaconda-18.28-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/FEDORA-2012-17823/anaconda-18.28-1.fc18
Comment 93 Fedora Update System 2012-11-10 14:38:48 EST
Package anaconda-18.28-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-18.28-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-17823/anaconda-18.28-1.fc18
then log in and leave karma (feedback).
Comment 94 Mads Kiilerich 2012-11-10 15:58:21 EST
(In reply to comment #91)
> But there is something broken in Grub2 development - Grub2 should fit to as
> little space for basic setup as previous version does (at least).

I disagree. Software evolves. It receives new features (and bug fixes). That usually makes the software bigger (and slower). Hardware improvements will in general more than compensate for that, and the end result will thus in general be computers that can do more, faster and better.

If you don't want software that evolves then keep using the old versions, perhaps conservatively backporting bugfixes.

(In reply to comment #90)
> The *right* way to fix this is to have grub2 tell us how much space it
> needs. I'll be exploring that for F19.

That would require no longer using the grub2 functionality where it probes the current setup and creates a bootloader with just the right modules. Then it would be possible to check the available space up front, before actually partitioning, formatting and installing. grub2-efi already does something like that when it ships pre-built boot loaders with 'enough' drivers. IIRC they are > 500 KiB.
Comment 95 Martin Wilck 2012-11-12 12:13:26 EST
(In reply to comment #94)
> (In reply to comment #91)
> > But there is something broken in Grub2 development - Grub2 should fit to as
> > little space for basic setup as previous version does (at least).
> 
> I disagree. Software evolves. It receives new features (and bug fixes). That
> usually makes the software bigger (and slower). Hardware improvements will
> in general more than compensate for that and the end result will thus in
> general be computers that can do more faster and better.

A boot loader that needs to fit in the limited space of an existing partition table is a prime example of a case where this isn't true. I'm sure GRUB2 developers are aware of this constraint on their code's evolution.

> > The *right* way to fix this is to have grub2 tell us how much space it
> > needs. I'll be exploring that for F19.
> 
> That would require no longer using the grub2 functionality where it probes
> the current setup and creates a bootloader with just the right modules. 

Why that? This check would be made after the user set up partitions, LVs, and filesystems. All necessary data would be available. grub2-mkimage would need some sort of "installation mode" where it would gather the data by different methods than normally.
Comment 96 Fedora Update System 2012-12-20 10:11:03 EST
anaconda-18.28-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.