Description of problem:
During a PXE installation, grub-install fails to install the bootloader (grub2-install --no-floppy /dev/sda):

<snip>
test -e /boot/grub2
+ :
+ /sbin/grub2-probe -t fs /boot/grub2
/usr/share/grub/grub-mkconfig_lib: line 53: 17073 Segmentation fault "${grub_probe}" -t fs "$path" > /dev/null 2>&1
+ return 1
</snip>

A trace of the process shows thousands of calls like this:

open("/dev/sda", O_RDONLY) = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 0), ...}) = 0
ioctl(3, BLKGETSIZE64, 1000204886016) = 0
ioctl(3, BLKSSZGET, 512) = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 0), ...}) = 0
close(3) = 0
open("/dev/sda", O_RDONLY|O_SYNC) = 3
lseek(3, 1000204877824, SEEK_SET) = 1000204877824
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
lseek(3, 1000204885504, SEEK_SET) = 1000204885504
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
lseek(3, 1000204884992, SEEK_SET) = 1000204884992
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
close(3) = 0
open("/dev/sdb", O_RDONLY) = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 16), ...}) = 0
ioctl(3, BLKGETSIZE64, 1000204886016) = 0
ioctl(3, BLKSSZGET, 512) = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 16), ...}) = 0
close(3) = 0
open("/dev/sdb", O_RDONLY|O_SYNC) = 3
lseek(3, 1000204877824, SEEK_SET) = 1000204877824
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
lseek(3, 1000204885504, SEEK_SET) = 1000204885504
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
lseek(3, 1000204884992, SEEK_SET) = 1000204884992
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
close(3)

until the process finally fails with a segfault:

open("/dev/sda", O_RDONLY) = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 0), ...}) = 0
ioctl(3, BLKGETSIZE64, 1000204886016) = 0
ioctl(3, BLKSSZGET, 512) = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 0), ...}) = 0
close(3)
--- {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7fff3df3cfe8} (Segmentation fault) ---
+++ killed by SIGSEGV +++
Segmentation fault

The fs-layout looks like this:

/dev/mapper/vg_system-root on / type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)
/dev/md0 on /boot type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)
/dev/mapper/vg_system-data on /data type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)
devtmpfs on /dev type devtmpfs (rw,nosuid,relatime,seclabel,size=24714840k,nr_inodes=6178710,mode=755)
/dev/devpts on /dev/pts type devpts (rw,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
/dev/tmpfs on /dev/shm type tmpfs (rw,relatime,seclabel)
/dev/proc on /proc type proc (rw,relatime)
/dev/sysfs on /sys type sysfs (rw,relatime,seclabel)
/dev/mapper/vg_system-var on /var type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)

The relevant part of the kickstart looks like this:

clearpart --all --initlabel
# Boot
part raid.01 --size=250 --ondrive=sda --asprimary
part raid.02 --size=250 --ondrive=sdb --asprimary
# Rest for System
part raid.03 --size=953618 --ondrive=sda
part raid.04 --size=953618 --ondrive=sdb
# Assembling
raid /boot --device=md0 --level=1 raid.01 raid.02
raid pv.system --device=md1 --level=1 raid.03 raid.04
# LVM
volgroup vg_system --pesize=32768 pv.system
logvol swap --size=8192 --name=swap --vgname=vg_system
logvol /var --size=5120 --name=var --vgname=vg_system
logvol / --size=10240 --name=root --vgname=vg_system
logvol /data --size=100 --grow --name=data --vgname=vg_system
bootloader --location=mbr

Version-Release number of selected component (if applicable):
grub2-1.99-19.fc17.x86_64
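The offsets in the trace are easier to read once they are expressed relative to the end of the disk. The following is a minimal sketch that uses only numbers copied from the strace output; the helper name distance_from_end is made up for illustration. Every read lands within the last 8 KiB of the device, which would be consistent with a scan for RAID or partition metadata kept at the end of a disk, although the trace alone does not say which format is being probed.

```python
# Values copied from the strace output in the description.
DISK_SIZE = 1000204886016      # bytes, reported by ioctl(BLKGETSIZE64)
SECTOR_SIZE = 512              # bytes, reported by ioctl(BLKSSZGET)
READ_OFFSETS = [1000204877824, 1000204885504, 1000204884992]  # lseek targets


def distance_from_end(offset, size=DISK_SIZE, sector=SECTOR_SIZE):
    """Return (bytes, sectors) between a read offset and the end of the disk."""
    gap = size - offset
    return gap, gap // sector


for off in READ_OFFSETS:
    gap_bytes, gap_sectors = distance_from_end(off)
    print(f"offset {off}: {gap_bytes} bytes ({gap_sectors} sectors) before the end")
```

The three reads are 16, 1 and 2 sectors before the end of the disk, and the same three reads repeat for sda and sdb on every pass of the loop.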
Can you do some testing and see if this is related to the PXE booting or to the disk layout? Can you reproduce the problem by running grub2-probe manually? If so, can you try
  yum install valgrind
  debuginfo-install grub2
  valgrind --log-file=probe.log /sbin/grub2-probe -t fs /boot/grub2
and attach probe.log?
I am having a problem installing to a raid partition because grub2-probe is segfaulting so the boot loader doesn't get installed. I will attach a screenshot of what was on the terminal. This bug is probably related to bug 788830.
Created attachment 581451 [details] screenshot of terminal 4 of the installer
(In reply to comment #2)
Samuel, are you seeing this when running from installer media? In addition to the information requested in comment 1, we would like to know whether the same crash happens with grub2 2.0beta4 or whether it has been fixed.
The version of grub2 on the installed system (I haven't rebooted yet) is grub2-2.0-0.24.beta4.fc17. I'm going to try valgrind as mentioned above. I can't reboot it because the installed system won't boot. Rescue mode can't even mount the system properly, but that's probably a different bug.
Using gdb it looks like it gets into an infinite loop doing a partition scan until it either runs out of memory or breaks the stack. I'll attach the valgrind log.
Created attachment 581476 [details] output from valgrind
Created attachment 581479 [details] screenshot of gdb showing loop
This bug is related to bug 750794. Having the md0 entry in /boot/grub2/device.map causes the loop. I commented out the line and I get a response from grub2-probe immediately with no segfault. Unless there's another bug for this issue, I suggest that it is a blocker.
Nice catch, Samuel. It _is_ possible that the special chroot setup used by the installer can make a difference - but I don't know if that is the case. It would be nice if you could get the system installed (by cleaning device.map up) and try to reproduce from a normal running system. The valgrind log does not show anything special - that is nice to know. Can you describe what makes your system special? You have /boot on raid1?
Proposing as F17 final blocker due to violation of the following F17 beta release criterion [1]:
The installer must be able to create and install to software, hardware or BIOS RAID-0, RAID-1 or RAID-5 partitions for anything except /boot
[1] http://fedoraproject.org/wiki/Fedora_17_Beta_Release_Criteria
Samuel, what does the bad device.map look like? What is your disk layout?
I can't actually get it installed yet. The / is on raid1, there is no separate /boot. I replaced the grub2-probe with a shell script to give me some time to replace the device.map before it ran. That worked for that part, but then it gave the same error I was getting when running grub2-install by hand. "source_dir doesn't exist". Not sure where to go from here...
The device.map contains:
# this device map was generated by anaconda
(hd0) /dev/cciss/c0d0
(hd1) /dev/cciss/c0d1
(md127) /dev/md127
The disk configuration is that I have two drives. On both, I have created a 2MB bios boot partition and the rest is a software raid partition. (It is using GPT.) I combined both raid partitions into a raid1 device and set it as /.
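The workaround reported earlier in this bug (commenting out the md line in device.map) can be sketched as a small text filter. This is only an illustration under stated assumptions: the function name comment_out_md_entries is hypothetical, the "starts with (md" test is a simplification chosen for this example, and the sample text is the device.map quoted in this comment. It is not what anaconda or grub2 do internally.

```python
# Sample content: the device.map quoted in this comment.
DEVICE_MAP = """\
# this device map was generated by anaconda
(hd0) /dev/cciss/c0d0
(hd1) /dev/cciss/c0d1
(md127) /dev/md127
"""


def comment_out_md_entries(text):
    """Comment out device.map lines whose drive name starts with 'md'."""
    out = []
    for line in text.splitlines():
        # An entry has the shape "(name) /dev/path"; existing comments start with '#'.
        if line.strip().startswith("(md"):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


print(comment_out_md_entries(DEVICE_MAP), end="")
```

On a real system the result would have to be written back to /boot/grub2/device.map before grub2-probe or grub2-install runs; the sketch deliberately leaves the file untouched.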
Can you attach the installer logs (anaconda.log, program.log, storage.log etc.) from your latest attempt? You can get to them by switching to VT2 - the logs live in /tmp.
(In reply to comment #15)
> (md127) /dev/md127
That looks like something that can easily cause a loop ... even though the stack trace looks a bit like it is looping on md0 ...
You can probably work around the problem by hacking grub2-install so the first thing it does is to remove device.map.
Upstream hints that the crash should be solved by
http://bzr.savannah.gnu.org/lh/grub/trunk/grub/revision/4251
http://bzr.savannah.gnu.org/lh/grub/trunk/grub/revision/4252
Sorry for the md confusion. The stack trace was from the initial install where it was md0. The device.map was from the upgrade attempt where it changed to md127. I will attach the logs.
Created attachment 581487 [details] anaconda.log
Created attachment 581488 [details] program.log
Created attachment 581489 [details] storage.log
(In reply to comment #17)
> Upstream hints that the crash should be solved by
> http://bzr.savannah.gnu.org/lh/grub/trunk/grub/revision/4251
> http://bzr.savannah.gnu.org/lh/grub/trunk/grub/revision/4252
Included in http://koji.fedoraproject.org/koji/taskinfo?taskID=4043138 - completely untested, but please give it a try ;-)
(In reply to comment #20)
> Created attachment 581488 [details]
> program.log
10:25:15,269 INFO program: Running... grub2-mkconfig -o /boot/grub2/grub.cfg
...
10:29:28,172 INFO program: Running... grub2-install --no-floppy /dev/cciss/c0d0
Anaconda, you should run grub2-install first, then grub2-mkconfig. grub2-mkconfig might look in /boot/grub2 to see what grub2-install placed there.
10:29:28,283 INFO program: source_dir doesn't exist. Please specify --target or --directory
That is strange. Please hack grub2-install line 868 - it should be "$source_dir". It should report /usr/lib/grub/i386-pc which should exist. Which of them is wrong?
(In reply to comment #11)
> Proposing as F17 final blocker due to violation of the following F17 beta
> release criterion [1]:
Setting the blocker flag.
> The installer must be able to create and install to software, hardware or BIOS
> RAID-0, RAID-1 or RAID-5 partitions for anything except /boot
In this case /boot _is_ on raid - I guess that makes a difference but am not sure. Anyway: If /boot on raid is untested and thus 'unsupported' and not recommended then anaconda should prevent such configurations.
I just tried installing using a separate non-raid /boot partition and I get the same problem. I wonder if the criterion should be updated to remove the /boot exception since the bootloader supports software raid now and hardware raid should be transparent unless it needs a special driver.
> 10:29:28,283 INFO program: source_dir doesn't exist. Please specify --target or
> --directory
>
> That is strange. Please hack grub2-install line 868 - it should be
> "$source_dir". It should report /usr/lib/grub/i386-pc which should exist. Which
> of them is wrong?
Not sure what you meant here. The file only has that many lines. I did find the line with the message and it's correct. The problem is that /proc/device-tree exists (but is empty) on this system, so it chooses i386-ieee1275 instead of i386-pc. However, this is a different issue from the main one here. I'll see about filing it separately.
I removed the line from device.map and specified the correct target to grub2-install and the system is now working.
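The target mix-up described above can be illustrated with a small sketch. It is an assumption-laden model, not grub2-install's actual code (grub2-install is a shell script, and the real fix is tracked separately): both function names are hypothetical, and the only facts taken from this comment are that an existing but empty /proc/device-tree led to i386-ieee1275 being chosen instead of i386-pc.

```python
import os
import tempfile


def target_as_reported(device_tree="/proc/device-tree"):
    """Model of the reported behaviour: mere existence selects ieee1275."""
    return "i386-ieee1275" if os.path.exists(device_tree) else "i386-pc"


def target_ignoring_empty(device_tree="/proc/device-tree"):
    """Hypothetical stricter check: an empty directory does not count."""
    if os.path.isdir(device_tree) and os.listdir(device_tree):
        return "i386-ieee1275"
    return "i386-pc"


# An empty temporary directory stands in for the spurious /proc/device-tree.
with tempfile.TemporaryDirectory() as empty_dir:
    print(target_as_reported(empty_dir))      # existence alone is enough here
    print(target_ignoring_empty(empty_dir))   # the empty directory is ignored
```

Passing the target explicitly to grub2-install, as done in this comment, sidesteps the detection altogether.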
Discussed at 2012-05-03 blocker review meeting. Accepted as a blocker per criterion "The installer must be able to create and install to any workable partition layout using any file system offered in a default installer configuration, LVM, software, hardware or BIOS RAID, or combination of the above" . pjones isn't sure it's not too system-specific to be a blocker, but there seem to be at least two reports. pjones would like to know if the bug is still reproducible with grub2 beta 4 (builds available in koji). -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers
The spurious /proc/device-tree and its consequences are tracked in Bug 818378. (I read the comments here as if the problem was seen with grub 2.0 beta4 too, but it hasn't been said explicitly.) But no matter whether grub can handle it or not: anaconda _does_ create invalid device.maps.
The problem comes from non-md devices being given md names. Fixes are in r4251 and r4252 upstream.
I thought the problem was the md devices being put in the device.map file causing an infinite loop. And yes, this is with grub 2.0 beta4.
Samuel, can you confirm that the rpm from comment 22 works around your problem - even with the incorrect entries in device.map ?
(In reply to comment #31)
> Samuel, can you confirm that the rpm from comment 22 works around your problem
> - even with the incorrect entries in device.map ?
I'm not Samuel, but I tried it, and this output appeared:
$ sudo grub2-install /dev/sda
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
/usr/sbin/grub2-bios-setup: warning: the drive name md0 in device.map is incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0' or `cd').
Installation finished. No error reported.
Now: about to reboot...
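The form quoted in the warning, [hfc]d[0-9]*, can be checked with a regular expression. A minimal sketch follows; the pattern is copied from the warning text, while the helper name is_accepted_name and the idea of matching the whole name are assumptions made for this example rather than a description of how grub2-probe parses device.map.

```python
import re

# Pattern copied from the warning text: "[hfc]d[0-9]*".
DRIVE_NAME = re.compile(r"[hfc]d[0-9]*")


def is_accepted_name(name):
    """True if the whole drive name matches the form quoted in the warning."""
    return DRIVE_NAME.fullmatch(name) is not None


for name in ("hd0", "cd", "fd1", "md0", "md127"):
    print(name, is_accepted_name(name))
```

Under this check hd0, cd and fd1 pass, while md0 and md127 do not, which matches the names the warning complains about.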
(In reply to comment #32)
> (In reply to comment #31)
> > Samuel, can you confirm that the rpm from comment 22 works around your problem
> > - even with the incorrect entries in device.map ?
>
> I'm not Samuel, but I tried it, and this output appeared:
> $ sudo grub2-install /dev/sda
> /usr/sbin/grub2-probe: warning: the drive name md0 in device.map is incorrect.
> Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]* (E.g. `hd0'
> or `cd').
...
> /usr/sbin/grub2-bios-setup: warning: the drive name md0 in device.map is
> incorrect. Using hostdisk//dev/md0 instead. Please use the form [hfc]d[0-9]*
> (E.g. `hd0' or `cd').
> Installation finished. No error reported.
>
> Now: about to reboot...
It didn't work. I saw the following:
error: no such device: 4d82ff0d-eb...........
Entering rescue mode...
grub rescue>
I am now running from a live F17 beta image. Would love to be able to fix this (so I can boot from my hard disk), but I have lots of other work to do at the moment.
(In reply to comment #33)
> It didn't work. I saw the following:
> error: no such device: 4d82ff0d-eb...........
> Entering rescue mode...
> grub rescue>
>
> I am now running from a live F17 beta image.
> Would love to be able to fix this (so I can boot from my hard disk),
> but I have lots of other work to do.
> at the moment.
Okay, after converting the RAID 1 /boot partition to two ext4 partitions, grub2-install works, with grub2-2.0-0.24.beta4.fc17.x86_64.rpm. I've booted F17 from my disk for the first time. Grub 1 was happy with RAID1 /boot; looks like a regression to me.
I have grub2 working with a RAID1 / with no separate /boot. Unfortunately, I don't have access to the machine I was testing it on right now, and since I have it all set up now, I would really rather not re-install it again. I will try with a VM and see if it has the same issue.
Is md127 a fakeraid supported by BIOS? If so, your install command should be:
grub-install /dev/md127
If not, then that is contrary to what your device.map says. Namely, the line
(...) /dev/md127
claims that md127 is accessible through BIOS as a single disk. Was this line added by you or by the installer?
It's software raid. Anaconda created that device.map, see bug 750794 about that.
I can easily reproduce this in a VM. Using that scratch build, I get the same results as Nick: I end up in rescue mode. I will try to find out what's going wrong.
I have been trying to reproduce this but have only been able to do so using i686 media - using x86_64 media seems to work without issues (minimal install, / on lv on mdraid RAID1)
(In reply to comment #39)
> I have been trying to reproduce this but have only been able to do so using
> i686 media - using x86_64 media seems to work without issues (minimal install,
> / on lv on mdraid RAID1)
Bug 818378 "grub2 can't install because /proc/device-tree exists" was also i686 non-PAE (because of OLPC compile options). It seems unlikely that they are directly related, but it would perhaps be relevant to make sure kernel-3.3.4-5.fc17 is used for further testing.
It would also help to be explicit about what the "this" being reproduced is. Does the system have a device.map? With bogus entries? Does grub2-probe from beta4 crash? Do grub2-probe and grub2-install from the scratch build with the backported patches work?
(In reply to comment #40)
> (In reply to comment #39)
> > I have been trying to reproduce this but have only been able to do so using
> > i686 media - using x86_64 media seems to work without issues (minimal install,
> > / on lv on mdraid RAID1)
>
> Bug 818378 "grub2 can't install because /proc/device-tree exists" was also i686
> non-PAE (because of OLPC compile options). It seem unlikely that they are
> directly related, but it would perhaps be relevant to make sure
> kernel-3.3.4-5.fc17 is used for further testing.
Yeah, it looks like I got the two bugs confused. I re-tested using a custom boot.iso with a newer kernel made for 818378 and the problem went away.
anaconda-17.26-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/anaconda-17.26-1.fc17
With TC4, anaconda doesn't create the md entries in the device.map, the installer completes successfully, and the installed system boots. As for the grub2 change, I'm unsure. I got that error once, but I couldn't reproduce it again. Several more times grub installed successfully.
Package anaconda-17.26-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing anaconda-17.26-1.fc17'
as soon as you are able to. Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-7623/anaconda-17.26-1.fc17
then log in and leave karma (feedback).
anaconda-17.26-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.