Description of problem:
If your rootfs is btrfs (icantbelieveitsnotbtr install option), then grubby will fail to update grub.conf upon kernel updates with the error:

    grubby fatal error: unable to find a suitable template

Version-Release number of selected component (if applicable):
grubby-7.0.8-1.fc12.i686
kernel-PAE-2.6.31.1-56.fc12.i686

Steps to Reproduce:
1. Install the F12 beta into two VMs where one uses the ext4 default and the other uses btrfs (icantbelieveitsnotbtr); all else is equal.
2. Run "grubby --default-kernel" as root (or attempt to update the kernel and note that grub.conf isn't updated) to see if it's working right.

Actual results:
ext4: success
btrfs: fail

Additional info:
I've created this new bug to replace the old and now closed bug 124246.
The problem is known, but no one fixed it in the old bug; they fixed another problem.

btrfs creates a private block device with a fake label in the kernel and thus the output from stat on btrfs and the output from stat on the real block device are not the same. btrfs does this because it can have multiple block devices or even be a snapshot on a block device. We could just give up on this check in userspace, we could add a new btrfs ioctl or something, who knows, but the problem is that btrfs does not report the same block device as grubby expects.

https://bugzilla.redhat.com/show_bug.cgi?id=124246#c32

I built grubby/mkinitrd on a F11 rawhide system and turned on the grubby debugging to try to see what's the matter. This system is using grubby from mkinitrd-6.0.81-1.fc11. In my case, I'm trying to upgrade kernel-PAE-2.6.29.1-70.fc11.i686, where my existing GRUB configuration entry is

    default=0
    timeout=0
    splashimage=(hd0,0)/grub/splash.xpm.gz
    hiddenmenu
    title Fedora (2.6.29.1-68.fc11.i686.PAE)
        root (hd0,0)
        kernel /vmlinuz-2.6.29.1-68.fc11.i686.PAE ro root=/dev/mapper/hitachi0-root rhgb quiet
        initrd /initrd-2.6.29.1-68.fc11.i686.PAE.img

The failure appears to be in grubby.c:1087 (suitableImage)... The nash library extracts the correct root device (/dev/mapper/hitachi0-root), but the stat calls around line 1144 don't think that the root device corresponds to the currently-mounted root filesystem.

    dev = nashGetPathBySpec(_nash_context, dev);
    if (!dev)
        return 0;

    i = stat(dev, &sb);
    if (i)
        return 0;

    stat("/", &sb2);

    if (sb.st_rdev != sb2.st_dev)
        return 0;

On my system, the last statement triggers, and suitableImage() returns zero (i.e. not a suitable image).

The stat calls show sb (/dev/mapper/hitachi0-root) and sb2 (/) as

    (gdb) p sb
    $28 = {st_dev = 15, __pad1 = 0, st_ino = 705, st_mode = 25008, st_nlink = 1, st_uid = 0, st_gid = 6, st_rdev = 64769, __pad2 = 0, st_size = 0, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1239649664, tv_nsec = 304070388}, st_mtim = {tv_sec = 1239649664, tv_nsec = 171003997}, st_ctim = {tv_sec = 1239649664, tv_nsec = 171003997}, __unused4 = 0, __unused5 = 0}
    (gdb) p sb2
    $27 = {st_dev = 17, __pad1 = 0, st_ino = 256, st_mode = 16877, st_nlink = 1, st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 168, st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1239704224, tv_nsec = 37719940}, st_mtim = {tv_sec = 1239649715, tv_nsec = 958735685}, st_ctim = {tv_sec = 1239649715, tv_nsec = 958735685}, __unused4 = 0, __unused5 = 0}

The offending command line is

    grubby --add-kernel=/boot/vmlinuz-2.6.29.1-70.fc11.i686.PAE --initrd /boot/initrd-2.6.29.1-70.fc11.i686.PAE.img --copy-default --make-default --title 'Fedora (2.6.29.1-70.fc11.i686.PAE)' '--args=root=/dev/mapper/hitachi0-root ' '--remove-kernel=TITLE=Fedora (2.6.29.1-70.fc11.i686.PAE)'

https://bugzilla.redhat.com/show_bug.cgi?id=124246#c37

Did a little poking today, btrfs_getattr has a line like:

    static int btrfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
    {
        struct inode *inode = dentry->d_inode;
        generic_fillattr(inode, stat);
        stat->dev = BTRFS_I(inode)->root->anon_super.s_dev;

So btrfs (unlike every other fs in the kernel) sets the dev itself rather than using the dev from generic_fillattr. I'm guessing (but haven't verified) that this is the reason the ->dev found when poking the block device is different from the dev btrfs is reporting, and thus the rejection... I just asked an upstream btrfs maintainer why they do this....
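For readers following along, the comparison quoted above can be reproduced in isolation. The following is only a sketch: device_backs_root() and print_dev() are hypothetical names, not functions from grubby. Feeding it the two values from the gdb output (st_rdev = 64769 for the device node, st_dev = 17 for /) shows the mismatch. Decoded, 64769 is major 253, minor 1, while 17 is major 0, minor 17, which is the kind of number an anonymous (non-block-device) superblock reports.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper mirroring the test quoted from suitableImage():
 * returns 1 when the device node described by dev_sb is the device
 * that the filesystem described by root_sb reports as its own. */
int device_backs_root(const struct stat *dev_sb, const struct stat *root_sb)
{
    assert(dev_sb != NULL && root_sb != NULL);
    return dev_sb->st_rdev == root_sb->st_dev;
}

/* Print a device number as major:minor to make mismatches readable. */
void print_dev(const char *label, dev_t dev)
{
    printf("%s: %u:%u\n", label, major(dev), minor(dev));
}
```

On a btrfs root this check fails no matter which device node is passed in, because the filesystem reports its own anonymous device number rather than that of any underlying block device.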
Created attachment 365599 [details] Proposed patch

This is pretty lame, but I think it keeps the intent of the original idea. Instead of stat'ing / and the device to make sure they're the same, just open /etc/mtab and get the device that is mounted on / and compare it with the device that we got from root=. This seems to be the simplest way to take care of the problem. We could possibly get the UUID from both devices and compare that, but it's still going to require reading /etc/mtab, so it's just a little bit more code to add to this if we want to do the UUID comparison. Tested this on my box with a btrfs root and my box with an ext4 root on LVM.
Just a question, why /etc/mtab instead of /proc/mounts?
Because /etc/mtab doesn't make me find the real /. If you look at /etc/mtab

    /dev/sda3 / btrfs rw,noatime 0 0
    proc /proc proc rw 0 0
    sysfs /sys sysfs rw 0 0
    devpts /dev/pts devpts rw 0 0
    /dev/sda1 /boot ext3 rw,noatime 0 0
    tmpfs /dev/shm tmpfs rw,rootcontext="system_u:object_r:tmpfs_t:s0" 0 0
    tmpfs /tmp tmpfs rw,rootcontext="system_u:object_r:tmp_t:s0" 0 0
    debugfs /sys/kernel/debug debugfs rw 0 0
    none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
    sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
    gvfs-fuse-daemon /home/josef/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,user=josef 0 0

and then /proc/mounts

    rootfs / rootfs rw 0 0
    /dev/root / btrfs rw,seclabel,noatime 0 0
    /dev /dev tmpfs rw,seclabel,relatime,mode=755 0 0
    /proc /proc proc rw,relatime 0 0
    /sys /sys sysfs rw,relatime 0 0
    none /selinux selinuxfs rw,relatime 0 0
    /proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
    devpts /dev/pts devpts rw,seclabel,relatime,mode=600,ptmxmode=000 0 0
    /dev/sda1 /boot ext3 rw,seclabel,noatime,errors=continue,data=writeback 0 0
    tmpfs /dev/shm tmpfs rw,rootcontext=system_u:object_r:tmpfs_t:s0,seclabel,relatime 0 0
    tmpfs /tmp tmpfs rw,rootcontext=system_u:object_r:tmp_t:s0,seclabel,relatime 0 0
    debugfs /sys/kernel/debug debugfs rw,relatime 0 0
    none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
    sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
    gvfs-fuse-daemon /home/josef/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=500,group_id=500 0 0

/dev/root does not match /dev/sda3, and running blkid against /dev/root doesn't give you a UUID, so it's completely useless. So unfortunately that means we have to use /etc/mtab. (I really wanted to use /proc/mounts, but the UUID thing makes it completely impossible.)
Scratch build of the patched grubby is http://koji.fedoraproject.org/koji/taskinfo?taskID=1765009
Created attachment 365887 [details] Proposed patch v2 Ok mcepl hit this nice mount bug where his /etc/mtab has /dev/dm-0 for his root, but root=/dev/mapper/whatever. So here's an updated patch that goes ahead and gets the uuid from both the device specified in the root= line and what we find in /etc/mtab. This is cleaner and will avoid getting screwed by whatever inconsistencies of device pathname we find between the actually resolved devpath and whatever shortcut gets dumped into /etc/mtab. Tested this on my box with lvm by manually changing /etc/mtab to say /dev/dm-0 and it worked fine.
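The comparison step described in this comment might look roughly like the sketch below. It is not taken from the attached patch. It assumes the two UUID strings were already looked up elsewhere, for example with libblkid's blkid_get_tag_value(), which returns NULL when a device has no such tag. The name same_filesystem_uuid() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decide whether two UUID strings name the same
 * filesystem. Either argument may be NULL when the lookup for that
 * device produced no UUID; a missing UUID never counts as a match. */
int same_filesystem_uuid(const char *root_arg_uuid, const char *mounted_uuid)
{
    if (root_arg_uuid == NULL || mounted_uuid == NULL)
        return 0;
    return strcmp(root_arg_uuid, mounted_uuid) == 0;
}
```

Checking for NULL before calling strcmp() matters here because a failed lookup on either side would otherwise be dereferenced.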
Yes, this patch (build of grubby is on http://koji.fedoraproject.org/koji/taskinfo?taskID=1765148) makes kernel installation work.
pjones, seems like we have a working tested patch, any chance we can get a real build?
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
pjones ping? can we get the patch in comment #6 committed? I assume there is no grubby upstream I should be poking here? Pretty please?
Will this also fix the similar issue for users with ext3 file systems? I've been dealing with this issue since FC 9 and bug #124426...and still see it in FC10. Just curious to know if I will still need to hand edit kernel updates in FC12 before I go through the update process from 10 to 12.
Just adding my "me too" here, I ran into this with the kernel 2.6.31.9-174.fc12 update. The scratch builds from comments 5 and 7 seem to have been already garbage collected from koji :(
I'm seeing this as well. I just had to manually install a grub entry for kernel-2.6.31.12-174.2.3.fc12.x86_64
can we please get this included? it is impossible to properly get a grub entry with btrfs as root without this patch, and with btrfs becoming more stable and more widely used this is becoming more and more of an issue.
ping
Created attachment 389621 [details] proposed patch v3

OK, here's a new version of the patch, made against the git version of grubby. I've tested this on my box with btrfs root and it works fine. I've cleaned the patch up a bit, I use _PATH_MOUNTED instead of "/etc/mtab" and I've commented it out a bit.
Getting this constantly too (with btrfs). will try the patch.
Patch worked nicely for me (x64, single partition, btrfs)
To elaborate, I applied the source patch to 7.0.9.1 src rpm & rebuilt.
I see this, on F-12, but without btrfs. My /boot is ext3.
Ping?
I have a similar problem on an F-13 installation, with grubby 7.0.13-1.fc13:

    $ rpm -ivvh kernel-2.6.33.5-112.fc13.x86_64.rpm
    D: ============== kernel-2.6.33.5-112.fc13.x86_64.rpm
    [snip]
    D: install: %post(kernel-2.6.33.5-112.fc13.x86_64) scriptlet start
    D: install: %post(kernel-2.6.33.5-112.fc13.x86_64) execv(/bin/sh) pid 23805
    ++ uname -i
    ++ uname -i
    + '[' x86_64 == x86_64 -o x86_64 == i386 ']'
    + '[' -f /etc/sysconfig/kernel ']'
    + /bin/sed -r -i -e 's/^DEFAULTKERNEL=(kernel-smp|kernel-xen)$/DEFAULTKERNEL=kernel/' /etc/sysconfig/kernel
    + /sbin/new-kernel-pkg --package kernel --install 2.6.33.5-112.fc13.x86_64
    grubby recieved SIGSEGV! Backtrace (8):
    /sbin/grubby[0x40805f]
    /lib64/libc.so.6[0x3b4d832a40]
    /lib64/libc.so.6[0x3b4d87e6c6]
    /sbin/grubby[0x40695e]
    /sbin/grubby[0x406ae3]
    /sbin/grubby[0x407e58]
    /lib64/libc.so.6(__libc_start_main+0xfd)[0x3b4d81ec5d]
    /sbin/grubby[0x401709]

    (gdb)
    (gdb) run --add-kernel=/boot/vmlinuz-2.6.33.5-112.fc13.x86_64 --copy-default --make-default --title "Fedora (2.6.33.5-112.fc13.x86_64)" --args=root=LABEL=/ --remove-kernel=TITLE="Fedora (2.6.33.5-112.fc13.x86_64)"
    Starting program: /sbin/grubby --add-kernel=/boot/vmlinuz-2.6.33.5-112.fc13.x86_64 --copy-default --make-default --title "Fedora (2.6.33.5-112.fc13.x86_64)" --args=root=LABEL=/ --remove-kernel=TITLE="Fedora (2.6.33.5-112.fc13.x86_64)"

    Program received signal SIGSEGV, Segmentation fault.
    __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:106
    106 movlpd (%rdi), %xmm1
    (gdb)
    (gdb) thread apply all bt

    Thread 1 (process 11203):
    #0 __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:106
    #1 0x000000000040695e in suitableImage (entry=<value optimized out>, bootPrefix=<value optimized out>, skipRemoved=<value optimized out>, flags=<value optimized out>) at grubby.c:1297
    #2 0x0000000000406ae3 in findTemplate (cfg=0x60c9a0, prefix=0x60c680 "/boot", indexPtr=<value optimized out>, skipRemoved=0, flags=0) at grubby.c:1447
    #3 0x0000000000407e58 in main (argc=0, argv=0x7fffffffe0a0) at grubby.c:3182

Installed grubby & kernel versions are:

    grubby.x86_64 7.0.13-1.fc13
    kernel.x86_64 2.6.33.4-95.fc13

    $ cat grub.conf
    # grub.conf generated by anaconda
    #
    # Note that you do not have to rerun grub after making changes to this file
    # NOTICE: You have a /boot partition. This means that
    # all kernel and initrd paths are relative to /boot/, eg.
    # root (hd0,0)
    # kernel /vmlinuz-version ro root=/dev/sda5
    # initrd /initrd-version.img
    #boot=/dev/sda
    default=0
    timeout=5
    splashimage=(hd0,0)/grub/splash.xpm.gz
    #hiddenmenu
    title Fedora (2.6.33.4-95.fc13.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.33.4-95.fc13.x86_64 ro root=LABEL=/ rhgb quiet vga=792 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
        initrd /initramfs-2.6.33.4-95.fc13.x86_64.img
    title Memtest86+ (2.10)
        root (hd0,0)
        kernel /memtest86+-2.10 ro root=LABEL=/ rhgb quiet

    $ cat fstab
    LABEL=/ / ext4 defaults,noatime 1 1
    LABEL=/boot /boot ext2 defaults,noatime 1 2
    devpts /dev/pts devpts gid=5,mode=620 0 0
    tmpfs /dev/shm tmpfs defaults 0 0
    LABEL=/export /export ext4 defaults,noatime 1 2
    LABEL=/home /home ext4 defaults,noatime 1 2
    proc /proc proc defaults 0 0
    LABEL=/usr /usr ext4 defaults,noatime 1 2
    sysfs /sys sysfs defaults 0 0
    tmpfs /tmp tmpfs defaults 0 0
    LABEL=/var /var ext4 defaults,noatime 1 2

    $ cat /etc/mtab
    /dev/sda2 / ext4 rw,noatime 0 0
    proc /proc proc rw 0 0
    sysfs /sys sysfs rw 0 0
    devpts /dev/pts devpts rw,gid=5,mode=620 0 0
    tmpfs /dev/shm tmpfs rw,rootcontext="system_u:object_r:tmpfs_t:s0" 0 0
    /dev/sda1 /boot ext2 rw,noatime 0 0
    /dev/sdb2 /export ext4 rw,noatime 0 0
    /dev/sdc1 /home ext4 rw,noatime 0 0
    /dev/sda3 /usr ext4 rw,noatime 0 0
    tmpfs /tmp tmpfs rw,rootcontext="system_u:object_r:tmp_t:s0" 0 0
    /dev/sdb1 /var ext4 rw,noatime 0 0

    $ blkid
    /dev/sdb1: LABEL="/var" UUID="7a8ccdd0-c4bf-41cc-9685-dca13e644014" TYPE="ext4"
    /dev/sda1: LABEL="/boot" UUID="fe727f32-7472-4bff-84a2-541730c26e8b" TYPE="ext2"
    /dev/sdb2: LABEL="/export" UUID="8f053695-570a-4e0d-b90c-bdf8955eb59b" TYPE="ext4"
    /dev/root: LABEL="/" UUID="579111cb-5d14-4ff8-9dc4-c6e62b8d7552" TYPE="ext4"
    /dev/sdd1: UUID="aa27068e-4863-4b1e-a3a8-3f40b79dba2b" TYPE="btrfs" UUID_SUB="78c7678f-23a8-4a98-b910-f3fd0b45b06c"
    /dev/sda3: LABEL="/usr" UUID="178bd9b8-337e-4d58-bbda-5fd5825e2a13" TYPE="ext4"
    /dev/sdc1: LABEL="/home" UUID="36ca57a0-6c93-412e-a497-4eab3269c0b0" TYPE="ext4"

My root '/' partition is ext4, '/boot' is ext2. I also have an unmounted (and unlabelled ?) btrfs partition on /dev/sdd1 (as shown in the output of blkid above). If I delete that unmounted btrfs partition, the rpm installation completes without problems and grub.conf is updated correctly.
Having deleted and re-created that unlabeled btrfs partition, I can no longer reproduce. blkid output now shows:

    [root@localhost grub]# blkid
    /dev/sdb1: LABEL="/var" UUID="7a8ccdd0-c4bf-41cc-9685-dca13e644014" TYPE="ext4"
    /dev/sda1: LABEL="/boot" UUID="fe727f32-7472-4bff-84a2-541730c26e8b" TYPE="ext2"
    /dev/sdb2: LABEL="/export" UUID="8f053695-570a-4e0d-b90c-bdf8955eb59b" TYPE="ext4"
    /dev/root: LABEL="/" UUID="579111cb-5d14-4ff8-9dc4-c6e62b8d7552" TYPE="ext4"
    /dev/sda3: LABEL="/usr" UUID="178bd9b8-337e-4d58-bbda-5fd5825e2a13" TYPE="ext4"
    /dev/sdc1: LABEL="/home" UUID="36ca57a0-6c93-412e-a497-4eab3269c0b0" TYPE="ext4"
    /dev/sdd1: UUID="9d95fa69-89a8-4168-a232-87a85c0d1889" UUID_SUB="d26415da-1bef-4b15-a4a2-421cb316e50d" TYPE="btrfs"
The only obvious difference I can see between the before (bad) and after (good) cases is the ordering of the blkid fields [ UUID, UUID_SUB, TYPE ] for the btrfs partition entry:

    blkid (bad):
    /dev/sdd1: UUID="aa27068e-4863-4b1e-a3a8-3f40b79dba2b" TYPE="btrfs" UUID_SUB="78c7678f-23a8-4a98-b910-f3fd0b45b06c"

    blkid (good):
    /dev/sdd1: UUID="9d95fa69-89a8-4168-a232-87a85c0d1889" UUID_SUB="d26415da-1bef-4b15-a4a2-421cb316e50d" TYPE="btrfs"

Note also that the original problematic btrfs partition was created with an older kernel; the new one was created with 2.6.33.4-95.fc13.x86_64.
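Whether the field ordering is actually what triggers the crash is not established in this thread, and how grubby obtains these values is not shown here. Purely as an illustration of why ordering should not matter to a consumer of this output, here is a sketch of a name-based lookup over one blkid line. The function blkid_line_tag() is hypothetical and is not part of grubby or libblkid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of KEY="value" from one line of
 * blkid output into out, wherever the field appears on the line.
 * Returns 0 on success, -1 if the key is absent, the value is
 * unterminated, or out is too small. */
int blkid_line_tag(const char *line, const char *key, char *out, size_t outlen)
{
    size_t klen;
    const char *p;

    assert(line != NULL && key != NULL && out != NULL);
    klen = strlen(key);
    if (klen == 0)
        return -1;

    p = line;
    while ((p = strstr(p, key)) != NULL) {
        /* Require a preceding space so UUID does not match PARTUUID. */
        int at_field_start = (p > line && p[-1] == ' ');

        if (at_field_start && p[klen] == '=' && p[klen + 1] == '"') {
            const char *val = p + klen + 2;
            const char *end = strchr(val, '"');

            if (end == NULL || (size_t)(end - val) >= outlen)
                return -1;
            memcpy(out, val, (size_t)(end - val));
            out[end - val] = '\0';
            return 0;
        }
        p += klen;
    }
    return -1;
}
```

With a lookup like this, both the "bad" and "good" lines above yield TYPE "btrfs" and their respective UUID values, and a missing LABEL is reported as absent rather than being read from the wrong position.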
Still present in f13, ext3 /boot partition.
on my system this does not happen anymore with fedora 14 alpha.
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Appears to be fixed in f14 (though I haven't confirmed). closing.