Bug 530108

Summary: grubby fails to update grub.conf if btrfs present
Product: [Fedora] Fedora
Reporter: Jason Farrell <farrellj>
Component: grubby
Assignee: Peter Jones <pjones>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 12
CC: bugs, dalecode, dshaw, email.ahmedkamal, eparis, igeorgex, jbacik, jeff, jeremy, kzak, limburgher, martin, mathieu-acct, mcepl, me, pjones, russ+bugzilla-redhat, tomek, ville.skytta
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: 7.0.12-1
Doc Type: Bug Fix
Last Closed: 2010-11-04 08:41:51 EDT
Type: ---
Attachments:
Description          Flags
Proposed patch       none
Proposed patch v2    none
proposed patch v3    none

Description Jason Farrell 2009-10-21 11:22:27 EDT
Description of problem:
If your rootfs is btrfs (icantbelieveitsnotbtr install option), then grubby will fail to update grub.conf upon kernel updates with error: "grubby fatal error: unable to find a suitable template"

Version-Release number of selected component (if applicable):
grubby-7.0.8-1.fc12.i686
kernel-PAE-2.6.31.1-56.fc12.i686

Steps to Reproduce:
1. install the F12 beta into two VMs where one is the ext4 default and the other is btrfs (icantbelieveitsnotbtr); all else is equal.
2. run "grubby --default-kernel" as root (or attempt to update the kernel and note that grub.conf isn't updated) to see if it's working right.
  
Actual results:
ext4:  success
btrfs: fail

Additional info:
I've created this new bug to replace the old and now closed bug 124246
Comment 1 Eric Paris 2009-10-21 12:12:03 EDT
The problem is known, but no one fixed it in the old bug; they fixed a different problem.  btrfs creates a private block device with a fake label in the kernel, so the output from stat on btrfs and the output from stat on the real block device are not the same.  btrfs does this because it can span multiple block devices or even be a snapshot on a block device.  We could just give up on this check in userspace, we could add a new btrfs ioctl or something, who knows, but the problem is that btrfs does not report the block device that grubby expects.

https://bugzilla.redhat.com/show_bug.cgi?id=124246#c32

I built grubby/mkinitrd on a F11 rawhide system and turned on the grubby
debugging to try to see what's the matter.

This system is using grubby from mkinitrd-6.0.81-1.fc11.

In my case, I'm trying to upgrade kernel-PAE-2.6.29.1-70.fc11.i686, where my
existing GRUB configuration entry is

  default=0
  timeout=0
  splashimage=(hd0,0)/grub/splash.xpm.gz
  hiddenmenu
  title Fedora (2.6.29.1-68.fc11.i686.PAE)
          root (hd0,0)
          kernel /vmlinuz-2.6.29.1-68.fc11.i686.PAE ro root=/dev/mapper/hitachi0-root rhgb quiet
          initrd /initrd-2.6.29.1-68.fc11.i686.PAE.img

The failure appears to be in grubby.c:1087 (suitableImage)...  The nash library
extracts the correct root device (/dev/mapper/hitachi0-root), but the stat
calls around line 1144 don't think that the root device corresponds to the
currently-mounted root filesystem.

    dev = nashGetPathBySpec(_nash_context, dev);
    if (!dev)
        return 0;

    i = stat(dev, &sb);
    if (i)
        return 0;

    stat("/", &sb2);

    if (sb.st_rdev != sb2.st_dev)
        return 0;

On my system, the last statement triggers, and suitableImage() returns with
zero (i.e. not a suitable image)

The stat calls show sb (/dev/mapper/hitachi0-root) and sb2 (/) as

(gdb) p sb
$28 = {st_dev = 15, __pad1 = 0, st_ino = 705, st_mode = 25008, st_nlink = 1, 
  st_uid = 0, st_gid = 6, st_rdev = 64769, __pad2 = 0, st_size = 0, 
  st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1239649664, 
    tv_nsec = 304070388}, st_mtim = {tv_sec = 1239649664, 
    tv_nsec = 171003997}, st_ctim = {tv_sec = 1239649664, 
    tv_nsec = 171003997}, __unused4 = 0, __unused5 = 0}

(gdb) p sb2
$27 = {st_dev = 17, __pad1 = 0, st_ino = 256, st_mode = 16877, st_nlink = 1, 
  st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 168, 
  st_blksize = 4096, st_blocks = 8, st_atim = {tv_sec = 1239704224, 
    tv_nsec = 37719940}, st_mtim = {tv_sec = 1239649715, 
    tv_nsec = 958735685}, st_ctim = {tv_sec = 1239649715, 
    tv_nsec = 958735685}, __unused4 = 0, __unused5 = 0}
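
As a rough illustration of why the comparison above fails, the two device numbers from the gdb output can be decoded with the glibc major()/minor() macros. This is a sketch, not code from grubby: device_matches_root() and print_devno() are hypothetical helper names, and the values 64769 and 17 are simply copied from the gdb session above. 64769 decodes to 253:1, while 17 decodes to 0:17; major 0 is what Linux uses for anonymous (unnamed) devices, which fits the explanation that btrfs reports a private device of its own.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor(), makedev() (glibc) */
#include <sys/types.h>

/* Hypothetical helper mirroring the quoted check: the st_rdev of the
 * root= device node must equal the st_dev reported for "/".
 * Returns 1 on a match, 0 otherwise. */
int device_matches_root(const struct stat *dev_sb, const struct stat *root_sb)
{
    return dev_sb->st_rdev == root_sb->st_dev;
}

/* Hypothetical helper: print a device number as major:minor. */
void print_devno(const char *label, dev_t d)
{
    printf("%s = %u:%u\n", label, (unsigned)major(d), (unsigned)minor(d));
}
```

With st_rdev = 64769 for the device node and st_dev = 17 for "/", device_matches_root() returns 0, which corresponds to suitableImage() rejecting the entry.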

The offending command-line is

  grubby --add-kernel=/boot/vmlinuz-2.6.29.1-70.fc11.i686.PAE \
         --initrd /boot/initrd-2.6.29.1-70.fc11.i686.PAE.img \
         --copy-default --make-default \
         --title 'Fedora (2.6.29.1-70.fc11.i686.PAE)' \
         '--args=root=/dev/mapper/hitachi0-root ' \
         '--remove-kernel=TITLE=Fedora (2.6.29.1-70.fc11.i686.PAE)'


https://bugzilla.redhat.com/show_bug.cgi?id=124246#c37

Did a little poking today,  btrfs_getattr has a line like:

static int btrfs_getattr(struct vfsmount *mnt,
                         struct dentry *dentry, struct kstat *stat)
{
        struct inode *inode = dentry->d_inode;
        generic_fillattr(inode, stat);
        stat->dev = BTRFS_I(inode)->root->anon_super.s_dev;

So btrfs (unlike every other fs in the kernel) sets the dev itself rather than
using the dev from generic_fillattr.

I'm guessing (but haven't verified) that this is the reason the ->dev found
when poking the block device is different than the dev btrfs is reporting and
thus the rejection...

I just asked an upstream btrfs maintainer why they do this....
Comment 2 Josef Bacik 2009-10-21 16:22:48 EDT
Created attachment 365599 [details]
Proposed patch

This is pretty lame, but I think it keeps the intent of the original idea.  Instead of stat'ing / and the device to make sure they're the same, just open /etc/mtab, get the device that is mounted on /, and compare it with the device that we got from root=.  This seems to be the simplest way to take care of the problem.  We could possibly get the uuid from both devices and compare that, but it's still going to require reading /etc/mtab, so it's just a little bit more code to add to this if we want to do the uuid comparison.  Tested this on my box with a btrfs root and my box with an ext4 root on lvm.
Comment 3 Eric Paris 2009-10-21 16:26:38 EDT
Just a question, why /etc/mtab instead of /proc/mounts?
Comment 4 Josef Bacik 2009-10-23 14:21:33 EDT
Because /etc/mtab doesn't make me find the real /.  If you look at /etc/mtab:

/dev/sda3 / btrfs rw,noatime 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw 0 0
/dev/sda1 /boot ext3 rw,noatime 0 0
tmpfs /dev/shm tmpfs rw,rootcontext="system_u:object_r:tmpfs_t:s0" 0 0
tmpfs /tmp tmpfs rw,rootcontext="system_u:object_r:tmp_t:s0" 0 0
debugfs /sys/kernel/debug debugfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
gvfs-fuse-daemon /home/josef/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,user=josef 0 0


and then /proc/mounts

rootfs / rootfs rw 0 0
/dev/root / btrfs rw,seclabel,noatime 0 0
/dev /dev tmpfs rw,seclabel,relatime,mode=755 0 0
/proc /proc proc rw,relatime 0 0
/sys /sys sysfs rw,relatime 0 0
none /selinux selinuxfs rw,relatime 0 0
/proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
devpts /dev/pts devpts rw,seclabel,relatime,mode=600,ptmxmode=000 0 0
/dev/sda1 /boot ext3 rw,seclabel,noatime,errors=continue,data=writeback 0 0
tmpfs /dev/shm tmpfs rw,rootcontext=system_u:object_r:tmpfs_t:s0,seclabel,relatime 0 0
tmpfs /tmp tmpfs rw,rootcontext=system_u:object_r:tmp_t:s0,seclabel,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
gvfs-fuse-daemon /home/josef/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=500,group_id=500 0 0


/dev/root does not match /dev/sda3, and running blkid against /dev/root doesn't give you a uuid, so it's completely useless.  So unfortunately that means we have to use /etc/mtab.  (I really wanted to use /proc/mounts, but the uuid thing makes it completely impossible.)
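
The lookup described in comment 2 (find what is mounted on / in an mtab-style file) could be sketched with glibc's getmntent(3) family. This is only an illustration and not the attached patch: root_device_from_mtab() is a hypothetical name, and letting the last "/" entry win is an assumption made here because /proc/mounts lists rootfs before the real root.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Copy the device mounted on "/" (according to the given mtab-style file)
 * into buf. Returns 0 on success, -1 if the file cannot be opened or
 * contains no entry for "/". */
int root_device_from_mtab(const char *mtab_path, char *buf, size_t len)
{
    FILE *fp = setmntent(mtab_path, "r");
    struct mntent *ent;
    int found = -1;

    if (!fp)
        return -1;

    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_dir, "/") == 0) {
            snprintf(buf, len, "%s", ent->mnt_fsname);
            found = 0;  /* keep scanning: a later "/" entry replaces an earlier one */
        }
    }
    endmntent(fp);
    return found;
}
```

Run against the /etc/mtab listing above it would yield /dev/sda3; against the /proc/mounts listing it would yield /dev/root, which is the mismatch this comment describes.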
Comment 5 Matěj Cepl 2009-10-23 14:42:35 EDT
Scratch build of the patched grubby is http://koji.fedoraproject.org/koji/taskinfo?taskID=1765009
Comment 6 Josef Bacik 2009-10-23 15:57:41 EDT
Created attachment 365887 [details]
Proposed patch v2

Ok mcepl hit this nice mount bug where his /etc/mtab has /dev/dm-0 for his root, but root=/dev/mapper/whatever.  So here's an updated patch that goes ahead and gets the uuid from both the device specified in the root= line and what we find in /etc/mtab.  This is cleaner and will avoid getting screwed by whatever inconsistencies of device pathname we find between the actually resolved devpath and whatever shortcut gets dumped into /etc/mtab.  Tested this on my box with lvm by manually changing /etc/mtab to say /dev/dm-0 and it worked fine.
Comment 7 Matěj Cepl 2009-10-23 16:20:43 EDT
Yes, this patch (build of grubby is on http://koji.fedoraproject.org/koji/taskinfo?taskID=1765148) makes kernel installation work.
Comment 8 Eric Paris 2009-11-10 13:22:25 EST
pjones, seems like we have a working tested patch, any chance we can get a real build?
Comment 9 Bug Zapper 2009-11-16 08:58:12 EST
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 10 Eric Paris 2009-11-23 14:59:03 EST
pjones ping?  can we get the patch in comment #6 committed?  I assume there is no grubby upstream I should be poking here?  Pretty please?
Comment 11 George R. Kasica 2009-12-01 16:58:15 EST
Will this also fix the similar issue for users with ext3 file systems? I've been dealing with this issue since FC 9 and bug #124426... and still see it in FC10. Just curious to know if I will still need to hand edit kernel updates in FC12 before I go through the update process from 10 to 12.
Comment 12 Ville Skyttä 2009-12-25 15:58:59 EST
Just adding my "me too" here, I ran into this with the kernel 2.6.31.9-174.fc12 update.  The scratch builds from comments 5 and 7 seem to have been already garbage collected from koji :(
Comment 13 Jeremy Fitzhardinge 2010-01-28 12:33:06 EST
I'm seeing this as well.  I just had to manually install a grub entry for kernel-2.6.31.12-174.2.3.fc12.x86_64
Comment 14 Josef Bacik 2010-01-28 17:41:18 EST
Can we please get this included?  It is impossible to properly get a grub entry with btrfs as root without this patch, and with btrfs becoming more stable and more widely used, this is becoming more and more of an issue.
Comment 15 Josef Bacik 2010-02-08 15:10:51 EST
ping
Comment 16 Josef Bacik 2010-02-08 16:00:49 EST
Created attachment 389621 [details]
proposed patch v3

OK, here's a new version of the patch, made against the git version of grubby.  I've tested this on my box with btrfs root and it works fine.  I've cleaned the patch up a bit, I use _PATH_MOUNTED instead of "/etc/mtab" and I've commented it out a bit.
Comment 17 Nigel Jones 2010-02-22 05:06:36 EST
Getting this constantly too (with btrfs). Will try the patch.
Comment 18 Nigel Jones 2010-02-25 00:48:16 EST
Patch worked nicely for me (x64, single partition, btrfs)
Comment 19 Nigel Jones 2010-02-25 00:49:02 EST
To elaborate, I applied the source patch to 7.0.9.1 src rpm & rebuilt.
Comment 20 Jon Ciesla 2010-03-10 15:54:44 EST
I see this, on F-12, but without btrfs.  My /boot is ext3.
Comment 21 Jon Ciesla 2010-05-13 13:57:21 EDT
Ping?
Comment 22 dale 2010-05-31 20:02:01 EDT
I have a similar problem on an F-13 installation, with grubby 7.0.13-1.fc13:


$ rpm -ivvh kernel-2.6.33.5-112.fc13.x86_64.rpm 
D: ============== kernel-2.6.33.5-112.fc13.x86_64.rpm
[snip]
D:   install: %post(kernel-2.6.33.5-112.fc13.x86_64) scriptlet start
D:   install: %post(kernel-2.6.33.5-112.fc13.x86_64)    execv(/bin/sh) pid 23805
++ uname -i
++ uname -i
+ '[' x86_64 == x86_64 -o x86_64 == i386 ']'
+ '[' -f /etc/sysconfig/kernel ']'
+ /bin/sed -r -i -e 's/^DEFAULTKERNEL=(kernel-smp|kernel-xen)$/DEFAULTKERNEL=kernel/' /etc/sysconfig/kernel
+ /sbin/new-kernel-pkg --package kernel --install 2.6.33.5-112.fc13.x86_64
grubby recieved SIGSEGV!  Backtrace (8):
/sbin/grubby[0x40805f]
/lib64/libc.so.6[0x3b4d832a40]
/lib64/libc.so.6[0x3b4d87e6c6]
/sbin/grubby[0x40695e]
/sbin/grubby[0x406ae3]
/sbin/grubby[0x407e58]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b4d81ec5d]
/sbin/grubby[0x401709]


(gdb) 
(gdb) run --add-kernel=/boot/vmlinuz-2.6.33.5-112.fc13.x86_64   --copy-default --make-default --title "Fedora (2.6.33.5-112.fc13.x86_64)"   --args=root=LABEL=/  --remove-kernel=TITLE="Fedora (2.6.33.5-112.fc13.x86_64)"
Starting program: /sbin/grubby --add-kernel=/boot/vmlinuz-2.6.33.5-112.fc13.x86_64   --copy-default --make-default --title "Fedora (2.6.33.5-112.fc13.x86_64)"   --args=root=LABEL=/  --remove-kernel=TITLE="Fedora (2.6.33.5-112.fc13.x86_64)"

Program received signal SIGSEGV, Segmentation fault.
__strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:106
106             movlpd  (%rdi), %xmm1
(gdb)
(gdb) thread apply all bt

Thread 1 (process 11203):
#0  __strcmp_sse2 () at ../sysdeps/x86_64/strcmp.S:106
#1  0x000000000040695e in suitableImage (entry=<value optimized out>, bootPrefix=<value optimized out>, skipRemoved=<value optimized out>, 
    flags=<value optimized out>) at grubby.c:1297
#2  0x0000000000406ae3 in findTemplate (cfg=0x60c9a0, prefix=0x60c680 "/boot", indexPtr=<value optimized out>, skipRemoved=0, flags=0)
    at grubby.c:1447
#3  0x0000000000407e58 in main (argc=0, argv=0x7fffffffe0a0) at grubby.c:3182



Installed grubby & kernel versions are:
grubby.x86_64         7.0.13-1.fc13
kernel.x86_64         2.6.33.4-95.fc13



$ cat grub.conf 
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda5
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
#hiddenmenu
title Fedora (2.6.33.4-95.fc13.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.33.4-95.fc13.x86_64 ro root=LABEL=/ rhgb quiet vga=792 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
        initrd /initramfs-2.6.33.4-95.fc13.x86_64.img
title Memtest86+ (2.10)
        root (hd0,0)
        kernel /memtest86+-2.10 ro root=LABEL=/ rhgb quiet




$ cat fstab 
LABEL=/                 /                       ext4    defaults,noatime        1 1
LABEL=/boot             /boot                   ext2    defaults,noatime        1 2
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
LABEL=/export           /export                 ext4    defaults,noatime        1 2
LABEL=/home             /home                   ext4    defaults,noatime        1 2
proc                    /proc                   proc    defaults        0 0
LABEL=/usr              /usr                    ext4    defaults,noatime        1 2 
sysfs                   /sys                    sysfs   defaults        0 0
tmpfs                   /tmp                    tmpfs   defaults        0 0
LABEL=/var              /var                    ext4    defaults,noatime        1 2


$ cat /etc/mtab
/dev/sda2 / ext4 rw,noatime 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs rw,rootcontext="system_u:object_r:tmpfs_t:s0" 0 0
/dev/sda1 /boot ext2 rw,noatime 0 0
/dev/sdb2 /export ext4 rw,noatime 0 0
/dev/sdc1 /home ext4 rw,noatime 0 0
/dev/sda3 /usr ext4 rw,noatime 0 0
tmpfs /tmp tmpfs rw,rootcontext="system_u:object_r:tmp_t:s0" 0 0
/dev/sdb1 /var ext4 rw,noatime 0 0


$ blkid 
/dev/sdb1: LABEL="/var" UUID="7a8ccdd0-c4bf-41cc-9685-dca13e644014" TYPE="ext4" 
/dev/sda1: LABEL="/boot" UUID="fe727f32-7472-4bff-84a2-541730c26e8b" TYPE="ext2" 
/dev/sdb2: LABEL="/export" UUID="8f053695-570a-4e0d-b90c-bdf8955eb59b" TYPE="ext4" 
/dev/root: LABEL="/" UUID="579111cb-5d14-4ff8-9dc4-c6e62b8d7552" TYPE="ext4" 
/dev/sdd1: UUID="aa27068e-4863-4b1e-a3a8-3f40b79dba2b" TYPE="btrfs" UUID_SUB="78c7678f-23a8-4a98-b910-f3fd0b45b06c" 
/dev/sda3: LABEL="/usr" UUID="178bd9b8-337e-4d58-bbda-5fd5825e2a13" TYPE="ext4" 
/dev/sdc1: LABEL="/home" UUID="36ca57a0-6c93-412e-a497-4eab3269c0b0" TYPE="ext4" 


My root '/' partition is ext4, '/boot' is ext2.

I also have an unmounted (and unlabelled ?) btrfs partition on /dev/sdd1 (as shown in the output of blkid above).

If I delete that unmounted btrfs partition, the rpm installation completes without problems and grub.conf is updated correctly.
Comment 23 dale 2010-05-31 20:18:22 EDT
Having deleted and re-created that unlabeled btrfs partition, I can no longer reproduce.

blkid output now shows :

[root@localhost grub]# blkid 
/dev/sdb1: LABEL="/var" UUID="7a8ccdd0-c4bf-41cc-9685-dca13e644014" TYPE="ext4" 
/dev/sda1: LABEL="/boot" UUID="fe727f32-7472-4bff-84a2-541730c26e8b" TYPE="ext2" 
/dev/sdb2: LABEL="/export" UUID="8f053695-570a-4e0d-b90c-bdf8955eb59b" TYPE="ext4" 
/dev/root: LABEL="/" UUID="579111cb-5d14-4ff8-9dc4-c6e62b8d7552" TYPE="ext4" 
/dev/sda3: LABEL="/usr" UUID="178bd9b8-337e-4d58-bbda-5fd5825e2a13" TYPE="ext4" 
/dev/sdc1: LABEL="/home" UUID="36ca57a0-6c93-412e-a497-4eab3269c0b0" TYPE="ext4" 
/dev/sdd1: UUID="9d95fa69-89a8-4168-a232-87a85c0d1889" UUID_SUB="d26415da-1bef-4b15-a4a2-421cb316e50d" TYPE="btrfs"
Comment 24 dale 2010-05-31 20:35:21 EDT
The only obvious difference I can see in the before(bad) and the after(good) cases is the ordering of the blkid fields [ UUID, UUID_SUB, TYPE ] for the btrfs partition entry :


blkid (bad) :
/dev/sdd1: UUID="aa27068e-4863-4b1e-a3a8-3f40b79dba2b" TYPE="btrfs" UUID_SUB="78c7678f-23a8-4a98-b910-f3fd0b45b06c"


blkid (good) :
/dev/sdd1: UUID="9d95fa69-89a8-4168-a232-87a85c0d1889" UUID_SUB="d26415da-1bef-4b15-a4a2-421cb316e50d" TYPE="btrfs"


Note also that the original problematic btrfs partition was created with an older kernel, the new one was created with 2.6.33.4-95.fc13.x86_64.
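
For reference, a tag lookup that does not depend on the order of the blkid fields could look roughly like the sketch below. It is not how grubby or libblkid parse this output, and nothing in this report establishes that field ordering is the actual cause of the crash; blkid_line_tag() is a hypothetical helper that reports an absent tag with -1 instead of handing back a pointer that might later reach strcmp().

```c
#include <stdio.h>
#include <string.h>

/* Find TAG="value" anywhere in a blkid-style output line and copy the
 * value into out. Returns 0 on success, -1 if the tag is absent, the
 * line is malformed, or the value does not fit in out. */
int blkid_line_tag(const char *line, const char *tag, char *out, size_t len)
{
    char needle[64];
    const char *start, *end;
    size_t n;

    /* Anchor on ' TAG="' so a tag name only matches as a whole word. */
    if (snprintf(needle, sizeof needle, " %s=\"", tag) >= (int)sizeof needle)
        return -1;
    start = strstr(line, needle);
    if (!start)
        return -1;
    start += strlen(needle);
    end = strchr(start, '"');
    if (!end)
        return -1;
    n = (size_t)(end - start);
    if (n >= len)
        return -1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 0;
}
```

Both the "bad" and the "good" line above give TYPE = btrfs with this lookup, and asking either of them for LABEL returns -1, since the btrfs partition has no label.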
Comment 25 Jon Ciesla 2010-06-10 08:17:43 EDT
Still present in f13, ext3 /boot partition.
Comment 26 Tobias Florek 2010-08-29 10:20:47 EDT
On my system this does not happen anymore with Fedora 14 alpha.
Comment 27 Bug Zapper 2010-11-04 05:19:04 EDT
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 28 Jason Farrell 2010-11-04 08:41:51 EDT
Appears to be fixed in f14 (though I haven't confirmed). closing.