Bug 1074358

Summary: all initramfs in existing /boot are updated and broken on install
Product: [Fedora] Fedora
Reporter: Tom Shield <twshield>
Component: anaconda
Assignee: David Shea <dshea>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: rawhide
CC: anaconda-maint-list, awilliam, bruno, dshea, g.kaviyarasu, herrold, ipilcher, jkortus, jonathan, jvanek, robatino, robn, vanmeeuwen+fedora
Target Milestone: ---
Keywords: CommonBugs
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: RejectedBlocker https://fedoraproject.org/wiki/Common_F21_bugs#anaconda-initramfs-regeneration
Fixed In Version: anaconda-22.16-1
Doc Type: Bug Fix
Last Closed: 2015-02-18 14:50:44 UTC
Type: Bug

Description Tom Shield 2014-03-10 01:51:52 UTC
Description of problem:

I configure my systems with a separate /boot and two root partitions, so I can do a clean install every time and still have the previous system operational.  Thus /boot contains kernels, initramfs images, etc. from the previous version that should be left untouched.  This worked previously (F18 was the last one I installed), but now F20 rebuilds all the initramfs images and makes the previous installations unbootable (in one case, it put 64-bit executables into the initramfs images for 32-bit kernels).

Here are the initramfs in /boot.  I installed F20 on 2013-12-28 as you can see and the initramfs all got done at about the same time (except for the later updates and the Mar 9 F16 one when I finally figured out the problem).

-rw-------. 1 root root 38987305 Dec 28 15:19 initramfs-0-rescue-ff24a48382994aeebab205a34e8da473.img
-rw-------  1 root root 11195634 Feb 28 18:30 initramfs-3.13.4-200.fc20.x86_64.img
-rw-------  1 root root 11284499 Mar  2 12:31 initramfs-3.13.5-200.fc20.x86_64.img
-rw-------  1 root root 11282807 Mar  7 21:10 initramfs-3.13.5-202.fc20.x86_64.img
-rw-------. 1 root root 10174868 Dec 28 15:56 initramfs-3.4.11-1.fc16.i686.PAE.img
-rw-------. 1 root root 10173438 Dec 28 15:56 initramfs-3.6.10-2.fc16.i686.PAE.img
-rw-------. 1 root root 17673628 Mar  9 17:22 initramfs-3.6.11-4.fc16.i686.PAE.img
-rw-------. 1 root root 10173500 Dec 28 15:56 initramfs-3.6.11-4.fc16.i686.PAE.img.bad

I had to rebuild the 3.6.11-4 initramfs from the rescue environment to get F16 to boot.  I left the incorrectly built one as .bad.

Version-Release number of selected component (if applicable):

Fedora 20 DVD install media

How reproducible:

This happened on both machines I've upgraded so far, but with different symptoms as the initramfs were broken differently.

Here's that other machine (F20 with F18):

-rw-------. 1 root root 33005235 Mar  8 22:17 initramfs-0-rescue-1a7b8123c26c49219081edac24766fd2.img
-rw-------. 1 root root  8760987 Mar  8 22:22 initramfs-3.11.10-100.fc18.i686.PAE.img
-rw-------. 1 root root  9017876 Mar  8 22:23 initramfs-3.11.10-301.fc20.i686+PAE.img
-rw-------. 1 root root  8760977 Mar  8 22:21 initramfs-3.11.4-101.fc18.i686.PAE.img
-rw-------. 1 root root  8760676 Mar  8 22:22 initramfs-3.11.7-100.fc18.i686.PAE.img
-rw-------. 1 root root  8828241 Mar  9 13:42 initramfs-3.13.5-202.fc20.i686+PAE.img

This machine won't boot its F18 install, but with a different error as it's not a 64/32 bit issue, but it fails to find the disk at all.  At least I know how to fix it now.

Steps to Reproduce:
1. Start with a machine that has an earlier Fedora install using a separate /boot and one of two root-sized partitions
2. Install using custom partitioning, formatting only the other / partition and reusing the existing /boot partition
3. Try booting the previous install

Actual results:

Kernel panic or other critical boot failure on the previous install.  (The new install works fine.)

Expected results:

Existing files left alone so previous install boots.  This was the behavior for F18 (I did not ever try installing F19 so don't know about it).
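
One way to check the expected result is to record a checksum of every initramfs in /boot before the install and compare afterwards. The following is only a minimal sketch, assuming Python 3 is available; the function names snapshot_initramfs and changed_images are made up for illustration and are not part of anaconda or dracut.

```python
import hashlib
from pathlib import Path


def snapshot_initramfs(boot_dir):
    """Map each initramfs file name in boot_dir to its SHA-256 digest."""
    digests = {}
    for image in sorted(Path(boot_dir).glob("initramfs-*")):
        if image.is_file():
            digests[image.name] = hashlib.sha256(image.read_bytes()).hexdigest()
    return digests


def changed_images(before, after):
    """Return names present in both snapshots whose digest differs."""
    return sorted(name for name in before
                  if name in after and before[name] != after[name])
```

Taking one snapshot before the install and one after, any name returned by changed_images() is an image from the previous install that was rewritten.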

Additional info:

Before the install, I also copy the grub2/grub.cfg menuconfig section for the existing install to custom.cfg as the grub probing does not always match up kernels and partitions correctly (although it is much better now than it was at first).  Custom.cfg gets correctly left alone.  

Here is the list of initramfs files from an F18 install (on 2013-3-9) on a system with a F16 in a separate partition with shared /boot.  Note that F16 files are all older than the F18 install date.

-rw------- 1 root root 31884595 Oct 13 13:23 initramfs-3.10.14-100.fc18.x86_64.img
-rw------- 1 root root  8646386 Oct 13 13:31 initramfs-3.10.14-100.fc18.x86_64kdump.img
-rw------- 1 root root 32726011 Dec 20 11:54 initramfs-3.11.10-100.fc18.x86_64.img
-rw------- 1 root root  8688385 Dec 20 11:57 initramfs-3.11.10-100.fc18.x86_64kdump.img
-rw------- 1 root root 32680540 Nov 23 13:18 initramfs-3.11.7-100.fc18.x86_64.img
-rw------- 1 root root  8685706 Nov 23 14:11 initramfs-3.11.7-100.fc18.x86_64kdump.img
-rw-r--r-- 1 root root 17458594 Jul 23  2012 initramfs-3.4.4-4.fc16.i686.PAE.img
-rw------- 1 root root 17639645 Jan  1  2013 initramfs-3.6.10-2.fc16.i686.PAE.img
-rw------- 1 root root 17658099 Feb 23  2013 initramfs-3.6.11-4.fc16.i686.PAE.img

P.S. Yes, I know, make backups.  Got lazy this time and only did my data disks and boom!  Just made the backups for /boot on the rest of my machines. ;)

Comment 1 jiri vanek 2014-09-24 14:27:54 UTC
Is there a possibility to regenerate those initramfs images without the rescue image?

Comment 2 Adam Williamson 2014-09-24 19:22:28 UTC
Proposing as a Beta blocker on the basis of messing with existing installs; to me this seems worse than just not having working dual boot, as it prevents you being able to boot the existing OSes using the usual methods, you additionally have to fix their initramfs'es with some kind of rescue tool to boot them.

No clear criterion for this yet, though we are in the process of drafting one on test@.

Comment 3 Adam Williamson 2014-10-01 16:25:49 UTC
Discussed at 2014-10-01 blocker review meeting - http://meetbot.fedoraproject.org/fedora-blocker-review/2014-10-01/f21-blocker-review.2014-10-01-15.58.log.txt . Rejected as a blocker, as on a close reading, it doesn't violate any existing or proposed criteria: the proposed dual-boot-with-existing-Fedora criterion does not cover shared /boot scenarios .

According to the installer team, we don't really intend to support the sharing of /boot in this way - it's not safe to expect two different distribution releases will be able to agree on what the things in /boot should look like:

<dlehman> my opinion? we shouldn't allow use of preexisting /boot filesystem
<dlehman> adamw: https://git.fedorahosted.org/cgit/anaconda.git/tree/pyanaconda/packaging/__init__.py?h=f21-branch#n345
<adamw> dlehman: so, it's going to Do Stuff for all kernels it finds in /boot, basically?
<dlehman> yes
* dlehman originally tried to write it to use the contents of rpms we installed, but live
<dlehman> reason #376 why I don't like live install
<adamw> dlehman: so what i'm suspecting is that the old kernels would actually work with the newer fedora, but they're broken with the older one because of this linux / linux16 change i didn't get time to grok yet?
<dlehman> sounds likely

The most likely thing to happen here is that we disallow/strongly warn on use of shared /boot. There are ways we could attempt to cope with it, but they're complex and fragile and still might not make people happy.

Comment 4 Adam Williamson 2014-12-05 20:00:10 UTC
Just re-discovered this one, for the record based on later investigations into linux/linux16 I don't think that has anything to do with it, it's probably just breaking when dracut for a later release generates a broken initramfs for an earlier release (or mixes arches, or whatever).

I feel like possibly we could try to filter this somewhat - at least don't operate on kernels for other arches, maybe don't operate on kernels of different release versions - but just not allowing shared /boot would of course avoid the necessity.
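
The filtering idea could look roughly like the sketch below. It assumes Fedora-style kernel version strings such as the ones listed in the description ("3.13.5-202.fc20.x86_64", "3.6.11-4.fc16.i686.PAE"); the helper names are hypothetical, and this is not the approach anaconda actually took.

```python
import re

# Matches ".fcNN.<arch>" inside a Fedora-style kernel version string.
_DIST_ARCH = re.compile(r"\.(fc\d+)\.([A-Za-z0-9_]+)")


def release_and_arch(kver):
    """Return (dist tag, arch) for a kernel version, or None if not found."""
    match = _DIST_ARCH.search(kver)
    return (match.group(1), match.group(2)) if match else None


def versions_for_target(kvers, dist, arch):
    """Keep only the versions built for the release and arch being installed."""
    return [k for k in kvers if release_and_arch(k) == (dist, arch)]
```

Version strings without a dist tag, such as the "0-rescue-<machine id>" entries, are dropped by this filter and would need separate handling.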

Someone else hit this on test@ and was...not happy:

https://lists.fedoraproject.org/pipermail/test/2014-December/124329.html

It does seem like we should *either* disallow shared /boot or come up with some kind of filtering for this for F22, it's a bit dangerous.

Comment 5 Ian Pilcher 2014-12-05 22:08:06 UTC
I just hit this.  Gritting my teeth and leaving arguments about the supportability/desirability of shared /boot aside, why is anaconda doing anything with the initramfs files for any kernel other than the one which it is installing?  Is this an unintentional side effect which can presumably be fixed, or is there some benefit to this behavior that I'm not seeing?

Comment 6 Ian Pilcher 2014-12-05 22:33:09 UTC
chattr +i on the existing initramfs files appears to be a workaround.  Of course, one does need to know to do this in advance.
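
A small sketch of applying this workaround to every existing image at once. It only builds the chattr command lines and does not run them; the function name chattr_commands is illustrative, and the "initramfs-*.img" pattern is an assumption based on the file names listed in the description.

```python
from pathlib import Path


def chattr_commands(boot_dir, flag="+i"):
    """Build one chattr argv per existing initramfs image; nothing is executed."""
    images = sorted(Path(boot_dir).glob("initramfs-*.img"))
    return [["chattr", flag, str(image)] for image in images if image.is_file()]
```

Each command would be run as root before starting the installer (for example with subprocess.run(cmd, check=True)), and the same helper with flag="-i" gives the commands to make the files writable again afterwards.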

Comment 7 Adam Williamson 2014-12-05 22:37:02 UTC
It does this intentionally, it's not a side-effect. I don't have the precise explanation of why and I don't really feel like digging through all the code to explain it, but it's easy to see that it's intentional, because there's a function written precisely for the purpose of regenerating initramfs'es.

def recreateInitrds(self, force=False):
        """ Recreate the initrds by calling new-kernel-pkg

            This needs to be done after all configuration files have been
            written, since dracut depends on some of them.

Comment 8 Ian Pilcher 2014-12-05 23:53:45 UTC
(In reply to Adam Williamson (Red Hat) from comment #7)
> It does this intentionally, it's not a side-effect.

Hmm.  Looking at _updateKernelVersionList in __init__.py, and the commit that introduced it (https://git.fedorahosted.org/cgit/anaconda.git/commit/pyanaconda/packaging/__init__.py?id=58e9cc22fc49bf7a99bf5c9dd0c927ce8a5cb7d7), I don't see any particular evidence of a desire to mess with initramfs files from other operating systems.  I think it's equally possible that it's an unintended side effect.

Adding David Shea who wrote _updateKernelVersionList for his perspective ...

Comment 9 Adam Williamson 2014-12-05 23:58:36 UTC
That was covered in the IRC log in c#3:

<adamw> dlehman: so, it's going to Do Stuff for all kernels it finds in /boot, basically?
<dlehman> yes
* dlehman originally tried to write it to use the contents of rpms we installed, but live
<dlehman> reason #376 why I don't like live install

sorry, I slightly misread your question. It is intentional that anaconda re-generates initrds at all. It doesn't actually want to regenerate any initrds other than the ones it is itself installing, the fact that it in fact touches others is indeed not desired, but solving that is not easy.

Comment 10 Adam Williamson 2014-12-06 00:08:45 UTC
Mind you, a thought occurs on that. pyanaconda/packaging/livepayload.py subclasses Payload. Couldn't the generic Payload class's _kernelVersionList() do the thing where it gets its list from the contents of RPMs we installed, and the LiveImagePayload class could override it with one which derived its list from the kernels present in the live image (so it too wouldn't consider kernels other than ones it was itself deploying)?
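
In outline, the override idea might look like the following. This is a simplified sketch, not anaconda's real classes: the constructors, the method name kernel_versions, the class PackagePayload and the KERNEL_PACKAGES tuple are all assumptions made for illustration.

```python
# Hypothetical set of package names that ship a kernel.
KERNEL_PACKAGES = ("kernel", "kernel-PAE")


class Payload:
    """Base class: each subclass reports only the kernels it deployed."""

    def kernel_versions(self):
        raise NotImplementedError


class PackagePayload(Payload):
    def __init__(self, installed_packages):
        # (name, "version-release.arch") pairs for RPMs installed in this run.
        self.installed_packages = installed_packages

    def kernel_versions(self):
        return [vra for name, vra in self.installed_packages
                if name in KERNEL_PACKAGES]


class LiveImagePayload(Payload):
    def __init__(self, image_module_dirs):
        # Paths of the lib/modules/<version> directories inside the live image.
        self.image_module_dirs = image_module_dirs

    def kernel_versions(self):
        return [d.rstrip("/").split("/")[-1] for d in self.image_module_dirs]
```

Neither subclass looks at the target's /boot, so kernels left there by another install never enter the list.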

Comment 11 Ian Pilcher 2014-12-06 19:44:47 UTC
(In reply to Adam Williamson (Red Hat) from comment #10)
> Mind you, a thought occurs on that. pyanaconda/packaging/livepayload.py
> subclasses Payload. Couldn't the generic Payload class's
> _kernelVersionList() do the thing where it gets its list from the contents
> of RPMs we installed, and the LiveImagePayload class could override it with
> one which derived its list from the kernels present in the live image (so it
> too wouldn't consider kernels other than ones it was itself deploying)?

I'm not at all familiar with anaconda's innards, but it did occur to me that when _updateKernelVersionList finds an initramfs file it could look for a corresponding modules directory in /mnt/sysimage/lib/modules.  If it doesn't find one, it could skip that version.
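
As a rough sketch of that check, assuming initramfs files are named "initramfs-<version>.img" as in the listings above (the function name versions_with_modules is hypothetical and this is not anaconda code):

```python
import os


def versions_with_modules(boot_dir, modules_dir):
    """Return kernel versions that have both an initramfs and a modules dir."""
    prefix, suffix = "initramfs-", ".img"
    versions = []
    for name in sorted(os.listdir(boot_dir)):
        if name.startswith(prefix) and name.endswith(suffix):
            kver = name[len(prefix):-len(suffix)]
            # Skip images whose modules are not in the newly installed system.
            if os.path.isdir(os.path.join(modules_dir, kver)):
                versions.append(kver)
    return versions
```

With modules_dir pointing at /mnt/sysimage/lib/modules, images belonging to another install on the shared /boot would be skipped because their modules live on a different root partition.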

Comment 12 DO NOT USE account not monitored (old adamwill) 2014-12-06 19:48:37 UTC
y'know, I'm sure some awkward bugger can think of some problem with it, but that sounds pretty viable to me. Or in fact it could just find the kernel versions based on /lib/modules full stop.

Comment 13 DO NOT USE account not monitored (old adamwill) 2014-12-06 20:11:56 UTC
so completely untested, something like this:

diff --git a/pyanaconda/packaging/__init__.py b/pyanaconda/packaging/__init__.py
index 74096a2..478ec42 100644
--- a/pyanaconda/packaging/__init__.py
+++ b/pyanaconda/packaging/__init__.py
@@ -315,15 +315,14 @@ class Payload(object):
 
         files = glob(iutil.getSysroot() + "/boot/vmlinuz-*")
         files.extend(glob(iutil.getSysroot() + "/boot/efi/EFI/%s/vmlinuz-*" % self.instclass.efi_dir))
+        dirs = glob(iutil.getSysroot() + "/lib/modules/*")
 
-        versions = sorted((f.split("/")[-1][8:] for f in files if os.path.isfile(f)), cmp=cmpfunc)
-        log.debug("kernel versions: %s", versions)
+        versions = sorted((d.split("/")[-1][8:] for d in dirs if os.path.isdir(d)), cmp=cmpfunc)
+        rescue = sorted((f.split("/")[-1][8:] for f in files if os.path.isfile(f)) and "-rescue-" in f, cmp=cmpfunc)
 
         # Store regular and rescue kernels separately
-        self._kernelVersionList = (
-                [v for v in versions if "-rescue-" not in v],
-                [v for v in versions if "-rescue-" in v]
-                )
+        self._kernelVersionList = (versions, rescue)
+        log.debug("kernel versions: %s", self._kernelVersionList)

Comment 14 DO NOT USE account not monitored (old adamwill) 2014-12-06 20:13:31 UTC
bah, it doesn't need the [8:] any more, make that:

diff --git a/pyanaconda/packaging/__init__.py b/pyanaconda/packaging/__init__.py
index 74096a2..e46ab80 100644
--- a/pyanaconda/packaging/__init__.py
+++ b/pyanaconda/packaging/__init__.py
@@ -315,15 +315,14 @@ class Payload(object):
 
         files = glob(iutil.getSysroot() + "/boot/vmlinuz-*")
         files.extend(glob(iutil.getSysroot() + "/boot/efi/EFI/%s/vmlinuz-*" % self.instclass.efi_dir))
+        dirs = glob(iutil.getSysroot() + "/lib/modules/*")
 
-        versions = sorted((f.split("/")[-1][8:] for f in files if os.path.isfile(f)), cmp=cmpfunc)
-        log.debug("kernel versions: %s", versions)
+        versions = sorted((d.split("/")[-1] for d in dirs if os.path.isdir(d)), cmp=cmpfunc)
+        rescue = sorted((f.split("/")[-1][8:] for f in files if os.path.isfile(f)) and "-rescue-" in f, cmp=cmpfunc)
 
         # Store regular and rescue kernels separately
-        self._kernelVersionList = (
-                [v for v in versions if "-rescue-" not in v],
-                [v for v in versions if "-rescue-" in v]
-                )
+        self._kernelVersionList = (versions, rescue)
+        log.debug("kernel versions: %s", self._kernelVersionList)
 
     ###
     ### METHODS FOR QUERYING STATE

Comment 15 jiri vanek 2014-12-08 10:37:01 UTC
(In reply to jiri vanek from comment #1)
> Is there a possibility to regenerate those initramfs images without the rescue
> image?

Just for the record, running dracut in a chroot of the affected system (e.g. from a live disk) works fine.
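
A sketch of the command this amounts to, built as an argument list rather than executed. It assumes the affected root is already mounted (with its /boot, and /dev, /proc and /sys bind-mounted inside it) at a path such as /mnt/sysimage, and that the image follows the usual /boot/initramfs-<version>.img naming; the helper name is made up.

```python
def dracut_chroot_command(sysroot, kver):
    """Build the argv to regenerate one initramfs inside a chroot."""
    image = "/boot/initramfs-%s.img" % kver
    # --force lets dracut overwrite the existing (broken) image.
    return ["chroot", sysroot, "dracut", "--force", image, kver]
```

For example, dracut_chroot_command("/mnt/sysimage", "3.6.11-4.fc16.i686.PAE") gives the command for the F16 kernel from the description; it would be run as root, e.g. via subprocess.run(cmd, check=True).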

Comment 16 Ian Pilcher 2014-12-08 14:01:01 UTC
(In reply to Adam Williamson (Fedora) from comment #14)
> bah, it doesn't need the [8:] any more, make that:
> 
> diff --git a/pyanaconda/packaging/__init__.py
> b/pyanaconda/packaging/__init__.py
> index 74096a2..e46ab80 100644
> --- a/pyanaconda/packaging/__init__.py
> +++ b/pyanaconda/packaging/__init__.py
> @@ -315,15 +315,14 @@ class Payload(object):
>  
>          files = glob(iutil.getSysroot() + "/boot/vmlinuz-*")
>          files.extend(glob(iutil.getSysroot() + "/boot/efi/EFI/%s/vmlinuz-*"
> % self.instclass.efi_dir))
> +        dirs = glob(iutil.getSysroot() + "/lib/modules/*")
>  
> -        versions = sorted((f.split("/")[-1][8:] for f in files if
> os.path.isfile(f)), cmp=cmpfunc)
> -        log.debug("kernel versions: %s", versions)
> +        versions = sorted((d.split("/")[-1] for d in dirs if
> os.path.isdir(d)), cmp=cmpfunc)
> +        rescue = sorted((f.split("/")[-1][8:] for f in files if
> os.path.isfile(f)) and "-rescue-" in f, cmp=cmpfunc)
>  
>          # Store regular and rescue kernels separately
> -        self._kernelVersionList = (
> -                [v for v in versions if "-rescue-" not in v],
> -                [v for v in versions if "-rescue-" in v]
> -                )
> +        self._kernelVersionList = (versions, rescue)
> +        log.debug("kernel versions: %s", self._kernelVersionList)
>  
>      ###
>      ### METHODS FOR QUERYING STATE

I tried this from an F21 live CD, and I got a traceback at this line:

  rescue = sorted((f.split("/")[-1][8:] for f in files if os.path.isfile(f)) and "-rescue-" in f, cmp=cmpfunc)

  NameError: global name 'f' is not defined

Does the [8:] need to be removed from this line as well?

Comment 17 Adam Williamson 2014-12-08 17:16:53 UTC
No, that's not it. What the [8:] does is cut the 'vmlinuz-' off the front of 'vmlinuz-3.17-3.fc21' or whatever, so you wind up with '3.17-3.fc21'. I'm not sure what's going wrong there at first glance; I can poke at it a bit more, but I'd rather get dlehman/davidshea to sign off on the basic idea before spending too much time on it.
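
For reference, the traceback looks consistent with a misplaced parenthesis in the proposed line: the generator expression is closed before `and "-rescue-" in f`, so that `f` is evaluated outside the generator, where it is not bound. The sketch below reproduces the shape of the problem in Python 3; it leaves out the os.path.isfile() check and the Python 2 cmp= argument from the patch, and the function names are illustrative.

```python
def rescue_versions_broken(files):
    # Same shape as the patch line: the filter sits outside the generator,
    # so "f" is looked up as an ordinary name when the call is evaluated.
    return sorted((f.split("/")[-1][8:] for f in files) and "-rescue-" in f)


def rescue_versions_fixed(files):
    # The filter is part of the generator expression, where "f" is bound.
    # [8:] strips the eight-character "vmlinuz-" prefix from the file name.
    return sorted(f.split("/")[-1][8:] for f in files if "-rescue-" in f)
```

Calling rescue_versions_broken() raises NameError, while rescue_versions_fixed() returns only the rescue kernel versions.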

Comment 18 David Shea 2014-12-16 16:48:21 UTC
(In reply to Adam Williamson (Red Hat) from comment #10)
> Mind you, a thought occurs on that. pyanaconda/packaging/livepayload.py
> subclasses Payload. Couldn't the generic Payload class's
> _kernelVersionList() do the thing where it gets its list from the contents
> of RPMs we installed, and the LiveImagePayload class could override it with
> one which derived its list from the kernels present in the live image (so it
> too wouldn't consider kernels other than ones it was itself deploying)?

Yeah, that sounds like the best way to go about it.

Comment 19 Adam Williamson 2015-01-22 21:38:09 UTC
For the record, David Shea's proposed fix for this had some issues and has been reverted; there is now a proposed patch to disallow shared /boot . If anyone has a really good argument for allowing shared /boot that isn't mentioned above, please let us know.

Comment 20 Tom Shield 2015-01-22 22:46:46 UTC
(In reply to Adam Williamson (Red Hat) from comment #19)
> For the record, David Shea's proposed fix for this had some issues and has
> been reverted; there is now a proposed patch to disallow shared /boot . If
> anyone has a really good argument for allowing shared /boot that isn't
> mentioned above, please let us know.

In addition to using a shared /boot to allow clean installs (as the Fedora Docs recommend) without losing the previous install as I explained above, I also keep things like a gparted live iso and memtest in my /boot should anything need fixing or testing without the system being live.  I also do my installs by copying the pxeboot install kernel and initrd files into /boot to do an install over the network on machines with no dvd/cd drives.  A failed install that has already reformatted /boot will remove the possibility of booting anything else to try to fix the problem.

I kept meaning to suggest that the default disk setup for Fedora have dual root partitions and directly support what I've been doing manually.  This allows the best features of both an upgrade that keeps existing configuration and a clean install that removes the cruft and starts from a new file system.  Forcing a format on /boot at install seems to me to remove this possibility.

Comment 21 Rob Newton 2015-01-23 00:10:25 UTC
(In reply to Adam Williamson (Red Hat) from comment #19)
> For the record, David Shea's proposed fix for this had some issues and has
> been reverted; there is now a proposed patch to disallow shared /boot . If
> anyone has a really good argument for allowing shared /boot that isn't
> mentioned above, please let us know.

Similar to Tom Shield, I too like to do a clean install with the option to revert back to the previous one.  Having a shared /boot is something many people are used to doing I imagine.  And given Fedora is a fairly bleeding edge distro (in a positive sense) we may need to revert back at times.

Comment 22 Ian Pilcher 2015-01-23 17:00:52 UTC
(In reply to Adam Williamson (Red Hat) from comment #19)
> For the record, David Shea's proposed fix for this had some issues and has
> been reverted; there is now a proposed patch to disallow shared /boot . If
> anyone has a really good argument for allowing shared /boot that isn't
> mentioned above, please let us know.

Aaargh!  Please don't do this.  Throw up all the warnings that you want, but this will make it almost impossible for me to install Fedora on my system.  Almost all of my disk space is used by either my Windows partitions or bcache, with only about 1GB left over for /boot.

What about the idea of looking in the newly installed system's /lib/modules directory to determine which initramfs files should be rebuilt?

Comment 23 Adam Williamson 2015-01-23 18:23:11 UTC
Everyone can cease panicking now, we're back to trying to fix it:

https://lists.fedorahosted.org/pipermail/anaconda-patches/2015-January/015699.html

I'll try and get some testing done on that today.

Comment 24 David Shea 2015-05-04 19:41:08 UTC
*** Bug 1217781 has been marked as a duplicate of this bug. ***