Bug 994775 - upgrade-prep failure; prevents boot after f18-f19 upgrade
Summary: upgrade-prep failure; prevents boot after f18-f19 upgrade
Keywords:
Status: CLOSED DUPLICATE of bug 964303
Alias: None
Product: Fedora
Classification: Fedora
Component: fedup
Version: 19
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Will Woods
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-08-08 01:31 UTC by Przemek Klosowski
Modified: 2013-10-15 01:49 UTC (History)
11 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-10-09 19:47:01 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
diff between log messages for a successful 3.9.6 and unsuccessful 3.9.11 boot (199.24 KB, patch)
2013-08-08 01:31 UTC, Przemek Klosowski
no flags Details | Diff

Description Przemek Klosowski 2013-08-08 01:31:48 UTC
Created attachment 784185 [details]
diff between log messages for a successful 3.9.6 and unsuccessful 3.9.11 boot

Description of problem:
Happens on a fairly standard x86_64 system with raid1 md in a LVM PV, divided into several LVs: swap, ext4 / and /boot and  btrfs /home.

After upgrading a  F18 install to F19 using fedup, the system fails to complete boot (infinite reboot). I can still boot with older kernel 3.9.6-200.fc18.x86_64, but not with 3.10.4-300.fc19, nor with 3.9.11-200.fc18.x86_64 (sic!).

I log this against systemd but it's quite possible that it's really an anaconda or fedup problem and systemd only reveals it. Reassign as necessary if you see a good reason. 

Version-Release number of selected component (if applicable):
systemd-204-9.fc19.x86_64

How reproducible:
happens every time, but to reproduce I guess one would have to rebuild a similar system and I didn't try it.


Steps to Reproduce:
1. reboot, select 3.9.11 or 3.10.4 kernel
2. hit escape to watch systemd startup messages

Actual results:
systemd shutting services and going into reboot


Expected results:
successful boot, as with 3.9.6

Comment 1 Przemek Klosowski 2013-08-08 01:37:02 UTC
I suspect that the failure has to do with the systemd upgrade-prep.service unit, per the journalctl fragment below that seems to be the only salient difference between logs from a successful and failed boot.. I don't know why it's not a problem on 3.9.6 and fails for later kernels.

The direct reason for the crash is that ugrade-prep.service tries to run a non-existent script /run/initramfs/upgrade.conf

Aug 06 hr:mn:ss base.home systemd-journal[364]: Allowing system journal files to grow to 4.0G.
Aug 06 hr:mn:ss base.home systemd-journal[364]: Journal stopped
Aug 06 hr:mn:ss base.home systemd-journal[617]: Allowing system journal files to grow to 4.0G.
Aug 06 hr:mn:ss base.home systemd[1]: Unit upgrade-prep.service entered failed state.
Aug 06 hr:mn:ss base.home systemd[1]: Triggering OnFailure= dependencies of upgrade-prep.service.
Aug 06 hr:mn:ss base.home systemd[1]: Stopping Swap.
Aug 06 hr:mn:ss base.home systemd[1]: Stopped target Swap.
Aug 06 hr:mn:ss base.home systemd[1]: Stopping Local File Systems.
Aug 06 hr:mn:ss base.home systemd[1]: Stopped target Local File Systems.
Aug 06 hr:mn:ss base.home systemd[1]: Unmounting /home...
Aug 06 hr:mn:ss base.home systemd[1]: Unmounting /boot...
Aug 06 hr:mn:ss base.home systemd[1]: Unmounting Temporary Directory...
Aug 06 hr:mn:ss base.home systemd[1]: Stopping Configure read-only root support...
Aug 06 hr:mn:ss base.home systemd[1]: Stopped Configure read-only root support.
Aug 06 hr:mn:ss base.home systemd[1]: Stopping Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Aug 06 hr:mn:ss base.home systemd[1]: Starting Journal Service...
Aug 06 hr:mn:ss base.home systemd[1]: boot.mount mount process exited, code=exited status=32
Aug 06 hr:mn:ss base.home systemd[1]: Failed unmounting /boot.
Aug 06 hr:mn:ss base.home systemd[1]: Unmounted Temporary Directory.
Aug 06 hr:mn:ss base.home systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Aug 06 hr:mn:ss base.home systemd[1]: Stopping LVM2 metadata daemon...
Aug 06 hr:mn:ss base.home systemd[1]: Stopping Device-mapper event daemon FIFOs.
Aug 06 hr:mn:ss base.home systemd[1]: Closed Device-mapper event daemon FIFOs.
Aug 06 hr:mn:ss base.home systemd[1]: Stopped LVM2 metadata daemon.
Aug 06 hr:mn:ss base.home systemd[1]: Stopping LVM2 metadata daemon socket.
Aug 06 hr:mn:ss base.home systemd[1]: Closed LVM2 metadata daemon socket.
Aug 06 hr:mn:ss base.home systemd[1]: Started Restore /run/initramfs.
Aug 06 hr:mn:ss base.home systemd-journald[364]: Received SIGTERM

Comment 2 Zbigniew Jędrzejewski-Szmek 2013-08-08 04:04:45 UTC
Hi Przemek,

at least from pasted logs, everything seems OK from the point of view of systemd. upgrade-prep.service has OnFailure=reboot.target, it fails, and a reboot follows. I'm reassigning to fedup... Can you paste more logs, which show why /run/initramfs/upgrade.conf is missing?

Comment 3 Przemek Klosowski 2013-08-08 13:14:44 UTC
I can't figure out why /run/initramfs/upgrade.conf is missing. I did 

find / -xdev -type f | xargs grep upgrade.conf

and it shows up in several RPM databases (e.g. /var/cache/yum/x86_64/19/updates/gen/filelists_db.sqlite) as well as in /var/lib/mlocate/mlocate.db so it looks like it existed at some point, but it's not there now. 

rpm -qf /run/initramfs/upgrade.conf just says No such file

yum whatprovides /run/initramfs/upgrade.conf says No matches found

what else can I try?

Comment 4 Przemek Klosowski 2013-08-08 13:16:41 UTC
Forgot to mention that the only mention of /run/initramfs/upgrade.conf in journalctl output is those two lines repeated on every reboot:

Aug 06 23:09:52 base.home upgrade-prep.sh[579]: /usr/lib/systemd/upgrade-prep.sh: line 23: /run/initramfs/upgrade.conf: No such file or directory
Aug 06 23:09:52 base.home upgrade-prep.sh[579]: can't find /run/initramfs/upgrade.conf

Comment 5 Zbigniew Jędrzejewski-Szmek 2013-08-08 15:23:13 UTC
upgrade.conf is created by the fedup python code. It's created dynamically, so it won't be found in rpm database. And anyway, files in /run are not in the rpm database. As to why it's missing... sorry, I don't know anything about fedup innards.

Comment 6 Will Woods 2013-08-09 18:39:27 UTC
Inside upgrade.img, usr/lib/dracut/hooks/pre-pivot/99-upgrade-pre-pivot.sh writes /run/upgrade/upgrade.conf.

You can boot with 'rd.break', and when you get the dracut shell, check to see if /run/upgrade/upgrade.conf exists; if it doesn't, you have the wrong initramfs (or a corrupt one).

If it does exist, then it's somehow getting lost/deleted/mounted-over during the switch-root to your system root.

Does it exist?

Comment 7 Zbigniew Jędrzejewski-Szmek 2013-08-09 20:12:33 UTC
> Aug 06 23:09:52 base.home upgrade-prep.sh[579]: /usr/lib/systemd/upgrade-
> prep.sh: line 23: /run/initramfs/upgrade.conf: No such file or directory

(In reply to Will Woods from comment #6)
> If it does exist, then it's somehow getting lost/deleted/mounted-over during
> the switch-root to your system root.
> 
> Does it exist?
I'm pretty sure it doesn't exist based on the error message.

Comment 8 Przemek Klosowski 2013-08-09 22:57:29 UTC
I do not have the files in Comment 6 (/usr/lib/dracut/hooks, upgrade.img and consequently /run/upgrade/upgrade.conf). I suppose they were deleted as part of the fedup/upgrade process?

Comment 9 Will Woods 2013-08-14 19:53:35 UTC
They don't exist on your system, they only exist *inside the fedup initramfs*.
/usr/lib/dracut/hooks/pre-pivot/99-upgrade-pre-pivot.sh is part of the initramfs image itself, and /run/upgrade/upgrade.conf gets created during the initial startup of the upgrade.

As I said in comment 6, edit the fedup boot entry to add 'rd.break', then boot fedup. When you get the emergency shell you'll be inside the initramfs; check to see if those two files exist there.

Comment 10 Przemek Klosowski 2013-10-01 02:02:22 UTC
Sorry for being dense but what's the 'fedup boot entry'? I have four grub.cfg entries: three for F19 that all fail to boot for the reason above, and an old F18 entry that works:
Fedora (3.11.1-200.fc19.x86_64) 19
Fedora (3.10.11-200.fc19.x86_64) 19
Fedora 19 Rescue 214a8ffc9a4c4229be1612a5d8e0640f (3.9.9-302.fc19.x86_64)
Fedora (3.9.6-200.fc18.x86_64)

I edited the kernel commandline for the 3.11.1-200 entry and in the emergency shell checked for the files; it appears that there is no 'upgrade/upgrade.conf' nor 99-upgrade-pre-pivot.sh. The initrd  this entry uses is 
/initramfs-3.11.1-200.fc19.x86_64.img


# ls /run
blkid initramfs lock log mdadm mount plymouth systemd udev
# ls /usr/lib/dracut/hooks/pre-*

pre-mount:
10-mdraid-waitclean.sh

pre-pivot:

pre-shutdown:
30-mdmon-pre-shutdown.sh

pre-trigger:
30-parse-dm.sh
30-parse-md.sh

pre-udev:
30-mdmon-pre-udev.sh
30-resume-genrules.sh

Comment 11 Will Woods 2013-10-08 17:59:39 UTC
What are the boot arguments for the failing boot entries? (In short: can you attach the grub*.cfg that was giving you problems?)

Comment 12 Will Woods 2013-10-09 19:47:01 UTC
Got the grub.cfg in an email. As suspected, the boot lines all have:

  upgrade systemd.unit=system-upgrade.target plymouth.splash=fedup enforcing=0

which are the boot arguments that fedup adds. You don't want these on your normal system!

I'm guessing you did something like:

1) Run fedup (which adds its boot entry with the above arguments)
2) Decided not to start upgrade (or tried and failed to start upgrade)
3) Went back to F18
4) Updated F18, *including the kernel* 
5) Ran the upgrade successfully

The problem is: when you update the kernel, it copies the default boot arguments from the *newest* boot entry. Which, in this case, was fedup.

When the upgrade runs, it removes the fedup boot entry immediately after starting. When the new kernel gets installed during the upgrade, it copies the boot arguments from the (now) newest boot entry - your previous F18 kernel. 

Except in your case the F18 kernel had copied its arguments from fedup - so now you're stuck with the fedup arguments.

Anyway, the workaround for you is easy - just remove 'upgrade systemd.unit=system-upgrade.target plymouth.splash=fedup enforcing=0' wherever it appears in your grub.cfg and your system should boot normally.

*** This bug has been marked as a duplicate of bug 964303 ***

Comment 13 Przemek Klosowski 2013-10-15 01:49:01 UTC
You are right although I can't remember the exact reason why the kernel was updated after fedup.

It seems to me that there are three approaches to prevent this from happening, corresponding to three subsystems that have to interfere with each other for this thing to occur: kernel updater, fedup and systemd:

- kernel updater could look for and delete magic fedup kernel commandline phrase "upgrade systemd.unit=system-upgrade.target plymouth.splash=fedup enforcing=0"

- fedup could somehow set up the kernel commandline, eg. perhaps by the use of variables, in such a way that it would include the magic fedup phrase when fedup runs, but it would expand to nothing otherwise, therefore being innocuous when copied blindly to other entries

- maybe the simplest solution is to change the OnFailure setting for upgrade-prep.service away from reboot.target? maybe to emergency.target? What would be a consequence of simply ignoring failure in a normal scenario?

So, which subsystem  there are two


Note You need to log in before you can comment on or make changes to this bug.