1045168 – failure to boot upgrade environment if /var is not on rootfs

Summary: failure to boot upgrade environment if /var is not on rootfs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	fedup
Sub Component:
Version:	20
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Will Woods
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:	https://fedoraproject.org/wiki/Common...
Depends On:
Blocks:

Reported:	2013-12-19 18:55 UTC by Chris Murphy
Modified:	2015-02-19 05:59 UTC
CC List:	28 users
Fixed In Version:	fedup-0.8.0-4.fc19
Clone Of:
Environment:
Last Closed:	2014-04-24 07:42:29 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
fstab (541 bytes, text/plain) 2013-12-19 18:56 UTC, Chris Murphy
grub.cfg (4.25 KB, text/plain) 2013-12-19 18:56 UTC, Chris Murphy
journalctl (148.38 KB, text/plain) 2013-12-19 19:04 UTC, Chris Murphy
fedupdebug.log (142.86 KB, text/plain) 2013-12-19 19:06 UTC, Chris Murphy
fstab, var on ext4 partition (595 bytes, text/plain) 2013-12-19 19:33 UTC, Chris Murphy
grub.cfg, var on ext4 partition (4.21 KB, text/plain) 2013-12-19 19:33 UTC, Chris Murphy
fedupdebug.log, var on ext4 partition (137.46 KB, text/plain) 2013-12-19 19:34 UTC, Chris Murphy
journal.log, var on ext4 partition (145.91 KB, text/plain) 2013-12-19 19:34 UTC, Chris Murphy

Description Chris Murphy 2013-12-19 18:55:31 UTC

Description of problem:
If Fedora 19 is installed with /var on an LV, the fedup upgrade environment fails to start up and upgrade the system.

The grub.cfg does not contain rd.lvm.lv=fedora/var, therefore it's not activated by the time initramfs-fedup expects it to be present, mounted, and the upgrade files available. If normal boot is runlevel 5, startup proceeds all the way to gnome-shell with no indication the upgrade failed or why.
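A quick way to check whether a given logical volume is named on a kernel command line is sketched here. The helper name `cmdline_has_lv` is hypothetical, and the sample command line (including `fedora/root`, `fedora/swap` and `rhgb quiet`) is assumed for illustration; only `rd.lvm.lv=fedora/var` comes from this report. The check inspects a string and says nothing about whether the volume is later mounted.

```shell
#!/bin/sh
# cmdline_has_lv CMDLINE VG/LV
# Succeeds if CMDLINE contains the exact word rd.lvm.lv=VG/LV.
cmdline_has_lv() {
    # $1 is deliberately unquoted so the shell splits it into words.
    for word in $1; do
        [ "$word" = "rd.lvm.lv=$2" ] && return 0
    done
    return 1
}

# Sample command line (assumed, not taken from the attached grub.cfg).
sample='ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet'

if cmdline_has_lv "$sample" fedora/var; then
    echo "fedora/var is listed"
else
    echo "fedora/var is NOT listed"
fi
```

On a running system the first argument could be "$(cat /proc/cmdline)" instead of the sample string.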

Version-Release number of selected component (if applicable):
fedup-dracut-0.8.0-2 most likely created the initramfs-fedup
fedup-0.8.0-3.fc19

How reproducible:
Always. Possibly also if /var is on its own partition, if encrypted, since there isn't an rd.luks= grub.cfg entry in that case either.

Steps to Reproduce:
1. Install Fedora 19 to a clean disk, /boot on ext, and LVs for root, var and swap.
2. Reboot from install - works fine.
3. Install fedup-0.8.0-3

Actual results:
Does not update the system to F20, no reboot or messages of failure or error.

Expected results:
Not this.

Additional info:

Comment 1 Chris Murphy 2013-12-19 18:56:09 UTC

Created attachment 839204 [details]
fstab

Comment 2 Chris Murphy 2013-12-19 18:56:23 UTC

Created attachment 839205 [details]
grub.cfg

Comment 3 Chris Murphy 2013-12-19 19:04:30 UTC

Created attachment 839209 [details]
journalctl

systemd.log_level=debug

Comment 4 Chris Murphy 2013-12-19 19:06:41 UTC

Created attachment 839210 [details]
fedupdebug.log

Comment 5 Chris Murphy 2013-12-19 19:13:23 UTC

Ha OK I'm wrong somehow. If /var is on a separate partition, it's also not mounted in time for some reason.

Comment 6 Chris Murphy 2013-12-19 19:33:29 UTC

Created attachment 839224 [details]
fstab, var on ext4 partition

Comment 7 Chris Murphy 2013-12-19 19:33:45 UTC

Created attachment 839225 [details]
grub.cfg, var on ext4 partition

Comment 8 Chris Murphy 2013-12-19 19:34:00 UTC

Created attachment 839226 [details]
fedupdebug.log, var on ext4 partition

Comment 9 Chris Murphy 2013-12-19 19:34:13 UTC

Created attachment 839227 [details]
journal.log, var on ext4 partition

Comment 10 Chris Murphy 2013-12-19 19:58:29 UTC

This exists in normal boot but not fedup boot:
systemd[1]: Installed new job var.mount/start as 82

As it turns out, not even boot is being mounted.

Comment 11 Panos Kavalagios 2013-12-21 09:25:49 UTC

Any workaround for non-LVM configurations? I have /var on a separate old school regular partition:

/dev/sda6        6061632  2518008   3212668  44% /var

and it starts my current system and not the upgrade process. Does this issue only affect --network or also --iso fedup options?

Comment 12 Enzo Marinari 2013-12-21 10:28:11 UTC

Same here. --network 20 fedup update and a standard /var
/dev/sda8        4908540   4010532    625628  87% /var
Starting System Update falls back in fedora 19.

Comment 13 Steven Hardy 2013-12-21 11:19:41 UTC

Confirmed, I see the same, /var is a separate logical volume on my system, fedup fails after rebooting and I end up back in the F19 desktop, only some stuff seems to be broken, notably wireless networks are not visible in the NetworkManager applet (xfce).

Comment 14 Scott Shambarger 2013-12-21 19:33:35 UTC

Adding the rd.lvm.lv entry to grub.cfg (vg_raid/lv_var in my case), wasn't enough for the upgrade to start correctly.

Copying everything from /var mount to / mount works...(thankfully can resize on lvm :) so that was definitely the only thing different and breaking upgrade.

Still, getting this fixed for fedup to F21 would be very useful, as I have specific mount options on /var and need it on a unique mount.

Comment 15 Philip Prindeville 2013-12-21 21:08:07 UTC

(In reply to Scott Shambarger from comment #14)
> Adding the rd.lvm.lv entry to grub.cfg (vg_raid/lv_var in my case), wasn't
> enough for the upgrade to start correctly.
> 
> Copying everything from /var mount to / mount works...(thankfully can resize
> on lvm :) so that was definitely the only thing different and breaking
> upgrade.
> 
> Still, getting this fixed for fedup to F21 would be very useful, as I have
> specific mount options on /var and need it on a unique mount.

Agreed.  I have several servers that have separate (and large) /var filesystems:

/var/svn for an SCM server
/var/imap and /var/spool for a mail server
/var/lib/mock for a build server

and I can't be the only one with this or similar configurations. Hopefully if there is a fix, it will be released in the current fedup so that F19 to F20 updates (for anyone who hasn't done that yet) can benefit as well.

Comment 16 Adam Williamson 2013-12-21 22:08:37 UTC

/me holds up sign saying "THIS IS NOT OFFICIAL ADVICE", but upgrading from 19 to 20 using yum should work fine. 19>20 isn't an upgrade which requires any special handling. If you really want to get up to F20 and you don't want to wait for a fix for this, just shutting down services and doing a yum upgrade per https://fedoraproject.org/wiki/Upgrading_Fedora_using_yum should probably work fine.

Comment 17 Jean-Philippe-Prade 2013-12-21 23:40:28 UTC

I have the same issue: fedup does nothing and just starts F19...

/dev/sdb2           100660656  17228608   78295664  19% /var

This is not the first time this issue has come up; last time it may have been with preupgrade. My original install was FC13, which I have been upgrading ever since, and I am sure it happened in the past.

Please fix it quickly; it shouldn't be too hard?

Comment 18 Frank Crawford 2013-12-22 04:17:36 UTC

I presume that when you say to add rd.lvm.lv=fedora/var, that assumes the VG is named fedora?

Anyway, I've hit this bug as well, and adding an rd.lvm.lv option makes no difference, with the correct VG name.

Sorry to say, that while fedup is a step forward, the continual assumption that you are only upgrading trivial systems keeps breaking it.  It needs to be tested on reasonably complex systems, since that is what people are more likely to want to run it on.

If you do have something to test, I'm willing to give it a go.

Comment 19 galens 2013-12-22 04:47:47 UTC

The first time I tried setting rd.lvm.lv, I literally set it to "fedora/var".  Which mounted _something_, but failed to work in any reasonable way.

I've since tried with setting it to my actual vg/lv, and that has no change in behavior.  I boot up to level 5, with some oddities (no network, a root shell on VT2).

(With the oddly-set rd.lvm.lv, a lovely kicker is that, due to how the booted ramdisk is configured, the dracut-status.sh file has permissions 660 (or similar; whichever it is, no x bits are set) and the trivial shell doesn't have a chmod command. Nor does the error log get saved to anything other than the ramdisk.)

Anyone have a quick link to fedup's test plan?  I'd like to give some feedback.

Comment 20 Adam Williamson 2013-12-22 07:15:27 UTC

There is no such 'continual assumption'; there is only the practicality of writing software. You write it to cover the most simple common cases first.

We have never undertaken to support all upgrades of all possible disk layouts and hardware/software configurations, to do so is impossible. We cover what we can with the resources that are available.

Comment 21 Adam Williamson 2013-12-22 07:21:29 UTC

galens: we test upgrades of a stock minimal and stock desktop install of the previous Fedora release, with and without encryption. That's it. That's all we've ever undertaken to test or 'guarantee' works. We don't have the time for any more extensive testing, and when it comes to upgrades, further testing is kind of just a drop in the ocean anyway; we get better bang for buck in other areas.

Comment 22 Enzo Marinari 2013-12-22 08:37:49 UTC

yum update worked for me very smoothly (after fedup failing because of /var). I am up and running apparently without any distress.

Comment 23 Frank Crawford 2013-12-22 08:48:57 UTC

Adam, I don't want to knock your efforts, because fedup is a massive improvement over previous upgrade options, and I don't really expect you to test every possible configuration, because it is an impossible task, but fairly standard configurations such as a /var partition should be fairly simple to add.

As a method to see what configurations cause problems, why not review the past bug reports and see what stands out.

I'd also say that extensive testing should not be seen as something you don't get any "bang for (your) buck", since if done correctly you don't have to waste time going back over work you had thought you had finished with, and in terms support for end users you get far more positive responses and less wasted time for them, if it works as expected.

Comment 24 Steven Hardy 2013-12-22 09:48:42 UTC

(In reply to Adam Williamson from comment #21)
> galens: we test upgrades of a stock minimal and stock desktop install of the
> previous Fedora release, with and without encryption. That's it. That's all
> we've ever undertaken to test or 'guarantee' works. We don't have the time
> for any more extensive testing, and when it comes to upgrades, further
> testing is kind of just a drop in the ocean anyway; we get better bang for
> buck in other areas.

Sounds like we need some sort of automated testing which proves a number of easily predictable, or known-to-have-broken scenarios work.

FYI, I also found yum update worked perfectly.

FWIW, every enterprise Linux build I've ever seen has /var on a separate partition or logical volume (it's specified in the CIS guidelines), so I'd argue this is a pretty common case.

Comment 25 galens 2013-12-22 17:27:19 UTC

Having /var on its own partition is, IIRC, a recommended setup.

I'll ask the more basic question, then: Does fedup have a test script?

(Seriously? More bang for the buck elsewhere? Where? Is testing fedup preventing work from being done on the kernel, systemd, yum, rpm, or syslog? fedup is a program you expect (1) every single installation to run, and (2) has a huge potential to break each and every one of those installations. I've, personally, lost more than 10 hours at the keyboard and 3 nights of attempted automated upgrades, and I have a trivial installation. Admittedly, I do software dev, not system management or IT, but at my billable rate, that would be a charge well into 4 digits for what should have been one command line.

If your test harness can't spin up a few dozen VMs and do automated OS installs followed by automated upgrades on each, or your test script doesn't require doing so, your QA process is deficient. I would suggest making sure you have at least coverage of:
single partition and every root directory on its own partition
Console only/headless boxes, GNOME, and KDE desktops
FreeIPA authentication/authorization and standard passwd/shadow
Installs with LAMP/LAPP, OpenOffice/LibreOffice, and/or Eclipse
selinux in disabled, permissive and enforcing; using targeted and mls.
base install, latest (non-testing), and latest-testing versions of each supported Fedora version.
Networked and non-networked upgrades.

You don't need full coverage of each possible variation, and you don't need to run the full harness on every minor change, but at least once for every release, and using about 2 dozen VMs should get you good breadth of coverage.)

Comment 26 Philip Prindeville 2013-12-22 18:45:34 UTC

(In reply to Frank Crawford from comment #23)
> Adam, I don't want to knock your efforts, because fedup is a massive
> improvement over previous upgrade options, and I don't really expect you to
> test every possible configuration, because it is an impossible task, but
> fairly standard configurations such as a /var partition should be fairly
> simple to add.

I concur with Frank. While there are an unlimited number of configurations possible, fedup obviously leverages RPM and RPM uses the /var/lib/rpm directory.

It's not an unreasonable question to ask, "what happens if that directory doesn't live on the root partition?"

Comment 27 Adam Williamson 2013-12-22 18:57:05 UTC

Frank: "but fairly standard configurations such as a /var partition should be fairly simple to add"

of course they're simple! Here's other things that would be simple to add!

1: All of this:

https://fedoraproject.org/wiki/User:Adamwill/Draft_storage_matrix

2: Per-package test cases for all 650+ critical path packages: https://admin.fedoraproject.org/pkgdb/lists/critpath?tg_format=plain&collctn_list=devel

3: Tests for Bluetooth, printer configuration, file sharing, the screen lock, VPN, wireless and IPv6 testing, and other basic functions none of which we currently have time to test fully

4: Upgrade tests for nested LVM encryption (those guys got their bug report in before you did, I'm afraid, they win), btrfs, UEFI, all desktops...

5: Testing features of previous releases to make sure they still work

6: Hell, testing most of the 'features' of the CURRENT release

7: Tests for web server functionality, or FreeIPA server, or mail server, or really anything at all you might actually want to do with a Fedora box besides boot it up and click on apps (and, of course, tests of *upgrading* all those things! It's a force multiplier!)

I'm not trying to be mean, I'm just trying to illustrate the problem we're dealing with. There is not a magical huge test team behind the curtain, there's <10 RH staff and some very hard-working volunteers trying to test an entire operating system plus huge collection of software which is released every six months in radically altered form, with freeze periods of about two weeks. "Thinking of more stuff we could be testing" is not, historically, something that has given us trouble. The thing that gives us trouble is deciding which precise subset of the entire galaxy of things we _could_ be testing are the most important for us to _actually_ test in the time we have available.

"Seriously? More bang for the buck elsewhere? Where?"

One place? Installation storage testing. See matrix linked above.

You also have to consider development resources. There is one guy working on fedup, and he doesn't work on it anywhere close to full time. There isn't much point us filing 30 bugs in 20 different fedup workflows if he only has time to fix one per release. This release we fixed non-US keyboard layouts for encrypted storage devices, and GPG checking of upgrade packages. That's about as much as you can expect to get fixed every release.

"If your test harness can't spin up a few dozen VMs and do automated OS installs followed by automated upgrades on each, or your test script doesn't require doing so, your QA process is deficient. I would suggest making sure you have at least coverage of:"

Wow, that's real nice. That's a great way to make people feel positive towards your complaints, y'know.

No, our test harness can't do that, and no, fedup does not have a test script. Want to know why? Read up, to the bit about how many people we have working on Fedora QA. We've been writing an automated test system for the last four years (yes, we're not a bunch of complete f**king incompetents, hard to believe I know) but it's not exactly an easy project. Let's see:

i) we don't have anywhere near enough people to manually validate the releases we are committed to pushing out every six months, *including* the people who are supposed to be writing all our tools, who don't do that most of the time, but work on doing what we can to test the release that's inevitably going out within the next six weeks just about all the time (Alpha, Beta or Final)

ii) it's Fedora, so everything changes on you all the f**king time. When we started writing AutoQA fedup didn't even exist, the installer had an entirely different interface...and about a zillion other things were different compared to how they work today. It doesn't make this any easier. We had a test in AutoQA for a while which could successfully do an automated minimal install of Rawhide and report the result. That took a few months to get working and promptly broke about a week later because _people goddamn well change stuff all the time_. (Funny story: we asked FESCo for a development delay of F21 so we could work on taskotron (the current name of autoqa's replacement) and other tools. We spent the time to put together https://fedoraproject.org/wiki/User:Tflink/f21_delay_taskotron_development_proposal . They gave us a month.)

Look, we know there is a lot more stuff that could be tested. Believe me, I have a wishlist. We know automated testing is a Good Thing, this is why I've been trying to get it working viably for the last half a damn decade. We do not need instruction on these things. If you want to come and help, rather than issue instructions from the sidelines, that would be appreciated.

https://fedoraproject.org/wiki/User:Tflink/taskotron_development_plan

Comment 28 Andrew Meredith 2013-12-22 20:52:09 UTC

I also have this issue. All my server-class machines of more than a year or so vintage have separate /var, /usr and /tmp volumes using LVM. I added the recommended rd.lvm.lv entry for /var and retried the boot/install run. It failed in the same way as before. As I also have a separate /usr, I added another boot argument for that as well. Still no dice. On inspection, the two volumes I had specifically mentioned on the boot line were enabled and the rest were listed as disabled in lvscan. Note also that they were not actually mounted, just enabled in the VG.

I fully understand the issue of being told to build the moon on a stick in 15 minutes with a budget of two plastic buttons. I am also happy to try stuff where I can.

Good luck.

Comment 29 Chris Murphy 2013-12-23 02:20:03 UTC

OK so my next speculation is that var simply isn't being mounted in time, which is why it fails even when the LV is activated soon enough. The rd.lvm.lv= only activates that LV, it doesn't mount it. I don't know if it not mounting is a problem in the special fedup initramfs or the upgrade mode of systemd.
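The difference between "activated" and "mounted" can be checked from a normal shell with a sketch like the following. The helper name `is_mounted` is hypothetical and the sample mount table (device names included) is assumed; it only reads a file laid out like /proc/mounts and is not meant for the minimal dracut shell, which may lack awk.

```shell
#!/bin/sh
# is_mounted MOUNTS_FILE MOUNTPOINT
# Succeeds if MOUNTPOINT appears as the second field of any line in
# MOUNTS_FILE (same layout as /proc/mounts).
is_mounted() {
    awk -v mp="$2" '$2 == mp { found = 1 } END { exit !found }' "$1"
}

# Demo against a small sample table (device names are assumed).
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/mapper/fedora-root / ext4 rw,relatime 0 0
/dev/sda1 /boot ext4 rw,relatime 0 0
EOF

for mp in / /boot /var; do
    if is_mounted "$sample" "$mp"; then
        echo "$mp: mounted"
    else
        echo "$mp: not mounted"
    fi
done
rm -f "$sample"
```

On a live system, `is_mounted /proc/mounts /var` would answer the question for the real mount table.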

Until there's a fix, if anyone is inclined to find a workaround: booting with rd.break=mount to drop to a dracut shell and manually mounting /var might work. There are other values for rd.break=, so it might be one of those instead. Type exit to resume boot if you succeed at manually mounting /var. More info here: https://fedoraproject.org/wiki/How_to_debug_Dracut_problems

Comment 30 Adam Williamson 2013-12-23 02:21:40 UTC

I put the rd.lvm.lv note in commonbugs because someone on this bug report said it worked. If it doesn't, I'll take it out again. Despite working more or less non-stop since F20 came out I still haven't had time to actually _reproduce_ this issue myself; I'm not kidding when I say we have a lot of stuff to do.

Here's another possible workaround: pass fedup '--packagedir' and '--cachedir' parameters that point to somewhere on the real root partition. These (not yet documented) parameters basically tell fedup where to put the stuff that it puts, by default, in /var/lib/system-upgrade and /var/tmp/system-upgrade .

Comment 31 Adam Williamson 2013-12-23 02:23:42 UTC

eh, I'm not 100% sure if --packagedir and --cachedir actually exist yet. but I can play with it.

Comment 32 Adam Williamson 2013-12-23 03:25:46 UTC

ah, nope, they don't. but, still might be possible to do something broadly similar. hold on.

Comment 33 Adam Williamson 2013-12-23 03:40:40 UTC

So, it's not particularly pretty, but it seems to work here: try editing /usr/lib/python2.7/site-packages/fedup/__init__.py and changing the lines that set 'cachedir' and 'packagedir' to some location on the root partition. I created /share/lib/system-upgrade and /share/tmp/system-upgrade , and used those. With that change, I could successfully upgrade a system with a separate /var partition.

Comment 34 Chris Murphy 2013-12-24 00:55:11 UTC

(In reply to Adam Williamson from comment #33)
This work around worked for me with minimal package set test.

The one modification I made was I used cp -a /var/lib/system-upgrade/* /share/lib/system-upgrade/ and cp -a /var/tmp/system-upgrade/* /share/tmp/system-upgrade/. Then I simply reran fedup with the same options as before, it finds the rpms in the new location so they don't need to be redownloaded.

The RPMs in the /share locations are not cleaned up after the upgrade, however. It looks like the /var locations are cleaned up.

Comment 35 Adam Williamson 2013-12-24 01:05:41 UTC

yeah, I saw that too, I think the cleanup stuff is just hardcoded and doesn't check the values or something.

Comment 36 smkr 2013-12-25 09:29:50 UTC

(In reply to Adam Williamson from comment #33)
> So, it's not particularly pretty, but it seems to work here: try editing
> /usr/lib/python2.7/site-packages/fedup/__init__.py and changing the lines
> that set 'cachedir' and 'packagedir' to some location on the root partition.
> I created /share/lib/system-upgrade and /share/tmp/system-upgrade , and used
> those. With that change, I could successfully upgrade a system with a
> separate /var partition.

Thanks Adam, that worked for me.

Comment 37 smkr 2013-12-27 02:51:25 UTC

My 2cents:
    
Further to Adam's workaround in comment #33 and to keep things simple I'd vote 
for defaulting 'cachedir' and 'packagedir' to a directory on '/' such as:

   cachedir = '/fedup/tmp/fedora-upgrade'
   packagedir = '/fedup/lib/fedora-upgrade'

then 'fedup' would only need to:

   1. check that '/' has enough disk space
   2. 'mkdir -p /fedup/{tmp,lib}'
   3. 'rm -rf /fedup' upon successful completion
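The three steps above could be sketched roughly as follows. This is not fedup code: the function names are hypothetical, the required size is a made-up parameter, and the demo runs in a scratch directory rather than on the real '/'.

```shell
#!/bin/sh
# free_kb DIR: available space on the filesystem holding DIR, in KiB.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# prepare_staging ROOT NEEDED_KB
# Creates ROOT/fedup/tmp and ROOT/fedup/lib if at least NEEDED_KB is free.
prepare_staging() {
    avail=$(free_kb "$1") || return 1
    if [ "$avail" -lt "$2" ]; then
        echo "only ${avail} KiB free under $1, need $2" >&2
        return 1
    fi
    mkdir -p "$1/fedup/tmp" "$1/fedup/lib"
}

# cleanup_staging ROOT: remove the staging tree after a successful upgrade.
cleanup_staging() {
    rm -rf "$1/fedup"
}

# Demo in a scratch directory instead of the real '/'.
scratch=$(mktemp -d)
if prepare_staging "$scratch" 1; then
    echo "staging ready"
fi
cleanup_staging "$scratch"
rmdir "$scratch"
```

Keeping the root directory as a parameter makes the sketch safe to try without touching '/'.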

Comment 38 Arif Saleem 2013-12-30 15:00:30 UTC

Hi
I had an F19 machine with a separate /var, standard partition. fedup 0.8 would fail in the second stage, after reboot. It would hang saying /var wasn't accessible.
So I changed /usr/lib/python2.7/site-packages/fedup/__init__.py as mentioned above, and set the 'cachedir' and 'packagedir' to /var2 instead, so that they would be on the root partition. I then re-ran fedup --network 20, rebooted and the upgrade started this time (after a warning about /var again).
The upgrade finished, and the machine restarted, but it looks like the RPM database which was in /var has not been updated - it still has all the old F19 data. So the actual files are F20, but the rpm database is F19 :(
I have started a clean install now as this is a bit of a mess, but just thought I would warn others that this might happen. Not sure how smkr above had a successful upgrade?

Comment 39 Andrew Meredith 2013-12-30 16:31:19 UTC

(In reply to Arif Saleem from comment #38)
> The upgrade finished, and the machine restarted, but it looks like the RPM
> database which was in /var has not been updated - it still has all the old
> F19 data. So the actual files are F20, but the rpm database is F19 :(

I suspect that if you had booted in rescue mode and had a look at the contents of the /var directory under root (ie under the real /var mount) you would find an updated rpm database.

I bit the bullet and merged the /var root and /usr volumes into one. The upgrades then ran perfectly.

Comment 40 Adam Williamson 2013-12-30 17:24:08 UTC

I did check when I tested the 'use a different directory' workaround, and I didn't have any problem with the RPM database or anything else on /var. And indeed the /var volume was mounted during the second-stage upgrade process, when I looked. I suppose it might behave differently if you're using partitions rather than LVM...

Comment 41 Chris Murphy 2013-12-30 18:04:08 UTC

I did my comment 34 test on partitions not LVM. My setup was minimal package set, and very clean. I only experienced delayed mounting of /var, never a failure to mount /var. 

Comment 38 is unclear whether this is a fully persistent failure to mount /var, or if it's a long delayed mounting of /var that resulted in partial updating. We'd need to see logs reproducing this, probably with both rd.debug and systemd.log_level=debug set as boot parameters.

Both long delayed /var mount that results in partial update of /var, or the failure to mount /var that results in no update of /var seem rare. But might it be safer to suggest F19 to F20 upgrades use yum?
https://fedoraproject.org/wiki/Upgrading_Fedora_using_yum?rd=YumUpgradeFaq#Fedora_19_-.3E_Fedora_20

Comment 42 smkr 2013-12-30 19:21:27 UTC

(In reply to Arif Saleem from comment #38)
> The upgrade finished, and the machine restarted, but it looks like the RPM
> database which was in /var has not been updated - it still has all the old
> F19 data. So the actual files are F20, but the rpm database is F19 :(
> I have started a clean install now as this is a bit of a mess, but just
> thought I would warn others that this might happen. Not sure how smkr above
> had a successful upgrade?

My upgrade path was F18 to F19.

Comment 43 David Mansfield 2014-01-21 14:20:22 UTC

IMHO one of the advantages of 'fedup' is that it can be updated after the ISO release date. 

For this bug, I think it's irresponsible to not have released an update to fedup that would at least die with an error message "/var/tmp cannot be on a separate partition".
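A pre-flight check of the kind suggested here might look like the sketch below. It is not part of fedup: the function names are hypothetical, and it relies on the "Mounted on" column of `df -P`, so mount points containing spaces would confuse it.

```shell
#!/bin/sh
# mountpoint_from_df: read `df -P` output on stdin and print the
# "Mounted on" column of the first data row.
mountpoint_from_df() {
    awk 'NR == 2 { print $6 }'
}

# check_on_rootfs DIR: fail with a message unless DIR lives on the
# filesystem mounted at '/'.
check_on_rootfs() {
    mp=$(df -P "$1" 2>/dev/null | mountpoint_from_df)
    if [ "$mp" != "/" ]; then
        echo "error: $1 is not on the root filesystem (mounted at: ${mp:-unknown})" >&2
        return 1
    fi
}

if check_on_rootfs /var/tmp; then
    echo "/var/tmp is on the root filesystem"
else
    echo "/var/tmp is on a separate filesystem"
fi
```

Splitting the parsing from the `df` call means the parsing can be exercised with canned output, such as the `df` lines quoted earlier in this report.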

Even debugging the reason it's failing is difficult because the original fedup 0.7 vs 0.8 snafu clogs up the google search results, and the failure scenario is that it boots right up to GDM.

I've been using red hat since rhl 4.2 (> 15 years) and in that time only a few releases have been borked this badly.

Comment 44 Adam Williamson 2014-01-21 15:16:36 UTC

"IMHO one of the advantages of 'fedup' is that it can be updated after the ISO release date."

We can only update part of fedup, not *all* of it. This bug isn't easily resolvable by updating the part in the 'fedup' package, AIUI (though it may be doable somehow).

"I've been using red hat since rhl 4.2 (> 15 years) and in that time only a few releases have been borked this badly."

Uh. You're saying there've been no worse bugs in the upgrade mechanism in 15 years? Really? I can think of ten without even trying. This doesn't eat anyone's data or break system boot or really do anything other than...not work (the upgrade attempt fails, but your system is perfectly fine otherwise).

Comment 45 Will Woods 2014-01-21 21:05:59 UTC

The fix turns out to be really easy:

diff --git a/systemd/system-upgrade-generator.in b/systemd/system-upgrade-genera
index c6d1275..1da30f3 100644
--- a/systemd/system-upgrade-generator.in
+++ b/systemd/system-upgrade-generator.in
@@ -5,7 +5,7 @@
 UPGRADE_UNIT_PATH='@UNITDIR@'
 
 ready_for_upgrade() {
-    [ -L /system-upgrade -a -d /system-upgrade ] && \
+    [ -L /system-upgrade ] && \
     [ -d /system-upgrade-root ] &&
     [ -L /run/system-upgrade -o -f /run/initramfs/upgrade.conf ]
 }

Fix pushed to upstream git: https://github.com/wgwoods/fedup/commit/a595375

There'll be an update soon(ish), but in the meantime, here's how you can apply the fix manually:

  sudo sed -i 's# -a -d /system-upgrade##' \
    /lib/systemd/system-generators/system-upgrade-generator
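Why dropping the -d test matters can be reproduced in a scratch directory. The sketch assumes (it is not stated in this comment) that /system-upgrade is a symlink whose target lives under /var and so cannot be resolved before /var is mounted; the paths and function names here are stand-ins for illustration only.

```shell
#!/bin/sh
scratch=$(mktemp -d)

# A symlink whose target does not exist yet, standing in for a link
# into a filesystem that has not been mounted.
ln -s "$scratch/not-mounted-yet/system-upgrade" "$scratch/system-upgrade"

old_test() { [ -L "$1" -a -d "$1" ]; }   # form of the test before the patch
new_test() { [ -L "$1" ]; }              # form of the test after the patch

if old_test "$scratch/system-upgrade"; then echo "old: ready"; else echo "old: not ready"; fi
if new_test "$scratch/system-upgrade"; then echo "new: ready"; else echo "new: not ready"; fi

# Once the target exists (as after mounting), the old test passes too.
mkdir -p "$scratch/not-mounted-yet/system-upgrade"
if old_test "$scratch/system-upgrade"; then echo "old, target present: ready"; fi

# The sed expression from the comment, applied to a copy of the line
# rather than to the installed generator.
printf '%s\n' '    [ -L /system-upgrade -a -d /system-upgrade ] && \' |
    sed 's# -a -d /system-upgrade##'

rm -rf "$scratch"
```

Because -d follows the symlink, the old test fails while the link is dangling, whereas -L only asks whether the link itself exists.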

Comment 46 Adam Williamson 2014-01-21 21:32:23 UTC

aha, nice. I did look at it, but didn't spot that :(

Comment 47 Adam Williamson 2014-01-21 22:39:28 UTC

Common bugs note updated.

Comment 48 Jose-Marcio MC 2014-01-22 09:40:45 UTC

Well, I've made the modification above (Comment 45) to upgrade from 18 to 19 on a test machine.

It works fine, except that afterwards I tried to update it with yum before upgrading from 19 to 20, and yum says:

[root@cc-vm-2-2-009 ~]# yum update
Loaded plugins: langpacks, refresh-packagekit
fedora                                                  | 4.2 kB  00:00:00     
updates                                                 | 4.7 kB  00:00:00     
(1/4): fedora/19/x86_64/group_gz                        | 384 kB  00:00:00     
(2/4): updates/19/x86_64/group_gz                       | 394 kB  00:00:00     
(3/4): updates/19/x86_64/primary_db                     |  10 MB  00:00:02     
(4/4): fedora/19/x86_64/primary_db                      |  17 MB  00:00:03     
(1/2): updates/19/x86_64/pkgtags                        | 777 kB  00:00:00     
(2/2): updates/19/x86_64/updateinfo                     | 999 kB  00:00:00     
Error: Package tuple ('preupgrade', 'noarch', '0', '1.1.11', '2.fc18') could not be found in rpmdb
[root@cc-vm-2-2-009 ~]# rpmdb --rebuilddb
[root@cc-vm-2-2-009 ~]# yum update
Loaded plugins: langpacks, refresh-packagekit
Error: Package tuple ('preupgrade', 'noarch', '0', '1.1.11', '2.fc18') could not be found in rpmdb
[root@cc-vm-2-2-009 ~]# 


[root@cc-vm-2-2-009 ~]# yum update
Loaded plugins: langpacks, refresh-packagekit
Error: Package tuple ('preupgrade', 'noarch', '0', '1.1.11', '2.fc18') could not be found in rpmdb
[root@cc-vm-2-2-009 ~]#

Comment 49 Jose-Marcio MC 2014-01-22 10:19:14 UTC

Interim update to Comment 48:

Well, I tried some manipulations with rpmdb --init, --rebuild, ... without success.

Finally, I installed a package which wasn't yet on the system (munin-node, to be exact).

After that, the upgrade from fc19 to fc20 worked like a charm (of course, I patched the "system-upgrade-generator" file again, per Comment 45).

I don't understand what really happened. I will recreate a virtual test machine and redo everything, if necessary.

Comment 50 Panos Kavalagios 2014-02-20 08:16:58 UTC

May I ask about the progress. This is a blocking issue and there is no activity for a long time. All my F19 boxes use separate /var and I still cannot upgrade to F20. I am aware of the workarounds, but I would prefer to test the official fedup fix as well.

Comment 51 Adam Williamson 2014-02-20 08:19:42 UTC

Panos: if you just do what's suggested in c#45, that basically *is* the official fix. Will might've forgotten about releasing an update, though, I'll poke him tomorrow.

Comment 52 Jose-Marcio MC 2014-02-20 09:17:46 UTC

I confirm... I've upgraded more than 20 machines this way (c#45), and it worked like a charm...
The problem mentioned on c#48 was unrelated to this.

Comment 53 Will Woods 2014-02-20 16:55:20 UTC

Yes, the patch in upstream fedup for this issue is exactly equivalent to the "sudo sed ..." bit from comment #45. See the patch here:

  https://github.com/wgwoods/fedup/commit/a595375

I haven't forgotten about the update - I was trying to get fixes for a couple of other problems before pushing an update, and I got waylaid by some Unnamed Large Project.

If you want to test current git master, it's real easy:

  git clone git://github.com/wgwoods/fedup.git
  cd fedup
  make && sudo make install

Otherwise, I'll push an update as soon as I have some time. Next week, maybe.

Comment 54 Raman Gupta 2014-02-22 06:38:15 UTC

I ran into this issue while upgrading FC19 to FC20. After the suggested fix in Comment #45, the upgrade worked. But, I did get an error during the upgrade related to /var, and so I post this just in case it is relevant.

-----------------------------------------
umount: /var: target is busy
...
failed to move /var
-----------------------------------------

Screenshot with full log here:

http://imgur.com/epFdCLc

Comment 55 Stephen Satchell 2014-02-24 21:48:40 UTC

I just tried to upgrade from Fedora 17 to Fedora 20 using the information here.  When I tried to use the patch method, I didn't find the file /lib/systemd/system-generators/system-upgrade-generator -- it's just not there.  So I tried the second workaround (changing the __init__.py method) and it laid an egg as well.  In both cases, the system came back to the F17 screen, so I didn't lose anything.  I'm about to punt, save off all the data, and do a fresh install (who knows what shape the RPM database is in).  Any words of wisdom?

My update from 17 to 20 went flawlessly where /var was in the root...

Comment 56 Philip Prindeville 2014-02-24 21:55:19 UTC

I tried to upgrade from 19 to 20 using the info in Comment #45.

Fedup says it completed without any errors, but there's no 'upgrade' entry in /boot/grub2/grub.cfg so something stops the script from updating that file.

How do I go about figuring out what?
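
One way to check for the missing entry is to list the menuentry titles in grub.cfg and look for one that mentions "upgrade". The following is only a minimal sketch: the helper names, the sample titles, and the assumption that each entry starts with a quoted title after the `menuentry` keyword are illustrative, not taken from fedup or grubby.

```python
import re

# Assumption: GRUB 2 entries look like:  menuentry 'Title' ... {
# Titles containing escaped quotes are not handled by this sketch.
MENUENTRY_RE = re.compile(r"""^\s*menuentry\s+(['"])(.*?)\1""", re.MULTILINE)


def menuentry_titles(grub_cfg_text):
    """Return the titles of all menuentry stanzas, in file order."""
    return [m.group(2) for m in MENUENTRY_RE.finditer(grub_cfg_text)]


def has_upgrade_entry(grub_cfg_text, needle="upgrade"):
    """True if any menuentry title contains `needle` (case-insensitive)."""
    return any(needle.lower() in title.lower()
               for title in menuentry_titles(grub_cfg_text))


# Hypothetical sample; real titles on your system will differ.
SAMPLE = """\
menuentry 'System Upgrade (fedup)' --class fedora {
}
menuentry 'Fedora (3.12.5-200.fc19.x86_64)' --class fedora {
}
"""

if __name__ == "__main__":
    print(menuentry_titles(SAMPLE))
    print(has_upgrade_entry(SAMPLE))
```

To run it against a real system, read /boot/grub2/grub.cfg (as root) and pass its contents to `has_upgrade_entry`.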

This might be related or not... I had run a failed fedup once before the fix in Comment #45 was posted.

Then I applied the patch, and ran it again and it failed with a dependency issue for xl2tpd... which I subsequently removed, and ran fedup a 3rd time.

It claims to have succeeded this third time, but I'm wondering if it's not confused about state from prior runs.

How do I erase prior state and run it completely afresh?

Comment 57 Will Woods 2014-02-24 22:01:52 UTC

(In reply to Stephen Satchell from comment #55)
> I just tried to upgrade from Fedora 17 to Fedora 20 using the information
> here.

None of this applies to F17. You likely have a totally different problem.

(In reply to Philip Prindeville from comment #56)
> I tried to upgrade from 19 to 20 using the info in Comment #45.
> 
> Fedup says it completed without any errors, but there's no 'upgrade' entry
> in /boot/grub2/grub.cfg so something stops the script from updating that
> file.

Probably you have a grub config that grubby doesn't understand. This, too, is a totally different problem.

Please, file different bugs for different problems. And please attach your fedup.log.

(In reply to Philip Prindeville from comment #56)
> It claims to have succeeded this third time, but I'm wondering if it's not
> confused about state from prior runs.
> 
> How do I erase prior state and run it completely afresh?

If you mean "erase downloaded data etc.", `fedup --clean` will do that.

If you mean "revert to the previous system"... you restore from a backup. But probably all you really need is `yum distro-sync`.

Comment 58 Jose-Marcio MC 2014-02-24 22:03:12 UTC

Maybe fedora maintainers would have another suggestion.
Maybe you could first upgrade from 17 to 18, before upgrading to fedora 20.

Comment 59 Philip Prindeville 2014-02-24 22:21:23 UTC

(In reply to Will Woods from comment #57)

> (In reply to Philip Prindeville from comment #56)
> > I tried to upgrade from 19 to 20 using the info in Comment #45.
> > 
> > Fedup says it completed without any errors, but there's no 'upgrade' entry
> > in /boot/grub2/grub.cfg so something stops the script from updating that
> > file.
> 
> Probably you have a grub config that grubby doesn't understand. This, too,
> is a totally different problem.
> 
> Please, file different bugs for different problems. And please attach your
> fedup.log.


Will do. Is there a simple way to 'fake' an upgrade entry with a cut&paste of an existing one?

Comment 60 Philip Prindeville 2014-02-24 22:23:51 UTC

I'll also note that my /lib/systemd/system/default.target symlink doesn't seem to have been updated, and there's no mention of that in the fedup.log either.

Comment 61 Philip Prindeville 2014-02-24 22:30:59 UTC

(In reply to Will Woods from comment #57)
> Probably you have a grub config that grubby doesn't understand. This, too,
> is a totally different problem.
> 
> Please, file different bugs for different problems. And please attach your
> fedup.log.

Looks like someone already did as bug #902498 and it got punted.

Comment 62 Philip Prindeville 2014-02-24 22:56:37 UTC

Does system-upgrade-generator get run during the initial fedup, or during the 'System Upgrade' boot phase?

I'm trying to tell if I'm seeing a further manifestation of a bug or if this isn't supposed to happen until later...

Comment 63 Will Woods 2014-02-25 14:38:45 UTC

system-upgrade-generator runs during bootup, like other systemd generators.

Comment 64 Fedora Update System 2014-02-28 22:22:06 UTC

fedup-0.8.0-4.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/fedup-0.8.0-4.fc19

Comment 65 Fedora Update System 2014-02-28 22:22:26 UTC

fedup-0.8.0-4.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/fedup-0.8.0-4.fc20

Comment 66 Fedora Update System 2014-03-01 14:06:26 UTC

Package fedup-0.8.0-4.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing fedup-0.8.0-4.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-3270/fedup-0.8.0-4.fc20
then log in and leave karma (feedback).

Comment 67 Stephen Satchell 2014-03-16 15:30:29 UTC

This is in answer to the people who think the QA procedure needs revamping:

From http://docs.fedoraproject.org/en-US/Fedora/19/html/Installation_Guide/s2-diskpartrecommend-x86.html

"Many systems have more partitions than the minimum listed above. Choose partitions based on your particular system needs. Refer to Section 9.14.5.1.1, 'Advice on Partitions' for more information.

"If you create many partitions instead of one large / partition, upgrades become easier.

"The following table summarizes minimum partition sizes for the partitions containing the listed directories. You do not have to make a separate partition for each of these directories. For instance, if the partition containing /foo must be at least 500 MB, and you do not make a separate /foo partition, then the / (root) partition must be at least 500 MB.

Table 9.3. Minimum partition sizes
Directory  Minimum size
/          250 MB
/usr       250 MB, but avoid placing this on a separate partition
/tmp       50 MB
/var       384 MB
/home      100 MB
/boot      250 MB

(end quote)

So any future QA of fedup should include something like this recommended disk structure from the Red Hat Fedora documentation.
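
For anyone wanting to check whether a given machine falls into the affected layout, here is a rough sketch that reads fstab-formatted text and reports whether a directory such as /var has its own entry. The function names and the sample device paths are hypothetical, and the parser ignores details such as escaped spaces in mount points.

```python
def mount_points(fstab_text):
    """Map mount point -> (device, fstype) from fstab-formatted text."""
    mounts = {}
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete fstab entry
        device, mount_point, fstype = fields[:3]
        mounts[mount_point] = (device, fstype)
    return mounts


def separate_from_root(fstab_text, directory="/var"):
    """True if `directory` has its own fstab entry, i.e. is not on rootfs."""
    return directory in mount_points(fstab_text)


# Hypothetical layout with /var on its own logical volume.
SAMPLE_FSTAB = """\
# /etc/fstab
/dev/mapper/fedora-root  /      ext4  defaults  1 1
/dev/mapper/fedora-var   /var   ext4  defaults  1 2
/dev/mapper/fedora-swap  swap   swap  defaults  0 0
"""

if __name__ == "__main__":
    for directory in ("/usr", "/tmp", "/var", "/home", "/boot"):
        print(directory, separate_from_root(SAMPLE_FSTAB, directory))
```

A QA matrix could run the same check over each test image to confirm that at least one of them has /var off the root filesystem.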

Me?  I'm in the process of building a new 2 TB drive with Fedora 20, and copying my old data over.  With the /var structure in the root partition, instead of in a separate partition (or separate hard drive).  Once burned, twice shy.

Why rebuild instead of using the test code?  In my attempts to use the fedup method, the system ended up not being able to boot into anything other than single-user mode.  Good thing I had my data backed up...

Comment 68 Stephen Satchell 2014-03-16 15:46:58 UTC

FYI, my rebuild partitioning on a 2-TB drive; (p) is for primary partitions, the rest in a single LVM partition:

(p) /boot 500
(p) swap 4096
/ 32768
/big 65536
/tmp 1024
/home 16384
/Dropbox 409600

This leaves plenty of room for growth.  Note particularly that /usr and /var are both in the / (root) partition.  I would prefer having /var be in its own partition, but it is not to be.  Where necessary, I'll move large files to /big and use symlinks from /var.
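
As a quick sanity check on the "room for growth" claim, the sizes listed above can be added up. This is a sketch under two assumptions that the list does not state: the numbers are in MiB, and "2 TB" means 2 x 10^12 bytes.

```python
# Sizes copied from the list above. The unit (MiB) is an assumption;
# the list itself does not state one.
PARTITIONS_MIB = {
    "/boot": 500,
    "swap": 4096,
    "/": 32768,
    "/big": 65536,
    "/tmp": 1024,
    "/home": 16384,
    "/Dropbox": 409600,
}

DRIVE_BYTES = 2 * 10**12  # assumption: "2 TB" means 2e12 bytes
MIB = 1024**2


def allocated_mib(partitions):
    """Total allocated space in MiB."""
    return sum(partitions.values())


def allocated_fraction(partitions, drive_bytes):
    """Fraction of the drive taken up by the listed partitions."""
    return allocated_mib(partitions) * MIB / drive_bytes


if __name__ == "__main__":
    print(allocated_mib(PARTITIONS_MIB), "MiB allocated")
    print(f"{allocated_fraction(PARTITIONS_MIB, DRIVE_BYTES):.1%} of the drive")
```

Under those assumptions the listed partitions take up a bit over a quarter of the drive, which leaves the rest free for later growth.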

Comment 69 Adam Williamson 2014-03-16 16:36:02 UTC

I'd rather revise that documentation. There is no sense in which creating a separate /var partition makes upgrades easier; that's an incorrect assertion. Separating /home makes the 'ghetto upgrade' trick easier (that is, possible), but also does nothing for 'proper' upgrades.

Adding documentation keyword. Docs team, can you advise on the provenance of the above table? It looks rather old. Could we please revise it?

Comment 70 Chris Murphy 2014-03-16 17:52:16 UTC

(In reply to Adam Williamson from comment #69)
I agree it needs to be revised, in particular why would the documented "Recommended partition scheme" be any different than the installer's default layout?

Comment 71 Chris Murphy 2014-03-16 19:49:31 UTC

(In reply to Chris Murphy from comment #70)
Filed Bug 1076963.

Comment 72 Paul DeStefano 2014-03-22 20:56:08 UTC

Okay, so it's not clear exactly what is going to happen with FedUp as a result of this bug.  It almost sounds like, even though this bug is fixed now, /var partitions will not be supported in the future.  That can't be right, right?

Comment 73 Adam Williamson 2014-03-22 21:15:31 UTC

That's not what we're talking about, no. We're talking about not having the documentation recommend them as 'making upgrades easier', since they don't.

Comment 74 Paul DeStefano 2014-03-24 07:02:45 UTC

Okay, great.

I'm very grateful for this fix!  Thank you.  Gave karma, but I think someone just pushed it.

Comment 75 Fedora Update System 2014-04-24 07:42:29 UTC

fedup-0.8.0-4.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 76 Fedora Update System 2014-06-04 16:49:08 UTC

fedup-0.8.1-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/fedup-0.8.1-1.fc20

Comment 77 Lloyd Kvam 2014-07-07 11:58:33 UTC

fedup-0.8.0-4.fc19.noarch was still unable to mount my separate /var partition when rebooting to apply the upgrade.  The system has a RAID setup, not LVM.

I'm NOT expecting any kind of fix for 19 => 20, but thought the issue was worth reporting.

Comment 78 Fedora Update System 2014-08-15 02:43:42 UTC

fedup-0.8.1-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 79 Herbert J. Skuhra 2014-11-25 10:46:48 UTC

I've just tried to upgrade from 20 to 21 using fedup 0.9.0-2.fc20 and upgrade-prep.sh still fails to unmount /var (target is busy). So obviously this problem still persists. :-(

Comment 80 Raman Gupta 2014-11-25 16:31:31 UTC

(In reply to Herbert J. Skuhra from comment #79)
> I've just tried to upgrade from 20 to 21 using fedup 0.9.0-2.fc20 and
> upgrade-prep.sh still fails to unmount /var (target is busy). So obviously
> this problem still persists. :-(

I got the same error during the 20->21 upgrade (I also got it during the 19->20 upgrade with the fix in this bug). However, in both cases the upgrade worked perfectly fine, and the error seems to be benign.

Comment 81 Adam Williamson 2014-11-25 18:19:59 UTC

please file a separate bug for that, it sounds like a separate (and less serious) issue. thanks.

Comment 82 Philip Prindeville 2015-02-19 05:59:59 UTC

(In reply to Adam Williamson (Red Hat) from comment #81)
> please file a separate bug for that, it sounds like a separate (and less
> serious) issue. thanks.

I did an upgrade, but it left my system with what seems to be a corrupt rpmdb.

yum history info fedup

yields:

http://www.redfish-solutions.com/misc/history.txt

and running Kevin's rpm-verify.sh script yielded:

http://paste.fedoraproject.org/187349/24315703
