Red Hat Bugzilla – Bug 1295213
dnf system-upgrade download --releasever=23 reboots but stalls on 'starting system upgrade'
Last modified: 2016-12-20 12:34:27 EST
Description of problem:
When attempting to upgrade from Fedora 21 to Fedora 23 using the dnf system-upgrade plugin, download of version 23 packages proceeds smoothly but system stalls at 'Starting system upgrade' upon reboot. (At least, I assume it should not still be starting the upgrade after 8+ hours.)
Version-Release number of selected component (if applicable):
Not currently accessible, as the system is in a stalled, non-booted state. However, the system was a fully upgraded installation of Fedora 21 (KDE) and the new Fedora 23 packages were downloaded yesterday (2016-01-02).
How reproducible:
Unknown. I've only tried this once and don't know how to recover to a working system. (I'm afraid to switch the machine off in case that's the wrong thing to do and I can't find any information about how I should proceed in this situation.)
Steps to Reproduce:
1. Ensure Fedora 21 is fully upgraded. System has been rebooted multiple times with active kernel with no issues. KDE spin. System is very close to a vanilla installation with mostly default settings. (I do the admin but the box isn't mine and it does not occur to the user to customise anything. Boot theme, shells, desktop environment etc. are all default for KDE spin.)
2. dnf system-upgrade download --releasever=23
3. dnf system-upgrade reboot
Actual results:
Steps 1 and 2 proceed smoothly. The expected message providing instructions about rebooting to activate the upgrade is received at the end of step 2. On initiating step 3, the machine reboots successfully and gets as far as 'Starting system upgrade. This may take a while.' This appears on both the graphical boot splash with the Fedora symbol and at the end of the scrolled console output. All visible messages prior to this have green 'OK's in the console output, although it is possible I missed something which scrolled off screen. Nothing further happens. It is about 14 hours since I initiated step 3 and the system is still 'starting' the system upgrade. It has been stalled for at least 8 hours (actually several more).
Expected results:
System upgrade completes as well as starting.
Additional info:
This is the second machine I've upgraded using this method. I've therefore tried to think about the differences between the two, but cannot come up with anything which might explain the failure in the second case.
Both use the KDE spin. The first proceeded relatively uneventfully. (I filed a bug but it was nothing like this problematic.) The main differences in configuration between the two are as follows.
1. The first, successful upgrade was on a much more customised installation involving more complications (e.g. LVM over LUKS) so, if anything, I expected the second upgrade to be the more straightforward.
2. The second machine boots in EFI mode. However, Fedora 21 generally coped with this better than BIOS mode booting. For example, grub.cfg always got automatically updated on the second machine when a new kernel was installed, whereas it never got automatically updated on the first. (Possibly because Fedora insisted on creating an EFI partition even for BIOS booting, but I'm not sure.) Again, this suggested that I'd be more likely to encounter issues upgrading the first machine than the second. In any case, the reboot goes fine. It is just the stuff which happens next which encounters problems.
3. The second machine uses WIFI whereas the first uses wired. Although WIFI is generally more complex, I can't see how this could explain the issues on reboot, as opposed to issues when downloading the packages or networking issues post-upgrade. Certainly nothing in the upgrade instructions suggested that a wired connection was required.
4. The second machine uses an SSD rather than the traditional spinning kind. However, it is hard to see why this would cause the upgrade to stall and Fedora 19 and 21 both lived quite happily upon precisely the same hardware, so I wouldn't expect this to suddenly be an issue for Fedora 23.
It seems that the root of the problem probably lies in a lack of disk space. This did not occur to me because when I upgraded the first machine, it told me I had too little space and ensured that I had sufficient space before I ever got to the point of triggering the upgrade by rebooting.
In the case of the second machine, not only did it fail to check for sufficient space prior to preparing the machine for upgrade on reboot, the putative upgrade failed silently with no useful error message or information at all.
May I suggest that dnf should ideally check for sufficient space prior to preparing the system for upgrade?
Failing that, the system-upgrade process itself should fail with a meaningful error in the case that it finds insufficient space to proceed on reboot.
Enough space is a pretty basic check and I'm very surprised that the software neither checked for this nor provided any information about the problem.
Or perhaps the process is supposed to check this and somehow that check gave a false 'OK' in this case? As I say, it did refuse to proceed due to lack of space on the first machine, so it seems odd that it behaved so badly on the second.
I'm not yet clear whether the upgrade will proceed correctly from here, but it has got beyond the point of merely starting the process, at least.
(In reply to reescf from comment #1)
> May I suggest that dnf should ideally check for sufficient space prior to
> preparing the system for upgrade?
It does - RPM does a disk space check as part of the test transaction.
> Or perhaps the process is supposed to check this and somehow that check gave
> a false 'OK' in this case? As I say, it did refuse to proceed due to lack of
> space on the first machine, so it seems odd that it behaved so badly on the
> second.
Correct, you got a false positive.
The disk space check is necessarily inaccurate because of RPM scriptlets; RPM can't predict how much data any given %pre or %post scriptlet will write to the disk, or how much temporary space they might use. So while it checks for sufficient disk space (and pads the check a bit to be sure), if you're close to the boundary you can still get false positives.
There's not a lot the plugin can do about that problem.
It *could* implement its own extra disk check, which pads the required disk space a little more than RPM does. This would lead to people who are near the borderline getting false *negatives* instead of false positives. That's arguably safer, but we'll definitely get bug reports from people who are *sure* they have enough disk space but fail the extra check.
So: it would also need a config option to control how much padding the extra diskspace check adds (and/or disable the check entirely). Probably it'd also need a CLI flag to control it, plus documentation.
I added https://github.com/rpm-software-management/dnf-plugin-system-upgrade/issues/48 to keep track of that idea. Pull Requests welcome!
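To make the idea of an extra, padded disk space check concrete, here is a minimal sketch in POSIX shell. It is not part of dnf, RPM, or the system-upgrade plugin: the function names (padded_required, has_enough_space, available_kb) and the padding percentage are hypothetical, and the required size has to be supplied by the caller because this sketch does not ask RPM for it.

```shell
#!/bin/sh
# Hypothetical helpers illustrating a padded free-space check.
# None of these names exist in dnf or dnf-plugin-system-upgrade.

# padded_required REQUIRED_KB PAD_PERCENT
# Prints REQUIRED_KB increased by PAD_PERCENT percent, rounded up.
padded_required() {
    required_kb=$1
    pad_percent=$2
    echo $(( (required_kb * (100 + pad_percent) + 99) / 100 ))
}

# has_enough_space AVAILABLE_KB REQUIRED_KB PAD_PERCENT
# Succeeds only if the available space covers the padded requirement.
has_enough_space() {
    [ "$1" -ge "$(padded_required "$2" "$3")" ]
}

# available_kb PATH
# Prints the free space, in 1024-byte blocks, on the filesystem holding PATH.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

A caller might run something like `has_enough_space "$(available_kb /)" 2000000 20 || echo "not enough free space" >&2`, where 2000000 KiB and 20 percent are placeholder values. A padding of 0 removes the extra margin, which is the knob a config option or CLI flag would expose.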
Would it be possible to at least give some sort of useful error when it fails? Or at least an error, even if not a useful one?
It just stalled on 'Starting upgrade'. It never even printed the first installation/upgrade message. So 'got part way through and ran out of space' did not seem an obvious diagnosis. Had it appeared to install some packages and then got stuck, even that would have been a clue. But I got nothing after the 'starting' message.
Also, I'm not entirely convinced that it was even close to having enough space. Shouldn't it have shown some visible progress in that case? Why didn't it upgrade/install a bunch of stuff and only then stall? Why stall at the very start of the process?
I managed to upgrade using
# rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-23-$(uname -i)
# dnf upgrade
# dnf clean all
# dnf --releasever=23 --setopt=deltarpm=false distro-sync
as described at https://fedoraproject.org/wiki/Upgrading_Fedora_using_package_manager?rd=Upgrading_Fedora_using_yum
This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.
Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
Fedora 23 changed to end-of-life (EOL) status on 2016-12-20. Fedora 23 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.
Thank you for reporting this bug and we are sorry it could not be fixed.