Bug 1398845

Summary: Fedora should provide an easy mechanism for failed upgrades to boot into existing system
Product: [Fedora] Fedora
Reporter: Walter Francis <wally>
Component: dnf-plugin-system-upgrade
Assignee: Will Woods <wwoods>
Status: CLOSED UPSTREAM
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 25
CC: wwoods, zbyszek
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-28 20:49:27 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Walter Francis 2016-11-26 15:26:38 UTC
Description of problem:

Under the old system, GRUB had two entries: one for the old system and one for the upgrade.  If the upgrade failed for some reason, you could boot into the old system, get back online, possibly fix the issue, and try again.  This is more difficult with the new system.

If a systemd upgrade fails, the user must now get into the system by some other means, remove /system-update, and only then boot into the normal system.
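
For illustration only, the manual recovery step described above could be sketched as follows. This is a minimal sketch, not part of dnf or systemd: the helper name `clear_update_trigger` and its `root` parameter are hypothetical, and it assumes the affected root filesystem is already mounted somewhere writable (for example from rescue media) and that the script runs with sufficient privileges.

```python
import os


def clear_update_trigger(root="/"):
    """Remove <root>/system-update if it exists.

    Returns True if the trigger was removed, False if there was nothing to do.
    Hypothetical helper for illustration; the only path taken from the report
    is /system-update itself.
    """
    trigger = os.path.join(root, "system-update")
    # lexists() also reports dangling symlinks, which exists() would miss.
    if not os.path.lexists(trigger):
        return False
    # unlink() removes the symlink itself, never the directory it points to.
    os.unlink(trigger)
    return True
```

Calling `clear_update_trigger("/mnt/sysroot")` (the mount point is an assumption) would then leave the installed system free to boot normally on the next restart.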

I think this could be improved by creating a similar set of GRUB entries that let the user boot directly into the existing system rather than having to remove a file from the filesystem.  It would let less experienced users help themselves and provide a better experience, even in a failure state.

Version-Release number of selected component (if applicable):

As far as I know, this has been the state since we moved from preupgrade to dnf system upgrade.

How reproducible:

If the user has an issue upgrading, 100%.

Steps to Reproduce:

Not really applicable.

Actual results:

Users have no easy way to skip the upgrade after a failure.

Expected results:

Users can get their previous system back online and possibly fix the upgrade and continue on.

Additional info:


Comment 1 Zbigniew Jędrzejewski-Szmek 2016-11-28 20:49:27 UTC
In the old system, there was a separate kernel and initramfs image for the upgrade. In the new system, there is nothing like that. During boot, we simply divert into the upgrade instead of continuing with the normal boot. If the system is broken, it is broken for both the "normal boot" and the "upgrade boot".

If we get as far as the upgrade, the first thing that happens is the removal of the symlink which causes the upgrade to be attempted. As a backup, if the upgrade process fails, we also delete the symlink in an ExecStopPost action in the systemd unit. Basically, if anything goes wrong with the update, we don't try again.

So the only case in which a) the system is at all bootable, and b) upgrade is attempted more than once, would be if the whole system (always) crashed between the time that the service is started and the time the symlink is deleted. But that's rather unlikely, because the upgrade process is a normal dnf invocation, nothing special should be happening. Note that the upgrade process removes the symlink *before* it starts any work on the packages. If the system is so broken that it doesn't boot, not trying to upgrade is not going to help.
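
The ordering described above can be sketched roughly as follows. This is an illustrative model only, not the actual dnf-plugin-system-upgrade or systemd code: the function `run_offline_upgrade` and the `do_upgrade` callback are hypothetical names, and the `finally` clause merely stands in for the ExecStopPost backup mentioned earlier.

```python
import os


def run_offline_upgrade(trigger, do_upgrade):
    """Model of the 'remove the trigger first' ordering.

    trigger    -- path of the symlink that diverts boot into the upgrade
    do_upgrade -- callable performing the package work (hypothetical)
    Returns True if the upgrade succeeded, False if it raised.
    """
    # Step 1: drop the trigger *before* any package work starts, so that a
    # crash during the upgrade cannot cause a second attempt on next boot.
    try:
        os.unlink(trigger)
    except FileNotFoundError:
        pass

    try:
        # Step 2: the upgrade itself.
        do_upgrade()
        return True
    except Exception:
        return False
    finally:
        # Backup, analogous in spirit to ExecStopPost: make sure the
        # trigger is gone whatever happened above.
        if os.path.lexists(trigger):
            os.unlink(trigger)
```

The only window in which the trigger survives a failure is between process start and the first `unlink`, which matches the "rather unlikely" case described in the comment.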

Hence, closing as not a bug, since the suggestion does not really make sense for the way that system-upgrade works.

Comment 2 Walter Francis 2016-11-28 21:26:51 UTC
While I am sure that's all technically accurate, the reason I logged a bug is that a user can get into a situation where they cannot boot the system without manually removing the symlink.  We had a user in #fedora who hit a failure state upgrading 24->25, and after he was asked to try removing the symlink he was able to boot and use the system.

Perhaps the upgrade can fail to start, so the symlink is never removed and the system remains in a failure state?

Comment 3 Zbigniew Jędrzejewski-Szmek 2016-11-30 17:35:43 UTC
In the meantime, we made the system much more robust, see https://github.com/systemd/systemd/pull/4763. The symlink should now be removed automatically if the upgrade fails for any reason.

Comment 4 Walter Francis 2016-11-30 19:44:38 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #3)
> In the meanwhile, we made the system much more robust, see
> https://github.com/systemd/systemd/pull/4763. The symlink should now be
> removed automatically if the upgrade fails for any reason.

Nice!  Thank you!