Bug 1710543 - Cannot find os-release file after update from F29 using dnf distro-sync
Summary: Cannot find os-release file after update from F29 using dnf distro-sync
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: fedora-release
Version: 30
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Stephen Gallagher
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1705651 1711632 1718026
Depends On:
Blocks:
Reported: 2019-05-15 18:01 UTC by Milan Crha
Modified: 2020-10-21 20:59 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-11 18:29:54 UTC
Type: Bug
Embargoed:



Description Milan Crha 2019-05-15 18:01:13 UTC
I upgraded a Fedora 29 machine to Fedora 30 using the `dnf distro-sync` command. There had been some issues with some fedora-release packages, so I also tried `--allowerasing`, as suggested by dnf. The transaction finished successfully, but afterwards the system would not boot because of a missing os-release file (according to the logs). It took me some time to figure out that /usr/lib/os-release is a symlink in Fedora 29:

>   $ ls -l /usr/lib/os-release 
>   lrwxrwxrwx. 1 root root 37 May  6 16:04 /usr/lib/os-release -> ./os.release.d/os-release-workstation

but Fedora 30 ships it as a regular file. The update did not replace the symlink with a regular file, which caused the boot failure. After I replaced it with the one from the cinnamon package, the system began to boot properly again. Cinnamon had been installed on this system under Fedora 29 as well.
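
The failure state described here, a symlink whose target no longer exists, can be checked for before rebooting. The following is a minimal sketch, assuming Python 3 is available on the affected system; the helper name `is_dangling_symlink` is hypothetical and not part of any Fedora tool.

```python
import os


def is_dangling_symlink(path):
    """Return True if path is a symlink whose target does not exist."""
    # os.path.islink() looks at the link itself, while os.path.exists()
    # follows it, so a broken link is a link for which exists() is False.
    return os.path.islink(path) and not os.path.exists(path)


if __name__ == "__main__":
    path = "/usr/lib/os-release"
    if is_dangling_symlink(path):
        print(f"{path} is a dangling symlink -> {os.readlink(path)}")
    else:
        print(f"{path} is not a dangling symlink")
```

A regular file, a missing path, and a symlink with a valid target all return False, so only the broken-link case from this report is flagged.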

I can try to find in the logs what the conflict was, if you can guide me where to look for it (I've no idea whether the F30 distro-sync logs are still on the machine) and if you think it'll be helpful.

Comment 1 Milan Crha 2019-05-16 13:38:12 UTC
`dnf system-upgrade` is also affected; the cause is the conflicts mentioned in bug #1710759, which I've been told is a known bug in Fedora 30. Feel free to close this one too if you think nothing more can be done here.

Comment 2 Stephen Gallagher 2019-05-20 12:19:40 UTC
*** Bug 1711632 has been marked as a duplicate of this bug. ***

Comment 3 Stephen Gallagher 2019-05-20 12:21:30 UTC
I'm moving this to dnf-plugins-core because I think this is a problem with the transaction test. DNF should not be allowing the system-upgrade to proceed because it should be detecting that the resulting system cannot have multiple fedora-release-* packages on it. They have clearly-defined Conflicts: in the RPMs.

Comment 4 amatej 2019-05-29 07:11:24 UTC
I believe the logs should still be present in /var/log/dnf.log. You should be able to find the exact commands there: `dnf distro-sync` with its output and debug messages and then, since I assume you ran it immediately afterwards, `dnf distro-sync --allowerasing` with its output and debug messages. If you could upload these, it would be helpful.

Also, if I understand correctly from the other referenced bug #1710759, you were able to reproduce this on a different machine with `dnf system-upgrade download --refresh --releasever=30`. If possible, can you provide a list of the packages installed on both machines before the update? It would be great if we could reproduce this reliably. So far I have tried in a container, but it seems that having the fedora-release-* packages named in bug #1710759 installed is not enough.

Comment 5 amatej 2019-06-05 07:25:57 UTC
Thanks to the virtual machine provided by the reporter, I was able to reproduce this and, I believe, even figure out what exactly is going on.

What is happening:

On F29 we have:
>   [asd@localhost ~]$ dnf repoquery --installed fedora-rel*
>   fedora-release-0:29-10.noarch
>   fedora-release-matecompiz-0:29-10.noarch
>   fedora-release-workstation-0:29-10.noarch
and
>   [asd@localhost ~]$ ls -la /usr/lib/
>   ...
>   lrwxrwxrwx.   1 root root    37 Jun  4 15:33 os-release -> ./os.release.d/os-release-workstation
>   ...

We then upgrade from F29 to F30 with either distro-sync or system-upgrade. At first we get a conflict, as correctly expected in comment #3; with --allowerasing the transaction proceeds.
The upgrade transaction contains:
>   Upgrading:
>    fedora-release-matecompiz                noarch               30-3                updates                 11 k
>    fedora-gpg-keys                          noarch               30-1                fedora                 102 k
>    fedora-repos                             noarch               30-1                fedora                 9.3 k
>   Installing dependencies:
>    fedora-release-common                    noarch               30-3                updates                 19 k
>        replacing  fedora-release.noarch 29-10
>   Removing dependent packages:
>    fedora-release-workstation               noarch               29-10               @updates               3.0 k

>   Running transaction
>     Preparing        :                                                                                        1/1 
>     Running scriptlet: fedora-gpg-keys-30-1.noarch                                                            1/1 
>     Upgrading        : fedora-gpg-keys-30-1.noarch                                                            1/9 
>     Upgrading        : fedora-repos-30-1.noarch                                                               2/9 
>     Upgrading        : fedora-release-matecompiz-30-3.noarch                                                  3/9 
>     Installing       : fedora-release-common-30-3.noarch                                                      4/9 
>     Running scriptlet: fedora-release-workstation-29-10.noarch                                                5/9 
>     Erasing          : fedora-release-workstation-29-10.noarch                                                5/9 
>     Running scriptlet: fedora-release-workstation-29-10.noarch                                                5/9 
>     Running scriptlet: fedora-release-matecompiz-29-10.noarch                                                 6/9 
>     Cleanup          : fedora-release-matecompiz-29-10.noarch                                                 6/9 
>     Obsoleting       : fedora-release-29-10.noarch                                                            7/9 
>     Cleanup          : fedora-repos-29-5.noarch                                                               8/9 
>     Cleanup          : fedora-gpg-keys-29-5.noarch                                                            9/9 
>     Running scriptlet: fedora-gpg-keys-29-5.noarch                                                            9/9 
>     Verifying        : fedora-release-common-30-3.noarch                                                      1/9 
>     Verifying        : fedora-release-29-10.noarch                                                            2/9 
>     Verifying        : fedora-release-matecompiz-30-3.noarch                                                  3/9 
>     Verifying        : fedora-release-matecompiz-29-10.noarch                                                 4/9 
>     Verifying        : fedora-gpg-keys-30-1.noarch                                                            5/9 
>     Verifying        : fedora-gpg-keys-29-5.noarch                                                            6/9 
>     Verifying        : fedora-repos-30-1.noarch                                                               7/9 
>     Verifying        : fedora-repos-29-5.noarch                                                               8/9 
>     Verifying        : fedora-release-workstation-29-10.noarch                                                9/9 

Upgrade is successful, but:
>   [asd@localhost ~]$ ls -la /usr/lib/
>   ...
>   lrwxrwxrwx.   1 root root    32 Jun  4 15:49 os-release -> ./os.release.d/os-release-fedora
>   ...

The target of the symlink, "./os.release.d/os-release-fedora", no longer exists in F30, so we have an invalid os-release and the system won't boot correctly.

I think the problem is the %preun scriptlet of fedora-release-workstation-29-10.noarch: it runs after fedora-release-common-30-3.noarch is installed and sets the faulty symlink.

Comment 6 Stephen Gallagher 2019-06-05 12:51:37 UTC
Thanks for the detailed investigation! I'll have a look today and see if I can get that fixed.

Comment 7 Stephen Gallagher 2019-06-06 15:39:04 UTC
OK, this was a bit tricky to identify, but the real problem here is that we can't handle the following case:

* There are at least two `fedora-release-$VARIANT` packages on the system.
* dnf does not pick the `fedora-release-$VARIANT` that matches the installed system when deciding which one to erase.

The logic for the uninstallation of the package in %preun is:

    if read_variant() == edition then
      set_variant("nonproduct")
      convert_to_edition("nonproduct", false)
    end

So in the example above, since the system was initially installed as Workstation, DNF's decision to keep cinnamon instead of workstation triggered the logic that sets the symlink to the "nonproduct" case. If only fedora-release-cinnamon had been selected for removal, no changes would have been made.

The issue is partly one of timing: the F30+ fedora-release package no longer contains /usr/lib/variant, which was used to keep track of the installed variant. This file doesn't get removed until after the old F29 package is removed, which occurs *after* the old package's %preun is run. So that file is still there and still showing a value that matches the package getting uninstalled. If that file had already been deleted or didn't match the package being removed, we'd be fine.
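
The ordering problem can be illustrated with a toy model. This is a sketch only: a dictionary stands in for the filesystem, every function and constant name is hypothetical, the %preun logic is reduced to the comparison quoted above, and the assumption that the F29 "nonproduct" identity is a symlink to os-release-fedora is taken from the `ls` output in comment 5. It is not the actual scriptlet.

```python
OS_RELEASE = "/usr/lib/os-release"
VARIANT = "/usr/lib/variant"
WORKSTATION_TARGET = "/usr/lib/os.release.d/os-release-workstation"
NONPRODUCT_TARGET = "/usr/lib/os.release.d/os-release-fedora"


def f29_workstation_system():
    """Toy F29 filesystem. Values are file content, the marker "file",
    or a ("link", target) tuple for a symlink."""
    return {
        VARIANT: "workstation",
        OS_RELEASE: ("link", WORKSTATION_TARGET),
        WORKSTATION_TARGET: "file",
        NONPRODUCT_TARGET: "file",
    }


def old_preun(fs, edition):
    """Reduced F29 %preun: if the recorded variant matches the package
    being removed, point os-release at the nonproduct target."""
    if fs.get(VARIANT) == edition:
        fs[OS_RELEASE] = ("link", NONPRODUCT_TARGET)


def upgrade_with_allowerasing(fs):
    """Replay the order of events described in this bug."""
    # 1. The F30 packages are installed: os-release is now a regular file.
    fs[OS_RELEASE] = "file"
    # 2. %preun of the F29 workstation package runs. The old variant file
    #    is still on disk, so the comparison matches.
    old_preun(fs, "workstation")
    # 3. Only now are the files of the old F29 packages removed.
    for path in (VARIANT, WORKSTATION_TARGET, NONPRODUCT_TARGET):
        fs.pop(path, None)
    return fs


def is_dangling(fs, path):
    """True if path is a symlink whose target is gone from the toy fs."""
    entry = fs.get(path)
    return isinstance(entry, tuple) and entry[1] not in fs


if __name__ == "__main__":
    result = upgrade_with_allowerasing(f29_workstation_system())
    print("os-release dangling:", is_dangling(result, OS_RELEASE))
```

In this model, swapping steps 2 and 3, or making the variant file hold any value other than "workstation", leaves os-release as the regular file installed in step 1.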


So, the ideal fix here would be to have a way to indicate to DNF which of the available packages it should keep; that way, on upgrade you retain the system identity you were supposed to have.


The *quick* fix is for me to re-add the /usr/lib/variant file, but expressly set it to "_disabled_" (not matching any value that it ever should have had). This file will be in place before the %preun of the old package gets executed, and thus the scriptlet will not end up calling convert_to_edition("nonproduct").
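
Why the placeholder value is enough follows from the comparison alone. The sketch below is written in Python rather than the scriptlet's own language; `preun_resets_identity` is a hypothetical name, and only the equality check is taken from the logic quoted above.

```python
DISABLED = "_disabled_"  # placeholder value named in this comment


def preun_resets_identity(variant_content, edition):
    """Return True if the old %preun check would switch the system to the
    "nonproduct" identity when the given edition package is removed.

    variant_content is the content of /usr/lib/variant at the time the
    scriptlet runs, or None if the file does not exist.
    """
    return variant_content is not None and variant_content == edition


if __name__ == "__main__":
    scenarios = {
        "stale F29 variant file still on disk": "workstation",
        "variant file re-added with placeholder": DISABLED,
        "variant file already deleted": None,
    }
    for label, content in scenarios.items():
        print(label, "->", preun_resets_identity(content, "workstation"))
```

Only the first scenario triggers the reset; the placeholder never equals a real edition name, so the old scriptlet leaves os-release alone.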


The downside to the quick fix is that you may end up with a different identity than you had before the upgrade. Unfortunately, I don't know of any way to hint to DNF that it needs to pick a package based on content on the filesystem, so I think we may just be stuck here. Not breaking upgrades outweighs getting the identity wrong.

Comment 8 Stephen Gallagher 2019-06-06 17:00:17 UTC
OK, I have implemented the quick fix and submitted it as a PR to fedora-release:

* https://src.fedoraproject.org/rpms/fedora-release/pull-request/84 (F31/Rawhide)
* https://src.fedoraproject.org/rpms/fedora-release/pull-request/85 (F30)

Comment 9 Mohan Boddu 2019-06-06 17:45:35 UTC
*** Bug 1718026 has been marked as a duplicate of this bug. ***

Comment 10 Jindrich Novy 2019-06-07 12:03:36 UTC
(In reply to Stephen Gallagher from comment #7)
> The issue is partly one of timing: the F30+ fedora-release package no longer
> contains /usr/lib/variant, which was used to keep track of the installed
> variant. This file doesn't get removed until after the old F29 package is
> removed, which occurs *after* the old package's %preun is run.

Wouldn't it help to move the /usr/lib/os-release creation logic in fedora-release-workstation to %posttrans so that the scriptlet execution happens after all files of all packages are already settled on the filesystem to avoid dependency hell?

Jindrich

Comment 11 Stephen Gallagher 2019-06-07 12:53:28 UTC
(In reply to Jindrich Novy from comment #10)
> (In reply to Stephen Gallagher from comment #7)
> > The issue is partly one of timing: the F30+ fedora-release package no longer
> > contains /usr/lib/variant, which was used to keep track of the installed
> > variant. This file doesn't get removed until after the old F29 package is
> > removed, which occurs *after* the old package's %preun is run.
> 
> Wouldn't it help to move the /usr/lib/os-release creation logic in
> fedora-release-workstation to %posttrans so that the scriptlet execution
> happens after all files of all packages are already settled on the
> filesystem to avoid dependency hell?
> 

The creation logic *is* in %posttrans. This is the uninstallation logic for the case where you end up removing fedora-release-workstation. It's supposed to mark the system as being just "Fedora" and no longer "Fedora Workstation". This uninstallation logic has to be in %preun because %posttrans doesn't have a mechanism to know if the package is being upgraded or uninstalled, and this has to happen only on uninstall. Secondly, the %preun fires from the package being removed, not the package being added. So any change we'd want to make to the uninstallation logic would require us to first push another fedora-release update to stable on F28 and F29. Not everyone bothers to fully update their system before upgrading, so this won't help those folks, and the outcome is bad enough that we really need to be more fail-safe.

So that's why I implemented the hack I did; it should always be *safe*, even if it does result in the system changing its identity sometimes.

We also need to document that the recommended behavior is *not* to use `--best --allowerasing`, but instead to manually remove whichever fedora-release-$edition package they don't want their system to report as.
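
Spotting the situation before an upgrade could look roughly like the following. This is a sketch under stated assumptions: the function names are hypothetical, the `NON_EDITION` set is a guess based only on the package names that appear in this bug, and the input is assumed to be a list of installed package names (for example from `rpm -qa --qf '%{NAME}\n' 'fedora-release*'`).

```python
# Names that start with "fedora-release-" but are not edition packages.
# ASSUMPTION: derived from the package names seen in this bug only.
NON_EDITION = {"fedora-release-common"}


def edition_packages(installed_names):
    """Return the sorted fedora-release-$edition packages in the list."""
    return sorted(
        name
        for name in installed_names
        if name.startswith("fedora-release-") and name not in NON_EDITION
    )


def needs_manual_choice(installed_names):
    """True if more than one edition package is installed, i.e. the user
    should remove the unwanted one(s) by hand before upgrading."""
    return len(edition_packages(installed_names)) > 1


if __name__ == "__main__":
    installed = [
        "fedora-release",
        "fedora-release-matecompiz",
        "fedora-release-workstation",
    ]
    if needs_manual_choice(installed):
        print("Multiple edition packages:", ", ".join(edition_packages(installed)))
```

The check deliberately only reports; choosing which edition package to remove stays a manual decision, as recommended above.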

Comment 12 Fedora Update System 2019-06-10 14:07:55 UTC
FEDORA-2019-ffb90829eb has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-ffb90829eb

Comment 13 Fedora Update System 2019-06-11 01:19:13 UTC
fedora-release-30-4 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-ffb90829eb

Comment 14 Fedora Update System 2019-06-18 18:13:57 UTC
fedora-release-30-4 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 15 Jaroslav Mracek 2019-07-01 18:09:57 UTC
*** Bug 1705651 has been marked as a duplicate of this bug. ***

Comment 16 Thomas Bennett 2020-10-21 20:59:49 UTC
Not sure if this is related directly to this bug, but:
Upgrading from 29 to 30 with dnf on Sept 25, 2020, on a Dell PowerEdge R710 with 18TB of storage, left the server able to boot only into rescue mode, and the root file system could not be mounted even from a live CD. I backed up the XFS metadata and ran xfs_repair -L on a disk image copy of it, which resulted in a great deal of lost data and unlinked blocks. Seeing that expensive data was missing in that test, I decided not to run it on the server storage. I finally bought UFS Explorer, licensed for $75, which I had to install on a laptop before copying the binary to a USB drive. I booted the server from a live Fedora 30 CD with a 7TB USB drive and the USB drive holding UFS Explorer attached. In a terminal I had to run UFS Explorer as root to see any file systems, and with that application I have been able to recover 4.7TB of essential (not backed up) data to the attached USB storage drive.

