Created attachment 1410940
tarball of all of /var/log from a failure
After https://bugzilla.redhat.com/show_bug.cgi?id=1558354 is fully fixed (i.e. with freeipa-4.6.90.pre1-4.fc28), upgrade of a FreeIPA server from Fedora 27 to Fedora 28 still fails. There seem to be two different failures, one during the actual upgrade boot, one during the first boot after upgrade: the IPA upgrade process runs both times, and fails differently each time.
This bug is for the *second* failure, the failure of the upgrade when ipa.service tries to start on the first boot *after* the upgrade boot. In this case, we see the following in ipaupgrade.log:
2018-03-21T03:05:38Z DEBUG Destroyed connection context.ldap2_140356854979384
2018-03-21T03:05:38Z DEBUG Starting external process
2018-03-21T03:05:38Z DEBUG args=['/bin/systemctl', 'restart', 'sssd.service']
2018-03-21T03:05:38Z DEBUG Process finished, return code=1
2018-03-21T03:05:38Z DEBUG stdout=
2018-03-21T03:05:38Z DEBUG stderr=Job for sssd.service failed because the control process exited with error code.
And in the journal:
Mar 20 20:05:38 ipa001.domain.local systemd: Starting System Security Services Daemon...
Mar 20 20:05:38 ipa001.domain.local sssd: Starting up
Mar 20 20:05:38 ipa001.domain.local sssd: Lower version of database is expected!
Mar 20 20:05:38 ipa001.domain.local sssd: Removing cache files in /var/lib/sss/db should fix the issue, but note that removing cache files will also remove all of your cached credentials.
Mar 20 20:05:38 ipa001.domain.local systemd: sssd.service: Main process exited, code=exited, status=3/NOTIMPLEMENTED
Mar 20 20:05:38 ipa001.domain.local systemd: sssd.service: Failed with result 'exit-code'.
Mar 20 20:05:38 ipa001.domain.local systemd: Failed to start System Security Services Daemon.
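For reference, the cache removal that the sssd message suggests could be sketched roughly as below. This is only an illustration, not something SSSD ships: the function name `clear_sssd_cache` and its `dry_run` parameter are made up for this example, and the only detail taken from the log is the /var/lib/sss/db path. As the message warns, actually deleting the files also discards all cached credentials, so the sketch defaults to listing the files without touching them.

```python
from pathlib import Path


def clear_sssd_cache(cache_dir="/var/lib/sss/db", dry_run=True):
    """Return the regular files found in cache_dir.

    Hypothetical helper: files are deleted only when dry_run is False.
    The default path is the one named in the sssd journal message.
    """
    directory = Path(cache_dir)
    if not directory.is_dir():
        # Nothing to do if the cache directory does not exist.
        return []
    files = sorted(p for p in directory.iterdir() if p.is_file())
    if not dry_run:
        for path in files:
            path.unlink()  # this also drops any cached credentials
    return files
```

Calling `clear_sssd_cache()` with the defaults only reports what would be removed; `clear_sssd_cache(dry_run=False)` would need root and should presumably only be run while sssd is stopped.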
I'll be filing another bug for the other failure (during the upgrade boot). Attaching a tarball of the whole /var/log archive, including logs for *both* failures. The logs for this failure are in the final boot in the logs, around 20:05 system local time (journal timestamps), which is 03:05 UTC (IPA log timestamps).
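To line up the two sets of timestamps, a small conversion like the following may help. It is a sketch under assumptions: the UTC-7 offset is inferred only from the 20:05 local / 03:05 UTC pair above, the year is supplied by hand because the journal lines do not carry one, and the function name `journal_to_ipa_timestamp` is invented for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is UTC-7, inferred from 20:05 local == 03:05 UTC.
LOCAL_OFFSET = timezone(timedelta(hours=-7))


def journal_to_ipa_timestamp(journal_stamp, year=2018):
    """Convert a journal stamp such as 'Mar 20 20:05:38' to the
    UTC form used in ipaupgrade.log (e.g. '2018-03-21T03:05:38Z')."""
    # Journal stamps have no year, so one is supplied explicitly.
    local = datetime.strptime(f"{year} {journal_stamp}", "%Y %b %d %H:%M:%S")
    utc = local.replace(tzinfo=LOCAL_OFFSET).astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


print(journal_to_ipa_timestamp("Mar 20 20:05:38"))  # 2018-03-21T03:05:38Z
```

With that, the sssd failure at "Mar 20 20:05:38" in the journal maps to the "2018-03-21T03:05:38Z" entries in ipaupgrade.log.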
Proposing as a Beta blocker. We should probably establish whether fixing both of the two bugs, either of them, or one specific one is necessary for the upgrade to succeed.
The bug for the other failure is https://bugzilla.redhat.com/show_bug.cgi?id=1558818 .
Fabiano, could you please investigate this one? From a cursory look, the problem is in the initialization of the monitor code, which treats any failure of sysdb_init_ext() as a problem with the ldb database upgrade.
Remember, we are upgrading from F27 to F28 here.
I'm on it, Alexander.
Adam, thanks for filing the bug.
The issue happens because the upgrade ended up downgrading SSSD from the version used in Fedora 27.
The dnf.log shows: 2018-03-21T02:49:25Z DEBUG ---> Package sssd-common.x86_64 1.16.0-12.fc28 will be a downgrade
The newest version of the SSSD package was already in testing (and here I'd suggest that OpenQA always consider getting packages from updates-testing as well); it has now been pushed to stable, so the problem should be solved.
(In reply to Fabiano Fidêncio from comment #4)
> The issue happens because the upgrade ended up downgrading SSSD from the
> version used in Fedora 27.
> The dnf.log shows: 2018-03-21T02:49:25Z DEBUG ---> Package
> sssd-common.x86_64 1.16.0-12.fc28 will be a downgrade
> The newest version of the SSSD package was already in testing (and here I'd
> suggest that OpenQA always consider getting packages from updates-testing
> as well); it has now been pushed to stable, so the problem should be solved.
OpenQA must test packages that are in the stable set because we need to know that what we're *currently* shipping actually works. If it didn't, we wouldn't have detected this issue and upgrades would be broken for users. So this was all good. I could see an argument for potentially doing the same test *twice* -- once with updates-testing enabled and once with only stable repos -- in order to be able to tell whether updates will introduce breakage or succeed where stable is failing.
Anyway, I think this is a pretty obvious blocker: +1 blocker vote from me. Since you say it's in stable now, I'll move it to ON_QA.
Note that the package (SSSD) has been submitted for stable but is not in stable yet. I tried a new test run in OpenQA and it failed due to a forced downgrade of the sssd packages: https://openqa.stg.fedoraproject.org/tests/260869
Indeed, we need this accepted as a blocker; otherwise we can't get it into stable.
So, +1 for blocker.
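For anyone re-checking a test run for the same forced downgrade, something like the following could pull the relevant entries out of dnf.log. This is a rough sketch, not part of dnf or OpenQA: the regular expression is modelled only on the single "will be a downgrade" line quoted in comment #4, and the name `find_downgrades` is made up for this example.

```python
import re

# Pattern modelled on the one dnf.log line quoted in this bug (assumption:
# other downgrade entries follow the same "Package NAME VERSION" shape).
DOWNGRADE_RE = re.compile(
    r"Package (?P<name>\S+) (?P<version>\S+) will be a downgrade"
)


def find_downgrades(lines):
    """Return (package, version) pairs for every downgrade line found."""
    found = []
    for line in lines:
        match = DOWNGRADE_RE.search(line)
        if match:
            found.append((match.group("name"), match.group("version")))
    return found


sample = [
    "2018-03-21T02:49:25Z DEBUG ---> Package sssd-common.x86_64 "
    "1.16.0-12.fc28 will be a downgrade",
]
print(find_downgrades(sample))
```

Against a real log it could be fed an open file, e.g. `find_downgrades(open("/var/log/dnf.log"))`, assuming the log lives at that path as it does in the attached tarball.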
OK, that's +3 and this is an obvious one, so marking AcceptedBlocker so we can push the update stable.
sssd-1.16.1-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-309397b340
sssd-1.16.1-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.