Bug 2242759 - dnf system-upgrade fails on some RPi4 due to system boot date that pre-dates gpg key
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: distribution
Version: 38
Hardware: aarch64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Aoife Moloney
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedPreviousRelease https://discu...
Duplicates: 2249246
Depends On:
Blocks: BetaBlocker, F40BetaBlocker
Reported: 2023-10-08 19:27 UTC by Brad Smith
Modified: 2024-02-27 22:53 UTC (History)

Fixed In Version: systemd-253.12-1.fc38
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-27 01:26:31 UTC
Type: ---
Embargoed:


Attachments
shows upgrade-failed-journal.log (320.74 KB, text/plain)
2023-10-09 21:38 UTC, Hristo Marinov
shows upgrade-complete-journal.log (1.07 MB, text/plain)
2023-10-09 21:40 UTC, Hristo Marinov
Shows failed upgrade process log (176.62 KB, text/plain)
2023-10-26 09:50 UTC, Hristo Marinov

Description Brad Smith 2023-10-08 19:27:16 UTC
F38 to F39 beta dnf system-upgrade reboot fails on Raspberry Pi 4 (8 gb).

Using dnf system-upgrade log --number=-1 I see an entry like "Signature 10d5 created at Wed Sep 27 16:33:34 2023 invalid: signature is not alive" for each rpm in the upgrade list.
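A quick way to confirm that a given upgrade log shows this failure is to count the matching lines. The following is only a sketch: the function name `count_not_alive` is made up for illustration, and it assumes the log is piped in on standard input.

```shell
# count_not_alive: read an upgrade log on stdin and print how many lines
# report a signature that is "not alive" (that is, not yet valid according
# to the system clock). The function name is hypothetical.
count_not_alive() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so "|| true" keeps the function itself from reporting failure.
    grep -c 'signature is not alive' || true
}
```

It could be used as `sudo dnf system-upgrade log --number=-1 | count_not_alive`; a non-zero count suggests the clock was behind the signature dates during the upgrade boot.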

Raspberry Pi 4 does not have a hardware real-time clock, so when the Pi is booting Fedora the system time is at some (arbitrary?) date and time. During a normal boot chronyd is executed and sets the clock: "chronyd[571]: System clock wrong by 944623.935135 seconds".

If the gpg key used by DNF during the system-upgrade has a validity period that starts later than the clock time at boot, system-upgrade will fail.

A work-around is to use DNF_SYSTEM_UPGRADE_NO_REBOOT=1 as documented for the plugin and not reboot during the system upgrade process.

I have been using dnf system-upgrade on these Pi 4 machines since F32 without a problem. This is the first time for this error so I do not know how prevalent it is.  

This problem occurs on both RPi4 machines I tested.



Reproducible: Always

Steps to Reproduce:
1. RPi4 at Fedora 38 (I use minimal)
2. Follow directions to do a dnf system-upgrade
3. After dnf system-upgrade --reboot is executed, machine reboots to F38 and starts the upgrade. This stops after a couple of minutes.
4. ssh to RPi4 and use dnf system-upgrade log --number=-1 to inspect log
5. Find Signature 10d5 created at Wed Sep 27 16:33:34 2023 invalid: signature is not alive or similar in log file.
Actual Results:  
Machine remains at F38

Expected Results:  
Machine is upgraded to F39 (beta)

There are reports of success with this process on similar Raspberry Pi 4s from others so I have no idea if this is just highly localized. Both machines I tried failed.

Comment 1 Brad Smith 2023-10-08 21:30:07 UTC
I also tried "sudo dnf system-upgrade reboot --nogpgcheck" but still had the same failure. I tried dnf system-upgrade reboot on 4 RPi4 machines. Had to use the workaround on all 4.

Comment 2 Kamil Páral 2023-10-09 08:22:02 UTC
Proposing for a blocker discussion.

Comment 3 Hristo Marinov 2023-10-09 13:06:44 UTC
I can confirm this bug on Raspberry Pi 4 Model B, Rev 1.4, Firmware Version 2023.04, 8GB. During the tests for Test Day: 2023-10-09 Virtualization I ran into the same issue. My workaround was to disable chronyd.service and enable systemd-timesyncd.service. After that, the upgrade was successful.
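The service switch described in this comment might look roughly like the sketch below. It assumes both units are installed on the system; the function name and the `RUN` dry-run hook are illustrative additions, not part of any Fedora tooling.

```shell
# switch_to_timesyncd: replace chronyd with systemd-timesyncd before
# running "dnf system-upgrade reboot".
# RUN is a hypothetical hook: set RUN=echo to print the commands
# instead of executing them; by default they run through sudo.
switch_to_timesyncd() {
    local run="${RUN:-sudo}"
    $run systemctl disable --now chronyd.service
    $run systemctl enable --now systemd-timesyncd.service
}
```

Calling `RUN=echo switch_to_timesyncd` first shows what would be executed without changing anything.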

Comment 4 Geoffrey Marr 2023-10-09 17:51:21 UTC
Discussed during the 2023-10-09 blocker review meeting: [0]

The decision to classify this bug as an "AcceptedBlocker (Final)" was made as it violates the following criterion:

"For each one of the release-blocking package sets, it must be possible to successfully complete a direct upgrade from a fully updated, clean default installation of each of the last two stable Fedora releases with that package set installed.", on affected RPi 4 systems. Note that the RPi4 is considered blocking hardware.

[0] https://meetbot.fedoraproject.org/fedora-blocker-review/2023-10-09/f39-blocker-review.2023-10-09-16.00.txt

Comment 5 Hristo Marinov 2023-10-09 21:38:35 UTC
Created attachment 1993134 [details]
shows upgrade-failed-journal.log

- System boots with wrong date/time stamp
	Sep 17 20:00:01 localhost kernel: Booting Linux on physical CPU 0x0000000000 [0x410fd083]
	...
	
- chronyd has not been started
	Sep 18 00:00:40 localhost python3[692]: Starting system upgrade. This will take a while.
	...
	Sep 18 00:01:00 localhost dnf-3[713]: error: Verifying a signature using certificate E8F23996F23218640CB44CBE75CF5AC418B8E74C (Fedora (39) <fedora-39-primary>):
	Sep 18 00:01:00 localhost dnf-3[713]:   Signature c27b created at Fri Sep 29 17:10:47 2023 invalid: signature is not alive
	Sep 18 00:01:00 localhost dnf-3[713]:       because: Not live until 2023-09-29T17:05:47Z
	...
	Sep 18 00:01:55 localhost dnf-3[692]: Error: GPG check FAILED
	Sep 18 00:01:55 localhost systemd[1]: dnf-system-upgrade.service: Main process exited, code=exited, status=1/FAILURE
	Sep 18 00:01:55 localhost systemd[1]: dnf-system-upgrade.service: Failed with result 'exit-code'.
	Sep 18 00:01:55 localhost systemd[1]: Failed to start dnf-system-upgrade.service - System Upgrade using DNF.
	...
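The check that fails here amounts to comparing the system clock against the signature's "Not live until" time. The sketch below reproduces that comparison with the timestamps from the log; it assumes GNU `date`, the function name is hypothetical, and the journal timestamp is treated as UTC for simplicity (the gap is about eleven days, so the local offset does not change the outcome).

```shell
# signature_is_live NOW NOT_BEFORE
# Both arguments are timestamps in a format GNU date can parse.
# Returns 0 if NOW is at or after NOT_BEFORE, 1 if it is earlier,
# and 2 if either timestamp cannot be parsed.
signature_is_live() {
    local now_s not_before_s
    now_s=$(date -u -d "$1" +%s) || return 2
    not_before_s=$(date -u -d "$2" +%s) || return 2
    [ "$now_s" -ge "$not_before_s" ]
}

# Clock during the failed upgrade vs. the "Not live until" value in the log:
#   signature_is_live 2023-09-18T00:01:00Z 2023-09-29T17:05:47Z
# returns non-zero, which corresponds to "signature is not alive".
```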

Comment 6 Hristo Marinov 2023-10-09 21:40:51 UTC
Created attachment 1993135 [details]
shows upgrade-complete-journal.log

- System boots with wrong date/time stamp
	Sep 17 20:00:01 localhost kernel: Booting Linux on physical CPU 0x0000000000 [0x410fd083]
	...
	
- systemd-timesyncd corrects the clock
	Oct 09 16:43:05 localhost systemd-timesyncd[667]: System clock time unset or jumped backwards, restored from recorded timestamp: Mon 2023-10-09 16:43:05 EDT

- systemd-timesyncd has already adjusted the clock
	Oct 09 16:43:25 localhost python3[689]: Starting system upgrade. This will take a while.
	...
	Oct 09 17:04:39 localhost dnf-3[689]: Complete!
	Oct 09 17:04:39 localhost dnf-3[689]: Cleaning up downloaded data...
	...

Comment 7 Petr Pisar 2023-10-10 07:43:10 UTC
I'm keen to move this to rust-rpm-sequoia because this is where the time check is implemented and enforced.

Either verifying a signature time is important and then the desynchronized clock is a bug in user's system, or it's not important and then Sequoia should ignore the times.

Comment 8 Petr Pisar 2023-10-10 07:57:14 UTC
By the way systems with broken clock can even fail to download the repository metadata because of HTTPS. X.509 mandates checking NotBefore and NotAfter in certificates. I don't think we are going to disable the checks in OpenSSL. I think an appropriate resolution is rejecting this bug report.

Comment 9 Marek Blaha 2023-10-10 08:04:29 UTC
(In reply to Brad Smith from comment #1)
> I also tried "sudo dnf system-upgrade reboot --nogpgcheck" but still had the
> same failure. I tried dnf system-upgrade reboot on 4 RPi4 machines. Had to
> use the workaround on all 4.

Using the switch during `system-upgrade reboot` is too late. Did you try to use it with `sudo dnf system-upgrade download --nogpgcheck`? The value is then stored in the state file and should be used during upgrade.

Comment 10 Panu Matilainen 2023-10-10 08:10:50 UTC
Yep, we're not going to disable certificate time checks for everybody everywhere just because Rpi doesn't have a clock.

> My workaround was to disable chronyd.service and enable systemd-timesyncd.service. After that, the upgrade was successful.

That seems like a meaningful resolution to this. I don't know how the system-upgrade environment is defined/prepared but I guess that's where this action needs to happen.

Comment 11 Jaroslav Rohel 2023-10-10 09:46:42 UTC
The system time should not jump backwards after a reboot, and certainly not by years. This can cause many more problems than just OpenPGP keys for packages. Files created at the wrong time may have the wrong timestamp. I have already experienced systems that did not boot because the control system reported corruption - files with a senselessly old time stamp appeared in the system.

The correct solution is to ensure that time does not jump backwards after a reboot. And that's what systemd-timesyncd service does according to the mailing list: https://lists.freedesktop.org/archives/systemd-devel/2014-May/019537.html
>The daemon saves the current clock to disk every time a new NTP sync has been acquired, and uses this to possibly correct the system clock early at bootup, in order to accommodate for systems that lack an RTC such as the Raspberry Pi and embedded devices, and make sure that time monotonically progresses on these systems, even if it is not always correct.

As a last resort, if we are not able to ensure a monotonic passage of time, I can see introducing a way to partially turn off the time check for OpenPGP. Something like "--accept-pgp-signature-from-future". I find this better than turning off signature checking altogether ("--nogpgcheck").

Comment 12 Jaroslav Mracek 2023-10-10 10:32:41 UTC
We would like to close the bug with a workaround - use --nogpgcheck for `dnf system-upgrade download` or use DNF_SYSTEM_UPGRADE_NO_REBOOT=1 environment variable (see https://dnf-plugins-core.readthedocs.io/en/latest/system-upgrade.html).

We cannot disable the signature verification by default, but the user can. May I ask for a recommendation on how to proceed with this bug and the blocker workflow?

Comment 13 Panu Matilainen 2023-10-10 10:38:50 UTC
--nogpgcheck is a horrible workaround. 

What decides which services run in the system-upgrade environment? Just ensure timesync runs there, that sounds like it would actually *solve* this issue rather than put users at risk.

Comment 14 Panu Matilainen 2023-10-10 10:58:34 UTC
Hmm okay I guess you need timesync running *before* reboot to system-upgrade, but still that's a worlds better workaround than --nogpgcheck.

Comment 15 Marek Blaha 2023-10-10 11:18:26 UTC
Yep, you need to replace chronyd (installed by default) by systemd-timesyncd, as mentioned in comment#3 before rebooting. The reason why it works is that systemd-timesyncd tries to make sure that time progresses monotonically even on systems without RTC and without network connectivity (as described in comment#11).
And I agree that this is a better solution than --nogpgcheck. I only wanted to clarify how --nogpgcheck works with system-upgrade.

Comment 16 Jaroslav Rohel 2023-10-10 12:12:52 UTC
As I already wrote in comment#11, the system time should not jump backwards after a reboot.
By the way. A long time ago (years ago) I built a device with a Raspberry Pi. I used the Debian distribution and the "fake-hwclock" package (https://manpages.debian.org/jessie/fake-hwclock/fake-hwclock.8.en.html). Later I added RTC hardware to the device.
If I didn't use "fake-hwclock" or hardware RTC and no NTP server was available, the device ran with the time shifted backwards. And that had other consequences. And it wasn't about package signatures.
So years later, it's rather a surprise to me that instead of fixing the real problem - the time jump, we have a blocker on dnf-plugins-core.

Comment 17 Kamil Páral 2023-10-10 13:05:10 UTC
(In reply to Jaroslav Rohel from comment #16)
> it's rather a surprise to me that instead of fixing the real problem - the time jump, we have a blocker on dnf-plugins-core.

The bug has to be assigned against something. The purpose of this discussion is to figure how to best fix it. It definitely doesn't mean that it has to be fixed in dnf. Feel free to change the component to a better fitting one, thanks.

(In reply to Jaroslav Mracek from comment #12)
> We would like to to close the bug with workaround - use --nogpgcheck for
> `dnf system-upgrade download` or use DNF_SYSTEM_UPGRADE_NO_REBOOT=1
> environment variable (see
> https://dnf-plugins-core.readthedocs.io/en/latest/system-upgrade.html).
> 
> We cannot disable the signature verification by default, but user can. May I
> ask you a recommendation how to proceed with this bug and blocker workflow?

To answer your question, hopefully the ARM SIG people would comment on whether this approach is fine with them, or whether they'd like to do something else/better. Let's tag @pbrobinson .
From what I read, it seems that fixing the time daemon would be a better solution than disabling security checks.

Comment 18 Jaroslav Mracek 2023-10-10 13:15:40 UTC
I think that there is consensus that the issue is not in DNF or RPM and we cannot provide any patch in our component.

I agree that `--nogpgcheck` or DNF_SYSTEM_UPGRADE_NO_REBOOT=1 are not nice workarounds, but we are in blocker period where workarounds might help. I did not explicitly mention that there is a long list of problems with those workarounds, mainly security and stability issues.

I think that the solution in comment 3 makes sense (it is a real solution, not a workaround), but we (Software management team) are unable to provide that kind of solution, but distribution can modify installed content and default set of enabled services. The distribution component is a special component and I am not 100 % sure whether the solution from Comment 3 will be applicable in general, but we will see.

I am changing the component to distribution, to clean the path for a proper solution, because we can provide only incorrect workarounds.

Comment 19 Brad Smith 2023-10-10 13:51:32 UTC
(In reply to Marek Blaha from comment #9)
> (In reply to Brad Smith from comment #1)
> > I also tried "sudo dnf system-upgrade reboot --nogpgcheck" but still had the
> > same failure. I tried dnf system-upgrade reboot on 4 RPi4 machines. Had to
> > use the workaround on all 4.
> 
> Using the switch during `system-upgrade reboot` is too late. Did you try to
> use it with `sudo dnf system-upgrade download --nogpgcheck`? The value is
> then stored in the state file and should be used during upgrade.


I did not try this variant. It might be useful to clarify in the online documentation for the plugin or the Fedora system-upgrade process at which step to apply any additional DNF related options so that they take effect where needed.

Comment 20 Brad Smith 2023-10-10 14:02:12 UTC
(In reply to Jaroslav Mracek from comment #18)
> 
> 
> I am changing the component to distribution, to clean the path for a proper
> solution, because we can provide only incorrect workarounds.

Super. I chose the original dnf related component because, as others have noted, a component is needed and it was not clear what would have been more correct. Perhaps the Raspberry or Arm related web content for Fedora can be amended to include options for when an RTC is not present? I will take a look and offer some possible amendments.

Comment 21 Peter Robinson 2023-10-10 14:58:47 UTC
(In reply to Petr Pisar from comment #8)
> By the way systems with broken clock can even fail to download the
> repository metadata because of HTTPS. X.509 mandates checking NotBefore and
> NotAfter in certificates. I don't think we are going   to disable the checks
> in OpenSSL. I think an appropriate resolution is rejecting this bug report.

It generally won't fail then because the network needs to be up at that point in time and the timesync will have updated the time.

Comment 22 Peter Robinson 2023-10-10 15:34:48 UTC
(In reply to Brad Smith from comment #1)
> I also tried "sudo dnf system-upgrade reboot --nogpgcheck" but still had the
> same failure. I tried dnf system-upgrade reboot on 4 RPi4 machines. Had to
> use the workaround on all 4.

Can we run the system-upgrade service after the time services (chronyd or others) starts and syncs the time?

Comment 23 Jeremy Linton 2023-10-10 18:39:16 UTC
I'm sorta the opinion that, if not now, then in the very near future, we should require RTC's for actual fedora-supported machines. And this is just one of the many cases where anything having to do with modern crypto/cert chains is going to fail if the system time is too far off, and things like chrony/ntp themselves can't (or rather won't/shouldn't) move the clock by years.

Comment 24 Adam Williamson 2023-10-10 19:29:47 UTC
"Can we run the system-upgrade service after the time services (chronyd or others) starts and syncs the time?"

I haven't checked, but I suspect that chronyd never actually starts on an upgrade boot.

I don't think the significant difference between systemd-timesyncd and chronyd here is *how* they work, but just whether they run during the system upgrade boot at all. systemd-timesyncd.service has `WantedBy=sysinit.target`. chronyd.service has `WantedBy=multi-user.target`.

When we do a system upgrade, we boot to `system-update.target`. That target has:

After=sysinit.target system-update-pre.target

so to reach system-update.target we have to run through sysinit.target and system-update-pre.target, but we do *not* have to get to multi-user.target.

So, *possibly* we could fix this by configuring system-update-pre.target to have `Wants=chronyd.service`?
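As a rough illustration of that idea, a drop-in for the target could be written as follows. This is an untested sketch: the function name and the file name `50-chronyd.conf` are made up, and a `Wants=` entry only pulls the service in. It does not by itself guarantee that the clock is synchronized before the upgrade starts, and chronyd still needs network access to correct the time.

```shell
# write_chronyd_dropin [DIR]
# Writes a drop-in that makes system-update-pre.target want chronyd.service.
# DIR defaults to the drop-in directory for that target under /etc; pass
# another directory to inspect the result without touching the system.
write_chronyd_dropin() {
    local dir="${1:-/etc/systemd/system/system-update-pre.target.d}"
    mkdir -p "$dir" || return 1
    # The file name is a hypothetical choice for this sketch.
    cat > "$dir/50-chronyd.conf" <<'EOF'
[Unit]
Wants=chronyd.service
EOF
}
```

After writing the file into the real location (as root), `systemctl daemon-reload` would be needed for systemd to pick it up.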

Comment 25 Jaroslav Rohel 2023-10-10 20:12:59 UTC
> So, *possibly* we could fix this by configuring system-update-pre.target to have `Wants=chronyd.service`?

I don't think this is the solution. The packages were downloaded before the reboot to ensure that the upgrade would work offline. Chrony uses NTP -> needs a working network. Which can be a problem soon after the reboot. E.g. WiFi that needs a password.

At least we have to make sure time doesn't jump back. As I mentioned in comment #11, the systemd-timesyncd service should be able to do this. Or an application specifically designed for this, e.g. "fake-hwclock".

Comment 26 Adam Williamson 2023-10-10 20:24:00 UTC
well, the thing with that is, either of those are fairly major changes that we shouldn't introduce to F39 at this point, let alone the already-released F38...

Comment 27 Jaroslav Rohel 2023-10-11 06:28:11 UTC
> well, the thing with that is, either of those are fairly major changes that we shouldn't introduce to F39 at this point, let alone the already-released F38...

Now we know we have a bad time after the reboot. The time is moved far back (for example, 1970). The correct solution is to solve this problem. But if it's a major change...

OK. Let's try it from the other side. Let's try to solve the problem as originally reported. That is, we need to make sure that the "system-upgrade" after reboot happens even with a bad time. It's terrible, there will be nonsensical times in the log, file timestamps, ... but we'll move on.

I got an idea. Installing unverified packages ("--nogpgcheck") is unacceptable. But we don't have to disable signature checking. Signature checking can take place before reboot. At a time when we have the right time. After the reboot, already verified packages will be installed. I looked in the source code and it looks like "system-upgrade" performs signature checking including the test transaction already before reboot. So the signature check after reboot is done redundantly. So we can disable signature checking after reboot and still install the verified packages.
Turning off signature checking after reboot, or introducing the option to turn it off after reboot (new argument to the system-upgrade command) is the solution in my opinion.

Repeat. From my point of view, this is not a clean solution. But it solves the original system-upgrade problem.

Comment 28 Jaroslav Rohel 2023-10-11 08:32:53 UTC
I did the test.
I moved back time. And tried to upgrade the package. "--nogpgcheck" didn't help. There were other errors - the signature problem was caught elsewhere.
I added "--setopt=tsflags=nocrypto". That didn't help either. Transaction test failed:
Error: Transaction test error:
  package iftop-1.0-0.31.pre4.fc39.x86_64 does not verify: Header V4 RSA/SHA256 Signature, key ID a15b79cc: BAD

So, on my test PC, the signature check cannot be turned off. DNF error or do I have an RPM policy that doesn't allow it? Anyway, it might complicate the solution suggested in comment #27.

Comment 29 Jaroslav Rohel 2023-10-11 10:47:10 UTC
Sorry for so many messages. So probably the last message from me about this bug.

My suggestion in comment #27 is an ugly solution to the problem that is caused by an unset time after a system reboot. Ugly because it only solves one consequence and not the problem. And it only delays other problems that may arise. Anything else (even a package scriptlet) can also depend on time.

I would personally address the cause of the problem. Setting the time after restart. I would save the time just before power off/reboot and set it back after power on/reboot. As I already mentioned "fake-hwclock". This is a "simple" solution that does not need a network. In addition, it is commonly used in several other distributions.
I also searched the internet and found: https://lists.centos.org/pipermail/arm-dev/2018-August/023617.html
So they already ran into this problem years ago with CentOS. And the solution is the same as I recommend.

Comment 30 Thomas 2023-10-11 19:04:12 UTC
There are some needinfo flags still set.  As I understand it, the problem is well understood and two workarounds are proposed.  Is there any further information required?

My personal opinion is: this affects the update from 38 to 39, only (i. e., not a fresh install).  This means, there is a working instance before the update is started.  A mere information/warning to the user, e. g. "pls install fake-hwclock before continuing", in case systemd-timesyncd is not started and the platform is known to not have a working real time clock would also solve the problem (kind of), w/o having the huge impact any change of the update process would have.

Comment 31 Jaroslav Rohel 2023-10-12 07:16:01 UTC
>A mere information/warning to the user, e. g. "pls install fake-hwclock before continuing", in case systemd-timesyncd in not started and the platform is known to not have a working real time clock would also solve the problem

Great solution. But wait. I found "fake-hwclock" with Fedora spec on gitlab https://gitlab.com/stevenfalco/fake-hwclock. I found a similar service "soft-hwclock" for CentOS https://github.com/kristjanvalur/soft-hwclock. And it is a very simple solution. "fake-hwclock", "soft-hwclock" and similar are implemented as simple scripts using the "date" command.

But do we have such a package in the Fedora repository? It would not be good to write the information "please install package xy" to the user and then not have it in the repository.

Comment 32 Kamil Páral 2023-10-12 10:31:03 UTC
Common Issues entry:
https://discussion.fedoraproject.org/t/common-issues/92403

Comment 33 Brad Smith 2023-10-12 15:18:43 UTC
I am puzzled about why this issue occurs with the F38 to F39 upgrade but, to my best recollection, did not occur with, for example F36 to F37 using the dnf system-upgrade process. I have been using headless Fedora since F33 on these RPI4s. Every 2nd or 3rd Fedora release upgrade I reimage instead of using DNF as a way to eliminate cruft from stuff I install and try but still dnf system-upgrade has worked in the past. I wonder what changed?

Comment 34 Petr Pisar 2023-10-12 15:44:36 UTC
(In reply to Brad Smith from comment #33)
> I am puzzled about why this issue occurs with the F39 to F39 upgrade[...] I wonder what changed?

See comment #7. libdnf used to check signatures with GnuPG. Now it uses Sequoia for that. The former handles discrepancies as a warning. The latter as an error.

Comment 35 Petr Pisar 2023-10-12 15:45:25 UTC
Time discrepancies.

Comment 36 Adam Williamson 2023-10-12 15:51:16 UTC
As kparal points out, this should probably be a 'previous release' blocker - that is, it's a blocker for F39's release, but the fix will likely be applied to F37 and F38. See https://fedoraproject.org/wiki/QA:SOP_blocker_bug_process#Normal,_0-Day_and_Previous_Release_blockers

Comment 37 Brad Smith 2023-10-14 19:49:37 UTC
Initial attempt at summary so that Adam gets a recommendation from stakeholders (Peter Robinson and ?) on any fix and what would be needed to clear F39 blocker (common issue write up?).

1. Problem understood (https://bugzilla.redhat.com/show_bug.cgi?id=2242759#c34, https://bugzilla.redhat.com/show_bug.cgi?id=2242759#c7)
2. Problem exists for dnf system-upgrade reboot users going from F38 (F37?) to F39.
3. Two workarounds available using existing capabilities of F38.
3-a. systemd-timesyncd work around. More effort for user. Documented as proposed common issue 
3-b. DNF_SYSTEM_UPGRADE_NO_REBOOT=1. Less effort for user. Not documented yet.
4. Based on the comprehensive analysis and comments from Jaroslav, I assume any fix involves changes to the system upgrade dnf plugin. Changes could be new command line option, or setting and reading a time stamp so that a fall back time is available. Probably not realistic for F39?
5. I assume that given Petr's comment (https://bugzilla.redhat.com/show_bug.cgi?id=2242759#c34) this issue will occur with F40 unless changes are made.


I can amend the common issue write up to include the DNF_SYSTEM_UPGRADE_NO_REBOOT=1 workaround.

Comment 38 Adam Williamson 2023-10-14 21:36:25 UTC
I would not recommend `DNF_SYSTEM_UPGRADE_NO_REBOOT=1` and don't think we should write it up as a recommended workaround. Upgrading 'live' is substantially less safe; that's why all our recommended upgrade methods use the dedicated boot approach.

I don't have the same read on 4). I think Jaroslav's feedback is that we *can't* sensibly fix this within dnf. The only things that seem like possible fixes to me are adjustments that attempt to ensure a sensible time is set on boot during the upgrade - so far, that's switching to systemd-timesyncd (we would never normally make a change like that as an update to a stable release, and it would have to go through the whole Change process to be made for a future release), or somehow adding something like fake-hwclock or soft-hwclock to existing installs (only Pi installs, somehow? All ARM installs? All installs?), which again is a much more disruptive change than we'd normally make as an upgrade, and would require packaging one of them first.

Yes, as things stand this will affect upgrades from F39 and F40 as well, since we have not actually changed anything to address this in F39 or F40.

Comment 39 Brad Smith 2023-10-14 22:17:27 UTC
With respect to #4 I was thinking that the plug-in would/could use something analogous to fake-hwclock under the hood so to speak. When the user executes 'sudo dnf system-upgrade reboot' system time and date would be stored somewhere securely and after reboot but before doing the gpg check, that time stamp would be retrieved and checked against system clock. If system clock is newer, erase the time stamp, and proceed. If the system clock is older then log that fact, use the time stamp to set system time and proceed with upgrade. I believe this is what Jaroslav recommends in comment 29 (https://bugzilla.redhat.com/show_bug.cgi?id=2242759#c29). Modifications to the plug-in would seem to be the place to make these changes.  

I suppose that systemd-timesyncd could even be used to accomplish this in a transparent way, assuming that systemd is available whenever dnf system-upgrade plug-in is used (not sure that is valid).
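The save-and-compare flow from the first paragraph of this comment can be sketched in a few lines of shell. Everything here is hypothetical: the stamp file path, the function names, and the idea that the plug-in would call them are assumptions for illustration only, not existing plug-in behaviour.

```shell
# Hypothetical location for the saved time stamp (seconds since the epoch).
STAMP_FILE="${STAMP_FILE:-/var/lib/upgrade-clock-stamp}"

# Before reboot: record the current time.
save_clock_stamp() {
    date -u +%s > "$STAMP_FILE"
}

# After reboot, before the gpg check: print the time that should be in effect.
# Argument: NOW in epoch seconds (defaults to the current system clock).
# If the clock is at or past the stamp, the stamp is erased and NOW is printed.
# If the clock is behind the stamp, that fact is logged to stderr and the
# stamp value is printed so the caller can set the clock from it.
choose_clock() {
    local now="${1:-$(date -u +%s)}" saved
    [ -r "$STAMP_FILE" ] || { echo "$now"; return 0; }
    saved=$(cat "$STAMP_FILE")
    if [ "$now" -ge "$saved" ]; then
        rm -f "$STAMP_FILE"
        echo "$now"
    else
        echo "system clock is behind saved stamp, using $saved" >&2
        echo "$saved"
    fi
}
```

With GNU `date`, the chosen value could then be applied as root with `date -u -s "@$(choose_clock)"`.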

Comment 40 Jaroslav Mracek 2023-10-19 07:06:36 UTC
The problem with an incorrect time stamp is not only related to system-upgrade. There are more ways the issue might be reproduced with other dnf commands (boot without network and try to install anything from local repositories), and offline upgrades are also provided by PackageKit.

Comment 41 Zbigniew Jędrzejewski-Szmek 2023-10-25 22:00:29 UTC
Systemd has a built-in mechanism to set the clock during early boot.
It'll use either a built-in TIME_EPOCH (which is usually the build time of systemd itself),
or the mtime of /usr/lib/clock-epoch.

In comment #5 above we have:
	Sep 18 00:00:40 localhost python3[692]: Starting system upgrade. This will take a while.
This is because systemd-253.10-1.fc38 was built on the 18th.
(Or more precisely, the last commit is from the 18th, hence the last autogenerated
changelog entry is from the 18th, so $SOURCE_DATE_EPOCH was set to the 18th, and systemd's
build system honours $SOURCE_DATE_EPOCH. This actually explains why the time seems to
start at midnight: the timestamp in the %changelog has date granularity.)

So I see two possible solutions:
1. tell users to 'sudo touch /usr/lib/clock-epoch'
2. rebuild systemd in F38.

We can probably do both. I'll reassign this to systemd.
Somebody please add option 1. to the Common Bugs entry. I think we shouldn't
tell users to enable timesyncd just because of this.
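To see whether option 1 would help for a particular signature, one can compare the mtime of the epoch file with the signature's "Not live until" time. The sketch below assumes GNU `stat` and `date`; the function names are made up for illustration.

```shell
# clock_epoch_mtime [FILE]
# Prints the modification time of FILE (default /usr/lib/clock-epoch) in
# seconds since the epoch. Prints nothing and returns non-zero if the file
# does not exist.
clock_epoch_mtime() {
    local file="${1:-/usr/lib/clock-epoch}"
    [ -e "$file" ] && stat -c %Y "$file"
}

# epoch_covers FILE TIMESTAMP
# Returns 0 if FILE's mtime is at or after TIMESTAMP (any format GNU date
# can parse), 1 if it is earlier, 2 if the file or timestamp is unusable.
epoch_covers() {
    local mtime ts
    mtime=$(clock_epoch_mtime "$1") || return 2
    ts=$(date -u -d "$2" +%s) || return 2
    [ "$mtime" -ge "$ts" ]
}
```

For example, after `sudo touch /usr/lib/clock-epoch`, running `epoch_covers /usr/lib/clock-epoch 2023-09-29T17:05:47Z` should succeed for the signature time quoted in comment #5, since the file's mtime is then the moment of the touch.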

Comment 42 Fedora Update System 2023-10-25 22:42:32 UTC
FEDORA-2023-ca928a30eb has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2023-ca928a30eb

Comment 43 Peter Robinson 2023-10-26 00:01:46 UTC
> 1. tell users to 'sudo touch /usr/lib/clock-epoch'

Why is that /usr/lib? In places like ostree that could be read only, even for root, maybe it should be somewhere which should be considered static data like in /var or /etc

Comment 44 Fedora Update System 2023-10-26 02:48:00 UTC
FEDORA-2023-ca928a30eb has been pushed to the Fedora 38 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-ca928a30eb`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-ca928a30eb

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 45 Dusty Mabe 2023-10-26 03:36:16 UTC
(In reply to Peter Robinson from comment #43)
> > 1. tell users to 'sudo touch /usr/lib/clock-epoch'
> 
> Why is that /usr/lib? In places like ostree that could be read only, even
> for root, maybe it should be somewhere which should be considered static
> data like in /var or /etc

FWIW systemd-timesyncd (if enabled) uses /var/lib/systemd/timesync/clock for this purpose. I enable systemd-timesyncd (and disable chrony) on my Pi4 systems specifically for this behavior.

Comment 46 Zbigniew Jędrzejewski-Szmek 2023-10-26 07:03:08 UTC
(In reply to Peter Robinson from comment #43)
> > 1. tell users to 'sudo touch /usr/lib/clock-epoch'
> 
> Why is that /usr/lib? In places like ostree that could be read only, even
> for root, maybe it should be somewhere which should be considered static
> data like in /var or /etc

It is static data. This file is part of the package payload and normally it wouldn't
be touched by the user. It is in /usr because /var might not be available when
systemd is started.

Comment 47 Hristo Marinov 2023-10-26 09:50:11 UTC
Created attachment 1995550 [details]
Shows failed upgrade process log

On Fedora-Minimal-38-1.6.aarch64.raw.xz after installing the upgrade with command:

sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-ca928a30eb

and trying to upgrade to 39, the upgrade process

Failed to start dnf-system-upgrade.service - System Upgrade using DNF.

The complete log of the failed upgrade process can be seen in the attached file.

Comment 48 Peter Robinson 2023-10-26 10:13:29 UTC
(In reply to Zbigniew Jędrzejewski-Szmek from comment #46)
> (In reply to Peter Robinson from comment #43)
> > > 1. tell users to 'sudo touch /usr/lib/clock-epoch'
> > 
> > Why is that /usr/lib? In places like ostree that could be read only, even
> > for root, maybe it should be somewhere which should be considered static
> > data like in /var or /etc
> 
> It is static data. This file is part of the package payload and normally it
> wouldn't
> be touched by the user. It is in /usr because /var might not be available
> when
> systemd is started.

Right, but I don't think you can modify the mtime of a file in a RO filesystem. When I do it on a Fedora IoT system:

$ sudo touch /usr/lib/firmware/
[sudo] password for peter: 
touch: setting times of '/usr/lib/firmware/': Read-only file system

Comment 49 Petr Pisar 2023-10-26 10:21:56 UTC
(In reply to Hristo Marinov from comment #47)
> Created attachment 1995550 [details]
> Shows failed upgrade process log
> 
> On Fedora-Minimal-38-1.6.aarch64.raw.xz after installing the upgrade with
> command:
> 
> sudo dnf upgrade --enablerepo=updates-testing --refresh
> --advisory=FEDORA-2023-ca928a30eb
> 
> and trying to upgrade to 39, the upgrade process
> 
> Failed to start dnf-system-upgrade.service - System Upgrade using DNF.
> 
> The complete log of the failed upgrade process can be seen in the attached
> file.

Oct 24 20:01:16 fedora dnf-3[477]: Running transaction check
Oct 24 20:01:18 fedora dnf-3[477]: error: Verifying a signature using certificate 6A51BBABBA3D5467B6171221809A8D7CEB10B464 (Fedora (38) <fedora-38-primary>):
Oct 24 20:01:18 fedora dnf-3[477]:   Signature 73c5 created at Wed Oct 25 22:43:06 2023 invalid: signature is not alive
Oct 24 20:01:18 fedora dnf-3[477]:       because: Not live until 2023-10-25T22:38:06Z
Oct 24 20:01:18 fedora dnf-3[477]: error: rpmdbNextIterator: skipping h#     704
Oct 24 20:01:18 fedora dnf-3[477]: Header V4 RSA/SHA256 Signature, key ID eb10b464: BAD
Oct 24 20:01:18 fedora dnf-3[477]: Header SHA256 digest: OK
Oct 24 20:01:18 fedora dnf-3[477]: Header SHA1 digest: OK

This looks like a signature of a package which was signed after your systemd was built. There always will be packages updated after systemd.
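
To make the failure in the log above concrete: the check fails because the clock at upgrade time is earlier than the signature's "Not live until" time. Below is a minimal Python sketch of that comparison using the two timestamps from the log. The year (2023) and the UTC timezone of the journal timestamp are assumptions, since the log line carries neither, and is_alive is a hypothetical helper, not an rpm API.

```python
from datetime import datetime, timezone


def is_alive(now: datetime, not_before: datetime) -> bool:
    """Return True if the signature's validity window has started at `now`."""
    return now >= not_before


# Journal timestamp "Oct 24 20:01:18"; the year and UTC are assumed.
system_clock = datetime(2023, 10, 24, 20, 1, 18, tzinfo=timezone.utc)
# "Not live until 2023-10-25T22:38:06Z", taken from the log.
not_live_until = datetime(2023, 10, 25, 22, 38, 6, tzinfo=timezone.utc)

alive = is_alive(system_clock, not_live_until)
clock_behind = not_live_until - system_clock
print(alive, clock_behind)
```

Under these assumptions the system clock is a little over a day behind the start of the signature's validity, so the signature is rejected regardless of which systemd build set the initial clock.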

Comment 50 Brad Smith 2023-10-26 17:02:33 UTC
I see the same 3 signature errors as Hristo Marinov. So if, as Petr notes, there will always be packages signed after systemd was built, then something else needs to be tried.

I then tried option 1 from Zbigniew.

[bgsmith@pi03 ~]$ ll /usr/lib/clock*
ls: cannot access '/usr/lib/clock*': No such file or directory
[bgsmith@pi03 ~]$ sudo touch /usr/lib/clock-epoch
[bgsmith@pi03 ~]$ ll /usr/lib/clock*
-rw-r--r--. 1 root root 0 Oct 26 09:45 /usr/lib/clock-epoch
[bgsmith@pi03 ~]$ sudo dnf system-upgrade reboot

After the RPi restarts:

[bgsmith@pi03 ~]$ cat /etc/redhat-release 
Fedora release 39 (Thirty Nine)

I have reached out to Kamil Páral so that the Common Issue on this topic can be amended with this simple and straightforward fix.
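
A rough sketch of why touching the file helps, assuming that systemd treats the mtime of /usr/lib/clock-epoch as a lower bound for the system clock at early boot. The helper name clock_floor is made up for illustration, the demo uses a temporary file rather than the real path, and whatever systemd does when the file is absent is not modeled here.

```python
import os
import tempfile


def clock_floor(current_time: float, epoch_file: str) -> float:
    """Return the time the clock would be advanced to: never earlier than
    the mtime of `epoch_file`. If the file is missing, leave the time alone."""
    try:
        floor = os.stat(epoch_file).st_mtime
    except FileNotFoundError:
        return current_time
    return max(current_time, floor)


# Demo with a temporary stand-in for /usr/lib/clock-epoch.
with tempfile.TemporaryDirectory() as tmp:
    epoch_file = os.path.join(tmp, "clock-epoch")
    open(epoch_file, "w").close()          # equivalent of `touch`
    touched = 1_698_313_500.0              # arbitrary example timestamp
    os.utime(epoch_file, (touched, touched))

    # Illustrative offset, borrowed from the chronyd message in the description.
    stale_boot_clock = touched - 944_623
    advanced = clock_floor(stale_boot_clock, epoch_file)
    unchanged = clock_floor(touched + 60, epoch_file)
    missing = clock_floor(stale_boot_clock, os.path.join(tmp, "absent"))
```

In this model a stale boot clock is pulled forward to the moment of the touch, which is later than the signing time of any package already downloaded for the upgrade. A clock that is already ahead of the file is left alone.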

Comment 51 Fedora Update System 2023-10-27 01:26:31 UTC
FEDORA-2023-ca928a30eb has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 52 Kamil Páral 2023-10-27 08:21:40 UTC
The update hasn't fixed it, as reported above, reopening.

Comment 53 Zbigniew Jędrzejewski-Szmek 2023-10-27 12:41:09 UTC
Yeah, "[t]here always will be packages updated after systemd", so this is not a solution.

Thinking about this some more, I think we should make two changes:

1. Stop checking signature time on packages. It simply is not useful. There are two possible cases:

a) The signature was made by the distribution. The package is valid "always": there is no intended initial validity of the package, and the package never stops being a valid artifact produced by the distro. We currently set the initial validity to the moment when the package was built, but that's an implementation accident. If we see an otherwise-valid signature that is not yet valid, we can deduce that there is clock disagreement between us and the signing server, nothing more. Also, on the other end, the package may become obsoleted by a later update, but it always remains a legitimate artifact produced by the distro.

b) The signature is invalid / made by an unauthorized entity / fake. If the fake signer is able to make a signature that we treat as valid, they can also set the time as they see fit. So checking the time doesn't give us any additional protection.

When we stop checking signature time, the whole system becomes more robust.

2. Make sure that the time before the upgrade is set properly. Even if we do 1., we want to have proper time so that logs and file mtimes are correct. If the user then looks at 'dnf history' later on, we don't want to have confusing timestamps.

We could implement 2. by using systemd-timesyncd, or by some other mechanism… I think that's open for discussion. I'll reassign this back to 'distribution'.
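
The argument in point 1 can be illustrated with a toy model. This is only a sketch: Signature, accept, and the boolean crypto_valid field are hypothetical names that stand in for real OpenPGP verification, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Signature:
    crypto_valid: bool   # would the key check pass? (stand-in for real verification)
    created: int         # creation time claimed by the signer, in seconds


def accept(sig: Signature, now: int, check_time: bool) -> bool:
    """Toy acceptance policy, with or without the creation-time check."""
    if not sig.crypto_valid:
        return False
    if check_time and now < sig.created:
        return False     # corresponds to "signature is not alive"
    return True


now = 1_000          # local clock, lagging behind real time
real_time = 2_000

# a) Genuine distro signature, made at the real time.
genuine = Signature(crypto_valid=True, created=real_time)
# b) A forger who can produce an accepted signature also picks the timestamp.
forged = Signature(crypto_valid=True, created=0)
# A signature that fails the key check is rejected under either policy.
bad = Signature(crypto_valid=False, created=0)

# Each value is (accepted with time check, accepted without time check).
results = {
    name: (accept(sig, now, check_time=True), accept(sig, now, check_time=False))
    for name, sig in [("genuine", genuine), ("forged", forged), ("bad", bad)]
}
```

In this model the time check changes the outcome only for the genuine package on a machine with a lagging clock, which is the failure reported in this bug.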

Comment 54 Kevin Kofler 2023-10-28 01:32:40 UTC
The RPM signature verification should just never depend on the wallclock time, period.

Comment 55 Villy Kruse 2023-10-28 07:56:09 UTC
Is the "-s" option in chronyd working as described in the man page?  It is supposed to set the initial time from the driftfile if no RTC is available.

That would be similar to what systemd-timesyncd does.

 -s

    This option will set the system clock from the computer's real-time clock (RTC)
    or to the last modification time of the file specified by the driftfile directive.

Comment 56 Brad Smith 2023-10-28 15:34:43 UTC
I think the drawback to chrony is that it is not started during the actual dnf system upgrade step. The -s option seems like it would be very useful for devices like the RPi that are also on a network with intermittent connections to upstream time sources. 

(In reply to Villy Kruse from comment #55)
> Is the "-s" option in chronyd working as described in the man page?  It is
> supposed to set the initial time from the driftfile if no RTC is available.
> 
> That would be similar to what systemd-timesyncd does.
> 
>  -s
> 
>     This option will set the system clock from the computer's real-time
> clock (RTC)
>     or to the last modification time of the file specified by the driftfile
> directive.

Comment 57 Adam Williamson 2023-11-02 19:01:59 UTC
Discussed at 2023-11-02 F39 go/no-go meeting, acting as a blocker review meeting: https://meetbot-raw.fedoraproject.org/fedora-meeting/2023-11-02/f39-final-go_no_go-meeting.2023-11-02-17.04.log.txt . We voted to waive this one to Fedora 40 Beta, as it proves to be not at all simple to fix and we don't want to rush in some kind of bodge for the release window. We think it's more sensible to document the various possible workarounds, and keep trying to come up with a practical and safe fix. As soon as we can think of something, it can be applied to all current stable releases and hopefully fix upgrades to all releases, including 39 as well as 40.

Comment 58 amatej 2023-11-13 06:44:08 UTC
*** Bug 2249246 has been marked as a duplicate of this bug. ***

Comment 59 Adam Williamson 2024-02-27 22:53:24 UTC
So, um. We're now in the F40 Beta freeze, and it looks like we all kinda took our eye off this ball. Did anyone actually come up with a better solution?

