Bug 2213636

Summary: systemd 253.5 introduces regression in coreos CI ntp tests
Product: [Fedora] Fedora Reporter: Aashish Radhakrishnan <aaradhak>
Component: systemdAssignee: systemd-maint
Status: NEW --- QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: high    
Version: 39CC: dustymabe, fedoraproject, filbranden, lnykryn, msekleta, ryncsn, systemd-maint, yuwatana, zbyszek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
URL: https://github.com/coreos/fedora-coreos-tracker/issues/1508
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Aashish Radhakrishnan 2023-06-08 19:54:16 UTC
In Fedora CoreOS CI we noticed that our `ext.config.ntp.chrony.dhcp-propagation` and `ext.config.ntp.timesyncd.dhcp-propagation`
tests started timing out specifically with the transition of systemd-253.4-1.fc39 -> 253.5-1.fc39. This also affects Fedora 38.

This issue is tracked in the Fedora CoreOS issue tracker at https://github.com/coreos/fedora-coreos-tracker/issues/1508

I have attached the logs from a good run (with systemd-253.4-1.fc39) and a bad run (with systemd-253.5-1.fc39). I couldn't see
anything that stood out in the logs as a smoking gun for why the systemd update could have caused this change in behavior.

Is there anything in systemd-253.5 that stands out as a possible culprit for the change in behavior? Here's the commit list:
https://github.com/systemd/systemd-stable/compare/v253.4...v253.5



Reproducible: Always

Steps to Reproduce:
1) Build and run latest FCOS rawhide image.

```
cosa fetch && cosa build
```

2) Run kola tests - ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation

```
cosa kola run ext.config.ntp.chrony.dhcp-propagation ext.config.ntp.timesyncd.dhcp-propagation
```

3) Tests fail due to timeout

```
[2023-06-06T13:10:51.596Z] --- FAIL: ext.config.ntp.chrony.dhcp-propagation (903.63s)
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-dhcp-propagation
[2023-06-06T13:19:13.099Z] 2023-06-06T13:19:06Z kola: Test timed out. Adding as candidate for rerun success: ext.config.ntp.timesyncd.dhcp-propagation
[2023-06-06T13:19:13.099Z] --- FAIL: ext.config.ntp.timesyncd.dhcp-propagation (903.65s)
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-test.sh
```

Actual Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation fail due to timeout

Expected Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation to pass

The log message from the bad run does container an SELinux denial that isn't in the good run:

```
[    7.672266] audit: type=1400 audit(1686056144.155:4): avc:  denied  { audit_control } for  pid=1 comm="systemd" capability=30  scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=capability permissive=0
[    7.673805] systemd[1]: Listening on systemd-journald-audit.socket - Journal Audit Socket.
[[0;32m  OK  [0m] Listening on [0;1;39msystemd-journ…socket[0m - Journal Audit Socket.
[    7.683161] systemd[1]: Starting systemd-journald.service - Journal Service...
         Starting [0;1;39msystemd-journald.service[0m - Journal Service...
```

but setting selinux to permissive (with enforcing=0) didn't seem to make a difference.

Comment 1 Aashish Radhakrishnan 2023-06-08 20:09:14 UTC
Created attachment 1969816 [details]
console_log - good run (using systemd-253.4-1.fc39)

Comment 2 Aashish Radhakrishnan 2023-06-08 20:10:34 UTC
Created attachment 1969817 [details]
journal log - good run (using systemd-253.4-1.fc39)

Comment 3 Aashish Radhakrishnan 2023-06-08 20:11:48 UTC
Created attachment 1969818 [details]
console log - bad run (using systemd-253.5-1.fc39)

Comment 4 Aashish Radhakrishnan 2023-06-08 20:12:28 UTC
Created attachment 1969819 [details]
journal log - bad run (using systemd-253.5-1.fc39)

Comment 5 Fedora Release Engineering 2023-08-16 08:07:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.