2213636 – systemd 253.5 introduces regression in coreos CI ntp tests

Bug 2213636 - systemd 253.5 introduces regression in coreos CI ntp tests

Summary: systemd 253.5 introduces regression in coreos CI ntp tests

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	39
Hardware:	Unspecified
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:	https://github.com/coreos/fedora-core...
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2023-06-08 19:54 UTC by Aashish Radhakrishnan
Modified:	2024-06-28 12:18 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2024-06-28 12:18:39 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Aashish Radhakrishnan 2023-06-08 19:54:16 UTC

In Fedora CoreOS CI we noticed that our `ext.config.ntp.chrony.dhcp-propagation` and `ext.config.ntp.timesyncd.dhcp-propagation`
tests started timing out specifically with the transition of systemd-253.4-1.fc39 -> 253.5-1.fc39. This also affects Fedora 38.

This issue is tracked in the Fedora CoreOS issue tracker at https://github.com/coreos/fedora-coreos-tracker/issues/1508

I have attached the logs from a good run (with systemd-253.4-1.fc39) and a bad run (with systemd-253.5-1.fc39). I couldn't see
anything that stood out in the logs as a smoking gun for why the systemd update could have caused this change in behavior.

Is there anything in systemd-253.5 that stands out as a possible culprit for the change in behavior? Here's the commit list:
https://github.com/systemd/systemd-stable/compare/v253.4...v253.5



Reproducible: Always

Steps to Reproduce:
1) Build and run latest FCOS rawhide image.

```
cosa fetch && cosa build
```

2) Run kola tests - ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation

```
cosa kola run ext.config.ntp.chrony.dhcp-propagation ext.config.ntp.timesyncd.dhcp-propagation
```

3) Tests fail due to timeout

```
[2023-06-06T13:10:51.596Z] --- FAIL: ext.config.ntp.chrony.dhcp-propagation (903.63s)
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-dhcp-propagation
[2023-06-06T13:19:13.099Z] 2023-06-06T13:19:06Z kola: Test timed out. Adding as candidate for rerun success: ext.config.ntp.timesyncd.dhcp-propagation
[2023-06-06T13:19:13.099Z] --- FAIL: ext.config.ntp.timesyncd.dhcp-propagation (903.65s)
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-test.sh
```

Actual Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation fail due to timeout

Expected Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation to pass

The log message from the bad run does container an SELinux denial that isn't in the good run:

```
[    7.672266] audit: type=1400 audit(1686056144.155:4): avc:  denied  { audit_control } for  pid=1 comm="systemd" capability=30  scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=capability permissive=0
[    7.673805] systemd[1]: Listening on systemd-journald-audit.socket - Journal Audit Socket.
[[0;32m  OK  [0m] Listening on [0;1;39msystemd-journ…socket[0m - Journal Audit Socket.
[    7.683161] systemd[1]: Starting systemd-journald.service - Journal Service...
         Starting [0;1;39msystemd-journald.service[0m - Journal Service...
```

but setting selinux to permissive (with enforcing=0) didn't seem to make a difference.

Comment 1 Aashish Radhakrishnan 2023-06-08 20:09:14 UTC

Created attachment 1969816 [details]
console_log - good run (using systemd-253.4-1.fc39)

Comment 2 Aashish Radhakrishnan 2023-06-08 20:10:34 UTC

Created attachment 1969817 [details]
journal log - good run (using systemd-253.4-1.fc39)

Comment 3 Aashish Radhakrishnan 2023-06-08 20:11:48 UTC

Created attachment 1969818 [details]
console log - bad run (using systemd-253.5-1.fc39)

Comment 4 Aashish Radhakrishnan 2023-06-08 20:12:28 UTC

Created attachment 1969819 [details]
journal log - bad run (using systemd-253.5-1.fc39)

Comment 5 Fedora Release Engineering 2023-08-16 08:07:47 UTC

This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 6 David Tardon 2024-06-28 12:18:39 UTC

According to https://github.com/coreos/fedora-coreos-tracker/issues/1508#issuecomment-1674182241 this was caused by a commit that's been reverted since (even on v253-stable). Closing.

Note You need to log in before you can comment on or make changes to this bug.