Bug 2213636 - systemd 253.5 introduces regression in coreos CI ntp tests
Summary: systemd 253.5 introduces regression in coreos CI ntp tests
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 39
Hardware: Unspecified
OS: Linux
high
high
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL: https://github.com/coreos/fedora-core...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-06-08 19:54 UTC by Aashish Radhakrishnan
Modified: 2024-06-28 12:18 UTC (History)
10 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2024-06-28 12:18:39 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Aashish Radhakrishnan 2023-06-08 19:54:16 UTC
In Fedora CoreOS CI we noticed that our `ext.config.ntp.chrony.dhcp-propagation` and `ext.config.ntp.timesyncd.dhcp-propagation`
tests started timing out specifically with the transition of systemd-253.4-1.fc39 -> 253.5-1.fc39. This also affects Fedora 38.

This issue is tracked in the Fedora CoreOS issue tracker at https://github.com/coreos/fedora-coreos-tracker/issues/1508

I have attached the logs from a good run (with systemd-253.4-1.fc39) and a bad run (with systemd-253.5-1.fc39). I couldn't see
anything that stood out in the logs as a smoking gun for why the systemd update could have caused this change in behavior.

Is there anything in systemd-253.5 that stands out as a possible culprit for the change in behavior? Here's the commit list:
https://github.com/systemd/systemd-stable/compare/v253.4...v253.5



Reproducible: Always

Steps to Reproduce:
1) Build and run latest FCOS rawhide image.

```
cosa fetch && cosa build
```

2) Run kola tests - ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation

```
cosa kola run ext.config.ntp.chrony.dhcp-propagation ext.config.ntp.timesyncd.dhcp-propagation
```

3) Tests fail due to timeout

```
[2023-06-06T13:10:51.596Z] --- FAIL: ext.config.ntp.chrony.dhcp-propagation (903.63s)
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-dhcp-propagation
[2023-06-06T13:19:13.099Z] 2023-06-06T13:19:06Z kola: Test timed out. Adding as candidate for rerun success: ext.config.ntp.timesyncd.dhcp-propagation
[2023-06-06T13:19:13.099Z] --- FAIL: ext.config.ntp.timesyncd.dhcp-propagation (903.65s)
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-test.sh
```

Actual Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation fail due to timeout

Expected Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation to pass

The log message from the bad run does container an SELinux denial that isn't in the good run:

```
[    7.672266] audit: type=1400 audit(1686056144.155:4): avc:  denied  { audit_control } for  pid=1 comm="systemd" capability=30  scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=capability permissive=0
[    7.673805] systemd[1]: Listening on systemd-journald-audit.socket - Journal Audit Socket.
[[0;32m  OK  [0m] Listening on [0;1;39msystemd-journ…socket[0m - Journal Audit Socket.
[    7.683161] systemd[1]: Starting systemd-journald.service - Journal Service...
         Starting [0;1;39msystemd-journald.service[0m - Journal Service...
```

but setting selinux to permissive (with enforcing=0) didn't seem to make a difference.

Comment 1 Aashish Radhakrishnan 2023-06-08 20:09:14 UTC
Created attachment 1969816 [details]
console_log - good run (using systemd-253.4-1.fc39)

Comment 2 Aashish Radhakrishnan 2023-06-08 20:10:34 UTC
Created attachment 1969817 [details]
journal log - good run (using systemd-253.4-1.fc39)

Comment 3 Aashish Radhakrishnan 2023-06-08 20:11:48 UTC
Created attachment 1969818 [details]
console log - bad run (using systemd-253.5-1.fc39)

Comment 4 Aashish Radhakrishnan 2023-06-08 20:12:28 UTC
Created attachment 1969819 [details]
journal log - bad run (using systemd-253.5-1.fc39)

Comment 5 Fedora Release Engineering 2023-08-16 08:07:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 6 David Tardon 2024-06-28 12:18:39 UTC
According to https://github.com/coreos/fedora-coreos-tracker/issues/1508#issuecomment-1674182241 this was caused by a commit that's been reverted since (even on v253-stable). Closing.


Note You need to log in before you can comment on or make changes to this bug.