Bug 2213636 - systemd 253.5 introduces regression in coreos CI ntp tests
Summary: systemd 253.5 introduces regression in coreos CI ntp tests
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 39
Hardware: Unspecified
OS: Linux
high
high
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL: https://github.com/coreos/fedora-core...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-06-08 19:54 UTC by Aashish Radhakrishnan
Modified: 2023-08-16 08:07 UTC (History)
9 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments (Terms of Use)

Description Aashish Radhakrishnan 2023-06-08 19:54:16 UTC
In Fedora CoreOS CI we noticed that our `ext.config.ntp.chrony.dhcp-propagation` and `ext.config.ntp.timesyncd.dhcp-propagation`
tests started timing out specifically with the transition of systemd-253.4-1.fc39 -> 253.5-1.fc39. This also affects Fedora 38.

This issue is tracked in the Fedora CoreOS issue tracker at https://github.com/coreos/fedora-coreos-tracker/issues/1508

I have attached the logs from a good run (with systemd-253.4-1.fc39) and a bad run (with systemd-253.5-1.fc39). I couldn't see
anything that stood out in the logs as a smoking gun for why the systemd update could have caused this change in behavior.

Is there anything in systemd-253.5 that stands out as a possible culprit for the change in behavior? Here's the commit list:
https://github.com/systemd/systemd-stable/compare/v253.4...v253.5



Reproducible: Always

Steps to Reproduce:
1) Build and run latest FCOS rawhide image.

```
cosa fetch && cosa build
```

2) Run kola tests - ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation

```
cosa kola run ext.config.ntp.chrony.dhcp-propagation ext.config.ntp.timesyncd.dhcp-propagation
```

3) Tests fail due to timeout

```
[2023-06-06T13:10:51.596Z] --- FAIL: ext.config.ntp.chrony.dhcp-propagation (903.63s)
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:10:51.596Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-dhcp-propagation
[2023-06-06T13:19:13.099Z] 2023-06-06T13:19:06Z kola: Test timed out. Adding as candidate for rerun success: ext.config.ntp.timesyncd.dhcp-propagation
[2023-06-06T13:19:13.099Z] --- FAIL: ext.config.ntp.timesyncd.dhcp-propagation (903.65s)
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: sudo ./kolet run-test-unit kola-runext.service
[2023-06-06T13:19:13.099Z] harness.go:105: TIMEOUT[15m0s]: ssh: journalctl -t kola-runext-test.sh
```

Actual Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation fail due to timeout

Expected Results:  
ext.config.ntp.chrony.dhcp-propagation & ext.config.ntp.timesyncd.dhcp-propagation to pass

The log message from the bad run does container an SELinux denial that isn't in the good run:

```
[    7.672266] audit: type=1400 audit(1686056144.155:4): avc:  denied  { audit_control } for  pid=1 comm="systemd" capability=30  scontext=system_u:system_r:init_t:s0 tcontext=system_u:system_r:init_t:s0 tclass=capability permissive=0
[    7.673805] systemd[1]: Listening on systemd-journald-audit.socket - Journal Audit Socket.
[[0;32m  OK  [0m] Listening on [0;1;39msystemd-journ…socket[0m - Journal Audit Socket.
[    7.683161] systemd[1]: Starting systemd-journald.service - Journal Service...
         Starting [0;1;39msystemd-journald.service[0m - Journal Service...
```

but setting selinux to permissive (with enforcing=0) didn't seem to make a difference.

Comment 1 Aashish Radhakrishnan 2023-06-08 20:09:14 UTC
Created attachment 1969816 [details]
console_log - good run (using systemd-253.4-1.fc39)

Comment 2 Aashish Radhakrishnan 2023-06-08 20:10:34 UTC
Created attachment 1969817 [details]
journal log - good run (using systemd-253.4-1.fc39)

Comment 3 Aashish Radhakrishnan 2023-06-08 20:11:48 UTC
Created attachment 1969818 [details]
console log - bad run (using systemd-253.5-1.fc39)

Comment 4 Aashish Radhakrishnan 2023-06-08 20:12:28 UTC
Created attachment 1969819 [details]
journal log - bad run (using systemd-253.5-1.fc39)

Comment 5 Fedora Release Engineering 2023-08-16 08:07:47 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.


Note You need to log in before you can comment on or make changes to this bug.