Bug 1975383

Summary: No NTP sources defined in a cluster after assisted installation
Product: OpenShift Container Platform Reporter: Udi Kalifon <ukalifon>
Component: assisted-installer Assignee: Rewant <resoni>
assisted-installer sub component: assisted-service QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED ERRATA Docs Contact:
Severity: urgent    
Priority: medium CC: aos-bugs, resoni
Version: 4.8   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AI-Team-Core
Fixed In Version: OCP-Metal-v1.0.24.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:36:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Udi Kalifon 2021-06-23 14:44:30 UTC
Description of problem:
I tried to enable Ceph after installing a cluster with OCS, but it is stuck in HEALTH_WARN and reports that the node is not synchronized. Checking the chrony configuration, I see that chrony.conf does not define any servers:

$ cat /etc/chrony.conf 
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
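
For triage, the check above can be automated. The following is a minimal sketch, not part of the assisted installer: the function name `chrony_sources_defined` is hypothetical, and it assumes that `server`, `pool`, `peer`, and `refclock` are the only directives of interest (sources added by other means, such as separate source files, are not detected).

```python
from typing import List

# chrony.conf directives that define a time source (assumed sufficient for this check).
SOURCE_DIRECTIVES = ("server", "pool", "peer", "refclock")


def chrony_sources_defined(conf_text: str) -> List[str]:
    """Return the chrony.conf lines that define a time source."""
    sources = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (chrony treats #, ;, ! and % as comment markers).
        if not line or line[0] in "#;!%":
            continue
        if line.split()[0] in SOURCE_DIRECTIVES:
            sources.append(line)
    return sources


# The configuration quoted in this report.
REPORTED_CONF = """\
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
"""

if __name__ == "__main__":
    found = chrony_sources_defined(REPORTED_CONF)
    print("sources defined:", found if found else "none")
```

An empty result for the file quoted here matches the symptom: the node has nothing to synchronize against.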


Version-Release number of selected component (if applicable):
4.8.0-fc9


How reproducible:
Unknown


Steps to Reproduce:
1. Install a cluster with 3 masters and 3 workers using the AI, and include the OCS operator
2. After installation, ssh to one of the nodes
3. Run the command "chronyc sources" and check /etc/chrony.conf


Actual results:
Number of sources = 0


Expected results:
The cluster should get the NTP configuration from DHCP and, if DHCP provides none, fall back to well-known public servers on the internet. A cluster should never end up with no NTP source at all.
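
The expected behavior can be sketched as a simple precedence rule. This only illustrates the reporter's expectation and is not the fix shipped in the advisory: the function name `pick_ntp_sources` and the default pool `pool.ntp.org` are assumptions made for the example.

```python
from typing import List, Sequence

# Assumed public fallback; the report only asks for "some global servers".
DEFAULT_POOL = "pool.ntp.org"


def pick_ntp_sources(
    user_sources: Sequence[str],
    dhcp_sources: Sequence[str],
    default_pool: str = DEFAULT_POOL,
) -> List[str]:
    """Return chrony.conf source lines: user input first, then DHCP, then a default pool."""
    for candidates in (user_sources, dhcp_sources):
        cleaned = [c.strip() for c in candidates if c.strip()]
        if cleaned:
            return [f"server {host} iburst" for host in cleaned]
    # Nothing was supplied, so never leave the node without a source.
    return [f"pool {default_pool} iburst"]


if __name__ == "__main__":
    # No user input and nothing from DHCP: the fallback pool is used.
    print("\n".join(pick_ntp_sources([], [])))
```

With this rule the returned list is never empty, which is the property the report asks for.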

Comment 2 Udi Kalifon 2021-06-23 16:46:02 UTC
I reinstalled the same cluster, this time making sure to provide an NTP source in the networking step in AI. Ceph is healthy and I was able to enable the registry with it and build an app. The command "chronyc sources" shows 6 sources, even though I specified only 1. Still, I don't see why the AI won't use a known global server if we don't provide any.

Comment 3 Gobinda Das 2021-06-25 13:17:30 UTC
Rewant is looking into it.

Comment 4 Rewant 2021-06-29 07:12:14 UTC
@udi can you provide more logs? Could you also try it without OCS and see whether the error still reproduces?

Comment 5 Udi Kalifon 2021-07-02 10:19:11 UTC
Rewant, can you specify which additional logs? You can find the must-gather ^^.
I will try to recreate without OCS.

Comment 6 Rewant 2021-07-02 12:01:00 UTC
@Udi the "chronyc sources" output and the agent logs

Comment 7 Udi Kalifon 2021-07-02 12:15:26 UTC
I also reproduced the issue without OCS. It happens pretty consistently in our US lab if we don't explicitly specify an NTP source in the networking page in the assisted installer.

[core@worker-0-1 ~]$ cat /etc/chrony.conf 

driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony

[core@worker-0-1 ~]$ chronyc sources
210 Number of sources = 0
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
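
A verification script could detect this state from the output above. This is a minimal sketch under the assumption that the output contains a "Number of sources = N" line as shown in this report; other chrony versions may format the output differently, in which case the hypothetical helper `parse_source_count` returns None.

```python
import re
from typing import Optional


def parse_source_count(output: str) -> Optional[int]:
    """Extract N from a 'Number of sources = N' line, or None if the line is absent."""
    match = re.search(r"Number of sources\s*=\s*(\d+)", output)
    return int(match.group(1)) if match else None


# First lines of the output quoted in this comment.
SAMPLE = (
    "210 Number of sources = 0\n"
    "MS Name/IP address         Stratum Poll Reach LastRx Last sample\n"
)

if __name__ == "__main__":
    count = parse_source_count(SAMPLE)
    if count == 0:
        print("no NTP sources configured")
    else:
        print("source count:", count)
```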

Comment 14 errata-xmlrpc 2021-10-18 17:36:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759