Bug 1884465
| Summary: | [Assisted-4.6] Installation fails when there is a time drift between the nodes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | nshidlin <nshidlin> |
| Component: | assisted-installer | Assignee: | Igal Tsoiref <itsoiref> |
| assisted-installer sub component: | discovery-agent | QA Contact: | Yuri Obshansky <yobshans> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | alazar, aos-bugs, lgamliel |
| Version: | 4.6 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-10-08 04:18:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Workaround for this issue: https://issues.redhat.com/browse/MGMT-2342
Full solution: https://issues.redhat.com/browse/MGMT-2322?

We have implemented a validation that verifies there is no time drift between the hosts. The allowed time difference is 4 minutes. You can set the time on a host with date +%T -s "hh:mm:ss" and verify that, when the time difference is more than 4 minutes, the installation does not start and an NTP validation error is returned.

Verified on staging:
{
'release_tag': 'v1.0.9.8-ds',
'versions': {
'assisted-ignition-generator': 'quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.5',
'assisted-installer': 'registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-rhel8:v4.6.0-21',
'assisted-installer-controller': 'registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-reporter-rhel8:v4.6.0-17',
'assisted-installer-service': 'quay.io/app-sre/assisted-service:394a627',
'discovery-agent': 'registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v4.6.0-17',
'image-builder': 'quay.io/app-sre/assisted-iso-create:394a627'
}
}
Scenario:
1. Boot nodes into ISO
2. Set the time of one of the nodes to a 5-minute difference from the other nodes
3. After discovery completes, fill in the fields required to install the cluster
4. Cluster installation is disabled with message "Host clocks are not synchronized, please configure an NTP server via DHCP."
5. Fix time difference between the nodes
6. Cluster is ready to be installed
7. Cluster Installation Succeeds
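
The check exercised by this scenario can be sketched roughly as follows. This is a minimal illustration, not the assisted-service implementation: the names `MAX_ALLOWED_DRIFT`, `max_clock_drift`, and `clocks_synchronized` are hypothetical, and the only values taken from this report are the 4-minute limit and the error message.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 4 minutes is the allowed difference quoted in this report.
MAX_ALLOWED_DRIFT = timedelta(minutes=4)


def max_clock_drift(host_times):
    """Return the largest difference between any two reported host clocks."""
    if not host_times:
        return timedelta(0)
    values = list(host_times.values())
    return max(values) - min(values)


def clocks_synchronized(host_times, limit=MAX_ALLOWED_DRIFT):
    """True when no two hosts differ by more than the allowed limit."""
    return max_clock_drift(host_times) <= limit


# Hypothetical readings: one node is set 5 minutes ahead, as in step 2.
base = datetime(2020, 10, 1, 14, 0, 0, tzinfo=timezone.utc)
hosts = {
    "master-0-0": base,
    "master-0-1": base + timedelta(seconds=30),
    "master-0-2": base + timedelta(minutes=5),
}

if not clocks_synchronized(hosts):
    print("Host clocks are not synchronized, please configure an NTP server via DHCP.")
```

Comparing the earliest and latest clock is enough here, because the largest pairwise difference is always between those two readings.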
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 assisted installer), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:4199
Description of problem:
When there is a time drift between the nodes in the cluster the installation fails with error:
Host master-0-0: updated status from "installing-in-progress" to "error" (Host failed to install because its installation stage Joined took longer than expected 20m0s)

Version-Release number of selected component (if applicable):
{
'release_tag': 'v1.0.9.6-ds',
'versions': {
'assisted-ignition-generator': 'quay.io/ocpmetal/assisted-ignition-generator:v1.0.9.5',
'assisted-installer': 'registry.redhat.io/openshift4/assisted-installer-rhel8:v4.6.0-19',
'assisted-installer-controller': 'registry.redhat.io/openshift4/assisted-installer-reporter-rhel8:v4.6.0-15',
'assisted-installer-service': 'quay.io/app-sre/assisted-service:bcea367',
'discovery-agent': 'registry.redhat.io/openshift4/assisted-installer-agent-rhel8:v4.6.0-15',
'image-builder': 'quay.io/app-sre/assisted-iso-create:bcea367'
}
}

Steps to Reproduce:
1. Boot nodes into discovery ISO
2. Change the time on the nodes such that there is a multi hour time diff between the nodes:
# ssh core@master-0-0 date +%T -s "08:35:00"
# ssh core@master-0-1 date +%T -s "14:23:00"
# ssh core@master-0-2 date +%T -s "20:00:00"
3. Start cluster installation

Actual results:
Cluster installation fails

Expected results:

Additional info:
Credentials pod fails to start:
Oct 01 14:33:47 master-0-0 hyperkube[6947]: E1001 14:33:47.556970 6947 pod_workers.go:191] Error syncing pod 49bf00fccfe455708243ba513f26ede2 ("cloud-credential-operator-master-0-0_openshift-cloud-credential-operator(49bf00fccfe455708>
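
To put a number on the drift introduced in the reproduction steps, the spread between the three `date +%T` values can be computed as below. This is only an illustrative sketch: `spread_seconds` is a hypothetical helper, it assumes all values fall on the same day (no midnight wraparound), and it needs at least two clock strings.

```python
from datetime import datetime
from itertools import combinations


def spread_seconds(clock_strings):
    """Largest pairwise gap, in seconds, between HH:MM:SS wall-clock strings."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in clock_strings]
    return max(abs((a - b).total_seconds()) for a, b in combinations(times, 2))


# Times set on master-0-0, master-0-1 and master-0-2 in step 2.
repro_times = ["08:35:00", "14:23:00", "20:00:00"]
drift = spread_seconds(repro_times)
print(f"largest drift: {drift / 3600:.2f} hours")
```

The resulting gap of several hours is far beyond the 4-minute limit mentioned in the comments on this bug.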