Bug 1765609
Summary: | [Azure] Virtual machines are unable to reach public NTP servers | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Nils <nils> | |
Component: | RHCOS | Assignee: | Colin Walters <walters> | |
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.5 | CC: | amurdaca, bbreard, cmarches, dornelas, dustymabe, gpei, hongli, imcleod, jlebon, jligon, mharri, miabbott, nstielau, skumari, smilner, stwalter, walters | |
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1828342 1834565 1837039 | Environment: ||
Last Closed: | 2020-07-13 17:11:31 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1186913, 1828342, 1834565, 1837039 |
Description
Nils
2019-10-25 14:59:27 UTC
This also needs to be done for the worker nodes. The following seems to work:

    cluster_id="cluster02-tflm4"
    for rule in $(az network lb rule list -g "${cluster_id}-rg" --lb-name "${cluster_id}" --query '[].name' -o tsv)
    do
        az network lb rule update -g "${cluster_id}-rg" --lb-name "${cluster_id}" --name "${rule}" --disable-outbound-snat true
    done
    frontend_ip="$(az network lb frontend-ip list -g "${cluster_id}-rg" --lb-name "${cluster_id}" --query '[0].name' -o tsv)"
    az network lb outbound-rule create -g "${cluster_id}-rg" --lb-name "${cluster_id}" --protocol All --address-pool "${cluster_id}" --frontend-ip-configs "${frontend_ip}" --name AllowOutbound

After reviewing this, the best solution appears to be customizing the NTP configuration when running on Azure so that it uses host-local time syncing, per the Azure best practices.

Is there any more information that I'm asked to provide? If not, can we remove the needinfo flag set to my email address?

A fix for this issue is up for review: https://github.com/openshift/machine-config-operator/pull/1658. Please review it and, if needed, tag the right people. Thanks.

The MCO PR will fix the specific issue on an Azure cluster; it will not, however, take care of the bootstrap host. It might make sense in the future to have something like the MCO create and configure the bootstrap host, but that is not the case today and it is not trivial. We're going to open a PR to the installer as well to take care of this issue on the bootstrap host, with the caveat that those chrony configs must be kept in sync.

*** Bug 1828342 has been marked as a duplicate of this bug. ***

MR merged. Should this move to modified?

After this lands in 4.5 and we've tested it, I'll clone this for 4.4 and probably 4.3 too.

This change is part of https://openshift-release.svc.ci.openshift.org/releasestream/4.5.0-0.nightly/release/4.5.0-0.nightly-2020-05-11-211039 at least.
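For readers following the "host local time syncing" approach discussed above, here is a minimal shell sketch of what such a chrony customization might look like. It is not the code from the MCO PR. The function name `make_phc_chrony_conf`, the default device path `/dev/ptp0`, and the `refclock` options are assumptions; the options follow Azure's published chrony guidance as I understand it, so check them against current Azure documentation before relying on them.

```shell
#!/bin/bash
# Sketch only, not the actual fix: derive a chrony config that syncs from the
# host clock instead of public NTP servers.
# Assumptions (hypothetical, not from the bug): the function name, the default
# device /dev/ptp0, and the refclock options.
make_phc_chrony_conf() {
    local src="$1" dst="$2" dev="${3:-/dev/ptp0}"
    {
        echo "# Generated from ${src}: network time sources replaced by PHC"
        # Drop network time sources; keep every other directive unchanged.
        # grep exits non-zero when nothing is left, which is not an error here.
        grep -Ev '^[[:space:]]*(pool|server|peer)[[:space:]]' "${src}" || true
        # Use the PTP hardware clock exposed by the hypervisor as the source.
        echo "refclock PHC ${dev} poll 3 dpoll -2 offset 0"
    } > "${dst}"
}

# Example call (paths are illustrative):
#   make_phc_chrony_conf /etc/chrony.conf /run/coreos-azure-phc-chrony.conf
```

Writing a separate file and pointing chronyd at it with `-f` leaves the original `/etc/chrony.conf` untouched, which makes the change easy to revert.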
Verified this:

    walters@toolbox /s/w/rhcos-master> oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.5.0-0.nightly-2020-05-11-211039   True        False         2m39s   Cluster version is 4.5.0-0.nightly-2020-05-11-211039
    walters@toolbox /s/w/rhcos-master> oc debug node/ci-ln-0wb12s2-002ac-t2xsf-master-0
    ...
    [root@ci-ln-0wb12s2-002ac-t2xsf-master-0 /]# rpm-ostree status -b
    State: idle
    AutomaticUpdates: disabled
    BootedDeployment:
    * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d202a8b981606f4ca691b6f87d847782f95b666963c81b207aa091ce1f806198
                  CustomOrigin: Managed by machine-config-operator
                       Version: 45.81.202005111729-0 (2020-05-11T17:33:22Z)
    [root@ci-ln-0wb12s2-002ac-t2xsf-master-0 /]# chronyc sources
    210 Number of sources = 1
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    #* PHC0                          0   3   377     9   +772ns[ +525ns] +/-  339ns
    [root@ci-ln-0wb12s2-002ac-t2xsf-master-0 /]#

    $ oc get node
    NAME                                                STATUS   ROLES    AGE   VERSION
    ci-ln-tfmy982-002ac-mgnvn-master-0                  Ready    master   33m   v1.18.2
    ci-ln-tfmy982-002ac-mgnvn-master-1                  Ready    master   33m   v1.18.2
    ci-ln-tfmy982-002ac-mgnvn-master-2                  Ready    master   33m   v1.18.2
    ci-ln-tfmy982-002ac-mgnvn-worker-centralus1-c5hhw   Ready    worker   18m   v1.18.2
    ci-ln-tfmy982-002ac-mgnvn-worker-centralus2-xp5k9   Ready    worker   18m   v1.18.2
    ci-ln-tfmy982-002ac-mgnvn-worker-centralus3-fhmbq   Ready    worker   18m   v1.18.2
    $ oc debug node/ci-ln-tfmy982-002ac-mgnvn-master-0
    Starting pod/ci-ln-tfmy982-002ac-mgnvn-master-0-debug ...
    To use host binaries, run `chroot /host`
    If you don't see a command prompt, try pressing enter.
    sh-4.2# chroot /host
    sh-4.4# rpm-ostree status -b
    State: idle
    AutomaticUpdates: disabled
    BootedDeployment:
    * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:434a0401a0fb22773e49fda5127c9610145d21b67435e8bc822bdc860aafaf98
                  CustomOrigin: Managed by machine-config-operator
                       Version: 45.81.202005121031-0 (2020-05-12T10:34:32Z)
    sh-4.4# chronyc sources
    210 Number of sources = 1
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    #* PHC0                          0   3   377     6    -17us[  -25us] +/- 8808ns
    sh-4.4# exit
    exit
    sh-4.2# exit
    exit

    Removing debug pod ...
    $ oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.5.0-0.nightly-2020-05-12-163804   True        False         17m     Cluster version is 4.5.0-0.nightly-2020-05-12-163804
    $ oc debug node/ci-ln-tfmy982-002ac-mgnvn-worker-centralus3-fhmbq
    Starting pod/ci-ln-tfmy982-002ac-mgnvn-worker-centralus3-fhmbq-debug ...
    To use host binaries, run `chroot /host`
    If you don't see a command prompt, try pressing enter.
    sh-4.2# chroot /host
    sh-4.4# rpm-ostree status -b
    State: idle
    AutomaticUpdates: disabled
    BootedDeployment:
    * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:434a0401a0fb22773e49fda5127c9610145d21b67435e8bc822bdc860aafaf98
                  CustomOrigin: Managed by machine-config-operator
                       Version: 45.81.202005121031-0 (2020-05-12T10:34:32Z)
    sh-4.4# chronyc sources
    210 Number of sources = 1
    MS Name/IP address         Stratum Poll Reach LastRx Last sample
    ===============================================================================
    #* PHC0                          0   3   377    10  -5379ns[-6278ns] +/- 1374ns
    sh-4.4# cd /run/systemd/generator/chronyd.service.d/
    sh-4.4# ls
    coreos-azure-phc.conf
    sh-4.4# cat coreos-azure-phc.conf
    [Service]
    ExecStart=
    ExecStart=/usr/sbin/chronyd -f /run/coreos-azure-phc-chrony.conf $OPTIONS
    sh-4.4# cat /dev/kmsg | grep PHC
    12,654,17038655,-;coreos-azure-phc: Updated chrony to use Azure PHC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
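The verification comments above confirm by eye that chrony's selected source is the PHC refclock. A minimal shell sketch of the same check, for anyone who wants to script it, follows. The function name `selected_source_is_phc` is hypothetical, and the sketch assumes the `chronyc sources` column layout shown in the output above, where the first field holds the mode and state characters and `*` marks the selected source.

```shell
#!/bin/bash
# Sketch only: reads `chronyc sources` output on stdin and succeeds when the
# currently selected source (state character "*") has a name starting with PHC.
# The function name is hypothetical; the layout is assumed from the output above.
selected_source_is_phc() {
    awk '
        # Field 1 is two characters: mode (#, ^, =) followed by state.
        $1 ~ /^.\*$/ { sel = $2 }
        # Exit 0 if a selected source was seen and it is a PHC refclock.
        END { exit !(sel ~ /^PHC/) }
    '
}

# Example call on a node (after `chroot /host`):
#   chronyc sources | selected_source_is_phc && echo "syncing from PHC"
```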