https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/3561/pull-ci-openshift-installer-master-e2e-azure/540
https://search.apps.build01.ci.devcluster.openshift.com/?search=NodeClockNotSynchronising&maxAge=168h&context=1&type=bug%2Bjunit&name=azure&maxMatches=5&maxBytes=20971520&groupBy=job
Across 805 runs and 80 jobs (54.29% failed), matched 35.24% of failing runs and 13.75% of jobs
[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Suite:openshift/conformance/parallel] 1m30s
fail [github.com/openshift/origin/test/extended/util/prometheus/helpers.go:174]: Expected
    <map[string]error | len:1>: {
        "count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1": {
            s: "promQL query: count_over_time(ALERTS{alertname!~\"Watchdog|AlertmanagerReceiversNotConfigured|KubeAPILatencyHigh\",alertstate=\"firing\",severity!=\"info\"}[2h]) >= 1 had reported incorrect results:\n[{\"metric\":{\"alertname\":\"NodeClockNotSynchronising\",\"alertstate\":\"firing\",\"endpoint\":\"https\",\"instance\":\"ci-op-hy8z3bni-2dc90-xpt9z-master-0\"
Looks like in the run I linked, NodeClockNotSynchronising is firing on all three nodes because node_timex_sync_status is empty.
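For anyone reading the failure above, here is a rough Python sketch of how the query in the message is put together and why NodeClockNotSynchronising trips it. This is not the code in helpers.go (that is Go); the function names `firing_alerts_query` and `is_excluded` are made up for illustration, and the only inputs taken from this bug are the three excluded alert names, the label matchers, and the 2h window.

```python
import re

# Alert names excluded by the query in the failure output above.
ALLOWED_ALERTS = [
    "Watchdog",
    "AlertmanagerReceiversNotConfigured",
    "KubeAPILatencyHigh",
]


def firing_alerts_query(allowed, window="2h"):
    """Rebuild the PromQL string shown in the failure message.

    Hypothetical helper: it only reproduces the string, it does not
    talk to Prometheus.
    """
    excluded = "|".join(allowed)
    return (
        "count_over_time(ALERTS{"
        f'alertname!~"{excluded}",alertstate="firing",severity!="info"'
        f"}}[{window}]) >= 1"
    )


def is_excluded(alertname, allowed):
    """Return True if the alertname!~ matcher would drop this alert.

    Prometheus regex matchers are fully anchored, so fullmatch is the
    closest Python equivalent.
    """
    pattern = "|".join(re.escape(name) for name in allowed)
    return re.fullmatch(pattern, alertname) is not None


if __name__ == "__main__":
    print(firing_alerts_query(ALLOWED_ALERTS))
    for name in ["Watchdog", "NodeClockNotSynchronising"]:
        print(name, "excluded" if is_excluded(name, ALLOWED_ALERTS) else "counted")
```

Any firing alert with a severity other than info whose name is not in the exclusion list produces a sample for this query, and the test fails as soon as the result is non-empty. That is what happens here with NodeClockNotSynchronising.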
This should be fixed since https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/918, but rollout was stalled by https://bugzilla.redhat.com/show_bug.cgi?id=1781575. I think we've also only done the bump in 4.6 and need to backport it to 4.5.
https://search.apps.build01.ci.devcluster.openshift.com/?search=NodeClockNotSynchronising&maxAge=168h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job Looks like no hits in the last 6 days. Going to mark this as verified.
Awesome! Since then we have the same fix inbound for EC2 and GCP: https://github.com/coreos/fedora-coreos-config/pull/393
This is marked closed in 4.5, but it's still happening, and a lot: https://search.apps.build01.ci.devcluster.openshift.com/?search=NodeClockNotSynchronising&maxAge=168h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job Example: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/354/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/1552
We also need https://github.com/openshift/installer/pull/3613 (AKA bug 1837039) for the bootimage, although I didn't think that would be critical. It's also possible that I regressed this when generalizing it in https://github.com/coreos/fedora-coreos-config/pull/393. I'll take some time to verify the code in the current release payload.
:cry: https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/955
Re-marking as verified; haven't seen this in the last 12 hours, which is around when the fix merged into CI.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days