Bug 1833098
Summary: | 35% of Azure failures include the alert e2e test NodeClockNotSynchronising firing | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> | |
Component: | RHCOS | Assignee: | Colin Walters <walters> | |
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> | |
Severity: | high | Docs Contact: | ||
Priority: | medium | |||
Version: | 4.5 | CC: | bbreard, dmace, ffranz, imcleod, jligon, miabbott, nstielau, walters, wking | |
Target Milestone: | --- | Keywords: | Reopened | |
Target Release: | 4.6.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1835801 | Environment: | [sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] |
Last Closed: | 2020-10-27 15:58:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1835801 |
Description
Clayton Coleman
2020-05-07 19:31:00 UTC
This should be fixed since https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/918 but rollout was stalled by https://bugzilla.redhat.com/show_bug.cgi?id=1781575

I think also we've only done the bump in 4.6 and need to backport it to 4.5.

https://search.apps.build01.ci.devcluster.openshift.com/?search=NodeClockNotSynchronising&maxAge=168h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Looks like no hits in last 6 days. Going to mark this as verified.

Awesome! Since then we have the same fix inbound for EC2 and GCP: https://github.com/coreos/fedora-coreos-config/pull/393

This is marked closed in 4.5, but it's still happening, and a lot:

https://search.apps.build01.ci.devcluster.openshift.com/?search=NodeClockNotSynchronising&maxAge=168h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

example: https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/354/pull-ci-openshift-cluster-etcd-operator-master-e2e-azure/1552

We also need https://github.com/openshift/installer/pull/3613 AKA bug 1837039 for the bootimage, although I didn't think that would be critical.

It's also possible that I regressed this when generalizing it in https://github.com/coreos/fedora-coreos-config/pull/393

I'll take some time to verify the code in the current release payload.

Re-marking as verified; haven't seen this in the last 12 hours, which is around when the fix merged into CI.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days