Bug 1955229
| Summary: | release-openshift-origin-installer-e2e-aws-calico-4.7 is permfailing | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ben Parees <bparees> |
| Component: | Networking | Assignee: | jamo luhrsen <jluhrsen> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | medium | CC: | aconstan, dcbw, vrutkovs |
| Version: | 4.8 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | | Environment: | job=release-openshift-origin-installer-e2e-aws-calico-4.7=all |
| Last Closed: | 2021-07-27 23:05:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Ben Parees
2021-04-29 17:50:32 UTC
[sig-network-edge][Conformance][Area:Networking][Feature:Router] The HAProxy router should be able to connect to a service that is idled because a GET on the route will unidle it [Suite:openshift/conformance/parallel/minimal]
^^^^^ I'm pretty sure Calico doesn't support unidling, so this test should be skipped for anything !openshift-sdn && !ovn-kubernetes

[sig-instrumentation] Prometheus when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early] [Suite:openshift/conformance/parallel]
^^^^^ looks legit and should get fixed in our kube-proxy handling
Apr 27 23:46:22.731: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kube-proxy\",\"namespace\":\"openshift-kube-proxy\",\"service\":\"openshift-kube-proxy\",\"severity\":\"warning\"},\"value\":[1619567182.717,\"1\"]}]}}"
Apr 27 23:46:22.731: INFO: promQL query: ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 had reported incorrect results:
[{"metric":{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"kube-proxy","namespace":"openshift-kube-proxy","service":"openshift-kube-proxy","severity":"warning"},"value":[1619567182.717,"1"]}]
STEP: perform prometheus metric query ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1
Apr 27 23:46:33.082: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"__name__\":\"ALERTS\",\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kube-proxy\",\"namespace\":\"openshift-kube-proxy\",\"service\":\"openshift-kube-proxy\",\"severity\":\"warning\"},\"value\":[1619567193.068,\"1\"]}]}}"
Apr 27 23:46:33.082: INFO: promQL query: ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards",alertstate="firing",severity!="info"} >= 1 had reported incorrect results:
[{"metric":{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"kube-proxy","namespace":"openshift-kube-proxy","service":"openshift-kube-proxy","severity":"warning"},"value":[1619567193.068,"1"]}]
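
For anyone triaging a similar run, the check that fails in the log above can be approximated offline. The following is a minimal sketch, not the actual test code from openshift/origin: it takes the raw response body copied from the log and applies the same filters as the promQL query (alertstate "firing", severity other than "info", alertname outside the allowed set, value >= 1). The function name `unexpected_firing_alerts` and the constant names are hypothetical.

```python
import json
import re

# Response body copied from the first INFO line in the log above.
RAW = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"ALERTS","alertname":"TargetDown","alertstate":"firing","job":"kube-proxy","namespace":"openshift-kube-proxy","service":"openshift-kube-proxy","severity":"warning"},"value":[1619567182.717,"1"]}]}}'

# Alert names excluded by the alertname!~"..." matcher in the query.
# Prometheus regex matchers are fully anchored, hence fullmatch() below.
ALLOWED = re.compile(
    r"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards"
)


def unexpected_firing_alerts(raw):
    """Return the label sets of alerts that the query would flag."""
    body = json.loads(raw)
    if body.get("status") != "success":
        raise ValueError("query did not succeed: %r" % body.get("status"))

    flagged = []
    for sample in body["data"]["result"]:
        labels = sample["metric"]
        if labels.get("alertstate") != "firing":
            continue
        if labels.get("severity") == "info":
            continue
        if ALLOWED.fullmatch(labels.get("alertname", "")):
            continue
        # Each sample value is [timestamp, "value-as-string"].
        if float(sample["value"][1]) < 1:
            continue
        flagged.append(labels)
    return flagged


if __name__ == "__main__":
    for labels in unexpected_firing_alerts(RAW):
        print(labels["alertname"], labels["job"], labels["namespace"])
```

Run against the body above, the only flagged entry is the TargetDown alert for the kube-proxy job in openshift-kube-proxy, which matches what the test reported.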
I will take a look at this as soon as I get a few higher priority things taken care of.

This job was moved to use the step-registry for 4.8, and that job is not perma-failing. I think you could mark this bug as verified now. The job has run properly twice already; both runs did fail, but each failed for a different reason.
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.8-informing#periodic-ci-openshift-release-master-ci-4.8-e2e-aws-calico

Here is another PR that will bring the 4.3-4.7 versions of this job to also use the step registry:
https://github.com/openshift/release/pull/18803

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438