Bug 1970363 - E2e triggers KubeAPIErrorBudgetBurn alert on Power platform
Summary: E2e triggers KubeAPIErrorBudgetBurn alert on Power platform
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.8
Hardware: ppc64le
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Piyush Gupta
QA Contact: Ke Wang
URL:
Whiteboard: LifecycleFrozen
Depends On:
Blocks:
 
Reported: 2021-06-10 11:33 UTC by Tania Kapoor
Modified: 2023-09-15 01:34 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-18 14:33:56 UTC
Target Upstream Version:
Embargoed:



Description Tania Kapoor 2021-06-10 11:33:19 UTC
Description of problem:

Running e2e on a Power cluster results in the KubeAPIErrorBudgetBurn alert firing, which causes the ``` [sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel] ``` e2e test to fail.
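
For a quick manual check, here is a minimal sketch of the same query the test runs, executed from the bastion; the `oc whoami -t` token and the thanos-querier route in openshift-monitoring are assumptions about the environment, not something taken from the test output.

```
# Sketch: list alerts currently firing, excluding Watchdog/AlertmanagerReceiversNotConfigured
# and info-severity alerts (the same filter the e2e test applies).
TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
curl -skG -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=ALERTS{alertstate="firing",severity!="info",alertname!~"Watchdog|AlertmanagerReceiversNotConfigured"}' \
  "https://${HOST}/api/v1/query"
```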

Version-Release number of selected component (if applicable):


e2e Logs

```
[root@rdr-sdntest-mon01-bastion-0 origin]# ./openshift-tests run-test "[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel]"
warning: KUBE_TEST_REPO_LIST may not be set when using openshift-tests and will be ignored
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1450
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1450
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/util/client.go:142
STEP: Creating a kubernetes client
[BeforeEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/prometheus/prometheus.go:50
[It] shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/prometheus/prometheus.go:58
Jun 10 04:54:10.609: INFO: Creating namespace "e2e-test-prometheus-wbx66"
Jun 10 04:54:10.876: INFO: Waiting for ServiceAccount "default" to be provisioned...
Jun 10 04:54:10.990: INFO: Creating new exec pod
Jun 10 04:54:15.053: INFO: Running '/usr/local/bin/kubectl --server=https://api.rdr-sdntest.redhat.com:6443 --kubeconfig=/root/openstack-upi/auth/kubeconfig --namespace=e2e-test-prometheus-wbx66 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1"'
Jun 10 04:54:15.664: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1'\n"
Jun 10 04:54:15.664: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n"
Jun 10 04:54:15.665: INFO: Running '/usr/local/bin/kubectl --server=https://api.rdr-sdntest.redhat.com:6443 --kubeconfig=/root/openstack-upi/auth/kubeconfig --namespace=e2e-test-prometheus-wbx66 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A"'
Jun 10 04:54:16.049: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A'\n"
Jun 10 04:54:16.049: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"alertname\":\"KubeAPIErrorBudgetBurn\",\"alertstate\":\"firing\",\"long\":\"1d\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\",\"short\":\"2h\"},\"value\":[1623315256.038,\"3600\"]},{\"metric\":{\"alertname\":\"KubeAPIErrorBudgetBurn\",\"alertstate\":\"firing\",\"long\":\"3d\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\",\"short\":\"6h\"},\"value\":[1623315256.038,\"3600\"]},{\"metric\":{\"alertname\":\"CannotRetrieveUpdates\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"193.168.200.123:9099\",\"job\":\"cluster-version-operator\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-5b64488d4d-7r5xb\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1623315256.038,\"930\"]}]}}\n"
Jun 10 04:54:16.049: INFO: Running '/usr/local/bin/kubectl --server=https://api.rdr-sdntest.redhat.com:6443 --kubeconfig=/root/openstack-upi/auth/kubeconfig --namespace=e2e-test-prometheus-wbx66 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D"'
Jun 10 04:54:16.447: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D'\n"
Jun 10 04:54:16.447: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n"
Jun 10 04:54:16.447: FAIL: Unexpected alerts fired or pending after the test run:

alert CannotRetrieveUpdates fired for 930 seconds with labels: {endpoint="metrics", instance="193.168.200.123:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-5b64488d4d-7r5xb", service="cluster-version-operator", severity="warning"}
alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="1d", severity="warning", short="2h"}
alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="3d", severity="warning", short="6h"}

Full Stack Trace
github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0011eed80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113 +0xb8
github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0011eed80, 0xc000f7ca20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:64 +0x180
github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00167ce40, 0x141852bd8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/it_node.go:26 +0x98
github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc001f5dd10, 0x0, 0x141852bd8, 0xc00046a040)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:215 +0x22c
github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc001f5dd10, 0x141852bd8, 0xc00046a040)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:138 +0x110
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc001ef5b80, 0xc001f5dd10, 0x0)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:200 +0x100
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc001ef5b80, 0x1)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:170 +0x148
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc001ef5b80, 0xc002aad2e8)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:66 +0x118
github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0002fbef0, 0x141852e98, 0xc001cbd400, 0x0, 0x0, 0xc0003e83b0, 0x1, 0x1, 0x141929658, 0xc00046a040, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/suite/suite.go:62 +0x378
github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001d76960, 0xc000ac4c00, 0x1, 0x1, 0x1440a7f00, 0x13d8efb70)
	github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:61 +0x390
main.newRunTestCommand.func1.1()
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x60
github.com/openshift/origin/test/extended/util.WithCleanup(0xc00193fbb0)
	github.com/openshift/origin/test/extended/util/test.go:167 +0x80
main.newRunTestCommand.func1(0xc0016d5b80, 0xc000ac4c00, 0x1, 0x1, 0x0, 0x0)
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x2d4
github.com/spf13/cobra.(*Command).execute(0xc0016d5b80, 0xc000ac4b30, 0x1, 0x1, 0xc0016d5b80, 0xc000ac4b30)
	github.com/spf13/cobra.1/command.go:850 +0x3d0
github.com/spf13/cobra.(*Command).ExecuteC(0xc0016d5080, 0x0, 0x14185a778, 0x1444c40e8)
	github.com/spf13/cobra.1/command.go:958 +0x2b4
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra.1/command.go:895
main.main.func1(0xc0016d5080, 0x0, 0x0)
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:84 +0xa0
main.main()
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:85 +0x3b4
[AfterEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/util/client.go:140
STEP: Collecting events from namespace "e2e-test-prometheus-wbx66".
STEP: Found 6 events.
Jun 10 04:54:16.463: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for execpod: { } Scheduled: Successfully assigned e2e-test-prometheus-wbx66/execpod to worker-1
Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:10 -0400 EDT - event for e2e-test-prometheus-wbx66: {namespace-security-allocation-controller } CreatedSCCRanges: created SCC ranges
Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:13 -0400 EDT - event for execpod: {multus } AddedInterface: Add eth0 [10.128.3.240/23]
Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:13 -0400 EDT - event for execpod: {kubelet worker-1} Pulled: Container image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest" already present on machine
Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:14 -0400 EDT - event for execpod: {kubelet worker-1} Created: Created container agnhost-container
Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:14 -0400 EDT - event for execpod: {kubelet worker-1} Started: Started container agnhost-container
Jun 10 04:54:16.465: INFO: POD      NODE      PHASE    GRACE  CONDITIONS
Jun 10 04:54:16.465: INFO: execpod  worker-1  Running  1s     [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:11 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:14 -0400 EDT  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:14 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:11 -0400 EDT  }]
Jun 10 04:54:16.465: INFO: 
Jun 10 04:54:16.470: INFO: skipping dumping cluster info - cluster too large
[AfterEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/util/client.go:141
STEP: Destroying namespace "e2e-test-prometheus-wbx66" for this suite.
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Jun 10 04:54:16.448: Unexpected alerts fired or pending after the test run:

alert CannotRetrieveUpdates fired for 930 seconds with labels: {endpoint="metrics", instance="193.168.200.123:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-5b64488d4d-7r5xb", service="cluster-version-operator", severity="warning"}
alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="1d", severity="warning", short="2h"}
alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="3d", severity="warning", short="6h"}

```

Expected results:

Pass

Comment 2 Andy McCrae 2021-06-10 16:57:36 UTC
This looks like it may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1953798 - the PRs for that merged a few hours ago, but there have not yet been any new nightly builds that include the fixes.
For now, I'm going to spin up a cluster with the latest nightly to confirm the issue, and then compare again tomorrow once the fixes are included in a nightly build.

Comment 3 Tania Kapoor 2021-06-17 06:02:24 UTC
@amccrae I tried to deploy a cluster on the Power platform (ppc64le arch) with the 4.8.0-0.nightly-ppc64le-2021-06-13-101555 and 4.8.0-rc.0 builds, but the issue still reproduces: I'm still seeing the KubeAPIErrorBudgetBurn alert fire. Can you please let me know which build will include the fix?

Thanks

Comment 4 Andy McCrae 2021-06-17 10:13:36 UTC
Hi Tania,

There isn't a fix at this point - the issue seems to only be impacting Power CI deploys. This alert indicates either that too many requests are resulting in errors, or that too many requests are slow (or both). In the test instances I have set up there are minimal errors, so it looks to be related to slow requests.

Can you provide details on what the infrastructure you're running on looks like? Additionally, it would be useful to keep a cluster up and check the console dashboard, which provides additional information on the long-running requests. For example, on the test instances I have set up, the vast majority of long-running requests are configmap operations, which suggests etcd slowness (configmaps are stored in etcd); this could be due to slow disks or some other resource issue within the cluster. A sketch of queries that can help confirm this is below.
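
As a rough starting point, a sketch of latency queries that can surface slow kube-apiserver requests and etcd disk latency; the route lookup, token command, and metric selectors are assumptions and may need adjusting for your environment.

```
# Sketch: p99 kube-apiserver request latency and etcd WAL fsync latency via thanos-querier.
TOKEN=$(oc whoami -t)
HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')

# p99 latency per verb for non-watch kube-apiserver requests over the last 5 minutes
curl -skG -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb!="WATCH"}[5m])) by (le, verb))' \
  "https://${HOST}/api/v1/query"

# p99 etcd WAL fsync latency; sustained values well above ~10ms usually point to slow disks
curl -skG -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))' \
  "https://${HOST}/api/v1/query"
```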

This doesn't necessarily indicate a bug though, since it could just be that your etcd performance is not sufficient.

Andy

Comment 5 Tania Kapoor 2021-06-17 11:56:32 UTC
amccrae As per Ivan Sim's comment in https://bugzilla.redhat.com/show_bug.cgi?id=1953798#c18, I followed the steps to debug this further; I hope the following results are useful.


-----------------------------------------------------------------------------------------------------------------
Labels
alertname=KubeAPIErrorBudgetBurn long=3d severity=warning short=6h


### Restarting the Kubelet 

[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-0 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-0 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-1 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-0 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-0 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-0 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-1 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-1 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-2 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-2 sudo systemctl is-active kubelet.service
active


### Cluster debugging tool

[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# ls
etcd  etcd.audit_logs_listing  kube-apiserver  kube-apiserver.audit_logs_listing  oauth-apiserver  oauth-apiserver.audit_logs_listing  openshift-apiserver  openshift-apiserver.audit_logs_listing

[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f oauth-apiserver --by resource --user=default --failed-only -otop

[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f openshift-apiserver --by resource --user=default --failed-only -otop

[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f etcd --by resource --user=default --failed-only -otop

[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f kube-apiserver --by resource --user=default --failed-only -otop
had 30398 line read failures

[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f kube-apiserver -otop --by=user resource="apirequestcounts"
had 30398 line read failures
count: 1420621, first: 2021-06-16T03:26:42-04:00, last: 2021-06-16T10:46:21-04:00, duration: 7h19m39.495358s
171830x              system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator
139228x              system:apiserver
108042x              system:serviceaccount:openshift-kube-scheduler-operator:openshift-kube-scheduler-operator
103602x              system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator
98150x               system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator
92470x               system:serviceaccount:openshift-cluster-version:default
65950x               system:serviceaccount:openshift-monitoring:cluster-monitoring-operator
52747x               system:kube-scheduler
49130x               system:kube-controller-manager
43280x               system:serviceaccount:openshift-apiserver:openshift-apiserver-sa
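
If kubectl-dev_tool is unavailable, a rough equivalent of the failed-request summary can be pulled straight from the audit JSON with jq; this is only a sketch, and the >=500 status filter and file globs are assumptions.

```
# Sketch: count server-side failures (HTTP status >= 500) per user from the kube-apiserver audit logs.
cd kube-apiserver
{ zcat master-*-audit*.log.gz 2>/dev/null; cat master-*-audit*.log 2>/dev/null; } \
  | jq -rR 'fromjson? | select(.responseStatus.code >= 500) | .user.username' \
  | sort | uniq -c | sort -rn | head -20
```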


[root@rdr-tanikubapi-mon01-bastion-0 kube-apiserver]# ls
master-0-audit-2021-06-16T08-17-42.550.log     master-0-audit.log.gz                          master-2-audit-2021-06-16T12-01-01.047.log.gz  master-2-audit-2021-06-16T14-00-53.152.log.gz
master-0-audit-2021-06-16T09-23-21.773.log.gz  master-0-termination.log.gz                    master-2-audit-2021-06-16T12-21-25.117.log.gz  master-2-audit-2021-06-16T14-21-16.959.log.gz
master-0-audit-2021-06-16T10-29-49.402.log.gz  master-1-audit-2021-06-16T10-39-20.886.log.gz  master-2-audit-2021-06-16T12-41-54.859.log.gz  master-2-audit-2021-06-16T14-41-48.861.log.gz
master-0-audit-2021-06-16T11-36-30.432.log.gz  master-1-audit.log.gz                          master-2-audit-2021-06-16T13-00-51.462.log.gz  master-2-audit.log.gz
master-0-audit-2021-06-16T12-42-34.180.log.gz  master-1-termination.log.gz                    master-2-audit-2021-06-16T13-20-16.718.log.gz  master-2-termination.log.gz
master-0-audit-2021-06-16T13-42-30.810.log.gz  master-2-audit-2021-06-16T11-40-47.376.log.gz  master-2-audit-2021-06-16T13-40-38.835.log.gz

Comment 6 Tania Kapoor 2021-06-17 13:46:41 UTC
Piyush from the dev team did further analysis, and here are the findings.

There is an in-progress solution for a related issue which impacts all releases from OCP 4.6 onward (https://access.redhat.com/solutions/5931541)

Details -
- kube-apiserver pods are generating unnecessary log lines containing: controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing" (see the sketch after this list)
- These lines are treated as errors, which quickly consumes the total error budget and causes a warning-severity alert to fire.
- These errors are not harmful and can be ignored (as per the in-progress Red Hat solution).
- The warning is based on a pair of windows, long=3d and short=6h, which means OCP is warning us that at the current rate the error budget may be exhausted within the next 30 days.
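
A quick way to sanity-check that these benign gRPC messages are present on the kube-apiserver pods; this is a sketch, and the label selector and 6h window are assumptions.

```
# Sketch: count "transport is closing" lines per kube-apiserver pod over the last 6 hours.
for pod in $(oc -n openshift-kube-apiserver get pods -l app=openshift-kube-apiserver -o name); do
  echo "== ${pod}"
  oc -n openshift-kube-apiserver logs "${pod}" -c kube-apiserver --since=6h 2>/dev/null \
    | grep -c 'loopyWriter.run returning' || true
done
```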


amccrae Could you please check and confirm whether this is something that can safely be ignored? Thanks.

Comment 7 Mark Hamzy 2021-06-22 14:03:48 UTC
Just FYI, I also tried it on a bare-metal cluster on rc.0 and didn't see a KubeAPIErrorBudgetBurn alert.

(export KUBECONFIG=/root/ocp4-workdir/auth/kubeconfig; /bin/rm -rf /tmp/e2e.log /tmp/fixture-testdata-dir* /tmp/junit/ /tmp/tmp.*; /home/test/origin/openshift-tests run-test "[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel]")

[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1450
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/framework.go:1450
[BeforeEach] [Top Level]
  github.com/openshift/origin/test/extended/util/test.go:59
[BeforeEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/util/client.go:142
STEP: Creating a kubernetes client
[BeforeEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/prometheus/prometheus.go:50
[It] shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel]
  github.com/openshift/origin/test/extended/prometheus/prometheus.go:58
Jun 22 09:58:10.287: INFO: Creating namespace "e2e-test-prometheus-6vkg9"
Jun 22 09:58:10.560: INFO: Waiting for ServiceAccount "default" to be provisioned...
Jun 22 09:58:10.669: INFO: Creating new exec pod
Jun 22 09:58:16.726: INFO: Running '/usr/local/bin/kubectl --server=https://api.ocp-ppc64le-test-080078.aus.stglabs.ibm.com:6443 --kubeconfig=/root/ocp4-workdir/auth/kubeconfig --namespace=e2e-test-prometheus-6vkg9 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1"'
Jun 22 09:58:17.379: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1'\n"
Jun 22 09:58:17.379: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n"
Jun 22 09:58:17.379: INFO: Running '/usr/local/bin/kubectl --server=https://api.ocp-ppc64le-test-080078.aus.stglabs.ibm.com:6443 --kubeconfig=/root/ocp4-workdir/auth/kubeconfig --namespace=e2e-test-prometheus-6vkg9 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A"'
Jun 22 09:58:17.876: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A'\n"
Jun 22 09:58:17.876: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"alertname\":\"SystemMemoryExceedsReservation\",\"alertstate\":\"firing\",\"node\":\"master-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"SystemMemoryExceedsReservation\",\"alertstate\":\"firing\",\"node\":\"master-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"SystemMemoryExceedsReservation\",\"alertstate\":\"firing\",\"node\":\"infnod-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"CannotRetrieveUpdates\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"ingress-canary\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-ingress-canary\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"ClusterNotUpgradeable\",\"alertstate\":\"firing\",\"condition\":\"Upgradeable\",\"endpoint\":\"metrics\",\"name\":\"version\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"162\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDown\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"machine-config\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"critical\",\"version\":\"4.8.0-rc.0\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"machine-config\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"MachineConfigDaemonFailed\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"network\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"RolloutHung\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condit
ion\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"openshift-apiserver\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"APIServerDeployment_UnavailablePod\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"SDNPodNotReady\",\"alertstate\":\"firing\",\"condition\":\"true\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-sdn\",\"pod\":\"sdn-ftb2l\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"61\"]},{\"metric\":{\"alertname\":\"SDNPodNotReady\",\"alertstate\":\"firing\",\"condition\":\"true\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-sdn\",\"pod\":\"sdn-controller-hcgzk\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"61\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"sdn-controller\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-sdn\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubePodNotReady\",\"alertstate\":\"firing\",\"namespace\":\"openshift-authentication\",\"pod\":\"oauth-openshift-7cf58f567d-4d262\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubePodNotReady\",\"alertstate\":\"firing\",\"namespace\":\"openshift-oauth-apiserver\",\"pod\":\"apiserver-68d6564447-mxn99\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"dns-default\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-dns\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"PodDisruptionBudgetAtLimit\",\"alertstate\":\"firing\",\"namespace\":\"openshift-etcd\",\"poddisruptionbudget\":\"etcd-quorum-guard\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubePodNotReady\",\"alertstate\":\"firing\",\"namespace\":\"openshift-apiserver\",\"pod\":\"apiserver-6c879879c5-8hmgk\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"multus-admission-controller\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-multus\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-r
bac-proxy-main\",\"daemonset\":\"machine-config-server\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-machine-config-operator\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"etcdMembersDown\",\"alertstate\":\"firing\",\"job\":\"etcd\",\"namespace\":\"openshift-etcd\",\"pod\":\"etcd-master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"etcd\",\"severity\":\"critical\"},\"value\":[1624370297.835,\"57\"]},{\"metric\":{\"alertname\":\"KubeNodeNotReady\",\"alertstate\":\"firing\",\"condition\":\"Ready\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-monitoring\",\"node\":\"master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\",\"status\":\"true\"},\"value\":[1624370297.835,\"53\"]},{\"metric\":{\"alertname\":\"KubeNodeUnreachable\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"effect\":\"NoSchedule\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"key\":\"node.kubernetes.io/unreachable\",\"namespace\":\"openshift-monitoring\",\"node\":\"master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"53\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"scheduler\",\"namespace\":\"openshift-kube-scheduler\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"scheduler\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"machine-config-daemon\",\"namespace\":\"openshift-machine-config-operator\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"machine-config-daemon\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kube-controller-manager\",\"namespace\":\"openshift-kube-controller-manager\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-controller-manager\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"etcd\",\"namespace\":\"openshift-etcd\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"etcd\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"dns-default\",\"namespace\":\"openshift-dns\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"dns-default\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"controller-manager\",\"namespace\":\"openshift-controller-manager\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"controller-manager\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"multus-admission-controller\",\"namespace\":\"openshift-multus\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"multus-admission-controller\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"NTOPods
NotReady\",\"alertstate\":\"firing\",\"condition\":\"true\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-cluster-node-tuning-operator\",\"pod\":\"tuned-qfb6t\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"46\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDown\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"monitoring\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"critical\",\"version\":\"4.8.0-rc.0\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDown\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"authentication\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"critical\",\"version\":\"4.8.0-rc.0\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"kube-apiserver\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"NodeController_MasterNodesReady\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"kube-scheduler\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"NodeController_MasterNodesReady\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"monitoring\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"UpdatingnodeExporterFailed\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"authentication\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"APIServerDeployment_UnavailablePod::OAuthServerDeployment_UnavailablePod::WellKnownReadyController_SyncError\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alert
state\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"etcd\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"EtcdMembers_UnhealthyMembers::NodeController_MasterNodesReady\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterMonitoringOperatorReconciliationErrors\",\"alertstate\":\"firing\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"35\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"crio\",\"namespace\":\"kube-system\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kubelet\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"network-metrics-service\",\"namespace\":\"openshift-multus\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"network-metrics-service\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"node-exporter\",\"namespace\":\"openshift-monitoring\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"node-exporter\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"sdn\",\"namespace\":\"openshift-sdn\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"sdn\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kubelet\",\"namespace\":\"kube-system\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kubelet\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"kube-controller-manager\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"NodeController_MasterNodesReady\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"12\"]}]}}\n"
Jun 22 09:58:17.878: INFO: Running '/usr/local/bin/kubectl --server=https://api.ocp-ppc64le-test-080078.aus.stglabs.ibm.com:6443 --kubeconfig=/root/ocp4-workdir/auth/kubeconfig --namespace=e2e-test-prometheus-6vkg9 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D"'
Jun 22 09:58:18.351: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D'\n"
Jun 22 09:58:18.351: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n"
Jun 22 09:58:18.353: FAIL: Unexpected alerts fired or pending after the test run:

alert CannotRetrieveUpdates fired for 3600 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="warning"}
alert ClusterMonitoringOperatorReconciliationErrors fired for 35 seconds with labels: {severity="warning"}
alert ClusterNotUpgradeable fired for 162 seconds with labels: {condition="Upgradeable", endpoint="metrics", name="version", severity="warning"}
alert ClusterOperatorDegraded fired for 12 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-controller-manager", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="etcd", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="EtcdMembers_UnhealthyMembers::NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-apiserver", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-scheduler", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="UpdatingnodeExporterFailed", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="MachineConfigDaemonFailed", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="network", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="RolloutHung", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="openshift-apiserver", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="APIServerDeployment_UnavailablePod", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDown fired for 42 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"}
alert ClusterOperatorDown fired for 72 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"}
alert KubeDaemonSetMisScheduled fired for 3600 seconds with labels: {container="kube-rbac-proxy-main", daemonset="ingress-canary", endpoint="https-main", job="kube-state-metrics", namespace="openshift-ingress-canary", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="dns-default", endpoint="https-main", job="kube-state-metrics", namespace="openshift-dns", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="machine-config-server", endpoint="https-main", job="kube-state-metrics", namespace="openshift-machine-config-operator", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="multus-admission-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-multus", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="sdn-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", service="kube-state-metrics", severity="warning"}
alert KubeNodeNotReady fired for 53 seconds with labels: {condition="Ready", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning", status="true"}
alert KubeNodeUnreachable fired for 53 seconds with labels: {container="kube-rbac-proxy-main", effect="NoSchedule", endpoint="https-main", job="kube-state-metrics", key="node.kubernetes.io/unreachable", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning"}
alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-apiserver", pod="apiserver-6c879879c5-8hmgk", severity="warning"}
alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-authentication", pod="oauth-openshift-7cf58f567d-4d262", severity="warning"}
alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-oauth-apiserver", pod="apiserver-68d6564447-mxn99", severity="warning"}
alert NTOPodsNotReady fired for 46 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-cluster-node-tuning-operator", pod="tuned-qfb6t", service="kube-state-metrics", severity="warning"}
alert PodDisruptionBudgetAtLimit fired for 59 seconds with labels: {namespace="openshift-etcd", poddisruptionbudget="etcd-quorum-guard", severity="warning"}
alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-controller-hcgzk", service="kube-state-metrics", severity="warning"}
alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-ftb2l", service="kube-state-metrics", severity="warning"}
alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="infnod-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"}
alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"}
alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="crio", namespace="kube-system", service="kubelet", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="kubelet", namespace="kube-system", service="kubelet", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="network-metrics-service", namespace="openshift-multus", service="network-metrics-service", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="node-exporter", namespace="openshift-monitoring", service="node-exporter", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="sdn", namespace="openshift-sdn", service="sdn", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="controller-manager", namespace="openshift-controller-manager", service="controller-manager", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="dns-default", namespace="openshift-dns", service="dns-default", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="etcd", namespace="openshift-etcd", service="etcd", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="kube-controller-manager", namespace="openshift-kube-controller-manager", service="kube-controller-manager", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="machine-config-daemon", namespace="openshift-machine-config-operator", service="machine-config-daemon", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="multus-admission-controller", namespace="openshift-multus", service="multus-admission-controller", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="scheduler", namespace="openshift-kube-scheduler", service="scheduler", severity="warning"}
alert etcdMembersDown fired for 57 seconds with labels: {job="etcd", namespace="openshift-etcd", pod="etcd-master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="etcd", severity="critical"}

Full Stack Trace
github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0020fefc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113 +0xb8
github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0020fefc0, 0xc0027c06c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:64 +0x180
github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc001f29820, 0x1213aee48, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/it_node.go:26 +0x98
github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0033c12c0, 0x0, 0x1213aee48, 0xc0004c1f80)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:215 +0x22c
github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0033c12c0, 0x1213aee48, 0xc0004c1f80)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:138 +0x110
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0023f2140, 0xc0033c12c0, 0x0)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:200 +0x100
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0023f2140, 0x1)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:170 +0x148
github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0023f2140, 0xc001357708)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:66 +0x118
github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0004899a0, 0x1213af108, 0xc003021b30, 0x0, 0x0, 0xc0023dc3b0, 0x1, 0x1, 0x121480ad8, 0xc0004c1f80, ...)
	github.com/onsi/ginkgo.0-origin.0+incompatible/internal/suite/suite.go:62 +0x378
github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc0006b9b30, 0xc001261790, 0x1, 0x1, 0x123b17f00, 0x11d5b72b0)
	github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:61 +0x390
main.newRunTestCommand.func1.1()
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x60
github.com/openshift/origin/test/extended/util.WithCleanup(0xc0023bfbb0)
	github.com/openshift/origin/test/extended/util/test.go:167 +0x80
main.newRunTestCommand.func1(0xc00291eb00, 0xc001261790, 0x1, 0x1, 0x0, 0x0)
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x2d4
github.com/spf13/cobra.(*Command).execute(0xc00291eb00, 0xc0012616b0, 0x1, 0x1, 0xc00291eb00, 0xc0012616b0)
	github.com/spf13/cobra.1/command.go:850 +0x3d0
github.com/spf13/cobra.(*Command).ExecuteC(0xc00291e000, 0x0, 0x1213b66c8, 0x123f31978)
	github.com/spf13/cobra.1/command.go:958 +0x2b4
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra.1/command.go:895
main.main.func1(0xc00291e000, 0x0, 0x0)
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:84 +0xa0
main.main()
	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:85 +0x3b4
[AfterEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/util/client.go:140
STEP: Collecting events from namespace "e2e-test-prometheus-6vkg9".
STEP: Found 6 events.
Jun 22 09:58:18.373: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for execpod: { } Scheduled: Successfully assigned e2e-test-prometheus-6vkg9/execpod to infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com
Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:10 -0400 EDT - event for e2e-test-prometheus-6vkg9: {namespace-security-allocation-controller } CreatedSCCRanges: created SCC ranges
Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:13 -0400 EDT - event for execpod: {multus } AddedInterface: Add eth0 [10.128.3.3/23] from openshift-sdn
Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:13 -0400 EDT - event for execpod: {kubelet infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com} Pulled: Container image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest" already present on machine
Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:14 -0400 EDT - event for execpod: {kubelet infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com} Created: Created container agnhost-container
Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:14 -0400 EDT - event for execpod: {kubelet infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com} Started: Started container agnhost-container
Jun 22 09:58:18.377: INFO: POD      NODE                                                  PHASE    GRACE  CONDITIONS
Jun 22 09:58:18.377: INFO: execpod  infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com  Running  1s     [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:10 -0400 EDT  } {Ready True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:14 -0400 EDT  } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:14 -0400 EDT  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:10 -0400 EDT  }]
Jun 22 09:58:18.377: INFO: 
Jun 22 09:58:18.384: INFO: skipping dumping cluster info - cluster too large
[AfterEach] [sig-instrumentation][Late] Alerts
  github.com/openshift/origin/test/extended/util/client.go:141
STEP: Destroying namespace "e2e-test-prometheus-6vkg9" for this suite.
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Jun 22 09:58:18.353: Unexpected alerts fired or pending after the test run:

alert CannotRetrieveUpdates fired for 3600 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="warning"}
alert ClusterMonitoringOperatorReconciliationErrors fired for 35 seconds with labels: {severity="warning"}
alert ClusterNotUpgradeable fired for 162 seconds with labels: {condition="Upgradeable", endpoint="metrics", name="version", severity="warning"}
alert ClusterOperatorDegraded fired for 12 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-controller-manager", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="etcd", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="EtcdMembers_UnhealthyMembers::NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-apiserver", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-scheduler", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="UpdatingnodeExporterFailed", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="MachineConfigDaemonFailed", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="network", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="RolloutHung", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="openshift-apiserver", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="APIServerDeployment_UnavailablePod", service="cluster-version-operator", severity="warning"}
alert ClusterOperatorDown fired for 42 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"}
alert ClusterOperatorDown fired for 72 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"}
alert KubeDaemonSetMisScheduled fired for 3600 seconds with labels: {container="kube-rbac-proxy-main", daemonset="ingress-canary", endpoint="https-main", job="kube-state-metrics", namespace="openshift-ingress-canary", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="dns-default", endpoint="https-main", job="kube-state-metrics", namespace="openshift-dns", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="machine-config-server", endpoint="https-main", job="kube-state-metrics", namespace="openshift-machine-config-operator", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="multus-admission-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-multus", service="kube-state-metrics", severity="warning"}
alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="sdn-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", service="kube-state-metrics", severity="warning"}
alert KubeNodeNotReady fired for 53 seconds with labels: {condition="Ready", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning", status="true"}
alert KubeNodeUnreachable fired for 53 seconds with labels: {container="kube-rbac-proxy-main", effect="NoSchedule", endpoint="https-main", job="kube-state-metrics", key="node.kubernetes.io/unreachable", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning"}
alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-apiserver", pod="apiserver-6c879879c5-8hmgk", severity="warning"}
alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-authentication", pod="oauth-openshift-7cf58f567d-4d262", severity="warning"}
alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-oauth-apiserver", pod="apiserver-68d6564447-mxn99", severity="warning"}
alert NTOPodsNotReady fired for 46 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-cluster-node-tuning-operator", pod="tuned-qfb6t", service="kube-state-metrics", severity="warning"}
alert PodDisruptionBudgetAtLimit fired for 59 seconds with labels: {namespace="openshift-etcd", poddisruptionbudget="etcd-quorum-guard", severity="warning"}
alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-controller-hcgzk", service="kube-state-metrics", severity="warning"}
alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-ftb2l", service="kube-state-metrics", severity="warning"}
alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="infnod-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"}
alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"}
alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="crio", namespace="kube-system", service="kubelet", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="kubelet", namespace="kube-system", service="kubelet", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="network-metrics-service", namespace="openshift-multus", service="network-metrics-service", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="node-exporter", namespace="openshift-monitoring", service="node-exporter", severity="warning"}
alert TargetDown fired for 18 seconds with labels: {job="sdn", namespace="openshift-sdn", service="sdn", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="controller-manager", namespace="openshift-controller-manager", service="controller-manager", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="dns-default", namespace="openshift-dns", service="dns-default", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="etcd", namespace="openshift-etcd", service="etcd", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="kube-controller-manager", namespace="openshift-kube-controller-manager", service="kube-controller-manager", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="machine-config-daemon", namespace="openshift-machine-config-operator", service="machine-config-daemon", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="multus-admission-controller", namespace="openshift-multus", service="multus-admission-controller", severity="warning"}
alert TargetDown fired for 48 seconds with labels: {job="scheduler", namespace="openshift-kube-scheduler", service="scheduler", severity="warning"}
alert etcdMembersDown fired for 57 seconds with labels: {job="etcd", namespace="openshift-etcd", pod="etcd-master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="etcd", severity="critical"}

Comment 8 Andy McCrae 2021-06-24 14:25:52 UTC
Hi Tania,

The issue you linked is still in flight (it should land in Kubernetes 1.22). That said, those messages are informational: while they do generate noise in the logs, they shouldn't be the cause of this issue, because they don't count towards the error budget.

KubeAPIErrorBudgetBurn is driven by two things: errors and/or slow queries. In the console, under "Monitoring --> Dashboards", the API Performance dashboard gives a rundown of request statuses (confirming there are not many errors, at least in the instances I have checked) and of how many "long running" queries there are.
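If the console is awkward to reach from the bastion, the same errors-vs-latency split can be eyeballed with a couple of ad-hoc queries against thanos-querier. This is only a rough sketch, assuming cluster-admin access from the bastion; the metric names are the standard kube-apiserver ones, not the exact recording rules behind the alert:

```
# Error side: fraction of kube-apiserver requests returning 5xx over the last hour.
TOKEN=$(oc -n openshift-monitoring sa get-token prometheus-k8s)
HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')

curl -sk -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=sum(rate(apiserver_request_total{code=~"5.."}[1h])) / sum(rate(apiserver_request_total[1h]))' \
  "https://${HOST}/api/v1/query"

# Latency side: share of GET/LIST requests slower than 1s over the last hour.
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=1 - sum(rate(apiserver_request_duration_seconds_bucket{verb=~"GET|LIST",le="1"}[1h])) / sum(rate(apiserver_request_duration_seconds_count{verb=~"GET|LIST"}[1h]))' \
  "https://${HOST}/api/v1/query"
```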

Since there appear to be no outright errors, this is being caused by slow queries. Mark confirmed that the bare-metal deploys didn't see this alert, and we haven't seen it on other architectures (s390x/x86_64), so it looks specific to the performance of the test/CI infrastructure. I'd suspect disk speed as a starting point (the bulk of the slow queries I saw were for ConfigMaps, which go through etcd, and etcd is disk intensive), and it may be worth checking that against etcd.
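One rough way to test that theory is to approximate etcd's fsync-heavy write pattern directly on a master's etcd disk. This is just a sketch and assumes fio is available on (or mountable into) the node, which it is not by default on RHCOS; the path and sizes are illustrative, and upstream etcd guidance is a p99 fdatasync latency of roughly 10ms or better:

```
# Small sequential writes with an fdatasync after each, similar to etcd's WAL.
# Check the fdatasync latency percentiles reported in the fio output.
fio --name=etcd-fsync-check \
    --directory=/var/lib/etcd \
    --rw=write --ioengine=sync --fdatasync=1 \
    --bs=2300 --size=22m
```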

Comment 9 Manoj Kumar 2021-06-24 19:54:54 UTC
@tania: Can you leave the cluster in this state and give me access?

Comment 10 Manoj Kumar 2021-06-25 15:28:21 UTC
Here is the etcdctl output from @tkapoor's cluster.

[root@rdr-tanumig-mon01-bastion-0 ~]# oc rsh  etcd-rdr-tanumig-mon01-master-0 
Defaulted container "etcdctl" out of: etcdctl, etcd, etcd-metrics, etcd-health-monitor, setup (init), etcd-ensure-env-vars (init), etcd-resources-copy (init)
sh-4.4# etcdctl check perf --load="s"
 60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.042616s
PASS: Stddev is 0.002902s
PASS
sh-4.4# 
sh-4.4# set -o vi
sh-4.4# etcdctl check perf --load="m"
 60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 915 writes/s
PASS: Slowest request took 0.205755s
PASS: Stddev is 0.015610s
PASS
sh-4.4# etcdctl check perf --load="l"
 60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
FAIL: Throughput too low: 3115 writes/s
Slowest request took too long: 0.534104s
PASS: Stddev is 0.043344s
FAIL


And here is the output from the bare-metal cluster from @mhamzy 

sh-4.4# etcdctl check perf --load="s"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 151 writes/s
PASS: Slowest request took 0.033216s
PASS: Stddev is 0.001029s
PASS
sh-4.4# 
sh-4.4# set -o vi
sh-4.4# etcdctl check perf --load="m"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 999 writes/s
PASS: Slowest request took 0.027295s
PASS: Stddev is 0.001508s
PASS
sh-4.4# etcdctl check perf --load="l"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 7595 writes/s
PASS: Slowest request took 0.234594s
PASS: Stddev is 0.006870s
PASS


Throughput and latency for the 'l' load are outside the acceptable limits (throughput too low, slowest request too slow).
@tkapoor, can you try changing etcd to use a ramdisk?
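For reference, one way to run that experiment -- a rough sketch only, and strictly for a disposable test cluster, since a tmpfs loses the etcd data on every reboot -- is a MachineConfig that mounts tmpfs over /var/lib/etcd on the masters (the object name, size, and Ignition version below are assumptions for illustration):

```
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-master-etcd-on-tmpfs
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: var-lib-etcd.mount
          enabled: true
          contents: |
            [Unit]
            Description=etcd data directory on tmpfs (test clusters only)
            Before=local-fs.target
            [Mount]
            What=tmpfs
            Where=/var/lib/etcd
            Type=tmpfs
            Options=size=8G
            [Install]
            WantedBy=local-fs.target
EOF
```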

Comment 11 Mark Hamzy 2021-06-28 12:30:14 UTC
I deployed a cluster in the ppc64le CI environment:

[root@C155F2U33 ~]# (HOSTNAME=$(oc get routes/alertmanager-main -n openshift-monitoring -o json | jq -r '.spec.host'); TOKEN=$(oc -n openshift-monitoring sa get-token prometheus-k8s); curl --silent --insecure --header "Authorization: Bearer ${TOKEN}" https://${HOSTNAME}/api/v1/alerts | jq '.data[] | select(.labels.alertname=="KubeAPIErrorBudgetBurn")')
...
{
  "labels": {
    "alertname": "KubeAPIErrorBudgetBurn",
    "long": "3d",
    "prometheus": "openshift-monitoring/k8s",
    "severity": "warning",
    "short": "6h"
  },
...


sh-4.4# etcdctl check perf --load="s"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 149 writes/s
PASS: Slowest request took 0.154165s
PASS: Stddev is 0.005295s
PASS
sh-4.4# etcdctl check perf --load="m"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
FAIL: Throughput too low: 864 writes/s
PASS: Slowest request took 0.141168s
PASS: Stddev is 0.008066s
FAIL
sh-4.4# etcdctl check perf --load="l"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
FAIL: Throughput too low: 2476 writes/s
PASS: Slowest request took 0.335178s
PASS: Stddev is 0.039842s
FAIL

Comment 12 Andy McCrae 2021-08-02 12:32:29 UTC
Piyush is still looking into this alert, so I'm assigning it over to him. My understanding is that we think this is related to I/O throttling in the environment.
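A quick way to sanity-check that from the monitoring side -- a sketch, reusing the TOKEN/HOST setup from the earlier thanos-querier example and the standard node-exporter metrics -- is to look at how busy the masters' disks were during the run:

```
# Fraction of time each disk was busy over the last hour (1.0 == fully saturated).
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=rate(node_disk_io_time_seconds_total{device!~"dm-.*"}[1h])' \
  "https://${HOST}/api/v1/query"

# Average time a write spent queued plus on the device.
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode 'query=rate(node_disk_write_time_seconds_total[1h]) / rate(node_disk_writes_completed_total[1h])' \
  "https://${HOST}/api/v1/query"
```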

Comment 14 Michal Fojtik 2022-08-18 14:33:56 UTC
Dear reporter, 

As part of the migration of all OpenShift bugs to Red Hat Jira, we are evaluating all bugs; stale issues and those without high or urgent priority will be closed. If you believe this bug still requires engineering resolution, we kindly ask you to follow the link below [1] and continue working with us in Jira by recreating the issue and providing the necessary information. Please also include a link to the original Bugzilla bug in the description.

To create an issue, follow this link:
[1] https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&priority=10300&components=12367637

Comment 15 Red Hat Bugzilla 2023-09-15 01:34:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.

