Description of problem: Running e2e on Power cluster, results KubeAPIErrorBudgetBurn Alert issue causing ``` [sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel] ``` e2e test to fail. Version-Release number of selected component (if applicable): e2e Logs ``` [root@rdr-sdntest-mon01-bastion-0 origin]# ./openshift-tests run-test "[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel]" warning: KUBE_TEST_REPO_LIST may not be set when using openshift-tests and will be ignored [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1450 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1450 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/test.go:59 [BeforeEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:142 STEP: Creating a kubernetes client [BeforeEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/prometheus/prometheus.go:50 [It] shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/prometheus/prometheus.go:58 Jun 10 04:54:10.609: INFO: Creating namespace "e2e-test-prometheus-wbx66" Jun 10 04:54:10.876: INFO: Waiting for ServiceAccount "default" to be provisioned... Jun 10 04:54:10.990: INFO: Creating new exec pod Jun 10 04:54:15.053: INFO: Running '/usr/local/bin/kubectl --server=https://api.rdr-sdntest.redhat.com:6443 --kubeconfig=/root/openstack-upi/auth/kubeconfig --namespace=e2e-test-prometheus-wbx66 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1"' Jun 10 04:54:15.664: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer 
eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1'\n" Jun 10 04:54:15.664: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n" Jun 10 04:54:15.665: INFO: Running '/usr/local/bin/kubectl --server=https://api.rdr-sdntest.redhat.com:6443 --kubeconfig=/root/openstack-upi/auth/kubeconfig --namespace=e2e-test-prometheus-wbx66 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A"' Jun 10 04:54:16.049: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer 
eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A'\n" Jun 10 04:54:16.049: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"alertname\":\"KubeAPIErrorBudgetBurn\",\"alertstate\":\"firing\",\"long\":\"1d\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\",\"short\":\"2h\"},\"value\":[1623315256.038,\"3600\"]},{\"metric\":{\"alertname\":\"KubeAPIErrorBudgetBurn\",\"alertstate\":\"firing\",\"long\":\"3d\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\",\"short\":\"6h\"},\"value\":[1623315256.038,\"3600\"]},{\"metric\":{\"alertname\":\"CannotRetrieveUpdates\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"193.168.200.123:9099\",\"job\":\"cluster-version-operator\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-5b64488d4d-7r5xb\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1623315256.038,\"930\"]}]}}\n" Jun 10 04:54:16.049: INFO: Running '/usr/local/bin/kubectl --server=https://api.rdr-sdntest.redhat.com:6443 --kubeconfig=/root/openstack-upi/auth/kubeconfig --namespace=e2e-test-prometheus-wbx66 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' 
"https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D"' Jun 10 04:54:16.447: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6ImFDaTNLd1pZU01FdGpsTlo4Y1hSbGxJVTcwVmFxYTQ0RS1IdUE1T04xMGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4ta2g4d2ciLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZDUzYjczN2MtMjQ2NC00YzQ2LTgxZDAtYzIxNjcwYjVkN2ViIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.kIx2TzjT9SeGa1uPTeO50Wk2GsJHVr01GMvCrY8edZf6RqsWveUCpK8fDJHsP6-udP04M6l28rh2MZA9BhO-dH7Hg59QmIFvbNbnt5Pj4mbGBrXMsm3UzyJHN6XvGZ7i0F-Dx8nBueSO81bXWFNINB_aBMl8zA-MbkJn9CHt2ea1ajaJ-pSvl5LEZbIAvwqkIQ3m3vj59OaTWHNsHU5ep2qlPrTs_O4C_sgbOTWY_pUIYx22GActyiOkpDtnAmIhp0uF7RMeTOaoid8VPPHrtSR7rZsr4AJVfXwGGDDXgY4YeXxCaTnU0cqIk_dIy1CY_5NsfYfXUnq5jhvqURaSkw' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D'\n" Jun 10 04:54:16.447: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n" Jun 10 04:54:16.447: FAIL: Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 930 seconds with labels: {endpoint="metrics", instance="193.168.200.123:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-5b64488d4d-7r5xb", service="cluster-version-operator", severity="warning"} alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="1d", severity="warning", short="2h"} alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="3d", severity="warning", short="6h"} Full Stack Trace github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0011eed80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113 +0xb8 github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0011eed80, 0xc000f7ca20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:64 +0x180 github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc00167ce40, 0x141852bd8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) 
github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/it_node.go:26 +0x98 github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc001f5dd10, 0x0, 0x141852bd8, 0xc00046a040) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:215 +0x22c github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc001f5dd10, 0x141852bd8, 0xc00046a040) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:138 +0x110 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc001ef5b80, 0xc001f5dd10, 0x0) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:200 +0x100 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc001ef5b80, 0x1) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:170 +0x148 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc001ef5b80, 0xc002aad2e8) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:66 +0x118 github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0002fbef0, 0x141852e98, 0xc001cbd400, 0x0, 0x0, 0xc0003e83b0, 0x1, 0x1, 0x141929658, 0xc00046a040, ...) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/suite/suite.go:62 +0x378 github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc001d76960, 0xc000ac4c00, 0x1, 0x1, 0x1440a7f00, 0x13d8efb70) github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:61 +0x390 main.newRunTestCommand.func1.1() github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x60 github.com/openshift/origin/test/extended/util.WithCleanup(0xc00193fbb0) github.com/openshift/origin/test/extended/util/test.go:167 +0x80 main.newRunTestCommand.func1(0xc0016d5b80, 0xc000ac4c00, 0x1, 0x1, 0x0, 0x0) github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x2d4 github.com/spf13/cobra.(*Command).execute(0xc0016d5b80, 0xc000ac4b30, 0x1, 0x1, 0xc0016d5b80, 0xc000ac4b30) github.com/spf13/cobra.1/command.go:850 +0x3d0 github.com/spf13/cobra.(*Command).ExecuteC(0xc0016d5080, 0x0, 0x14185a778, 0x1444c40e8) github.com/spf13/cobra.1/command.go:958 +0x2b4 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra.1/command.go:895 main.main.func1(0xc0016d5080, 0x0, 0x0) github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:84 +0xa0 main.main() github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:85 +0x3b4 [AfterEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:140 STEP: Collecting events from namespace "e2e-test-prometheus-wbx66". STEP: Found 6 events. 
Jun 10 04:54:16.463: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for execpod: { } Scheduled: Successfully assigned e2e-test-prometheus-wbx66/execpod to worker-1 Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:10 -0400 EDT - event for e2e-test-prometheus-wbx66: {namespace-security-allocation-controller } CreatedSCCRanges: created SCC ranges Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:13 -0400 EDT - event for execpod: {multus } AddedInterface: Add eth0 [10.128.3.240/23] Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:13 -0400 EDT - event for execpod: {kubelet worker-1} Pulled: Container image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest" already present on machine Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:14 -0400 EDT - event for execpod: {kubelet worker-1} Created: Created container agnhost-container Jun 10 04:54:16.463: INFO: At 2021-06-10 04:54:14 -0400 EDT - event for execpod: {kubelet worker-1} Started: Started container agnhost-container Jun 10 04:54:16.465: INFO: POD NODE PHASE GRACE CONDITIONS Jun 10 04:54:16.465: INFO: execpod worker-1 Running 1s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:11 -0400 EDT } {Ready True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:14 -0400 EDT } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:14 -0400 EDT } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2021-06-10 04:54:11 -0400 EDT }] Jun 10 04:54:16.465: INFO: Jun 10 04:54:16.470: INFO: skipping dumping cluster info - cluster too large [AfterEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:141 STEP: Destroying namespace "e2e-test-prometheus-wbx66" for this suite. fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Jun 10 04:54:16.448: Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 930 seconds with labels: {endpoint="metrics", instance="193.168.200.123:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-5b64488d4d-7r5xb", service="cluster-version-operator", severity="warning"} alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="1d", severity="warning", short="2h"} alert KubeAPIErrorBudgetBurn fired for 3600 seconds with labels: {long="3d", severity="warning", short="6h"} ``` Expected results: Pass
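For reference, the same firing-alert check the test performs can be run by hand against the cluster's Thanos querier; a minimal sketch, assuming a cluster-admin `oc` session and the default openshift-monitoring route/service-account names (`jq` is optional, only used for pretty-printing):

```bash
# Obtain a bearer token and the Thanos querier route (names are the cluster
# monitoring operator defaults; `oc sa get-token` is the 4.x-era command and
# may be deprecated on newer clients).
TOKEN="$(oc -n openshift-monitoring sa get-token prometheus-k8s)"
HOST="$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')"

# Alerts that fired in the last hour, excluding the ones the e2e test allows
# (same query shape as in the log above).
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://${HOST}/api/v1/query" \
  --data-urlencode 'query=sort_desc(count_over_time(ALERTS{alertstate="firing",severity!="info",alertname!~"Watchdog|AlertmanagerReceiversNotConfigured"}[1h:1s]) > 0)' | jq .
```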
This looks like it may be related to https://bugzilla.redhat.com/show_bug.cgi?id=1953798 - the PRs for that merged a few hours ago, but there have not yet been any new nightly builds that include the fixes. For now, I'm going to spin up a cluster with the latest nightly to confirm the issue, and then compare against tomorrow's builds once the fixes land in a nightly.
@amccrae I tried deploying a cluster on the Power platform (ppc64le arch) with the 4.8.0-0.nightly-ppc64le-2021-06-13-101555 and 4.8.0-rc.0 builds, but the issue is still reproducible: the KubeAPIErrorBudgetBurn alert is still firing. Can you please let me know which build will include the fix? Thanks
Hi Tania, There isn't a fix at this point; the issue seems to be impacting only Power CI deploys. The thing about this alert is that it indicates either that too many requests are resulting in errors, or that too many requests are slow (or both). In the test instances I have set up there are minimal errors, so it looks to be related to slow requests. Can you provide details on what the infrastructure you're running on looks like? Additionally, it would be useful to keep a cluster up and look at the console dashboards, which provide additional information on the long-running requests. For example, on the test instances I have set up, the vast majority of long-running requests are configmap operations, which suggests etcd slowness, since configmaps are stored in etcd; this could be due to slow disks or some other resource issue within the cluster. This doesn't necessarily indicate a bug, though, since it could just be that your etcd performance is not suitable. Andy
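To help separate the two cases Andy describes (error burn vs. latency burn), the underlying kube-apiserver request metrics can be queried directly. A rough sketch, reusing TOKEN/HOST from the snippet after the problem description; the metric names are the standard kube-apiserver ones, but the exact recording rules and latency thresholds behind KubeAPIErrorBudgetBurn vary by release, so the 1s cut-off below is only illustrative:

```bash
# Small helper against the Thanos querier (TOKEN/HOST as set earlier).
q() { curl -sk -H "Authorization: Bearer ${TOKEN}" "https://${HOST}/api/v1/query" --data-urlencode "query=$1"; }

# Fraction of API requests answered with a 5xx over the short (6h) window.
q 'sum(rate(apiserver_request_total{code=~"5.."}[6h])) / sum(rate(apiserver_request_total[6h]))'

# Fraction of requests slower than 1s over the same window (WATCH/CONNECT are
# excluded because they are long-lived by design); 1s is an illustrative
# threshold, not necessarily the exact one the alert uses.
q '(sum(rate(apiserver_request_duration_seconds_count{verb!~"WATCH|CONNECT"}[6h])) - sum(rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT",le="1"}[6h]))) / sum(rate(apiserver_request_duration_seconds_count{verb!~"WATCH|CONNECT"}[6h]))'
```

If the first number is near zero and the second is not, the burn is coming from slow requests, which matches what the configmap-heavy dashboard view suggests.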
amccrae As per https://bugzilla.redhat.com/show_bug.cgi?id=1953798#c18 (by Ivan Sim), I followed the steps to debug this further; hopefully the following results are useful.
-----------------------------------------------------------------------------------------------------------------
Labels: alertname=KubeAPIErrorBudgetBurn long=3d severity=warning short=6h

### Restarting the Kubelet

[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-0 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-0 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-1 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@worker-0 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-0 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-0 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-1 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-1 sudo systemctl is-active kubelet.service
active
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-2 sudo systemctl restart kubelet.service
[root@rdr-tanikubapi-mon01-bastion-0 ~]# ssh core@master-2 sudo systemctl is-active kubelet.service
active

### Cluster debugging tool

[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# ls
etcd  etcd.audit_logs_listing  kube-apiserver  kube-apiserver.audit_logs_listing  oauth-apiserver  oauth-apiserver.audit_logs_listing  openshift-apiserver  openshift-apiserver.audit_logs_listing
[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f oauth-apiserver --by resource --user=default --failed-only -otop
[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f openshift-apiserver --by resource --user=default --failed-only -otop
[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f etcd --by resource --user=default --failed-only -otop
[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f kube-apiserver --by resource --user=default --failed-only -otop
had 30398 line read failures
[root@rdr-tanikubapi-mon01-bastion-0 audit_logs]# kubectl-dev_tool audit -f kube-apiserver -otop --by=user resource="apirequestcounts"
had 30398 line read failures
count: 1420621, first: 2021-06-16T03:26:42-04:00, last: 2021-06-16T10:46:21-04:00, duration: 7h19m39.495358s
 171830x system:serviceaccount:openshift-kube-storage-version-migrator-operator:kube-storage-version-migrator-operator
 139228x system:apiserver
 108042x system:serviceaccount:openshift-kube-scheduler-operator:openshift-kube-scheduler-operator
 103602x system:serviceaccount:openshift-kube-apiserver-operator:kube-apiserver-operator
 98150x system:serviceaccount:openshift-kube-controller-manager-operator:kube-controller-manager-operator
 92470x system:serviceaccount:openshift-cluster-version:default
 65950x system:serviceaccount:openshift-monitoring:cluster-monitoring-operator
 52747x system:kube-scheduler
 49130x system:kube-controller-manager
 43280x system:serviceaccount:openshift-apiserver:openshift-apiserver-sa
[root@rdr-tanikubapi-mon01-bastion-0 kube-apiserver]# ls
master-0-audit-2021-06-16T08-17-42.550.log     master-0-audit.log.gz                          master-2-audit-2021-06-16T12-01-01.047.log.gz  master-2-audit-2021-06-16T14-00-53.152.log.gz
master-0-audit-2021-06-16T09-23-21.773.log.gz  master-0-termination.log.gz                    master-2-audit-2021-06-16T12-21-25.117.log.gz  master-2-audit-2021-06-16T14-21-16.959.log.gz
master-0-audit-2021-06-16T10-29-49.402.log.gz  master-1-audit-2021-06-16T10-39-20.886.log.gz  master-2-audit-2021-06-16T12-41-54.859.log.gz  master-2-audit-2021-06-16T14-41-48.861.log.gz
master-0-audit-2021-06-16T11-36-30.432.log.gz  master-1-audit.log.gz                          master-2-audit-2021-06-16T13-00-51.462.log.gz  master-2-audit.log.gz
master-0-audit-2021-06-16T12-42-34.180.log.gz  master-1-termination.log.gz                    master-2-audit-2021-06-16T13-20-16.718.log.gz  master-2-termination.log.gz
master-0-audit-2021-06-16T13-42-30.810.log.gz  master-2-audit-2021-06-16T11-40-47.376.log.gz  master-2-audit-2021-06-16T13-40-38.835.log.gz
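For context, a sketch of how audit logs like the ones above are usually gathered, plus a couple of extra slices that can be useful; the must-gather invocation is the documented one, but the extracted directory layout and any kubectl-dev_tool flags beyond those already used above are from memory, so adjust as needed:

```bash
# Collect audit logs from all API servers (documented must-gather entrypoint).
oc adm must-gather -- /usr/bin/gather_audit_logs

# The archive contains per-apiserver directories (etcd, kube-apiserver, ...);
# decompress before feeding them to kubectl-dev_tool. Path layout may differ.
cd must-gather.local.*/*/audit_logs && gunzip -r . || true

# Top resources across all kube-apiserver events, and the failed-only view
# broken down by user instead of resource (same flags as in the session above).
kubectl-dev_tool audit -f kube-apiserver --by resource -otop
kubectl-dev_tool audit -f kube-apiserver --by user --failed-only -otop
```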
Piyush from the dev team did further analysis, and here are the findings. There is an in-progress solution for a related issue which impacts all OCP 4.6+ releases (https://access.redhat.com/solutions/5931541).

Details:
- kube-apiserver pods are generating unnecessary log lines which contain this message: controlbuf.go:508] transport: loopyWriter.run returning. connection error: desc = "transport is closing"
- This is treated as an error and results in fast consumption of the total error budget, causing the alert to fire in a Warning state.
- These errors are not harmful and can be ignored (as per the in-progress Red Hat solution).
- The warning is based on a pair of time windows, long=3d and short=6h, which means OCP is warning us that the error budget may be exhausted within the next 30 days.

amccrae Could you please check and confirm if this is something that can be safely ignored? Thanks.
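If it helps to confirm that finding, a quick sketch for counting how often that gRPC transport message shows up in the kube-apiserver logs on this cluster; the label selector is the one normally present on the static kube-apiserver pods, so fall back to a plain `oc get pods` listing if it doesn't match:

```bash
# Count occurrences of the "loopyWriter.run returning" transport message per
# kube-apiserver pod over the last 6 hours.
for pod in $(oc -n openshift-kube-apiserver get pods -l apiserver=true -o name); do
  echo "== ${pod}"
  oc -n openshift-kube-apiserver logs "${pod}" -c kube-apiserver --since=6h \
    | grep -c 'loopyWriter.run returning' || true
done
```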
Just FYI, I also tried it on a baremetal cluster on rc.0, and didn't see a budgetburn (export KUBECONFIG=/root/ocp4-workdir/auth/kubeconfig; /bin/rm -rf /tmp/e2e.log /tmp/fixture-testdata-dir* /tmp/junit/ /tmp/tmp.*; /home/test/origin/openshift-tests run-test "[sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel]") [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1450 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/framework.go:1450 [BeforeEach] [Top Level] github.com/openshift/origin/test/extended/util/test.go:59 [BeforeEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:142 STEP: Creating a kubernetes client [BeforeEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/prometheus/prometheus.go:50 [It] shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel] github.com/openshift/origin/test/extended/prometheus/prometheus.go:58 Jun 22 09:58:10.287: INFO: Creating namespace "e2e-test-prometheus-6vkg9" Jun 22 09:58:10.560: INFO: Waiting for ServiceAccount "default" to be provisioned... Jun 22 09:58:10.669: INFO: Creating new exec pod Jun 22 09:58:16.726: INFO: Running '/usr/local/bin/kubectl --server=https://api.ocp-ppc64le-test-080078.aus.stglabs.ibm.com:6443 --kubeconfig=/root/ocp4-workdir/auth/kubeconfig --namespace=e2e-test-prometheus-6vkg9 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1"' Jun 22 09:58:17.379: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer 
eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=changes%28%28max%28%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29+or+%28absent%28ALERTS%7Balertstate%3D%22firing%22%2Calertname%3D%22Watchdog%22%2Cseverity%3D%22none%22%7D%29%2A0%29%29%29%5B1h0m0s%3A1s%5D%29+%3E+1'\n" Jun 22 09:58:17.379: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n" Jun 22 09:58:17.379: INFO: Running '/usr/local/bin/kubectl --server=https://api.ocp-ppc64le-test-080078.aus.stglabs.ibm.com:6443 --kubeconfig=/root/ocp4-workdir/auth/kubeconfig --namespace=e2e-test-prometheus-6vkg9 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A"' Jun 22 09:58:17.876: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer 
eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=%0Asort_desc%28%0Acount_over_time%28ALERTS%7Balertstate%3D%22firing%22%2Cseverity%21%3D%22info%22%2Calertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%7D%5B1h0m0s%3A1s%5D%29%0A%29+%3E+0%0A'\n" Jun 22 09:58:17.876: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"alertname\":\"SystemMemoryExceedsReservation\",\"alertstate\":\"firing\",\"node\":\"master-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"SystemMemoryExceedsReservation\",\"alertstate\":\"firing\",\"node\":\"master-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"SystemMemoryExceedsReservation\",\"alertstate\":\"firing\",\"node\":\"infnod-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"CannotRetrieveUpdates\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"ingress-canary\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-ingress-canary\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"3600\"]},{\"metric\":{\"alertname\":\"ClusterNotUpgradeable\",\"alertstate\":\"firing\",\"condition\":\"Upgradeable\",\"endpoint\":\"metrics\",\"name\":\"version\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"162\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDown\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"machine-config\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"ope
nshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"critical\",\"version\":\"4.8.0-rc.0\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"machine-config\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"MachineConfigDaemonFailed\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"network\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"RolloutHung\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"openshift-apiserver\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"APIServerDeployment_UnavailablePod\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"72\"]},{\"metric\":{\"alertname\":\"SDNPodNotReady\",\"alertstate\":\"firing\",\"condition\":\"true\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-sdn\",\"pod\":\"sdn-ftb2l\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"61\"]},{\"metric\":{\"alertname\":\"SDNPodNotReady\",\"alertstate\":\"firing\",\"condition\":\"true\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-sdn\",\"pod\":\"sdn-controller-hcgzk\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"61\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"sdn-controller\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-sdn\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubePodNotReady\",\"alertstate\":\"firing\",\"namespace\":\"openshift-authentication\",\"pod\":\"oauth-openshift-7cf58f567d-4d262\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubePodNotReady\",\"alertstate\":\"firing\",\"namespace\":\"openshift-oauth-apiserver\",\"pod\":\"apiserver-68d6564447-mxn99\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"dns-default\",\
"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-dns\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"PodDisruptionBudgetAtLimit\",\"alertstate\":\"firing\",\"namespace\":\"openshift-etcd\",\"poddisruptionbudget\":\"etcd-quorum-guard\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubePodNotReady\",\"alertstate\":\"firing\",\"namespace\":\"openshift-apiserver\",\"pod\":\"apiserver-6c879879c5-8hmgk\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"multus-admission-controller\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-multus\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"KubeDaemonSetMisScheduled\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"daemonset\":\"machine-config-server\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-machine-config-operator\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"59\"]},{\"metric\":{\"alertname\":\"etcdMembersDown\",\"alertstate\":\"firing\",\"job\":\"etcd\",\"namespace\":\"openshift-etcd\",\"pod\":\"etcd-master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"etcd\",\"severity\":\"critical\"},\"value\":[1624370297.835,\"57\"]},{\"metric\":{\"alertname\":\"KubeNodeNotReady\",\"alertstate\":\"firing\",\"condition\":\"Ready\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-monitoring\",\"node\":\"master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\",\"status\":\"true\"},\"value\":[1624370297.835,\"53\"]},{\"metric\":{\"alertname\":\"KubeNodeUnreachable\",\"alertstate\":\"firing\",\"container\":\"kube-rbac-proxy-main\",\"effect\":\"NoSchedule\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"key\":\"node.kubernetes.io/unreachable\",\"namespace\":\"openshift-monitoring\",\"node\":\"master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"53\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"scheduler\",\"namespace\":\"openshift-kube-scheduler\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"scheduler\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"machine-config-daemon\",\"namespace\":\"openshift-machine-config-operator\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"machine-config-daemon\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kube-controller-manager\",\"namespace\":\"openshift-kube-controller-manager\",
\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-controller-manager\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"etcd\",\"namespace\":\"openshift-etcd\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"etcd\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"dns-default\",\"namespace\":\"openshift-dns\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"dns-default\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"controller-manager\",\"namespace\":\"openshift-controller-manager\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"controller-manager\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"multus-admission-controller\",\"namespace\":\"openshift-multus\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"multus-admission-controller\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"48\"]},{\"metric\":{\"alertname\":\"NTOPodsNotReady\",\"alertstate\":\"firing\",\"condition\":\"true\",\"container\":\"kube-rbac-proxy-main\",\"endpoint\":\"https-main\",\"job\":\"kube-state-metrics\",\"namespace\":\"openshift-cluster-node-tuning-operator\",\"pod\":\"tuned-qfb6t\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kube-state-metrics\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"46\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDown\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"monitoring\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"critical\",\"version\":\"4.8.0-rc.0\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDown\",\"alertstate\":\"firing\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"authentication\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"cluster-version-operator\",\"severity\":\"critical\",\"version\":\"4.8.0-rc.0\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"kube-apiserver\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"NodeController_MasterNodesReady\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"kube-scheduler\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"NodeController_MasterNodesReady\",\"service\":\"cluster-
version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"monitoring\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"UpdatingnodeExporterFailed\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"authentication\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"APIServerDeployment_UnavailablePod::OAuthServerDeployment_UnavailablePod::WellKnownReadyController_SyncError\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"etcd\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operator-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"EtcdMembers_UnhealthyMembers::NodeController_MasterNodesReady\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"42\"]},{\"metric\":{\"alertname\":\"ClusterMonitoringOperatorReconciliationErrors\",\"alertstate\":\"firing\",\"prometheus\":\"openshift-monitoring/k8s\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"35\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"crio\",\"namespace\":\"kube-system\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kubelet\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"network-metrics-service\",\"namespace\":\"openshift-multus\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"network-metrics-service\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"node-exporter\",\"namespace\":\"openshift-monitoring\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"node-exporter\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"sdn\",\"namespace\":\"openshift-sdn\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"sdn\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"TargetDown\",\"alertstate\":\"firing\",\"job\":\"kubelet\",\"namespace\":\"kube-system\",\"prometheus\":\"openshift-monitoring/k8s\",\"service\":\"kubelet\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"18\"]},{\"metric\":{\"alertname\":\"ClusterOperatorDegraded\",\"alertstate\":\"firing\",\"condition\":\"Degraded\",\"endpoint\":\"metrics\",\"instance\":\"192.168.79.24:9099\",\"job\":\"cluster-version-operator\",\"name\":\"kube-controller-manager\",\"namespace\":\"openshift-cluster-version\",\"pod\":\"cluster-version-operato
r-7f48df7545-pzdt7\",\"prometheus\":\"openshift-monitoring/k8s\",\"reason\":\"NodeController_MasterNodesReady\",\"service\":\"cluster-version-operator\",\"severity\":\"warning\"},\"value\":[1624370297.835,\"12\"]}]}}\n" Jun 22 09:58:17.878: INFO: Running '/usr/local/bin/kubectl --server=https://api.ocp-ppc64le-test-080078.aus.stglabs.ibm.com:6443 --kubeconfig=/root/ocp4-workdir/auth/kubeconfig --namespace=e2e-test-prometheus-6vkg9 exec execpod -- /bin/sh -x -c curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D"' Jun 22 09:58:18.351: INFO: stderr: "+ curl --retry 15 --max-time 2 --retry-delay 1 -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6Ik5XakczTkd3dGdGbml4N1U2U0JhUGh5ekhZNDg1TTdRZU5YUEtGRU9SZDQifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tNWNicngiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNTcyYjhkYTktMTMwZC00NDY1LWI0ZWEtMzcyNTNhODA0ZmY2Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.Nhmyy5Ln9X03y1N7VwdmwfZWHc0M_i-OHdPOw--xC3LPs-8bc6z9R1sDlGZnr_7qp-3oIqmFshs_oRI1LpzNRfpC53C0DLh0GVFO1P7MTiSqi06UKMm0Fj62_4u6NJ39CsQD1uF7RiGLFzyWzcw5pv_U7GviYmlFpQPixe1WvmO9_55ckjk94LqpSDZb198o4MvaoL1Yh6qOIpxG0k6_EiMdksqfBemXW6sR7RBx1jJ97CmxPNvMRSLNpnbQUPcgYXKK9L5Lkt5MeWL7vsh-o_F0oYth2jH1xzuCpoOvd6374tTwMDUhrFgfKoKcnX8pCHSVCYB9Ig8p2wK_LmmP5A' 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=ALERTS%7Balertname%21~%22Watchdog%7CAlertmanagerReceiversNotConfigured%22%2Calertstate%3D%22pending%22%2Cseverity%21%3D%22info%22%7D'\n" Jun 22 09:58:18.351: INFO: stdout: "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}\n" Jun 22 09:58:18.353: FAIL: Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 3600 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="warning"} alert 
ClusterMonitoringOperatorReconciliationErrors fired for 35 seconds with labels: {severity="warning"} alert ClusterNotUpgradeable fired for 162 seconds with labels: {condition="Upgradeable", endpoint="metrics", name="version", severity="warning"} alert ClusterOperatorDegraded fired for 12 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-controller-manager", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="etcd", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="EtcdMembers_UnhealthyMembers::NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-apiserver", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-scheduler", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="UpdatingnodeExporterFailed", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="MachineConfigDaemonFailed", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="network", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="RolloutHung", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="openshift-apiserver", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="APIServerDeployment_UnavailablePod", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDown fired for 42 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", 
service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"} alert ClusterOperatorDown fired for 72 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"} alert KubeDaemonSetMisScheduled fired for 3600 seconds with labels: {container="kube-rbac-proxy-main", daemonset="ingress-canary", endpoint="https-main", job="kube-state-metrics", namespace="openshift-ingress-canary", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="dns-default", endpoint="https-main", job="kube-state-metrics", namespace="openshift-dns", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="machine-config-server", endpoint="https-main", job="kube-state-metrics", namespace="openshift-machine-config-operator", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="multus-admission-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-multus", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="sdn-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", service="kube-state-metrics", severity="warning"} alert KubeNodeNotReady fired for 53 seconds with labels: {condition="Ready", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning", status="true"} alert KubeNodeUnreachable fired for 53 seconds with labels: {container="kube-rbac-proxy-main", effect="NoSchedule", endpoint="https-main", job="kube-state-metrics", key="node.kubernetes.io/unreachable", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning"} alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-apiserver", pod="apiserver-6c879879c5-8hmgk", severity="warning"} alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-authentication", pod="oauth-openshift-7cf58f567d-4d262", severity="warning"} alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-oauth-apiserver", pod="apiserver-68d6564447-mxn99", severity="warning"} alert NTOPodsNotReady fired for 46 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-cluster-node-tuning-operator", pod="tuned-qfb6t", service="kube-state-metrics", severity="warning"} alert PodDisruptionBudgetAtLimit fired for 59 seconds with labels: {namespace="openshift-etcd", poddisruptionbudget="etcd-quorum-guard", severity="warning"} alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-controller-hcgzk", service="kube-state-metrics", 
severity="warning"} alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-ftb2l", service="kube-state-metrics", severity="warning"} alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="infnod-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"} alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"} alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="crio", namespace="kube-system", service="kubelet", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="kubelet", namespace="kube-system", service="kubelet", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="network-metrics-service", namespace="openshift-multus", service="network-metrics-service", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="node-exporter", namespace="openshift-monitoring", service="node-exporter", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="sdn", namespace="openshift-sdn", service="sdn", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="controller-manager", namespace="openshift-controller-manager", service="controller-manager", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="dns-default", namespace="openshift-dns", service="dns-default", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="etcd", namespace="openshift-etcd", service="etcd", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="kube-controller-manager", namespace="openshift-kube-controller-manager", service="kube-controller-manager", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="machine-config-daemon", namespace="openshift-machine-config-operator", service="machine-config-daemon", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="multus-admission-controller", namespace="openshift-multus", service="multus-admission-controller", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="scheduler", namespace="openshift-kube-scheduler", service="scheduler", severity="warning"} alert etcdMembersDown fired for 57 seconds with labels: {job="etcd", namespace="openshift-etcd", pod="etcd-master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="etcd", severity="critical"} Full Stack Trace github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc0020fefc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113 +0xb8 github.com/onsi/ginkgo/internal/leafnodes.(*runner).run(0xc0020fefc0, 0xc0027c06c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:64 +0x180 github.com/onsi/ginkgo/internal/leafnodes.(*ItNode).Run(0xc001f29820, 0x1213aee48, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) 
github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/it_node.go:26 +0x98 github.com/onsi/ginkgo/internal/spec.(*Spec).runSample(0xc0033c12c0, 0x0, 0x1213aee48, 0xc0004c1f80) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:215 +0x22c github.com/onsi/ginkgo/internal/spec.(*Spec).Run(0xc0033c12c0, 0x1213aee48, 0xc0004c1f80) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/spec/spec.go:138 +0x110 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpec(0xc0023f2140, 0xc0033c12c0, 0x0) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:200 +0x100 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).runSpecs(0xc0023f2140, 0x1) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:170 +0x148 github.com/onsi/ginkgo/internal/specrunner.(*SpecRunner).Run(0xc0023f2140, 0xc001357708) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/specrunner/spec_runner.go:66 +0x118 github.com/onsi/ginkgo/internal/suite.(*Suite).Run(0xc0004899a0, 0x1213af108, 0xc003021b30, 0x0, 0x0, 0xc0023dc3b0, 0x1, 0x1, 0x121480ad8, 0xc0004c1f80, ...) github.com/onsi/ginkgo.0-origin.0+incompatible/internal/suite/suite.go:62 +0x378 github.com/openshift/origin/pkg/test/ginkgo.(*TestOptions).Run(0xc0006b9b30, 0xc001261790, 0x1, 0x1, 0x123b17f00, 0x11d5b72b0) github.com/openshift/origin/pkg/test/ginkgo/cmd_runtest.go:61 +0x390 main.newRunTestCommand.func1.1() github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x60 github.com/openshift/origin/test/extended/util.WithCleanup(0xc0023bfbb0) github.com/openshift/origin/test/extended/util/test.go:167 +0x80 main.newRunTestCommand.func1(0xc00291eb00, 0xc001261790, 0x1, 0x1, 0x0, 0x0) github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:426 +0x2d4 github.com/spf13/cobra.(*Command).execute(0xc00291eb00, 0xc0012616b0, 0x1, 0x1, 0xc00291eb00, 0xc0012616b0) github.com/spf13/cobra.1/command.go:850 +0x3d0 github.com/spf13/cobra.(*Command).ExecuteC(0xc00291e000, 0x0, 0x1213b66c8, 0x123f31978) github.com/spf13/cobra.1/command.go:958 +0x2b4 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra.1/command.go:895 main.main.func1(0xc00291e000, 0x0, 0x0) github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:84 +0xa0 main.main() github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:85 +0x3b4 [AfterEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:140 STEP: Collecting events from namespace "e2e-test-prometheus-6vkg9". STEP: Found 6 events. 
Jun 22 09:58:18.373: INFO: At 0001-01-01 00:00:00 +0000 UTC - event for execpod: { } Scheduled: Successfully assigned e2e-test-prometheus-6vkg9/execpod to infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:10 -0400 EDT - event for e2e-test-prometheus-6vkg9: {namespace-security-allocation-controller } CreatedSCCRanges: created SCC ranges Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:13 -0400 EDT - event for execpod: {multus } AddedInterface: Add eth0 [10.128.3.3/23] from openshift-sdn Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:13 -0400 EDT - event for execpod: {kubelet infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com} Pulled: Container image "image-registry.openshift-image-registry.svc:5000/openshift/tools:latest" already present on machine Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:14 -0400 EDT - event for execpod: {kubelet infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com} Created: Created container agnhost-container Jun 22 09:58:18.373: INFO: At 2021-06-22 09:58:14 -0400 EDT - event for execpod: {kubelet infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com} Started: Started container agnhost-container Jun 22 09:58:18.377: INFO: POD NODE PHASE GRACE CONDITIONS Jun 22 09:58:18.377: INFO: execpod infnod-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com Running 1s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:10 -0400 EDT } {Ready True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:14 -0400 EDT } {ContainersReady True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:14 -0400 EDT } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2021-06-22 09:58:10 -0400 EDT }] Jun 22 09:58:18.377: INFO: Jun 22 09:58:18.384: INFO: skipping dumping cluster info - cluster too large [AfterEach] [sig-instrumentation][Late] Alerts github.com/openshift/origin/test/extended/util/client.go:141 STEP: Destroying namespace "e2e-test-prometheus-6vkg9" for this suite. 
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Jun 22 09:58:18.353: Unexpected alerts fired or pending after the test run: alert CannotRetrieveUpdates fired for 3600 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="warning"} alert ClusterMonitoringOperatorReconciliationErrors fired for 35 seconds with labels: {severity="warning"} alert ClusterNotUpgradeable fired for 162 seconds with labels: {condition="Upgradeable", endpoint="metrics", name="version", severity="warning"} alert ClusterOperatorDegraded fired for 12 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-controller-manager", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="etcd", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="EtcdMembers_UnhealthyMembers::NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-apiserver", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="kube-scheduler", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="NodeController_MasterNodesReady", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 42 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="UpdatingnodeExporterFailed", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="MachineConfigDaemonFailed", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="network", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="RolloutHung", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDegraded fired for 72 seconds with labels: {condition="Degraded", endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="openshift-apiserver", 
namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", reason="APIServerDeployment_UnavailablePod", service="cluster-version-operator", severity="warning"} alert ClusterOperatorDown fired for 42 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="monitoring", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"} alert ClusterOperatorDown fired for 72 seconds with labels: {endpoint="metrics", instance="192.168.79.24:9099", job="cluster-version-operator", name="machine-config", namespace="openshift-cluster-version", pod="cluster-version-operator-7f48df7545-pzdt7", service="cluster-version-operator", severity="critical", version="4.8.0-rc.0"} alert KubeDaemonSetMisScheduled fired for 3600 seconds with labels: {container="kube-rbac-proxy-main", daemonset="ingress-canary", endpoint="https-main", job="kube-state-metrics", namespace="openshift-ingress-canary", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="dns-default", endpoint="https-main", job="kube-state-metrics", namespace="openshift-dns", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="machine-config-server", endpoint="https-main", job="kube-state-metrics", namespace="openshift-machine-config-operator", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="multus-admission-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-multus", service="kube-state-metrics", severity="warning"} alert KubeDaemonSetMisScheduled fired for 59 seconds with labels: {container="kube-rbac-proxy-main", daemonset="sdn-controller", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", service="kube-state-metrics", severity="warning"} alert KubeNodeNotReady fired for 53 seconds with labels: {condition="Ready", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning", status="true"} alert KubeNodeUnreachable fired for 53 seconds with labels: {container="kube-rbac-proxy-main", effect="NoSchedule", endpoint="https-main", job="kube-state-metrics", key="node.kubernetes.io/unreachable", namespace="openshift-monitoring", node="master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="kube-state-metrics", severity="warning"} alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-apiserver", pod="apiserver-6c879879c5-8hmgk", severity="warning"} alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-authentication", pod="oauth-openshift-7cf58f567d-4d262", severity="warning"} alert KubePodNotReady fired for 59 seconds with labels: {namespace="openshift-oauth-apiserver", pod="apiserver-68d6564447-mxn99", severity="warning"} alert NTOPodsNotReady fired for 46 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-cluster-node-tuning-operator", pod="tuned-qfb6t", 
service="kube-state-metrics", severity="warning"} alert PodDisruptionBudgetAtLimit fired for 59 seconds with labels: {namespace="openshift-etcd", poddisruptionbudget="etcd-quorum-guard", severity="warning"} alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-controller-hcgzk", service="kube-state-metrics", severity="warning"} alert SDNPodNotReady fired for 61 seconds with labels: {condition="true", container="kube-rbac-proxy-main", endpoint="https-main", job="kube-state-metrics", namespace="openshift-sdn", pod="sdn-ftb2l", service="kube-state-metrics", severity="warning"} alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="infnod-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"} alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-0.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"} alert SystemMemoryExceedsReservation fired for 3600 seconds with labels: {node="master-1.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="crio", namespace="kube-system", service="kubelet", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="kubelet", namespace="kube-system", service="kubelet", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="network-metrics-service", namespace="openshift-multus", service="network-metrics-service", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="node-exporter", namespace="openshift-monitoring", service="node-exporter", severity="warning"} alert TargetDown fired for 18 seconds with labels: {job="sdn", namespace="openshift-sdn", service="sdn", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="controller-manager", namespace="openshift-controller-manager", service="controller-manager", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="dns-default", namespace="openshift-dns", service="dns-default", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="etcd", namespace="openshift-etcd", service="etcd", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="kube-controller-manager", namespace="openshift-kube-controller-manager", service="kube-controller-manager", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="machine-config-daemon", namespace="openshift-machine-config-operator", service="machine-config-daemon", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="multus-admission-controller", namespace="openshift-multus", service="multus-admission-controller", severity="warning"} alert TargetDown fired for 48 seconds with labels: {job="scheduler", namespace="openshift-kube-scheduler", service="scheduler", severity="warning"} alert etcdMembersDown fired for 57 seconds with labels: {job="etcd", namespace="openshift-etcd", pod="etcd-master-2.ocp-ppc64le-test-080078.aus.stglabs.ibm.com", service="etcd", severity="critical"}
Hi Tania, the issue you linked is still in flight (it should land in kube 1.22). That said, those messages are logged as "Information": they generate noise in the logs, but they shouldn't be the cause of this issue because they don't count towards the error budget. KubeAPIErrorBudgetBurn boils down to two things: errors and/or slow queries. In the console, under "Monitoring --> Dashboards", the API Performance dashboard gives a rundown of request statuses (confirming there are not many errors, at least in the instances I checked) and of how many "long running" queries there are. Since there appear to be no outright errors, this must be caused by slow queries. Mark confirmed that the bare-metal deploys don't see this alert, and we haven't seen it on other architectures (s390x/x86_64), so it looks specific to the performance of the test/CI infrastructure. I'd suspect disk speed as a starting point (the bulk of the slow queries I saw were configMap requests, which go through etcd, and etcd is disk intensive), so it's worth checking disk performance against etcd's needs.
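As a rough alternative to the dashboard, both halves of the alert (error rate and slow requests) can be queried directly through the Thanos querier, along the same lines as the curl calls earlier in this bug. This is only a minimal sketch: the thanos-querier route, the prometheus-k8s service-account token, and the upstream apiserver metric names are assumptions that may need adjusting for this cluster.

```
# Hedged sketch: inspect the two inputs behind KubeAPIErrorBudgetBurn.
HOST=$(oc get routes/thanos-querier -n openshift-monitoring -o json | jq -r '.spec.host')
TOKEN=$(oc -n openshift-monitoring sa get-token prometheus-k8s)

# 1) Error half: fraction of apiserver requests answered with a 5xx over the last hour.
curl -sk -H "Authorization: Bearer ${TOKEN}" "https://${HOST}/api/v1/query" \
  --data-urlencode 'query=sum(rate(apiserver_request_total{code=~"5.."}[1h])) / sum(rate(apiserver_request_total[1h]))'

# 2) Latency half: p99 apiserver request duration per verb over the last hour.
curl -sk -H "Authorization: Bearer ${TOKEN}" "https://${HOST}/api/v1/query" \
  --data-urlencode 'query=histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT"}[1h])) by (le, verb))'
```

If the first query is near zero while the second shows multi-second p99 latencies, that would match the "slow queries, not errors" picture described above.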
@tania: Can you leave the cluster in this state and give me access?
Here is the etcdctl output from @tkapoor's cluster:

```
[root@rdr-tanumig-mon01-bastion-0 ~]# oc rsh etcd-rdr-tanumig-mon01-master-0
Defaulted container "etcdctl" out of: etcdctl, etcd, etcd-metrics, etcd-health-monitor, setup (init), etcd-ensure-env-vars (init), etcd-resources-copy (init)
sh-4.4# etcdctl check perf --load="s"
 60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 150 writes/s
PASS: Slowest request took 0.042616s
PASS: Stddev is 0.002902s
PASS
sh-4.4#
sh-4.4# set -o vi
sh-4.4# etcdctl check perf --load="m"
 60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 915 writes/s
PASS: Slowest request took 0.205755s
PASS: Stddev is 0.015610s
PASS
sh-4.4# etcdctl check perf --load="l"
 60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
FAIL: Throughput too low: 3115 writes/s
Slowest request took too long: 0.534104s
PASS: Stddev is 0.043344s
FAIL
```

And here is the output from the bare-metal cluster from @mhamzy:

```
sh-4.4# etcdctl check perf --load="s"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 151 writes/s
PASS: Slowest request took 0.033216s
PASS: Stddev is 0.001029s
PASS
sh-4.4#
sh-4.4# set -o vi
sh-4.4# etcdctl check perf --load="m"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 999 writes/s
PASS: Slowest request took 0.027295s
PASS: Stddev is 0.001508s
PASS
sh-4.4# etcdctl check perf --load="l"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 7595 writes/s
PASS: Slowest request took 0.234594s
PASS: Stddev is 0.006870s
PASS
```

Throughput and latency for the 'l' load are outside the acceptable range on @tkapoor's cluster. @tkapoor, can you try changing etcd to use a ramdisk?
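If the ramdisk experiment is inconvenient, another way to confirm the disk-speed theory is to measure fdatasync latency on the filesystem backing etcd, since that is what etcd's WAL is most sensitive to. The sketch below uses the fio sequential-write/fdatasync test commonly recommended for etcd disks; the node name and the /var/lib/etcd path are assumptions, and fio may not be installed on the host, in which case it would need to be run from a utility container instead.

```
# Hedged sketch: measure fdatasync latency on the disk that backs etcd on one master.
# Node name and paths are assumptions; clean up the scratch directory afterwards.
oc debug node/master-0 -- chroot /host sh -c '
  mkdir -p /var/lib/etcd/fio-test &&
  fio --rw=write --ioengine=sync --fdatasync=1 \
      --directory=/var/lib/etcd/fio-test \
      --size=22m --bs=2300 --name=etcd-perf &&
  rm -rf /var/lib/etcd/fio-test
'
# Rough rule of thumb from upstream etcd guidance: the 99th percentile of the
# reported fdatasync durations should stay well under ~10ms.
```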
I deployed a cluster in the ppc64le CI environment:

```
[root@C155F2U33 ~]# (HOSTNAME=$(oc get routes/alertmanager-main -n openshift-monitoring -o json | jq -r '.spec.host'); TOKEN=$(oc -n openshift-monitoring sa get-token prometheus-k8s); curl --silent --insecure --header "Authorization: Bearer ${TOKEN}" https://${HOSTNAME}/api/v1/alerts | jq '.data[] | select(.labels.alertname=="KubeAPIErrorBudgetBurn")')
...
{
  "labels": {
    "alertname": "KubeAPIErrorBudgetBurn",
    "long": "3d",
    "prometheus": "openshift-monitoring/k8s",
    "severity": "warning",
    "short": "6h"
  },
...
```

```
sh-4.4# etcdctl check perf --load="s"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 149 writes/s
PASS: Slowest request took 0.154165s
PASS: Stddev is 0.005295s
PASS
sh-4.4# etcdctl check perf --load="m"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
FAIL: Throughput too low: 864 writes/s
PASS: Slowest request took 0.141168s
PASS: Stddev is 0.008066s
FAIL
sh-4.4# etcdctl check perf --load="l"
 60 / 60 Booooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
FAIL: Throughput too low: 2476 writes/s
PASS: Slowest request took 0.335178s
PASS: Stddev is 0.039842s
FAIL
```
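Since the alert instance above reports short="6h" and long="3d", it may also help to look at the individual burn-rate recording rules to see whether reads or writes are driving the burn. This is a sketch only: the rule names come from the upstream kubernetes-mixin and are an assumption about this cluster's rule set, and the route/token pattern is the same as in the query above.

```
# Hedged sketch: list the apiserver burn rates per window and verb.
HOST=$(oc get routes/thanos-querier -n openshift-monitoring -o json | jq -r '.spec.host')
TOKEN=$(oc -n openshift-monitoring sa get-token prometheus-k8s)
for rule in apiserver_request:burnrate30m apiserver_request:burnrate1h \
            apiserver_request:burnrate6h apiserver_request:burnrate3d; do
  echo "== ${rule}"
  curl -sk -H "Authorization: Bearer ${TOKEN}" "https://${HOST}/api/v1/query" \
    --data-urlencode "query=${rule}" \
    | jq -r '.data.result[] | "\(.metric.verb): \(.value[1])"'
done
```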
Piyush is still looking into this alert, so I'm assigning it over to him. My understanding is that we think this is related to I/O throttling in the environment.
Dear reporter, as part of the migration of all OpenShift bugs to Red Hat Jira, we are evaluating every bug; stale issues and those without high or urgent priority will be closed. If you believe this bug still requires engineering resolution, we kindly ask you to follow the link below [1] and continue working with us in Jira by recreating the issue and providing the necessary information. Please also include a link to the original Bugzilla in the description. To create an issue, follow this link: [1] https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&priority=10300&components=12367637
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.