Bug 1814334 - Chunk of 4.4 prometheus tests are failing on cluster upgraded from 4.1 to 4.2 to 4.3 to 4.4

Summary: Chunk of 4.4 prometheus tests are failing on cluster upgraded from 4.1 to 4.2...

Keywords:
Status:	CLOSED DUPLICATE of bug 1812261
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	4.4.0
Assignee:	Jacob Tanenbaum
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-03-17 16:51 UTC by Clayton Coleman
Modified:	2021-04-05 17:46 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-03-24 20:59:04 UTC
Target Upstream Version:
Embargoed:

Description Clayton Coleman 2020-03-17 16:51:14 UTC

12 or so Prometheus tests are failing in a cluster upgraded from 4.1 to 4.2 to 4.3 to 4.4. This may be a serious platform issue or a test issue, but if it's a test issue, it's blocking us from understanding whether the metrics are right.

Needs immediate triage to determine why the test is failing, and the fix needs to land ASAP so we can identify whether other blockers exist.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2-to-4.3-to-4.4-nightly/22


[Feature:Prometheus][Conformance] Prometheus when installed on the cluster [Top Level] [Feature:Prometheus][Conformance] Prometheus when installed on the cluster should provide ingress metrics [Suite:openshift/conformance/parallel/minimal]	34s
fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:308]: Unexpected error:
    <*errors.errorString | 0xc003c6dd60>: {
        s: "host command failed: error running /usr/bin/kubectl --server=https://api.ci-op-p9tp56ty-599c3.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/admin.kubeconfig exec --namespace=e2e-test-prometheus-8r97x execpod7fghx -- /bin/sh -x -c curl -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tN2Y5djUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMjM5MzFiMmMtNjgwMy0xMWVhLThiYWQtMGFkZjc4YmFkZmMzIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.qA3Qtn_B0_qDar0RPfLJq7PrDtYE1kP6zr2IkE27KDYkgld2Bx8Rj3V-dc5IgNkMss_V9k1tGj3x-O5huJ5jUDZ0VTLtMTYdVWjA2pZt5ZMjtAqpXXwBdYQ4TrZpnpVp3t8wIMXNP-Ka0g40q_tmetwRaxhzvHUz_G1B1TF7gqEc327W8-AqUx7LHVA2JdfYtJ6DUC9AByP6uZxUfQgWdrjRYM4pDKeGWrQjTV-y_PQZIC8dyQTMtEcR4OkLPUEs4aZiUCe9zCxWjSr2CIwjp6IKCax89-377IDOWVhipbHBomGYpIGxBMmjsiSdj2I6177Uf-3Wx3_Gn57s-SPU1Q' \"https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/targets\":\nCommand stdout:\n\nstderr:\n+ curl -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tN2Y5djUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMjM5MzFiMmMtNjgwMy0xMWVhLThiYWQtMGFkZjc4YmFkZmMzIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.qA3Qtn_B0_qDar0RPfLJq7PrDtYE1kP6zr2IkE27KDYkgld2Bx8Rj3V-dc5IgNkMss_V9k1tGj3x-O5huJ5jUDZ0VTLtMTYdVWjA2pZt5ZMjtAqpXXwBdYQ4TrZpnpVp3t8wIMXNP-Ka0g40q_tmetwRaxhzvHUz_G1B1TF7gqEc327W8-AqUx7LHVA2JdfYtJ6DUC9AByP6uZxUfQgWdrjRYM4pDKeGWrQjTV-y_PQZIC8dyQTMtEcR4OkLPUEs4aZiUCe9zCxWjSr2CIwjp6IKCax89-377IDOWVhipbHBomGYpIGxBMmjsiSdj2I6177Uf-3Wx3_Gn57s-SPU1Q' https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/targets\ncommand terminated with exit code 35\n\nerror:\nexit status 35\n",
    }
    host command failed: error running /usr/bin/kubectl --server=https://api.ci-op-p9tp56ty-599c3.origin-ci-int-aws.dev.rhcloud.com:6443 --kubeconfig=/tmp/admin.kubeconfig exec --namespace=e2e-test-prometheus-8r97x execpod7fghx -- /bin/sh -x -c curl -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tN2Y5djUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMjM5MzFiMmMtNjgwMy0xMWVhLThiYWQtMGFkZjc4YmFkZmMzIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.qA3Qtn_B0_qDar0RPfLJq7PrDtYE1kP6zr2IkE27KDYkgld2Bx8Rj3V-dc5IgNkMss_V9k1tGj3x-O5huJ5jUDZ0VTLtMTYdVWjA2pZt5ZMjtAqpXXwBdYQ4TrZpnpVp3t8wIMXNP-Ka0g40q_tmetwRaxhzvHUz_G1B1TF7gqEc327W8-AqUx7LHVA2JdfYtJ6DUC9AByP6uZxUfQgWdrjRYM4pDKeGWrQjTV-y_PQZIC8dyQTMtEcR4OkLPUEs4aZiUCe9zCxWjSr2CIwjp6IKCax89-377IDOWVhipbHBomGYpIGxBMmjsiSdj2I6177Uf-3Wx3_Gn57s-SPU1Q' "https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/targets":
    Command stdout:
    
    stderr:
    + curl -s -k -H 'Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJvcGVuc2hpZnQtbW9uaXRvcmluZyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLWFkYXB0ZXItdG9rZW4tN2Y5djUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoicHJvbWV0aGV1cy1hZGFwdGVyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMjM5MzFiMmMtNjgwMy0xMWVhLThiYWQtMGFkZjc4YmFkZmMzIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Om9wZW5zaGlmdC1tb25pdG9yaW5nOnByb21ldGhldXMtYWRhcHRlciJ9.qA3Qtn_B0_qDar0RPfLJq7PrDtYE1kP6zr2IkE27KDYkgld2Bx8Rj3V-dc5IgNkMss_V9k1tGj3x-O5huJ5jUDZ0VTLtMTYdVWjA2pZt5ZMjtAqpXXwBdYQ4TrZpnpVp3t8wIMXNP-Ka0g40q_tmetwRaxhzvHUz_G1B1TF7gqEc327W8-AqUx7LHVA2JdfYtJ6DUC9AByP6uZxUfQgWdrjRYM4pDKeGWrQjTV-y_PQZIC8dyQTMtEcR4OkLPUEs4aZiUCe9zCxWjSr2CIwjp6IKCax89-377IDOWVhipbHBomGYpIGxBMmjsiSdj2I6177Uf-3Wx3_Gn57s-SPU1Q' https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/targets
    command terminated with exit code 35
    
    error:
    exit status 35
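Triage note: curl's exit status 35 is CURLE_SSL_CONNECT_ERROR, meaning the TLS handshake with prometheus-k8s.openshift-monitoring.svc:9091 failed, so the request never got as far as the Prometheus HTTP API and no API-level error was returned. The POSIX sh sketch that follows is a hypothetical triage helper, not part of openshift/origin: describe_curl_exit maps a few well-known curl exit codes to their meaning, and probe_targets re-runs the same request with -S added so curl prints its error message despite -s. The helper names and the TOKEN argument are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical triage helpers (assumption): not part of openshift/origin.

# describe_curl_exit CODE: print a short meaning for a few well-known curl exit codes.
describe_curl_exit() {
  case "$1" in
    0)  echo "success" ;;
    6)  echo "could not resolve host (DNS failure)" ;;
    7)  echo "failed to connect to host (TCP connect failed)" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS/SSL handshake failed" ;;
    52) echo "empty reply from server" ;;
    56) echo "failure receiving network data" ;;
    *)  echo "other curl error ($1); see the EXIT CODES section of curl(1)" ;;
  esac
}

# probe_targets TOKEN: hypothetical re-run of the test's request from inside the exec pod.
# -s hides the progress meter and error messages; -S brings the error message back.
# -k skips certificate verification, as the original test command does.
probe_targets() {
  token="$1"
  curl -sS -k -o /dev/null \
    -H "Authorization: Bearer ${token}" \
    "https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/targets"
  rc=$?
  echo "curl exit ${rc}: $(describe_curl_exit "${rc}")"
  return "${rc}"
}
```

Usage, from inside the exec pod: probe_targets "$TOKEN". Getting exit 35 again while Prometheus itself looks healthy would be consistent with (though not proof of) a problem on the network path to the service, such as the SDN, rather than in the monitoring stack.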

Comment 1 Lili Cosic 2020-03-17 17:11:41 UTC

As mentioned on Slack, I suspect this is a test setup issue: looking at the Prometheus dumps and at the logs of cluster-monitoring-operator and Prometheus itself, there were no errors or failures. The alerts there suggest an SDN problem in run #22.

Comment 3 Pawel Krupa 2020-03-19 08:08:23 UTC

I don't see any issues with the monitoring stack; reassigning to the networking team, as the only identified issues are with the SDN.

Comment 4 Lili Cosic 2020-03-19 08:37:27 UTC

Follow-up comment to clarify: we suspect Prometheus can't be queried because of networking problems in the cluster, since the dump shows that metrics and alerts are present for certain components.

Comment 6 W. Trevor King 2021-04-05 17:46:14 UTC

Removing UpgradeBlocker from this older bug to remove it from the suspect queue described in [1]. If you feel this bug still needs to be a suspect, please add the keyword again.

[1]: https://github.com/openshift/enhancements/pull/475