test: [Feature:Prometheus][Conformance] Prometheus when installed on the cluster [Top Level] [Feature:Prometheus][Conformance] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present

This test is failing frequently in CI; see the search results:
https://search.svc.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5BFeature%3APrometheus%5C%5D%5C%5BConformance%5C%5D+Prometheus+when+installed+on+the+cluster+%5C%5BTop+Level%5C%5D+%5C%5BFeature%3APrometheus%5C%5D%5C%5BConformance%5C%5D+Prometheus+when+installed+on+the+cluster+should+report+telemetry+if+a+cloud%5C.openshift%5C.com+token+is+present

Link to a failing job:
https://deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1281185853179170816

Snippet of failure logs:

fail [github.com/openshift/origin/test/extended/builds/valuefrom.go:46]: Unexpected error:
    <*util.ExitError | 0xc001945350>: {
        Cmd: "oc --namespace=e2e-test-build-valuefrom-jllvb --kubeconfig=/tmp/configfile256536350 create -f /tmp/fixture-testdata-dir123358715/test/extended/testdata/builds/valuefrom/test-is.json --validate=false",
        StdErr: "error: error when creating \"/tmp/fixture-testdata-dir123358715/test/extended/testdata/builds/valuefrom/test-is.json\": Post https://api.ci-op-zt1lsbbl-e99c3.origin-ci-int-aws.dev.rhcloud.com:6443/apis/image.openshift.io/v1/namespaces/e2e-test-build-valuefrom-jllvb/imagestreams: EOF",
        ExitError: {
            ProcessState: {
                pid: 4991,
                status: 256,
                rusage: {
                    Utime: {Sec: 0, Usec: 205566},
                    Stime: {Sec: 0, Usec: 105962},
                    Maxrss: 75936, Ixrss: 0, Idrss: 0, Isrss: 0,
                    Minflt: 12720, Majflt: 0, Nswap: 0,
                    Inblock: 0, Oublock: 0, Msgsnd: 0, Msgrcv: 0,
                    Nsignals: 0, Nvcsw: 888, Nivcsw: 20,
                },
            },
            Stderr: nil,
        },
    }
exit status 1

Jul 09 11:41:30.684 E ns/openshift-kube-controller-manager pod/kube-controller-manager-control-plane-0 node/control-plane-0 container=cluster-policy-controller container exited with code 255 (Error):
I0709 11:41:26.958277 1 cert_rotation.go:137] Starting client certificate rotation controller
I0709 11:41:26.966046 1 policy_controller.go:41] Starting controllers on 0.0.0.0:10357 (31debebc)
I0709 11:41:26.967891 1 standalone_apiserver.go:103] Started health checks at 0.0.0.0:10357
F0709 11:41:26.969002 1 standalone_apiserver.go:119] listen tcp 0.0.0.0:10357: bind: address already in use

Jul 09 11:43:55.182 E ns/openshift-kube-apiserver pod/kube-apiserver-control-plane-0 node/control-plane-0 container=setup init container exited with code 124 (Error): ................................................................................

Jul 09 11:44:06.597 E ns/openshift-console pod/console-7dbc679bf6-cbxd8 node/control-plane-1 container=console container exited with code 2 (Error):
2020-07-09T11:38:28Z cmd/main: cookies are secure!
2020-07-09T11:38:28Z cmd/main: Binding to [::]:8443...
2020-07-09T11:38:28Z cmd/main: using TLS

The following test is failing: Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present. A bug was already filed for this against a different release: https://bugzilla.redhat.com/show_bug.cgi?id=1853007. However, since the relevant PR is closed and the comment https://bugzilla.redhat.com/show_bug.cgi?id=1853007#c21 expects the job not to fail, raising this bug for further investigation.
I suspect that the issue is bad timing between when the test executes and when telemetry metrics become available from Prometheus. The telemeter-client logs [1] show that it had not retrieved any metrics as of 11:40:29.8762. This is consistent with the Prometheus logs [2][3], which show that both instances were starting around that time. The Prometheus dump also shows that the telemeter client sent samples to the telemetry backend after the 11:45:18 mark, while the test reported the failure at 11:45:03.430. Given that the telemeter client sends data every 4min30s and the test checking whether telemetry data has been sent does only 5 retries at an interval of 10 seconds, this would explain the failure. The issue is probably rare, and it is less visible in 4.6 since failing tests are retried there to eliminate flakes. We should still look into making the test more predictable.

[1] https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1281185853179170816/artifacts/e2e-vsphere-upi/pods/openshift-monitoring_telemeter-client-66dbfd95b7-zgv7k_telemeter-client.log
[2] https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1281185853179170816/artifacts/e2e-vsphere-upi/pods/openshift-monitoring_prometheus-k8s-0_prometheus.log
[3] https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1281185853179170816/artifacts/e2e-vsphere-upi/pods/openshift-monitoring_prometheus-k8s-1_prometheus.log
This was a tollbooth issue which has been resolved. There are only a couple of flaky recent failures; most of the failures are over a day old (from before the issue was addressed) and will slowly fall off the test history.
Checked the 4.7 CI results; no failures for this case:
https://search.ci.openshift.org/?search=%5C%5BFeature%3APrometheus%5C%5D%5C%5BConformance%5C%5D+Prometheus+when+installed+on+the+cluster+%5C%5BTop+Level%5C%5D+%5C%5BFeature%3APrometheus%5C%5D%5C%5BConformance%5C%5D+Prometheus+when+installed+on+the+cluster+should+report+telemetry+if+a+cloud%5C.openshift%5C.com+token+is+present&maxAge=36h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Closing; feel free to reopen if it happens again.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633