Description of problem:
Some of the e2e test cases in my PR rely on metrics data, and we're occasionally seeing missing metrics. Specifically, the tests which rely on the kubelet_volume_stats_capacity_bytes and/or kubelet_volume_stats_used_bytes metrics seem to be failing.

Here's our PR's test run:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/operator-framework_operator-metering/969/pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/1350

When I look at the artifacts, I see that prometheus.tar.gz is 0 bytes:
https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/operator-framework_operator-metering/969/pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/1349/artifacts/metering-e2e-aws/metrics/

Additionally, the teardown.log indicates many issues with collecting pod information due to TLS errors:
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/operator-framework_operator-metering/969/pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/1349/artifacts/metering-e2e-aws/container-logs/teardown.log

Version-Release number of selected component (if applicable):
4.3.0

How reproducible:
Relatively. It's occurred at least twice for my PR:
https://prow.svc.ci.openshift.org/pr-history/?org=operator-framework&repo=operator-metering&pr=969

Steps to Reproduce:
1. ?

Actual results:
The kubelet_volume_stats_capacity_bytes and kubelet_volume_stats_used_bytes metrics have no data.

Expected results:
The kubelet_volume_stats_capacity_bytes and kubelet_volume_stats_used_bytes metrics have data.

Additional info:
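For anyone trying to confirm the symptom, here is a minimal sketch of how the "no data" condition could be checked. It assumes the response bodies come from Prometheus' /api/v1/query instant-query endpoint, which returns a JSON object with a "status" field and a "data.result" list. The function name and the sample bodies are hypothetical and are not taken from the metering test suite.

```python
# Metrics named in this report.
METRICS = [
    "kubelet_volume_stats_capacity_bytes",
    "kubelet_volume_stats_used_bytes",
]


def metrics_without_data(responses):
    """Return the metric names whose instant-query response has no samples.

    `responses` maps a metric name to the decoded JSON body of a
    Prometheus /api/v1/query response for that metric (assumed shape:
    {"status": ..., "data": {"resultType": ..., "result": [...]}}).
    """
    missing = []
    for name, body in responses.items():
        # A failed query counts as missing data.
        if body.get("status") != "success":
            missing.append(name)
            continue
        # A successful query with an empty result list also counts as missing.
        if not body.get("data", {}).get("result"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical response bodies, for illustration only.
    sample = {
        METRICS[0]: {
            "status": "success",
            "data": {"resultType": "vector", "result": []},
        },
        METRICS[1]: {
            "status": "success",
            "data": {
                "resultType": "vector",
                "result": [
                    {
                        "metric": {"persistentvolumeclaim": "example-pvc"},
                        "value": [1570225786.0, "1024"],
                    }
                ],
            },
        },
    }
    print(metrics_without_data(sample))
```

With the sample bodies above, only the capacity metric is reported as missing, since its result list is empty.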
It looks like the API server is erroring because it cannot contact the etcd server:

W1004 21:49:46.367840 1 asm_amd64.s:1337] Failed to dial etcd-1.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379: grpc: the connection is closing; please retry.
I1004 21:49:46.367922 1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{etcd-0.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>} {etcd-1.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>} {etcd-2.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>}]
Reassigning to the etcd team. Perhaps they can help triage what is going on.
*** This bug has been marked as a duplicate of bug 1748073 ***