Bug 1759260 - 4.3 origin CI missing metrics
Summary: 4.3 origin CI missing metrics
Keywords:
Status: CLOSED DUPLICATE of bug 1748073
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.4.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-10-07 17:49 UTC by Chance Zibolski
Modified: 2019-12-10 15:33 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-10 15:33:47 UTC
Target Upstream Version:
Embargoed:



Description Chance Zibolski 2019-10-07 17:49:21 UTC
Description of problem: Our PR's e2e tests include test cases that rely on metrics data, and we're occasionally seeing missing metrics. Specifically, the tests that rely on the kubelet_volume_stats_capacity_bytes and/or kubelet_volume_stats_used_bytes metrics are the ones that seem to be failing.

Here's our PR's test run: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/operator-framework_operator-metering/969/pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/1350

When I look at the artifacts, I see that prometheus.tar.gz is 0 bytes: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/operator-framework_operator-metering/969/pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/1349/artifacts/metering-e2e-aws/metrics/

Additionally, the teardown.log shows many failures to collect pod information due to TLS errors: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/operator-framework_operator-metering/969/pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/1349/artifacts/metering-e2e-aws/container-logs/teardown.log


Version-Release number of selected component (if applicable): 4.3.0


How reproducible: Intermittently. It has occurred at least twice for my PR: https://prow.svc.ci.openshift.org/pr-history/?org=operator-framework&repo=operator-metering&pr=969


Steps to Reproduce:
1. ?


Actual results: kubelet_volume_stats_capacity_bytes and kubelet_volume_stats_used_bytes metrics have no data.


Expected results: kubelet_volume_stats_capacity_bytes and kubelet_volume_stats_used_bytes metrics have data
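
A minimal sketch of how the "no data" condition could be checked programmatically, assuming the metrics are queried through the Prometheus HTTP API instant-query endpoint (/api/v1/query), whose JSON response carries samples under data.result. The function name has_samples and the two sample response bodies are hypothetical and only illustrate the response shape; they are not taken from the failing test run.

```python
import json

# The two metrics named in this report.
METRICS = (
    "kubelet_volume_stats_capacity_bytes",
    "kubelet_volume_stats_used_bytes",
)


def has_samples(response_body: str) -> bool:
    """Return True if a Prometheus instant-query response holds at least one sample.

    Assumes the standard response envelope:
    {"status": "success", "data": {"resultType": ..., "result": [...]}}
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return False
    return len(payload.get("data", {}).get("result", [])) > 0


# Hypothetical response bodies, for illustration only.
EMPTY_RESPONSE = '{"status":"success","data":{"resultType":"vector","result":[]}}'
POPULATED_RESPONSE = (
    '{"status":"success","data":{"resultType":"vector","result":['
    '{"metric":{"__name__":"kubelet_volume_stats_used_bytes"},'
    '"value":[1570470561,"1024"]}]}}'
)

if __name__ == "__main__":
    print("empty response has samples:", has_samples(EMPTY_RESPONSE))
    print("populated response has samples:", has_samples(POPULATED_RESPONSE))
```

An empty result list with status "success" corresponds to the "Actual results" case above: the query itself works, but the metric has no data.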


Additional info:

Comment 2 Ryan Phillips 2019-10-10 16:12:09 UTC
Looks like the API server is erroring because it cannot contact the etcd server:

W1004 21:49:46.367840       1 asm_amd64.s:1337] Failed to dial etcd-1.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379: grpc: the connection is closing; please retry.
I1004 21:49:46.367922       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{etcd-0.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>} {etcd-1.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>} {etcd-2.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>}]
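
For triage, dial failures like the one above can be tallied per etcd endpoint. This is a rough sketch, assuming the API server log uses the klog line layout seen in the two lines quoted here (severity letter plus date, time, PID, source location, then the message). The regular expression and the count_dial_failures helper are hypothetical and were written against these two lines only.

```python
import re
from collections import Counter

# Matches warning-level lines of the form:
#   W<MMDD> <time> <pid> <file>:<line>] Failed to dial <host:port>: <reason>
DIAL_FAILURE = re.compile(
    r"^W\d{4} \S+\s+\d+ \S+\] Failed to dial (?P<endpoint>\S+?): (?P<reason>.*)$"
)


def count_dial_failures(lines):
    """Count 'Failed to dial' warnings per etcd endpoint."""
    failures = Counter()
    for line in lines:
        match = DIAL_FAILURE.match(line.strip())
        if match:
            failures[match.group("endpoint")] += 1
    return failures


# The two log lines quoted in this comment.
SAMPLE_LOG = [
    "W1004 21:49:46.367840       1 asm_amd64.s:1337] Failed to dial "
    "etcd-1.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379: "
    "grpc: the connection is closing; please retry.",
    "I1004 21:49:46.367922       1 asm_amd64.s:1337] balancerWrapper: got update "
    "addr from Notify: [{etcd-0.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>} "
    "{etcd-1.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>} "
    "{etcd-2.ci-op-zh2yqt2r-cd80e.origin-ci-int-aws.dev.rhcloud.com:2379 <nil>}]",
]

if __name__ == "__main__":
    for endpoint, count in count_dial_failures(SAMPLE_LOG).items():
        print(endpoint, count)
```

Run over a full API server log, a tally like this would show whether the failures are concentrated on one etcd member or spread across all three.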

Comment 3 Ryan Phillips 2019-10-10 16:15:56 UTC
Reassigning to the etcd team; perhaps they can help triage what is going on.

Comment 4 Sam Batschelet 2019-12-10 15:33:47 UTC

*** This bug has been marked as a duplicate of bug 1748073 ***

