Bug 1701368 - kubelet_volume_stats metrics sometimes missing in 4.x aws-e2e
Summary: kubelet_volume_stats metrics sometimes missing in 4.x aws-e2e
Keywords:
Status: CLOSED DUPLICATE of bug 1700779
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.1.0
Assignee: Hemant Kumar
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-18 18:58 UTC by Chance Zibolski
Modified: 2019-04-26 10:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-26 10:26:28 UTC
Target Upstream Version:
Embargoed:


Attachments
prometheus-graph-ui (188.67 KB, image/png)
2019-04-18 18:58 UTC, Chance Zibolski

Description Chance Zibolski 2019-04-18 18:58:41 UTC
Created attachment 1556195
prometheus-graph-ui

Description of problem:

While running our team's aws-e2e tests against 4.x, I found that in some cases Prometheus contains no kubelet_volume_stats_capacity_bytes or kubelet_volume_stats_usage_bytes metrics. I took the prometheus.tar.gz containing the Prometheus database from the e2e run and ran Prometheus locally against it. These metrics were not available in Prometheus, even though the kubelet was being scraped successfully.

Because Prometheus is scraping the kubelet correctly, it appears that the kubelet is not exporting these volume metrics.
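One way to test this directly is to look at the text the kubelet serves on its /metrics endpoint and check whether the two metric names have any samples. The following is only a rough sketch: the function name and the sample text are hypothetical, and it assumes the /metrics output has already been saved to a string by some other means.

```python
def metrics_with_samples(exposition_text, wanted):
    """Return the names from `wanted` that have at least one sample line
    in Prometheus text exposition output."""
    found = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value".
        name = line.split("{", 1)[0].split(None, 1)[0]
        if name in wanted:
            found.add(name)
    return found


WANTED = {
    "kubelet_volume_stats_capacity_bytes",
    "kubelet_volume_stats_usage_bytes",
}

# Hypothetical /metrics excerpt, made up for illustration only.
SAMPLE = """\
# HELP kubelet_volume_stats_capacity_bytes Capacity in bytes of the volume
# TYPE kubelet_volume_stats_capacity_bytes gauge
kubelet_volume_stats_capacity_bytes{namespace="ns",persistentvolumeclaim="pvc"} 1024
kubelet_running_pod_count 3
"""

if __name__ == "__main__":
    missing = WANTED - metrics_with_samples(SAMPLE, WANTED)
    print("missing:", sorted(missing))
```

If both names come back as missing while other kubelet metrics are present, that would point at the kubelet rather than at Prometheus.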

Version-Release number of selected component (if applicable):

Whatever aws-e2e uses.


How reproducible:

Very rare. It has occurred for us once so far, in https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_release/3515/rehearse-3515-pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/2. It likely occurs more often, but nothing currently tests for it.


Steps to view issue:

1. Download https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/3515/rehearse-3515-pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/2/artifacts/metering-e2e-aws/metrics/prometheus.tar.gz to obtain the metrics for the above job.
2. Run Prometheus locally against these metrics. Instructions are on the aos-devel mailing list; search for "[aos-devel] Full prometheus dump is now captured during e2e runs".
3. Use the graph UI to look at the kubelet_volume_stats_usage_bytes and up{job="kubelet"} metrics between 2019-04-18 16:00 and 2019-04-18 18:00 (see attached screenshot). You should see that the kubelet target is up and being scraped, but there are no samples for kubelet_volume_stats_usage_bytes.
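As an alternative to the graph UI in step 3, the same check can be scripted against the Prometheus HTTP API (/api/v1/query_range). This is a minimal sketch, not part of the original report: it assumes the local Prometheus from step 2 listens on http://localhost:9090, and the function names and the 60s step are illustrative choices.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus was started locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"

QUERIES = [
    'up{job="kubelet"}',
    "kubelet_volume_stats_usage_bytes",
    "kubelet_volume_stats_capacity_bytes",
]


def build_range_query_url(base_url, query, start, end, step="60s"):
    """Build a URL for the /api/v1/query_range endpoint."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url}/api/v1/query_range?{params}"


def count_series(response_body):
    """Return how many series a query_range JSON response contains."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return len(payload["data"]["result"])


def check_metrics(base_url=PROMETHEUS_URL):
    """Query each metric over the window from step 3 and count its series."""
    start, end = "2019-04-18T16:00:00Z", "2019-04-18T18:00:00Z"
    counts = {}
    for query in QUERIES:
        url = build_range_query_url(base_url, query, start, end)
        with urllib.request.urlopen(url) as resp:
            counts[query] = count_series(resp.read())
    return counts


if __name__ == "__main__":
    for query, n in check_metrics().items():
        print(f"{query}: {n} series")
```

For the job above, the expectation from the report is a non-zero series count for up{job="kubelet"} and zero series for the two kubelet_volume_stats metrics.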

Actual results:

kubelet_volume_stats_usage_bytes and kubelet_volume_stats_capacity_bytes metrics do not exist.

Expected results:

kubelet_volume_stats_usage_bytes and kubelet_volume_stats_capacity_bytes metrics should exist.

Additional info:

Comment 1 Hemant Kumar 2019-04-23 21:03:26 UTC
FWIW, we do have e2e tests that check whether the kubelet is emitting volume stat metrics: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/volume_metrics.go#L191

