Bug 1701368

Summary: kubelet_volume_stats metrics sometimes missing in 4.x aws-e2e
Product: OpenShift Container Platform Reporter: Chance Zibolski <chancez>
Component: Storage Assignee: Hemant Kumar <hekumar>
Status: CLOSED DUPLICATE QA Contact: Liang Xia <lxia>
Severity: low Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: aos-bugs, aos-storage-staff, bbennett, bchilds
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-26 10:26:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
prometheus-graph-ui none

Description Chance Zibolski 2019-04-18 18:58:41 UTC
Created attachment 1556195 [details]
prometheus-graph-ui

Description of problem:

While running our team's aws-e2e tests against 4.x, I found that in some cases Prometheus contains no kubelet_volume_stats_capacity_bytes or kubelet_volume_stats_usage_bytes metrics. I took the prometheus.tar.gz containing the Prometheus database from the e2e run and ran Prometheus locally against it. These metrics were not available in Prometheus, even though the kubelet was being scraped successfully.

Because Prometheus is scraping the kubelet correctly, it would seem that the kubelet is not exporting these volume metrics.
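
One way to test that hypothesis directly is to look at the text the kubelet serves on its metrics endpoint and check whether the two metric names appear at all. The following is only a sketch: the helper names (metric_names, missing_metrics) are hypothetical, and the sample text is a made-up illustration of the Prometheus text exposition format, not output captured from the affected cluster.

```python
# Sketch: find which expected metric names are absent from a scrape body
# in the Prometheus text exposition format.

EXPECTED = [
    "kubelet_volume_stats_capacity_bytes",
    "kubelet_volume_stats_usage_bytes",
]


def metric_names(exposition_text):
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or at whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def missing_metrics(exposition_text, expected=EXPECTED):
    """Return the expected metric names that have no sample in the text."""
    return sorted(set(expected) - metric_names(exposition_text))


# Made-up sample body: one volume metric present, the other absent.
SAMPLE = """\
# HELP kubelet_volume_stats_capacity_bytes Capacity in bytes of the volume
# TYPE kubelet_volume_stats_capacity_bytes gauge
kubelet_volume_stats_capacity_bytes{namespace="demo",persistentvolumeclaim="data"} 1024
kubelet_running_pod_count 12
"""

if __name__ == "__main__":
    print(missing_metrics(SAMPLE))
```

If both names come back as missing from a live kubelet scrape while other kubelet metrics are present, that would point at the kubelet rather than at Prometheus.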

Version-Release number of selected component (if applicable):

Whatever aws-e2e uses.


How reproducible:

Very rare. So far this has occurred for us once, in https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_release/3515/rehearse-3515-pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/2, but it is likely happening more regularly and going unnoticed because it is not being tested for.


Steps to view issue:

1. Download https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/3515/rehearse-3515-pull-ci-operator-framework-operator-metering-master-metering-e2e-aws/2/artifacts/metering-e2e-aws/metrics/prometheus.tar.gz to obtain the metrics for the above job.
2. Follow the instructions for running Prometheus locally with these metrics. They are on the aos-devel mailing list; search for "[aos-devel] Full prometheus dump is now captured during e2e runs".
3. Use the graph UI to look at the kubelet_volume_stats_usage_bytes and up{job="kubelet"} metrics between 2019-04-18 16:00 and 2019-04-18 18:00 (see attached screenshot). You should see that the kubelet target is up and being scraped, but there are no samples for kubelet_volume_stats_usage_bytes.
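
The check in step 3 can also be done without the graph UI, through the Prometheus HTTP API (/api/v1/query_range). The following is a rough sketch, not part of the original reproduction steps. It assumes the local Prometheus listens on http://localhost:9090 and that the times above are UTC; the helper names (query_range_url, has_samples) are hypothetical.

```python
# Sketch: build a query_range URL for the time window in step 3 and decide
# whether a decoded JSON response contains any samples.
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: the locally started Prometheus uses the default listen address.
BASE_URL = "http://localhost:9090"

# Assumption: the window from step 3 is in UTC.
START = datetime(2019, 4, 18, 16, 0, tzinfo=timezone.utc)
END = datetime(2019, 4, 18, 18, 0, tzinfo=timezone.utc)


def query_range_url(base_url, query, start, end, step="60s"):
    """Return the /api/v1/query_range URL for a PromQL query and time window."""
    params = {
        "query": query,
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
        "step": step,
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


def has_samples(response):
    """Return True if a decoded query_range response holds at least one sample."""
    if response.get("status") != "success":
        return False
    result = response.get("data", {}).get("result", [])
    return any(series.get("values") for series in result)


if __name__ == "__main__":
    for query in ('up{job="kubelet"}', "kubelet_volume_stats_usage_bytes"):
        print(query_range_url(BASE_URL, query, START, END))
```

Fetching each printed URL (for example with curl) and passing the decoded JSON to has_samples should give True for up{job="kubelet"} and False for kubelet_volume_stats_usage_bytes if the problem described here is present.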

Actual results:

kubelet_volume_stats_usage_bytes and kubelet_volume_stats_capacity_bytes metrics do not exist.

Expected results:

kubelet_volume_stats_usage_bytes and kubelet_volume_stats_capacity_bytes metrics should exist.

Additional info:

Comment 1 Hemant Kumar 2019-04-23 21:03:26 UTC
FWIW - we do have e2es that test whether the kubelet is emitting volume stat metrics - https://github.com/kubernetes/kubernetes/blob/master/test/e2e/storage/volume_metrics.go#L191