Bug 1540039 - Prometheus metric "storage_operation_errors_total" does not work
Summary: Prometheus metric "storage_operation_errors_total" does not work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Hemant Kumar
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-30 06:53 UTC by Chao Yang
Modified: 2018-03-28 14:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 14:23:55 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:24:35 UTC

Description Chao Yang 2018-01-30 06:53:00 UTC
Description of problem:
The Prometheus metric "storage_operation_errors_total" does not work.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.31.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-10-56.ec2.internal:443
openshift v3.9.0-0.31.0
kubernetes v1.9.1+a0ce1bc657


How reproducible:
Always

Steps to Reproduce:
1. Create a dynamically provisioned PVC and record the EBS volume ID.
2. Attach the volume to another instance from the AWS web console.
3. Create a pod that uses the above PVC.
4. A metric like the following should appear:
storage_operation_errors_total { volume_plugin = "aws-ebs", operation_name = "volume_attach" }

Actual results:
No such metric is displayed. The metric does not appear to work.

Expected results:
The metric should be displayed in the Prometheus web console.

Additional info:

Comment 1 Hemant Kumar 2018-02-01 15:12:28 UTC
@chaoyang - Did you see any storage errors when this happened? If there were no errors, this metric will not be emitted. Once it has been emitted, it keeps being emitted afterwards.
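
The behaviour described above (no series until the first error, then a persistent series) can be illustrated with a toy labeled counter. This is only a sketch of the general pattern, not the Kubernetes or Prometheus client code; the `CounterVec` class and its methods are hypothetical names made up for this illustration.

```python
class CounterVec:
    """Toy labeled counter: a series exists only after its first increment.

    Hypothetical illustration, not the real Prometheus client API.
    """

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = {}  # tuple of label values -> running count

    def inc(self, **labels):
        # The series for this label combination is created on first use.
        key = tuple(labels[n] for n in self.label_names)
        self._values[key] = self._values.get(key, 0) + 1

    def expose(self):
        # Render one text line per series that has been incremented.
        lines = []
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return lines


errors = CounterVec("storage_operation_errors_total",
                    ["operation_name", "volume_plugin"])

before = errors.expose()  # nothing has failed yet, so no series is rendered

# Simulate two failed attach operations.
for _ in range(2):
    errors.inc(operation_name="volume_attach",
               volume_plugin="kubernetes.io/aws-ebs")

after = errors.expose()   # the series now exists and stays in the output
```

Under this model, an empty result for the metric means only that no error has been recorded for that label combination since the process started, which is why the question about observed errors matters.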

Comment 2 Chao Yang 2018-02-02 02:00:34 UTC
@hekumar - I saw an error message related to "delete volume", but no error message for "attach volume".

Comment 3 Hemant Kumar 2018-02-02 19:25:21 UTC
I tried reproducing this in an AWS cluster and got the following metric after the attach operation failed a number of times:


storage_operation_errors_total{operation_name="volume_attach",volume_plugin="kubernetes.io/aws-ebs"} 2


And then I deleted a PVC which was being actively used by a pod and got:

storage_operation_errors_total{operation_name="volume_delete",volume_plugin="kubernetes.io/aws-ebs"} 2


So I can't reproduce this problem. Can you post logs showing that errors did happen and the metrics were not recorded?
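
When checking scraped output by hand, a small parser can help confirm whether a given series is present. The following is a minimal sketch, assuming simple sample lines of the form shown above (no timestamps, no escaped quotes in label values); `parse_samples` and `find_value` are hypothetical helper names, not part of any Prometheus tooling.

```python
import re

# Matches lines such as: name{label="value",...} 2
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) for each sample line in text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:  # unsupported line shape; ignored in this sketch
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def find_value(samples, name, **wanted):
    """Return the value of the first sample whose labels match exactly, else None."""
    for sample_name, labels, value in samples:
        if sample_name == name and all(labels.get(k) == v for k, v in wanted.items()):
            return value
    return None


text = """# TYPE storage_operation_errors_total counter
storage_operation_errors_total{operation_name="volume_attach",volume_plugin="kubernetes.io/aws-ebs"} 2
storage_operation_errors_total{operation_name="volume_delete",volume_plugin="kubernetes.io/aws-ebs"} 2
"""

samples = parse_samples(text)
attach = find_value(samples, "storage_operation_errors_total",
                    operation_name="volume_attach",
                    volume_plugin="kubernetes.io/aws-ebs")
short_name = find_value(samples, "storage_operation_errors_total",
                        operation_name="volume_attach",
                        volume_plugin="aws-ebs")
```

Here `attach` is `2.0`, while `short_name` is `None`: an exact match on `volume_plugin="aws-ebs"` does not select a series labeled `kubernetes.io/aws-ebs`. The report does not say whether this label difference played any part in the original observation.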

Comment 7 Hemant Kumar 2018-02-05 19:01:48 UTC
https://github.com/openshift/origin/pull/18442

Comment 9 Chao Yang 2018-02-22 06:15:35 UTC
Verification passed on:
oc v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-12-213.ec2.internal:8443
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657

# TYPE storage_operation_errors_total counter
storage_operation_errors_total{operation_name="volume_attach",volume_plugin="kubernetes.io/aws-ebs"} 8

Comment 12 errata-xmlrpc 2018-03-28 14:23:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

