Description of problem:
The Prometheus metric "storage_operation_errors_total" does not work.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.31.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-10-56.ec2.internal:443
openshift v3.9.0-0.31.0
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Always

Steps to Reproduce:
1. Create a dynamic PVC and record the EBS volume ID.
2. Attach the volume to another instance from the AWS web console.
3. Create a pod that uses the above PVC.
4. There should be a metric message: storage_operation_errors_total{volume_plugin="aws-ebs", operation_name="volume_attach"}

Actual results:
No metric message is displayed. The metric does not work.

Expected results:
One metric message should be displayed in the Prometheus web console.

Additional info:
@chaoyang - Did you see any storage errors when this happened? If there were no errors, this metric will not be emitted. Once it has been emitted, it keeps being emitted afterwards.
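The behavior described above (no series until the first error, then a series that stays visible) can be illustrated with a toy sketch. This is not the Kubernetes implementation; the `LabeledCounter` class and its methods are hypothetical names, used only to show why a labeled counter can be missing from the scrape output before any error has occurred.

```python
class LabeledCounter:
    """Toy labeled counter: a series exists only after its first increment.

    Hypothetical illustration, not the actual Kubernetes metrics code.
    """

    def __init__(self, name):
        self.name = name
        # Maps a sorted tuple of (label, value) pairs to the current count.
        self._series = {}

    def inc(self, **labels):
        """Increment the series identified by the given label set."""
        key = tuple(sorted(labels.items()))
        self._series[key] = self._series.get(key, 0) + 1

    def expose(self):
        """Render every known series as one line of Prometheus-style text."""
        lines = []
        for key, value in sorted(self._series.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_str}}} {value}")
        return "\n".join(lines)


errors = LabeledCounter("storage_operation_errors_total")
before_first_error = errors.expose()  # empty: nothing has failed yet

# Two failed attach operations create the series and bump it to 2.
errors.inc(operation_name="volume_attach", volume_plugin="kubernetes.io/aws-ebs")
errors.inc(operation_name="volume_attach", volume_plugin="kubernetes.io/aws-ebs")
after_errors = errors.expose()
```

Under this model, an empty result for `volume_attach` only means that no attach error has been counted yet, not that the metric is broken.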
@hekumar - I saw an error message related to "delete volume", but no error message for "attach volume".
I tried reproducing this in an AWS cluster and got the following metric after the attach operation failed a number of times:

storage_operation_errors_total{operation_name="volume_attach",volume_plugin="kubernetes.io/aws-ebs"} 2

I then deleted a PVC that was actively in use by a pod and got:

storage_operation_errors_total{operation_name="volume_delete",volume_plugin="kubernetes.io/aws-ebs"} 2

So I can't reproduce this problem. Can you post logs showing that errors did happen and that the metrics were not recorded?
https://github.com/openshift/origin/pull/18442
This passed on the following version:

oc v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-12-213.ec2.internal:8443
openshift v3.9.0-0.47.0
kubernetes v1.9.1+a0ce1bc657

# TYPE storage_operation_errors_total counter
storage_operation_errors_total{operation_name="volume_attach",volume_plugin="kubernetes.io/aws-ebs"} 8
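A check like the one above can be scripted against saved metrics output. The following is a minimal sketch, assuming the metrics text has already been captured as a string; how it is fetched is left out. The helper names `find_samples` and `error_count` are hypothetical, and the parser only handles simple sample lines of the shape shown in this report.

```python
import re

# Matches "name{labels} value"; the label block is optional.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one label pair such as operation_name="volume_attach".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def find_samples(text, metric_name):
    """Return a list of (labels_dict, float_value) for the given metric name."""
    results = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric_name:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        results.append((labels, float(m.group("value"))))
    return results


def error_count(text, operation_name, volume_plugin):
    """Return the error counter for one operation/plugin pair, or None if absent."""
    for labels, value in find_samples(text, "storage_operation_errors_total"):
        if (labels.get("operation_name") == operation_name
                and labels.get("volume_plugin") == volume_plugin):
            return value
    return None


# Sample text copied from the verification output in this report.
SAMPLE = (
    "# TYPE storage_operation_errors_total counter\n"
    'storage_operation_errors_total{operation_name="volume_attach",'
    'volume_plugin="kubernetes.io/aws-ebs"} 8\n'
)

attach_errors = error_count(SAMPLE, "volume_attach", "kubernetes.io/aws-ebs")
detach_errors = error_count(SAMPLE, "volume_detach", "kubernetes.io/aws-ebs")
```

Returning `None` for a missing series, rather than 0, keeps "no error counted yet" distinct from an explicit zero value. Note that the plugin label in the emitted metric is "kubernetes.io/aws-ebs", not "aws-ebs".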
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489