Bug 1951952 - [AWS CSI Migration] Metrics for cloudprovider error requests are lost
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: melbeher
QA Contact: Qin Ping
URL:
Whiteboard:
Duplicates: 1956791
Depends On:
Blocks:
Reported: 2021-04-21 08:29 UTC by Qin Ping
Modified: 2021-07-27 23:02 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 23:02:20 UTC
Target Upstream Version:




Links
System ID | Private | Priority | Status | Summary | Last Updated
Github openshift aws-ebs-csi-driver-operator pull 125 | 0 | None | open | Bug 1951952: Metrics for cloudprovider error requests are lost | 2021-05-12 09:11:41 UTC
Red Hat Product Errata RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 23:02:40 UTC

Description Qin Ping 2021-04-21 08:29:51 UTC
Description of problem:
When AWS CSI migration is enabled, the metrics for cloudprovider error requests appear to be lost.

Version-Release number of selected component (if applicable):
4.8.0-fc.0

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster on AWS
2. Enable CSI migration
3. Create a PVC with the gp2 StorageClass
4. Expand this PVC twice

Actual results:
Resizing failed with the following error:
  Warning  VolumeResizeFailed  1s                  external-resizer ebs.csi.aws.com  (combined from similar events): resize volume "pvc-f6286614-1d0f-411d-afb3-323d6a4c605b" by resizer "ebs.csi.aws.com" failed: rpc error: code = Internal desc = Could not resize volume "vol-064cf733d700d2365": could not modify AWS volume "vol-064cf733d700d2365": VolumeModificationRateExceeded: You've reached the maximum modification rate per volume limit. Wait at least 6 hours between modifications per EBS volume.
           status code: 400, request id: d58b1b51-00d2-4e26-a1cc-380a6b3b182e

Check the metrics:
Neither the "cloudprovider_aws_api_request_errors" nor the "csi_sidecar_operations_errors" metrics are present.


Expected results:
The metrics for error requests and operations should still be available.
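
The metrics check described under "Actual results" could be scripted roughly as follows. This is only a sketch: it assumes the Prometheus instant-query response for each metric name has already been fetched as JSON text, and the function name `missing_metrics` and the sample payloads are hypothetical, not taken from this bug.

```python
import json


def missing_metrics(responses):
    """Return the names of metrics whose instant-query result vector is empty.

    `responses` maps a metric name to the raw JSON text returned by the
    Prometheus /api/v1/query endpoint for that metric (assumed input shape).
    """
    missing = []
    for name, raw in responses.items():
        body = json.loads(raw)
        if body.get("status") != "success":
            raise ValueError(f"query for {name} failed: {body.get('error')}")
        # An empty result list means Prometheus has no series for this metric.
        if not body["data"]["result"]:
            missing.append(name)
    return sorted(missing)


# Hypothetical payloads illustrating the failure described in this report.
EMPTY = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
sample = {
    "cloudprovider_aws_api_request_errors": EMPTY,
    "csi_sidecar_operations_errors": EMPTY,
}
print(missing_metrics(sample))
```

With the hypothetical empty payloads above, both metric names are reported as missing, which matches the behaviour described in this report.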

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Hemant Kumar 2021-04-21 14:19:55 UTC
Yeah, this is a known issue: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/806

Comment 2 Fabio Bertinatto 2021-04-27 08:32:51 UTC
Just for the record, we had previously talked about this in a team meeting and we decided that we do need cloud metrics before CSI migration goes GA. However, it's OK to not have it in Tech Preview (4.8).

Comment 4 Jan Safranek 2021-05-04 14:22:09 UTC
*** Bug 1956791 has been marked as a duplicate of this bug. ***

Comment 7 Qin Ping 2021-05-13 06:18:15 UTC
Verified with: 4.8.0-0.nightly-2021-05-12-184904

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "en" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cloudprovider_aws_api_request_errors'|jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   876    0   876    0     0  41714      0 --:--:-- --:--:-- --:--:-- 43800
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cloudprovider_aws_api_request_errors",
          "container": "driver-kube-rbac-proxy",
          "endpoint": "driver-m",
          "instance": "10.0.162.1:9206",
          "job": "aws-ebs-csi-driver-controller-metrics",
          "namespace": "openshift-cluster-csi-drivers",
          "pod": "aws-ebs-csi-driver-controller-7d49867c85-bgbpv",
          "request": "DescribeVolumesModifications",
          "service": "aws-ebs-csi-driver-controller-metrics"
        },
        "value": [
          1620886264.24,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "cloudprovider_aws_api_request_errors",
          "container": "driver-kube-rbac-proxy",
          "endpoint": "driver-m",
          "instance": "10.0.162.1:9206",
          "job": "aws-ebs-csi-driver-controller-metrics",
          "namespace": "openshift-cluster-csi-drivers",
          "pod": "aws-ebs-csi-driver-controller-7d49867c85-bgbpv",
          "request": "ModifyVolume",
          "service": "aws-ebs-csi-driver-controller-metrics"
        },
        "value": [
          1620886264.24,
          "9"
        ]
      }
    ]
  }
}
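
For anyone repeating this verification, a response like the one above could be summarised per AWS API call with a short script. This is a minimal sketch, assuming the JSON has been saved as text; the function name `errors_by_request` is hypothetical, and the sample below is trimmed to the `request` label and value from the output above.

```python
import json


def errors_by_request(raw):
    """Sum the error counts in a Prometheus instant-query response per `request` label."""
    body = json.loads(raw)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    totals = {}
    for series in body["data"]["result"]:
        request = series["metric"].get("request", "<unknown>")
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, value = series["value"]
        totals[request] = totals.get(request, 0.0) + float(value)
    return totals


# Trimmed copy of the verification output (only the labels used here).
raw = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "cloudprovider_aws_api_request_errors",
                  "request": "DescribeVolumesModifications"},
       "value": [1620886264.24, "1"]},
      {"metric": {"__name__": "cloudprovider_aws_api_request_errors",
                  "request": "ModifyVolume"},
       "value": [1620886264.24, "9"]}
    ]
  }
}
"""
print(errors_by_request(raw))
# prints {'DescribeVolumesModifications': 1.0, 'ModifyVolume': 9.0}
```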

Comment 10 errata-xmlrpc 2021-07-27 23:02:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

