Bug 1951952
| Summary: | [AWS CSI Migration] Metrics for cloudprovider error requests are lost | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qin Ping <piqin> |
| Component: | Storage | Assignee: | melbeher |
| Storage sub component: | Operators | QA Contact: | Qin Ping <piqin> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | unspecified | CC: | aos-bugs, fbertina, hekumar, jsafrane |
| Version: | 4.8 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 23:02:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Yeah, this is a known issue: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/806

Just for the record, we previously talked about this in a team meeting and decided that we do need cloud metrics before CSI migration goes GA. However, it's OK not to have them in Tech Preview (4.8).

*** Bug 1956791 has been marked as a duplicate of this bug. ***

Verified with: 4.8.0-0.nightly-2021-05-12-184904

    $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "en" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=cloudprovider_aws_api_request_errors'|jq
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   876    0   876    0     0  41714      0 --:--:-- --:--:-- --:--:-- 43800
    {
      "status": "success",
      "data": {
        "resultType": "vector",
        "result": [
          {
            "metric": {
              "__name__": "cloudprovider_aws_api_request_errors",
              "container": "driver-kube-rbac-proxy",
              "endpoint": "driver-m",
              "instance": "10.0.162.1:9206",
              "job": "aws-ebs-csi-driver-controller-metrics",
              "namespace": "openshift-cluster-csi-drivers",
              "pod": "aws-ebs-csi-driver-controller-7d49867c85-bgbpv",
              "request": "DescribeVolumesModifications",
              "service": "aws-ebs-csi-driver-controller-metrics"
            },
            "value": [
              1620886264.24,
              "1"
            ]
          },
          {
            "metric": {
              "__name__": "cloudprovider_aws_api_request_errors",
              "container": "driver-kube-rbac-proxy",
              "endpoint": "driver-m",
              "instance": "10.0.162.1:9206",
              "job": "aws-ebs-csi-driver-controller-metrics",
              "namespace": "openshift-cluster-csi-drivers",
              "pod": "aws-ebs-csi-driver-controller-7d49867c85-bgbpv",
              "request": "ModifyVolume",
              "service": "aws-ebs-csi-driver-controller-metrics"
            },
            "value": [
              1620886264.24,
              "9"
            ]
          }
        ]
      }
    }

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
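The verification comment checks the metric by hand with `curl` and `jq` against the Prometheus HTTP API. As a minimal sketch (not part of the original verification), the same instant-query response can be checked programmatically: the code below rebuilds the query URL from the command in this bug and sums the sample values per `request` label. The helper names `query_url` and `errors_by_request` are hypothetical; the code assumes only the documented `/api/v1/query` response shape (`status`, `data.resultType`, `data.result[].metric`, `data.result[].value`) and does not perform any HTTP calls or authentication.

```python
import json
from collections import defaultdict
from typing import Dict
from urllib.parse import urlencode

# Service URL copied from the verification command in this bug.
PROMETHEUS = "https://prometheus-k8s.openshift-monitoring.svc:9091"


def query_url(metric: str) -> str:
    """Build the instant-query URL used by `curl` in the verification step."""
    return f"{PROMETHEUS}/api/v1/query?" + urlencode({"query": metric})


def errors_by_request(response_text: str) -> Dict[str, float]:
    """Sum the sample values of an instant-query vector, keyed by the `request` label."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']!r}")
    counts: Dict[str, float] = defaultdict(float)
    for sample in data["result"]:
        request = sample["metric"].get("request", "<no request label>")
        # Each sample value is [unix_timestamp, "number encoded as a string"].
        _, value = sample["value"]
        counts[request] += float(value)
    return dict(counts)


if __name__ == "__main__":
    # Trimmed copy of the response shown in the verification comment.
    sample = json.dumps({
        "status": "success",
        "data": {"resultType": "vector", "result": [
            {"metric": {"request": "DescribeVolumesModifications"},
             "value": [1620886264.24, "1"]},
            {"metric": {"request": "ModifyVolume"},
             "value": [1620886264.24, "9"]},
        ]},
    })
    print(query_url("cloudprovider_aws_api_request_errors"))
    print(errors_by_request(sample))  # {'DescribeVolumesModifications': 1.0, 'ModifyVolume': 9.0}
```

When a metric does not exist at all, which matches the symptom in the original report, Prometheus returns a successful query with an empty `result` vector, so `errors_by_request` returns `{}` rather than raising.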
Description of problem:
When AWS CSI migration is enabled, the metrics for cloudprovider error requests appear to be lost.

Version-Release number of selected component (if applicable):
4.8.0-fc.0

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster on AWS.
2. Enable CSI migration.
3. Create a PVC with the gp2 StorageClass.
4. Expand this PVC twice.

Actual results:
Resizing failed with the following error:

    Warning VolumeResizeFailed 1s external-resizer ebs.csi.aws.com (combined from similar events): resize volume "pvc-f6286614-1d0f-411d-afb3-323d6a4c605b" by resizer "ebs.csi.aws.com" failed: rpc error: code = Internal desc = Could not resize volume "vol-064cf733d700d2365": could not modify AWS volume "vol-064cf733d700d2365": VolumeModificationRateExceeded: You've reached the maximum modification rate per volume limit. Wait at least 6 hours between modifications per EBS volume. status code: 400, request id: d58b1b51-00d2-4e26-a1cc-380a6b3b182e

Checking the metrics: neither "cloudprovider_aws_api_request_errors" nor "csi_sidecar_operations_errors" metrics are reported.

Expected results:
Metrics about the failed requests or operations are still available.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
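A rough sketch of steps 3-4 and of why the second expansion produces an error request: the code builds `oc patch` commands that raise a PVC's requested storage, and models the per-volume rule quoted in the AWS error ("Wait at least 6 hours between modifications per EBS volume"). Everything here is an illustrative assumption: the PVC name `test-pvc`, the namespace `default`, the sizes, and the helper names are hypothetical, and the rate-limit model is a simplification of the error text, not AWS's actual implementation. Nothing is executed against a cluster or AWS.

```python
import json
import shlex
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Taken from the AWS error in this report:
# "Wait at least 6 hours between modifications per EBS volume."
MIN_MODIFY_INTERVAL = timedelta(hours=6)


def resize_command(pvc: str, namespace: str, size: str) -> List[str]:
    """Build an `oc patch` command that raises a PVC's requested storage (step 4)."""
    patch = {"spec": {"resources": {"requests": {"storage": size}}}}
    return ["oc", "patch", "pvc", pvc, "-n", namespace,
            "--type", "merge", "-p", json.dumps(patch)]


def modification_allowed(last: Optional[datetime], now: datetime) -> bool:
    """Simplified model of the per-volume rate limit quoted in the error."""
    return last is None or now - last >= MIN_MODIFY_INTERVAL


def expansion_outcomes(attempts: List[datetime]) -> List[bool]:
    """Return, for each attempt, whether the modeled rate limit would allow it."""
    last: Optional[datetime] = None
    outcomes = []
    for attempt in attempts:
        ok = modification_allowed(last, attempt)
        outcomes.append(ok)
        if ok:
            last = attempt  # modeling assumption: only accepted modifications restart the window
    return outcomes


if __name__ == "__main__":
    # Hypothetical PVC name, namespace, and sizes for the two expansions.
    for size in ("2Gi", "3Gi"):
        print(shlex.join(resize_command("test-pvc", "default", size)))
    start = datetime(2021, 5, 12, tzinfo=timezone.utc)
    # Two expansions five minutes apart: the second one is rejected.
    print(expansion_outcomes([start, start + timedelta(minutes=5)]))  # [True, False]
```

In this model the second `ModifyVolume` call is rejected, which is the kind of failed cloud request that `cloudprovider_aws_api_request_errors` is expected to count.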