Created attachment 1902877
csi-snapshotter log

Description of problem:
Snapshot content takes more than 4 minutes to reach ready status.

Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-08-01-151317

How reproducible:
Always

Steps to Reproduce:
1. Create an Alicloud cluster with the details below.
   template: aos-4_12/ipi-on-alicloud/versioned-installer-fips-ovn-ci
2. Create a pvc and a pod. Wait until the pod reaches the Ready state.
3. Create a volumesnapshot from the default volumesnapshot class (alicloud-disk).
4. Check that the volumesnapshot content reaches Ready status.

Actual results:
READYTOUSE stays false for about 4 minutes. During that time, before it becomes true, the following message is displayed:
"Failed to check and update snapshot content: failed to take snapshot of the volume d-0xi7rswngantuq1ux2vm: "rpc error: code = Unknown desc = CreateSnapshot: snapshot create request limit snapshot-072ac6a4-5a78-475f-947c-9dd00033189c""

Expected results:
It should not take this long to become ready, compared with other platforms.

Additional info:

pvc_pod.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: testropatil
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: alicloud-disk
  volumeMode: Filesystem
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: testropatil
spec:
  containers:
  - image: quay.io/openshifttest/hello-openshift@sha256:b1aabe8c8272f750ce757b6c4263a2712796297511e0c6df79144ee188933623
    name: mypod
    volumeMounts:
    - mountPath: "/mnt/storage"
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mypvc

rohitpatil@ropatil-mac Downloads % oc get pvc,pod -n testropatil -o wide
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE     VOLUMEMODE
persistentvolumeclaim/mypvc   Bound    pvc-804892c5-7749-4b54-b325-b83426067cab   30Gi       RWO            alicloud-disk   7m22s   Filesystem

NAME        READY   STATUS    RESTARTS   AGE     IP            NODE                                          NOMINATED NODE   READINESS GATES
pod/mypod   1/1     Running   0          7m22s   10.128.2.34   ropatil28-ali-vbhb5-worker-us-east-1b-sm4tc   <none>           <none>

vss.yaml

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
  namespace: testropatil
spec:
  source:
    persistentVolumeClaimName: mypvc
  volumeSnapshotClassName: alicloud-disk

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME          READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
my-snapshot   false        mypvc                               30Gi          alicloud-disk   snapcontent-072ac6a4-5a78-475f-947c-9dd00033189c   14s            15s

rohitpatil@ropatil-mac Downloads % oc describe volumesnapshot -n testropatil
Name:         my-snapshot
Namespace:    testropatil
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-08-02T11:58:33Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
    f:time:
    f:readyToUse:
    f:restoreSize:
    Manager:      Go-http-client
    Operation:    Update
    Subresource:  status
    Time:         2022-08-02T11:58:38Z
Spec:
  Source:
    Persistent Volume Claim Name:  mypvc
  Volume Snapshot Class Name:      alicloud-disk
Status:
  Bound Volume Snapshot Content Name:  snapcontent-072ac6a4-5a78-475f-947c-9dd00033189c
  Creation Time:                       2022-08-02T11:58:34Z
  Error:
    Message:     Failed to check and update snapshot content: failed to take snapshot of the volume d-0xi7rswngantuq1ux2vm: "rpc error: code = Unknown desc = CreateSnapshot: snapshot create request limit snapshot-072ac6a4-5a78-475f-947c-9dd00033189c"
    Time:        2022-08-02T11:58:38Z
  Ready To Use:  false
  Restore Size:  30Gi
Events:
  Type    Reason            Age   From                 Message
  ----    ------            ----  ----                 -------
  Normal  CreatingSnapshot  33s   snapshot-controller  Waiting for a snapshot testropatil/my-snapshot to be created by the CSI driver.
  Normal  SnapshotCreated   28s   snapshot-controller  Snapshot testropatil/my-snapshot was successfully created by the CSI driver.

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME          READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
my-snapshot   true         mypvc                               30Gi          alicloud-disk   snapcontent-072ac6a4-5a78-475f-947c-9dd00033189c   7m22s          7m23s

// It took about 4 minutes to reach SnapshotReady status.

rohitpatil@ropatil-mac Downloads % oc describe volumesnapshot -n testropatil
Events:
  Type    Reason            Age    From                 Message
  ----    ------            ----   ----                 -------
  Normal  CreatingSnapshot  7m38s  snapshot-controller  Waiting for a snapshot testropatil/my-snapshot to be created by the CSI driver.
  Normal  SnapshotCreated   7m33s  snapshot-controller  Snapshot testropatil/my-snapshot was successfully created by the CSI driver.
  Normal  SnapshotReady     3m38s  snapshot-controller  Snapshot testropatil/my-snapshot is ready to use.

Attaching the csi-snapshotter log and must-gather logs.
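For reference, the delay can be read off the event ages in the last `oc describe` output. The snippet below is only a rough sketch: `parse_age` and `elapsed_between` are hypothetical helper names, and the ages printed by `oc` are rounded, so the result is approximate.

```python
import re

# Seconds per unit used in oc/kubectl style ages such as "7m33s".
UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_age(age: str) -> int:
    """Convert an age string like '7m33s' into seconds."""
    parts = re.findall(r"(\d+)([dhms])", age)
    # Reject input that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != age:
        raise ValueError(f"unrecognized age: {age!r}")
    return sum(int(n) * UNITS[u] for n, u in parts)


def elapsed_between(earlier_event_age: str, later_event_age: str) -> int:
    """Seconds between two events, given how long ago each one happened."""
    return parse_age(earlier_event_age) - parse_age(later_event_age)


# SnapshotCreated was reported 7m33s ago, SnapshotReady 3m38s ago.
gap = elapsed_between("7m33s", "3m38s")
print(f"{gap // 60}m{gap % 60}s")  # prints 3m55s
```

So the snapshot spent roughly 3m55s between SnapshotCreated and SnapshotReady, which matches the "about 4 minutes" observed above.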
Note: The snapshot is still being created in the default resource group instead of our newly created resource group.
Flexy template: aos-4_12/ipi-on-alicloud/versioned-installer-fips-ovn-ci
Resource group id auto-created by the cluster: rg-aekzbska3jozjzy
Job link: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/126891/parameters/
Default resource group id: rg-acfnw6kdej3hyai (created on Nov 11 2021)

1. Create a cluster with the details above, without setting resource_group_id in the launcher variables.
2. Create a pvc and a deployment; wait until the deployment is running.
3. Create a volumesnapshot from the default volumesnapshotclass.
4. Check the volumesnapshot content: it is created in the default resource group (rg-acfnw6kdej3hyai) and not in the newly generated resource group (rg-aekzbska3jozjzy).

Expected result:
The volumesnapshot should have been created inside the newly generated resource group (rg-aekzbska3jozjzy).
I don't think we can fix the speed of the CSI driver, but the snapshots should be created in the right resource group.
The Alibaba CSI driver accepts the parameter `resourceGroupId` in VolumeSnapshotClass. We need to fix the operator to set it.
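To illustrate what "set it" could look like, here is a minimal sketch that builds a VolumeSnapshotClass manifest carrying `resourceGroupId`. It is not the operator's actual code: `render_volume_snapshot_class` is a hypothetical helper, and the class name and driver are taken from this bug's own output. It uses only the standard library and prints JSON, which `oc apply -f` also accepts.

```python
import json


def render_volume_snapshot_class(resource_group_id: str = "") -> dict:
    """Build a default VolumeSnapshotClass with resourceGroupId set.

    An empty string is an assumption for "no resource group known";
    how the real operator discovers the cluster's resource group is
    not shown here.
    """
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshotClass",
        "metadata": {
            "name": "alicloud-disk",
            "annotations": {
                "snapshot.storage.kubernetes.io/is-default-class": "true",
            },
        },
        "driver": "diskplugin.csi.alibabacloud.com",
        "deletionPolicy": "Delete",
        # The parameter the Alibaba CSI driver reads when creating snapshots.
        "parameters": {"resourceGroupId": resource_group_id},
    }


# Example with the cluster-created resource group from this report.
manifest = render_volume_snapshot_class("rg-aekzbska3jozjzy")
print(json.dumps(manifest, indent=2))
```

The point of the sketch is only that the resource group travels as a class parameter, so the fix belongs in whatever generates the default VolumeSnapshotClass rather than in the driver.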
Verified: PASS
Payload: 4.12.0-0.nightly-2022-08-09-223806

Scenarios executed:
1) With resource group rg-aekz744torld7py (cluster-created resource group) set in sc.yaml/vssc.yaml: checked in the backend that it was created in rg-aekz744torld7py.
2) With resource group rg-acfnw6kdej3hyai (default resource group, created in Nov 2021) set in sc.yaml/vssc.yaml: checked in the backend that it was created in the default rg-acfnw6kdej3hyai.
3) Without a resource group set in sc.yaml/volumesnapshotclass.yaml, or using the default sc/vssc: checked in the backend that it was created in the default rg-acfnw6kdej3hyai.

With resource group rg-aekz744torld7py (cluster-created resource group) set in sc.yaml, checked in the backend that it was created in the cluster resource group rg-aekz744torld7py.

sc.yaml

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysc
parameters:
  type: available
  volumeSizeAutoAvailable: "true"
  readOnly: "false"
  resourceGroupId: rg-aekz744torld7py
provisioner: diskplugin.csi.alibabacloud.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

vssc.yaml

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: alicloud-disk1
driver: diskplugin.csi.alibabacloud.com
deletionPolicy: Delete
parameters:
  resourceGroupId: rg-aekz744torld7py

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME         READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
mysnapshot   true         mypvc-csi                           20Gi          alicloud-disk1   snapcontent-ccd2c9f3-e971-40f5-8f2b-09f078268ec6   26s            26s

With resource group rg-acfnw6kdej3hyai (default resource group, created in Nov 2021) set in sc.yaml, checked in the backend that it was created in the default rg-acfnw6kdej3hyai.

sc.yaml

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysc
parameters:
  type: available
  volumeSizeAutoAvailable: "true"
  readOnly: "false"
  resourceGroupId: rg-acfnw6kdej3hyai
provisioner: diskplugin.csi.alibabacloud.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

vssc.yaml

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: alicloud-disk1
driver: diskplugin.csi.alibabacloud.com
deletionPolicy: Delete
parameters:
  resourceGroupId: rg-acfnw6kdej3hyai

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME         READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
mysnapshot   true         mypvc-csi                           20Gi          alicloud-disk1   snapcontent-af661dc6-8541-417f-ac08-ffefd0fdd9b3   39s            40s

Without a resource group set in sc.yaml/volumesnapshotclass.yaml, or using the default sc/vssc, the snapshot was created in the default rg-acfnw6kdej3hyai.

default sc.yaml

- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    name: alicloud-disk
  parameters:
    type: available
    volumeSizeAutoAvailable: "true"
  provisioner: diskplugin.csi.alibabacloud.com
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer

default vssc.yaml

- apiVersion: snapshot.storage.k8s.io/v1
  deletionPolicy: Delete
  driver: diskplugin.csi.alibabacloud.com
  kind: VolumeSnapshotClass
  metadata:
    annotations:
      snapshot.storage.kubernetes.io/is-default-class: "true"
    name: alicloud-disk
  parameters:
    resourceGroupId: ""

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME         READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
mysnapshot   true         mypvc-csi                           20Gi          alicloud-disk   snapcontent-dca48a8b-eaea-4b62-8fe7-7dfa40448160   2m32s          2m32s
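The three verified scenarios boil down to one rule, sketched below. This only summarizes the results observed in this comment and is not taken from the driver's source: `expected_resource_group` is a hypothetical helper, and the fallback to the account's default resource group is inferred from scenario 3.

```python
# Default resource group of the test account, as reported in this bug.
DEFAULT_RESOURCE_GROUP = "rg-acfnw6kdej3hyai"


def expected_resource_group(parameters: dict,
                            default_group: str = DEFAULT_RESOURCE_GROUP) -> str:
    """Resource group a snapshot is expected to land in.

    A missing or empty resourceGroupId falls back to the default group
    (assumption based on the observed behavior in scenario 3).
    """
    return parameters.get("resourceGroupId") or default_group


# (class parameters, resource group observed in the backend)
scenarios = [
    ({"resourceGroupId": "rg-aekz744torld7py"}, "rg-aekz744torld7py"),
    ({"resourceGroupId": "rg-acfnw6kdej3hyai"}, "rg-acfnw6kdej3hyai"),
    ({"resourceGroupId": ""}, "rg-acfnw6kdej3hyai"),
]

for params, observed in scenarios:
    status = "PASS" if expected_resource_group(params) == observed else "FAIL"
    print(status, params, "->", observed)
```

Note that the default vssc in the third scenario carries an empty `resourceGroupId`, which is why the snapshot still ends up in the default resource group on this cluster.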
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399