Bug 2114009 - [4.12 Alicloud Snapshot] Snapshot content takes 4min+ to become ready, and volume/snapshot content is created in the default resource group ID
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.12
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.12.0
Assignee: Jan Safranek
QA Contact: Rohit Patil

Reported: 2022-08-02 14:41 UTC by Rohit Patil
Modified: 2023-01-17 19:54 UTC
CC List: 3 users

Last Closed: 2023-01-17 19:54:07 UTC




Links
- GitHub openshift/alibaba-disk-csi-driver-operator pull 35 (Merged): "Bug 2114009: Add VolumeSnapshotClassController". Last updated 2022-08-10 11:23:16 UTC
- Red Hat Product Errata RHSA-2022:7399. Last updated 2023-01-17 19:54:22 UTC

Description Rohit Patil 2022-08-02 14:41:52 UTC
Created attachment 1902877: csi-snapshotter log

Description of problem:
The VolumeSnapshotContent takes more than 4 minutes to reach Ready status.

Version-Release number of selected component (if applicable): 
4.12.0-0.nightly-2022-08-01-151317

How reproducible: Always 

Steps to Reproduce:
1. Create an Alicloud cluster with the following details:
   template: aos-4_12/ipi-on-alicloud/versioned-installer-fips-ovn-ci
2. Create a PVC and a pod, and wait until the pod is ready.
3. Create a VolumeSnapshot using the default VolumeSnapshotClass (alicloud-disk).
4. Wait for the VolumeSnapshotContent to reach Ready status.

Actual results:
READYTOUSE stays false for more than 4 minutes. During that time the following error is reported, before the snapshot eventually becomes ready:
"Failed to check and update snapshot content: failed to take snapshot of the volume d-0xi7rswngantuq1ux2vm: "rpc error: code = Unknown desc = CreateSnapshot: snapshot create request limit snapshot-072ac6a4-5a78-475f-947c-9dd00033189c""

Expected results:
The snapshot should become ready (READYTOUSE true) in a time comparable to other platforms.

Additional info:
pvc_pod.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: testropatil
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi
  storageClassName: alicloud-disk
  volumeMode: Filesystem
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: testropatil
spec:
  containers:
  - image: quay.io/openshifttest/hello-openshift@sha256:b1aabe8c8272f750ce757b6c4263a2712796297511e0c6df79144ee188933623
    name: mypod
    volumeMounts:
    - mountPath: "/mnt/storage"
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mypvc

rohitpatil@ropatil-mac Downloads % oc get pvc,pod -n testropatil -o wide
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE     VOLUMEMODE
persistentvolumeclaim/mypvc   Bound    pvc-804892c5-7749-4b54-b325-b83426067cab   30Gi       RWO            alicloud-disk   7m22s   Filesystem

NAME        READY   STATUS    RESTARTS   AGE     IP            NODE                                          NOMINATED NODE   READINESS GATES
pod/mypod   1/1     Running   0          7m22s   10.128.2.34   ropatil28-ali-vbhb5-worker-us-east-1b-sm4tc   <none>           <none>

vss.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-snapshot
  namespace: testropatil
spec:
  source:
    persistentVolumeClaimName: mypvc
  volumeSnapshotClassName: alicloud-disk

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME          READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
my-snapshot   false        mypvc                               30Gi          alicloud-disk   snapcontent-072ac6a4-5a78-475f-947c-9dd00033189c   14s            15s

rohitpatil@ropatil-mac Downloads % oc describe volumesnapshot -n testropatil
Name:         my-snapshot
Namespace:    testropatil
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1
Kind:         VolumeSnapshot
    Manager:      Go-http-client
    Operation:    Update
    Time:         2022-08-02T11:58:33Z
    API Version:  snapshot.storage.k8s.io/v1
    Fields Type:  FieldsV1
          f:time:
        f:readyToUse:
        f:restoreSize:
    Manager:         Go-http-client
    Operation:       Update
    Subresource:     status
    Time:            2022-08-02T11:58:38Z
Spec:
  Source:
    Persistent Volume Claim Name:  mypvc
  Volume Snapshot Class Name:      alicloud-disk
Status:
  Bound Volume Snapshot Content Name:  snapcontent-072ac6a4-5a78-475f-947c-9dd00033189c
  Creation Time:                       2022-08-02T11:58:34Z
  Error:
    Message:     Failed to check and update snapshot content: failed to take snapshot of the volume d-0xi7rswngantuq1ux2vm: "rpc error: code = Unknown desc = CreateSnapshot: snapshot create request limit snapshot-072ac6a4-5a78-475f-947c-9dd00033189c"
    Time:        2022-08-02T11:58:38Z
  Ready To Use:  false
  Restore Size:  30Gi
Events:
  Type    Reason            Age   From                 Message
  ----    ------            ----  ----                 -------
  Normal  CreatingSnapshot  33s   snapshot-controller  Waiting for a snapshot testropatil/my-snapshot to be created by the CSI driver.
  Normal  SnapshotCreated   28s   snapshot-controller  Snapshot testropatil/my-snapshot was successfully created by the CSI driver.

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME          READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
my-snapshot   true         mypvc                               30Gi          alicloud-disk   snapcontent-072ac6a4-5a78-475f-947c-9dd00033189c   7m22s          7m23s

// It took about 4 minutes to become ready: CreatingSnapshot was 7m38s ago and SnapshotReady 3m38s ago.
rohitpatil@ropatil-mac Downloads % oc describe volumesnapshot -n testropatil
Events:
  Type    Reason            Age    From                 Message
  ----    ------            ----   ----                 -------
  Normal  CreatingSnapshot  7m38s  snapshot-controller  Waiting for a snapshot testropatil/my-snapshot to be created by the CSI driver.
  Normal  SnapshotCreated   7m33s  snapshot-controller  Snapshot testropatil/my-snapshot was successfully created by the CSI driver.
  Normal  SnapshotReady     3m38s  snapshot-controller  Snapshot testropatil/my-snapshot is ready to use.

Attaching the csi-snapshotter log and the must-gather logs.

Comment 1 Rohit Patil 2022-08-02 14:48:29 UTC
Note: the snapshot is still created in the default resource group instead of the resource group newly created for the cluster.

Comment 2 Rohit Patil 2022-08-03 05:47:16 UTC
Flexy template: aos-4_12/ipi-on-alicloud/versioned-installer-fips-ovn-ci
Resource group ID created automatically for the cluster: rg-aekzbska3jozjzy
Job link: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/126891/parameters/ 

Default resource group ID: rg-acfnw6kdej3hyai (created on Nov 11, 2021)

1. Create a cluster with the details above, without setting resource_group_id in the launcher variables.
2. Create a PVC and a deployment, and wait until the deployment is running.
3. Create a VolumeSnapshot from the default VolumeSnapshotClass.
4. Observe that the VolumeSnapshotContent is created in the default resource group (rg-acfnw6kdej3hyai), not in the newly generated resource group (rg-aekzbska3jozjzy).

Expected result: 
The snapshot should be created in the newly generated resource group (rg-aekzbska3jozjzy).

Comment 3 Jan Safranek 2022-08-05 14:09:26 UTC
I don't think we can fix the speed of the CSI driver, but snapshots should be created in the right resource group.

Comment 4 Jan Safranek 2022-08-09 10:52:53 UTC
The Alibaba CSI driver accepts a `resourceGroupId` parameter in the VolumeSnapshotClass. We need to fix the operator to set it.
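The operator-side fix can be pictured with a minimal Go sketch. It is not the code from the merged pull request, which is not reproduced in this report: `snapshotClassSpec` and `desiredSnapshotClass` are hypothetical names, the struct is a simplified stand-in for the real VolumeSnapshotClass type, and how the operator discovers the cluster's resource group ID is left out as an assumption. The class name, driver name, deletion policy, and `resourceGroupId` key match the default VolumeSnapshotClass shown later in this report.

```go
package main

import "fmt"

// snapshotClassSpec is a hypothetical, trimmed-down stand-in for the
// VolumeSnapshotClass fields the operator manages. It is NOT the real
// snapshot.storage.k8s.io/v1 Go type.
type snapshotClassSpec struct {
	Name           string
	Driver         string
	DeletionPolicy string
	Parameters     map[string]string
}

// desiredSnapshotClass (hypothetical) builds the default VolumeSnapshotClass
// and passes the cluster's resource group ID to the CSI driver through the
// resourceGroupId parameter. Where the ID comes from is assumed, not shown.
func desiredSnapshotClass(resourceGroupID string) snapshotClassSpec {
	return snapshotClassSpec{
		Name:           "alicloud-disk",
		Driver:         "diskplugin.csi.alibabacloud.com",
		DeletionPolicy: "Delete",
		Parameters: map[string]string{
			"resourceGroupId": resourceGroupID,
		},
	}
}

func main() {
	vsc := desiredSnapshotClass("rg-aekzbska3jozjzy")
	fmt.Println(vsc.Parameters["resourceGroupId"]) // rg-aekzbska3jozjzy
}
```

With an empty ID the parameter is rendered as `resourceGroupId: ""`, which is what the default alicloud-disk class looks like in the verification below.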

Comment 6 Rohit Patil 2022-08-10 13:46:52 UTC
Verified: PASS 
Payload: 4.12.0-0.nightly-2022-08-09-223806  

Scenarios executed: 
1) resourceGroupId rg-aekz744torld7py (the resource group created for the cluster) set in sc.yaml/vssc.yaml: the backend shows the snapshot created in rg-aekz744torld7py.
2) resourceGroupId rg-acfnw6kdej3hyai (the default resource group, created in Nov 2021) set in sc.yaml/vssc.yaml: the backend shows the snapshot created in the default rg-acfnw6kdej3hyai.
3) No resource group in sc.yaml/volumesnapshotclass.yaml, or the default StorageClass/VolumeSnapshotClass used: the backend shows the snapshot created in the default rg-acfnw6kdej3hyai.
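The three verified outcomes reduce to a simple rule: a non-empty `resourceGroupId` decides where the snapshot lands, and an absent or empty one falls back to the account's default resource group. The Go sketch below models only what was observed in the backend. It is not the Alibaba CSI driver's code, this report does not say whether the fallback happens in the driver or in the Alibaba Cloud API, and `expectedResourceGroup` is a hypothetical helper.

```go
package main

import "fmt"

// defaultResourceGroup is the account's default resource group from this report.
const defaultResourceGroup = "rg-acfnw6kdej3hyai"

// expectedResourceGroup (hypothetical) models the placement observed during
// verification: a non-empty resourceGroupId parameter wins; a missing or empty
// one falls back to the default resource group.
func expectedResourceGroup(params map[string]string) string {
	if rg := params["resourceGroupId"]; rg != "" {
		return rg
	}
	return defaultResourceGroup
}

func main() {
	scenarios := []struct {
		name   string
		params map[string]string
	}{
		{"1) cluster resource group", map[string]string{"resourceGroupId": "rg-aekz744torld7py"}},
		{"2) default resource group", map[string]string{"resourceGroupId": "rg-acfnw6kdej3hyai"}},
		{"3) default vssc (empty value)", map[string]string{"resourceGroupId": ""}},
	}
	for _, s := range scenarios {
		// Print the resource group each verified scenario is expected to use.
		fmt.Printf("%s -> %s\n", s.name, expectedResourceGroup(s.params))
	}
}
```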

Scenario 1: with resourceGroupId rg-aekz744torld7py (the resource group created for the cluster) set in sc.yaml and vssc.yaml, the backend shows the snapshot in rg-aekz744torld7py.
sc.yaml 
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysc
parameters:
  type: available
  volumeSizeAutoAvailable: "true"
  readOnly: "false"
  resourceGroupId: rg-aekz744torld7py
provisioner: diskplugin.csi.alibabacloud.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

vssc.yaml 
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: alicloud-disk1
driver: diskplugin.csi.alibabacloud.com
deletionPolicy: Delete
parameters:
  resourceGroupId: rg-aekz744torld7py

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME         READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
mysnapshot   true         mypvc-csi                           20Gi          alicloud-disk1   snapcontent-ccd2c9f3-e971-40f5-8f2b-09f078268ec6   26s            26s


Scenario 2: with resourceGroupId rg-acfnw6kdej3hyai (the default resource group, created in Nov 2021) set in sc.yaml and vssc.yaml, the backend shows the snapshot in the default rg-acfnw6kdej3hyai.
sc.yaml 
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mysc
parameters:
  type: available
  volumeSizeAutoAvailable: "true"
  readOnly: "false"
  resourceGroupId: rg-acfnw6kdej3hyai
provisioner: diskplugin.csi.alibabacloud.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

vssc.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: alicloud-disk1
driver: diskplugin.csi.alibabacloud.com
deletionPolicy: Delete
parameters:
  resourceGroupId: rg-acfnw6kdej3hyai

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME         READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS    SNAPSHOTCONTENT                                    CREATIONTIME   AGE
mysnapshot   true         mypvc-csi                           20Gi          alicloud-disk1   snapcontent-af661dc6-8541-417f-ac08-ffefd0fdd9b3   39s            40s


Scenario 3: with no resource group in sc.yaml/volumesnapshotclass.yaml, using the default StorageClass and VolumeSnapshotClass, the snapshot is created in the default rg-acfnw6kdej3hyai.
default sc.yaml 
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    name: alicloud-disk
  parameters:
    type: available
    volumeSizeAutoAvailable: "true"
  provisioner: diskplugin.csi.alibabacloud.com
  reclaimPolicy: Delete
  volumeBindingMode: WaitForFirstConsumer

default vssc.yaml 
- apiVersion: snapshot.storage.k8s.io/v1
  deletionPolicy: Delete
  driver: diskplugin.csi.alibabacloud.com
  kind: VolumeSnapshotClass
  metadata:
    annotations:
      snapshot.storage.kubernetes.io/is-default-class: "true"
    name: alicloud-disk
  parameters:
    resourceGroupId: ""

rohitpatil@ropatil-mac Downloads % oc get volumesnapshot -n testropatil
NAME         READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
mysnapshot   true         mypvc-csi                           20Gi          alicloud-disk   snapcontent-dca48a8b-eaea-4b62-8fe7-7dfa40448160   2m32s          2m32s

Comment 9 errata-xmlrpc 2023-01-17 19:54:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

