Bug 1948474

Summary: [azure disk csi operator] one PV is stuck in “Released” status
Product: OpenShift Container Platform
Reporter: Qin Ping <piqin>
Component: Storage
Assignee: Fabio Bertinatto <fbertina>
Storage sub component: Operators
QA Contact: Wei Duan <wduan>
Status: CLOSED WORKSFORME
Docs Contact:
Severity: medium
Priority: unspecified
CC: aos-bugs, jsafrane
Version: 4.8
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1952931
Environment:
Last Closed: 2021-06-02 07:10:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1952931

Description Qin Ping 2021-04-12 09:24:27 UTC
Description of problem:
After running the CSI verification test, one PV is stuck in “Released” status.

Version-Release number of selected component (if applicable):
4.8.0-0.nightly-2021-04-09-222447

How reproducible:
Hit once; have not tried to reproduce it yet.

Steps to Reproduce:
1. Set up an OCP 4.8 cluster on Azure
2. Enable the TechPreviewNoUpgrade feature set
3. Run the CSI verification test manually (with the test image included in the payload image)
   # openshift-tests run openshift/csi
4. When the test finishes, check whether any resources were left behind
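
The check in step 4 can be sketched as a small shell filter. This is a hypothetical helper, not part of openshift-tests; it assumes the default table layout of `oc get pv`, in which STATUS is the fifth whitespace-separated field of every data row.

```shell
# Hypothetical helper: reads `oc get pv` table output on stdin and prints
# the names of PVs whose STATUS column is "Released".
# Assumes the default column layout, where STATUS is the fifth field
# of each data row (the header row is skipped).
list_released_pvs() {
  awk 'NR > 1 && $5 == "Released" { print $1 }'
}

# Usage against a live cluster:
#   oc get pv | list_released_pvs
```

An empty result means no PV is in the “Released” phase; PVs that are still “Bound” would need a separate check.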

Actual results:
One PV is stuck in “Released” status.
$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                         STORAGECLASS                      REASON   AGE
pvc-537e7872-6402-4923-b7d3-619873bf7d08   1Gi        RWO            Delete           Bound      e2e-multivolume-294/disk.csi.azure.comjcjj6   e2e-multivolume-294-e2e-scfw6h2            82m
pvc-5eb4ce13-34e2-493c-b348-91acd616dc17   1Gi        RWO            Delete           Released   e2e-volumemode-4624/disk.csi.azure.com7bkj5   e2e-volumemode-4624-e2e-sc88cnw            64m
pvc-8be0451d-e0d4-4823-a918-0e578aeb977d   1Gi        RWO            Delete           Bound      e2e-multivolume-294/disk.csi.azure.comtsmq8   e2e-multivolume-294-e2e-scf544k            82m

The PVC and the namespace that contained it were cleaned up.
Events:
  Type     Reason              Age                  From                                                                               Message
  ----     ------              ----                 ----                                                                               -------
  Warning  VolumeFailedDelete  67m (x6 over 67m)    disk.csi.azure.com_piqin-0412-z8dh9-master-0_0909beaf-168b-4bc2-ad17-e13dd0213ed7  persistentvolume pvc-5eb4ce13-34e2-493c-b348-91acd616dc17 is still attached to node piqin-0412-z8dh9-worker-northcentralus-8tj9w
  Warning  VolumeFailedDelete  4m8s (x15 over 66m)  disk.csi.azure.com_piqin-0412-z8dh9-master-0_0909beaf-168b-4bc2-ad17-e13dd0213ed7  rpc error: code = Unknown desc = disk(/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/piqin-0412-z8dh9-rg/providers/Microsoft.Compute/disks/pvc-5eb4ce13-34e2-493c-b348-91acd616dc17) already attached to node(/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/piqin-0412-z8dh9-rg/providers/Microsoft.Compute/virtualMachines/piqin-0412-z8dh9-worker-northcentralus-8tj9w), could not be deleted
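
Both events report that the disk was still attached to a worker node when deletion was attempted. One way to check the Kubernetes side of this is to look for a VolumeAttachment object that still references the PV. The sketch below is a hypothetical helper, assuming the default `oc get volumeattachment` table layout (NAME, ATTACHER, PV, NODE, ATTACHED, AGE). It does not query Azure, so the disk may still be attached on the cloud side even if it prints nothing.

```shell
# Hypothetical helper: reads `oc get volumeattachment` table output on stdin
# and prints "<node> <attached>" for every attachment that references the
# PV name given as the first argument.
# Assumes the default columns: NAME ATTACHER PV NODE ATTACHED AGE.
attachments_for_pv() {
  awk -v pv="$1" 'NR > 1 && $3 == pv { print $4, $5 }'
}

# Usage against a live cluster:
#   oc get volumeattachment | attachments_for_pv pvc-5eb4ce13-34e2-493c-b348-91acd616dc17
```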



Expected results:
The PV is deleted after the test finishes.

Additional info:
CSI verification test result:
Storage Capabilities (guaranteed only on full CSI test suite with 0 fails)
==========================================================================
Driver short name:                         azuredisk
Driver name:                               disk.csi.azure.com
Storage class:                             
Supported OpenShift / CSI features:
  Persistent volumes:                      true
  Raw block mode:                          true
  FSGroup:                                 true
  Executable files on a volume:            true
  Volume snapshots:                        true
  Volume cloning:                          false
  Use volume from multiple pods on a node: true
  ReadWriteMany access mode:               false
  Volume expansion for controller:         true
  Volume expansion for node:               true
  Volume limits:                           false
  Volume can run on single node:           false
  Topology:                                true
Supported OpenShift Virtualization features:
  Raw block VM disks:                      true
  Live migration:                          false
  VM snapshots:                            true
  Storage-assisted cloning:                true

error: 20 fail, 39 pass, 128 skip (36m39s)

Comment 2 Fabio Bertinatto 2021-06-02 07:10:02 UTC
Followed the steps from the description, but couldn't reproduce:

==========================================================================
Driver short name:                         azuredisk
Driver name:                               disk.csi.azure.com
Storage class:
Supported OpenShift / CSI features:
  Persistent volumes:                      true
  Raw block mode:                          true
  FSGroup:                                 true
  Executable files on a volume:            true
  Volume snapshots:                        true
  Volume cloning:                          false
  Use volume from multiple pods on a node: true
  ReadWriteMany access mode:               false
  Volume expansion for controller:         false
  Volume expansion for node:               false
  Volume limits:                           false
  Volume can run on single node:           false
  Topology:                                true
Supported OpenShift Virtualization features:
  Raw block VM disks:                      true
  Live migration:                          false
  VM snapshots:                            true
  Storage-assisted cloning:                true

error: 25 fail, 35 pass, 129 skip (22m46s)

$ oc get pv
No resources found

-----

However, it's known that e2e tests leave PVs behind, so this isn't an issue specific to Azure. Closing in favor of bug 1959445.

*** This bug has been marked as a duplicate of bug 1959445 ***

Comment 3 Tomas Smetana 2021-06-11 10:53:37 UTC
I don't think this is a duplicate of bug #1959445: that one is actually composed of two other failures, the e2e local volume test and the e2e CSI mock snapshot tests, which didn't clean up their resources properly. This might be a similar case but in some other test. Please feel free to reopen this one if something like this happens again, but it has to be exactly this: a disk created by the Azure CSI driver stuck in "Released". Other types of leftover PVs would most probably have a different root cause.

I'm changing the reason for closing the bug since we have not been able to reproduce it yet, and to indicate that it is most likely a different issue from bug #1959445.