Bug 1570606

Summary: PVCProtection: Pod and PVC are in stuck status when there is a deployment config
Product: OpenShift Container Platform Reporter: Wenqi He <wehe>
Component: Storage Assignee: Hemant Kumar <hekumar>
Status: CLOSED NOTABUG QA Contact: Wenqi He <wehe>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.9.0 CC: aos-bugs, aos-storage-staff, wmeng
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-23 16:49:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wenqi He 2018-04-23 10:28:25 UTC
Description of problem:
After enabling the PVCProtection feature gate, create a new app such as mongodb-persistent. Deleting the PVC and then the pod leaves both the PVC and the replacement pod stuck.

Version-Release number of selected component (if applicable):
openshift v3.9.24
kubernetes v1.9.1+a0ce1bc657

How reproducible:
Always

Steps to Reproduce:
1. oc new-app mongodb-persistent
2. Wait until the first deployment done and pod is running
3. Delete the pvc 
oc delete pvc mongodb
4. Delete the pod
oc delete pods 

Actual results:
Because PVCProtection is enabled, the PVC stays in Terminating status after the first deployed pod is deleted. The deployment config then creates a second pod, and both the PVC and the new pod are stuck:

$ oc get pvc
NAME      STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mongodb   Terminating   pvc-6c97bf73-46be-11e8-aefb-000d3a11d3cd   1Gi        RWO            azddef         36m
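For reference, the Terminating state can be inspected through the PVC's metadata: PVC protection works by attaching the `kubernetes.io/pvc-protection` finalizer, which blocks actual deletion while any pod still references the claim. A minimal check, assuming the `wehe` namespace from this report:

```shell
# Show the finalizer that keeps the PVC in Terminating
# (kubernetes.io/pvc-protection blocks deletion while a pod uses the claim).
oc get pvc mongodb -n wehe -o jsonpath='{.metadata.finalizers}'

# Confirm the delete request was recorded on the object.
oc get pvc mongodb -n wehe -o jsonpath='{.metadata.deletionTimestamp}'
```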

$ oc delete pods mongodb-1-59cfn
pod "mongodb-1-59cfn" deleted

$ oc get pods
NAME              READY     STATUS              RESTARTS   AGE
mongodb-1-s7pdg   0/1       ContainerCreating   0          11m

$ oc describe pod mongodb-1-s7pdg
...
Events:
  Type     Reason                 Age               From                                Message
  ----     ------                 ----              ----                                -------
  Normal   Scheduled              11m               default-scheduler                   Successfully assigned mongodb-1-s7pdg to storage-master-etcd-nfs-1
  Normal   SuccessfulMountVolume  11m               kubelet, storage-master-etcd-nfs-1  MountVolume.SetUp succeeded for volume "default-token-mfvmd"
  Warning  FailedMount            38s (x5 over 9m)  kubelet, storage-master-etcd-nfs-1  Unable to mount volumes for pod "mongodb-1-s7pdg_wehe(54460aaf-46c4-11e8-aefb-000d3a11d3cd)": timeout expired waiting for volumes to attach/mount for pod "wehe"/"mongodb-1-s7pdg". list of unattached/unmounted volumes=[mongodb-data]


Expected results:
The PVC should be deleted, because the pod that was using it has already been deleted.


Node Log (of failed PODs):
E0423 07:10:43.997974   14719 desired_state_of_world_populator.go:276] Error processing volume "mongodb-data" for pod "mongodb-1-s7pdg_wehe(54460aaf-46c4-11e8-aefb-000d3a11d3cd)": error processing PVC "wehe"/"mongodb": can't start pod because PVC wehe/mongodb is being deleted

Additional info:

Comment 1 Hemant Kumar 2018-04-23 16:49:00 UTC
Not sure if this is a bug. IMO this is working as intended, because we intentionally create a race condition in the code that protects the PVC.

Does scaling down the deployment, rather than deleting the pod, result in PVC deletion? I am pretty sure that will work without a problem. I am closing this bug, but if scaling down the deployment does not work, please reopen this BZ.

There is no way to guarantee PVC deletion before the new pod is created, hence the BZ should be considered a user error.
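The suggested workaround can be sketched as follows; this is an untested sequence based on the comment above, assuming the deployment config is named `mongodb` as created by the mongodb-persistent template:

```shell
# Scale the deployment config to zero so no replacement pod is created
# while the PVC is being deleted (avoids the race described above).
oc scale dc/mongodb --replicas=0

# Once no pod references the claim, the pvc-protection finalizer is
# removed and the delete completes instead of hanging in Terminating.
oc delete pvc mongodb
```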