Bug 1310587

Summary: PV recycle racing
Product: OpenShift Container Platform
Reporter: Liang Xia <lxia>
Component: Storage
Assignee: Sami Wagiaalla <swagiaal>
Status: CLOSED ERRATA
QA Contact: Jianwei Hou <jhou>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: agoldste, aos-bugs, bchilds, jsafrane, mmcgrath, pep, swagiaal, tdawson, tschloss
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-12 16:29:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1267746, 1305417, 1339502

Description Liang Xia 2016-02-22 10:12:11 UTC
Description of problem:
While a PV with the Recycle reclaim policy is changing from Released to Available,
creating a new PVC causes the PV to get stuck in Released status.

Version-Release number of selected component (if applicable):
openshift v3.1.1.904
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5

How reproducible:
Very often

Steps to Reproduce:
1. Create a PV with the Recycle reclaim policy.
2. Create a PVC.
3. Check that the PV and PVC are bound.
4. Delete the PVC.
5. Check that the PV is in Released status.
6. Create another PVC.
7. Check the PV and PVC status.
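The expected phase transitions in the steps above can be modeled as a small state machine. This is a simplified sketch for illustration only: the Event type and its values are hypothetical names, not Kubernetes API types, and phases such as Pending and Failed are omitted. The rule it encodes is that a claim created while the volume is Released should wait rather than bind, and only bind once recycling has returned the volume to Available.

```go
package main

import "fmt"

// Phase holds the PersistentVolume phases that appear in this report.
type Phase string

const (
	Available Phase = "Available"
	Bound     Phase = "Bound"
	Released  Phase = "Released"
)

// Event is a hypothetical name for what drives a transition.
type Event string

const (
	ClaimCreated Event = "claim created"
	ClaimDeleted Event = "claim deleted"
	Recycled     Event = "recycled"
)

// next returns the expected phase after event e, and false when e should
// not change the volume (the claim must wait instead).
func next(p Phase, e Event) (Phase, bool) {
	switch {
	case p == Available && e == ClaimCreated:
		return Bound, true // steps 2-3
	case p == Bound && e == ClaimDeleted:
		return Released, true // steps 4-5
	case p == Released && e == Recycled:
		return Available, true // recycler finished scrubbing the volume
	}
	return p, false // e.g. a new claim while Released: do not bind yet
}

func main() {
	p := Available // step 1: the new Recycle PV becomes Available (Pending omitted)
	// Steps 2-6, then the recycle, then the retry of the pending claim.
	events := []Event{ClaimCreated, ClaimDeleted, ClaimCreated, Recycled, ClaimCreated}
	for _, e := range events {
		np, ok := next(p, e)
		if !ok {
			fmt.Printf("%s while %s: no transition, claim stays Pending\n", e, p)
			continue
		}
		fmt.Printf("%s while %s: %s -> %s\n", e, p, p, np)
		p = np
	}
}
```

The bug in this report is that the real system never completes the "recycled" transition once a new claim arrives during the Released window.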

Actual results:
The PV stays in Released status, while
the PVC stays in Pending status.
$ oc get pv nfs --config=admin.kubeconfig ; echo ; oc get pvc nfsc
NAME      CAPACITY   ACCESSMODES   STATUS     CLAIM        REASON    AGE
nfs       5Gi        RWO           Released   lxiap/nfsc             2h

NAME      STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
nfsc      Pending                                      13m


Expected results:
The PV should become Available, then bind to the PVC.

Additional info:
PV file can be found at https://github.com/openshift-qe/v3-testfiles/blob/master/persistent-volumes/nfs/nfs-recycle-rwo.json
PVC file can be found at https://github.com/openshift-qe/v3-testfiles/blob/master/persistent-volumes/nfs/claim-rwo.json

$ oc get pv nfs -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  creationTimestamp: 2016-02-22T05:31:18Z
  name: nfs
  resourceVersion: "20738"
  selfLink: /api/v1/persistentvolumes/nfs
  uid: 7fa675ac-d925-11e5-9cf1-fa163e544e12
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: nfsc
    namespace: lxiap
    resourceVersion: "15730"
    uid: 34df1ad4-d93b-11e5-9cf1-fa163e544e12
  nfs:
    path: /jhou
    server: 10.66.79.133
  persistentVolumeReclaimPolicy: Recycle
status:
  phase: Released

$ oc get pvc -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    creationTimestamp: 2016-02-22T08:07:19Z
    name: nfsc
    namespace: lxiap
    resourceVersion: "15767"
    selfLink: /api/v1/namespaces/lxiap/persistentvolumeclaims/nfsc
    uid: 4b0d5a5b-d93b-11e5-9cf1-fa163e544e12
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi
  status:
    phase: Pending
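Comparing the two objects above: the PV's claimRef names lxiap/nfsc, the same namespace and name as the new claim, but its uid (34df1ad4-d93b-11e5-9cf1-fa163e544e12) differs from the new claim's uid (4b0d5a5b-d93b-11e5-9cf1-fa163e544e12). This suggests the claimRef still points at the deleted claim. The Go sketch below shows a UID-based staleness check; the types and the claimRefIsStale function are hypothetical stand-ins, not the Kubernetes API types or the controllers' actual logic.

```go
package main

import "fmt"

// ObjectRef keeps the claimRef fields shown in the PV YAML (hypothetical type).
type ObjectRef struct {
	Namespace string
	Name      string
	UID       string
}

// Claim keeps the identifying fields of a live PVC (hypothetical type).
type Claim struct {
	Namespace string
	Name      string
	UID       string
}

// claimRefIsStale reports whether ref names the same namespace/name as the
// live claim but a different object instance, i.e. the original claim was
// deleted and a new one was created under the same name.
func claimRefIsStale(ref ObjectRef, live Claim) bool {
	return ref.Namespace == live.Namespace &&
		ref.Name == live.Name &&
		ref.UID != live.UID
}

func main() {
	// Values copied from the `oc get pv` / `oc get pvc` output above.
	ref := ObjectRef{Namespace: "lxiap", Name: "nfsc", UID: "34df1ad4-d93b-11e5-9cf1-fa163e544e12"}
	live := Claim{Namespace: "lxiap", Name: "nfsc", UID: "4b0d5a5b-d93b-11e5-9cf1-fa163e544e12"}
	fmt.Println("claimRef stale:", claimRefIsStale(ref, live))
}
```

Matching on name alone would treat the new claim as the PV's existing owner; comparing UIDs distinguishes the two instances.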

Logs from /var/log/messages:
Feb 22 13:31:47 openshift-138 atomic-openshift-master-controllers: I0222 00:31:47.801870       1 persistentvolume_claim_binder_controller.go:338] Synchronizing PersistentVolumeClaim[nfsc]
Feb 22 13:31:47 openshift-138 atomic-openshift-master-controllers: I0222 00:31:47.835563       1 persistentvolume_recycler_controller.go:158] Recycler: checking PersistentVolume[nfs]
Feb 22 13:31:47 openshift-138 atomic-openshift-master-controllers: I0222 00:31:47.843327       1 persistentvolume_recycler_controller.go:189] PersistentVolume[nfs] phase Available - skipping: The volume is not in 'Released' phase
Feb 22 13:31:47 openshift-138 atomic-openshift-master-controllers: I0222 00:31:47.862330       1 persistentvolume_claim_binder_controller.go:412] PersistentVolumeClaim[nfsc] is bound
Feb 22 13:31:47 openshift-138 atomic-openshift-master-controllers: I0222 00:31:47.862376       1 persistentvolume_claim_binder_controller.go:338] Synchronizing PersistentVolumeClaim[nfsc]
Feb 22 13:31:47 openshift-138 atomic-openshift-master-controllers: E0222 00:31:47.867998       1 persistentvolume_claim_binder_controller.go:162] PVClaimBinder could not update claim nfsc: Unexpected error saving claim status: persistentvolumeclaims "nfsc" cannot be updated: the object has been modified; please apply your changes to the latest version and try again

Comment 1 Mark Turansky 2016-02-25 13:52:42 UTC
Recreated but no resolution found yet.

Comment 2 Mark Turansky 2016-02-26 16:34:56 UTC
After refreshing to Origin HEAD and starting a fresh environment, I did not have any problems with the recycler using the Wordpress example in github.

I will test the specific version in this BZ.

Comment 3 Mark Turansky 2016-02-26 18:06:32 UTC
I verified recycling in Origin HEAD works.  

There was another issue where newly created claims were bound to existing volumes that should still have been Released.  I believe this BZ is a symptom of that earlier problem.

Origin HEAD has the fix:

https://github.com/openshift/origin/blob/master/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/persistentvolume/types.go#L120

v3.1.1.x does not:

https://github.com/openshift/origin/blob/cffae0523cfa80ddf917aba69f08508b91f603d5/Godeps/_workspace/src/k8s.io/kubernetes/pkg/controller/persistentvolume/types.go#L120


The latest rebase pulled in the fix from upstream.
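The rule described in this comment, that a newly created claim must not be matched to a volume that is still Released, can be sketched as a matcher that only considers Available volumes. This is a hypothetical illustration, not the code at the types.go links above; the volume and claimReq types and the findMatch function are invented for the example, and choosing the smallest fitting volume is an assumption made here for determinism.

```go
package main

import "fmt"

// gi is one gibibyte in bytes.
const gi = int64(1) << 30

// volume is a hypothetical, trimmed-down PV.
type volume struct {
	Name     string
	Phase    string // "Available", "Bound", "Released", ...
	Capacity int64  // bytes
	Modes    map[string]bool
}

// claimReq is a hypothetical, trimmed-down PVC request.
type claimReq struct {
	Request int64 // bytes
	Mode    string
}

// findMatch returns the smallest volume that satisfies the claim, skipping
// any volume that is not Available (for example, a Released volume that is
// still waiting for the recycler).
func findMatch(vols []volume, c claimReq) (volume, bool) {
	var best volume
	found := false
	for _, v := range vols {
		if v.Phase != "Available" {
			continue
		}
		if !v.Modes[c.Mode] || v.Capacity < c.Request {
			continue
		}
		if !found || v.Capacity < best.Capacity {
			best, found = v, true
		}
	}
	return best, found
}

func main() {
	rwo := map[string]bool{"ReadWriteOnce": true}
	vols := []volume{{Name: "nfs", Phase: "Released", Capacity: 5 * gi, Modes: rwo}}
	c := claimReq{Request: 5 * gi, Mode: "ReadWriteOnce"}
	if _, ok := findMatch(vols, c); !ok {
		fmt.Println("no match yet; claim stays Pending until recycling finishes")
	}
	vols[0].Phase = "Available" // the recycler has finished
	if v, ok := findMatch(vols, c); ok {
		fmt.Println("bound to", v.Name)
	}
}
```

With this rule, the claim from step 6 simply waits; the volume is only handed out after the recycler has moved it back to Available.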

Comment 4 Bradley Childs 2016-03-02 23:18:57 UTC
Per Mark, this will be fixed by the latest rebase.

Comment 5 Jianwei Hou 2016-03-09 07:15:58 UTC
Tested on latest build with same PV and PVC templates:
openshift v3.1.1.911
kubernetes v1.2.0-alpha.7-703-gbc4550d
etcd 2.2.5

The issue is still reproducible. 

# oc get pv 
NAME      LABELS    CAPACITY   ACCESSMODES   STATUS     CLAIM       REASON    AGE
nfs       <none>    5Gi        RWO           Released   jhou/nfsc             50m

# oc get pvc
NAME      LABELS    STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
nfsc      <none>    Pending                                      50m

Comment 6 Sami Wagiaalla 2016-03-16 18:53:04 UTC
Reproduced; working on a fix.

Comment 7 Sami Wagiaalla 2016-03-17 15:14:30 UTC
Fix posted upstream here: https://github.com/kubernetes/kubernetes/pull/23078
Will open an Origin PR.

Comment 8 Sami Wagiaalla 2016-03-17 17:34:49 UTC
Origin PR: https://github.com/openshift/origin/pull/8100

Comment 9 Andy Goldstein 2016-03-29 14:57:57 UTC
2nd upstream PR: https://github.com/kubernetes/kubernetes/pull/23548. Will create a new Origin PR once this merges.

Comment 10 Avesh Agarwal 2016-04-04 17:10:32 UTC
*** Bug 1323613 has been marked as a duplicate of this bug. ***

Comment 11 Liang Xia 2016-04-11 06:42:29 UTC
We are blocked by bug https://bugzilla.redhat.com/show_bug.cgi?id=1324418 because the PV fails to recycle.

Once that bug is fixed/verified, we will re-check this one.

Comment 12 Liang Xia 2016-04-15 07:54:52 UTC
Following the same steps as in comment #0, the PV and PVC eventually got bound. Moving the bug to VERIFIED.

Comment 13 Liang Xia 2016-04-15 07:55:43 UTC
Verified on version:
openshift v3.2.0.15
kubernetes v1.2.0-36-g4a3f9c5
etcd 2.2.5

Comment 15 errata-xmlrpc 2016-05-12 16:29:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:1064

Comment 16 Sami Wagiaalla 2016-05-16 19:36:41 UTC
*** Bug 1323596 has been marked as a duplicate of this bug. ***