Description of problem:
When the pv-recycler pod is deleted (e.g. by pod eviction from a restarted node), the PV is stuck in the 'Released' state and is not recycled until the next master restart.

Version-Release number of selected component (if applicable):
atomic-openshift-3.1.1.6-4.git.21.cd70c35.el7aos.x86_64
atomic-openshift-3.2.0.4-1.git.0.4717e23.el7.x86_64

How reproducible:
Every time on a clean environment.

Steps to Reproduce:
1. Remove the recycler image from all OSE nodes
2. Release a PV
3. Delete the recycler pod (or restart the node to cause pod eviction)

Actual results:
The recycler pod is deleted, the PV stays in the 'Released' state, and recycling is not attempted again.

Expected results:
The recycler pod is recreated, OR the PV ends up in the 'Available' state, OR recycling is attempted again (periodically).

Additional info:
Deleting the recycler image on all nodes is required; otherwise the PV can be scrubbed before you get the chance to delete the pod.
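For anyone who wants to repeat this quickly, a rough command sketch of the steps above (the recycler image name is an assumption and depends on your build; pod and claim names are placeholders):

  # on every OSE node: remove the recycler image so the volume is not
  # scrubbed before you get a chance to delete the pod
  docker rmi <recycler-image>    # e.g. openshift/origin-recycler on origin; name is an assumption

  # release a bound PV by deleting its claim
  oc delete pvc <claim-name>

  # delete the recycler pod that the controller creates for the released volume
  oc get po -n openshift-infra | grep recycler
  oc delete po <pv-recycler-pod> -n openshift-infra --grace-period=0

  # the PV now stays in 'Released' until the master is restarted
  oc get pv -w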
I don't think I am able to reproduce it on the latest kube; here are my steps, and it seems to be working as expected. I think when you say "recycler image" you mean the busybox image used by upstream kube for recycling. Here are the details of my setup:

1. PV:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv0001
  labels:
    type: local
spec:
  capacity:
    storage: 1.5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: "/tmp/data01"

2. PVC:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

3. Pod:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    name: frontendhttp
spec:
  containers:
    - name: myfrontend
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim-1

After I created the above PV, PVC and Pod, I made sure that the busybox image did not exist on the node and then deleted the PVC. I noticed that the pod was recreated and the PV is in the Available state.
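If it helps to repeat this, a rough sketch of what I ran after creating the three objects above (assuming a docker-based node and the stock kubectl client; the grep pattern is only meant to spot the upstream busybox recycler image):

  # confirm the recycler (busybox) image is not present on the node
  docker images | grep busybox

  # delete the claim and watch the volume go Released -> Available
  kubectl delete pvc myclaim-1
  kubectl get pv pv0001 -w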
I haven't checked whether https://github.com/kubernetes/kubernetes/pull/23548 fixed this too, or whether I am doing something wrong in my steps above.
I tried it with the latest origin (a few times). Here are the steps I did:

1. /tmp/data01/ has data: 354 MB of the origin repo
2. Created the PV
3. Created the PVC
4. Created the pod
   (Note: all files used for creating the PV, PVC, and pod are the same as in comment 1, and all are running fine now.)
5. Made sure that the origin recycler image does not exist locally
6. Deleted the PVC and pod as follows (the fastest I could run these two commands):
   oc delete -f ~/data-json-yaml-files/claim-01.yaml; oc delete -f ~/data-json-yaml-files/pod-pvc.yaml

After step 6, I noticed that the PV is in the Available state, there is no pod, and the recycler has cleared the data (the origin repo) in /tmp/data01. It seems to be working as expected, so I am leaving it here for the time being unless I get any more feedback on this.
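Roughly, the check behind step 5 and the verification after step 6 (the exact recycler image name is build-dependent, so treat the grep pattern as an assumption):

  # step 5: make sure no recycler image is cached locally
  docker images | grep -i recycl

  # after step 6: the volume should be Available again and the scratch data gone
  oc get pv
  ls /tmp/data01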
@tschloss do you have an environment we can use to reproduce this?
I have created a reproducer that is easy to use. Open a few terminals.

In the first one, run the following command (it deletes the recycler pods as soon as they appear):

while true; do oc get po -n openshift-infra | grep recycler | awk '{print $1}' | xargs --no-run-if-empty -I '{}' oc -n openshift-infra delete po '{}' --grace-period=0; done

In a second terminal, watch the PVs:

watch oc get pv

In a third, create an application with a PV:

oc new-app --template=postgresql-persistent
(wait a moment for the PVC to bind)
oc delete svc,pvc,dc --all --grace-period=0

Now if I check the terminal with the killer script, I see:

pod "pv-recycler-nfs-4zbh2" deleted

And in the one that watches the PVs:

vol02   1Gi   RWO,RWX   Released   tschloss/postgresql   8d

The PVC is deleted, and unless I restart the master (which deploys the recycler again), the volume stays in the 'Released' state.

# oc version
oc v3.2.0.4
kubernetes v1.2.0-origin-41-g91d3e75

# rpm -qa | grep atomic-openshift
atomic-openshift-master-3.2.0.4-1.git.0.4717e23.el7.x86_64
tuned-profiles-atomic-openshift-node-3.2.0.4-1.git.0.4717e23.el7.x86_64
atomic-openshift-sdn-ovs-3.2.0.4-1.git.0.4717e23.el7.x86_64
atomic-openshift-utils-3.0.59-1.git.0.917a1bf.el7.noarch
atomic-openshift-3.2.0.4-1.git.0.4717e23.el7.x86_64
atomic-openshift-node-3.2.0.4-1.git.0.4717e23.el7.x86_64
atomic-openshift-clients-3.2.0.4-1.git.0.4717e23.el7.x86_64

This is the latest version I was able to get. The environment is private to us; send me an email if you can't reproduce it and I'll give you access to our OSE instance.
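For completeness, the workaround mentioned above (restarting the master so the recycler is deployed again) is roughly the following on a single-master RPM install; the systemd unit name is an assumption and HA setups differ:

  # restart the master; the PV controller then recreates the recycler pod
  systemctl restart atomic-openshift-master

  # the Released volume should be recycled back to Available shortly after
  oc get pv -w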
*** This bug has been marked as a duplicate of bug 1310587 ***