Description of problem:

Version-Release number of selected component (if applicable):
openshift v3.3.1.3
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

How reproducible:
1. Create PVs in an OS namespace.
2. Create a new project.
3. Create new MySQL persistent instances.
4. Delete the project.

Steps to Reproduce:
1. Create some NFS persistent volumes on the NFS server and in OS (using the Recycle policy), at least 8.
2. Create a new project and add new MySQL persistent instances (8 in my case).
3. The pods with MySQL are deployed and each gets assigned to a persistent volume through a PVC.
4. Once all pods are up and ready, delete the project, either using the web interface or oc delete.
5. Watch the PV status => oc get pv -w
6. Some PVs get status "Available" before the recycler completes the data scrubbing; sometimes the recycler pod doesn't start at all and the volume is flagged as Available while data is still present on the NFS volume.

Actual results:
Some volumes get correctly scrubbed. Some volumes are not scrubbed at all and their status changes from Bound to Released to Available.

oc get po -w in the openshift-infra namespace shows an odd pod lifecycle when recycling fails:

recycler-for-volume1   0/1   Completed
recycler-for-volume1   0/1   Terminating
recycler-for-volume1   0/1   Terminating

In oc get pv -w:

volume1   10Gi   RWO,RWX   Released    projecta/dbmysql
volume1   10Gi   RWO,RWX   Released
volume1   10Gi   RWO,RWX   Available

=> The NFS share still has the folder and data.

Expected results:
A volume must not reach status Available before the recycler has successfully deleted all data on the NFS volume.

Additional info:
There is no permission issue involved here, since the data scrubbing completes correctly for some volumes. Based on my observation, the issue is related to the recycler pod creation and the PV status update. It looks like the recycler pod does not always go through the full lifecycle "Pending" -> "ContainerCreating" -> "Running" -> "Completed" -> "Terminating". For some reason the creation sometimes does not happen and we only see Completed + Terminating in the logs, or no recycler creation at all.
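For reference, this is a minimal sketch of one of the NFS PV definitions with the Recycle policy (the name, server and path mirror the tv* volumes and /exports layout shown later in this bug; exact values are illustrative):

oc create -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: tv200-volume
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: paas-nfs.qa-ch7.dev
    path: /exports/tv200
EOF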
I also suspect that the persistent volumes are still mounted while the pods are being killed. I see some files with .nfsXXXXXX on the share, which means: "The .nfs files are created when an open file is removed but is still being accessed."

So I guess that when deleting a project, first all the pods should be stopped => PVs correctly unmounted, then the PVCs deleted, and only then should the recycler start to recycle the NFS volumes.

EDIT: Even when doing these steps manually:
1. oc delete dc --all
2. oc delete pvc --all
3. oc delete project
the problem persists.
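A quick way to spot those leftover NFS handles on the server (just an illustrative command, assuming the exports live under /exports as in the describe output later in this bug):

find /exports -name '.nfs*' -ls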
Created attachment 1218130 [details]
master logs

Deleting a project with 4 MySQL persistent instances: the volume tv203-volume is NOT cleaned correctly, the volume tv202-volume IS cleaned correctly.

The only difference I see is that the recycler pod for volume tv202 is "created":

Pod recycler-for-tv202-volume created: kind:""

whereas for volume tv203 the recycler pod is only:

Pod recycler-for-tv203-volume updated

I don't see the creation of the recycler pod...
Can you please attach:

- node logs from a longer period of time (say, several minutes) *and* not shortened (e.g. the lines at 11:13:11.033138 have an ellipsis "..." and I don't see what really happens to the pod)
- output of oc get event
- output of oc describe pod (in the namespace where the recycler runs)

- please also wait at least 5 minutes for the recycler - it may fail the first time, however it should retry in a couple of minutes
Created attachment 1218470 [details]
volume failing to recycle, long logs
Created attachment 1218471 [details]
oc get events

result of oc get events
Hello,

I created a new project in OS named "ya-test-to" and deployed 5 MySQL instances (rhscl/mysql-56-rhel7) using persistent volumes. Once they were all deployed, I deleted the project. The volume tv202 didn't get cleaned; tv200, tv201, tv203 and tv204 were correctly cleaned.

Every 1.0s: du -d 1 -ah ./tv200 ./tv201 ./tv202 ./tv203 ./tv204

0       ./tv200
0       ./tv201
1.7M    ./tv202/mysql
12M     ./tv202/ibdata1
76M     ./tv202/ib_logfile1
76M     ./tv202/ib_logfile0
636K    ./tv202/performance_schema
4.0K    ./tv202/auto.cnf
4.0K    ./tv202/sampledb
167M    ./tv202
0       ./tv203
0       ./tv204

The only thing I noticed is that in the openshift-infra namespace, where the recycler pods run, there was still a not-ready pod with status Completed before the project deletion:

recycler-for-tv202-volume   0/1   Completed   0   4m

- node logs from a longer period of time
See the new attachment.

- output of oc get event
See attachment => events.txt

- output of oc describe pod (in the namespace where the recycler runs)

oc get po -w
NAME                         READY   STATUS      RESTARTS   AGE
hawkular-cassandra-1-gp133   1/1     Running     0          6d
hawkular-metrics-onkqd       1/1     Running     4          28d
heapster-yddsn               1/1     Running     0          6d
recycler-for-tv202-volume    0/1     Completed   0          4m
NAME                        READY   STATUS              RESTARTS   AGE
recycler-for-tv200-volume   0/1     Pending             0          0s
recycler-for-tv203-volume   0/1     Pending             0          0s
recycler-for-tv200-volume   0/1     Pending             0          0s
recycler-for-tv203-volume   0/1     Pending             0          0s
recycler-for-tv200-volume   0/1     ContainerCreating   0          0s
recycler-for-tv202-volume   0/1     Terminating         0          7m
recycler-for-tv204-volume   0/1     Pending             0          1s
recycler-for-tv202-volume   0/1     Terminating         0          7m
recycler-for-tv201-volume   0/1     Pending             0          0s
recycler-for-tv204-volume   0/1     Pending             0          1s
recycler-for-tv203-volume   0/1     ContainerCreating   0          1s
recycler-for-tv201-volume   0/1     Pending             0          0s
recycler-for-tv204-volume   0/1     ContainerCreating   0          2s
recycler-for-tv201-volume   0/1     ContainerCreating   0          1s
recycler-for-tv204-volume   1/1     Running             0          23s
recycler-for-tv200-volume   0/1     Completed           0          26s
recycler-for-tv201-volume   0/1     Completed           0          25s
recycler-for-tv201-volume   0/1     Terminating         0          25s
recycler-for-tv201-volume   0/1     Terminating         0          26s
recycler-for-tv204-volume   0/1     Completed           0          28s
recycler-for-tv204-volume   0/1     Terminating         0          28s
recycler-for-tv204-volume   0/1     Terminating         0          28s
recycler-for-tv203-volume   1/1     Running             0          28s
recycler-for-tv203-volume   0/1     Completed           0          32s
recycler-for-tv203-volume   0/1     Terminating         0          32s
recycler-for-tv203-volume   0/1     Terminating         0          32s
Thanks for the logs!

On a quick check, everything looks like it _should_ work. The only difference for volume tv202 is that its recycler got scheduled to node qa-paas-master.qa-ch7.dev instead of 10.2.64.151, where the rest of the recyclers ran successfully. It seems to me that "qa-paas-master.qa-ch7.dev" actually *is* 10.2.64.151, isn't it? Why is it referred to as "qa-paas-master" just this once?

In the logs I can see that the recycler pod for tv202 got "Succeeded" without any error, however it seems it did not run at all.

Your node logs start at Mon 2016-11-07 20:50:40, when the weird pod for tv202 is already created; do you have earlier logs? A couple of minutes earlier should be enough... I want to see "recycleVolumeOperation [tv202-volume] started" and the subsequent "About to try and schedule pod recycler-for-tv202-volume".
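In case it helps, something like this should pull just those lines out of the master journal (a sketch; it assumes the master runs as the atomic-openshift-master systemd unit, so adjust the unit name and time window to your installation):

journalctl -u atomic-openshift-master --since "2016-11-07 20:40" \
  | grep -E 'recycleVolumeOperation \[tv202-volume\]|recycler-for-tv202-volume'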
What I noticed is that the volumes that aren't cleaned are the ones that have a pod in the openshift-infra namespace with status Completed. The volume was cleaned, but for some unknown reason the recycler pod is still there:

Ex tv203:

oc get po -w
NAME                         READY   STATUS      RESTARTS   AGE
hawkular-cassandra-1-gp133   1/1     Running     0          6d
hawkular-metrics-onkqd       1/1     Running     4          29d
heapster-yddsn               1/1     Running     0          6d
recycler-for-tv203-volume    0/1     Completed   0          6m

If I delete these pods, deploy a new project with volume claims and delete the project, all the volumes get cleaned.

Now the issue is that if I deploy a new project and delete it again, look at the recycler pod for tv203:

oc get po -w
NAME                        READY   STATUS              RESTARTS   AGE
recycler-for-tv201-volume   0/1     Pending             0          0s
recycler-for-tv204-volume   0/1     Pending             0          0s
recycler-for-tv201-volume   0/1     Pending             0          0s
recycler-for-tv204-volume   0/1     Pending             0          0s
recycler-for-tv200-volume   0/1     Pending             0          0s
recycler-for-tv201-volume   0/1     ContainerCreating   0          0s
recycler-for-tv200-volume   0/1     Pending             0          0s
recycler-for-tv204-volume   0/1     ContainerCreating   0          0s
recycler-for-tv202-volume   0/1     Pending             0          0s
recycler-for-tv200-volume   0/1     ContainerCreating   0          0s
recycler-for-tv202-volume   0/1     Pending             0          0s
recycler-for-tv202-volume   0/1     ContainerCreating   0          1s
recycler-for-tv201-volume   1/1     Running             0          11s
recycler-for-tv200-volume   1/1     Running             0          12s
recycler-for-tv200-volume   0/1     Completed           0          17s
recycler-for-tv201-volume   0/1     Completed           0          17s
recycler-for-tv200-volume   0/1     Terminating         0          17s
recycler-for-tv200-volume   0/1     Terminating         0          17s
recycler-for-tv202-volume   1/1     Running             0          21s
recycler-for-tv204-volume   1/1     Running             0          23s
recycler-for-tv202-volume   0/1     Completed           0          25s
recycler-for-tv202-volume   0/1     Terminating         0          25s
recycler-for-tv202-volume   0/1     Terminating         0          25s
recycler-for-tv204-volume   0/1     Completed           0          27s
recycler-for-tv204-volume   0/1     Terminating         0          27s
recycler-for-tv204-volume   0/1     Terminating         0          27s

I see nothing for tv203... Running oc get po again, the recycler is still Completed and still not ready:

oc get po
NAME                         READY   STATUS      RESTARTS   AGE
hawkular-cassandra-1-gp133   1/1     Running     0          6d
hawkular-metrics-onkqd       1/1     Running     4          29d
heapster-yddsn               1/1     Running     0          6d
recycler-for-tv201-volume    0/1     Completed   0          48s
recycler-for-tv203-volume    0/1     Completed   0          7m

Looking at the PVs:

oc get pv -w
NAME           CAPACITY   ACCESSMODES   STATUS   CLAIM             REASON   AGE
tv200-volume   10Gi       RWO,RWX       Bound    ya-test2/mysql4            6d
tv201-volume   10Gi       RWO,RWX       Bound    ya-test2/mysql1            6d
tv202-volume   10Gi       RWO,RWX       Bound    ya-test2/mysql5            5d
tv203-volume   10Gi       RWO,RWX       Bound    ya-test2/mysql2            6d
tv204-volume   10Gi       RWO,RWX       Bound    ya-test2/mysql3            6d

Now delete the project:

NAME           CAPACITY   ACCESSMODES   STATUS      CLAIM             REASON   AGE
tv201-volume   10Gi       RWO,RWX       Released    ya-test2/mysql1            6d
tv203-volume   10Gi       RWO,RWX       Released    ya-test2/mysql2            6d
tv204-volume   10Gi       RWO,RWX       Released    ya-test2/mysql3            6d
tv200-volume   10Gi       RWO,RWX       Released    ya-test2/mysql4            6d
tv202-volume   10Gi       RWO,RWX       Released    ya-test2/mysql5            5d
tv200-volume   10Gi       RWO,RWX       Released                               6d
tv200-volume   10Gi       RWO,RWX       Available                              6d
tv202-volume   10Gi       RWO,RWX       Released                               5d
tv202-volume   10Gi       RWO,RWX       Available                              5d
tv204-volume   10Gi       RWO,RWX       Released                               6d
tv201-volume   10Gi       RWO,RWX       Released                               6d
tv204-volume   10Gi       RWO,RWX       Available                              6d
tv201-volume   10Gi       RWO,RWX       Available                              6d
tv203-volume   10Gi       RWO,RWX       Released                               6d
tv203-volume   10Gi       RWO,RWX       Available                              6d

The volume tv203 gets status Available, but the data is still present on the NFS share.
If I run oc logs on the tv203 recycler pod:

oc logs recycler-for-tv203-volume
Scrub ok

And oc describe po recycler-for-tv203-volume:

Name:           recycler-for-tv203-volume
Namespace:      openshift-infra
Node:           10.2.64.151/10.2.64.151
Start Time:     Tue, 08 Nov 2016 20:44:00 +0100
Labels:         <none>
Status:         Succeeded
IP:             10.201.0.2
Controllers:    <none>
Containers:
  pv-recycler:
    Container ID:       docker://97d05dc4e6517b4981ab2436315f531bd0761409d456d072fc903c1ba0ca7c69
    Image:              openshift3/ose-recycler:v3.3.0.32
    Image ID:           docker://sha256:08aa1eb54fafb89c3ec673e931f9f172fea2880fba5133576faa554649dff3fb
    Port:
    Command:            /usr/bin/openshift-recycle
    Args:               /scrub
    QoS Tier:
      memory:           BestEffort
      cpu:              BestEffort
    State:              Terminated
      Reason:           Completed
      Exit Code:        0
      Started:          Tue, 08 Nov 2016 20:44:15 +0100
      Finished:         Tue, 08 Nov 2016 20:44:25 +0100
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  vol:
    Type:       NFS (an NFS mount that lasts the lifetime of a pod)
    Server:     paas-nfs.qa-ch7.dev
    Path:       /exports/tv203
    ReadOnly:   false
  pv-recycler-controller-token-gp1i3:
    Type:       Secret (a volume populated by a Secret)
    SecretName: pv-recycler-controller-token-gp1i3
Events:
  FirstSeen  LastSeen  Count  From                   SubobjectPath                 Type    Reason     Message
  ---------  --------  -----  ----                   -------------                 ------  ------     -------
  16m        16m       1      {default-scheduler }                                 Normal  Scheduled  Successfully assigned recycler-for-tv203-volume to 10.2.64.151
  16m        16m       1      {kubelet 10.2.64.151}  spec.containers{pv-recycler}  Normal  Pulled     Container image "openshift3/ose-recycler:v3.3.0.32" already present on machine
  16m        16m       1      {kubelet 10.2.64.151}  spec.containers{pv-recycler}  Normal  Created    Created container with docker id 97d05dc4e651
  16m        16m       1      {kubelet 10.2.64.151}  spec.containers{pv-recycler}  Normal  Started    Started container with docker id 97d05dc4e651

Seems OK... I'll keep testing, but so far the issue appears when there is already a previous recycler pod present in the openshift-infra namespace. Now the question is why one randomly remains there... It has logs "Scrub ok", and the next time it is invoked to clean the volume it reports the scrub as done (which is not the case), after which the volume goes back to status Available.
Hello, I have more information...

I set up a simple cron job running every minute on the master. It looks for pods named "recycler-for*" that are not ready (0/1) with status Completed, and simply deletes them:

for i in $( oc get po -n openshift-infra | grep -o '\(recycler\S*\)\(.*0\/1.*Completed\)' | awk '{ print $1}' ); do
    echo "Deleting recycler pod : $i"
    echo ""
    oc describe po $i -n openshift-infra
    echo ""
    oc delete po $i -n openshift-infra
done

Every time a volume claim gets released, OS creates a NEW recycler pod instead of updating the one that is stuck in the not-ready/Completed state. Since I applied this trick, there seem to be no more issues with data scrubbing...
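For completeness, this is roughly how the workaround can be scheduled from cron (a sketch; the script path and log file are hypothetical, only the "every minute on the master" part comes from the comment above):

# crontab on the master: run the cleanup loop every minute,
# assuming it is saved as /root/clean-recyclers.sh
* * * * * /root/clean-recyclers.sh >> /var/log/clean-recyclers.log 2>&1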
So, there are several things here:

1) When you delete a namespace with a PVC and a mysql pod, the PVC is deleted earlier than the pod, so for some time the mysql pod and the recycler run in parallel and the mysql pod may prevent some files from being deleted.

2) The recycler pod in OpenShift Enterprise does not check that there are no files left on the volume when it finishes. It should detect that there are still some mysql files somewhere and retry recycling again after a while.

3) For some reason, some recycler pods are not deleted by Kubernetes when they complete. It's totally unpredictable; I've reproduced it a few times:

1. create 10 NFS PVs, each pointing to a separate directory
2. create 10 claims
3. create 10 pods, each using one claim, creating a bunch of files and directories in an endless loop (see the sketch below)
4. delete the pods and claims at the same time. Pods are slowly being deleted while they're still creating files and directories and while the recycler pods are running.
5. some recycler pods hang in the "0/1 Completed" state and are not deleted -> bug!

I'm investigating 3); I'll file separate bugs for 1) and 2).
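For step 3, a minimal sketch of such a file-writer pod (the pod name, claim name claim0 and the busybox image are placeholders; only the endless write loop comes from the steps above):

oc create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: writer-0
spec:
  containers:
  - name: writer
    image: busybox
    # keep creating new directories and files on the claimed volume forever
    command:
    - "sh"
    - "-c"
    - 'i=0; while true; do i=$((i+1)); mkdir -p /data/dir-$i; date > /data/dir-$i/file; done'
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: claim0
EOF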
Patch for 3): https://github.com/kubernetes/kubernetes/pull/36827
Origin pull request: https://github.com/openshift/origin/pull/12041
The fix went into Origin 1.5 with the latest Kubernetes rebase.
This has been merged into OCP and is in OCP v3.5.0.7 or newer. If you are going to backport this, please clone this bugzilla and attach the pull request for the backport to it.
Cloned for 3.4.x as #1415624
Tested several times on the version below with 10 PVs; the PVs become Available and the data is cleaned.

openshift v3.5.0.7+390ef18
kubernetes v1.5.2+43a9be4
etcd 3.1.0-rc.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884