Bug 1392338
| Field | Value |
|---|---|
| Summary | Persistent Volumes on NFS aren't all getting recycled |
| Product | OpenShift Container Platform |
| Component | Storage |
| Version | 3.3.1 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | unspecified |
| Reporter | yannick <yannick.bongard> |
| Assignee | Jan Safranek <jsafrane> |
| QA Contact | Liang Xia <lxia> |
| Docs Contact | |
| CC | aos-bugs, bchilds, eparis, erich, jsafrane, tdawson, vwalek, yannick.bongard |
| Keywords | Performance |
| Target Milestone | --- |
| Target Release | --- |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
| Story Points | --- |
| Clone Of | |
| Cloned As | 1415624 (view as bug list) |
| Environment | |
| Last Closed | 2017-04-12 19:16:05 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | 1395271, 1395276 |
| Bug Blocks | 1415624 |
| Attachments | |

Doc Text:

Cause: OpenShift contained a race condition in NFS recycler handling.

Consequence: When recycler pods for multiple NFS shares were started at the same time, some of these pods were not started and the corresponding NFS shares were not recycled.

Fix: The race condition was fixed.

Result: All scheduled NFS recycler pods are started and NFS shares are recycled.
Description
yannick
2016-11-07 09:11:15 UTC
I also suspect that the persistent volumes are still mounted while the pods are being killed. I have some files with .nfsXXXXXX on the share, which means: "The .nfs files are created when an open file is removed but is still being accessed". So I guess that when deleting a project, all the pods should be stopped first (so the PVs are correctly unmounted), then the PVCs deleted, and only then should the recycler start to recycle the NFS volumes.

EDIT: Even when doing these steps manually (a sketch of this sequence is shown below), the problem persists:

1. oc delete dc --all
2. oc delete pvc --all
3. oc delete project

Created attachment 1218130 [details]
master logs deleting project with 4 mysql persistent
the volume tv203-volume is NOT cleaned correctly
the volume tv202-volume IS cleaned correctly
The only difference I see is that the recycler pod for tv202 is "CREATED":

Pod recycler-for-tv202-volume created: kind:""

whereas for tv203 the recycler pod only shows:

Pod recycler-for-tv203-volume updated

I don't see the creation of the recycler pod...
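For reference, a minimal sketch of the manual deletion order described above. The project name is only an example (taken from the later test output), and the wait loop is an assumption about how one might confirm the pods are gone before releasing the claims:

```sh
# Sketch of the proposed teardown order: stop pods first so the NFS volumes
# are unmounted, then delete the claims, then the project.
# "ya-test2" is only an example project name.
oc project ya-test2
oc delete dc --all
# Wait until all pods in the project are gone, so the NFS mounts are released.
while [ -n "$(oc get pods -o name 2>/dev/null)" ]; do sleep 5; done
oc delete pvc --all
oc delete project ya-test2
```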
Can you please attach:

- node logs from a longer period of time (say, several minutes) *and* not shortened (e.g. the lines at 11:13:11.033138 have an ellipsis "..." and I can't see what really happens to the pod)
- output of oc get event
- output of oc describe pod (in the namespace where the recycler runs)
- please also wait at least 5 minutes for the recycler - it may fail the first time, however it should retry in a couple of minutes

(See the sketch below for one way to gather these.)

Created attachment 1218470 [details]
volume failing to recycle long logs
Created attachment 1218471 [details]
oc get events
result of oc get events
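A rough sketch of how the requested diagnostics could be collected. The node unit name (atomic-openshift-node) and the recycler pod name are assumptions for illustration; adjust them for your installation:

```sh
# Collect the diagnostics requested above (sketch only).
journalctl -u atomic-openshift-node --since "1 hour ago" > node.log   # node logs; unit name is an assumption
oc get events -n openshift-infra > events.txt                         # events from the recycler namespace
oc describe pod recycler-for-tv202-volume -n openshift-infra > recycler-tv202.txt
```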
Hello, I created a new project in OpenShift named "ya-test-to" and deployed 5 mysql instances (rhscl/mysql-56-rhel7) using persistent volumes. Once they were all deployed, I deleted the project. The volume tv202 didn't get cleaned; tv200, tv201, tv203 and tv204 were correctly cleaned.

Every 1.0s: du -d 1 -ah ./tv200 ./tv201 ./tv202 ./tv203 ./tv204

0       ./tv200
0       ./tv201
1.7M    ./tv202/mysql
12M     ./tv202/ibdata1
76M     ./tv202/ib_logfile1
76M     ./tv202/ib_logfile0
636K    ./tv202/performance_schema
4.0K    ./tv202/auto.cnf
4.0K    ./tv202/sampledb
167M    ./tv202
0       ./tv203
0       ./tv204

The only thing I noticed is that in the openshift-infra namespace, where the recycler pods run, there was already a pod that was not ready, with status Completed, before the project was deleted:

recycler-for-tv202-volume   0/1   Completed   0   4m

- node logs from a longer period of time: see the new attachment
- output of oc get event: see the attachment => events.txt
- output of oc describe pod (in the namespace where the recycler runs):

oc get po -w
NAME                         READY   STATUS      RESTARTS   AGE
hawkular-cassandra-1-gp133   1/1     Running     0          6d
hawkular-metrics-onkqd       1/1     Running     4          28d
heapster-yddsn               1/1     Running     0          6d
recycler-for-tv202-volume    0/1     Completed   0          4m
NAME                        READY   STATUS              RESTARTS   AGE
recycler-for-tv200-volume   0/1     Pending             0          0s
recycler-for-tv203-volume   0/1     Pending             0          0s
recycler-for-tv200-volume   0/1     Pending             0          0s
recycler-for-tv203-volume   0/1     Pending             0          0s
recycler-for-tv200-volume   0/1     ContainerCreating   0          0s
recycler-for-tv202-volume   0/1     Terminating         0          7m
recycler-for-tv204-volume   0/1     Pending             0          1s
recycler-for-tv202-volume   0/1     Terminating         0          7m
recycler-for-tv201-volume   0/1     Pending             0          0s
recycler-for-tv204-volume   0/1     Pending             0          1s
recycler-for-tv203-volume   0/1     ContainerCreating   0          1s
recycler-for-tv201-volume   0/1     Pending             0          0s
recycler-for-tv204-volume   0/1     ContainerCreating   0          2s
recycler-for-tv201-volume   0/1     ContainerCreating   0          1s
recycler-for-tv204-volume   1/1     Running             0          23s
recycler-for-tv200-volume   0/1     Completed           0          26s
recycler-for-tv201-volume   0/1     Completed           0          25s
recycler-for-tv201-volume   0/1     Terminating         0          25s
recycler-for-tv201-volume   0/1     Terminating         0          26s
recycler-for-tv204-volume   0/1     Completed           0          28s
recycler-for-tv204-volume   0/1     Terminating         0          28s
recycler-for-tv204-volume   0/1     Terminating         0          28s
recycler-for-tv203-volume   1/1     Running             0          28s
recycler-for-tv203-volume   0/1     Completed           0          32s
recycler-for-tv203-volume   0/1     Terminating         0          32s
recycler-for-tv203-volume   0/1     Terminating         0          32s

Thanks for the logs! On a quick check, everything looks like it _should_ work. The only difference for volume tv202 is that its recycler got scheduled to node qa-paas-master.qa-ch7.dev instead of 10.2.64.151, where the rest of the recyclers were running successfully. It seems to me that "qa-paas-master.qa-ch7.dev" actually *is* 10.2.64.151, isn't it? Why is it referred to as "qa-paas-master" only that once?

In the logs I can see the recycler pod for tv202 got "Succeeded" without any error, however it seems it did not run at all. Your node logs start at Mon 2016-11-07 20:50:40, when the weird pod for tv202 is already created; do you have earlier logs? A couple of minutes earlier should be fine... I want to see "recycleVolumeOperation [tv202-volume] started" and the subsequent "About to try and schedule pod recycler-for-tv202-volume".

Well, what I noticed is that the volumes that aren't cleaned are the ones that already have a pod in the openshift-infra namespace with status Completed. The volume was cleaned previously, but for some unknown reason the recycler pod is still there:
Ex tv203:
oc get po -w
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-gp133 1/1 Running 0 6d
hawkular-metrics-onkqd 1/1 Running 4 29d
heapster-yddsn 1/1 Running 0 6d
recycler-for-tv203-volume 0/1 Completed 0 6m
If I delete these pods, deploy a new project with volume claims, and delete the project, all the volumes get cleaned.
Now the issue: if I deploy a new project and delete it again, look at the recycler pod for tv203:
oc get po -w
NAME READY STATUS RESTARTS AGE
recycler-for-tv201-volume 0/1 Pending 0 0s
recycler-for-tv204-volume 0/1 Pending 0 0s
recycler-for-tv201-volume 0/1 Pending 0 0s
recycler-for-tv204-volume 0/1 Pending 0 0s
recycler-for-tv200-volume 0/1 Pending 0 0s
recycler-for-tv201-volume 0/1 ContainerCreating 0 0s
recycler-for-tv200-volume 0/1 Pending 0 0s
recycler-for-tv204-volume 0/1 ContainerCreating 0 0s
recycler-for-tv202-volume 0/1 Pending 0 0s
recycler-for-tv200-volume 0/1 ContainerCreating 0 0s
recycler-for-tv202-volume 0/1 Pending 0 0s
recycler-for-tv202-volume 0/1 ContainerCreating 0 1s
recycler-for-tv201-volume 1/1 Running 0 11s
recycler-for-tv200-volume 1/1 Running 0 12s
recycler-for-tv200-volume 0/1 Completed 0 17s
recycler-for-tv201-volume 0/1 Completed 0 17s
recycler-for-tv200-volume 0/1 Terminating 0 17s
recycler-for-tv200-volume 0/1 Terminating 0 17s
recycler-for-tv202-volume 1/1 Running 0 21s
recycler-for-tv204-volume 1/1 Running 0 23s
recycler-for-tv202-volume 0/1 Completed 0 25s
recycler-for-tv202-volume 0/1 Terminating 0 25s
recycler-for-tv202-volume 0/1 Terminating 0 25s
recycler-for-tv204-volume 0/1 Completed 0 27s
recycler-for-tv204-volume 0/1 Terminating 0 27s
recycler-for-tv204-volume 0/1 Terminating 0 27s
I see nothing for tv203...
Running oc get po again, the recycler pods are still Completed and still not ready:
oc get po
NAME READY STATUS RESTARTS AGE
hawkular-cassandra-1-gp133 1/1 Running 0 6d
hawkular-metrics-onkqd 1/1 Running 4 29d
heapster-yddsn 1/1 Running 0 6d
recycler-for-tv201-volume 0/1 Completed 0 48s
recycler-for-tv203-volume 0/1 Completed 0 7m
Looking at the PV...
oc get pv -w
NAME CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
tv200-volume 10Gi RWO,RWX Bound ya-test2/mysql4 6d
tv201-volume 10Gi RWO,RWX Bound ya-test2/mysql1 6d
tv202-volume 10Gi RWO,RWX Bound ya-test2/mysql5 5d
tv203-volume 10Gi RWO,RWX Bound ya-test2/mysql2 6d
tv204-volume 10Gi RWO,RWX Bound ya-test2/mysql3 6d
Now delete the project
NAME CAPACITY ACCESSMODES STATUS CLAIM REASON AGE
tv201-volume 10Gi RWO,RWX Released ya-test2/mysql1 6d
tv203-volume 10Gi RWO,RWX Released ya-test2/mysql2 6d
tv204-volume 10Gi RWO,RWX Released ya-test2/mysql3 6d
tv200-volume 10Gi RWO,RWX Released ya-test2/mysql4 6d
tv202-volume 10Gi RWO,RWX Released ya-test2/mysql5 5d
tv200-volume 10Gi RWO,RWX Released 6d
tv200-volume 10Gi RWO,RWX Available 6d
tv202-volume 10Gi RWO,RWX Released 5d
tv202-volume 10Gi RWO,RWX Available 5d
tv204-volume 10Gi RWO,RWX Released 6d
tv201-volume 10Gi RWO,RWX Released 6d
tv204-volume 10Gi RWO,RWX Available 6d
tv201-volume 10Gi RWO,RWX Available 6d
tv203-volume 10Gi RWO,RWX Released 6d
tv203-volume 10Gi RWO,RWX Available 6d
The volume tv203 gets status Available, but the data is still present on the NFS share.
If I run oc logs on the tv203 recycler pod:
oc logs recycler-for-tv203-volume
Scrub ok
And the output of oc describe po recycler-for-tv203-volume:
oc describe po recycler-for-tv203-volume
Name: recycler-for-tv203-volume
Namespace: openshift-infra
Node: 10.2.64.151/10.2.64.151
Start Time: Tue, 08 Nov 2016 20:44:00 +0100
Labels: <none>
Status: Succeeded
IP: 10.201.0.2
Controllers: <none>
Containers:
pv-recycler:
Container ID: docker://97d05dc4e6517b4981ab2436315f531bd0761409d456d072fc903c1ba0ca7c69
Image: openshift3/ose-recycler:v3.3.0.32
Image ID: docker://sha256:08aa1eb54fafb89c3ec673e931f9f172fea2880fba5133576faa554649dff3fb
Port:
Command:
/usr/bin/openshift-recycle
Args:
/scrub
QoS Tier:
memory: BestEffort
cpu: BestEffort
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 08 Nov 2016 20:44:15 +0100
Finished: Tue, 08 Nov 2016 20:44:25 +0100
Ready: False
Restart Count: 0
Environment Variables:
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
vol:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: paas-nfs.qa-ch7.dev
Path: /exports/tv203
ReadOnly: false
pv-recycler-controller-token-gp1i3:
Type: Secret (a volume populated by a Secret)
SecretName: pv-recycler-controller-token-gp1i3
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
16m 16m 1 {default-scheduler } Normal Scheduled Successfully assigned recycler-for-tv203-volume to 10.2.64.151
16m 16m 1 {kubelet 10.2.64.151} spec.containers{pv-recycler} Normal Pulled Container image "openshift3/ose-recycler:v3.3.0.32" already present on machine
16m 16m 1 {kubelet 10.2.64.151} spec.containers{pv-recycler} Normal Created Created container with docker id 97d05dc4e651
16m 16m 1 {kubelet 10.2.64.151} spec.containers{pv-recycler} Normal Started Started container with docker id 97d05dc4e651
Seems ok...
I'll keep testing, but so far the issue appears when there is already a previous recycler pod present in the openshift-infra namespace. The question is why one randomly remains there... Its logs show "Scrub ok", and the next time it is invoked to clean the volume it reports the scrub as done (which is not the case), and then the volume goes back to status Available.
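A quick way to cross-check this observation, as a sketch only; it assumes the NFS exports are mounted locally under /exports. It lists leftover Completed recycler pods and shows whether any export directory still contains data even though its PV is Available:

```sh
# Leftover recycler pods stuck in 0/1 Completed.
oc get pods -n openshift-infra | grep '^recycler-for-.*Completed'
# Size of each export directory; a non-empty directory for an Available PV
# means it was not actually scrubbed.
for d in /exports/tv*; do
    echo "$d: $(du -sh "$d" | cut -f1)"
done
```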
Hello,
So I got more information...

As a workaround I'm running a stupid cron job every minute on the master that looks for pods named "recycler-for-*" that are not ready (0/1) and have status Completed, and simply deletes them:
# Delete any leftover recycler pods stuck in 0/1 Completed
for i in $( oc get po -n openshift-infra | grep -o '\(recycler\S*\)\(.*0\/1.*Completed\)' | awk '{ print $1}' ); do
    echo "Deleting recycler pod : $i"
    echo ""
    oc describe po $i -n openshift-infra
    echo ""
    oc delete po $i -n openshift-infra
done
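As an illustration only, the loop above could be saved as a script and driven from cron; the script path and log file below are placeholders, not what was used in the report:

```
# /etc/cron.d/clean-recycler-pods (example) -- run the cleanup loop every minute as root
* * * * * root /usr/local/bin/clean-recycler-pods.sh >> /var/log/clean-recycler-pods.log 2>&1
```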
Every time a volume claim gets released, OpenShift creates a NEW recycler pod instead of updating the one that is in the not-ready/Completed state.

Since I put this trick in place... there seem to be no more issues with data scrubbing...
So, there are several things here:

1) When you delete a namespace with a PVC and a mysql pod, the PVC is deleted earlier than the pod, so for some time the mysql pod and the recycler run in parallel and the mysql pod may prevent some files from being deleted.

2) The recycler pod in OpenShift Enterprise does not check that there are no files left on the volume when it finishes. It should check, so that it would detect that there are still some mysql files somewhere and retry recycling again after a while.

3) For some reason, some recycler pods are not deleted by Kubernetes when they complete. It's totally unpredictable; I've reproduced it a few times:

1. create 10 NFS PVs, each pointing to a separate directory (a sketch of generating such PVs is shown at the end of this report)
2. create 10 claims
3. create 10 pods, each using one claim and creating a bunch of files and directories in an endless loop
4. delete the pods and claims at the same time. The pods are slowly being deleted while they're still creating files and directories and while recycler pods are running.
5. some recycler pods hang in the "0/1 Completed" state and are not deleted -> bug!

I'm investigating 3); I'll file separate bugs for 1) and 2).

Patch for 3): https://github.com/kubernetes/kubernetes/pull/36827

Origin pull request: https://github.com/openshift/origin/pull/12041

The code went into Origin 1.5 with the last Kubernetes rebase.

This has been merged into OCP and is in OCP v3.5.0.7 or newer. If you are going to backport this, please clone this bugzilla and attach the pull request for the backport to it.

Cloned for 3.4.x as #1415624

Tried on the version below with 10 PVs several times; the PVs become Available and the data is cleaned.

openshift v3.5.0.7+390ef18
kubernetes v1.5.2+43a9be4
etcd 3.1.0-rc.0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884
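For completeness, a sketch of how the ten NFS PVs from the reproduction steps above could be generated. The PV names, NFS server, and export paths are placeholders, not the ones used in the original report:

```sh
# Create ten NFS PVs, each pointing to its own export directory and set to be
# recycled when released. Server name and paths are placeholders.
for i in $(seq 0 9); do
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-test-$i
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: nfs.example.com
    path: /exports/nfs-test-$i
EOF
done
```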