Bug 1395271 - NFS recycler pod does not check final state of scrub
Summary: NFS recycler pod does not check final state of scrub
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jan Safranek
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks: 1392338 1415624
 
Reported: 2016-11-15 14:44 UTC by Jan Safranek
Modified: 2017-07-24 14:11 UTC
CC List: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-04-12 19:16:48 UTC
Target Upstream Version:
Embargoed:




Links:
- Origin (GitHub) 11934, last updated 2017-02-01 15:25:07 UTC
- Red Hat Product Errata RHBA-2017:0884 (normal, SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.5 RPM Release Advisory, last updated 2017-04-12 22:50:07 UTC

Description Jan Safranek 2016-11-15 14:44:15 UTC
OpenShift ships its own recycler pod, which deletes all files on a persistent volume, typically an NFS share. Since it is an NFS share, it can be mounted by other pods that create files while the recycler removes them. Typically, a user deletes a namespace containing both pods and PVCs, and a dying MySQL pod there may prevent some files from being recycled (see bug #1392338).

As a result, the recycler may finish with the NFS share not empty. The recycler should check that the NFS share is empty and return an error if it is not. That way, Kubernetes will retry recycling after a while, by which time the dying pod has hopefully terminated.
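The requested check is not shown in this report; a minimal sketch of the idea, assuming the recycler's scrub step is a shell routine and `scrub_and_check` is a hypothetical name, might look like:

```shell
# scrub_and_check DIR: remove all entries under DIR (including dotfiles),
# then verify the directory is really empty. Returning non-zero makes the
# recycler pod fail, so Kubernetes retries the recycle later.
scrub_and_check() {
    dir=$1
    # Remove regular entries and dotfiles; unmatched globs are passed as
    # literals, which rm -f ignores, so errors are suppressed.
    rm -rf "$dir"/* "$dir"/.[!.]* "$dir"/..?* 2>/dev/null
    # Final check: a dying pod may have recreated files meanwhile.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "scrub failed: $dir is not empty" >&2
        return 1
    fi
    echo "scrub OK: $dir is empty"
}
```

The key point is the emptiness check after the removal, not the removal itself: success is only reported when the share is verifiably empty at that moment.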

Version-Release number of selected component (if applicable):
openshift v3.3.1.3
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

How reproducible:
~50%

Steps to Reproduce:
1. Create 10 separate NFS shares, 10 PVs, and 10 claims.
2. Create 10 pods, each using one claim. The pods randomly create files on their PVs.
3. Delete the pods and claims.
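For step 1, the PVs must use the Recycle reclaim policy so the recycler pod is triggered at all; a minimal NFS PV definition along these lines (server, path, and name are placeholders) would be:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-1
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  # Recycle triggers the recycler (scrubber) pod when the claim is deleted.
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: nfs.example.com
    path: /exports/share1
```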

Actual results:
The recycler is started before the pods are killed, so it tries to remove files from an NFS share while a dying pod is still creating new ones.
-> The recycler finishes with non-empty NFS shares and reports success.

Expected results:
The recycler should report an error and be restarted by Kubernetes after a while to remove the remaining files.

Additional info:
Of course, this is not 100% safe: a dying pod may create a file just *after* the recycler pod checks that the NFS share is empty. I'll file a separate bug to prevent the recycler from running while a pod still uses the same PV. However, the check in the recycler is simple and easy to implement, and it should help in 99.9% of cases.

Comment 1 Jan Safranek 2016-11-16 12:45:29 UTC
Patch: https://github.com/openshift/origin/pull/11934/

Comment 2 Jan Safranek 2017-02-02 08:41:24 UTC
The Origin PR was merged yesterday.

Comment 3 Liang Xia 2017-02-10 12:23:43 UTC
Verified that the files created by the pod are eventually deleted, with the versions below:
openshift v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4
etcd 3.1.0

Comment 5 errata-xmlrpc 2017-04-12 19:16:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

