Bug 1496256 - Deleted in use PVCs can break the scheduler
Summary: Deleted in use PVCs can break the scheduler
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.8.0
Assignee: Jan Safranek
QA Contact: Wenqi He
URL:
Whiteboard:
Duplicates: 1496225
Depends On:
Blocks: 1499172 1499174 1499175 1499176 1499177 1499178
Reported: 2017-09-26 19:46 UTC by Eric Paris
Modified: 2018-03-28 14:07 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1499172 1499174 1499175 1499176 1499177 1499178
Environment:
Last Closed: 2018-03-28 14:07:14 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 None None None 2018-03-28 14:07:53 UTC

Description Eric Paris 2017-09-26 19:46:03 UTC
oc v3.6.173.0.5

Steps to reproduce:
In namespace1:
 - Create a PVC (make the PVC bind and work)
 - Create a pod using the PVC
 - Delete the PVC
 - Create a PVC with the same name, but this time make it such that it stays pending forever (e.g. ReadWriteMany on EBS)

In namespace2:
 - Create a valid PVC
 - Create a valid pod

The namespace2 pod will be unable to schedule because:

https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/scheduler/algorithm/predicates/predicates.go#L284

will error out. filterVolumes() iterates over all of the pods/PVCs on a node, finds the namespace1 PVC by NAME, and errors when it finds that the PVC is pending.

We then set the status on the 'namespace2' pod to say something like:
  "message": "SchedulerPredicates failed due to PersistentVolumeClaim is not bound: "namespace1PVC", which is unexpected."

This is bad since we leaked the PVC name from namespace1 into the status of a pod in namespace2.

This is also bad since we didn't schedule the namespace2 pod even though it could have been scheduled.



Comment 1 Clayton Coleman 2017-09-26 20:11:01 UTC
*** Bug 1496225 has been marked as a duplicate of this bug. ***

Comment 2 Jan Safranek 2017-09-27 13:37:59 UTC
Upstream PR: https://github.com/kubernetes/kubernetes/pull/53135

Comment 5 Jan Safranek 2017-10-06 09:56:09 UTC
Marking this bug for 3.8 as a reminder to test it after the rebase. Cloning to older releases.

Comment 7 Jan Safranek 2017-11-23 15:33:21 UTC
I am glad I left it open: the UPSTREAM patch got lost during the rebase.

New PR: https://github.com/openshift/origin/pull/17442

Comment 9 Wenqi He 2018-01-04 01:12:33 UTC
Tested on the following versions:
openshift v3.9.0-0.9.0
kubernetes v1.8.1+0d5291c

The scheduler still works after deleting in-use PVCs.

Comment 12 errata-xmlrpc 2018-03-28 14:07:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

