Bug 1694760 - Storage classes in multi-AZ clusters don't work reliably
Summary: Storage classes in multi-AZ clusters don't work reliably
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: Bradley Childs
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-01 15:13 UTC by bmorriso
Modified: 2019-09-27 14:22 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-02 13:05:20 UTC
Target Upstream Version:
Embargoed:



Description bmorriso 2019-04-01 15:13:54 UTC
Description of problem:

On a multi-AZ v3.11 cluster with a gp2 storage class that isn't restricted to a single AZ, two PVCs belonging to the same pod can be provisioned in two different AZs. When that happens, the pod can't be scheduled anywhere, because no node in the cluster can mount both PVs.
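The scheduling dead end described above can be modeled in a few lines. This is an illustrative sketch, not OpenShift scheduler code: it assumes, as with EBS, that a volume can only be attached to a node in the volume's own AZ, and the node and zone names are hypothetical.

```python
from typing import Dict, List


def schedulable_nodes(node_zones: Dict[str, str], pv_zones: List[str]) -> List[str]:
    """Return the nodes that could attach every PV a pod needs.

    Assumes a volume attaches only to a node in its own AZ, so a node
    qualifies only if its zone matches the zone of every PV.
    """
    return sorted(
        node for node, zone in node_zones.items()
        if all(zone == pv_zone for pv_zone in pv_zones)
    )


# Hypothetical three-AZ cluster with one node per zone.
nodes = {"node-a": "us-east-1a", "node-b": "us-east-1b", "node-c": "us-east-1c"}

print(schedulable_nodes(nodes, ["us-east-1a", "us-east-1a"]))  # ['node-a']
print(schedulable_nodes(nodes, ["us-east-1a", "us-east-1b"]))  # []
```

Both PVs in one AZ leave exactly one candidate node; PVs split across two AZs leave none, which is the unschedulable pod reported here.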

One workaround is to create a storage class for each AZ and have users specify which storage class to use every time they create a PVC. This works, but it's inconvenient, and it requires every user to know about the extra step. If users don't specify a storage class, all new PVs are created in the AZ of the default storage class, which could concentrate a disproportionate share of the cluster's workload on nodes in a single AZ.
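The per-AZ workaround can be sketched as a small generator of StorageClass manifests. The upstream Kubernetes documentation describes a `zone` parameter for the `kubernetes.io/aws-ebs` provisioner and notes that, when neither `zone` nor `zones` is set, volumes are generally spread round-robin across zones that have nodes, which is how an unrestricted gp2 class can split one pod's volumes. The `gp2-<zone>` naming, the zone list, and the choice of default zone below are hypothetical; the output is JSON, which `oc`/`kubectl` accept in place of YAML.

```python
import json
from typing import Dict, List


def per_az_storage_classes(zones: List[str], default_zone: str) -> List[Dict]:
    """Build one gp2 StorageClass manifest per AZ (hypothetical naming)."""
    classes = []
    for zone in zones:
        sc = {
            "apiVersion": "storage.k8s.io/v1",
            "kind": "StorageClass",
            "metadata": {"name": f"gp2-{zone}"},
            "provisioner": "kubernetes.io/aws-ebs",
            # Pin every volume provisioned from this class to a single AZ.
            "parameters": {"type": "gp2", "zone": zone},
        }
        if zone == default_zone:
            # Only one class should be the default; PVCs that name no
            # storage class are provisioned from it, i.e. in this AZ.
            sc["metadata"]["annotations"] = {
                "storageclass.kubernetes.io/is-default-class": "true"
            }
        classes.append(sc)
    return classes


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    for manifest in per_az_storage_classes(zones, default_zone="us-east-1a"):
        print(json.dumps(manifest, indent=2))
```

The sketch also shows the imbalance the report mentions: every PVC that omits `storageClassName` lands in `default_zone`.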

Version-Release number of selected component (if applicable):

OpenShift v3.11

How reproducible:

Steps to Reproduce:
1. From a template, create a deployment that provisions two PVCs that are mounted to the same pod.

Actual results:
The PVCs are sometimes provisioned in two different AZs, which makes the pod unschedulable.

Expected results:
The PVCs will be created in the same AZ so that the pod can mount both of them. 

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Hemant Kumar 2019-04-01 16:02:31 UTC
We can't fix this in 3.11. The fix depends on a relatively large feature that is alpha in 3.11. Even if you were to enable that alpha feature (called topology-aware scheduling/provisioning), support for EBS volumes is missing in 3.11.
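Topology-aware provisioning changes the order of operations: instead of creating each volume as soon as its PVC appears, provisioning waits until the pod has been scheduled and then creates the volumes in the chosen node's zone. In upstream Kubernetes this delayed mode is selected per StorageClass with `volumeBindingMode: WaitForFirstConsumer`. The model below is a simplified sketch of the difference; the round-robin zone picker is an assumption standing in for the provisioner's real zone-selection logic, and all names are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List, Tuple


def provision_immediate(pvc_count: int, zone_picker: Iterator[str]) -> List[str]:
    """Immediate binding: each PVC's zone is chosen on its own, before the
    pod is scheduled. A round-robin picker stands in for the real policy."""
    return [next(zone_picker) for _ in range(pvc_count)]


def provision_delayed(pvc_count: int, node: str,
                      node_zones: Dict[str, str]) -> Tuple[str, List[str]]:
    """Delayed (topology-aware) binding: the pod's node is picked first,
    and every PVC is then provisioned in that node's zone."""
    return node, [node_zones[node]] * pvc_count


def single_az(zones: List[str]) -> bool:
    # The pod can start only if all of its EBS volumes share one AZ.
    return len(set(zones)) == 1


if __name__ == "__main__":
    node_zones = {"node-a": "us-east-1a", "node-b": "us-east-1b", "node-c": "us-east-1c"}
    picker = cycle(sorted(set(node_zones.values())))

    immediate = provision_immediate(2, picker)
    print(immediate, single_az(immediate))    # ['us-east-1a', 'us-east-1b'] False

    node, delayed = provision_delayed(2, "node-b", node_zones)
    print(node, delayed, single_az(delayed))  # node-b ['us-east-1b', 'us-east-1b'] True
```

With delayed binding the zone is derived from the node rather than chosen per volume, so the split-AZ outcome cannot occur in this model.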

Comment 2 Eric Rich 2019-04-09 15:57:19 UTC
This was/is addressed in 4.1: https://bugzilla.redhat.com/show_bug.cgi?id=1698083

