Bug 1694760

Summary: Storage classes in multi-AZ clusters don't work reliably
Product: OpenShift Container Platform Reporter: bmorriso
Component: Storage Assignee: Bradley Childs <bchilds>
Status: CLOSED NOTABUG QA Contact: Liang Xia <lxia>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: aos-bugs, aos-storage-staff, erich, hekumar
Target Milestone: --- Keywords: OpsBlocker
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-04-02 13:05:20 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description bmorriso 2019-04-01 15:13:54 UTC
Description of problem:

On a multi-AZ v3.11 cluster with a gp2 storage class that isn't restricted to a single AZ, two PVCs belonging to the same pod can be provisioned in two separate AZs. In this scenario, the pod can't be scheduled anywhere because no node in the cluster can mount both of the PVs.
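
To make the failure mode concrete, here is a minimal sketch of the constraint involved. It is not the scheduler's actual code; the node names and zone names are hypothetical, and the only fact it relies on is that an EBS volume can be attached only to an instance in the same AZ.

```python
# Hypothetical illustration: node names and zone names are made up.
def schedulable_nodes(node_zones, pv_zones):
    """Return the nodes whose zone matches the zone of every PV the pod needs."""
    required = set(pv_zones)
    # A node qualifies only if all required volumes live in that node's zone.
    return sorted(
        name for name, zone in node_zones.items() if required <= {zone}
    )


nodes = {
    "node-a": "us-east-1a",
    "node-b": "us-east-1b",
    "node-c": "us-east-1c",
}

# Both PVs landed in the same AZ: one node can mount both.
same_az = schedulable_nodes(nodes, ["us-east-1a", "us-east-1a"])

# The PVs landed in different AZs: no node can mount both.
split_az = schedulable_nodes(nodes, ["us-east-1a", "us-east-1b"])

print(same_az)
print(split_az)
```

With the split placement the candidate list is empty, which is the unschedulable state described above.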

One alternative to this would be to create a storage class for each AZ and have the user specify which storage class to use each time a PVC is created. This works, but it's inconvenient and requires each user to know about this extra step. If users don't specify which SC to use, then all new PVs will be created in the AZ of the default SC, which could lead to a disproportionate amount of the cluster's workload running on nodes in a single AZ.
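
For reference, the per-AZ workaround could look roughly like the sketch below. It generates one StorageClass manifest per zone for the in-tree `kubernetes.io/aws-ebs` provisioner, using its `zone` parameter. The zone list and the `gp2-<zone>` naming scheme are assumptions for illustration, not taken from the affected cluster.

```python
import json


def storage_class_for_zone(zone, is_default=False):
    """Build a StorageClass manifest pinned to a single AZ (illustrative only)."""
    manifest = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        # Hypothetical naming scheme: one class per zone.
        "metadata": {"name": f"gp2-{zone}"},
        "provisioner": "kubernetes.io/aws-ebs",
        "parameters": {"type": "gp2", "zone": zone},
    }
    if is_default:
        # Only one class can usefully be the default, which is why PVCs
        # that name no class all end up in that class's AZ.
        manifest["metadata"]["annotations"] = {
            "storageclass.kubernetes.io/is-default-class": "true"
        }
    return manifest


# Assumed zone names; substitute the cluster's real AZs.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
manifests = [
    storage_class_for_zone(zone, is_default=(index == 0))
    for index, zone in enumerate(zones)
]

for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Each printed document can be saved to a file and created with `oc create -f`. Users would still have to set `storageClassName` on both PVCs to the same class, which is the extra step mentioned above.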

Version-Release number of selected component (if applicable):

OpenShift v3.11

How reproducible:

Steps to Reproduce:
1. From a template, create a deployment that provisions two PVCs that are mounted to the same pod.
2.
3.

Actual results:
The PVCs will (sometimes) provision to two different AZs, which makes the pod unschedulable.

Expected results:
The PVCs will be created in the same AZ so that the pod can mount both of them. 

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Hemant Kumar 2019-04-01 16:02:31 UTC
We can't fix this in 3.11. The fix depends on a relatively big feature (topology-aware scheduling/provisioning) which is alpha in 3.11. Even if you were to enable that alpha feature, support for EBS volumes is missing in 3.11.
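
For context, on Kubernetes versions where topology-aware provisioning is available for the provisioner in use, it is requested through the StorageClass `volumeBindingMode` field. The sketch below only shows the shape of such a manifest; the class name is hypothetical, and per the above it would not help for EBS on 3.11.

```python
import json

# Illustrative manifest only; "gp2-topology-aware" is a made-up name.
topology_aware_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "gp2-topology-aware"},
    "provisioner": "kubernetes.io/aws-ebs",
    "parameters": {"type": "gp2"},
    # Delay binding and provisioning until a pod using the PVC is scheduled,
    # so the volumes are created in the zone of the node chosen for the pod.
    "volumeBindingMode": "WaitForFirstConsumer",
}

print(json.dumps(topology_aware_class, indent=2))
```

The default binding mode, `Immediate`, provisions each volume as soon as its PVC is created, without regard to where the pod will run, which is how two PVCs for one pod can end up in different AZs.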

Comment 2 Eric Rich 2019-04-09 15:57:19 UTC
This was/is addressed in 4.1: https://bugzilla.redhat.com/show_bug.cgi?id=1698083