Description of problem:
On a multi-AZ v3.11 cluster with a gp2 storage class that isn't restricted to a single AZ, two PVCs belonging to the same pod can be provisioned in two separate AZs. In this scenario, the pod can't be scheduled anywhere because no node in the cluster can mount both of the PVs.

One alternative is to create a storage class for each AZ and have the user specify which storage class to use each time a PVC is created. This works, but it's inconvenient and it requires each user to know about this extra step. If users don't specify which SC to use, all new PVs are created in the AZ of the default SC, which could lead to a disproportionate amount of the cluster's workload running on nodes in a single AZ.

Version-Release number of selected component (if applicable):
OpenShift v3.11

How reproducible:

Steps to Reproduce:
1. From a template, create a deployment that provisions two PVCs that are mounted to the same pod.
2.
3.

Actual results:
The PVCs will (sometimes) be provisioned in two different AZs, which makes the pod unschedulable.

Expected results:
The PVCs will be created in the same AZ so that the pod can mount both of them.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
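To make the failure mode in this report concrete, here is a minimal Python sketch of the zone check. It is a simplified model, not the scheduler's actual code. The node names and AZ names are hypothetical, and the model assumes that an EBS-backed PV can only be mounted by a node in the PV's own AZ.

```python
def nodes_that_can_mount(node_zones, pv_zones):
    """Return the nodes that sit in the same AZ as every PV the pod needs."""
    return sorted(
        node
        for node, zone in node_zones.items()
        if all(zone == pv_zone for pv_zone in pv_zones)
    )


# Hypothetical three-node cluster with one node per AZ.
NODE_ZONES = {
    "node-a": "us-east-1a",
    "node-b": "us-east-1b",
    "node-c": "us-east-1c",
}

# Both PVs landed in the same AZ: the pod has a candidate node.
print(nodes_that_can_mount(NODE_ZONES, ["us-east-1a", "us-east-1a"]))  # ['node-a']

# The PVs landed in different AZs: no node qualifies, so the pod stays pending.
print(nodes_that_can_mount(NODE_ZONES, ["us-east-1a", "us-east-1b"]))  # []
```

In this model, adding more nodes to either AZ does not help, because no single node is ever in two AZs at once.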
We can't fix this in 3.11. The fix depends on a relatively big feature that is still alpha in 3.11. Even if you were to enable that alpha feature (topology-aware scheduling/provisioning), support for EBS volumes is missing in 3.11.
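For context, the general idea behind topology-aware provisioning is to delay volume creation until a node has been chosen for the pod, so that every volume can be created in that node's AZ. The Python sketch below only illustrates that ordering. The function names, node names, and AZ names are hypothetical. Modelling the existing behaviour as an independent random pick per PVC is an assumption made for illustration, since the real selection logic is not described in this report.

```python
import random


def provision_immediately(pvc_names, zones, rng):
    """Model of the reported behaviour: each PVC gets an AZ on its own.

    The independent random pick is an assumption for illustration only.
    """
    return {pvc: rng.choice(zones) for pvc in pvc_names}


def provision_after_scheduling(pvc_names, node_zones, node):
    """Model of delayed binding: the node is chosen first, then every
    volume for the pod is created in that node's AZ."""
    zone = node_zones[node]
    return {pvc: zone for pvc in pvc_names}


# Hypothetical cluster and claims.
NODE_ZONES = {"node-a": "us-east-1a", "node-b": "us-east-1b"}
PVCS = ["data", "logs"]

# Immediate provisioning may or may not split the two volumes across AZs.
print(provision_immediately(PVCS, sorted(set(NODE_ZONES.values())), random.Random()))

# Delayed provisioning always co-locates them with the chosen node.
print(provision_after_scheduling(PVCS, NODE_ZONES, "node-b"))
```

In the second model the two PVs cannot end up in different AZs, because the AZ is derived from the single node that was selected for the pod.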
This was/is addressed in 4.1: https://bugzilla.redhat.com/show_bug.cgi?id=1698083