Bug 1698083 - Storage classes in multi-AZ clusters don't work reliably
Summary: Storage classes in multi-AZ clusters don't work reliably
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 1.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Bradley Childs
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks: 1664187
 
Reported: 2019-04-09 15:10 UTC by Eric Rich
Modified: 2019-09-24 09:41 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-09 15:56:49 UTC
Target Upstream Version:
Embargoed:



Description Eric Rich 2019-04-09 15:10:56 UTC
This bug was initially created as a copy of Bug #1694760

I am copying this bug because: 

Description of problem:

On a multi-AZ v3.11 cluster with a gp2 storage class that isn't restricted to a single AZ, two PVCs belonging to the same pod can be provisioned in two separate AZs. In that case the pod can't be scheduled anywhere, because no node in the cluster can mount both PVs. 

One alternative is to create a storage class for each AZ and have the user specify which storage class to use each time a PVC is created. This works, but it's inconvenient and requires every user to know about the extra step. If users don't specify a storage class, all new PVs are created in the AZ of the default storage class, which can leave a disproportionate share of the cluster's workload running on nodes in a single AZ. 
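As an illustration, a per-AZ class for the in-tree AWS EBS provisioner could look like the following minimal sketch (the class name and the us-east-1a zone are placeholders, not taken from this cluster):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-us-east-1a
parameters:
  type: gp2
  # Pin provisioning to a single availability zone (placeholder zone name).
  zone: us-east-1a
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete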

Version-Release number of selected component (if applicable):

OpenShift v3.11

How reproducible:

Steps to Reproduce:
1. From a template, create a deployment that provisions two PVCs mounted into the same pod (see the sketch below).
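For reference, a minimal sketch of the shape of such a workload (all names and the image are placeholders; the default gp2 class is assumed, so neither PVC names a storage class):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-a
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-b
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: two-volume-pod
spec:
  containers:
  - name: app
    image: busybox   # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: vol-a
      mountPath: /mnt/a
    - name: vol-b
      mountPath: /mnt/b
  volumes:
  - name: vol-a
    persistentVolumeClaim:
      claimName: data-a
  - name: vol-b
    persistentVolumeClaim:
      claimName: data-b

With immediate binding, data-a and data-b can each be provisioned in a different AZ before the pod is scheduled, after which no single node can attach both EBS volumes.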

Actual results:
The PVCs are sometimes provisioned in two different AZs, which makes the pod unschedulable. 

Expected results:
The PVCs will be created in the same AZ so that the pod can mount both of them. 

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Eric Rich 2019-04-09 15:56:49 UTC
This seems to be mitigated by topology-aware volume provisioning: https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/

The default storage class has `volumeBindingMode: WaitForFirstConsumer` set, which delays binding and provisioning of a PVC until a pod using it is scheduled, so all of a pod's volumes are provisioned in the AZ of the node the pod lands on:

$ cat must-gather/cluster-scoped-resources/storage.k8s.io/storageclasses/gp2.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: 2019-04-09T15:30:53Z
  name: gp2
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: 55c725bb-5adb-11e9-8aa8-02d4f8d6a68e
  resourceVersion: "7951"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
  uid: 7608d8ae-5adc-11e9-b839-02b822a6f0f6
parameters:
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
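
For reference, one way to confirm the binding mode on a live cluster is a jsonpath query (a sketch assuming the oc CLI and the gp2 class shown above; kubectl works the same way):

$ oc get storageclass gp2 -o jsonpath='{.volumeBindingMode}'
WaitForFirstConsumer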

