Bug 1683750 - OCP4 from Installer 0.13 creates StorageClass that creates PVs in availability zones that have no machines
Keywords:
Status: CLOSED DUPLICATE of bug 1664145
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.1.0
Assignee: Matthew Wong
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-02-27 17:38 UTC by Wolfgang Kulhanek
Modified: 2019-03-14 17:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-14 17:25:48 UTC
Target Upstream Version:
Embargoed:



Description Wolfgang Kulhanek 2019-02-27 17:38:45 UTC
Description of problem:
So I stood up a 0.13 cluster this morning. It appears that PVC provisioning is not working quite right.

I configured the cluster with 2 workers, so my machinesets for us-east-1a and us-east-1b are each scaled to 1 instance.

I then deploy a template with a database (e.g. dancer-mysql-persistent) and a PVC gets created in the project. After a while the PV is created and bound.

But what I am seeing is that the PV is created in us-east-1c, which has no active nodes, so the app stays in Pending with "2 node(s) had volume node affinity conflict".

Version-Release number of selected component (if applicable):
oc adm release info
Name:      4.0.0-0.5
Digest:    sha256:2de0de1c56c2b8de6b57c733db9397d206ff3b3328bd50d1bf1613cd5ba709c6
Created:   2019-02-27 00:08:21 +0000 UTC
OS/Arch:   linux/amd64
Manifests: 244

Release Metadata:
  Version:  4.0.0-0.5
  Upgrades: 4.0.0-0.4

Component Versions:
  Kubernetes 1.12.4

How reproducible:

Steps to Reproduce:
1. See above

Actual results:
oc describe pv
Name:              pvc-aaef2978-3a99-11e9-8307-0e3610fc2722
Labels:            failure-domain.beta.kubernetes.io/region=us-east-1
                   failure-domain.beta.kubernetes.io/zone=us-east-1c
Annotations:       kubernetes.io/createdby: aws-ebs-dynamic-provisioner
                   pv.kubernetes.io/bound-by-controller: yes
                   pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      gp2
Status:            Bound
Claim:             wk-test/mysql
Reclaim Policy:    Delete
Access Modes:      RWO
Capacity:          1Gi
Node Affinity:
  Required Terms:
    Term 0:        failure-domain.beta.kubernetes.io/zone in [us-east-1c]
                   failure-domain.beta.kubernetes.io/region in [us-east-1]
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1c/vol-028e209aad0cf2577
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
Events:         <none>

Expected results:
The PV is created in either us-east-1a or us-east-1b.

StorageClass Dump (if StorageClass used by PV/PVC):
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
    creationTimestamp: 2019-02-27T17:34:07Z
    labels:
      cluster.storage.openshift.io/owner-name: cluster-config-v1
      cluster.storage.openshift.io/owner-namespace: kube-system
    name: gp2
    resourceVersion: "156690"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
    uid: e280d120-3ab5-11e9-b930-0e3610fc2722
  parameters:
    type: gp2
  provisioner: kubernetes.io/aws-ebs
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Additional info:

Comment 1 Matthew Wong 2019-02-27 18:41:10 UTC
https://github.com/openshift/cluster-storage-operator/pull/12, which makes the StorageClass volumeBindingMode WaitForFirstConsumer, merged only yesterday, so I guess we missed the boat on 0.13. We can wait for the next version. In the meantime, create your own StorageClass with volumeBindingMode WaitForFirstConsumer and make it the default by removing the storageclass.kubernetes.io/is-default-class annotation from the current one and adding it to yours.
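
A minimal sketch of such a workaround StorageClass, assuming the same provisioner and parameters as the gp2 class in the dump above. The name gp2-wait is hypothetical; any unused name should do.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # Hypothetical name, chosen only for illustration.
  name: gp2-wait
  annotations:
    # Marks this class as the cluster default; the same annotation
    # should be removed from the existing gp2 class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Delete
# Delay provisioning until a pod using the PVC is scheduled, so the
# volume is created in the zone of the node the pod lands on.
volumeBindingMode: WaitForFirstConsumer
```

With Immediate binding the volume's zone is picked before any pod is scheduled, which is how a PV can end up in a zone with no nodes. With WaitForFirstConsumer, a PVC is expected to stay Pending until a pod that uses it is scheduled.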

Comment 2 Wolfgang Kulhanek 2019-02-27 19:02:29 UTC
@matthew thanks I did that. Seems to work fine.

Comment 4 Matthew Wong 2019-03-14 17:25:48 UTC
Fixed by https://github.com/openshift/cluster-storage-operator/commit/b850242280b7ef2cf7631952229c0a438ec39e64 and installer 0.14. Marking as a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1664145 and tracking it there.

*** This bug has been marked as a duplicate of bug 1664145 ***

