Bug 1372059

Summary: Dynamic provisioned volumes fail in AWS due to incorrect zone
Product: OpenShift Container Platform
Reporter: Matt Wringe <mwringe>
Component: Installer
Assignee: Kenny Woodson <kwoodson>
Status: CLOSED ERRATA
QA Contact: Johnny Liu <jialiu>
Severity: medium
Docs Contact:
Priority: high
Version: 3.3.0
CC: aos-bugs, eparis, jokerman, mifiedle, mmccomas
Target Milestone: ---
Target Release: 3.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Kubernetes requires that all resources under management be labeled with KubernetesCluster (deprecated) or kubernetes.io/cluster/xxxx so that persistent volumes are attached to and detached from the correct instance in the correct zone.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-28 21:51:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Matt Wringe 2016-08-31 19:49:52 UTC
Description of problem:
When running on AWS, dynamically provisioned volumes currently fail because the volume is created in a different availability zone from the node, and an EBS volume cannot be attached to an instance in another zone.

Error message from the origin logs:

"E0831 19:31:17.926294    1430 factory.go:514] Error scheduling openshift-infra hawkular-cassandra-1-xxor6: pod (hawkular-cassandra-1-xxor6) failed to fit in any node
fit failure on node (ip-172-18-10-196.ec2.internal): NoVolumeZoneConflict"

The AWS instance is running in zone 'us-east-1d', but the PV and volume are running in 'us-east-1c'.


# oc get pv -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolume
  metadata:
    annotations:
      kubernetes.io/createdby: aws-ebs-dynamic-provisioner
      pv.kubernetes.io/bound-by-controller: "yes"
      pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
    creationTimestamp: 2016-08-31T19:26:10Z
    labels:
      failure-domain.beta.kubernetes.io/region: us-east-1
      failure-domain.beta.kubernetes.io/zone: us-east-1c
    name: pvc-c4cd4906-6fb0-11e6-8991-0e3b730bb317
    resourceVersion: "625"
    selfLink: /api/v1/persistentvolumes/pvc-c4cd4906-6fb0-11e6-8991-0e3b730bb317
    uid: c5b67afb-6fb0-11e6-8991-0e3b730bb317
  spec:
    accessModes:
    - ReadWriteOnce
    awsElasticBlockStore:
      fsType: ext4
      volumeID: aws://us-east-1c/vol-b0b2161c
    capacity:
      storage: 10Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: metrics-cassandra-1
      namespace: openshift-infra
      resourceVersion: "584"
      uid: c4cd4906-6fb0-11e6-8991-0e3b730bb317
    persistentVolumeReclaimPolicy: Delete
  status:
    phase: Bound
kind: List
metadata: {}
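
The mismatch shown in this output can be checked mechanically. The following is a minimal Python sketch, not the scheduler's actual NoVolumeZoneConflict implementation. It assumes the node's zone is already known as a string and that the PV has been loaded into a dict with the same shape as the YAML above. The helper names (zone_from_volume_id, pv_zone, has_zone_conflict) are hypothetical. The zone is read from the failure-domain.beta.kubernetes.io/zone label, falling back to the zone embedded in the aws://<zone>/<volume-id> form of volumeID.

```python
ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"
VOLUME_ID_PREFIX = "aws://"


def zone_from_volume_id(volume_id):
    """Return the zone from an 'aws://<zone>/<vol-id>' volume ID, or None."""
    if not volume_id.startswith(VOLUME_ID_PREFIX):
        return None
    zone, _, _ = volume_id[len(VOLUME_ID_PREFIX):].partition("/")
    return zone or None


def pv_zone(pv):
    """Return the PV's zone from its label, else from its EBS volume ID."""
    labels = pv.get("metadata", {}).get("labels") or {}
    if ZONE_LABEL in labels:
        return labels[ZONE_LABEL]
    ebs = pv.get("spec", {}).get("awsElasticBlockStore") or {}
    return zone_from_volume_id(ebs.get("volumeID", ""))


def has_zone_conflict(node_zone, pv):
    """True when the PV is pinned to a zone other than the node's zone."""
    zone = pv_zone(pv)
    return zone is not None and zone != node_zone


# Values copied from the PV in this report; the node zone is 'us-east-1d'.
example_pv = {
    "metadata": {"labels": {ZONE_LABEL: "us-east-1c"}},
    "spec": {"awsElasticBlockStore": {"volumeID": "aws://us-east-1c/vol-b0b2161c"}},
}

if __name__ == "__main__":
    print(has_zone_conflict("us-east-1d", example_pv))  # True
```

A PV with no zone information is treated as non-conflicting in this sketch; that is a simplifying choice for the example.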

Version-Release number of selected component (if applicable):
Origin master, but is also reproducible in v1.3.0-alpha.3

How reproducible:
Always

Comment 2 Jianwei Hou 2016-09-01 02:31:52 UTC
@Matt, please see if https://bugzilla.redhat.com/show_bug.cgi?id=1365398#c6 works for you.

Comment 3 Eric Paris 2016-09-01 14:18:17 UTC
Brad, can you make sure the docs mention the AWS labeling requirement?

Comment 4 Eric Paris 2016-09-01 14:19:32 UTC
Brad: aka https://bugzilla.redhat.com/show_bug.cgi?id=1367617

Comment 5 Eric Paris 2016-09-02 15:44:02 UTC
Moving this to installer to potentially be addressed in 3.4 or later. For 3.3 we have a docs PR https://github.com/openshift/openshift-docs/pull/2783 to explain how this can be accomplished manually.

Comment 6 Jason DeTiberus 2016-09-22 20:36:37 UTC
I'm not sure this is currently a bug, because we don't yet support provisioning in AWS. That said, we do plan on supporting AWS provisioning for 3.4, so we will use this bug to track ensuring we set this properly.

Comment 7 Scott Dodson 2017-02-10 02:07:18 UTC
AWS provisioning is not included in 3.5

Comment 8 Scott Dodson 2017-08-24 18:50:07 UTC
This happens because instances aren't labeled properly. The AWS provisioning work that Kenny is doing will ensure that instances are labeled, though it appears to use the format that was preferred prior to 3.6. We need to update it to use this format:

Key: kubernetes.io/cluster/xxxx
Value: SomeUniqueClusterId

where xxxx is a unique string

See https://trello.com/c/PWwHHUc0/154-retrofit-existing-clusters-with-the-tags-needed-for-the-the-provisioner#comment-595cee234251514e70b52a32 for reference
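
To illustrate the two tag formats mentioned in this comment, the following Python sketch builds a tag in the newer format and lists which cluster-identifying keys an instance carries. It is a sketch under stated assumptions: tags are represented as a plain dict of key to value, all function names are hypothetical, and the code makes no claim about how the AWS cloud provider interprets the tag value.

```python
LEGACY_TAG_KEY = "KubernetesCluster"          # deprecated format
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"  # newer format, followed by xxxx


def build_cluster_tag(unique_string, value):
    """Build a Key/Value tag in the kubernetes.io/cluster/xxxx format."""
    if not unique_string:
        raise ValueError("unique_string must not be empty")
    return {"Key": CLUSTER_TAG_PREFIX + unique_string, "Value": value}


def cluster_tag_keys(tags):
    """Return, sorted, the keys in `tags` that identify a cluster."""
    return sorted(
        key
        for key in tags
        if key == LEGACY_TAG_KEY
        or (key.startswith(CLUSTER_TAG_PREFIX) and len(key) > len(CLUSTER_TAG_PREFIX))
    )


def uses_only_legacy_tag(tags):
    """True when the only cluster-identifying key is the deprecated one."""
    return cluster_tag_keys(tags) == [LEGACY_TAG_KEY]


if __name__ == "__main__":
    print(build_cluster_tag("xxxx", "SomeUniqueClusterId"))
    print(uses_only_legacy_tag({"KubernetesCluster": "SomeUniqueClusterId"}))
```

An instance for which uses_only_legacy_tag returns True would be a candidate for retrofitting with the newer key.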

Comment 9 Scott Dodson 2017-10-13 13:15:32 UTC
New AWS provisioning playbooks in openshift-ansible master branch should properly tag all resources.

Comment 10 Johnny Liu 2017-10-16 09:33:05 UTC
This was validated over several rounds of testing; adding the "KubernetesCluster" tag makes the cluster work well.


But a few more things need to be highlighted:
1. When all instances in the cluster are running in the same zone, adding the "KubernetesCluster" tag works well.
2. When the instances in the cluster are running in multiple zones, adding the "KubernetesCluster" tag does not resolve all issues; the cluster will encounter BZ#1491761.
3. For the installer enhancement, according to https://bugzilla.redhat.com/show_bug.cgi?id=1491399#c9, it seems that the openshift-ansible tools check instance tags only during upgrade. Why not do the same check for a fresh install?


@Scott, based on #3, I am moving this bug to ASSIGNED status.

Comment 11 Scott Dodson 2017-10-19 14:15:19 UTC
We intend to check during 3.7 install and 3.6 to 3.7 upgrade. Let's track those other bugs separately; the scope of this bug is ensuring that the new provisioning work properly sets a cluster id.
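
A check of this kind could look roughly like the following Python sketch. It is illustrative only and is not the openshift-ansible implementation: the input format (a list of dicts with id, zone, and tags keys) and the function names are assumptions made for this example. It reports instances that carry neither tag format and notes whether the cluster spans multiple zones, the case where comment 10 observed that tagging alone did not resolve all issues.

```python
LEGACY_TAG_KEY = "KubernetesCluster"
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"


def is_cluster_tag(key):
    """True for the deprecated key or a kubernetes.io/cluster/xxxx key."""
    return key == LEGACY_TAG_KEY or (
        key.startswith(CLUSTER_TAG_PREFIX) and len(key) > len(CLUSTER_TAG_PREFIX)
    )


def check_instances(instances):
    """Summarize tagging and zone spread for a list of instance dicts.

    Each instance is assumed to look like:
    {"id": "i-1", "zone": "us-east-1c", "tags": {"KubernetesCluster": "c1"}}
    """
    untagged = [
        inst["id"]
        for inst in instances
        if not any(is_cluster_tag(key) for key in inst.get("tags", {}))
    ]
    zones = sorted({inst["zone"] for inst in instances})
    return {
        "ok": not untagged,            # every instance carries a cluster tag
        "untagged": untagged,          # IDs that need a tag before install
        "zones": zones,
        "multi_zone": len(zones) > 1,  # flag for separate multi-zone checks
    }
```

An installer could fail early when "ok" is False and list the IDs in "untagged" so they can be tagged before the install or upgrade proceeds.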

Comment 12 Johnny Liu 2017-10-20 02:31:55 UTC
Based on comment 10 and comment 11, moving this bug to VERIFIED.

Comment 13 Mike Fiedler 2017-10-20 11:34:48 UTC
+1 to comment 12 as it prevents a fresh install from succeeding. In 3.6, the OpenShift documentation barely mentions cluster ID. The only reference to it that I could find is in a side note[1] in the table on persistent volume types. There is nothing in Installation.

[1] - https://docs.openshift.com/container-platform/3.6/install_config/persistent_storage/dynamically_provisioning_pvs.html#available-dynamically-provisioned-plug-ins

Comment 17 errata-xmlrpc 2017-11-28 21:51:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188