Description of problem:
Set up a cluster with 2 nodes in us-east-1d and use PVCs to dynamically provision volumes. Sometimes a volume is provisioned in us-east-1c. This happens randomly, but if you keep provisioning volumes you will eventually see one land in the wrong AZ.

Version-Release number of selected component (if applicable):
openshift v3.3.0.17
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
80%

Steps to Reproduce:
1. Install master and node instances on AWS, both in us-east-1d.
2. Create a dynamic PVC named pvc1 with size 1Gi using https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/persistent-volumes/misc/pvc.json (a sketch of such a claim appears at the end of this comment).
3. Create a dynamic PVC named pvc2 with size 2Gi using the same file.
4. Check the PVs:
pvc-76c63740-5dff-11e6-84a2-0ef1f4b2f333   1Gi   RWO   Bound   jhou/pvc1   6m
pvc-bbe9982f-5dff-11e6-84a2-0ef1f4b2f333   2Gi   RWX   Bound   jhou/pvc2   5m
5. oc describe pv

[root@ip-172-18-11-134 ~]# oc describe pv pvc-76c63740-5dff-11e6-84a2-0ef1f4b2f333
Name:           pvc-76c63740-5dff-11e6-84a2-0ef1f4b2f333
Labels:         failure-domain.beta.kubernetes.io/region=us-east-1
                failure-domain.beta.kubernetes.io/zone=us-east-1d
Status:         Bound
Claim:          jhou/pvc1
Reclaim Policy: Delete
Access Modes:   RWO
Capacity:       1Gi
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1d/vol-4c8d98e8
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
No events.

[root@ip-172-18-11-134 ~]# oc describe pv pvc-bbe9982f-5dff-11e6-84a2-0ef1f4b2f333
Name:           pvc-bbe9982f-5dff-11e6-84a2-0ef1f4b2f333
Labels:         failure-domain.beta.kubernetes.io/region=us-east-1
                failure-domain.beta.kubernetes.io/zone=us-east-1c
Status:         Bound
Claim:          jhou/pvc2
Reclaim Policy: Delete
Access Modes:   RWX
Capacity:       2Gi
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-east-1c/vol-65e8abc8
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
No events.

Actual results:
One of the PVs is created in us-east-1c, not in the same AZ as the instances.

Expected results:
Both PVs should be in the same AZ as the instances.

Additional info:
Aug 9 03:06:03 ip-172-18-11-134 docker: I0809 03:06:03.623402 1 aws.go:972] Found instances in zones map[us-east-1c:{} us-east-1d:{}]
Aug 9 03:06:03 ip-172-18-11-134 docker: I0809 03:06:03.623442 1 util.go:248] Creating volume for PVC "pvc2"; chose zone="us-east-1c" from zones=["us-east-1c" "us-east-1d"]
Aug 9 03:06:03 ip-172-18-11-134 atomic-openshift-node: I0809 03:06:03.889799 25041 generic.go:181] GenericPLEG: Relisting
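For reference, a minimal sketch of the kind of claim definition used in steps 2 and 3 (the exact contents of the linked pvc.json may differ; the name, size, and access mode below are illustrative):

# create a 1Gi RWO claim named pvc1; adjust name/size/access mode for pvc2
cat <<EOF | oc create -f -
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": { "name": "pvc1" },
  "spec": {
    "accessModes": [ "ReadWriteOnce" ],
    "resources": { "requests": { "storage": "1Gi" } }
  }
}
EOF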
I saw it once or twice. Looking at the code, Kubernetes lists all running AWS instances and randomly selects a zone that is used by one of them. It happens only on your shared AWS account; it should work if Kubernetes is installed in a dedicated AWS project where all AWS instances are Kubernetes nodes. Filed https://github.com/kubernetes/kubernetes/issues/30265 about it.
Current workaround: add the tag "Name=KubernetesCluster,Value=<clusterid>" to all instances of the same cluster. Removed the 'testblocker' keyword since the workaround works for us.
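For illustration, one way to apply that tag with the AWS CLI (the instance IDs are placeholders for the cluster's instances; this is a sketch, not the only way to set the tag):

# tag every instance that belongs to this cluster with the shared KubernetesCluster tag
aws ec2 create-tags \
    --resources i-0123456789abcdef0 i-0fedcba9876543210 \
    --tags Key=KubernetesCluster,Value=<clusterid>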
I think the upstream issue is saying that the tagging is not a 'workaround' but the intended design, so this is 'working as expected'. I do not believe there is anything left to fix in this BZ.
Closing as working as designed per upstream's comment (use the tagging to influence the PV zone).
We need to document this in case customers run into the same problem. Tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1367617
openshift-ansible doesn't currently perform any instance manipulation, but we will start that work as part of 3.7. Is there another short-term fix you'd suggest?
*** Bug 1468756 has been marked as a duplicate of this bug. ***
https://github.com/openshift/openshift-ansible/pull/4726 makes it mandatory to specify a cluster id when using the AWS provider, or to explicitly state that you're only running one cluster per account.
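If it helps, a sketch of what that looks like in the installer inventory, assuming the variable required by the PR is openshift_clusterid (check the PR for the exact name and for the single-cluster opt-out):

# add under the [OSEv3:vars] section of the openshift-ansible inventory
# (variable name assumed; verify against the PR)
openshift_clusterid=<clusterid>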
Based on the following from Hemant Kumar, I'm moving this to be a docs bug. While the Ansible installer could update aws.conf, this seems like a bad idea because it's yet another item that needs to be kept in sync.

"Also, there is no need to update aws.conf file, because if KubernetesClusterTag is not present in aws.conf then the tag value is picked from master instance tag."

In fact, the docs already mention this, but in a section specific to AWS dynamic volumes. It should probably be moved to a more prominent location, and it needs to be updated to reflect the new label. Here's a PR that does the latter: https://github.com/openshift/openshift-docs/pull/4783
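For context, a sketch of what the optional override in the AWS cloud-provider config could look like (the path /etc/origin/cloudprovider/aws.conf and the zone value are illustrative; as the quote above notes, KubernetesClusterTag can normally be omitted, in which case the value is read from the master instance's tag):

# illustrative aws.conf; setting KubernetesClusterTag explicitly is optional
[Global]
Zone = us-east-1d
KubernetesClusterTag = <clusterid>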