Bug 1509028
Summary: | GCE dynamic provisioning going to wrong zone | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Eric Jones <erjones>
Component: | Storage | Assignee: | Tomas Smetana <tsmetana>
Status: | CLOSED ERRATA | QA Contact: | Liang Xia <lxia>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 3.6.0 | CC: | aos-bugs, aos-storage-staff, bchilds, erich, gpei, jkaur, pschiffe, tsmetana, wmeng
Target Milestone: | --- | Keywords: | NeedsTestCase
Target Release: | 3.9.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2018-03-28 14:09:47 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1531444 | |
Bug Blocks: | | |

Doc Text:

Cause: When using an OpenShift cluster in a GCE multi-zone setup, dynamically provisioned persistent volumes could be created in zones with no running nodes.

Consequence: Pods using a volume provisioned in a zone with no nodes could not run.

Fix: The GCE cloud provider has been fixed to provision persistent volumes only in zones with running nodes.

Result: Pods in multi-zone clusters should no longer fail to start because their volume does not fit any node.
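The consequence described in the Doc Text follows from how zonal volumes are scheduled: a GCE PD persistent volume carries zone labels, and the scheduler only places pods that use it onto nodes in the matching zone. Below is a minimal sketch of such a PV; the name, size, and disk name are illustrative assumptions, and only the zone-label mechanism itself is standard Kubernetes behavior of this era.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                                    # illustrative name
  labels:
    # Zone labels applied to GCE PD volumes; pods using this volume can
    # only be scheduled onto nodes carrying the same zone label.
    failure-domain.beta.kubernetes.io/zone: us-central1-b
    failure-domain.beta.kubernetes.io/region: us-central1
spec:
  capacity:
    storage: 1Gi                                      # illustrative size
  accessModes:
    - ReadWriteOnce
  gcePersistentDisk:
    pdName: example-disk                              # illustrative disk name
    fsType: ext4

If us-central1-b contains no nodes, no node can satisfy the zone constraint and any pod using the volume stays Pending, which is exactly what the fix prevents by never provisioning into such a zone.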
Description (Eric Jones, 2017-11-02 17:53:48 UTC)
*** Bug 1490477 has been marked as a duplicate of this bug. ***

There is a workaround described in bug #1490477.

Hi,

Just to be clear, the workaround described in the other bug is:

As a workaround, you should always have at least one node in every zone that has a master.

To clarify: if there are 3 masters, each in a different zone, is the workaround simply to have at least one node in each of those zones, so that the pod can be scheduled to whichever zone has access to the GCE storage? Or am I missing something about how Kubernetes interacts with GCE storage?

Having spoken with Bradley Childs on IRC, I am removing the needinfo flag, as he explained the workaround. Rather than having one node per zone that has a master, the workaround is better understood as "have at least one master per zone that has nodes". The GCE provisioner runs on the master and therefore only knows about the zone that master is configured for. If you have nodes in zones without a master, nothing in those zones will tell GCE that storage is needed there.

Tested on OCP 3.9 with version v3.9.0-0.20.0, and the issue has been fixed.

# openshift version
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

Verification steps:

Environment for OCP 3.9: 1 master and 1 node in total; both the master and the node are in zone us-central1-a.

1. Create a storage class without setting parameters.zones.
2. Create 100 PVCs using the above storage class, and check the dynamically provisioned volumes.

Result: all 100 volumes were provisioned in zone us-central1-a.

1. Create a storage class with parameters.zones set to "us-central1-a,us-central1-b".
2. Create 100 PVCs using the above storage class, and check the dynamically provisioned volumes.

Result: 51 volumes were provisioned in zone us-central1-a; 51 PVCs were in Bound status and 49 in Pending status. (A sketch of the storage classes and claim used in these steps appears at the end of this report.)

Checking one of the pending PVCs:

# oc describe pvc pvcname095
Name:          pvcname095
Namespace:     default
StorageClass:  zones
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Finalizers:    []
Capacity:
Access Modes:
Events:
  Type     Reason              Age              From                         Message
  ----     ------              ----             ----                         -------
  Warning  ProvisioningFailed  3s (x7 over 1m)  persistentvolume-controller  Failed to provision volume with StorageClass "zones": kubernetes does not have a node in zone "us-central1-b"

The error message clearly shows why the volume failed to provision, so overall the result is acceptable from QE's perspective. Feel free to move the bug back if you do not agree.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489
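For reference, the storage classes and claim from the verification steps above might look like the following sketch. The first class name, the disk type, and the 1Gi request are illustrative assumptions; the provisioner, the class name "zones", the parameters.zones value, and the claim name pvcname095 are taken from the output above.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: default-zone                   # assumed name; the report only says "without setting parameters.zones"
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard                    # assumed disk type
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zones                          # class name shown in the oc describe output
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard                    # assumed disk type
  zones: us-central1-a,us-central1-b   # zones given in the verification steps
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvcname095                     # claim name from the oc describe output
spec:
  storageClassName: zones
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi                     # assumed size; the report does not state the request

The provisioner spreads claims across the listed zones, so with no node in us-central1-b roughly half of the 100 claims fail with the ProvisioningFailed event shown above, which matches the 51 Bound / 49 Pending split.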