Description of problem:
When using GCE for hosting the OCP cluster across multiple zones, and using the dynamically provisioned storage, the storage periodically

Version-Release number of selected component (if applicable):
3.6

Additional info:
Found a known Kubernetes issue [0] about this that still appears to be open.

[0] https://github.com/kubernetes/kubernetes/issues/50115
*** Bug 1490477 has been marked as a duplicate of this bug. ***
There is a workaround described in bug #1490477.
Hi,

Just to be clear, the workaround described in the other bug is:

    As a workaround, you should always have at least one node in every zone that has a master.

To clarify: does this mean that if you have 3 masters, each in a different zone, the workaround for this issue is simply to have at least one node in each of those zones, so that the pod can be scheduled to whichever zone has access to the GCE storage? Or am I missing an understanding of how Kubernetes interacts with the GCE storage?
Having spoken with Bradley Childs in IRC, I am removing the needinfo flag, as he explained the workaround. Rather than having one node per zone that has a master, the workaround is better understood as "have at least one master per zone that has nodes". This is because the provisioner for GCE runs on the master and therefore only knows about the zone that master is configured for. As a result, if you have nodes in zones without a master, there will be nothing in that zone to tell GCE that it needs storage in that zone.
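An alternative to arranging masters per zone is to pin provisioning to specific zones explicitly via the StorageClass parameters.zones field of the gce-pd provisioner. A minimal sketch, assuming the class name, disk type, and zone list are illustrative placeholders:

```yaml
# Hypothetical StorageClass restricting GCE PD provisioning to named
# zones; "gce-pd-zoned" and the zone list are examples, not from the bug.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-zoned
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zones: us-central1-a,us-central1-b
```

Listing only zones that actually contain schedulable nodes avoids provisioning a disk in a zone where no pod could ever attach it.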
https://github.com/kubernetes/kubernetes/pull/55039
Tested on OCP 3.9 with version v3.9.0-0.20.0, and the issue has been fixed.

# openshift version
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

Verification steps:

ENV of OCP 3.9: 1 master, 1 node in total.
1 master in zone us-central1-a.
1 node in zone us-central1-a.

1. Create a storage class without setting parameters.zones.
2. Create 100 PVCs using the above storage class, and check the dynamically provisioned volumes.
Result: all 100 volumes were provisioned in zone us-central1-a.

1. Create a storage class with parameters.zones set to "us-central1-a,us-central1-b".
2. Create 100 PVCs using the above storage class, and check the dynamically provisioned volumes.
Result: 51 volumes were provisioned in zone us-central1-a; 51 PVCs were in Bound status, 49 in Pending status.

Checking a pending PVC:

# oc describe pvc pvcname095
Name:          pvcname095
Namespace:     default
StorageClass:  zones
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Finalizers:    []
Capacity:
Access Modes:
Events:
  Type     Reason              Age              From                         Message
  ----     ------              ----             ----                         -------
  Warning  ProvisioningFailed  3s (x7 over 1m)  persistentvolume-controller  Failed to provision volume with StorageClass "zones": kubernetes does not have a node in zone "us-central1-b"

The error message clearly shows why it failed to provision the volume. So in total, the result is acceptable from QE's perspective. Feel free to move back if you do not agree.
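For reference, the 100-PVC step above can be reproduced with a simple loop. This is a hypothetical sketch: the manifest fields and the "zones" StorageClass name follow the output shown above, the pvcname### pattern matches pvcname095, and the oc create call is commented out so the script can be inspected without a cluster.

```shell
# Generate 100 PVC manifests against the "zones" StorageClass used in
# the verification; names are pvcname001..pvcname100 (zero-padded).
for i in $(seq -w 1 100); do
  cat <<EOF > pvc-${i}.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvcname${i}
spec:
  storageClassName: zones
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
  # oc create -f pvc-${i}.yaml
done
```

After the loop, checking how many PVCs bound versus stayed pending (oc get pvc) reproduces the zone-mismatch error above when a listed zone has no node.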
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489