Bug 1509028 - GCE dynamic provisioning going to wrong zone
Summary: GCE dynamic provisioning going to wrong zone
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Tomas Smetana
QA Contact: Liang Xia
URL:
Whiteboard:
Duplicates: 1490477
Depends On: 1531444
Blocks:
 
Reported: 2017-11-02 17:53 UTC by Eric Jones
Modified: 2018-03-28 14:10 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When running an OpenShift cluster in a GCE multi-zone setup, dynamic persistent volumes could be provisioned in zones with no running nodes.
Consequence: Pods using a volume provisioned in a zone with no nodes could not be scheduled.
Fix: The GCE cloud provider has been fixed to provision persistent volumes only in zones with running nodes.
Result: Pods in multi-zone clusters should no longer fail to start because they cannot be scheduled to any node.
Clone Of:
Environment:
Last Closed: 2018-03-28 14:09:47 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Red Hat Product Errata RHBA-2018:0489 (last updated 2018-03-28 14:10:51 UTC)

Description Eric Jones 2017-11-02 17:53:48 UTC
Description of problem:
When using GCE for hosting the OCP cluster across multiple zones, and using dynamically provisioned storage, the storage periodically gets provisioned in a zone that has no running nodes, so pods that use the volume cannot be scheduled.

Version-Release number of selected component (if applicable):
3.6

Additional info:
There is a known upstream Kubernetes issue [0] describing this behavior that still appears to be open.

[0] https://github.com/kubernetes/kubernetes/issues/50115

Comment 2 Pavel Pospisil 2017-11-03 19:53:18 UTC
*** Bug 1490477 has been marked as a duplicate of this bug. ***

Comment 3 Tomas Smetana 2017-11-06 08:48:06 UTC
There is a workaround described in the bug #1490477.

Comment 4 Eric Jones 2017-11-07 17:50:21 UTC
Hi,

Just to be clear, the workaround described in the other bug is:


As a workaround, you should always have at least one node in every zone that has a master.


And to clarify: if you have 3 masters, each in a different zone, is the workaround simply to have at least one node in each of those zones, so that the pod can be scheduled to whichever zone has access to the GCE storage?

Or am I misunderstanding how Kubernetes interacts with the GCE storage?

Comment 5 Eric Jones 2017-11-07 18:06:56 UTC
Having spoken with Bradley Childs on IRC, I am removing the needinfo flag, as he explained the workaround.

Rather than having one node per zone that has a master, the workaround is better understood as "have at least one master per zone that has nodes". This is because the GCE provisioner runs on the master and therefore only knows about the zone that master is configured for. If you have nodes in zones without a master, there is nothing in those zones to tell GCE that storage is needed there.
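
One quick way to check that layout is to list the zone label the GCE cloud provider puts on each Node object; a minimal sketch, assuming the beta failure-domain label used by releases in this range:

# oc get nodes -L failure-domain.beta.kubernetes.io/zone

Every zone that shows nodes should also contain at least one master for the workaround to hold.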

Comment 14 Tomas Smetana 2018-01-12 09:04:59 UTC
https://github.com/kubernetes/kubernetes/pull/55039

Comment 16 Liang Xia 2018-01-18 08:56:56 UTC
Tested on OCP 3.9 with version v3.9.0-0.20.0, and the issue has been fixed.

# openshift version
openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8


Verification steps:
ENV of OCP 3.9:
   1 master and 1 node in total.
   1 master in zone us-central1-a.
   1 node   in zone us-central1-a.

1. Create a storage class without setting parameters.zones.
2. Create 100 PVCs using the above storage class, and check the dynamically provisioned volumes (a sketch of this step follows below).
Result: All 100 volumes were provisioned in zone us-central1-a.
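
For reference, a minimal sketch of step 2; the PVC name pattern and size are illustrative, and example-sc stands in for whatever the class from step 1 was actually named:

for i in $(seq -w 1 100); do
oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvcname$i
spec:
  storageClassName: example-sc    # hypothetical name for the class from step 1
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
done

# oc get pv -L failure-domain.beta.kubernetes.io/zone

The last command shows the zone label on each provisioned PV (assuming the PersistentVolumeLabel admission plugin is labeling them as usual), which is one way to confirm the "all in us-central1-a" result.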

1. Create a storage class with parameters.zones set to "us-central1-a,us-central1-b" (see the sketch below).
2. Create 100 PVCs using the above storage class, and check the dynamically provisioned volumes.
Result: 51 volumes were provisioned in zone us-central1-a.
51 PVCs were in Bound status, 49 in Pending status.
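
For reference, a minimal sketch of the storage class from step 1 of this scenario, created with oc create -f; the actual class may have set additional parameters such as the disk type:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zones
provisioner: kubernetes.io/gce-pd
parameters:
  zones: us-central1-a,us-central1-b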

Checking one of the pending PVCs:
# oc describe pvc pvcname095
Name:          pvcname095
Namespace:     default
StorageClass:  zones
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner=kubernetes.io/gce-pd
Finalizers:    []
Capacity:      
Access Modes:  
Events:
  Type     Reason              Age              From                         Message
  ----     ------              ----             ----                         -------
  Warning  ProvisioningFailed  3s (x7 over 1m)  persistentvolume-controller  Failed to provision volume with StorageClass "zones": kubernetes does not have a node in zone "us-central1-b"

The error message clearly shows why it failed to provision the volume.


Overall, the result is acceptable from QE's perspective.

Feel free to move back if you do not agree.

Comment 19 errata-xmlrpc 2018-03-28 14:09:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

