Bug 1398104 - [GCE] Volume provisioned in wrong zone given multizone is enabled
Summary: [GCE] Volume provisioned in wrong zone given multizone is enabled
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Hemant Kumar
QA Contact: Liang Xia
URL:
Whiteboard:
Duplicates: 1400248
Depends On:
Blocks:
Reported: 2016-11-24 07:08 UTC by Jianwei Hou
Modified: 2016-12-16 18:16 UTC
CC List: 7 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-16 18:16:59 UTC
Target Upstream Version:


Links:
Red Hat Bugzilla 1399418 (medium, CLOSED): [Docs] Dynamic provisioning link is broken in persistent_storage_gce page (last updated 2021-02-22 00:41:40 UTC)

Description Jianwei Hou 2016-11-24 07:08:15 UTC
Description of problem:
With the GCE cloud provider set up and "multizone = true" in its cloud config, volumes can be provisioned in a zone where there are no nodes. The volumes are provisioned using the "volume.alpha.kubernetes.io/storage-class" annotation under "metadata.annotations".

According to https://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/dynamically_provisioning_pvs.adoc, this can be avoided by setting zone labels on the PV, so using a StorageClass avoids the problem. Considering that both the alpha and beta annotation versions are in use, I think this needs to be highlighted and documented. A sketch of the StorageClass approach follows.
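
As a point of comparison, here is a minimal sketch of the StorageClass-based flow on OCP 3.4 / Kubernetes 1.4; the class name, disk type, and zone value below are illustrative placeholders, not values from this cluster. The kubernetes.io/gce-pd provisioner accepts a "zone" parameter that pins the provisioned disk to that zone:

```
{
  "kind": "StorageClass",
  "apiVersion": "storage.k8s.io/v1beta1",
  "metadata": {
    "name": "gce-standard"
  },
  "provisioner": "kubernetes.io/gce-pd",
  "parameters": {
    "type": "pd-standard",
    "zone": "us-central1-a"
  }
}
```

The claim then selects the class via the beta annotation instead of the alpha one:

```
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "gcec-beta",
    "annotations": {
      "volume.beta.kubernetes.io/storage-class": "gce-standard"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "3Gi"
      }
    }
  }
}
```

Pinning the zone in the class keeps the disk in a zone that actually has nodes, so the scheduler's zone check can pass.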

Version-Release number of selected component (if applicable):
openshift v3.4.0.29+ca980ba
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
Always

Steps to Reproduce:
1. Setup OCP cluster on GCE with cloud provider enabled
2. Make sure the cloud config has multizone set to true
3. Use the 3.3 version PVC to dynamically provision a PV (i.e. not using a StorageClass)


Actual results:
The volume is provisioned in a zone that has no nodes, so pods cannot mount it.

Expected results:
Volumes should always be provisioned in zones that have available nodes.


Additional info:

Comment 1 Hemant Kumar 2016-11-28 23:17:36 UTC
I set up a cluster on GCE using a Flexy deployment and tried to reproduce this. I have not managed to reproduce it yet.

It should be noted that the dynamic-provisioning-without-StorageClass document throws a 404 when browsed from the official website: https://docs.openshift.org/latest/install_config/persistent_storage/persistent_storage_gce.html

Clicking the "provisioned dynamically" link on that page takes us to https://docs.openshift.org/latest/install_config/persistent_storage/dynamically_provisioning_pvs.html#install-config-persistent-storage-dynamically-provisioning-pvs, which throws a 404.

I don't know if we are going to support dynamic provisioning without StorageClasses going forward. @bchilds, what do you think?

@Jianwei, can you please post your cloud provider config and PVC config?

Comment 2 Jianwei Hou 2016-11-29 05:59:18 UTC
I used the alpha-version dynamic provisioner, i.e. a PVC with the "volume.alpha.kubernetes.io/storage-class" annotation.

```
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "gcec",
    "labels": {
      "name": "gce-dynamic"
    },
    "annotations": {
      "volume.alpha.kubernetes.io/storage-class": "foo"
    }
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "3Gi"
      }
    }
  }
}
```

My cloud config is:
```
[Global]
multizone = true
```

The pod failed to schedule due to "NoVolumeZoneConflict".

Events:
  FirstSeen  LastSeen  Count  From                  SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                  -------------  -------  ------            -------
  7m         24s       28     {default-scheduler }                 Warning  FailedScheduling  pod (gce) failed to fit in any node
  fit failure on node (qe-jhou-node-registry-router-1): NoVolumeZoneConflict
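
For context, "NoVolumeZoneConflict" fires when a PV's failure-domain labels do not match the labels on any schedulable node. On GCE the cloud provider stamps each node with zone labels roughly like the following (the node name is from the event above; the region and zone values are illustrative, not read from this cluster):

```
{
  "kind": "Node",
  "apiVersion": "v1",
  "metadata": {
    "name": "qe-jhou-node-registry-router-1",
    "labels": {
      "failure-domain.beta.kubernetes.io/region": "us-central1",
      "failure-domain.beta.kubernetes.io/zone": "us-central1-a"
    }
  }
}
```

Here the dynamically provisioned PV evidently carries a zone label for a zone in which the cluster has no nodes, so no node passes the check.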

Comment 3 Bradley Childs 2016-12-01 17:47:47 UTC
*** Bug 1400248 has been marked as a duplicate of this bug. ***

Comment 4 Hemant Kumar 2016-12-01 18:04:35 UTC
I can reproduce this bug with 100% certainty. While the documentation can indeed be fixed, I am wondering whether it can be fixed in code as well.

I am looking at fixing it in code now.

Comment 5 Eric Paris 2016-12-01 21:25:36 UTC
Are all of the nodes in the GCE account in the same "projectID"? The GCE provisioner should only allocate from zones that contain nodes in this cluster. One common failure is to have nodes in the same GCE "projectID" that are not part of a SINGLE kube cluster.

I don't know how to set nodes as part of a single projectID in the GCE console. Is that the problem here?

Comment 10 Hemant Kumar 2016-12-05 18:16:24 UTC
The current documentation (https://docs.openshift.org/latest/install_config/persistent_storage/dynamically_provisioning_pvs.html) says the following, and I think it can't be right:

"In multi-zone configurations, PVs must be created in the same region/zone as the master node. Do this by setting the failure-domain.beta.kubernetes.io/region and failure-domain.beta.kubernetes.io/zone PV labels to match the master node."

This can't be right. Dynamically provisioned PVs are created automatically from the PVC, and setting these labels on the generated PV is meaningless because the PV has already been created in some zone X. Setting these labels to the master node's zone is likewise pointless.
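
Where the labels do make sense is on statically provisioned PVs, where an admin creates the PV by hand against an existing disk. A minimal sketch, with the PV name, disk name, and zone values as hypothetical placeholders:

```
{
  "kind": "PersistentVolume",
  "apiVersion": "v1",
  "metadata": {
    "name": "pv-gce-static",
    "labels": {
      "failure-domain.beta.kubernetes.io/region": "us-central1",
      "failure-domain.beta.kubernetes.io/zone": "us-central1-a"
    }
  },
  "spec": {
    "capacity": {
      "storage": "3Gi"
    },
    "accessModes": [
      "ReadWriteOnce"
    ],
    "gcePersistentDisk": {
      "pdName": "existing-disk-in-us-central1-a",
      "fsType": "ext4"
    }
  }
}
```

With the labels set to the zone the disk actually lives in (not the master's zone), the scheduler will only place pods that use this PV on nodes carrying matching labels.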

Comment 13 Hemant Kumar 2016-12-07 19:13:04 UTC
The doc fix has been merged: https://github.com/openshift/openshift-docs/pull/3327/files

Comment 14 Liang Xia 2016-12-09 09:26:41 UTC
The doc looks good from QE's side.

