Description of problem:
When the GCE cloud provider is set up with "multizone = true" in its cloud config, volumes can be dynamically provisioned in a zone that contains no nodes. The volumes are provisioned via the alpha annotation ("volume.alpha.kubernetes.io/storage-class" under "metadata.annotations"). According to https://github.com/openshift/openshift-docs/blob/master/install_config/persistent_storage/dynamically_provisioning_pvs.adoc, this can be avoided by setting zone labels on the PV, so using a StorageClass avoids the problem. Since both the alpha and beta annotation forms are in use, I think this needs to be highlighted and documented.

Version-Release number of selected component (if applicable):
openshift v3.4.0.29+ca980ba
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

How reproducible:
Always

Steps to Reproduce:
1. Set up an OCP cluster on GCE with the cloud provider enabled
2. Make sure the cloud config has multizone set to true
3. Use a 3.3-style PVC (i.e. without a StorageClass) to dynamically provision a PV

Actual results:
The volume is provisioned in a zone with no nodes, so pods cannot mount it.

Expected results:
The volume should always be provisioned in a zone that has available nodes.

Additional info:
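As a sketch of the StorageClass-based workaround (the class name, PVC name, and zone below are placeholders, and this assumes the gce-pd provisioner in this release accepts a "zone" parameter), provisioning can be pinned to a zone that actually has nodes:

```
{
    "kind": "StorageClass",
    "apiVersion": "storage.k8s.io/v1beta1",
    "metadata": {
        "name": "gce-zone-a"
    },
    "provisioner": "kubernetes.io/gce-pd",
    "parameters": {
        "type": "pd-standard",
        "zone": "us-central1-a"
    }
}
```

```
{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
        "name": "gcec",
        "annotations": {
            "volume.beta.kubernetes.io/storage-class": "gce-zone-a"
        }
    },
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {
            "requests": {
                "storage": "3Gi"
            }
        }
    }
}
```

With the beta annotation pointing at the class, the PV should be created in the listed zone rather than an arbitrary zone in the region chosen by the alpha provisioner.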
I set up a cluster using a Flexy deployment on GCE and tried to reproduce this, but have not managed to reproduce it yet.

It should be noted that the documentation for dynamic provisioning without a StorageClass throws a 404 on the official website: on https://docs.openshift.org/latest/install_config/persistent_storage/persistent_storage_gce.html, clicking the "provisioned dynamically" link takes us to https://docs.openshift.org/latest/install_config/persistent_storage/dynamically_provisioning_pvs.html#install-config-persistent-storage-dynamically-provisioning-pvs, which throws a 404. I don't know if we are going to support dynamic provisioning without StorageClasses going forward. @bchilds, what do you think?

@Jianwei, can you please post your cloud provider config and PVC config?
I used the alpha-version dynamic provisioner, i.e. a PVC with "volume.alpha.kubernetes.io/storage-class" in its annotations:

```
{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
        "name": "gcec",
        "labels": {
            "name": "gce-dynamic"
        },
        "annotations": {
            "volume.alpha.kubernetes.io/storage-class": "foo"
        }
    },
    "spec": {
        "accessModes": [
            "ReadWriteOnce"
        ],
        "resources": {
            "requests": {
                "storage": "3Gi"
            }
        }
    }
}
```

My cloud config is:

```
[Global]
multizone = true
```

The pod failed to schedule due to "NoVolumeZoneConflict":

```
Events:
  FirstSeen  LastSeen  Count  From                 SubobjectPath  Type     Reason            Message
  ---------  --------  -----  ----                 -------------  ----     ------            -------
  7m         24s       28     {default-scheduler }                Warning  FailedScheduling  pod (gce) failed to fit in any node
                                                                           fit failure on node (qe-jhou-node-registry-router-1): NoVolumeZoneConflict
```
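For context on the scheduling failure: NoVolumeZoneConflict compares a node's failure-domain labels against those on the bound PV. The dynamically provisioned PV carried zone labels for a zone without nodes; an illustrative excerpt of such a PV (the name, disk name, and zones here are placeholders, not copied from my cluster) would look like:

```
{
    "kind": "PersistentVolume",
    "apiVersion": "v1",
    "metadata": {
        "name": "pvc-<uid>",
        "labels": {
            "failure-domain.beta.kubernetes.io/region": "us-central1",
            "failure-domain.beta.kubernetes.io/zone": "us-central1-b"
        }
    },
    "spec": {
        "capacity": {
            "storage": "3Gi"
        },
        "accessModes": ["ReadWriteOnce"],
        "gcePersistentDisk": {
            "pdName": "kubernetes-dynamic-pvc-<uid>",
            "fsType": "ext4"
        }
    }
}
```

Since no node carries the zone label "us-central1-b" in this example, every node fails the predicate and the pod cannot be scheduled.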
*** Bug 1400248 has been marked as a duplicate of this bug. ***
I can reproduce this bug reliably. While the documentation can indeed be fixed, I am wondering if it can be fixed in code as well. I am looking at fixing it in code now.
Are all of the nodes in the GCE account in the same "projectID"? The GCE provisioner should only allocate from zones that contain nodes belonging to this cluster. One common failure is to have nodes in the same GCE "projectID" that are not part of a SINGLE kube cluster. I don't know how to set nodes as part of a single projectID in the GCE console. Is that the problem here?
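For reference, a minimal multizone cloud config that also pins the project would look roughly like the following; the project ID is a placeholder, and I am assuming the "project-id" key of the GCE cloud provider's [Global] section:

```
[Global]
# Placeholder project; all nodes of this cluster are expected to live in this GCE project.
project-id = my-gce-project
# Let the cloud provider manage nodes/disks across zones in the region.
multizone = true
```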
The current documentation - https://docs.openshift.org/latest/install_config/persistent_storage/dynamically_provisioning_pvs.html - says the following, which I don't think can be right:

"In multi-zone configurations, PVs must be created in the same region/zone as the master node. Do this by setting the failure-domain.beta.kubernetes.io/region and failure-domain.beta.kubernetes.io/zone PV labels to match the master node."

This can't be right. For dynamically provisioned PVs, the PV is created automatically from the PVC, and setting these labels on the generated PV is meaningless because the underlying disk has already been created in some zone. Setting these labels to the master node's zone is also pointless.
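The label advice does make sense for manually created PVs, where something like the sketch below (the PV name, disk name, and zones are placeholders) keeps the scheduler from landing pods in a zone the disk is not in; for dynamically provisioned PVs the provisioner sets these labels itself:

```
{
    "kind": "PersistentVolume",
    "apiVersion": "v1",
    "metadata": {
        "name": "gce-static-pv",
        "labels": {
            "failure-domain.beta.kubernetes.io/region": "us-central1",
            "failure-domain.beta.kubernetes.io/zone": "us-central1-a"
        }
    },
    "spec": {
        "capacity": {
            "storage": "3Gi"
        },
        "accessModes": ["ReadWriteOnce"],
        "gcePersistentDisk": {
            "pdName": "pre-created-disk",
            "fsType": "ext4"
        }
    }
}
```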
Doc fix has been merged - https://github.com/openshift/openshift-docs/pull/3327/files
The doc looks good from QE's side.