
Bug 1371872

Summary:	[doc]The multizone setting in gce.conf is ignored by PersistentVolumeLabel
Product:	OpenShift Container Platform	Reporter:	Weihua Meng <wmeng>
Component:	Storage	Assignee:	Hemant Kumar <hekumar>
Status:	CLOSED ERRATA	QA Contact:	Liang Xia <lxia>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.3.0	CC:	agoldste, aos-bugs, bchilds, decarr, dma, eparis, ghuang, hekumar, jhou, jokerman, jsafrane, lxia, mmccomas, tdawson
Target Milestone:	---
Target Release:	3.4.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:	Bug doesn't require a documentation update.	Story Points:	---
Clone Of:
Clones:	1372702		Environment:
Last Closed:	2017-05-18 09:26:51 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1372702

Description Weihua Meng 2016-08-31 10:18:00 UTC

Description of problem:
On a GCE cluster where the master and the nodes are in different zones, creating a PV in the same zone as the nodes always fails.

Version-Release number of selected component (if applicable):
openshift v3.3.0.27
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster on GCE with the master and the nodes in different zones:
# oc get nodes --show-labels
NAME                                    STATUS    AGE       LABELS
qe-wmeng-zones-master-nfs-1             Ready     2h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-wmeng-zones-master-nfs-1,role=node
qe-wmeng-zones-node-registry-router-1   Ready     2h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=qe-wmeng-zones-node-registry-router-1,registry=enabled,role=node,router=enabled
qe-wmeng-zones-node-registry-router-2   Ready     2h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=qe-wmeng-zones-node-registry-router-2,registry=enabled,role=node,router=enabled
2. Create a PV in the same zone as the nodes (zone=us-central1-b):
$ oc create -f pv.yaml
Error from server: error when creating "gce-pv.yaml": persistentvolumes "pv1b" is forbidden: error querying GCE PD volume wmeng-us-central1b: GCE persistent disk not found: "wmeng-us-central1b"
# cat pv.yaml
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "pv1b" 
spec:
  capacity:
    storage: "12Gi" 
  accessModes:
    - "ReadWriteOnce"
  gcePersistentDisk: 
    fsType: "ext4" 
    pdName: "wmeng-us-central1b"

Actual results:
2. PV creation fails:
$ oc create -f pv.yaml
Error from server: error when creating "gce-pv.yaml": persistentvolumes "pv1b" is forbidden: error querying GCE PD volume wmeng-us-central1b: GCE persistent disk not found: "wmeng-us-central1b"

Expected results:
2. The PV is created successfully.

Additional info:
The PV is created successfully if its disk is in the same zone as the master:
# oc get pv --show-labels
NAME                                       CAPACITY   ACCESSMODES   STATUS      CLAIM                 REASON    AGE       LABELS
pv1a                                       11Gi       RWO           Available                                   2h        failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a

Comment 1 Andy Goldstein 2016-08-31 13:35:22 UTC

Can you try adding "multizone = true" to your gce.conf file?

Comment 2 Andy Goldstein 2016-08-31 13:35:41 UTC

And restarting your master
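For reference, a minimal sketch of the setting being suggested: the GCE cloud config is an INI-style file, and "multizone" belongs in its global section. The Python below parses such a file and reports whether multizone is on, treating a missing config file as "multizone = false". The section name "[Global]", the case-insensitive section match, and the helper name "multizone_enabled" are assumptions for illustration, not OpenShift or Kubernetes code.

```python
import configparser
from typing import Optional


def multizone_enabled(config_text: Optional[str]) -> bool:
    """Return True if an INI-style gce.conf enables multizone.

    Hypothetical helper. No config at all (None) is treated as
    multizone = false, i.e. only the master's own zone is used.
    """
    if config_text is None:
        return False
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Assumption: the global section may be spelled [Global] or [global],
    # so match its name case-insensitively.
    for section in parser.sections():
        if section.lower() == "global":
            return parser.getboolean(section, "multizone", fallback=False)
    return False


gce_conf = """
[Global]
multizone = true
"""

print(multizone_enabled(gce_conf))
print(multizone_enabled(None))
```

The master has to be restarted after editing the file because the config is only read when the cloud provider is created.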

Comment 4 Andy Goldstein 2016-08-31 15:52:53 UTC

This is failing because of https://github.com/kubernetes/kubernetes/issues/27656. There is a workaround that I'm trying now.

Comment 6 Weihua Meng 2016-09-01 07:53:32 UTC

Thanks, Andy.
The workaround works.
On GCE, when creating a PV in a zone different from the OpenShift master's zone, the zone should be specified explicitly (this is not needed on AWS).
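The workaround amounts to setting the zone and region labels on the PV yourself, as in the "pv2" manifest in comment 23. A minimal Python sketch that builds such a manifest follows, reusing the disk name and size from the pv.yaml in the description. It assumes the region is the zone name with its trailing "-<suffix>" removed (true for zones like "us-central1-b"); the helper names "region_from_zone" and "labeled_gce_pv" are hypothetical.

```python
import json

ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"
REGION_LABEL = "failure-domain.beta.kubernetes.io/region"


def region_from_zone(zone: str) -> str:
    """Assumption: a GCE zone is named '<region>-<suffix>', e.g. us-central1-b."""
    region, sep, _suffix = zone.rpartition("-")
    if not sep or not region:
        raise ValueError(f"unexpected zone name: {zone!r}")
    return region


def labeled_gce_pv(name: str, pd_name: str, zone: str, size: str) -> dict:
    """Build a PersistentVolume manifest with explicit zone/region labels."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {
            "name": name,
            "labels": {
                REGION_LABEL: region_from_zone(zone),
                ZONE_LABEL: zone,
            },
        },
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "gcePersistentDisk": {"pdName": pd_name, "fsType": "ext4"},
        },
    }


pv = labeled_gce_pv("pv1b", "wmeng-us-central1b", "us-central1-b", "12Gi")
print(json.dumps(pv, indent=2))
```

The printed JSON can be saved to a file and passed to "oc create -f", just like pv1.json in comment 23.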

Comment 8 Eric Paris 2016-09-01 13:36:52 UTC

Brad, can you make sure the docs include this workaround information? We'll try to dig deeper in the upcoming release.

Comment 9 Andy Goldstein 2016-09-01 13:45:56 UTC

This is really a Kube admission wiring bug. It is probably better to reassign it to CI. The earliest we could get a fix in is 3.5.

Analysis:

In https://github.com/kubernetes/kubernetes/blob/ef0c9f0c5b8efbba948a0be2c98d9d2e32e0b68c/plugin/pkg/admission/persistentvolume/label/admission.go#L180, it creates the GCE cloud provider without specifying a config file. Because of this, the admission plugin doesn't know that the admin wants multizone=true.

We currently have a couple of places in the master config where it's possible to specify the cloud provider and cloud config file. Unfortunately, due to the way that admission plugins are currently initialized, they can't reuse these same settings, so we'll have to use a different mechanism (and potentially create yet another place where you have to specify the cloud provider and cloud config file).
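To make the failure mode concrete, here is a simplified Python model of the disk lookup the label plugin performs. It assumes, consistent with the analysis above and the "not found in managed zones (us-central1-a)" error in comment 23, that without "multizone = true" only the master's zone is searched, while with it every zone in the region is searched. It also models the workaround: a zone label on the PV points the lookup at that zone directly. The function names and the disk inventory are hypothetical; this is not the actual Kubernetes code.

```python
from typing import Dict, List, Optional, Set

# Hypothetical inventory for this bug: the PD exists only in us-central1-b.
DISKS_BY_ZONE: Dict[str, Set[str]] = {"us-central1-b": {"wmeng-us-central1b"}}


def managed_zones(master_zone: str, region_zones: List[str], multizone: bool) -> List[str]:
    """Zones searched for disks: the whole region only when multizone is on."""
    return list(region_zones) if multizone else [master_zone]


def find_disk_zone(pd_name: str, zones: List[str], zone_hint: Optional[str] = None) -> str:
    """Return the zone holding pd_name.

    A zone hint (taken from the PV's zone label) restricts the search to
    that single zone instead of the managed zones.
    """
    search = [zone_hint] if zone_hint else zones
    for zone in search:
        if pd_name in DISKS_BY_ZONE.get(zone, set()):
            return zone
    raise LookupError(f"disk {pd_name!r} not found in zones {search}")


region_zones = ["us-central1-a", "us-central1-b"]
no_config = managed_zones("us-central1-a", region_zones, multizone=False)
with_multizone = managed_zones("us-central1-a", region_zones, multizone=True)

# 1. Plugin built without gce.conf: only the master's zone is searched.
try:
    find_disk_zone("wmeng-us-central1b", no_config)
except LookupError as err:
    print("rejected:", err)

# 2. multizone = true reaches the plugin: the whole region is searched.
print("found in", find_disk_zone("wmeng-us-central1b", with_multizone))

# 3. Workaround: the PV carries a zone label, so that zone is used directly.
print("found in", find_disk_zone("wmeng-us-central1b", no_config, zone_hint="us-central1-b"))
```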

Comment 12 Paul Morie 2016-10-26 14:15:29 UTC

Looks like this is still a bug.

Comment 13 Paul Morie 2016-10-26 14:17:59 UTC

Looking a little further - this affects AWS and GCE, both of which the admission controller initializes without passing a config.

Comment 14 Derek Carr 2017-01-18 15:59:55 UTC

Moving to the Storage component.

Comment 15 Derek Carr 2017-01-18 16:13:03 UTC

The PersistentVolumeLabel admission controller could fix this problem in one of two ways.

Option 1: use per admission control configuration file
https://github.com/kubernetes/kubernetes/pull/39109

Option 2: extend admission control with a "WantsCloudConfig" interface
This is probably preferred, and it is a well-established pattern.
A similar PR that does the same for "WantsToRun" is available for reference:
https://github.com/kubernetes/kubernetes/pull/37148

I would prefer Option 2 as I suspect the cloud configuration is generally desired across multiple plug-ins and users would not want to configure the same information in multiple places.
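Option 2 follows the usual "Wants*" initializer pattern: the apiserver reads the cloud config once, and a plugin initializer hands it to every admission plugin that declares it wants it. A Python sketch of that shape follows, using a runtime-checkable Protocol in place of a Go interface type assertion. Only the "WantsCloudConfig" name comes from this discussion; "set_cloud_config", "PluginInitializer", and the plugin classes are illustrative assumptions, not the upstream API.

```python
from typing import List, Optional, Protocol, runtime_checkable


@runtime_checkable
class WantsCloudConfig(Protocol):
    """Plugins implementing this receive the cloud config file contents."""

    def set_cloud_config(self, cloud_config: Optional[bytes]) -> None: ...


class PersistentVolumeLabel:
    """Illustrative stand-in for the label admission plugin."""

    def __init__(self) -> None:
        self.cloud_config: Optional[bytes] = None

    def set_cloud_config(self, cloud_config: Optional[bytes]) -> None:
        # With the config in hand, the plugin could build its cloud provider
        # with the admin's settings (e.g. multizone) instead of defaults.
        self.cloud_config = cloud_config


class OtherPlugin:
    """A plugin that does not care about cloud configuration."""


class PluginInitializer:
    """Holds shared settings once and injects them into plugins that want them."""

    def __init__(self, cloud_config: Optional[bytes]) -> None:
        self.cloud_config = cloud_config

    def initialize(self, plugins: List[object]) -> None:
        for plugin in plugins:
            # Python analogue of Go's `if wants, ok := plugin.(WantsCloudConfig); ok`.
            if isinstance(plugin, WantsCloudConfig):
                plugin.set_cloud_config(self.cloud_config)


label_plugin = PersistentVolumeLabel()
other = OtherPlugin()
PluginInitializer(b"[Global]\nmultizone = true\n").initialize([label_plugin, other])
print(label_plugin.cloud_config)
```

The cloud config is specified in one place and each plugin opts in by implementing a single method, which is the property that makes this option attractive when several plugins need the same settings.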

Comment 16 Hemant Kumar 2017-01-26 15:47:56 UTC

Yeah, I like option 2 as well. I am halfway there. The only minor problem I see is that, as plugins require more of these "special things", the NewPluginInitializer constructor (https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-apiserver/app/server.go#L277) will get crowded. Part of me thinks the PluginInitializer being an interface is the problem: it doesn't do much as an interface and could easily be replaced with a concrete implementation, which would remove this construction pain.

Comment 17 Hemant Kumar 2017-01-26 18:50:09 UTC

Opened a PR to fix this upstream: https://github.com/kubernetes/kubernetes/pull/40537. It is ready for review, but I am still testing it on GCE.

Comment 23 Liang Xia 2017-02-14 03:12:49 UTC

Verified that the workaround works. Moving the bug to VERIFIED; the code fix can be tracked in Trello.

# openshift version
openshift v3.5.0.20+87266c6
kubernetes v1.5.2+43a9be4
etcd 3.1.0

# oc create -f pv1.json 
Error from server (Forbidden): error when creating "pv1.json": persistentvolumes "pv1" is forbidden: error querying GCE PD volume gcepd: GCE persistent disk "gcepd" not found in managed zones (us-central1-a)

# oc create -f pv2.json
persistentvolume "pv2" created
# oc get pv
NAME           CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM                 REASON    AGE
pv2            5Gi        RWO           Retain          Available                                   3s

# diff pv1.json pv2.json 
1,17c1,15
< {
<   "apiVersion": "v1",
<   "kind": "PersistentVolume",
<   "metadata": {
<     "name": "pv1"
<   },
<   "spec": {
<     "capacity": {
<         "storage": "5Gi"
<     },
<     "accessModes": [ "ReadWriteOnce" ],
<     "gcePersistentDisk": {
<         "pdName": "gcepd",
<         "fsType": "ext4"
<     }
<   }
< }
---
> apiVersion: "v1"
> kind: "PersistentVolume"
> metadata:
>   name: "pv2"
>   labels:
>     failure-domain.beta.kubernetes.io/region: "us-central1"
>     failure-domain.beta.kubernetes.io/zone: "us-central1-b"
> spec:
>   capacity:
>     storage: "5Gi"
>   accessModes:
>     - "ReadWriteOnce"
>   gcePersistentDisk:
>     fsType: "ext4"
>     pdName: "gcepd"


$ gcloud compute disks list gcepd
NAME   ZONE           SIZE_GB  TYPE         STATUS
gcepd  us-central1-b  5        pd-standard  READY

Comment 24 Troy Dawson 2017-04-20 22:11:26 UTC

Why is this bug marked with a Target Release of 3.4.1 when the verification was done on 3.5.0?
Can we move it to a target of 3.5.1?

Comment 25 Liang Xia 2017-05-12 02:43:25 UTC

Hi Hemant,

Could you address the concern raised in comment 24?

Thanks

Comment 27 Hemant Kumar 2017-05-17 14:04:36 UTC

I am not sure why this was marked with a target of 3.4.1. This bug only fixes the documentation to describe the proper workaround for the problem in both 3.4 and 3.5, so the documentation fix applies to both versions.

Also, this bug has been fixed for good in 3.6 (https://trello.com/c/MskIyxux/422-enable-cloudprovider-configuration-in-admission-controller), and the documented workaround is not needed with 3.6.

Comment 28 errata-xmlrpc 2017-05-18 09:26:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1235