Bug 1371872 - [doc]The multizone setting in gce.conf is ignored by PersistentVolumeLabel
Summary: [doc]The multizone setting in gce.conf is ignored by PersistentVolumeLabel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assignee: Hemant Kumar
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks: 1372702
 
Reported: 2016-08-31 10:18 UTC by Weihua Meng
Modified: 2017-05-18 09:26 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Bug doesn't require a documentation update.
Clone Of:
: 1372702 (view as bug list)
Environment:
Last Closed: 2017-05-18 09:26:51 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:1235 0 normal SHIPPED_LIVE OpenShift Container Platform 3.5, 3.4, 3.3, and 3.1 bug fix update 2017-05-18 13:15:52 UTC

Description Weihua Meng 2016-08-31 10:18:00 UTC
Description of problem:
There is a cluster on GCE whose master and node are in different zones. Creating a PV for a disk in the same zone as the node always fails.

Version-Release number of selected component (if applicable):
openshift v3.3.0.27
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
Always

Steps to Reproduce:
1. Create a cluster on GCE whose node and master are in different zones.
# oc get nodes --show-labels
NAME                                    STATUS    AGE       LABELS
qe-wmeng-zones-master-nfs-1             Ready     2h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a,kubernetes.io/hostname=qe-wmeng-zones-master-nfs-1,role=node
qe-wmeng-zones-node-registry-router-1   Ready     2h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=qe-wmeng-zones-node-registry-router-1,registry=enabled,role=node,router=enabled
qe-wmeng-zones-node-registry-router-2   Ready     2h        beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=n1-standard-1,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-b,kubernetes.io/hostname=qe-wmeng-zones-node-registry-router-2,registry=enabled,role=node,router=enabled
2. Create a PV in the same zone as the node (zone=us-central1-b).
$ oc create -f pv.yaml
Error from server: error when creating "gce-pv.yaml": persistentvolumes "pv1b" is forbidden: error querying GCE PD volume wmeng-us-central1b: GCE persistent disk not found: "wmeng-us-central1b"
# cat pv.yaml
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: "pv1b" 
spec:
  capacity:
    storage: "12Gi" 
  accessModes:
    - "ReadWriteOnce"
  gcePersistentDisk: 
    fsType: "ext4" 
    pdName: "wmeng-us-central1b"

Actual results:
2. Creating the PV failed.
$ oc create -f pv.yaml
Error from server: error when creating "gce-pv.yaml": persistentvolumes "pv1b" is forbidden: error querying GCE PD volume wmeng-us-central1b: GCE persistent disk not found: "wmeng-us-central1b"

Expected results:
2. The PV is created successfully.

Additional info:
The PV is created if it is in the same zone as the master.
# oc get pv --show-labels
NAME                                       CAPACITY   ACCESSMODES   STATUS      CLAIM                 REASON    AGE       LABELS
pv1a                                       11Gi       RWO           Available                                   2h        failure-domain.beta.kubernetes.io/region=us-central1,failure-domain.beta.kubernetes.io/zone=us-central1-a

Comment 1 Andy Goldstein 2016-08-31 13:35:22 UTC
Can you try adding "multizone = true" to your gce.conf file?

Comment 2 Andy Goldstein 2016-08-31 13:35:41 UTC
And restarting your master

Comment 4 Andy Goldstein 2016-08-31 15:52:53 UTC
This is failing because of https://github.com/kubernetes/kubernetes/issues/27656. There is a workaround that I'm trying now.

Comment 6 Weihua Meng 2016-09-01 07:53:32 UTC
Thanks, Andy.
The workaround works.
On GCE, when creating a PV in a zone different from the OpenShift master's zone, the zone information must be specified explicitly. (On AWS, there is no need to do that.)

Comment 8 Eric Paris 2016-09-01 13:36:52 UTC
Brad, can you make sure the docs include this workaround information? We'll try to dig deeper in the upcoming release.

Comment 9 Andy Goldstein 2016-09-01 13:45:56 UTC
This is really a Kube admission wiring bug. Probably better to reassign to CI. Earliest we could get this is 3.5.

Analysis:

In https://github.com/kubernetes/kubernetes/blob/ef0c9f0c5b8efbba948a0be2c98d9d2e32e0b68c/plugin/pkg/admission/persistentvolume/label/admission.go#L180, it creates the GCE cloud provider without specifying a config file. Because of this, the admission plugin doesn't know that the admin wants multizone=true.

We currently have a couple of places in the master config where it's possible to specify the cloud provider and cloud config file. Unfortunately, due to the way that admission plugins are currently initialized, they can't reuse these same settings, so we'll have to use a different mechanism (and potentially create yet another place where you have to specify the cloud provider and cloud config file).

Comment 12 Paul Morie 2016-10-26 14:15:29 UTC
Looks like this is still a bug.

Comment 13 Paul Morie 2016-10-26 14:17:59 UTC
Looking a little further - this affects AWS and GCE, both of which the admission controller initializes without passing a config.

Comment 14 Derek Carr 2017-01-18 15:59:55 UTC
moving to storage component.

Comment 15 Derek Carr 2017-01-18 16:13:03 UTC
The PersistentVolumeLabel admission controller could fix this problem in one of two ways.

Option 1: use a per-admission-control configuration file
https://github.com/kubernetes/kubernetes/pull/39109

Option 2: extend admission control with a "WantsCloudConfig" interface
this is probably preferred, and is a well-established pattern.
a similar PR that is doing the same for "WantsToRun" is available here for reference:
https://github.com/kubernetes/kubernetes/pull/37148

I would prefer Option 2 as I suspect the cloud configuration is generally desired across multiple plug-ins and users would not want to configure the same information in multiple places.
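
A rough sketch of what Option 2 could look like, in self-contained Go. This is an illustration of the pattern only, not the upstream implementation: the interface and type names (`wantsCloudConfig`, `pluginInitializer`, `pvLabel`, `alwaysAdmit`) and the `SetCloudConfig` method are assumptions made for this example.

```go
package main

import "fmt"

// admissionPlugin is a hypothetical minimal plugin interface.
type admissionPlugin interface {
	Name() string
}

// wantsCloudConfig is implemented only by plugins that need the cloud
// provider configuration (hypothetical, modeled on the "Wants..." pattern).
type wantsCloudConfig interface {
	SetCloudConfig(cfg []byte)
}

// pluginInitializer holds shared settings once and hands them to every
// plugin that asks for them, so admins configure them in a single place.
type pluginInitializer struct {
	cloudConfig []byte
}

// Initialize injects the cloud config into p if p opts in via the interface.
func (i pluginInitializer) Initialize(p admissionPlugin) {
	if w, ok := p.(wantsCloudConfig); ok {
		w.SetCloudConfig(i.cloudConfig)
	}
}

// pvLabel stands in for a plugin that needs the cloud config.
type pvLabel struct {
	cloudConfig []byte
}

func (p *pvLabel) Name() string              { return "PersistentVolumeLabel" }
func (p *pvLabel) SetCloudConfig(cfg []byte) { p.cloudConfig = cfg }

// alwaysAdmit stands in for a plugin that does not need the cloud config.
type alwaysAdmit struct{}

func (alwaysAdmit) Name() string { return "AlwaysAdmit" }

func main() {
	initializer := pluginInitializer{cloudConfig: []byte("[Global]\nmultizone = true\n")}

	label := &pvLabel{}
	plugins := []admissionPlugin{label, alwaysAdmit{}}
	for _, p := range plugins {
		initializer.Initialize(p)
	}

	fmt.Printf("%s received %d bytes of cloud config\n", label.Name(), len(label.cloudConfig))
}
```

The point of the type assertion in `Initialize` is that plugins that do not implement the interface are left untouched, so the cloud configuration is supplied once and reused by whichever plug-ins want it.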

Comment 16 Hemant Kumar 2017-01-26 15:47:56 UTC
Yeah, I like option #2 as well. I am halfway there. The only minor problem I see is that, as plugins require more such "special things", the constructor NewPluginInitializer (https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-apiserver/app/server.go#L277) will get 'crowded'. Part of me thinks that PluginInitializer being an interface is a problem: it doesn't do much as an interface and could easily be replaced with a concrete implementation, thereby removing this construction pain.

Comment 17 Hemant Kumar 2017-01-26 18:50:09 UTC
Opened a PR to fix this upstream: https://github.com/kubernetes/kubernetes/pull/40537. It is ready for review, but I am still testing it on GCE.

Comment 23 Liang Xia 2017-02-14 03:12:49 UTC
Verified that the workaround works. Moving the bug to VERIFIED, as we can track the code fix in Trello.

# openshift version
openshift v3.5.0.20+87266c6
kubernetes v1.5.2+43a9be4
etcd 3.1.0

# oc create -f pv1.json 
Error from server (Forbidden): error when creating "pv1.json": persistentvolumes "pv1" is forbidden: error querying GCE PD volume gcepd: GCE persistent disk "gcepd" not found in managed zones (us-central1-a)

# oc create -f pv2.json
persistentvolume "pv2" created
# oc get pv
NAME           CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM                 REASON    AGE
pv2         5Gi        RWO           Retain          Available                                   3s

# diff pv1.json pv2.json 
1,17c1,15
< {
<   "apiVersion": "v1",
<   "kind": "PersistentVolume",
<   "metadata": {
<     "name": "pv1"
<   },
<   "spec": {
<     "capacity": {
<         "storage": "5Gi"
<     },
<     "accessModes": [ "ReadWriteOnce" ],
<     "gcePersistentDisk": {
<         "pdName": "gcepd",
<         "fsType": "ext4"
<     }
<   }
< }
---
> apiVersion: "v1"
> kind: "PersistentVolume"
> metadata:
>   name: "pv2"
>   labels:
>     failure-domain.beta.kubernetes.io/region: "us-central1"
>     failure-domain.beta.kubernetes.io/zone: "us-central1-b"
> spec:
>   capacity:
>     storage: "5Gi"
>   accessModes:
>     - "ReadWriteOnce"
>   gcePersistentDisk:
>     fsType: "ext4"
>     pdName: "gcepd"


$ gcloud compute disks list gcepd
NAME   ZONE           SIZE_GB  TYPE         STATUS
gcepd  us-central1-b  5        pd-standard  READY

Comment 24 Troy Dawson 2017-04-20 22:11:26 UTC
Why is this bug marked with a Target Release of 3.4.1 when the verification was for 3.5.0?
Can we move this to a target of 3.5.1?

Comment 25 Liang Xia 2017-05-12 02:43:25 UTC
Hi Hemant,

Could you address the concern in comment 24?

Thanks

Comment 27 Hemant Kumar 2017-05-17 14:04:36 UTC
I am not sure why this was marked with a target of 3.4.1. This bug only fixes the documentation so that it describes the proper workaround for the problem in both 3.4 and 3.5. So the documentation fix applies to both versions (3.4 and 3.5).

Also, this bug has been fixed for good in 3.6 (https://trello.com/c/MskIyxux/422-enable-cloudprovider-configuration-in-admission-controller), and the documented workaround is not needed with 3.6.

Comment 28 errata-xmlrpc 2017-05-18 09:26:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1235

