Bug 1454601 - Provision PV in zone other than master failed with error "disk is not found" while disk exists
Summary: Provision PV in zone other than master failed with error "disk is not found" ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: David Eads
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-05-23 07:08 UTC by Liang Xia
Modified: 2017-11-28 21:56 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Fix static PV provisioning in multizone environments.
Reason: The admission plugin that checks the zone during PV provisioning lacked the cloud provider configuration, which could cause PV creation to fail when the PV was created in a zone other than the master's.
Result: After the fix, users can statically provision PVs in zones other than the master's in multizone configurations.
Clone Of:
Environment:
Last Closed: 2017-11-28 21:56:17 UTC
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2017:3188 | normal | SHIPPED_LIVE | Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update | 2017-11-29 02:34:54 UTC

Description Liang Xia 2017-05-23 07:08:02 UTC
Description of problem:
On a multi-zone cluster (with multizone enabled), creating a PV backed by a disk in a zone other than the master's zone fails with the error below, even though the disk exists:
Error from server (Forbidden): error when creating "pv.yaml": persistentvolumes "pv2" is forbidden: error querying GCE PD volume disk1-us-central1-b: disk is not found

Version-Release number of selected component (if applicable):
openshift v3.6.74
kubernetes v1.6.1+5115d708d7
etcd 3.1.0

How reproducible:
Always

Steps to Reproduce:
1. Set up a multi-zone cluster (1 master in us-central1-a, 2 nodes in us-central1-b) on GCE.
2. Create the volumes in us-central1-a, us-central1-b and us-central1-c.
3. Create the PVs with the above volumes.
$ oc create -f pv.yaml
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
   name: "pv2"
spec:
   capacity:
     storage: "1Gi"
   accessModes:
     - "ReadWriteOnce"
   gcePersistentDisk:
     fsType: "ext4"
     pdName: "disk1-us-central1-b"

Actual results:
A PV with a volume in us-central1-a (the same zone as the master) can be created.
A PV with a volume in us-central1-b or us-central1-c can _NOT_ be created. It fails as below:
# oc create -f pv.yaml
Error from server (Forbidden): error when creating "pv.yaml": persistentvolumes "pv2" is forbidden: error querying GCE PD volume disk1-us-central1-b: disk is not found

Expected results:
PV should be created successfully.

Additional info:
The disks really exist.
$ gcloud compute disks list | grep disk
disk1-us-central1-a                                          us-central1-a  1        pd-standard  READY
disk1-us-central1-c                                          us-central1-c  1        pd-standard  READY
disk1-us-central1-b                                          us-central1-b  1        pd-standard  READY

Comment 1 Hemant Kumar 2017-05-23 14:56:48 UTC
Can you verify this with a plain Kubernetes installation and see if it works? I'm wondering if something broke the existing code.

Comment 2 Liang Xia 2017-05-24 01:58:14 UTC
With the same pv.yaml as in comment 0, the PV can be created on k8s with the version below:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-alpha.4.322+95a6f108bda5c4", GitCommit:"95a6f108bda5c4af79e78e4ebf613252c8c0fd5f", GitTreeState:"clean", BuildDate:"2017-05-24T01:31:11Z", GoVersion:"go1.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-alpha.4.322+95a6f108bda5c4", GitCommit:"95a6f108bda5c4af79e78e4ebf613252c8c0fd5f", GitTreeState:"clean", BuildDate:"2017-05-24T01:31:11Z", GoVersion:"go1.8", Compiler:"gc", Platform:"linux/amd64"}

Comment 5 Hemant Kumar 2017-06-01 14:30:06 UTC
OpenShift has a broken admission plugin initializer - https://github.com/openshift/origin/blob/master/pkg/cmd/server/start/start_master.go#L399

We are passing a nil cloudprovider config to the plugin initializer. Also, because the plugin is initialized before the cloudprovider config, the cloudprovider config isn't available for plugin initialization.
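
The root cause described in this comment can be illustrated with a toy model. This is only a sketch, not the OpenShift or Kubernetes code: `CloudConfig`, `managed_zones`, `find_disk_zone`, and the in-memory `DISKS` table are hypothetical names, and the assumption that a plugin without cloud provider config searches only the master's own zone is inferred from the behaviour reported in this bug.

```python
from typing import Dict, List, Optional

# Hypothetical inventory (disk name -> zone), mirroring the disks in this report.
DISKS: Dict[str, str] = {
    "disk1-us-central1-a": "us-central1-a",
    "disk1-us-central1-b": "us-central1-b",
    "disk1-us-central1-c": "us-central1-c",
}

# Zones of the region used in the report.
REGION_ZONES: List[str] = ["us-central1-a", "us-central1-b", "us-central1-c"]


class CloudConfig:
    """Stand-in for the cloud provider config; only a multizone flag is modeled."""

    def __init__(self, multizone: bool) -> None:
        self.multizone = multizone


def managed_zones(local_zone: str, config: Optional[CloudConfig]) -> List[str]:
    """Return the zones the plugin will search.

    Assumption: with no config (None) the plugin only knows its own zone.
    """
    if config is not None and config.multizone:
        return list(REGION_ZONES)
    return [local_zone]


def find_disk_zone(pd_name: str, local_zone: str, config: Optional[CloudConfig]) -> str:
    """Look the disk up in every managed zone; raise if none of them has it."""
    for zone in managed_zones(local_zone, config):
        if DISKS.get(pd_name) == zone:
            return zone
    raise LookupError(f"error querying GCE PD volume {pd_name}: disk is not found")


if __name__ == "__main__":
    master_zone = "us-central1-a"
    # With the config available, the disk in us-central1-b is found.
    print(find_disk_zone("disk1-us-central1-b", master_zone, CloudConfig(multizone=True)))
    # With a nil (None) config, the same lookup fails as in the report.
    try:
        find_disk_zone("disk1-us-central1-b", master_zone, None)
    except LookupError as err:
        print(err)
```

In this model the disk in the master's zone is found either way, which matches the observation that only PVs in us-central1-b and us-central1-c were rejected.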

Comment 7 David Eads 2017-06-01 19:18:35 UTC
Opened https://github.com/openshift/origin/pull/14444 as a possibility, but it's a problematic change that should really fall out as the result of a more in-depth refactor.

Comment 8 Eric Paris 2017-06-02 16:18:22 UTC
Marking this UpcomingRelease. David was not confident in his fix, so let's merge it Monday morning. That will leave us 3 weeks to make sure we don't make things worse.

Comment 9 Hemant Kumar 2017-06-02 16:20:54 UTC
I have added a Trello card for myself to write an e2e test for this - https://trello.com/c/Ul2Ldjjn/508-write-an-e2e-test-for-multizone-fix

Having said that, I did pull David's change and verified it on a real GCE multizone cluster (repeating the same comment I left on the PR).

Comment 10 Liang Xia 2017-06-26 07:41:20 UTC
Verified that this has been fixed on OCP 3.6.121.

Once the bug is ON_QA, we can move it to verified.

Comment 11 Liang Xia 2017-06-27 01:45:43 UTC
Moved the bug to verified.

Comment 15 errata-xmlrpc 2017-11-28 21:56:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

