Bug 1390160
Summary: Couldn't deploy multizone OCP-3.4 env on GCE

| Field | Value | Field | Value |
| --- | --- | --- | --- |
| Product | OpenShift Container Platform | Reporter | Gaoyun Pei <gpei> |
| Component | Installer | Assignee | Scott Dodson <sdodson> |
| Status | CLOSED ERRATA | QA Contact | Gaoyun Pei <gpei> |
| Severity | high | Docs Contact | |
| Priority | high | CC | agoldste, aos-bugs, decarr, dma, haowang, jialiu, jokerman, lxia, mmccomas, wmeng |
| Version | 3.4.0 | Keywords | TestBlocker |
| Target Milestone | --- | Target Release | --- |
| Hardware | Unspecified | OS | Unspecified |
| Whiteboard | | Story Points | --- |
| Fixed In Version | openshift-ansible-3.4.20-1 | Doc Type | Bug Fix |
| Clone Of | | Environment | |
| Last Closed | 2017-01-18 12:48:03 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |

Doc Text: Previously, openshift-ansible did not configure environments using GCE as multizone clusters, which prevented nodes in other zones from registering with the masters. GCE-based clusters are now multizone enabled, allowing nodes from other zones to register themselves. (A sketch of the inventory setting involved follows below.)
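For context on the Doc Text, the following is a minimal sketch of the inventory setting this bug revolves around. Only the variable openshift_cloudprovider_kind=gce is taken from this report; the [OSEv3:vars] group header and the omission of all other required inventory content are simplifications for illustration.

```ini
# Illustrative inventory fragment only (not a complete, working inventory).
# With the fixed installer (openshift-ansible-3.4.20-1), enabling the GCE
# cloud provider is reported to be enough to get multizone = true written
# into the generated gce.conf.
[OSEv3:vars]
openshift_cloudprovider_kind=gce
```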
Comment 2
Derek Carr
2016-10-31 14:39:04 UTC
Additional logs have been attached; it looks like it's removing nodes that aren't in the same region. Re-assigning to Kubernetes.

Ok, I have a lead. By default, the GCE cloudprovider in kubernetes uses a single-zone mode:

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L2906
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L307

using the getInstanceByName/single-zone endpoint with the zone of the requesting node. In order for the request to be made against all zones (multizone), one must set, in /etc/origin/master/master-config.yaml:

```yaml
apiServerArguments:
  cloud-provider:
  - "gce"
  cloud-config:
  - "/etc/gce.conf"
controllerArguments:
  cloud-provider:
  - "gce"
  cloud-config:
  - "/etc/gce.conf"
```

and in /etc/gce.conf:

```ini
mutlizone = true
```

You can do this on the nodes as well, but they don't make requests for any instance information other than their own, AFAICT. I'm verifying that this works now.

Confirmed. Typo in my last comment: s/mutlizone/multizone/ in the gce.conf. It looks like this:

/etc/gce.conf:

```ini
[Global]
multizone = true
```

It doesn't look like this is supported by the ansible installer. Two ways I'm thinking for the installer to support this:

1) a new openshift_cloudprovider_gce_multizone=true variable
2) always set up the gce.conf with multizone=true if openshift_cloudprovider_kind=gce

Here is a rough PR for option 2: https://github.com/openshift/openshift-ansible/pull/2728

Tested with openshift-ansible-3.4.18-1.git.0.ed7dac0.el7.noarch.rpm. Once the gce cloud provider is enabled by setting openshift_cloudprovider_kind=gce in the ansible inventory, the install fails at "Set cloud provider facts":

```
TASK [openshift_cloud_provider : Set cloud provider facts] *********************
Wednesday 09 November 2016 06:16:11 +0000 (0:00:02.131) 0:04:01.616 ****
fatal: [146.148.52.78]: FAILED! => {"changed": false, "failed": true, "module_stderr": "Shared connection to 146.148.52.78 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py\", line 2293, in <module>\r\n main()\r\n File \"/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py\", line 2274, in main\r\n protected_facts_to_overwrite)\r\n File \"/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py\", line 1730, in __init__\r\n protected_facts_to_overwrite)\r\n File \"/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py\", line 1788, in generate_facts\r\n facts = build_controller_args(facts)\r\n File \"/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py\", line 1113, in build_controller_args\r\n kubelet_args['cloud-config'] = [cloud_cfg_path + '/gce.conf']\r\nNameError: global name 'kubelet_args' is not defined\r\n", "msg": "MODULE FAILURE"}
```

The failure above (comment 11) looks like a bug in the openshift-ansible installer; a new bug needs to be opened to track that issue. Moving this bug back to ON_QA status; will re-test it later using a workaround for comment 11.

Going through comment 9, it seems the fix has landed in the openshift-ansible installer, so ignore comment 12.

Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/5a752437e6e852c514d0698c9241be759c03c6c7
Fix typos in openshift_facts gce cloud provider
Fixes Bug 1390160
Fixes BZ1390160
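For context on the "Fix typos" commit above: the traceback from the failed test run points at build_controller_args() in openshift_facts.py assigning to kubelet_args, a name that is not defined in that function. Below is a minimal, standalone sketch of that failure mode and an assumed correction; it is not the actual openshift-ansible code, and the name controller_args, the facts layout, and the default path are illustrative assumptions.

```python
# Minimal standalone sketch (not the actual openshift-ansible module) of the
# NameError shown in the traceback above and the kind of one-word typo fix
# the referenced commit describes.
def build_controller_args(facts, cloud_cfg_path='/etc/origin/cloudprovider'):
    controller_args = {}
    if facts.get('cloudprovider', {}).get('kind') == 'gce':
        controller_args['cloud-provider'] = ['gce']
        # Buggy line from the traceback -- 'kubelet_args' does not exist in
        # this function, so the "Set cloud provider facts" task dies:
        #   kubelet_args['cloud-config'] = [cloud_cfg_path + '/gce.conf']
        # Assumed correction: write into the dict this function actually builds.
        controller_args['cloud-config'] = [cloud_cfg_path + '/gce.conf']
    facts.setdefault('master', {})['controller_args'] = controller_args
    return facts

if __name__ == '__main__':
    facts = build_controller_args({'cloudprovider': {'kind': 'gce'}})
    print(facts['master']['controller_args'])
    # {'cloud-provider': ['gce'], 'cloud-config': ['/etc/origin/cloudprovider/gce.conf']}
```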
Set up an OCP-3.4 cluster with 2 masters + 3 etcd + 6 nodes using openshift-ansible-3.4.20-1.git.0.2031d1e.el7.noarch.rpm; the master, etcd, and node groups are all located across zones.

Set openshift_cloudprovider_kind=gce in the ansible inventory. After installation, all nodes are available on both masters, and everything keeps working when atomic-openshift-master-controllers switches over between masters. The cloud provider configuration is correct in master-config.yaml, node-config.yaml and /etc/origin/cloudprovider/gce.conf:

/etc/origin/master/master-config.yaml:

```yaml
...
apiServerArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
controllerArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
```

/etc/origin/node/node-config.yaml:

```yaml
...
kubeletArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
```

/etc/origin/cloudprovider/gce.conf:

```ini
[Global]
multizone = true
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066
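As an optional spot check of the multizone behavior described in the verification comment above (this command is not part of the bug report, and the zone label name assumes a Kubernetes 1.4-era cluster), one could list the registered nodes together with their zone label:

```sh
# Hypothetical check: with multizone enabled, nodes from every zone should
# register against the masters and report Ready.
oc get nodes -L failure-domain.beta.kubernetes.io/zone
```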