The log on the attached node appears different from what is shown in the bug report:

Oct 31 06:29:44 gpei-34-gce-private-777-node-zone1-primary-1 atomic-openshift-node[54232]: I1031 06:29:44.480564   54232 kubelet_node_status.go:67] Successfully registered node gpei-34-gce-private-777-node-zone1-primary-1

According to that message, the node actually registered successfully. The logs also do not go beyond 06:29. Is the proper log attached? I am moving this to the install component to see if they can reproduce in the interim until updated logs are supplied.
Additional logs have been attached; it looks like it's removing nodes that aren't in the same region. Reassigning to Kubernetes.
OK, I have a lead. By default, the GCE cloud provider in Kubernetes uses a single-zone mode:

https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L2906
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L307

It uses the getInstanceByName/single-zone endpoint with the zone of the requesting node. In order for the request to be made against all zones (multizone), one must configure:

/etc/origin/master/master-config.yaml:

apiServerArguments:
  cloud-provider:
  - "gce"
  cloud-config:
  - "/etc/gce.conf"
controllerArguments:
  cloud-provider:
  - "gce"
  cloud-config:
  - "/etc/gce.conf"

and in /etc/gce.conf:

mutlizone = true

You can do this on the nodes as well, but they don't make requests for any instance information other than their own, AFAICT. I'm verifying that this works now.
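The zone-scoped lookup described above is why nodes outside the requesting master's zone drop out of the cluster. A tiny Python sketch of the behavior (the data and function shapes here are illustrative only, not the real Go implementation):

```python
# Illustrative only: models single-zone vs. multizone instance lookup.
# Zone names and node names are invented for the example.
INSTANCES = {
    "us-central1-a": {"node-a"},
    "us-central1-b": {"node-b"},
}

def find_instance(name, local_zone, multizone=False):
    """Return the zone an instance is found in, or None if 'not found'."""
    zones = INSTANCES if multizone else [local_zone]
    for zone in zones:
        if name in INSTANCES[zone]:
            return zone
    # A None result is what causes the controller to remove the node.
    return None

print(find_instance("node-b", "us-central1-a"))                   # None
print(find_instance("node-b", "us-central1-a", multizone=True))   # us-central1-b
```

In single-zone mode, node-b simply does not exist from us-central1-a's point of view, matching the "removing nodes that aren't in the same region" symptom.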
Confirmed. Typo in my last comment: s/mutlizone/multizone/ in the gce.conf. It looks like this:

/etc/gce.conf:

[Global]
multizone = true

It doesn't look like this is supported by the Ansible installer. Two ways I can think of for the installer to support this:

1) a new openshift_cloudprovider_gce_multizone=true variable
2) always set up the gce.conf with multizone=true if openshift_cloudprovider_kind=gce

Here is a rough PR for option 2: https://github.com/openshift/openshift-ansible/pull/2728
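The gce.conf shown is plain INI. As a quick sanity check of the flag's spelling and value, Python's stdlib configparser can read the same shape (an approximation only; the real provider parses it with Go's gcfg):

```python
import configparser

# The [Global] section with multizone enabled, as in the comment above.
GCE_CONF = "[Global]\nmultizone = true\n"

cfg = configparser.ConfigParser()
cfg.read_string(GCE_CONF)

# getboolean understands "true"/"false" spellings.
print(cfg.getboolean("Global", "multizone"))  # True
```

A misspelled key such as "mutlizone" would raise configparser.NoOptionError here, which makes this kind of check useful before restarting the masters.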
Tested with openshift-ansible-3.4.18-1.git.0.ed7dac0.el7.noarch.rpm. Once the gce cloud provider is enabled by setting openshift_cloudprovider_kind=gce in the Ansible inventory, it fails at the "Set cloud provider facts" task:

TASK [openshift_cloud_provider : Set cloud provider facts] *********************
Wednesday 09 November 2016  06:16:11 +0000 (0:00:02.131)       0:04:01.616 ****
fatal: [146.148.52.78]: FAILED! => MODULE FAILURE
module_stderr: Shared connection to 146.148.52.78 closed.
module_stdout:
Traceback (most recent call last):
  File "/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py", line 2293, in <module>
    main()
  File "/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py", line 2274, in main
    protected_facts_to_overwrite)
  File "/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py", line 1730, in __init__
    protected_facts_to_overwrite)
  File "/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py", line 1788, in generate_facts
    facts = build_controller_args(facts)
  File "/tmp/ansible_xGtwL_/ansible_module_openshift_facts.py", line 1113, in build_controller_args
    kubelet_args['cloud-config'] = [cloud_cfg_path + '/gce.conf']
NameError: global name 'kubelet_args' is not defined
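The traceback boils down to a variable-name mixup: build_controller_args assigns into kubelet_args, a name that exists only in the kubelet counterpart of that code. A minimal, simplified reproduction (only the names come from the traceback; the surrounding logic is invented for illustration):

```python
def build_controller_args_buggy(cloud_cfg_path):
    controller_args = {}
    # Bug from the traceback: wrong dict name, never defined in this scope.
    kubelet_args['cloud-config'] = [cloud_cfg_path + '/gce.conf']
    return controller_args

def build_controller_args_fixed(cloud_cfg_path):
    controller_args = {}
    # Corrected: assign into this function's own dict.
    controller_args['cloud-config'] = [cloud_cfg_path + '/gce.conf']
    return controller_args

try:
    build_controller_args_buggy('/etc/origin/cloudprovider')
except NameError:
    print("NameError, as in the traceback")

print(build_controller_args_fixed('/etc/origin/cloudprovider'))
```

This matches the one-word fix later referenced in the openshift-ansible commit for this bug.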
It seems comment 11 is a bug in the openshift-ansible installer; a new bug needs to be opened to track that issue. Moving this bug back to ON_QA status; will re-test it later using a workaround for comment 11.
Going through comment 9, it seems the fix has landed in the openshift-ansible installer, so ignore comment 12.
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/5a752437e6e852c514d0698c9241be759c03c6c7

Fix typos in openshift_facts gce cloud provider
Fixes Bug 1390160
Fixes BZ1390160
Set up an OCP 3.4 cluster with 2 masters + 3 etcd + 6 nodes using openshift-ansible-3.4.20-1.git.0.2031d1e.el7.noarch.rpm; the master/etcd/node groups are all spread across zones. Set openshift_cloudprovider_kind=gce in the Ansible inventory. After installation, all nodes are available on both masters, and everything also works well when atomic-openshift-master-controllers switches over between masters. The cloud provider configuration is correct in master-config.yaml, node-config.yaml and /etc/origin/cloudprovider/gce.conf:

/etc/origin/master/master-config.yaml:
...
apiServerArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
controllerArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce

/etc/origin/node/node-config.yaml:
...
kubeletArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce

/etc/origin/cloudprovider/gce.conf:
[Global]
multizone = true
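The verification above amounts to three files agreeing with each other. A small stdlib-only sketch of that consistency check, using plain substring matching on the fragments (an assumption for brevity; a real check would parse the YAML and INI properly):

```python
# Each API-server/controller/kubelet fragment must point the gce provider
# at the same cloud-config file, and that file must enable multizone.
CONF_PATH = "/etc/origin/cloudprovider/gce.conf"

master = """\
apiServerArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
controllerArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
"""

node = """\
kubeletArguments:
  cloud-config:
  - /etc/origin/cloudprovider/gce.conf
  cloud-provider:
  - gce
"""

gce_conf = "[Global]\nmultizone = true\n"

for fragment in (master, node):
    assert CONF_PATH in fragment and "- gce" in fragment

assert "multizone = true" in gce_conf
print("cloud provider config consistent")
```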
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066