Description of problem:

The installer fails to label nodes correctly. This results in two things:
1) The installer crashes with an error if node labels are specified using openshift_node_labels in the inventory file.
2) The installer creates router pods, but they fail to deploy because no nodes are available to them.

Version-Release number of the following components:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

/openshift-ansible]$ ansible --version
ansible 2.4.2.0
  config file = /home/ccallega/git/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/ccallega/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

$ git describe
openshift-ansible-3.9.0-0.35.0-33-g3e2c7c22a

How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e /provisioning_vars.yml

Actual results:

When openshift_node_labels is set in the inventory file:

TASK [openshift_node : file] ***************************************************************************************************************************
Friday 26 January 2018 14:19:56 -0500 (0:00:00.370) 0:15:27.446 ********
fatal: [ec2-54-91-116-198.compute-1.amazonaws.com]: FAILED! => {}

MSG:

The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: {{ l_node_kubelet_args_default | combine(l_openshift_node_kubelet_args, recursive=True) }}: {{ openshift_node_kubelet_args_dict[openshift_cloudprovider_kind | default('undefined')] }}: {u'azure': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/azure.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'azure']}, u'openstack': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/openstack.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'openstack']}, u'gce': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/gce.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'gce']}, u'aws': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/aws.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'aws']}, u'undefined': {u'node-labels': u'{{ l_node_kubelet_node_labels }}'}}: {{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}: 'unicode' object has no attribute 'items'

The error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

- file:
  ^ here

The identical failure is reported for ec2-52-87-190-174.compute-1.amazonaws.com and ec2-34-227-15-109.compute-1.amazonaws.com.

Expected results:
I expect nodes to be labelled correctly.
I expect router pods to deploy to the assigned infra nodes.
I expect the installer to complete without failure.

Additional info:
Running with -vvvv yields no further information.
I've been hammering on this issue via GitHub issue ... https://github.com/openshift/openshift-ansible/issues/6897
Created attachment 1391563 [details] Inventory file
Created attachment 1391564 [details] provisioning_vars
From my investigation...

#1) This is the source of the installer crash at the openshift_node : file task:

diff --git a/roles/openshift_node/defaults/main.yml b/roles/openshift_node/defaults/main.yml
index 9f887891b..304dfbe08 100644
--- a/roles/openshift_node/defaults/main.yml
+++ b/roles/openshift_node/defaults/main.yml
@@ -27,7 +27,7 @@ openshift_dns_ip: "{{ ansible_default_ipv4['address'] }}"
 openshift_node_env_vars: {}
 
 # Create list of 'k=v' pairs.
-l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}"
+l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) }}"

This fixes the installer failure and the nodes get labelled. Curiously, we end up with labels like this:

# oc get nodes --show-labels=True
NAME                             STATUS    AGE       VERSION             LABELS
ip-172-31-48-53.sysdeseng.com    Ready     6m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=master,infra=,kubernetes.io/hostname=ip-172-31-48-53.sysdeseng.com,master=,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-52-218.sysdeseng.com   Ready     6m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=master,infra=,kubernetes.io/hostname=ip-172-31-52-218.sysdeseng.com,master=,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-60-255.sysdeseng.com   Ready     6m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=master,infra=,kubernetes.io/hostname=ip-172-31-60-255.sysdeseng.com,master=,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default

Note the stray labels whose keys are the values of openshift_node_labels (default=, infra=, master=). This is where I am stumped. I have searched high and low and cannot find where these labels are coming from. I have at least narrowed it down to the node service restart in roles/openshift_node/tasks/config.yml#L89.

Things I've checked for:
1) Bad templating and config:
   /etc/origin/node/system:node:<fqdn>.kubeconfig
   /etc/origin/node/node-config.yaml
   /etc/sysconfig/atomic-openshift-node
2) Manual oc label command via tasks and python
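For context on that traceback: the comment in defaults/main.yml says the filter builds a list of 'k=v' pairs from the label dict, and the 'unicode' object has no attribute 'items' error reads like that conversion being handed a string instead of a dict. A rough illustration in plain Jinja (my own sketch, not the repo's lib_utils_oo_dict_to_keqv_list filter; the label values here are just examples):

# illustration only: what a dict-to-'k=v'-list conversion looks like when
# openshift_node_labels really is a dict. A quoted string has no .items()
# to iterate, which is consistent with the traceback above.
- hosts: localhost
  gather_facts: no
  vars:
    openshift_node_labels:
      host-type: master
      sub-host-type: default
      region: infra
  tasks:
    - name: Dict labels render as a list of k=v pairs
      debug:
        msg: "{{ openshift_node_labels.items() | map('join', '=') | list }}"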
Created attachment 1391707 [details] ansible_vvvv_log

ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e @playbooks/aws/provisioning_vars.yml -vvvv
I give up on trying to find where the leaked labels are coming from. We have to label the nodes anyway, and it's very easy to do a var-to-label comparison and remove the bad labels via oc_label. Here's my diff for that solution:

diff --git a/roles/lib_openshift/library/oc_label.py b/roles/lib_openshift/library/oc_label.py
index ac3279ef8..d0e71e205 100644
--- a/roles/lib_openshift/library/oc_label.py
+++ b/roles/lib_openshift/library/oc_label.py
@@ -1532,6 +1532,26 @@ class OCLabel(OpenShiftCLI):
 
         return False
 
+    def sanitize_labels(self):
+        ''' sanitize labels:
+            Awful work around because somehow openshift_node_labels values are
+            leaking into node labels
+        '''
+        cmd = self.cmd_template()
+
+        get_extra_current_labels = self.get_extra_current_labels()
+        for label in self.labels:
+            if any(label['value'] in a for a in get_extra_current_labels):
+                exec_cmd = True
+                cmd.append("{}-".format(label['value']))
+
+        try:
+            if exec_cmd: cmd.append('--overwrite')
+        except Exception as e:
+            exec_cmd = False
+
+        if exec_cmd: return self.openshift_cmd(cmd)
+
     def replace(self):
         ''' replace currently stored labels with user provided labels '''
         cmd = self.cmd_template()
@@ -1644,6 +1664,7 @@ class OCLabel(OpenShiftCLI):
 
         # Add
         #######
         if state == 'add':
+            oc_label.sanitize_labels()
             if not (name or selector):
                 return {'failed': True,
                         'msg': "Param 'name' or 'selector' is required if state == 'add'"}

diff --git a/roles/openshift_node/defaults/main.yml b/roles/openshift_node/defaults/main.yml
index 9f887891b..304dfbe08 100644
--- a/roles/openshift_node/defaults/main.yml
+++ b/roles/openshift_node/defaults/main.yml
@@ -27,7 +27,7 @@ openshift_dns_ip: "{{ ansible_default_ipv4['address'] }}"
 openshift_node_env_vars: {}
 
 # Create list of 'k=v' pairs.
-l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}"
+l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) }}"
Here are the labels at the task "Start node service...":

[root@ip-172-31-60-99 ~]# oc get node
NAME                             STATUS    AGE       VERSION             LABELS
ip-172-31-60-99.sysdeseng.com    Ready     47s       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=,infra=,kubernetes.io/hostname=ip-172-31-60-99.sysdeseng.com,master=,region=,sub-host-type=
ip-172-31-63-10.sysdeseng.com    Ready     17s       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=,infra=,kubernetes.io/hostname=ip-172-31-63-10.sysdeseng.com,master=,region=,sub-host-type=
ip-172-31-63-185.sysdeseng.com   Ready     18s       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=,infra=,kubernetes.io/hostname=ip-172-31-63-185.sysdeseng.com,master=,region=,sub-host-type=

5 minutes later, here are the labels at the task "Label nodes":

[root@ip-172-31-60-99 ~]# oc get node
NAME                             STATUS    AGE       VERSION             LABELS
ip-172-31-60-99.sysdeseng.com    Ready     4m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,host-type=master,kubernetes.io/hostname=ip-172-31-60-99.sysdeseng.com,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-63-10.sysdeseng.com    Ready     3m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,host-type=master,kubernetes.io/hostname=ip-172-31-63-10.sysdeseng.com,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-63-185.sysdeseng.com   Ready     3m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,host-type=master,kubernetes.io/hostname=ip-172-31-63-185.sysdeseng.com,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default

See what I mean? The task "Label nodes" has to fill in the values, so it's very easy to whack the bad labels at this point. My diff is also safe for custom labels that are NOT in the hosts file.
I am unable to replicate the filter failure on master with a properly formatted openshift_node_labels variable. I suspect there is some other inventory file, or another mechanism not shown here, that is creating a malformed openshift_node_labels variable.
[masters]
hostA
hostB
hostC

[masters:vars]
openshift_node_labels="{'host-type': 'master', 'sub-host-type': 'default', 'region': 'infra'}"

I've been using this exact syntax since openshift-enterprise 3.2. We discussed the one-liner form under the [masters] section, but that fails with rhel7 / python 2.7.5 / ansible 2.4.2.0 / openshift-ansible-3.9.0-0.37.0-16-g90f5778. Please provide an alternative node label method.
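For what it's worth, one alternative that should sidestep the string-vs-dict ambiguity (untested on my side, and assuming these playbooks honor standard Ansible group_vars files next to the inventory) is to define the labels in YAML, where they parse as a real dict rather than a quoted string:

# group_vars/masters.yml -- hypothetical file placed beside the inventory;
# YAML parsing yields a real dict, so no inventory type conversion is involved.
openshift_node_labels:
  host-type: master
  sub-host-type: default
  region: infra
openshift_schedulable: true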
FAILS at openshift_node : file with:
- Fedora 27 / python 3.6.4 / ansible-playbook 2.4.3.0
- Fedora 27 / python 2.7.14 / ansible-playbook 2.4.2.0
- RHEL 7 / python 2.7.5 / ansible 2.4.3.0

The following configuration fails very early in the run with playbooks/aws/provisioning_vars.yml and cannot be used at this time:
- RHEL 7 / SCL python 3.5.1 / ansible 2.4.3.0
I was able to get rhel7 / scl python 3.5.1 / ansible 2.4 working... I had a time-sync issue with that VM that was killing the plays.

****
$ ansible --version
ansible 2.4.3.0
  config file = /home/ccallega/git/openshift-ansible/ansible.cfg
  configured module search path = ['/home/ccallega/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/ansible
  executable location = /home/ccallega/virtenv/ansible/bin/ansible
  python version = 3.5.1 (default, Sep 15 2016, 08:30:32) [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)]
*****

TASK [openshift_node : file] *********************************************************************************************************************************************
Friday 09 February 2018 12:21:20 -0500 (0:00:00.236) 0:20:57.933 *******
fatal: [ec2-54-158-118-239.compute-1.amazonaws.com]: FAILED! => {"msg": "The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: 'AttributeError' object has no attribute 'message'\n\nThe error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- file:\n ^ here\n"}

The identical failure is reported for ec2-52-91-0-154.compute-1.amazonaws.com and ec2-34-203-42-157.compute-1.amazonaws.com.
*****

I think I've done more than enough here to prove there is a problem with the code, and that it isn't a silly excuse like a bad virtual machine or an invalid character in the inventory file.
I am unable to reproduce with Fedora (deploy host) to CentOS 7 hosts using a standard install via deploy_cluster.yml, or via CentOS 7 (deploy host) to a CentOS 7 cluster.

NAME      STATUS    AGE       VERSION             LABELS
host1     Ready     11m       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=host1,region=primary,zone=east
host2     Ready     11m       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=host2,node-role.kubernetes.io/master=true,region=infra,zone=default

CentOS7 deploy host setup details:
Cloned openshift-ansible to /home/centos/git/openshift-ansible, tip of master branch. Python installed via RPM; ansible installed via pip into a virtualenv with openshift-ansible's requirements.txt.

$ python --version
Python 2.7.5

$ ansible --version
ansible 2.4.1.0
  config file = None
  configured module search path = [u'/home/centos/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/centos/git/openshift-ansible/venv/lib/python2.7/site-packages/ansible
  executable location = /home/centos/git/openshift-ansible/venv/bin/ansible
  python version = 2.7.5 (default, Aug 4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

Inventory: https://github.com/michaelgugino/openshift-stuff/blob/master/centos/inv-centos.txt
extra_vars.yml: https://github.com/michaelgugino/openshift-stuff/blob/master/centos/extra_vars.yml

The only changes necessary to the inventory are the hostnames. Exact commands run (as the centos user):

$ ansible-playbook -i inv-centos.txt -e @extra_vars.yml ~/git/openshift-ansible/playbooks/prerequisites.yml -vvv
$ ansible-playbook -i inv-centos.txt -e @extra_vars.yml ~/git/openshift-ansible/playbooks/deploy_cluster.yml -vvv

I will need exact steps to reproduce; I am unable to reproduce this problem in testing.
Steps to reproduce:

1) git clone https://github.com/openshift/openshift-ansible.git
2) cd openshift-ansible
3) ENSURE AWS CREDENTIALS ARE CORRECT
4) export AWS_ACCESS_KEY_ID=XXXXXX
5) export AWS_SECRET_ACCESS_KEY=XXXXXX
6) export FACT_PATH=${HOME}
7) export ANSIBLE_INVENTORY=inventory/hosts
8) ENSURE MY INVENTORY IS IN PLACE IN inventory/hosts
9) ENSURE MY EXTRA_VARS IS IN PLACE IN playbooks/aws/provisioning_vars.yml
10) ansible-playbook playbooks/aws/openshift-cluster/prerequisites.yml -e @playbooks/aws/provisioning_vars.yml
11) ansible-playbook playbooks/aws/openshift-cluster/build_ami.yml -e @playbooks/aws/provisioning_vars.yml
12) ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e @playbooks/aws/provisioning_vars.yml
    ^--- The error will come at openshift_node : file

Full error message:

Failure summary:

  1. Hosts:    ec2-52-90-93-232.compute-1.amazonaws.com, ec2-52-91-226-132.compute-1.amazonaws.com, ec2-54-172-137-55.compute-1.amazonaws.com
     Play:     Configure nodes
     Task:     openshift_node : file
     Message:  The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: {{ l_node_kubelet_args_default | combine(l_openshift_node_kubelet_args, recursive=True) }}: {{ openshift_node_kubelet_args_dict[openshift_cloudprovider_kind | default('undefined')] }}: {u'azure': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/azure.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'azure']}, u'openstack': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/openstack.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'openstack']}, u'gce': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/gce.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'gce']}, u'aws': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/aws.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'aws']}, u'undefined': {u'node-labels': u'{{ l_node_kubelet_node_labels }}'}}: {{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}: 'unicode' object has no attribute 'items'

               The error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may be elsewhere in the file depending on the exact syntax problem.

               The offending line appears to be:

               - file:
                 ^ here
Attempting to reproduce in a test environment.
I was able to reproduce the bug. The cause is the different ways Ansible does automatic type conversion on variables from inventory. Details of the reproducer are here: https://github.com/openshift/openshift-ansible/issues/6897#issuecomment-367451274

Mike, would you please take a look at this to resolve the changes from the facts refactor?
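For anyone else chasing this, a quick diagnostic (my own sketch, not part of the repo; the type_debug filter has been available since Ansible 2.3) to see how the inventory value actually arrives on each host:

# run with the same inventory/extra-vars as the install; prints whether
# openshift_node_labels parsed as a dict or is still a plain string.
- hosts: nodes
  gather_facts: no
  tasks:
    - name: Show the parsed type of openshift_node_labels per host
      debug:
        msg: "{{ openshift_node_labels | default({}) | type_debug }} -> {{ openshift_node_labels | default({}) }}"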
PR Created: https://github.com/openshift/openshift-ansible/pull/7243
Setting the following group vars in the inventory file:

[nodes:vars]
openshift_node_labels="{'registry' : 'enabled', 'role': 'node', 'router': 'enabled', 'region1': 'infra'}"

Testing with ansible-2.4.4-0.1.beta1.el7ae.noarch. Reproduced this bug with openshift-ansible-3.9.0-0.51.0.git.0.e26400f.el7.noarch, and verified the fix with openshift-ansible-3.9.1-1.git.0.9862628.el7.noarch. PASS.
Thanks for pinning this down. I also confirm that node labels are applied correctly using the aws provisioning playbooks.
This bugzilla needs to be reopened and reinvestigated. Labels are messed up again. Here are the host groups in my inventory:

> # host group for masters
> [masters]
>
> [masters:vars]
> openshift_node_labels="{'host-type': 'master', 'sub-host-type': 'default'}"
> openshift_schedulable=True
>
> [etcd:children]
> masters
>
> # NOTE: Containerized load balancer hosts are not yet supported, if using a global
> # containerized=true host variable we must set to false.
> [routers]
>
> [routers:vars]
> openshift_node_labels="{'sub-host-type': 'infra'}"
> openshift_schedulable=True
>
> #[lb]
> #ose3-lb-ansible.test.example.com containerized=false
>
> [nodes]
>
> [nodes:children]
> masters
> routers
>
> [nodes:vars]
> openshift_node_labels="{'sub-host-type': 'compute'}"
> openshift_schedulable=True
>
> #[nfs]
> #ose3-nfs-ansible.test.example.com

Here are the labels:

> # oc get nodes --show-labels=true
> NAME                            STATUS    ROLES     AGE       VERSION             LABELS
> ip-172-31-50-186.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657   type=infra
> ip-172-31-50-249.ec2.internal   Ready     master    5m        v1.9.1+a0ce1bc657   sub-host-type=default
> ip-172-31-52-218.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657   type=compute
> ip-172-31-53-189.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657   type=compute
> ip-172-31-53-228.ec2.internal   Ready     master    5m        v1.9.1+a0ce1bc657   sub-host-type=default
> ip-172-31-54-86.ec2.internal    Ready     <none>    2m        v1.9.1+a0ce1bc657   type=infra
> ip-172-31-55-211.ec2.internal   Ready     master    5m        v1.9.1+a0ce1bc657   sub-host-type=default
> ip-172-31-58-175.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657   type=infra
> ip-172-31-63-70.ec2.internal    Ready     <none>    2m        v1.9.1+a0ce1bc657   type=compute

Somehow the -'s in the label keys are causing corruption.
Looks like the -'s are not the problem here... I updated the hosts section of the inventory to the following:

[all:vars]
ansible_become=False

[x1]
localhost ansible_become=False ansible_connection=local ansible_ssh_user=ec2-user

[x1:vars]
become=True
ansible_ssh_user=ec2-user

[nodes]

[nodes:children]
masters
routers

[nodes:vars]
openshift_node_labels="{'subhosttype': 'compute'}"
openshift_schedulable=True

[masters]

[masters:vars]
openshift_node_labels="{'host-type': 'master', 'subhosttype': 'default'}"
openshift_schedulable=True

[etcd:children]
masters

[routers]

[routers:vars]
openshift_node_labels="{'subhosttype': 'infra'}"
openshift_schedulable=True

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd
#lb
#nfs

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
Stuff...

Then ran: ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e @playbooks/aws/provisioning_vars.yml

# oc get nodes --show-labels=true
NAME                            STATUS    ROLES     AGE       VERSION             LABELS
ip-172-31-51-17.ec2.internal    Ready     <none>    6m        v1.9.1+a0ce1bc657   type=infra
ip-172-31-52-249.ec2.internal   Ready     master    20m       v1.9.1+a0ce1bc657   subhosttype=default
ip-172-31-52-63.ec2.internal    Ready     <none>    3m        v1.9.1+a0ce1bc657   type=compute
ip-172-31-53-46.ec2.internal    Ready     <none>    2m        v1.9.1+a0ce1bc657   type=compute
ip-172-31-55-147.ec2.internal   Ready     <none>    6m        v1.9.1+a0ce1bc657   type=infra
ip-172-31-55-232.ec2.internal   Ready     master    20m       v1.9.1+a0ce1bc657   subhosttype=default
ip-172-31-56-93.ec2.internal    Ready     <none>    3m        v1.9.1+a0ce1bc657   type=compute
ip-172-31-58-208.ec2.internal   Ready     <none>    6m        v1.9.1+a0ce1bc657   type=infra
ip-172-31-62-210.ec2.internal   Ready     master    20m       v1.9.1+a0ce1bc657   subhosttype=default
(In reply to Chris C from comment #24)

The inventory is not valid. You cannot define openshift_node_labels in [nodes:vars] and also in the vars of its children groups. For example:

[nodes:children]
masters

[masters]
host1

would be the same as writing:

[nodes]
host1

[nodes:vars]
openshift_node_labels = value 1

[masters]
host1

[masters:vars]
openshift_node_labels = value 2

Thus, host1 doesn't know which variables it should use. Should it use the variables from the masters group, or from the nodes group?
The Ref Arch team has abandoned the prospect of integrating the plays under playbooks/aws/openshift-cluster/ into the reference architecture documents. There are too many issues with this code, and we've spent far too long hammering it into a viable shape. It's also out of line with how infrastructure is being deployed with other providers. I'm closing this bugzilla now.