Bug 1542099 - Node labelling fails
Summary: Node labelling fails
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
: 3.9.0
Assignee: Michael Gugino
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-02-05 15:02 UTC by Chris Callegari
Modified: 2018-03-27 09:46 UTC
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-07 15:49:50 UTC
Target Upstream Version:
Embargoed:


Attachments
Inventory file (53.30 KB, text/plain)
2018-02-05 15:05 UTC, Chris Callegari
provisioning_vars (4.96 KB, text/plain)
2018-02-05 15:07 UTC, Chris Callegari
ansible_vvvv_log (10.67 MB, text/plain)
2018-02-05 21:11 UTC, Chris Callegari


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:0489 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.9 RPM Release Advisory 2018-03-28 18:06:38 UTC

Description Chris Callegari 2018-02-05 15:02:40 UTC
Description of problem:
The installer fails to label nodes correctly.

This results in two problems:
1) The installer crashes with an error if node labels are specified via openshift_node_labels in the inventory file.
2) The installer deploys router pods, but they fail to deploy because no suitable nodes are available.

Version-Release number of the following components:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
/openshift-ansible]$ ansible --version
ansible 2.4.2.0
  config file = /home/ccallega/git/openshift-ansible/ansible.cfg
  configured module search path = [u'/home/ccallega/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]
$ git describe
openshift-ansible-3.9.0-0.35.0-33-g3e2c7c22a

How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e /provisioning_vars.yml

Actual results:
When openshift_node_labels is set in the inventory file...
TASK [openshift_node : file] ***************************************************************************************************************************
Friday 26 January 2018  14:19:56 -0500 (0:00:00.370)       0:15:27.446 ********
fatal: [ec2-54-91-116-198.compute-1.amazonaws.com]: FAILED! => {}

MSG:

The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: {{ l_node_kubelet_args_default | combine(l_openshift_node_kubelet_args, recursive=True) }}: {{ openshift_node_kubelet_args_dict[openshift_cloudprovider_kind | default('undefined')] }}: {u'azure': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/azure.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'azure']}, u'openstack': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/openstack.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'openstack']}, u'gce': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/gce.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'gce']}, u'aws': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/aws.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'aws']}, u'undefined': {u'node-labels': u'{{ l_node_kubelet_node_labels }}'}}: {{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}: 'unicode' object has no attribute 'items'

The error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- file:
  ^ here


fatal: [ec2-52-87-190-174.compute-1.amazonaws.com]: FAILED! => {}

MSG:

The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: {{ l_node_kubelet_args_default | combine(l_openshift_node_kubelet_args, recursive=True) }}: {{ openshift_node_kubelet_args_dict[openshift_cloudprovider_kind | default('undefined')] }}: {u'azure': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/azure.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'azure']}, u'openstack': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/openstack.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'openstack']}, u'gce': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/gce.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'gce']}, u'aws': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/aws.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'aws']}, u'undefined': {u'node-labels': u'{{ l_node_kubelet_node_labels }}'}}: {{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}: 'unicode' object has no attribute 'items'

The error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- file:
  ^ here


fatal: [ec2-34-227-15-109.compute-1.amazonaws.com]: FAILED! => {}

MSG:

The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: {{ l_node_kubelet_args_default | combine(l_openshift_node_kubelet_args, recursive=True) }}: {{ openshift_node_kubelet_args_dict[openshift_cloudprovider_kind | default('undefined')] }}: {u'azure': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/azure.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'azure']}, u'openstack': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/openstack.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'openstack']}, u'gce': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/gce.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'gce']}, u'aws': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/aws.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'aws']}, u'undefined': {u'node-labels': u'{{ l_node_kubelet_node_labels }}'}}: {{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}: 'unicode' object has no attribute 'items'

The error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- file:
  ^ here



Expected results:
I expect nodes to be labelled correctly
I expect router pods to deploy to assigned infra nodes
I expect the installer to complete without failing

Additional info:
-vvvv yields no further information

Comment 1 Chris Callegari 2018-02-05 15:03:23 UTC
I've been hammering on this issue via GitHub issue ... https://github.com/openshift/openshift-ansible/issues/6897

Comment 2 Chris Callegari 2018-02-05 15:05:06 UTC
Created attachment 1391563 [details]
Inventory file

Comment 3 Chris Callegari 2018-02-05 15:07:38 UTC
Created attachment 1391564 [details]
provisioning_vars

Comment 4 Chris Callegari 2018-02-05 15:59:36 UTC
From my investigation ...
#1) This is the source of the crash in the installer's openshift_node : file task
diff --git a/roles/openshift_node/defaults/main.yml b/roles/openshift_node/defaults/main.yml
index 9f887891b..304dfbe08 100644
--- a/roles/openshift_node/defaults/main.yml
+++ b/roles/openshift_node/defaults/main.yml
@@ -27,7 +27,7 @@ openshift_dns_ip: "{{ ansible_default_ipv4['address'] }}"
 openshift_node_env_vars: {}

 # Create list of 'k=v' pairs.
-l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}"
+l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) }}"

This fixes the installer failure and nodes get labelled.
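For context, here is a rough standalone sketch (not the actual openshift-ansible filter plugin) of what lib_utils_oo_dict_to_keqv_list is expected to do, and why it raises the error above when the inventory value arrives as a plain string instead of a dict:

def oo_dict_to_keqv_list(labels):
    # Approximation of the filter: turn {'region': 'infra'} into ['region=infra'].
    return ['{}={}'.format(key, value) for key, value in labels.items()]

print(oo_dict_to_keqv_list({'region': 'infra', 'host-type': 'master'}))
# -> ['region=infra', 'host-type=master'] (ordering may differ on Python 2)

try:
    # If Ansible hands the variable through as a raw string, .items() does not exist.
    oo_dict_to_keqv_list("{'region': 'infra'}")
except AttributeError as err:
    print(err)  # "'str' object has no attribute 'items'" ('unicode' under Python 2)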


Curiously we end up with labels like this ...
# oc get nodes --show-labels=True
NAME                             STATUS    AGE       VERSION             LABELS
ip-172-31-48-53.sysdeseng.com    Ready     6m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=master,infra=,kubernetes.io/hostname=ip-172-31-48-53.sysdeseng.com,master=,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-52-218.sysdeseng.com   Ready     6m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=master,infra=,kubernetes.io/hostname=ip-172-31-52-218.sysdeseng.com,master=,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-60-255.sysdeseng.com   Ready     6m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=master,infra=,kubernetes.io/hostname=ip-172-31-60-255.sysdeseng.com,master=,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default


Note the stray labels (default=, infra=, master=) whose keys are actually the values from openshift_node_labels.

This is where I am stumped.  I have searched high and low and cannot find where these labels are coming from.  I have at least narrowed it down to the node service restart in roles/openshift_node/tasks/config.yml#L89.

Things I've checked for...
1) Bad templating and config (a quick check sketch follows below)
   /etc/origin/node/system:node:<fqdn>.kubeconfig
   /etc/origin/node/node-config.yaml
   /etc/sysconfig/atomic-openshift-node
2) Manual oc label commands via tasks and Python
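
For item 1, a minimal sketch of one way to confirm which node-labels the kubelet was actually configured with, assuming the usual OCP 3.x node-config.yaml layout (kubeletArguments['node-labels']) and that PyYAML is available on the node:

import yaml  # PyYAML; assumed available on the node

with open('/etc/origin/node/node-config.yaml') as cfg:
    node_config = yaml.safe_load(cfg)

# Labels the kubelet was started with, if any were templated in.
kubelet_args = node_config.get('kubeletArguments') or {}
print(kubelet_args.get('node-labels', []))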

Comment 5 Chris Callegari 2018-02-05 21:11:57 UTC
Created attachment 1391707 [details]
ansible_vvvv_log

ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e @playbooks/aws/provisioning_vars.yml -vvvv

Comment 6 Chris Callegari 2018-02-06 18:38:28 UTC
I give up on trying to find where the leaked labels are coming from.

We have to label the nodes anyway.  It's very easy to compare the label variables against a node's current labels and remove the bad labels via oc_label.

Here's my diff for that solution...

diff --git a/roles/lib_openshift/library/oc_label.py b/roles/lib_openshift/library/oc_label.py
index ac3279ef8..d0e71e205 100644
--- a/roles/lib_openshift/library/oc_label.py
+++ b/roles/lib_openshift/library/oc_label.py
@@ -1532,6 +1532,26 @@ class OCLabel(OpenShiftCLI):

         return False

+    def sanitize_labels(self):
+        ''' sanitize labels: \
+        Awful work around because somehow openshift_node_labels values are
+        leaking into node labels\
+        '''
+        cmd = self.cmd_template()
+
+        get_extra_current_labels = self.get_extra_current_labels()
+        for label in self.labels:
+            if any(label['value'] in a for a in get_extra_current_labels):
+                exec_cmd = True
+                cmd.append("{}-".format(label['value']))
+
+        try:
+            if exec_cmd: cmd.append('--overwrite')
+        except Exception as e:
+            exec_cmd = False
+
+        if exec_cmd: return self.openshift_cmd(cmd)
+
     def replace(self):
         ''' replace currently stored labels with user provided labels '''
         cmd = self.cmd_template()
@@ -1644,6 +1664,7 @@ class OCLabel(OpenShiftCLI):
         # Add
         #######
         if state == 'add':
+            oc_label.sanitize_labels()
             if not (name or selector):
                 return {'failed': True,
                         'msg': "Param 'name' or 'selector' is required if state == 'add'"}
diff --git a/roles/openshift_node/defaults/main.yml b/roles/openshift_node/defaults/main.yml
index 9f887891b..304dfbe08 100644
--- a/roles/openshift_node/defaults/main.yml
+++ b/roles/openshift_node/defaults/main.yml
@@ -27,7 +27,7 @@ openshift_dns_ip: "{{ ansible_default_ipv4['address'] }}"
 openshift_node_env_vars: {}

 # Create list of 'k=v' pairs.
-l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}"
+l_node_kubelet_node_labels: "{{ openshift_node_labels | default({}) }}"
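
For reference, the same idea can be sketched as a small standalone script outside oc_label.py (a minimal sketch only; the node name and desired labels are examples, and the trailing '-' after a key tells oc label to remove that label):

import subprocess

node = 'ip-172-31-60-99.sysdeseng.com'   # example node name
desired = {'host-type': 'master', 'sub-host-type': 'default', 'region': 'infra'}

# The last whitespace-separated field of this output is the comma-separated label list.
row = subprocess.check_output(
    ['oc', 'get', 'node', node, '--show-labels', '--no-headers']).decode()
current_keys = [kv.split('=')[0] for kv in row.split()[-1].split(',')]

# Remove any leaked label whose key is actually one of the desired label *values*.
for leaked_key in set(desired.values()):
    if leaked_key in current_keys:
        subprocess.check_call(['oc', 'label', 'node', node, leaked_key + '-'])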

Comment 7 Chris Callegari 2018-02-06 18:41:18 UTC
Here are the labels at the "Start node service..." task:
[root@ip-172-31-60-99 ~]# oc get node
NAME                             STATUS    AGE       VERSION             LABELS
ip-172-31-60-99.sysdeseng.com    Ready     47s       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=,infra=,kubernetes.io/hostname=ip-172-31-60-99.sysdeseng.com,master=,region=,sub-host-type=
ip-172-31-63-10.sysdeseng.com    Ready     17s       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=,infra=,kubernetes.io/hostname=ip-172-31-63-10.sysdeseng.com,master=,region=,sub-host-type=
ip-172-31-63-185.sysdeseng.com   Ready     18s       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,default=,host-type=,infra=,kubernetes.io/hostname=ip-172-31-63-185.sysdeseng.com,master=,region=,sub-host-type=


Five minutes later ... here are the labels at the "Label nodes" task:
[root@ip-172-31-60-99 ~]# oc get node
NAME                             STATUS    AGE       VERSION             LABELS
ip-172-31-60-99.sysdeseng.com    Ready     4m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,host-type=master,kubernetes.io/hostname=ip-172-31-60-99.sysdeseng.com,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-63-10.sysdeseng.com    Ready     3m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,host-type=master,kubernetes.io/hostname=ip-172-31-63-10.sysdeseng.com,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default
ip-172-31-63-185.sysdeseng.com   Ready     3m        v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,host-type=master,kubernetes.io/hostname=ip-172-31-63-185.sysdeseng.com,node-role.kubernetes.io/master=true,region=infra,sub-host-type=default


See what I mean?  The "Label nodes" task has to fill in the values.  It's very easy to just whack the bad labels at this point.

My diff is also safe for custom labels that are NOT in the hosts file.

Comment 8 Michael Gugino 2018-02-07 18:00:12 UTC
I am unable to replicate the issue of the filter not working on master with a properly formatted openshift_node_labels variable.

I suspect there is some other inventory file or other means not shown here that is creating a malformed openshift_node_labels variable.

Comment 9 Chris Callegari 2018-02-07 21:32:07 UTC
[masters]
hostA
hostB
hostC

[masters:vars]
openshift_node_labels="{'host-type': 'master', 'sub-host-type': 'default', 'region': 'infra'}"

I've been using this exact syntax since openshift-enterprise 3.2.

We discussed the one-liner under the [masters] group, but that fails for RHEL 7 / Python 2.7.5 / Ansible 2.4.2.0 / openshift-ansible-3.9.0-0.37.0-16-g90f5778.

Please provide an alternative node labelling method.

Comment 10 Chris Callegari 2018-02-09 14:18:42 UTC
FAILS at openshift_node : file
  - Fedora 27 / python version = 3.6.4 / ansible-playbook 2.4.3.0
  - Fedora 27 / python version = 2.7.14 / ansible-playbook 2.4.2.0
  - RHEL 7 / python version = 2.7.5 / ansible 2.4.3.0

The following configuration fails very early when run with playbooks/aws/provisioning_vars.yml.  It cannot be used at this time.
  - RHEL 7 / SCL - python version = 3.5.1 / ansible 2.4.3.0

Comment 11 Chris Callegari 2018-02-09 17:27:12 UTC
I was able to get RHEL 7 / SCL Python 3.5.1 / Ansible 2.4 working... I had a time-sync issue with that VM that was killing the plays.

****

$ ansible --version
ansible 2.4.3.0
  config file = /home/ccallega/git/openshift-ansible/ansible.cfg
  configured module search path = ['/home/ccallega/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/ansible
  executable location = /home/ccallega/virtenv/ansible/bin/ansible
  python version = 3.5.1 (default, Sep 15 2016, 08:30:32) [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)]

*****

TASK [openshift_node : file] *********************************************************************************************************************************************
Friday 09 February 2018  12:21:20 -0500 (0:00:00.236)       0:20:57.933 *******
fatal: [ec2-54-158-118-239.compute-1.amazonaws.com]: FAILED! => {"msg": "The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: 'AttributeError' object has no attribute 'message'\n\nThe error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- file:\n  ^ here\n"}
fatal: [ec2-52-91-0-154.compute-1.amazonaws.com]: FAILED! => {"msg": "The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: 'AttributeError' object has no attribute 'message'\n\nThe error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- file:\n  ^ here\n"}
fatal: [ec2-34-203-42-157.compute-1.amazonaws.com]: FAILED! => {"msg": "The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: 'AttributeError' object has no attribute 'message'\n\nThe error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- file:\n  ^ here\n"}

*****

I think I've done more than enough here to prove there is a problem with the code, and that it isn't something trivial like a bad virtual machine or an invalid character in the inventory file.

Comment 12 Michael Gugino 2018-02-09 18:33:51 UTC
I am unable to reproduce this with a Fedora deploy host against CentOS 7 hosts using a standard install via deploy_cluster.yml, or with a CentOS 7 deploy host against a CentOS 7 cluster.

NAME                            STATUS    AGE       VERSION             LABELS
host1   Ready     11m       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=host1,region=primary,zone=east
host2    Ready     11m       v1.7.6+a08f5eeb62   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=host2,node-role.kubernetes.io/master=true,region=infra,zone=default


CentOS7 deploy host setup details:

Cloned openshift-ansible to /home/centos/git/openshift-ansible

Tip of master branch.

Python installed via RPM; Ansible installed via pip into a virtualenv using openshift-ansible's requirements.txt.

$ python --version
Python 2.7.5

$ ansible --version
ansible 2.4.1.0
  config file = None
  configured module search path = [u'/home/centos/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /home/centos/git/openshift-ansible/venv/lib/python2.7/site-packages/ansible
  executable location = /home/centos/git/openshift-ansible/venv/bin/ansible
  python version = 2.7.5 (default, Aug  4 2017, 00:39:18) [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]

Inventory: https://github.com/michaelgugino/openshift-stuff/blob/master/centos/inv-centos.txt

extra_vars.yml: https://github.com/michaelgugino/openshift-stuff/blob/master/centos/extra_vars.yml

The only changes necessary to the inventory are the hostnames.

Exact commands run (as centos user):
$ ansible-playbook -i inv-centos.txt -e @extra_vars.yml ~/git/openshift-ansible/playbooks/prerequisites.yml -vvv
$ ansible-playbook -i inv-centos.txt -e @extra_vars.yml ~/git/openshift-ansible/playbooks/deploy_cluster.yml -vvv


I will need exact steps to reproduce; I am unable to reproduce this problem in testing.

Comment 13 Chris Callegari 2018-02-14 19:54:09 UTC
Steps to reproduce:
1) git clone https://github.com/openshift/openshift-ansible.git
2) cd openshift-ansible
3) ENSURE AWS CREDENTIALS ARE CORRECT
4) export AWS_ACCESS_KEY_ID=XXXXXX
5) export AWS_SECRET_ACCESS_KEY=XXXXXX
6) export FACT_PATH=${HOME}
7) export ANSIBLE_INVENTORY=inventory/hosts
8) ENSURE MY INVENTORY IS IN PLACE IN inventory/hosts
9) ENSURE MY EXTRA_VARS IS IN PLACE IN playbooks/aws/provisioning_vars.yml
10) ansible-playbook playbooks/aws/openshift-cluster/prerequisites.yml -e @playbooks/aws/provisioning_vars.yml
11) ansible-playbook playbooks/aws/openshift-cluster/build_ami.yml -e @playbooks/aws/provisioning_vars.yml
12) ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e @playbooks/aws/provisioning_vars.yml
    ^--- The error will come at openshift_node : file


Full error message..............
Failure summary:


  1. Hosts:    ec2-52-90-93-232.compute-1.amazonaws.com, ec2-52-91-226-132.compute-1.amazonaws.com, ec2-54-172-137-55.compute-1.amazonaws.com
     Play:     Configure nodes
     Task:     openshift_node : file
     Message:  The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: {{ l_node_kubelet_args_default | combine(l_openshift_node_kubelet_args, recursive=True) }}: {{ openshift_node_kubelet_args_dict[openshift_cloudprovider_kind | default('undefined')] }}: {u'azure': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/azure.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'azure']}, u'openstack': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/openstack.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'openstack']}, u'gce': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/gce.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'gce']}, u'aws': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/aws.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'aws']}, u'undefined': {u'node-labels': u'{{ l_node_kubelet_node_labels }}'}}: {{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}: 'unicode' object has no attribute 'items'

               The error appears to have been in '/home/ccallega/git/openshift-ansible/roles/openshift_node/tasks/config.yml': line 26, column 3, but may
               be elsewhere in the file depending on the exact syntax problem.

               The offending line appears to be:


               - file:
                 ^ here

Comment 14 Russell Teague 2018-02-21 15:05:32 UTC
Attempting to reproduce in a test environment.

Comment 15 Russell Teague 2018-02-21 20:01:00 UTC
I was able to reproduce the bug.  The cause is the different ways Ansible does automatic type conversion on variables from the inventory.  Details of the reproducer are here: https://github.com/openshift/openshift-ansible/issues/6897#issuecomment-367451274

Mike, would you please take a look at resolving this, given the changes from the facts refactor?
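
A minimal sketch of the trap described above, assuming the INI inventory parser tries to interpret quoted values as Python literals and falls back to a plain string when that fails, so the same openshift_node_labels line can reach the role as either a dict or a unicode string:

import ast

raw = "{'host-type': 'master', 'sub-host-type': 'default', 'region': 'infra'}"

try:
    value = ast.literal_eval(raw)   # parsed: a real dict, the keqv filter works
except (ValueError, SyntaxError):
    value = raw                     # left as a string: .items() later fails

print(type(value).__name__, value)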

Comment 16 Michael Gugino 2018-02-21 22:55:19 UTC
PR Created: https://github.com/openshift/openshift-ansible/pull/7243

Comment 21 Johnny Liu 2018-03-01 02:13:10 UTC
Setting the following group vars in the inventory file:
[nodes:vars]
openshift_node_labels="{'registry' : 'enabled', 'role': 'node', 'router': 'enabled', 'region1': 'infra'}"

Running testing with ansible-2.4.4-0.1.beta1.el7ae.noarch.

Reproduced this bug with openshift-ansible-3.9.0-0.51.0.git.0.e26400f.el7.noarch, and verified it with openshift-ansible-3.9.1-1.git.0.9862628.el7.noarch; PASS.

Comment 22 Chris Callegari 2018-03-02 14:06:08 UTC
Thanks for pinning this down.

I also confirm that node labels are applied correctly using the AWS provisioning playbooks.

Comment 23 Chris Callegari 2018-03-02 20:21:06 UTC
This bugzilla needs to be reopened and reinvestigated.

The labels are messed up again.  Here are the host groups in my inventory...

> # host group for masters
> [masters]
>
> [masters:vars]
> openshift_node_labels="{'host-type': 'master', 'sub-host-type': 'default'}"
> openshift_schedulable=True
>
> [etcd:children]
> masters
>
> # NOTE: Containerized load balancer hosts are not yet supported, if using a global
> # containerized=true host variable we must set to false.
> [routers]
>
> [routers:vars]
> openshift_node_labels="{'sub-host-type': 'infra'}"
> openshift_schedulable=True
>
> #[lb]
> #ose3-lb-ansible.test.example.com containerized=false
>
> [nodes]
>
> [nodes:children]
> masters
> routers
>
> [nodes:vars]
> openshift_node_labels="{'sub-host-type': 'compute'}"
> openshift_schedulable=True
>
> #[nfs]
> #ose3-nfs-ansible.test.example.com

Here are the labels...
> # oc get nodes --show-labels=true
> NAME                            STATUS    ROLES     AGE       VERSION             LABELS
> ip-172-31-50-186.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657  type=infra
> ip-172-31-50-249.ec2.internal   Ready     master    5m        v1.9.1+a0ce1bc657  sub-host-type=default
> ip-172-31-52-218.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657  type=compute
> ip-172-31-53-189.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657  type=compute
> ip-172-31-53-228.ec2.internal   Ready     master    5m        v1.9.1+a0ce1bc657  sub-host-type=default
> ip-172-31-54-86.ec2.internal    Ready     <none>    2m        v1.9.1+a0ce1bc657  type=infra
> ip-172-31-55-211.ec2.internal   Ready     master    5m        v1.9.1+a0ce1bc657  sub-host-type=default
> ip-172-31-58-175.ec2.internal   Ready     <none>    2m        v1.9.1+a0ce1bc657  type=infra
> ip-172-31-63-70.ec2.internal    Ready     <none>    2m        v1.9.1+a0ce1bc657  type=compute

Somehow the -'s in the label keys are causing corruption

Comment 24 Chris Callegari 2018-03-05 17:27:37 UTC
Looks like the -'s are not the problem here...

I updated the hosts section of inventory to the following...
[all:vars]
ansible_become=False

[x1]
localhost ansible_become=False ansible_connection=local ansible_ssh_user=ec2-user

[x1:vars]
become=True
ansible_ssh_user=ec2-user

[nodes]

[nodes:children]
masters
routers

[nodes:vars]
openshift_node_labels="{'subhosttype': 'compute'}"
openshift_schedulable=True

[masters]

[masters:vars]
openshift_node_labels="{'host-type': 'master', 'subhosttype': 'default'}"
openshift_schedulable=True

[etcd:children]
masters

[routers]

[routers:vars]
openshift_node_labels="{'subhosttype': 'infra'}"
openshift_schedulable=True

# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes
etcd
#lb
#nfs

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
Stuff...


Run ansible-playbook playbooks/aws/openshift-cluster/provision_install.yml -e @playbooks/aws/provisioning_vars.yml


# oc get nodes --show-labels=true
NAME                            STATUS    ROLES     AGE       VERSION             LABELS
ip-172-31-51-17.ec2.internal    Ready     <none>    6m        v1.9.1+a0ce1bc657   type=infra
ip-172-31-52-249.ec2.internal   Ready     master    20m       v1.9.1+a0ce1bc657   subhosttype=default
ip-172-31-52-63.ec2.internal    Ready     <none>    3m        v1.9.1+a0ce1bc657   type=compute
ip-172-31-53-46.ec2.internal    Ready     <none>    2m        v1.9.1+a0ce1bc657   type=compute
ip-172-31-55-147.ec2.internal   Ready     <none>    6m        v1.9.1+a0ce1bc657   type=infra
ip-172-31-55-232.ec2.internal   Ready     master    20m       v1.9.1+a0ce1bc657   subhosttype=default
ip-172-31-56-93.ec2.internal    Ready     <none>    3m        v1.9.1+a0ce1bc657   type=compute
ip-172-31-58-208.ec2.internal   Ready     <none>    6m        v1.9.1+a0ce1bc657   type=infra
ip-172-31-62-210.ec2.internal   Ready     master    20m       v1.9.1+a0ce1bc657   subhosttype=default

Comment 25 Michael Gugino 2018-03-05 17:47:32 UTC
(In reply to Chris C from comment #24)
> Looks like the -'s are not the problem here...

The inventory is not valid.

You cannot define openshift_node_labels in nodes:vars and in its children's group vars.

For example:

[nodes:children]
masters

[masters]
host1



Would be the same as writing:
[nodes]
host1

[nodes:vars]
openshift_node_labels = value 1

[masters]
host1

[masters:vars]
openshift_node_labels = value 2

Thus, host1 doesn't know which variables it should use.  Should it use the variables from the masters group, or the nodes group?

Comment 26 Chris Callegari 2018-03-07 15:49:50 UTC
The Ref Arch team has abandoned the prospect of integrating the plays under playbooks/aws/openshift-cluster/ into the ref arch documents.  There are too many issues with this code, and we've spent far too long hammering it into a viable shape.  It's also out of line with how infrastructure is being deployed with other providers.

I'm closing this bugzilla now.

