Bug 1629394 - deploy_cluster playbook fails on openshift_node/tasks/config.yml execution
Summary: deploy_cluster playbook fails on openshift_node/tasks/config.yml execution
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OKD
Classification: Red Hat
Component: Installer
Version: 3.x
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 3.x
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-15 16:36 UTC by Jurijs Kolomijecs
Modified: 2018-09-18 12:41 UTC (History)
3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-18 12:41:59 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
inventory file (deleted), 2018-09-15 16:36 UTC, Jurijs Kolomijecs
native on Fedora log (426.26 KB, text/plain), 2018-09-15 16:54 UTC, Jurijs Kolomijecs
docker log (392.46 KB, text/plain), 2018-09-15 16:55 UTC, Jurijs Kolomijecs
inventory file fixed (1.52 KB, text/plain), 2018-09-15 18:42 UTC, Jurijs Kolomijecs

Description Jurijs Kolomijecs 2018-09-15 16:36:50 UTC
Description of problem:
I'm unable to install OpenShift Origin as explained here: https://docs.okd.io/latest/install/running_install.html#running-the-advanced-installation-rpm (step 2: Run the deploy_cluster.yml playbook to initiate the cluster installation)

Version-Release number of selected component (if applicable):
openshift v3.10

How reproducible:
Always

Steps to Reproduce (native on Fedora):
1. Install ansible (ansible-2.6.4-1.fc28.noarch) 
2. Clone openshift-ansible repo (https://github.com/openshift/openshift-ansible)
3. Prepare inventory file (see my inventory in attachment)
4. Execute: ansible-playbook -vvv -i inventory/dev_hosts playbooks/deploy_cluster.yml

Steps to Reproduce (docker: openshift/origin-ansible:v3.10):
1. Install docker-ce (docker-ce-18.06.1.ce-3.fc28.x86_64)
2. Prepare inventory file (see my inventory in attachment)
3. Execute: docker run -u `id -u` -v $HOME/.ssh/jurikolo_rsa:/opt/app-root/src/.ssh/id_rsa:Z,ro -v $HOME/git/openshift-ansible/inventory/dev_hosts:/tmp/inventory:ro -e INVENTORY_FILE=/tmp/inventory -e OPTS="-vvv" -e PLAYBOOK_FILE=playbooks/deploy_cluster.yml openshift/origin-ansible:v3.10

Actual results:
Playbook fails with error:
Message:  The conditional check '('config' in l2_openshift_node_kubelet_args) | bool' failed. The error was: An unhandled exception occurred while templating '{{ openshift_node_kubelet_args_dict[openshift_cloudprovider_kind | default('undefined')] }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while templating '{u'vsphere': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/vsphere.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'vsphere']}, u'gce': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/gce.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'gce']}, u'undefined': {u'node-labels': u'{{ l_node_kubelet_node_labels }}'}, u'aws': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/aws.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'aws']}, u'azure': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/azure.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'azure']}, u'openstack': {u'cloud-config': [u"{{ openshift_config_base ~ '/cloudprovider/openstack.conf' }}"], u'node-labels': u'{{ l_node_kubelet_node_labels }}', u'cloud-provider': [u'openstack']}}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while templating '{{ openshift_node_labels | default({}) | lib_utils_oo_dict_to_keqv_list }}'. Error was a <type 'exceptions.SyntaxError'>, original message: invalid syntax (<unknown>, line 1)

The file and a line of code that causes issues: https://github.com/openshift/openshift-ansible/blob/release-3.10/roles/openshift_node/tasks/config.yml#L21
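The innermost failure in the log is a Python SyntaxError raised while evaluating the openshift_node_labels value. A minimal sketch of that evaluation (the string literals are a hypothetical reconstruction from the host line quoted later in this report, where the closing quote of the labels dict is immediately followed by the next key with no space):

```python
import ast

# With the missing space, the ini parser sees the labels value fused
# with the next inline variable (hypothetical reconstruction):
fused = "{'region': 'primary', 'zone': 'east'}openshift_public_ip=40.87.134.175"
# With a space after the closing quote, the value is a clean dict literal:
clean = "{'region': 'primary', 'zone': 'east'}"

ast.literal_eval(clean)  # parses fine into a dict

try:
    ast.literal_eval(fused)  # a dict display followed by a bare name
except SyntaxError as exc:   # is invalid Python, hence SyntaxError
    print("invalid labels value:", exc)
```

This matches the "invalid syntax (<unknown>, line 1)" at the bottom of the playbook error, which surfaces only later, when the lib_utils_oo_dict_to_keqv_list filter touches the labels.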

Expected results:
Installation continues

Additional info:
The servers for openshift are CentOS VMs in Azure
cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core)

Comment 1 Jurijs Kolomijecs 2018-09-15 16:54:46 UTC
Created attachment 1483565 [details]
native on Fedora log

Comment 2 Jurijs Kolomijecs 2018-09-15 16:55:32 UTC
Created attachment 1483566 [details]
docker log

Comment 3 Jurijs Kolomijecs 2018-09-15 18:41:49 UTC
Apparently the issue is gone with a modified inventory file. Instead of:
[nodes]
40.85.82.159 openshift_node_group_name="node-config-master" openshift_node_labels="{'region': 'infra', 'zone': 'default'}" openshift_public_ip=40.85.82.159 openshift_ip=10.0.0.5 openshift_public_hostname=40.85.82.159 openshift_hostname=openshift-1 openshift_scheduleable=false
40.87.134.175 openshift_node_group_name="node-config-compute" openshift_node_labels="{'region': 'primary', 'zone': 'east'}"openshift_public_ip=40.87.134.175 openshift_ip=10.0.0.6 openshift_public_hostname=40.87.134.175 openshift_hostname=openshift-2

I removed most of the extra data and left just this:
[nodes]
40.85.82.159 openshift_node_group_name='node-config-master'
40.87.134.175 openshift_node_group_name='node-config-compute'

I believe there should be validation of the inventory file, or at least a user-friendly error message. The full fixed inventory file is attached.
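The validation suggested above could look something like the following sketch. This is a hypothetical helper, not part of openshift-ansible; check_node_labels and its message text are invented here, and shlex.split is used as a stand-in for how Ansible tokenizes inline host variables in an ini inventory:

```python
import ast
import shlex


def check_node_labels(host_line):
    """Return a friendly error string if openshift_node_labels on this
    inventory line does not evaluate to a dict, or None if it looks fine.
    Hypothetical pre-flight check, not openshift-ansible's actual code."""
    for token in shlex.split(host_line):
        if token.startswith("openshift_node_labels="):
            value = token.split("=", 1)[1]
            try:
                labels = ast.literal_eval(value)
            except (SyntaxError, ValueError):
                # A missing space after the closing quote fuses the next
                # key=value pair onto the dict literal, as in this bug.
                return ("openshift_node_labels is not a valid dict "
                        "(check quoting and spacing): %r" % value)
            if not isinstance(labels, dict):
                return "openshift_node_labels must be a dict, got %r" % labels
    return None
```

Run against the two host lines from comment 0, this would flag the second host (missing space) with a readable message instead of the nested templating traceback.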

Comment 4 Jurijs Kolomijecs 2018-09-15 18:42:18 UTC
Created attachment 1483570 [details]
inventory file fixed

Comment 5 Scott Dodson 2018-09-17 12:41:10 UTC
Are you sure it's not just the lack of a space between openshift_node_labels and openshift_public_ip definition in your second host?

Comment 6 Jurijs Kolomijecs 2018-09-18 08:14:23 UTC
(In reply to Scott Dodson from comment #5)
> Are you sure it's not just the lack of a space between openshift_node_labels
> and openshift_public_ip definition in your second host?

For some reason the original inventory file attachment was removed, and I didn't save it locally. But if the cause was a typo, then feel free to close the issue.

