Bug 1506951 - Automatically add container provider failed
Summary: Automatically add container provider failed
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assignee: Tim Bielawa
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-10-27 09:18 UTC by Gaoyun Pei
Modified: 2018-11-15 15:00 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: While adding providers we assumed users had defined the optional variable 'openshift_master_cluster_public_hostname'.
Consequence: If the variable was not defined by the user, Ansible would raise an undefined-variable error and crash.
Fix: Use 'openshift_master_cluster_public_hostname' if it is defined, otherwise fall back to using the first master hostname.
Result: OCP can be added as a container provider with or without openshift_master_cluster_public_hostname being set.
Clone Of:
Environment:
Last Closed: 2018-11-15 15:00:34 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/openshift-ansible pull 5989 (last updated 2017-11-02 15:21:30 UTC)

Description Gaoyun Pei 2017-10-27 09:18:21 UTC
Description of problem:
When trying to add the ocp-3.7 cluster as a container provider in CFME with the add_container_provider playbook, it fails as below:
TASK [openshift_management : Ensure the management service route is saved] **************************************************************************************************
ok: [ec2-34-228-240-132.compute-1.amazonaws.com] => {"ansible_facts": {"management_route": "httpd-openshift-management.apps.1027-ji0.qe.rhcloud.com"}, "changed": false}

TASK [openshift_management : Ensure this cluster is a container provider] ***************************************************************************************************
fatal: [ec2-34-228-240-132.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'cluster_public_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 48, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure this cluster is a container provider\n  ^ here\n"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry
 


Version-Release number of the following components:
openshift-ansible-3.7.0-0.182.0.git.0.23a42dc.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. After deploying CFME successfully on the ocp-3.7 cluster, add the current cluster to CFME as a container provider:
ansible-playbook -v -i host/host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml


Actual results:

Expected results:

Additional info:

Comment 1 Tim Bielawa 2017-10-31 18:39:29 UTC
I see what's happening here. During the failing task I am referencing 'openshift.master.cluster_public_hostname'.

I assumed this would get picked up automatically when I called 'openshift_facts' earlier in that task file, but I am seeing now that without 'openshift_master_cluster_public_hostname' set in your inventory, the value will be empty.

I'll make sure I can reproduce this. After a little code peeping, I think it might be safer to reference the 'openshift.master.api_public_hostname' value instead. I'll see if that works as a potential fix.
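
For illustration, here is a minimal sketch of the failure mode (this is not the real task body from roles/openshift_management/tasks/add_container_provider.yml): any task that templates that nested fact raises an undefined-variable error when the inventory never sets openshift_master_cluster_public_hostname.

# Hypothetical reproduction only, not the actual task. With
# openshift_master_cluster_public_hostname unset in the inventory,
# openshift.master.cluster_public_hostname is never populated, so templating
# it raises AnsibleUndefinedVariable and the play fails here.
- name: Ensure this cluster is a container provider
  debug:
    msg: "Provider API endpoint: {{ openshift.master.cluster_public_hostname }}"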

Comment 2 Tim Bielawa 2017-11-01 19:10:56 UTC
NEEDINFO:

- Does your inventory have `openshift_master_cluster_public_hostname` set?

----

Given the steps I took to reproduce the bug, I assume it was not set. Your cluster needs a canonical hostname that other clients (MIQ, web browsers, curl, etc.) can use to reach it.

Without `openshift_master_cluster_public_hostname` set, there is technically no officially designated way to access the frontend of your cluster. While we *could assume* that your first detected master host is your desired API endpoint, that might be foolish and cause more bugs.

The other options I am looking at are adding a validation check to notify users that `openshift_master_cluster_public_hostname` must be set, or parsing the closest default fact I can find to make a best guess at a working API endpoint.

That essentially means using the 'openshift.master.cluster_hostname' fact. In this use case (adding OCP as a container provider), it will default to the hostname of the first master in your cluster.

I'll try writing up a patch and seeing how it works with `openshift_master_cluster_public_hostname` UNDEFINED in my inventory.
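
As a rough sketch of that fallback (a sketch only, assuming a set_fact based approach; 'l_cluster_hostname' is a hypothetical variable name, not necessarily what the patch will use):

# Sketch: prefer the inventory variable when present, otherwise use the
# first-master hostname carried in the openshift.master.cluster_hostname fact.
- name: Use openshift_master_cluster_public_hostname if it is available
  set_fact:
    l_cluster_hostname: "{{ openshift_master_cluster_public_hostname }}"
  when: openshift_master_cluster_public_hostname is defined

- name: Default to the first master if openshift_master_cluster_public_hostname is unavailable
  set_fact:
    l_cluster_hostname: "{{ openshift.master.cluster_hostname }}"
  when: openshift_master_cluster_public_hostname is not defined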

Comment 3 Gaoyun Pei 2017-11-02 07:40:16 UTC
Hi Tim, 

Your assumption is correct, we usually don't set openshift_master_cluster_public_hostname in the ansible inventory file unless we are running a native HA-master cluster installation.
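
For completeness, opting in is a one-line inventory change; the hostname value below is only a placeholder:

# Illustrative BYO inventory snippet; the value shown is a placeholder.
[OSEv3:vars]
openshift_master_cluster_public_hostname=openshift.example.com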

Comment 4 Tim Bielawa 2017-11-02 12:56:13 UTC
(In reply to Gaoyun Pei from comment #3)
> Hi Tim, 
> 
> Your assumption is correct, we usually don't set
> openshift_master_cluster_public_hostname in ansible inventory file unless we
> were running a native ha-master cluster installation.

Thank you for clarifying, Gaoyun. I should have a patch on GitHub today.

Comment 5 Scott Dodson 2017-11-02 12:56:41 UTC
(In reply to Tim Bielawa from comment #1)
> I'll make sure I can reproduce this. After a little code peeping, I think it
> might be safer to reference the 'openshift.master.api_public_hostname' value
> instead. I'll see if that works as a potential fix.

I think this is the path to victory.

Comment 6 Tim Bielawa 2017-11-02 15:21:30 UTC
Pull request submitted with the bug fix:

https://github.com/openshift/openshift-ansible/pull/5989

> The CFME 'automatically add provider' playbook would fail if
> openshift_master_cluster_public_hostname was not defined in the
> inventory. Now we use that value if it is available, and fall back to
> using the master's 'cluster_hostname' otherwise.

Comment 8 Gaoyun Pei 2017-11-07 07:47:04 UTC
Hi Tim, I hit another error when trying with openshift-ansible-3.7.0-0.196.0.git.0.27cd7ec.el7.noarch:

[root@gpei-test-ansible ~]# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml -v

...

TASK [openshift_management : Ensure we use openshift_master_cluster_public_hostname if it is available] *********************************************************************
skipping: [openshift-128.lab.sjc.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}

TASK [openshift_management : Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable] **********************************************
fatal: [openshift-128.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'cluster_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'cluster_hostname'"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry
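
One illustrative way to guard against the undefined nested fact shown in this traceback (this is not the fix that eventually shipped, and both 'l_cluster_hostname' and the 'masters' inventory group are assumptions) is to chain a default filter that falls back to the first host in the masters group:

# Illustrative guard only, not the merged code.
- name: Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable
  set_fact:
    l_cluster_hostname: "{{ openshift.master.cluster_hostname | default(groups['masters'][0]) }}"
  when: openshift_master_cluster_public_hostname is not defined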

Comment 9 Scott Dodson 2017-11-07 19:49:36 UTC
I've moved this to 3.7.z as CFME 4.6 is in beta until its release next year. We'll fix this up post 3.7 GA.

Comment 11 Russell Teague 2018-11-15 15:00:34 UTC
There are no active cases related to this bug, so we're closing it in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if it becomes relevant to an open customer case.

