Bug 1506951 - Automatically add container provider failed
Status: ASSIGNED
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.7.z
Assigned To: Tim Bielawa
QA Contact: Gaoyun Pei
Depends On:
Blocks:

Reported: 2017-10-27 05:18 EDT by Gaoyun Pei
Modified: 2018-05-29 05:05 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: While adding providers, we assumed users had defined the optional variable 'openshift_master_cluster_public_hostname'.
Consequence: If the variable was not defined by the user, Ansible would raise an undefined-variable error and crash.
Fix: Use 'openshift_master_cluster_public_hostname' if it is defined; otherwise fall back to using the first master's hostname.
Result: OCP can be added as a container provider with or without openshift_master_cluster_public_hostname being set.
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---




External Trackers
Tracker: Github openshift/openshift-ansible/pull/5989 (Last Updated: 2017-11-02 11:21 EDT)

Description Gaoyun Pei 2017-10-27 05:18:21 EDT
Description of problem:
When trying to add an ocp-3.7 cluster as a container provider in CFME with the add_container_provider playbook, it fails as shown below:
TASK [openshift_management : Ensure the management service route is saved] **************************************************************************************************
ok: [ec2-34-228-240-132.compute-1.amazonaws.com] => {"ansible_facts": {"management_route": "httpd-openshift-management.apps.1027-ji0.qe.rhcloud.com"}, "changed": false}

TASK [openshift_management : Ensure this cluster is a container provider] ***************************************************************************************************
fatal: [ec2-34-228-240-132.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'cluster_public_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 48, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure this cluster is a container provider\n  ^ here\n"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry
 


Version-Release number of the following components:
openshift-ansible-3.7.0-0.182.0.git.0.23a42dc.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. After deploying CFME successfully on the ocp-3.7 cluster, add the current cluster to CFME as a container provider:
ansible-playbook -v -i host/host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml


Actual results:

Expected results:

Additional info:
Comment 1 Tim Bielawa 2017-10-31 14:39:29 EDT
I see what's happening here. During the failing task I am referencing 'openshift.master.cluster_public_hostname'.

I assumed this would get picked up automatically when I called 'openshift_facts' earlier in that task file, but I am seeing now that without 'openshift_master_cluster_public_hostname' set in your inventory, the value will be empty.

I'll make sure I can reproduce this. After a little code peeping, I think it might be safer to reference the 'openshift.master.api_public_hostname' value instead. I'll see if that works as a potential fix.
Comment 2 Tim Bielawa 2017-11-01 15:10:56 EDT
NEEDINFO:

- Does your inventory have `openshift_master_cluster_public_hostname` set?

----

Given the steps I took to reproduce the bug, I assume it was not set. Your cluster needs a canonical way to reference it from other clients (MIQ, Web Browsers, CURL, etc).

Without `openshift_master_cluster_public_hostname` set, there is technically no officially designated way to access the frontend of your cluster. While we *could assume* that your first detected master host is your desired API endpoint, that might be foolish and cause more bugs.

The other choices I am looking at are adding validation checks into the code to notify users that `openshift_master_cluster_public_hostname` must be set, or else I may just try to parse the closest default fact I can find to make a best-guess at a working API endpoint.

That essentially means using the 'openshift.master.cluster_hostname' fact. In this use case (adding OCP as a container provider), it will default to the hostname of the first master in your cluster.

I'll try writing up a patch and seeing how it works with `openshift_master_cluster_public_hostname` UNDEFINED in my inventory.
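Roughly, the fallback I'm describing would look something like the two tasks below (a hypothetical sketch of the task layout, not the actual patch; the `l_cluster_hostname` fact name is made up for illustration):

```yaml
# Sketch only: prefer the user-supplied public hostname, otherwise
# fall back to the first master's detected cluster_hostname fact.
- name: Use openshift_master_cluster_public_hostname if it is available
  set_fact:
    l_cluster_hostname: "{{ openshift_master_cluster_public_hostname }}"
  when: openshift_master_cluster_public_hostname is defined

- name: Default to the first master if the public hostname is unavailable
  set_fact:
    l_cluster_hostname: "{{ openshift.master.cluster_hostname }}"
  when: openshift_master_cluster_public_hostname is not defined
```

The later provider-registration task would then reference `l_cluster_hostname` instead of touching the possibly-undefined inventory variable directly.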
Comment 3 Gaoyun Pei 2017-11-02 03:40:16 EDT
Hi Tim, 

Your assumption is correct; we usually don't set openshift_master_cluster_public_hostname in the Ansible inventory file unless we are running a native HA-master cluster installation.
Comment 4 Tim Bielawa 2017-11-02 08:56:13 EDT
(In reply to Gaoyun Pei from comment #3)
> Hi Tim, 
> 
> Your assumption is correct, we usually don't set
> openshift_master_cluster_public_hostname in ansible inventory file unless we
> were running a native ha-master cluster installation.

Thank you for clarifying, Gaoyun. I should have a patch on GitHub today.
Comment 5 Scott Dodson 2017-11-02 08:56:41 EDT
(In reply to Tim Bielawa from comment #1)
> I'll make sure I can reproduce this. After a little code peeping, I think it
> might be safer to reference the 'openshift.master.api_public_hostname' value
> instead. I'll see if that works as a potential fix.

I think this is the path to victory.
Comment 6 Tim Bielawa 2017-11-02 11:21:30 EDT
Pull request submitted with the bug fix:

https://github.com/openshift/openshift-ansible/pull/5989

> The CFME 'automatically add provider' playbook would fail if
> openshift_master_cluster_public_hostname was not defined in the
> inventory. Now we use that value if it is available, and fallback to
> using the master's 'cluster_hostname' otherwise.
Comment 8 Gaoyun Pei 2017-11-07 02:47:04 EST
Hi Tim, I hit another error when trying with openshift-ansible-3.7.0-0.196.0.git.0.27cd7ec.el7.noarch:

[root@gpei-test-ansible ~]# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml -v

...

TASK [openshift_management : Ensure we use openshift_master_cluster_public_hostname if it is available] *********************************************************************
skipping: [openshift-128.lab.sjc.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}

TASK [openshift_management : Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable] **********************************************
fatal: [openshift-128.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'cluster_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable\n  ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'cluster_hostname'"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry
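For what it's worth, the second crash (this time on 'openshift.master.cluster_hostname' itself being undefined) could be sidestepped by resolving the hostname in a single expression with Jinja2's default() filter, so no branch ever dereferences a missing fact. This is a hypothetical sketch, not the shipped fix, and the final fallback fact (`openshift.common.hostname`) is an assumption about which facts are reliably populated:

```yaml
# Sketch only: chain default() filters so the first defined value wins
# and an undefined fact never raises AnsibleUndefinedVariable.
- name: Determine the cluster API hostname
  set_fact:
    l_cluster_hostname: >-
      {{ openshift_master_cluster_public_hostname
         | default(openshift.master.cluster_hostname)
         | default(openshift.common.hostname) }}
```

Each `default()` argument is only used when everything to its left is undefined, so the expression degrades gracefully from the inventory variable to the detected facts.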
Comment 9 Scott Dodson 2017-11-07 14:49:36 EST
I've moved this to 3.7.z since CFME 4.6 is in beta until its release next year. We'll fix this up post-3.7 GA.
