Description of problem:
When trying to add an ocp-3.7 cluster as a container provider in CFME with the add_container_provider playbook, it fails as below:

TASK [openshift_management : Ensure the management service route is saved] **************************************************************************************************
ok: [ec2-34-228-240-132.compute-1.amazonaws.com] => {"ansible_facts": {"management_route": "httpd-openshift-management.apps.1027-ji0.qe.rhcloud.com"}, "changed": false}

TASK [openshift_management : Ensure this cluster is a container provider] ***************************************************************************************************
fatal: [ec2-34-228-240-132.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'cluster_public_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 48, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure this cluster is a container provider\n ^ here\n"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry

Version-Release number of the following components:
openshift-ansible-3.7.0-0.182.0.git.0.23a42dc.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. After deploying CFME successfully on the ocp-3.7 cluster, add the current cluster to CFME as a container provider:
ansible-playbook -v -i host/host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml

Actual results:

Expected results:

Additional info:
I see what's happening here. In the failing task I am referencing 'openshift.master.cluster_public_hostname'. I assumed this would be picked up automatically when I called 'openshift_facts' earlier in that task file, but I now see that without 'openshift_master_cluster_public_hostname' set in your inventory, that key is never populated. I'll make sure I can reproduce this. After a little code peeping, I think it might be safer to reference the 'openshift.master.api_public_hostname' value instead. I'll see if that works as a potential fix.
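In Python terms, the failure resembles indexing a dict for a key that was never populated. The sketch below models that, assuming the gathered facts form a nested dict under `openshift.master`; the dict contents and the names `old_lookup` and `new_lookup` are hypothetical and only illustrate the proposed switch to `api_public_hostname`, not the role's actual code. It also assumes `api_public_hostname` is populated when the inventory variable is not, which is what the proposed fix relies on.

```python
from typing import Any, Mapping

# Hypothetical shape of the facts gathered by openshift_facts; only the keys
# relevant to this bug are modelled. Note: no 'cluster_public_hostname' key.
FACTS_WITHOUT_PUBLIC_HOSTNAME: Mapping[str, Any] = {
    "openshift": {
        "master": {
            "api_public_hostname": "master1.example.com",
        }
    }
}


def old_lookup(facts: Mapping[str, Any]) -> str:
    # Equivalent of templating openshift.master.cluster_public_hostname.
    # Raises KeyError when the inventory variable was never set, analogous to
    # Ansible's "'dict object' has no attribute 'cluster_public_hostname'".
    return facts["openshift"]["master"]["cluster_public_hostname"]


def new_lookup(facts: Mapping[str, Any]) -> str:
    # Equivalent of templating openshift.master.api_public_hostname instead.
    return facts["openshift"]["master"]["api_public_hostname"]


if __name__ == "__main__":
    try:
        old_lookup(FACTS_WITHOUT_PUBLIC_HOSTNAME)
    except KeyError as err:
        print("old lookup failed, missing key:", err)
    print("new lookup:", new_lookup(FACTS_WITHOUT_PUBLIC_HOSTNAME))
```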
NEEDINFO:
- Does your inventory have `openshift_master_cluster_public_hostname` set?

----

Given the steps I took to reproduce the bug, I assume it was not set. Your cluster needs a canonical way for other clients (MIQ, web browsers, curl, etc.) to reference it. Without `openshift_master_cluster_public_hostname` set, there is technically no officially designated way to access the frontend of your cluster. While we *could assume* that your first detected master host is your desired API endpoint, that might be foolish and cause more bugs.

The other choices I am looking at are adding validation checks to the code that notify users that `openshift_master_cluster_public_hostname` must be set, or parsing the closest default fact I can find to make a best guess at a working API endpoint. That essentially means using the 'openshift.master.cluster_hostname' fact, which in this use case (adding OCP as a container provider) defaults to the hostname of the first master in your cluster. I'll try writing up a patch and see how it works with `openshift_master_cluster_public_hostname` UNDEFINED in my inventory.
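The validation-check option could look roughly like the following: fail early with an actionable message instead of an undefined-variable error deep inside a task. This is a minimal Python sketch, assuming inventory variables are available as a flat mapping; `InventoryValidationError` and `require_public_hostname` are hypothetical names. In the role itself, the equivalent would more likely be an Ansible `assert` or `fail` task.

```python
from typing import Any, Mapping


class InventoryValidationError(Exception):
    """Raised when a required inventory variable is missing or empty."""


def require_public_hostname(inventory_vars: Mapping[str, Any]) -> str:
    """Return the configured public hostname or fail with a clear message."""
    value = inventory_vars.get("openshift_master_cluster_public_hostname")
    if not value:
        # Tell the user exactly which variable to set, and why.
        raise InventoryValidationError(
            "openshift_master_cluster_public_hostname must be set in the "
            "inventory so other clients (e.g. CFME/MIQ) have a canonical "
            "API endpoint for this cluster"
        )
    return value


if __name__ == "__main__":
    print(require_public_hostname(
        {"openshift_master_cluster_public_hostname": "api.example.com"}))
```

The trade-off versus a best-guess fallback is that users must edit their inventory, but the playbook never silently registers the wrong endpoint.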
Hi Tim, your assumption is correct: we usually don't set openshift_master_cluster_public_hostname in the Ansible inventory file unless we are running a native HA master cluster installation.
(In reply to Gaoyun Pei from comment #3)
> Hi Tim,
>
> Your assumption is correct, we usually don't set
> openshift_master_cluster_public_hostname in ansible inventory file unless we
> were running a native ha-master cluster installation.

Thank you for clarifying, Gaoyun. I should have a patch on GitHub today.
(In reply to Tim Bielawa from comment #1)
> I'll make sure I can reproduce this. After a little code peeping, I think it
> might be safer to reference the 'openshift.master.api_public_hostname' value
> instead. I'll see if that works as a potential fix.

I think this is the path to victory.
Pull request submitted with the bug fix: https://github.com/openshift/openshift-ansible/pull/5989

> The CFME 'automatically add provider' playbook would fail if
> openshift_master_cluster_public_hostname was not defined in the
> inventory. Now we use that value if it is available, and fall back to
> using the master's 'cluster_hostname' otherwise.
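The fallback described in the PR can be modelled as an ordered list of candidate lookups, where each lookup returns nothing instead of raising when a key is absent. This is a minimal Python sketch, not the patch itself; the nested-dict layout of the facts and the names `CANDIDATES`, `lookup`, and `resolve_provider_hostname` are hypothetical. In Ansible, the analogous guards are the Jinja2 `default` filter or a `when: ... is defined` condition.

```python
import collections.abc
from typing import Any, List, Mapping, Optional, Sequence, Tuple

# Ordered (label, key path) pairs, following the PR description: the
# inventory's public hostname first, then the master's cluster_hostname.
# The key paths are assumptions about how the facts are laid out.
CANDIDATES: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("openshift_master_cluster_public_hostname",
     ("openshift_master_cluster_public_hostname",)),
    ("openshift.master.cluster_hostname",
     ("openshift", "master", "cluster_hostname")),
)


def lookup(data: Mapping[str, Any], path: Tuple[str, ...]) -> Optional[Any]:
    """Walk a nested mapping; return None instead of raising on a missing key."""
    node: Any = data
    for key in path:
        if not isinstance(node, collections.abc.Mapping) or key not in node:
            return None
        node = node[key]
    return node


def resolve_provider_hostname(hostvars: Mapping[str, Any]) -> str:
    """Return the first non-empty candidate, or fail listing what was checked."""
    tried: List[str] = []
    for label, path in CANDIDATES:
        value = lookup(hostvars, path)
        if value:
            return value
        tried.append(label)
    # Every candidate was undefined or empty: report each one checked rather
    # than surfacing a bare "'dict object' has no attribute ..." error.
    raise ValueError("no API hostname found; checked: " + ", ".join(tried))


if __name__ == "__main__":
    hostvars = {"openshift": {"master": {"cluster_hostname": "master1.example.com"}}}
    print(resolve_provider_hostname(hostvars))
```

Because a missing key yields `None` rather than an exception, a fallback that is itself undefined still produces a readable error naming every place that was checked.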
Hi Tim, I hit another error when testing with openshift-ansible-3.7.0-0.196.0.git.0.27cd7ec.el7.noarch:

[root@gpei-test-ansible ~]# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml -v
...
TASK [openshift_management : Ensure we use openshift_master_cluster_public_hostname if it is available] *********************************************************************
skipping: [openshift-128.lab.sjc.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}

TASK [openshift_management : Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable] **********************************************
fatal: [openshift-128.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'cluster_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'cluster_hostname'"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry
I've moved this to 3.7.z, as CFME 4.6 is in beta until its release next year. We'll fix this up after 3.7 GA.
There are no active cases related to this bug. As such we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if this bug becomes relevant to an open customer case.