Red Hat Bugzilla – Bug 1506951
Automatically add container provider failed
Last modified: 2018-05-29 05:05:45 EDT
Description of problem:
When trying to add an OCP 3.7 cluster as a container provider in CFME with the add_container_provider playbook, it fails as below:

TASK [openshift_management : Ensure the management service route is saved] **************************************************************************************************
ok: [ec2-34-228-240-132.compute-1.amazonaws.com] => {"ansible_facts": {"management_route": "httpd-openshift-management.apps.1027-ji0.qe.rhcloud.com"}, "changed": false}

TASK [openshift_management : Ensure this cluster is a container provider] ***************************************************************************************************
fatal: [ec2-34-228-240-132.compute-1.amazonaws.com]: FAILED! => {"failed": true, "msg": "the field 'args' has an invalid value, which appears to include a variable that is undefined. The error was: 'dict object' has no attribute 'cluster_public_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 48, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure this cluster is a container provider\n ^ here\n"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry

Version-Release number of the following components:
openshift-ansible-3.7.0-0.182.0.git.0.23a42dc.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. After deploying CFME successfully on the OCP 3.7 cluster, add the current cluster to CFME as a container provider:
   ansible-playbook -v -i host/host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml

Actual results:
The playbook fails with the undefined-variable error shown above.

Expected results:
The cluster is registered in CFME as a container provider.

Additional info:
I see what's happening here. During the failing task I am referencing 'openshift.master.cluster_public_hostname'. I assumed this would get picked up automatically when I called 'openshift_facts' earlier in that task file, but I now see that without 'openshift_master_cluster_public_hostname' set in your inventory, the value will be empty. I'll make sure I can reproduce this.

After a little poking through the code, I think it might be safer to reference the 'openshift.master.api_public_hostname' value instead. I'll see if that works as a potential fix.
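To make the failure mode concrete, here is a minimal standalone reproduction of the pattern, assuming only that the nested fact is unpopulated. This is an illustrative sketch, not the actual task body from add_container_provider.yml:

  # Illustrative sketch only -- not the role's real task. Referencing a
  # nested fact that was never populated aborts the play with
  # "'dict object' has no attribute 'cluster_public_hostname'".
  - name: Ensure this cluster is a container provider
    debug:
      msg: "Provider endpoint: {{ openshift.master.cluster_public_hostname }}"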
NEEDINFO:
- Does your inventory have `openshift_master_cluster_public_hostname` set?

Given the steps I took to reproduce the bug, I assume it was not set. Your cluster needs a canonical way for other clients (MIQ, web browsers, curl, etc.) to reach it. Without `openshift_master_cluster_public_hostname` set, there is technically no officially designated way to access the frontend of your cluster. While we *could assume* that your first detected master host is your desired API endpoint, that assumption could easily be wrong and cause more bugs.

The other options I am looking at are adding validation checks to notify users that `openshift_master_cluster_public_hostname` must be set, or parsing the closest default fact I can find to make a best guess at a working API endpoint. That essentially means using the 'openshift.master.cluster_hostname' fact, which in this use case (adding OCP as a container provider) defaults to the hostname of the first master in your cluster.

I'll try writing up a patch and see how it behaves with `openshift_master_cluster_public_hostname` UNDEFINED in my inventory.
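For anyone blocked on this in the meantime, the immediate workaround is to define the variable in the inventory. A minimal sketch (the hostname below is a placeholder, not a value taken from this bug):

  # Workaround sketch: set the canonical public API hostname in the
  # [OSEv3:vars] section of the Ansible inventory.
  # "openshift-master.example.com" is a placeholder.
  [OSEv3:vars]
  openshift_master_cluster_public_hostname=openshift-master.example.com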
Hi Tim,

Your assumption is correct; we usually don't set openshift_master_cluster_public_hostname in the Ansible inventory file unless we are running a native HA master cluster installation.
(In reply to Gaoyun Pei from comment #3)
> Hi Tim,
>
> Your assumption is correct; we usually don't set
> openshift_master_cluster_public_hostname in the Ansible inventory file
> unless we are running a native HA master cluster installation.

Thank you for clarifying, Gaoyun. I should have a patch on GitHub today.
(In reply to Tim Bielawa from comment #1)
> I'll make sure I can reproduce this. After a little poking through the code,
> I think it might be safer to reference the
> 'openshift.master.api_public_hostname' value instead. I'll see if that works
> as a potential fix.

I think this is the path to victory.
Pull request submitted with the bug fix:
https://github.com/openshift/openshift-ansible/pull/5989

> The CFME 'automatically add provider' playbook would fail if
> openshift_master_cluster_public_hostname was not defined in the
> inventory. Now we use that value if it is available, and fall back to
> using the master's 'cluster_hostname' otherwise.
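The described fallback amounts to something like the following. This is a sketch of the behavior, not the literal PR diff; the l_cluster_hostname helper variable is illustrative:

  # Sketch of the fallback described above, not the literal PR diff.
  # l_cluster_hostname is a hypothetical helper fact.
  - name: Ensure we use openshift_master_cluster_public_hostname if it is available
    set_fact:
      l_cluster_hostname: "{{ openshift.master.cluster_public_hostname }}"
    when: openshift.master.cluster_public_hostname is defined

  - name: Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable
    set_fact:
      l_cluster_hostname: "{{ openshift.master.cluster_hostname }}"
    when: openshift.master.cluster_public_hostname is not defined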
Hi Tim, I met another error when trying with openshift-ansible-3.7.0-0.196.0.git.0.27cd7ec.el7.noarch:

[root@gpei-test-ansible ~]# ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.yml -v
...
TASK [openshift_management : Ensure we use openshift_master_cluster_public_hostname if it is available] *********************************************************************
skipping: [openshift-128.lab.sjc.redhat.com] => {"changed": false, "skip_reason": "Conditional result was False", "skipped": true}

TASK [openshift_management : Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable] **********************************************
fatal: [openshift-128.lab.sjc.redhat.com]: FAILED! => {"failed": true, "msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'cluster_hostname'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_management/tasks/add_container_provider.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable\n ^ here\n\nexception type: <class 'ansible.errors.AnsibleUndefinedVariable'>\nexception: 'dict object' has no attribute 'cluster_hostname'"}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-management/add_container_provider.retry
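The fallback appears to trip over the same class of problem: 'openshift.master.cluster_hostname' is itself only populated when the corresponding inventory variable is set, so on a non-HA cluster the default path is also undefined. Purely as an illustration of a more defensive pattern using Jinja2's default filter (not the shipped fix), the fallback could guard the lookup:

  # Illustration only, not the shipped fix: fall back to the standard
  # ansible_fqdn fact when the nested OpenShift fact is undefined, so the
  # play does not abort with AnsibleUndefinedVariable.
  # l_cluster_hostname is a hypothetical helper fact.
  - name: Ensure we default to the first master if openshift_master_cluster_public_hostname is unavailable
    set_fact:
      l_cluster_hostname: "{{ openshift.master.cluster_hostname | default(ansible_fqdn) }}"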
I've moved this to 3.7.z, as CFME 4.6 is beta until its release next year. We'll fix this up post-3.7 GA.