Created attachment 1381434 [details]
Full ansible log including the additional debug message

Description of problem:
The openshift-metrics role does not detect errors from "oc apply" when applying templates for ReplicationControllers. With an incorrect inventory, this results in a completed install that shows no errors, but the RC does not exist afterwards. This causes great confusion when deploying metrics through the installer.

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch

rpm -q ansible
# rpm -q ansible
ansible-2.4.1.0-1.el7.noarch

ansible --version
[root@bastion ~]# ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May 3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:

Steps to Reproduce:
1. Set wrong settings in the inventory file.

### Metrics ###
openshift_metrics_install_metrics=true
---
#- Heapster
openshift_metrics_heapster_nodeselector={'region': 'infra'}
openshift_metrics_heapster_requests_memory=4Gi   <-- request is larger than limit
openshift_metrics_heapster_limits_memory=1Gi
openshift_metrics_heapster_requests_cpu=1000m
openshift_metrics_heapster_limits_cpu=1000m

2. Run the openshift-metrics playbook
# ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml

Actual results:
The installer finishes without indicating any errors, although the command in the Apply task in roles/openshift_metrics/tasks/oc_apply.yaml fails.

Expected results:
The installer should fail or warn that applying the template failed, and display the error message from "oc apply".
Additional info:
I've added a debug task that displays the output from the apply command in roles/openshift_metrics/tasks/oc_apply.yaml.

TASK [openshift_metrics : Applying /tmp/openshift-metrics-ansible-8ypfqX/templates/metrics-heapster-rc.yaml] ***
ok: [master1.openshift.mitzicom.int.m0sslab.org]

TASK [openshift_metrics : Prinout apply] ***
ok: [master1.openshift.mitzicom.int.m0sslab.org] => {
    "msg": {
        "changed": false,
        "cmd": [
            "oc",
            "--config=/tmp/openshift-metrics-ansible-8ypfqX/admin.kubeconfig",
            "apply",
            "-f",
            "/tmp/openshift-metrics-ansible-8ypfqX/templates/metrics-heapster-rc.yaml",
            "-n",
            "openshift-infra"
        ],
        "delta": "0:00:02.524603",
        "end": "2018-01-12 17:36:20.996628",
        "failed": false,
        "failed_when_result": false,
        "msg": "non-zero return code",
        "rc": 1,
        "start": "2018-01-12 17:36:18.472025",
        "stderr": "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.limits: Invalid value: \"1Gi\": must be greater than or equal to memory request",
        "stderr_lines": [
            "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.limits: Invalid value: \"1Gi\": must be greater than or equal to memory request"
        ],
        "stdout": "",
        "stdout_lines": []
    }
}

TASK [openshift_metrics : Determine change status of ReplicationController heapster] ***
ok: [master1.openshift.mitzicom.int.m0sslab.org]

Please attach logs from ansible-playbook with the -vvv flag
One suggestion would be to expand the failed_when: condition to also match "Invalid", which would catch this specific error.

https://github.com/openshift/openshift-ansible/blob/8de2b99d915fadc6579320b7b74354ab33213067/roles/openshift_metrics/tasks/oc_apply.yaml#L19
Created a PR for master: https://github.com/openshift/openshift-ansible/pull/6751

Checking the return code seems to be the safest approach.
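To illustrate why a return-code check is more robust than matching on stderr text, here is a minimal Python sketch. It is not the role's actual code. The rc and stderr values of apply_result are copied from the debug output in this report; the substring checks for 'error' and 'Invalid' are assumptions about what a string-based failed_when might look for, the helper names are hypothetical, and other_failure is an invented placeholder that does not come from any log.

```python
# Result of the failing "oc apply", copied from the debug output in this report.
apply_result = {
    "rc": 1,
    "stderr": (
        'The ReplicationController "heapster" is invalid: '
        "spec.template.spec.containers[0].resources.limits: "
        'Invalid value: "1Gi": must be greater than or equal to memory request'
    ),
}

# Invented placeholder for some other failure (NOT taken from any real log).
other_failure = {"rc": 1, "stderr": "hypothetical failure text"}


def failed_by_substring(result, needle):
    """String-based check: reports failure only if `needle` occurs in stderr."""
    return needle in result["stderr"]


def failed_by_rc(result):
    """Return-code check: reports failure on any non-zero exit status."""
    return int(result["rc"]) != 0


if __name__ == "__main__":
    samples = (("apply_result", apply_result), ("other_failure", other_failure))
    for name, result in samples:
        print(
            name,
            "| 'error' match:", failed_by_substring(result, "error"),
            "| 'Invalid' match:", failed_by_substring(result, "Invalid"),
            "| rc check:", failed_by_rc(result),
        )
```

In this sketch a substring check only catches the messages it anticipates, while the return-code check flags both samples without knowing their wording.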
Created PR for 3.7: https://github.com/openshift/openshift-ansible/pull/6769
Since there are two pull requests, I've marked this BZ as 3.9.0 target release and moved it to modified ON_QA so they can begin testing it in 3.9. Once verified, we can backport the fix and clone the bug.
Set the parameters in the inventory:

openshift_metrics_heapster_nodeselector={'region': 'infra'}
openshift_metrics_heapster_requests_memory=4Gi   <-- request is larger than limit
openshift_metrics_heapster_limits_memory=1Gi
openshift_metrics_heapster_requests_cpu=1000m
openshift_metrics_heapster_limits_cpu=1000m

Although the log contains the error output, the installation still passes; the installer neither fails nor warns that applying the template failed, and it does not report the error message from "oc apply" as a failure.

***********************************************************
        "msg": "non-zero return code",
        "rc": 1,
        "start": "2018-01-22 08:47:32.080078",
        "stderr": "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit",
        "stderr_lines": [
            "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit"
        ],
        "stdout": "",
        "stdout_lines": []
    }
**********************************************************

PLAY RECAP *******************************************************************************
MASTER : ok=22 changed=0 unreachable=0 failed=0
NODE : ok=198 changed=44 unreachable=0 failed=0
localhost : ok=12 changed=0 unreachable=0 failed=0

INSTALLER STATUS ********************************************************************************
Initialization : Complete (0:00:21)
Metrics Install : Complete (0:01:20)

For more info, please see the ansible log.

# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.9.0-0.22.0.git.0.0e9d896.el7.noarch
openshift-ansible-playbooks-3.9.0-0.22.0.git.0.0e9d896.el7.noarch
openshift-ansible-3.9.0-0.22.0.git.0.0e9d896.el7.noarch
openshift-ansible-roles-3.9.0-0.22.0.git.0.0e9d896.el7.noarch
Created attachment 1384448 [details] ansible log
(In reply to Junqi Zhao from comment #5)
> Set the parameters in inventory
> openshift_metrics_heapster_nodeselector={'region': 'infra'}
> openshift_metrics_heapster_requests_memory=4Gi <-- request is larger than
> limit
> openshift_metrics_heapster_limits_memory=1Gi
> openshift_metrics_heapster_requests_cpu=1000m
> openshift_metrics_heapster_limits_cpu=1000m
>
> although there is warn info in log, the installation process is passed,
> there is not fail or warn that applying the template failed and display the
> errors message from the "oc apply".

Good catch. Ansible treats these clauses as ANDs, while I expected them to be ORs.

Created PR https://github.com/openshift/openshift-ansible/pull/6815 to fix this.
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/0344a8f71af49cdf3827ccf6be339097369c12f7
Merge pull request #6815 from vrutkovs/failed_when-rc-to-int

Automatic merge from submit-queue.

failed_when lists are implicitely ANDs, not ORs

For some reason I believed `failed_when` lists are considered to be ORs, but it turns out these are ANDs.

Fixes bug 1534538
Relates to #6751
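The AND/OR distinction described in the commit message can be modelled in a few lines of Python. This is only a toy model of the semantics (when failed_when is given as a list, every item must be true for the task to fail), not Ansible's implementation. The two conditions are assumptions about what the task checks, and the function names are hypothetical; the rc and stderr values are copied from the verification log earlier in this report.

```python
# rc and stderr copied from the failing "oc apply" output in this report.
result = {
    "rc": 1,
    "stderr": (
        'The ReplicationController "heapster" is invalid: '
        "spec.template.spec.containers[0].resources.requests: "
        'Invalid value: "4Gi": must be less than or equal to memory limit'
    ),
}


def conditions(res):
    """Evaluate the two (assumed) failure conditions for one command result."""
    return [
        "error" in res["stderr"],  # assumed string check; False for this stderr
        int(res["rc"]) != 0,       # return-code check; True for rc=1
    ]


def failed_when_list(res):
    """Conditions written as a list: all of them must hold (implicit AND)."""
    return all(conditions(res))


def failed_when_or(res):
    """Conditions joined with 'or' in one expression: any of them suffices."""
    return any(conditions(res))


if __name__ == "__main__":
    print("list (AND):", failed_when_list(result))
    print("single 'or' expression:", failed_when_or(result))
```

Under these assumptions the list form does not mark the task as failed, because the string condition is false even though the return code is non-zero, which matches the passing install reported during verification. Joining the conditions with "or" marks it as failed.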
The fix is available in openshift-ansible-3.9.0-0.24.0.git.0.735690f.el7
Tested again with openshift-ansible-3.9.0-0.24.0.git.0.735690f.el7.noarch, using the same settings as in Comment 7, and also tested other similar scenarios, such as openshift_metrics_hawkular_requests_memory > openshift_metrics_hawkular_limits_memory.

The installer now fails, warns that applying the template failed, and displays the error message from "oc apply".

        "stderr": "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit",
        "stderr_lines": [
            "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit"
        ],
        "stdout": "",
        "stdout_lines": []
    }
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-metrics/config.retry

PLAY RECAP *******************************************************************************
MASTER : ok=148 changed=26 unreachable=0 failed=1
NODE : ok=24 changed=0 unreachable=0 failed=0

Attached the ansible logs.
Created attachment 1386402 [details] ansible running log, it throws out the failed error
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3748