Bug 1534538

Summary: Metrics Apply continues when it should fail
Product: OpenShift Container Platform Reporter: Sebastian Mossberg <smossber>
Component: InstallerAssignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA QA Contact: Junqi Zhao <juzhao>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.7.0CC: aos-bugs, jokerman, mmccomas, tlarsson, vrutkovs
Target Milestone: ---   
Target Release: 3.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Playbook for openshift-metrics component was not verifying the return code of some commands Consequence: Some incorrect parameters could be used to setup metrics. The playbook succedeed but metrics was incorrectly setup Fix: The playbook verifies exit code for `oc apply` commands Result: Installation fails when incorrect commands are used
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-12-13 19:26:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Full ansible log including the additional debug message
none
ansible log
none
ansible running log, it throws out the failed error none

Description Sebastian Mossberg 2018-01-15 13:08:19 UTC
Created attachment 1381434 [details]
Full ansible log including the additional debug message

Description of problem:

The openshift-metrics role does not detect error from the "oc create " when applying templates for ReplicationControllers. With an incorrect inventory, this results in a completed install, showing no errors, but the RC is not existing afterwards.  

This causes great confusion when deploying metrics through the installer. 

Version-Release number of the following components:
rpm -q openshift-ansible
openshift-ansible-3.7.14-1.git.0.4b35b2d.el7.noarch

rpm -q ansible
# rpm -q ansible
ansible-2.4.1.0-1.el7.noarch

ansible --version
[root@bastion ~]# ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /bin/ansible
  python version = 2.7.5 (default, May  3 2017, 07:55:04) [GCC 4.8.5 20150623 (Red Hat 4.8.5-14)]

How reproducible:

Steps to Reproduce:
1. Set wrong settings in the inventory file.
### Metrics ###
openshift_metrics_install_metrics=true

---
#- Heapster
openshift_metrics_heapster_nodeselector={'region': 'infra'}

openshift_metrics_heapster_requests_memory=4Gi  <-- request is larger than limit
openshift_metrics_heapster_limits_memory=1Gi
openshift_metrics_heapster_requests_cpu=1000m
openshift_metrics_heapster_limits_cpu=1000m

2. Run the openshift-metrics playbook
# ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml


Actual results:
The installer finishes without indicating any errors, although the command in Apply task in roles/openshift_metrics/tasks/oc_apply.yaml fails.

Expected results:
The installer should fail or warn that applying the tempalte failed and display the errors message from the "oc apply". 



Additional info:
I've added a debug task that displays the output from the apply command in roles/openshift_metrics/tasks/oc_apply.yaml.

TASK [openshift_metrics : Applying /tmp/openshift-metrics-ansible-8ypfqX/templates/metrics-heapster-rc.yaml] **************************************************************************************************
ok: [master1.openshift.mitzicom.int.m0sslab.org]

TASK [openshift_metrics : Prinout apply] **********************************************************************************************************************************************************************
ok: [master1.openshift.mitzicom.int.m0sslab.org] => {
"msg": {
"changed": false,
"cmd": [
"oc",
"--config=/tmp/openshift-metrics-ansible-8ypfqX/admin.kubeconfig",
"apply",
"-f",
"/tmp/openshift-metrics-ansible-8ypfqX/templates/metrics-heapster-rc.yaml",
"-n",
"openshift-infra"
],
"delta": "0:00:02.524603",
"end": "2018-01-12 17:36:20.996628",
"failed": false,
"failed_when_result": false,
"msg": "non-zero return code",
"rc": 1,
"start": "2018-01-12 17:36:18.472025",
"stderr": "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.limits: Invalid value: \"1Gi\": must be greater than or equal to memory request", 
"stderr_lines": [
"The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.limits: Invalid value: \"1Gi\": must be greater than or equal to memory request" 
],
"stdout": "",
"stdout_lines": []
}
}

TASK [openshift_metrics : Determine change status of ReplicationController heapster] **************************************************************************************************************************
ok: [master1.openshift.mitzicom.int.m0sslab.org]


Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Sebastian Mossberg 2018-01-15 13:11:33 UTC
A suggestion would be to expand the failed when: to include "Invalid" as well, which would catch this specific error.

 https://github.com/openshift/openshift-ansible/blob/8de2b99d915fadc6579320b7b74354ab33213067/roles/openshift_metrics/tasks/oc_apply.yaml#L19

Comment 2 Vadim Rutkovsky 2018-01-17 11:18:44 UTC
Created a PR for master: https://github.com/openshift/openshift-ansible/pull/6751

It seems the safest way to check rc code

Comment 3 Vadim Rutkovsky 2018-01-18 10:08:13 UTC
Created PR for 3.7: https://github.com/openshift/openshift-ansible/pull/6769

Comment 4 Scott Dodson 2018-01-19 20:51:44 UTC
Since there are two pull requests i've marked this BZ as 3.9.0 target release and moved it to modified ON_QA so they can begin testing it in 3.9. Once verified we can backport the fix and clone the bug.

Comment 5 Junqi Zhao 2018-01-22 14:02:11 UTC
Set the paramters in inventory
openshift_metrics_heapster_nodeselector={'region': 'infra'}
openshift_metrics_heapster_requests_memory=4Gi  <-- request is larger than limit
openshift_metrics_heapster_limits_memory=1Gi
openshift_metrics_heapster_requests_cpu=1000m
openshift_metrics_heapster_limits_cpu=1000m

although there is warn info in log, the installation process is passed, there is not fail or warn that applying the tempalte failed and display the errors message from the "oc apply". 
***********************************************************
    "msg": "non-zero return code", 
    "rc": 1, 
    "start": "2018-01-22 08:47:32.080078", 
    "stderr": "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit", 
    "stderr_lines": [
        "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit"
    ], 
    "stdout": "", 
    "stdout_lines": []
}
**********************************************************
PLAY RECAP *******************************************************************************
MASTER : ok=22   changed=0    unreachable=0    failed=0   
NODE : ok=198  changed=44   unreachable=0    failed=0   
localhost                  : ok=12   changed=0    unreachable=0    failed=0   


INSTALLER STATUS ********************************************************************************
Initialization             : Complete (0:00:21)
Metrics Install            : Complete (0:01:20)

more info please the ansible log
# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.9.0-0.22.0.git.0.0e9d896.el7.noarch
openshift-ansible-playbooks-3.9.0-0.22.0.git.0.0e9d896.el7.noarch
openshift-ansible-3.9.0-0.22.0.git.0.0e9d896.el7.noarch
openshift-ansible-roles-3.9.0-0.22.0.git.0.0e9d896.el7.noarch

Comment 6 Junqi Zhao 2018-01-22 14:02:46 UTC
Created attachment 1384448 [details]
ansible log

Comment 7 Vadim Rutkovsky 2018-01-22 17:19:29 UTC
(In reply to Junqi Zhao from comment #5)
> Set the paramters in inventory
> openshift_metrics_heapster_nodeselector={'region': 'infra'}
> openshift_metrics_heapster_requests_memory=4Gi  <-- request is larger than
> limit
> openshift_metrics_heapster_limits_memory=1Gi
> openshift_metrics_heapster_requests_cpu=1000m
> openshift_metrics_heapster_limits_cpu=1000m
> 
> although there is warn info in log, the installation process is passed,
> there is not fail or warn that applying the tempalte failed and display the
> errors message from the "oc apply". 


Good catch, Ansible considered these clauses as 'ANDs' while I expected those to be ORs.

Create PR https://github.com/openshift/openshift-ansible/pull/6815 to fix this

Comment 8 openshift-github-bot 2018-01-24 15:50:50 UTC
Commit pushed to master at https://github.com/openshift/openshift-ansible

https://github.com/openshift/openshift-ansible/commit/0344a8f71af49cdf3827ccf6be339097369c12f7
Merge pull request #6815 from vrutkovs/failed_when-rc-to-int

Automatic merge from submit-queue.

failed_when lists are implicitely ANDs, not ORs

For some reason I believed `failed_when` lists are considered to be ORs, but it turns out these are ANDs.

Fixes bug 1534538
Relates to #6751

Comment 9 Vadim Rutkovsky 2018-01-25 07:35:08 UTC
The fix is available in openshift-ansible-3.9.0-0.24.0.git.0.735690f.el7

Comment 10 Junqi Zhao 2018-01-26 01:12:47 UTC
Tested again with openshift-ansible-3.9.0-0.24.0.git.0.735690f.el7.noarch, used the same settings in Comment 7, and tested other similar scenarios, such as openshift_metrics_hawkular_requests_memory > openshift_metrics_hawkular_limits_memory.

The installer fail and warn that applying the tempalte failed and display the errors message from the "oc apply". 

    "stderr": "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit", 
    "stderr_lines": [
        "The ReplicationController \"heapster\" is invalid: spec.template.spec.containers[0].resources.requests: Invalid value: \"4Gi\": must be less than or equal to memory limit"
    ], 
    "stdout": "", 
    "stdout_lines": []
}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-metrics/config.retry

PLAY RECAP *******************************************************************************
MASTER : ok=148  changed=26   unreachable=0    failed=1   
NODE : ok=24   changed=0    unreachable=0    failed=0   


Attached the ansible logs

Comment 11 Junqi Zhao 2018-01-26 01:13:44 UTC
Created attachment 1386402 [details]
ansible running log, it throws out the failed error

Comment 14 errata-xmlrpc 2018-12-13 19:26:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748