Bug 1803616

Summary: Metrics installer fails when executing metrics playbook from a Bastion host (non-master) due to missing tmp directory
Product: OpenShift Container Platform
Reporter: Mitchell Rollinson <mirollin>
Component: Hawkular
Assignee: Ruben Vargas Palma <rvargasp>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.11.0
CC: alegrand, anpicker, aos-bugs, erooth, jmartisk, kakkoyun, lcosic, mloibl, pkrupa, surbania
Target Milestone: ---
Keywords: Bugfix
Target Release: 3.11.z
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-28 05:44:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mitchell Rollinson 2020-02-17 03:45:46 UTC
Description of problem:

When I run the metrics install with Hawkular FIPs enabled from the bastion host, Ansible fails with an error in the "copy" task. When the same playbook is run from the master with the same inventory config, the "copy" task succeeds.

Version-Release number of the following components:
rpm -q openshift-ansible 
~~~~
openshift-ansible-3.11.161-2.git.5.029d67f.el7.noarch       Fri Jan 17 07:51:30 2020
openshift-ansible-docs-3.11.161-2.git.5.029d67f.el7.noarch  Fri Jan 17 07:51:27 2020
openshift-ansible-playbooks-3.11.161-2.git.5.029d67f.el7.noarch Fri Jan 17 07:51:28 2020
openshift-ansible-roles-3.11.161-2.git.5.029d67f.el7.noarch Fri Jan 17 07:51:30 2020
~~~~

rpm -q ansible

ansible-2.6.20-1.el7ae.noarch                               Fri Jan 17 07:51:27 2020

ansible --version
ansible 2.6.20

How reproducible:

Steps to Reproduce:
1. Install the cluster, excluding the metrics component, from a bastion server (must not be a master).
2. Attempt to install metrics from the bastion server (must not be a master).
3. Note: running the installer from the master results in a successful installation of metrics.
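
For reference, step 2 can be sketched as the command below. The playbook path is the one quoted in comment 1 of this report. The inventory path is an assumption (Ansible's default location), and the helper name metrics_install_cmd is hypothetical. The helper only prints the command so it can be reviewed before it is run on the bastion.

```shell
#!/bin/sh
# Playbook path as quoted in comment 1 of this report.
PLAYBOOK=/usr/share/ansible/openshift-ansible/playbooks/openshift-metrics/config.yml

# Assumption: the inventory lives at Ansible's default path; override via $INVENTORY.
INVENTORY="${INVENTORY:-/etc/ansible/hosts}"

# Hypothetical helper: print the command line instead of executing it.
metrics_install_cmd() {
    printf 'ansible-playbook -vvv -i %s %s\n' "$INVENTORY" "$PLAYBOOK"
}

metrics_install_cmd
```

The -vvv flag matches the verbosity requested under "Additional info".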

Actual results:

metrics installation fails

Please include the entire output from the last TASK line through the end of output if an error is generated

Master is - IBM-ocp-jenkins-3d1eeafc82-master-0.ibm-openshift.cloud
Bastion is - IBM-OCP-Jenkins-3d1eeafc82-bastion.IBM-OpenShift.cloud

2020-02-11 15:12:01 TASK [openshift_metrics : copy] ***************************************************************************************************************************
2020-02-11 15:12:01 task path: /usr/share/ansible/openshift-ansible/roles/openshift_metrics/tasks/generate_hawkular_certificates.yaml:35

......

2020-02-11 15:12:02 The full traceback is:
2020-02-11 15:12:02   File "/tmp/ansible_dm7TJf/ansible_module_copy.py", line 415, in main
2020-02-11 15:12:02     os.stat(os.path.dirname(b_dest))
2020-02-11 15:12:02 fatal: [ibm-ocp-jenkins-3d1eeafc82-master-0.ibm-openshift.cloud]: FAILED! => {
2020-02-11 15:12:02     "changed": false,
2020-02-11 15:12:02     "checksum": "1030eb114395bb692028f330168efb1749c8e5dc",
2020-02-11 15:12:02     "diff": [],
2020-02-11 15:12:02     "invocation": {
2020-02-11 15:12:02         "module_args": {
2020-02-11 15:12:02             "_original_basename": "tmpVL9wMP",
2020-02-11 15:12:02             "attributes": null,
2020-02-11 15:12:02             "backup": false,
2020-02-11 15:12:02             "checksum": "1030eb114395bb692028f330168efb1749c8e5dc",
2020-02-11 15:12:02             "content": null,
2020-02-11 15:12:02             "delimiter": null,
2020-02-11 15:12:02             "dest": "/tmp/tmp.618FEhaOBo/hawkular-metrics.htpasswd",
2020-02-11 15:12:02             "directory_mode": null,
2020-02-11 15:12:02             "follow": false,
2020-02-11 15:12:02             "force": true,
2020-02-11 15:12:02             "group": null,
2020-02-11 15:12:02             "local_follow": null,
2020-02-11 15:12:02             "mode": null,
2020-02-11 15:12:02             "owner": null,
2020-02-11 15:12:02             "regexp": null,
2020-02-11 15:12:02             "remote_src": null,
2020-02-11 15:12:02             "selevel": null,
2020-02-11 15:12:02             "serole": null,
2020-02-11 15:12:02             "setype": null,
2020-02-11 15:12:02             "seuser": null,
2020-02-11 15:12:02             "src": "/root/.ansible/tmp/ansible-tmp-1581433922.1-19554600380450/source",
2020-02-11 15:12:02             "unsafe_writes": null,
2020-02-11 15:12:02             "validate": null
2020-02-11 15:12:02         }
2020-02-11 15:12:02     },
2020-02-11 15:12:02     "msg": "Destination directory /tmp/tmp.618FEhaOBo does not exist"
2020-02-11 15:12:02 }
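
The traceback shows the copy module checking the parent directory of "dest" on the master and not finding it. A likely reading, stated here as an assumption and not confirmed in this report, is that the /tmp/tmp.* directory was created on the host running Ansible (the bastion) and therefore never existed on the master. The sketch below simulates both hosts locally; the function name check_dest_dir is hypothetical and only mimics the check, it is not the module's code.

```shell
#!/bin/sh
# "Bastion" side: create a temp directory and build a destination path inside it.
local_tmp="$(mktemp -d)"
dest="$local_tmp/hawkular-metrics.htpasswd"

# Hypothetical stand-in for the parent-directory check seen in the traceback.
check_dest_dir() {
    parent="$(dirname "$1")"
    if [ -d "$parent" ]; then
        echo "ok"
    else
        echo "Destination directory $parent does not exist"
    fi
}

# On the host where the directory was created, the check passes.
check_dest_dir "$dest"

# Simulate a different host (the master) that never had this directory.
rmdir "$local_tmp"
check_dest_dir "$dest"
```

Running the playbook from the master would make both steps happen on the same host, which is consistent with the install succeeding there.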


Expected results:

Installation of the metrics playbook from the bastion host is successful

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
provided

Comment 1 Mitchell Rollinson 2020-02-17 03:49:16 UTC
Created attachment 1663431
Bastion Metrics installation playbook log - vvv

/usr/share/ansible/openshift-ansible/playbooks/openshift-metrics/config.yml

Comment 3 Mitchell Rollinson 2020-02-21 05:37:45 UTC
Good day folks,

Can you please advise whether any additional information is required, and when I am likely to receive an update?

Kind Regards

Comment 5 Russell Teague 2020-04-06 18:49:57 UTC
Moving to logging team for investigation.

Comment 8 Ruben Vargas Palma 2020-04-29 02:04:41 UTC
This is a patch that should fix this BZ:

https://github.com/openshift/openshift-ansible/pull/12153/files

Comment 15 Junqi Zhao 2020-05-21 08:51:58 UTC
Tested with 
# rpm -qa | grep openshift-ansible
openshift-ansible-docs-3.11.218-1.git.0.6f55149.el7.noarch
openshift-ansible-playbooks-3.11.218-1.git.0.6f55149.el7.noarch
openshift-ansible-3.11.218-1.git.0.6f55149.el7.noarch
openshift-ansible-roles-3.11.218-1.git.0.6f55149.el7.noarch

Installed metrics from a bastion server (not a master); the installation is successful, see the log:

TASK [openshift_metrics : copy] ********************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_metrics/tasks/generate_hawkular_certificates.yaml:30
Thursday 21 May 2020  04:41:04 -0400 (0:00:00.338)       0:00:32.949 **********
skipping: [ci-vm-10-0-150-187.hosted.upshift.rdu2.redhat.com] => {
    "changed": false,
    "skip_reason": "Conditional result was False"
}

# cat /usr/share/ansible/openshift-ansible/roles/openshift_metrics/tasks/generate_hawkular_certificates.yaml
     30 - copy:
     31     content: "{{ htpasswd_output.stdout_lines | join('') }}"
     32     dest: "{{ local_tmp.stdout }}/hawkular-metrics.htpasswd"
     33   when: htpasswd_output is defined | default(False) | bool
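
To check whether a host already carries at least the openshift-ansible build tested in this comment (3.11.218), something like the sketch below may help. It assumes the package was installed via rpm and that GNU sort with -V is available. The helper name version_ge is hypothetical.

```shell
#!/bin/sh
# Version tested in this comment.
VERIFIED=3.11.218

# Hypothetical helper: succeeds when $1 >= $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: openshift-ansible is an rpm package on this host.
if installed="$(rpm -q --queryformat '%{VERSION}' openshift-ansible 2>/dev/null)"; then
    if version_ge "$installed" "$VERIFIED"; then
        echo "openshift-ansible $installed is at or above $VERIFIED"
    else
        echo "openshift-ansible $installed is older than $VERIFIED"
    fi
else
    echo "openshift-ansible not found via rpm"
fi
```

The comparison covers only the version field, not the release string (for example 1.git.0.6f55149.el7).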

Comment 18 errata-xmlrpc 2020-05-28 05:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215