Description of problem:
The following failures are observed during pre-upgrade validation [1] prior to the upgrade of overcloud nodes from 13 to 16.1.

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#validating-the-pre-upgrade-requirements

~~~
(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator run --group pre-upgrade
...
+--------------------------------------+-------------------------------------+--------+-----------------------+----------------------------------------------------------------------------+---------------------+-------------+
| UUID                                 | Validations                         | Status | Host Group(s)         | Status by Host                                                             | Unreachable Host(s) | Duration    |
+--------------------------------------+-------------------------------------+--------+-----------------------+----------------------------------------------------------------------------+---------------------+-------------+
| 525400df-30c2-0b1c-bbbc-00000000000b | openstack-endpoints                 | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:04.331 |
| 525400df-30c2-26eb-509b-00000000000b | image-serve                         | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:02.268 |
| 525400df-30c2-3d3e-7a30-00000000000b | service-status                      | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:00.672 |
| 525400df-30c2-6fb6-9d16-00000000000b | containerized-undercloud-docker     | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:00.798 |
| 525400df-30c2-78ff-0380-00000000000b | container-status                    | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:02.809 |
| 525400df-30c2-7a76-0b50-00000000000b | undercloud-disk-space-pre-upgrade   | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:02.296 |
| 525400df-30c2-7db9-49f5-00000000000b | ironic-boot-configuration           | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.341 |
| 525400df-30c2-8b5b-c301-00000000000b | undercloud-heat-purge-deleted       | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.889 |
| 525400df-30c2-a0c8-37dc-00000000000b | collect-flavors-and-verify-profiles | FAILED | undercloud            | undercloud                                                                 |                     | 0:00:02.143 |
| 525400df-30c2-b855-8a1d-00000000000b | check-ftype                         | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:00.683 |
| 525400df-30c2-b956-c41c-00000000000b | undercloud-ram                      | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.967 |
| 525400df-30c2-bef4-d515-00000000000b | undercloud-service-status           | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.817 |
| 525400df-30c2-cf23-c1d9-00000000000b | repos                               | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:08.172 |
| 525400df-30c2-d264-5e62-00000000000b | check-latest-packages-version       | PASSED | undercloud            | undercloud                                                                 |                     | 0:01:47.883 |
| 525400df-30c2-df65-a431-00000000000b | nova-status                         | FAILED | nova_api              | controller-0, controller-1, controller-2                                   |                     | 0:00:00.601 |
| 525400df-30c2-e1f7-a587-00000000000b | validate-selinux                    | FAILED | all                   | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:02.700 |
| 525400df-30c2-e264-5510-00000000000b | node-health                         | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:03.364 |
| 525400df-30c2-e8ae-a513-00000000000b | stack-health                        | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:02.420 |
+--------------------------------------+-------------------------------------+--------+-----------------------+----------------------------------------------------------------------------+---------------------+-------------+
~~~

Among these failures, all of the validation failures on overcloud nodes were caused by the missing python3 command.
~~~
(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator show run 525400df-30c2-3d3e-7a30-00000000000b
{
    "task": {
        "hosts": {
            "compute-0": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.37 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "get failed systemd units",
        "status": "FAILED"
    }
}
...
~~~

I think these errors are "reasonable", given that the overcloud nodes still have OSP13 installed and do not require python3. We need some consideration in tripleo-validations or in the documentation to avoid these false errors.

Version-Release number of selected component (if applicable):
RHOSP13z12
~~~
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost.noarch
openstack-tripleo-common-8.7.1-20.el7ost.noarch
openstack-tripleo-common-containers-8.7.1-20.el7ost.noarch
openstack-tripleo-heat-templates-8.4.1-58.1.el7ost.noarch
openstack-tripleo-image-elements-8.0.3-1.el7ost.noarch
openstack-tripleo-puppet-elements-8.1.1-2.el7ost.noarch
openstack-tripleo-ui-8.3.2-3.el7ost.noarch
openstack-tripleo-validations-8.5.0-4.el7ost.noarch
puppet-tripleo-8.5.1-14.el7ost.noarch
python-tripleoclient-9.3.1-7.el7ost.noarch
~~~

How reproducible:
Always

Steps to Reproduce:
1. Run validation according to the documentation [1]

Actual results:
The pre-upgrade validation reports failures because of the missing python3

Expected results:
The pre-upgrade validation reports no failures caused by the missing python3

Additional info:
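To tell these interpreter-related failures apart from genuine validation failures, the `show run` output could be filtered on the error signature quoted above. The following is only a rough sketch: the function name is hypothetical, and it assumes the single-object JSON layout shown in this report rather than any documented output format.

```python
import json

# Signature copied from the module_stdout field shown in this report.
MISSING_PYTHON3 = "/usr/bin/python3: No such file or directory"


def interpreter_failures(show_run_json):
    """Return the hosts whose task failed because python3 is missing.

    Assumes the layout printed above:
    {"task": {"hosts": {<host>: {...}}, "name": ..., "status": ...}}
    """
    task = json.loads(show_run_json)["task"]
    return sorted(
        host
        for host, result in task["hosts"].items()
        if result.get("failed")
        and MISSING_PYTHON3 in result.get("module_stdout", "")
    )


# Sample built from the compute-0 entry in this report.
sample = json.dumps({
    "task": {
        "hosts": {
            "compute-0": {
                "action": "command",
                "failed": True,
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "rc": 127,
            }
        },
        "name": "get failed systemd units",
        "status": "FAILED",
    }
})

print(interpreter_failures(sample))  # prints ['compute-0']
```

A host that fails for another reason (no `module_stdout`, or a different message) is not returned, so whatever remains after filtering is what actually needs attention.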
Moving this BZ back to DFG:DF, as this is purely a Validations Framework issue. My guess is that having the Undercloud on RHEL8 with OSP16.1 (python3) and the overcloud nodes on RHEL7 with OSP13 (no python3) causes the issue. The Framework will probably need to set ansible_python_interpreter to /usr/libexec/platform-python (which is present in both RHEL7 and RHEL8), or add some logic to detect the right python binary on the target system, something like what is done here: https://github.com/redhat-openstack/infrared/blob/c2f6cb0b793c12a5f072ef5c2f29dc98e3ff0aeb/plugins/tripleo-undercloud/update_inventory.yml#L28-L45 That playbook relies on the raw module (which doesn't use python underneath) to find the binary on the system and then sets the interpreter accordingly.
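The detection idea can be sketched as "probe a list of candidate paths and take the first one that exists". This is not a copy of the linked playbook: the candidate order and the names `CANDIDATES` and `pick_interpreter` are assumptions for illustration. In real use the probe would have to run over a python-free channel (for example a raw `test -x <path>` on the target); here it is a plain callable so the logic can be tried locally.

```python
from typing import Callable, Optional, Sequence

# Assumed candidate order: platform-python first, since it is expected
# on both RHEL7 and RHEL8, then the generic interpreters.
CANDIDATES = (
    "/usr/libexec/platform-python",
    "/usr/bin/python3",
    "/usr/bin/python",
)


def pick_interpreter(
    is_executable: Callable[[str], bool],
    candidates: Sequence[str] = CANDIDATES,
) -> Optional[str]:
    """Return the first candidate the probe reports as executable."""
    for path in candidates:
        if is_executable(path):
            return path
    return None


# Stand-in for a RHEL7 overcloud node without python3.
rhel7_node = {"/usr/libexec/platform-python", "/usr/bin/python"}
print(pick_interpreter(lambda p: p in rhel7_node))
# prints /usr/libexec/platform-python
```

The returned path would then be assigned to `ansible_python_interpreter` for that host.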
have to check, but iirc the tripleo-ansible-inventory script takes some options, among them the python interpreter. Maybe we can tweak it a bit.
I can see that the OSP16.1 Undercloud ships Ansible 2.9, so maybe it's just a matter of changing these Ansible options: https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html
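For reference, the interpreter discovery setting lives under `interpreter_python` in the `[defaults]` section of ansible.cfg. A minimal sketch that renders such a fragment follows; the helper name is hypothetical, and whether discovery alone would pick a working interpreter on these overcloud nodes has not been verified here. Note also that an `ansible_python_interpreter` variable set in the inventory would normally take precedence over this setting.

```python
import configparser
import io

# Discovery modes listed in the Ansible interpreter discovery docs.
VALID_MODES = ("auto", "auto_legacy", "auto_silent", "auto_legacy_silent")


def render_ansible_cfg(mode: str = "auto_silent") -> str:
    """Render an ansible.cfg fragment that sets interpreter_python.

    `mode` is either a discovery mode or an absolute interpreter path.
    """
    if mode not in VALID_MODES and not mode.startswith("/"):
        raise ValueError("unsupported interpreter_python value: %r" % mode)
    cfg = configparser.ConfigParser()
    cfg["defaults"] = {"interpreter_python": mode}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_ansible_cfg("/usr/libexec/platform-python"))
```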
Running the validation with Mathieu's patch worked:

openstack tripleo validator run --debug --plan qe-Cloud-0 --validation check-rhsm-version --python-interpreter /usr/libexec/platform-python

(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator show run 5254007e-7d72-bcbc-d185-00000000000b
{
    "task": {
        "hosts": {
            "compute-0": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "compute-1": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-0": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-1": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-2": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}

Whereas, if I run it without the parameter, I get:

(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator show run 5254007e-7d72-e718-2945-00000000000b
{
    "task": {
        "hosts": {
            "compute-0": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.51 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "compute-1": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.38 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-0": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.16 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-1": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.6 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-2": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.14 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 37782), raddr=('192.168.24.2', 13000)>

My only complaint is having to pass an extra parameter for every validation run in this type of situation (different RHEL versions between UC and OC nodes). It would be nicer if the code realized automagically that it has to use /usr/libexec/platform-python.
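Until that happens automatically, a small wrapper could add the option only when the RHEL major versions differ. The sketch below is an illustration under assumptions: the function name and the two version arguments are hypothetical, and how those versions are obtained is left out. The `--plan`, `--validation` and `--python-interpreter` options are the ones used in the command above.

```python
import shlex

PLATFORM_PYTHON = "/usr/libexec/platform-python"


def validator_command(validation, plan, undercloud_rhel_major, overcloud_rhel_major):
    """Build the validator command line as a list of arguments.

    --python-interpreter is appended only when the undercloud and the
    overcloud run different RHEL major versions.
    """
    cmd = [
        "openstack", "tripleo", "validator", "run",
        "--plan", plan,
        "--validation", validation,
    ]
    if undercloud_rhel_major != overcloud_rhel_major:
        cmd += ["--python-interpreter", PLATFORM_PYTHON]
    return cmd


# Undercloud on RHEL8, overcloud still on RHEL7.
mixed = validator_command("check-rhsm-version", "qe-Cloud-0", 8, 7)
print(" ".join(shlex.quote(arg) for arg in mixed))
```

When both sides report the same major version, the command is returned without the extra option.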
*** Bug 1894000 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:5413
I think we should also update the document to use the new option. I opened another bug for the documentation update. https://bugzilla.redhat.com/show_bug.cgi?id=1908569