Bug 1873470

Summary: Pre-upgrade validations fail because of missing python3 command in overcloud nodes
Product: Red Hat OpenStack Reporter: Takashi Kajinami <tkajinam>
Component: python-tripleoclient Assignee: mathieu bultel <mbultel>
Status: CLOSED ERRATA QA Contact: David Rosenfeld <drosenfe>
Severity: medium Docs Contact:
Priority: medium    
Version: 16.1 (Train) CC: cjeanner, emacchi, gchamoul, hbrock, jbuchta, jfrancoa, jhardee, jjoyce, jmelvin, jschluet, jslagle, mbultel, mburns, mgarciac, rbrady, slinaber, spower, tvignaud, ykulkarn
Target Milestone: z3 Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-tripleoclient-12.3.2-1.20200914164930.el8ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-15 18:36:32 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2020-08-28 12:32:43 UTC
Description of problem:

The following failures are observed during the pre-upgrade validation[1] prior to the update of overcloud nodes from 13 to 16.1:
 [1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#validating-the-pre-upgrade-requirements

~~~
(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator run --group pre-upgrade
...
+--------------------------------------+-------------------------------------+--------+-----------------------+----------------------------------------------------------------------------+---------------------+-------------+
| UUID                                 | Validations                         | Status | Host Group(s)         | Status by Host                                                             | Unreachable Host(s) | Duration    |
+--------------------------------------+-------------------------------------+--------+-----------------------+----------------------------------------------------------------------------+---------------------+-------------+
| 525400df-30c2-0b1c-bbbc-00000000000b | openstack-endpoints                 | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:04.331 |
| 525400df-30c2-26eb-509b-00000000000b | image-serve                         | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:02.268 |
| 525400df-30c2-3d3e-7a30-00000000000b | service-status                      | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:00.672 |
| 525400df-30c2-6fb6-9d16-00000000000b | containerized-undercloud-docker     | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:00.798 |
| 525400df-30c2-78ff-0380-00000000000b | container-status                    | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:02.809 |
| 525400df-30c2-7a76-0b50-00000000000b | undercloud-disk-space-pre-upgrade   | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:02.296 |
| 525400df-30c2-7db9-49f5-00000000000b | ironic-boot-configuration           | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.341 |
| 525400df-30c2-8b5b-c301-00000000000b | undercloud-heat-purge-deleted       | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.889 |
| 525400df-30c2-a0c8-37dc-00000000000b | collect-flavors-and-verify-profiles | FAILED | undercloud            | undercloud                                                                 |                     | 0:00:02.143 |
| 525400df-30c2-b855-8a1d-00000000000b | check-ftype                         | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:00.683 |
| 525400df-30c2-b956-c41c-00000000000b | undercloud-ram                      | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.967 |
| 525400df-30c2-bef4-d515-00000000000b | undercloud-service-status           | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:01.817 |
| 525400df-30c2-cf23-c1d9-00000000000b | repos                               | FAILED | undercloud, overcloud | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:08.172 |
| 525400df-30c2-d264-5e62-00000000000b | check-latest-packages-version       | PASSED | undercloud            | undercloud                                                                 |                     | 0:01:47.883 |
| 525400df-30c2-df65-a431-00000000000b | nova-status                         | FAILED | nova_api              | controller-0, controller-1, controller-2                                   |                     | 0:00:00.601 |
| 525400df-30c2-e1f7-a587-00000000000b | validate-selinux                    | FAILED | all                   | compute-0, compute-1, controller-0, controller-1, controller-2, undercloud |                     | 0:00:02.700 |
| 525400df-30c2-e264-5510-00000000000b | node-health                         | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:03.364 |
| 525400df-30c2-e8ae-a513-00000000000b | stack-health                        | PASSED | undercloud            | undercloud                                                                 |                     | 0:00:02.420 |
+--------------------------------------+-------------------------------------+--------+-----------------------+----------------------------------------------------------------------------+---------------------+-------------+
~~~

Among these failures, every validation that failed on overcloud nodes did so
because the python3 command is missing on those nodes.

~~~
(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator show run 525400df-30c2-3d3e-7a30-00000000000b
{
    "task": {
        "hosts": {
            "compute-0": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.37 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "get failed systemd units",
        "status": "FAILED"
    }
}
...
~~~

I think these errors are "reasonable" given that the overcloud nodes still have OSP13 installed
and don't require python3.
We need some change in tripleo-validations or in the documentation to avoid these false errors.

Version-Release number of selected component (if applicable):
RHOSP13z12
~~~
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost.noarch
openstack-tripleo-common-8.7.1-20.el7ost.noarch
openstack-tripleo-common-containers-8.7.1-20.el7ost.noarch
openstack-tripleo-heat-templates-8.4.1-58.1.el7ost.noarch
openstack-tripleo-image-elements-8.0.3-1.el7ost.noarch
openstack-tripleo-puppet-elements-8.1.1-2.el7ost.noarch
openstack-tripleo-ui-8.3.2-3.el7ost.noarch
openstack-tripleo-validations-8.5.0-4.el7ost.noarch
puppet-tripleo-8.5.1-14.el7ost.noarch
python-tripleoclient-9.3.1-7.el7ost.noarch
~~~



How reproducible:
Always

Steps to Reproduce:
1. Run validation according to the documentation[1]

Actual results:
The pre-upgrade validation reports failures because of missing python3

Expected results:
The pre upgrade validation reports no failures caused by missing python3

Additional info:

Comment 1 Jose Luis Franco 2020-09-01 05:28:19 UTC
Moving this BZ back to DFG:DF, as this is a pure Validations Framework issue. My guess is that having the undercloud on RHEL 8 with OSP16.1 (python3) while the overcloud nodes are on RHEL 7 with OSP13 (no python3) causes the issue. The Framework will probably need to set ansible_python_interpreter to /usr/libexec/platform-python (which is present on both RHEL 7 and RHEL 8), or add some logic to detect the right python binary on the target system:
https://github.com/redhat-openstack/infrared/blob/c2f6cb0b793c12a5f072ef5c2f29dc98e3ff0aeb/plugins/tripleo-undercloud/update_inventory.yml#L28-L45

Something like what is done there: it relies on the raw module (which doesn't use Python underneath) to detect the binary on the system, and then sets it up.

Comment 2 Cédric Jeanneret 2020-09-01 06:28:14 UTC
Have to check, but IIRC the tripleo-ansible-inventory script takes some options, among them the python interpreter. Maybe we can tweak it a bit.

Comment 3 Jose Luis Franco 2020-09-01 14:03:51 UTC
I can see that the OSP16.1 undercloud has Ansible 2.9, so maybe it's just a matter of changing these ansible options: https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html
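For reference, interpreter discovery in Ansible 2.8+ can be steered from configuration; a minimal sketch of what that could look like in ansible.cfg (whether the framework would set it here or per-host is an open question at this point):

```ini
[defaults]
# "auto" makes Ansible 2.8+ probe the target for a suitable interpreter;
# alternatively a fixed path such as /usr/libexec/platform-python can be set.
interpreter_python = auto
```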

Comment 4 Jose Luis Franco 2020-09-04 06:49:11 UTC
Running the validation with Mathieu's patch worked:

openstack tripleo validator run --debug --plan qe-Cloud-0 --validation check-rhsm-version --python-interpreter /usr/libexec/platform-python

(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator show run 5254007e-7d72-bcbc-d185-00000000000b                                                              
{
    "task": {
        "hosts": {
            "compute-0": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "compute-1": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{                                                                                                                                                                    
    "task": {
        "hosts": {
            "controller-0": {
                "_ansible_no_log": false,                                                                                                                                   
                "action": "fail",
                "changed": false,                                                                                                                                           
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-1": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-2": {
                "_ansible_no_log": false,
                "action": "fail",
                "changed": false,
                "failed": true,
                "msg": "8.2 does not match configured rhsm_version Release not set"
            }
        },
        "name": "Check RHSM version",
        "status": "FAILED"
    }
}


Whereas, if I run it without the parameter, I get:

(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator show run 5254007e-7d72-e718-2945-00000000000b                                                               
{                                                                                                                                                                            
    "task": {                                                                                                                                                                
        "hosts": {                                                                                                                                                           
            "compute-0": {                                                                                                                                                   
                "_ansible_no_log": false,                                                                                                                                    
                "action": "command",                                                                                                                                         
                "changed": false,                                                                                                                                            
                "failed": true,                                                                                                                                              
                "module_stderr": "Shared connection to 192.168.24.51 closed.\r\n",                                                                                           
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",                                                                                 
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",                           
                "rc": 127                                                                                                                                                    
            }                                                                                                                                                                
        },                                                                                                                                                                   
        "name": "Retrieve RHSM version",                                                                                                                                     
        "status": "FAILED"                                                                                                                                                   
    }
}
{
    "task": {
        "hosts": {
            "compute-1": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.38 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
{                                                                                                                                                                  
    "task": {
        "hosts": {
            "controller-0": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.16 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-1": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.6 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
{
    "task": {
        "hosts": {
            "controller-2": {
                "_ansible_no_log": false,
                "action": "command",
                "changed": false,
                "failed": true,
                "module_stderr": "Shared connection to 192.168.24.14 closed.\r\n",
                "module_stdout": "/bin/sh: /usr/bin/python3: No such file or directory\r\n",
                "msg": "The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error",
                "rc": 127
            }
        },
        "name": "Retrieve RHSM version",
        "status": "FAILED"
    }
}
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 37782), raddr=('192.168.24.2
', 13000)>


The only complaint is having to pass an extra parameter for every validation run in this type of situation (different RHEL versions between undercloud and overcloud nodes). It would be nicer if the code automatically figured out that it has to use /usr/libexec/platform-python.
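Until that happens, one way to avoid passing the flag on every run would be to pin the interpreter in the inventory itself. A hypothetical static-inventory sketch (the group name is an assumption; the real tripleo-ansible-inventory is generated dynamically):

```ini
# Pin a binary that exists on both RHEL 7 (OSP13) and RHEL 8 (OSP16.1)
# nodes, so mixed-version environments work without extra CLI flags.
[overcloud:vars]
ansible_python_interpreter=/usr/libexec/platform-python
```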

Comment 7 Jose Luis Franco 2020-11-03 15:20:41 UTC
*** Bug 1894000 has been marked as a duplicate of this bug. ***

Comment 21 errata-xmlrpc 2020-12-15 18:36:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5413

Comment 22 Takashi Kajinami 2020-12-17 03:53:53 UTC
I think we should also update the document to use the new option.
I opened another bug for the documentation update.
 https://bugzilla.redhat.com/show_bug.cgi?id=1908569