Bug 1872517
| Summary: | node-health validation fails because of missing hosts entries in undercloud | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Takashi Kajinami <tkajinam> |
| Component: | openstack-tripleo-validations | Assignee: | Gaël Chamoulaud <gchamoul> |
| Status: | CLOSED ERRATA | QA Contact: | nlevinki <nlevinki> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 13.0 (Queens) | CC: | cjeanner, drosenfe, gchamoul, jfrancoa, jhajyahy, jjoyce, jschluet, kecarter, slinaber, tvignaud |
| Target Milestone: | --- | Keywords: | Triaged, ZStream |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-validations-8.5.0-7.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-03-18 13:08:47 UTC | Type: | Bug |
Description
Takashi Kajinami
2020-08-26 00:35:35 UTC
Hello Folks,
I am moving it back to DFG:DF as the issue isn't related to the FFU itself. This validation is failing on a fresh (or not fresh) OSP13 environment, before any of the FFU process is triggered.
The complaint here is that the node-health validation checks the health of the nodes by pinging them by hostname, but the undercloud in OSP13 doesn't have any information about the overcloud nodes' hostnames:
(undercloud) [stack@undercloud-0 ~]$ cat /etc/hosts
127.0.0.1 undercloud-0.redhat.local undercloud-0
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.35.64.93 rhos-qe-mirror-tlv.usersys.redhat.com download.lab.bos.redhat.com download.eng.bos.redhat.com download-node-02.eng.bos.redhat.com
And if you try to ping, for example the compute, you will get:
(undercloud) [stack@undercloud-0 ~]$ ping compute-1
ping: compute-1: Name or service not known
Which seems to be what the validation is doing (iterating over the ansible groups and pinging them by hostname):
- name: Check if hosts are IPs
  set_fact: hosts_are_ips="{{ item | ipaddr == item }}"
  with_items: "{{ groups.overcloud }}"
- name: Ping all overcloud nodes
  icmp_ping:
    host: "{{ item }}"
  with_items: "{{ groups.overcloud }}"
  ignore_errors: true
  register: ping_results
So, imho, it is the validation that needs to be improved.
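For illustration only, one way the ping task could avoid depending on /etc/hosts is to ping each node's address instead of its inventory name. The sketch below assumes that every host in the overcloud group carries an ansible_host variable holding a reachable IP, which this report does not confirm for OSP13. The icmp_ping module and its host parameter are taken from the tasks quoted above. This is not necessarily the change that shipped in the fixed package.

```yaml
# Sketch only: assumes each overcloud host in the inventory defines
# ansible_host with a reachable IP (not confirmed for OSP13).
- name: Ping all overcloud nodes by address when one is known
  icmp_ping:
    # Fall back to the inventory name when ansible_host is not defined.
    host: "{{ hostvars[item].ansible_host | default(item) }}"
  with_items: "{{ groups.overcloud }}"
  ignore_errors: true
  register: ping_results
```

With the fallback in place, an inventory that already lists hosts as IPs would behave exactly as before, since item itself would be the address.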
I can see that the OSP16.1 Undercloud has Ansible 2.9, so maybe it's just a matter of changing these Ansible options: https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html

Hello,

IIRC osp-13 doesn't inject things in the /etc/hosts, while it does on osp-16.1 (and maybe with earlier versions, but since they are EOL...). That's probably "just" the root cause.

Meaning, in short: you can't run this validation on an osp-13 undercloud, unfortunately.

@Jose: you might want to update the doc mentioning it, and maybe modify the command in order to filter out this validation?

Cheers,

C.

(In reply to Cédric Jeanneret from comment #3)
> Hello,
>
> IIRC osp-13 doesn't inject things in the /etc/hosts, while it does on
> osp-16.1 (and maybe with earlier versions, but since they are EOL...).
> That's probably "just" the root cause.
>
> Meaning, in short: you can't run this validation on an osp-13 undercloud,
> unfortunately.
>
> @Jose: you might want to update the doc mentioning it, and maybe modify the
> command in order to filter out this validation?
>
> Cheers,
>
> C.

Well, if that's the case, we then need to remove this validation from the group on RHOSP13, since the documentation only suggests running the pre-upgrade group: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/framework_for_upgrades_13_to_16.1/index#validating-red-hat-openstack-platform-oldvernum-before-the-upgrade

How hard is it to change the validation to use the IP instead of the hostname? If I understand correctly, we should have that IP in the environment, shouldn't we? I am quite sure that this validation, when originally written for OSP13, didn't expect the hosts to be injected into the Undercloud's /etc/hosts.

This is not a validation that got recently backported; it's been there for two years already: https://github.com/openstack/tripleo-validations/commit/a0c06ae7278f7446babd8c8aed92ce9c5a25fa3f#diff-d242bdac83a2b5cb825eaca5c1cde2dda1b1741fc63cb693dc7868776fb44230

If it's easier for you, we can remove it from the pre-upgrade group, but I have the feeling that this is a pretty important validation.

Cheers,

José Luis

Hmm, I wonder how it was supposed to work, especially since it's launched from within the mistral container at that point (osp-13 doesn't have the new validation framework; everything runs as a mistral workflow).

That's probably a question for Gael in the end: since this is OSP-13, he has more knowledge than me.

Used the procedure from the link in Comment 1. The node-health validation passed:

=== Running validation: "node-health" ===
Success! The validation passed for all hosts:
* undercloud

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13.0 bug fix and enhancement advisory), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0932