Bug 1950533

Summary: haproxy validator should not fail on compute nodes
Product: Red Hat OpenStack
Reporter: Chris Fields <cfields>
Component: validations-common
Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA
QA Contact: nlevinki <nlevinki>
Severity: low
Priority: low
Version: 16.2 (Train)
CC: bperkins, chjones, gchamoul, jjoyce, jpodivin, jschluet, lmiccini, michele, slinaber, tvignaud, uemit.seren
Target Milestone: z2
Keywords: Triaged
Target Release: 16.2 (Train on RHEL 8.4)
Hardware: All
OS: Linux
Fixed In Version: openstack-tripleo-validations-11.6.1-2.20210713004808.f46d2bb.el8ost validations-common-1.1.2-2.20210721144807.92f51ea.el8ost
Last Closed: 2022-03-23 22:10:08 UTC
Type: Bug

Description Chris Fields 2021-04-16 19:43:33 UTC
Description of problem:

(undercloud) [stack@undercloud-0 overcloud-test]$ openstack tripleo validator run --validation haproxy

(undercloud) [stack@undercloud-0 overcloud-test]$ openstack tripleo validator show run cf1749cf-82c6-4963-9fb3-491322abdabc 


        {
            "task": {
                "hosts": {
                    "overcloud-computesriov-0": {
                        "_ansible_no_log": false,
                        "action": "haproxy_conf",
                        "changed": false,
                        "failed": true,
                        "invocation": {
                            "module_args": {
                                "path": "/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg"
                            }
                        },
                        "msg": "Could not open the haproxy conf file at: '/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg'"
                    }
                },
                "name": "Gather the HAProxy config",
                "status": "FAILED"
            }

This validator fails on compute nodes. haproxy does not normally run on compute nodes, so the validator needs to be role-aware, or at least check for the presence of the haproxy container. If the container is not there, the haproxy checks should be skipped.
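The requested behaviour can be sketched roughly as follows. This is only an illustration in Python, not the code of the real haproxy_conf module: the helper name gather_haproxy_conf and the shape of the returned dict are assumptions, and the sketch checks for the config file rather than the container. The path is the one shown in the failing task output above.

```python
import os

# Path copied from the failing task output in this report.
TRIPLEO_HAPROXY_CFG = (
    "/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg"
)


def gather_haproxy_conf(path=TRIPLEO_HAPROXY_CFG):
    """Read the haproxy config, skipping (not failing) when it is absent.

    Hypothetical helper: name and result keys are made up for illustration.
    """
    if not os.path.isfile(path):
        # No config file on this node (e.g. a compute node): skip the checks.
        return {
            "failed": False,
            "skipped": True,
            "msg": f"No haproxy conf file at '{path}'; skipping haproxy checks",
        }
    with open(path) as handle:
        return {"failed": False, "skipped": False, "content": handle.read()}


if __name__ == "__main__":
    print(gather_haproxy_conf("/nonexistent/haproxy.cfg")["msg"])
```

With something like this in place, a node without haproxy would be reported as skipped rather than FAILED.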

Version-Release number of selected component (if applicable):
16.1.4

How reproducible:
100%

Comment 1 Luca Miccini 2021-04-19 10:56:18 UTC
FWIW, validator run supports '--limit', so operators can decide which nodes to run this on (useful for composable or non-standard roles as well).

$ openstack tripleo validator run --validation haproxy --limit Controller

Running Validations without Overcloud settings.
+--------------------------------------+-------------+--------+------------+------------------------------------------+-------------------+-------------+
|                 UUID                 | Validations | Status | Host_Group |              Status_by_Host              | Unreachable_Hosts |   Duration  |
+--------------------------------------+-------------+--------+------------+------------------------------------------+-------------------+-------------+
| fe3b1444-2bc5-47a9-8205-c528e48b71a7 |   haproxy   | PASSED |    all     | controller-0, controller-1, controller-2 |                   | 0:00:01.874 |
+--------------------------------------+-------------+--------+------------+------------------------------------------+-------------------+-------------+

Comment 2 Gaël Chamoulaud 2021-04-19 17:31:23 UTC
@lmiccini

The haproxy validation is hosted in validations-common and should be generic (not TripleO-centric). Moreover, the playbook targets all the nodes by default [1], whereas it should target only the haproxy group from the inventory.

The work to do here is:
- Make the haproxy validation more generic by removing the TripleO references [2] and using the traditional haproxy.cfg file path (not the one used in a TripleO deployment)
- Create a new validation called tripleo-haproxy in tripleo-validations that calls the haproxy role from validations-common with the haproxy.cfg path used by TripleO

[1] https://opendev.org/openstack/validations-common/src/branch/master/validations_common/playbooks/haproxy.yaml#L2
[2] https://opendev.org/openstack/validations-common/src/branch/master/validations_common/playbooks/haproxy.yaml#L9
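The split proposed in the two items above could look something like this. It is a Python stand-in for what would really be Ansible playbooks and roles: both function names are hypothetical, and /etc/haproxy/haproxy.cfg is assumed here to be the "traditional" path (adjust for your distribution).

```python
# Assumed conventional location on a non-TripleO host (an assumption,
# not stated in this report).
DEFAULT_HAPROXY_CFG = "/etc/haproxy/haproxy.cfg"

# Path used in a TripleO deployment, as reported in the description.
TRIPLEO_HAPROXY_CFG = (
    "/var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg"
)


def haproxy_validation(config_path=DEFAULT_HAPROXY_CFG):
    """Generic validation settings; stands in for the validations-common role."""
    return {"validation": "haproxy", "config_path": config_path}


def tripleo_haproxy_validation():
    """TripleO wrapper: reuses the generic validation with the TripleO path."""
    settings = haproxy_validation(config_path=TRIPLEO_HAPROXY_CFG)
    settings["validation"] = "tripleo-haproxy"
    return settings


if __name__ == "__main__":
    print(haproxy_validation())
    print(tripleo_haproxy_validation())
```

The point of the design is that the generic validation carries no TripleO knowledge; the TripleO-specific path lives only in the wrapper.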

Comment 5 Chris Jones 2021-04-20 14:42:00 UTC
What is the prevailing behaviour for validations? There must be many of them that can only run successfully on a subset of machines in an OSP deployment.

If this haproxy validation is the outlier and most of them self-filter, then I would agree we should modify it to behave more like the others. But if the common approach is to expect the operator, or the organisation of the validations, to dictate which ones run where, then I would suggest that we should behave in that way.

It's worth noting that PIDONE is not the only team that uses HAProxy. It's also used by Network on Octavia, so if this validation does become self-filtering it would also need a way to identify whether it's running against an Octavia node.

I'd say it's also worth further noting that if validations are expected to be self-filtering based on roles, that is potentially quite fragile when customers have custom roles with names we can't predict (I'm assuming here that validations would be expected to self-filter based on the role name of a machine, rather than some more precise indicator).

Comment 6 Chris Fields 2021-04-23 17:42:09 UTC
Other validations are limited in scope - for example rabbitmq-limits only runs on controllers: 

(undercloud) [stack@undercloud-0 ~]$ openstack tripleo validator run --validation  rabbitmq-limits   
Running Validations without Overcloud settings.
+--------------------------------------+-----------------+--------+------------+------------------------------------------------------------------------+-------------------+
|                 UUID                 |   Validations   | Status | Host_Group |                             Status_by_Host                             | Unreachable_Hosts |
+--------------------------------------+-----------------+--------+------------+------------------------------------------------------------------------+-------------------+
| bd29b3df-5edf-42e6-a59e-48391bc7c3d0 | rabbitmq-limits | FAILED | Controller | overcloud-controller-0, overcloud-controller-1, overcloud-controller-2 |                   |
+--------------------------------------+-----------------+--------+------------+------------------------------------------------------------------------+-------------------+

One of the issues I see with not self-limiting is that the group validations get less usable. For example, if you want to run --group post-deployment, you have no way to tell it to run some validators only on controllers. In this case you are guaranteed to fail on the haproxy validator.

Comment 7 Gaël Chamoulaud 2021-04-24 06:33:25 UTC
(In reply to Chris Fields from comment #6)

Hi Chris,

The haproxy validation was self-filtered [1] before I moved it to validations-common.
Dropping that filter was a mistake, and the validation should be fixed to set the hosts key back to:

- hosts: "{{ controller_rolename | default('Controller') }}"

instead of:

- hosts: all


[1] https://opendev.org/openstack/tripleo-validations/src/commit/ec0465e481234da62d1ba673b4432e44c930f630/playbooks/haproxy.yaml
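For readers less familiar with the Jinja2 expression in the hosts key above, its effect can be approximated in Python as below. This is a sketch only: resolve_hosts is a hypothetical name, and "ControllerSriov" is a made-up custom role name. Jinja2's default filter substitutes the fallback only when the variable is undefined, which dict.get mirrors.

```python
def resolve_hosts(variables):
    """Approximate "{{ controller_rolename | default('Controller') }}".

    The fallback applies only when controller_rolename is not defined,
    matching the default filter's behaviour without its boolean flag.
    """
    return variables.get("controller_rolename", "Controller")


if __name__ == "__main__":
    # No override: the play targets the stock Controller group.
    print(resolve_hosts({}))
    # Custom role name supplied by the operator (hypothetical value).
    print(resolve_hosts({"controller_rolename": "ControllerSriov"}))
```

So a deployment with a renamed controller role can still be targeted by setting controller_rolename, while compute nodes are never in scope.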

Comment 8 Michele Baldessari 2021-04-24 09:28:24 UTC
(In reply to Gaël Chamoulaud from comment #7)
> (In reply to Chris Fields from comment #6)
> Hi Chris,
> 
> The haproxy validation was self-filtered[1] before I moved it to
> validations-common
> It was a mistake and that validation should be fixed in order to get the
> hosts key back to: 
> 
> - hosts: "{{ controller_rolename | default('Controller') }}"
> 
> instead of:
> 
> - hosts: all

Hi Gaël,

One question: is there a way in validations to express 'the nodes which have the service XYZ configured/installed'?
That way this (and other) validations would just work with any composable roles.

A bit like what the *_node_names hiera keys do today:
[root@ctrl-2-0 hieradata]# hiera -c /etc/puppet/hiera.yaml haproxy_short_node_names 
["ctrl-1-0", "ctrl-2-0", "ctrl-3-0"]                                                

Is that doable today within the validation framework?

cheers,
Michele


Comment 9 Gaël Chamoulaud 2021-04-24 13:43:20 UTC
(In reply to Michele Baldessari from comment #8)
> Hi Gaël,
> 
> a question. Is there a way in validations to express 'the nodes which have
> the service XYZ configured/installed'?
> That way this (and other) validations would just work with any composable
> roles.

Hi Michele,

We actually have this through the inventory (from the undercloud):

$ tripleo-ansible-inventory --list | jq .


> A bit like the *_node_names hiera key do today:
> [root@ctrl-2-0 hieradata]# hiera -c /etc/puppet/hiera.yaml
> haproxy_short_node_names 
> ["ctrl-1-0", "ctrl-2-0", "ctrl-3-0"]                                        

In the validation roles, you can also query the hiera database through our custom Ansible module, hiera, to find out whether a specific service is enabled. This is especially useful for optional services such as telemetry or cloudops.

> Is that doable today within the validation framework?

So yes it is doable.
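
As an illustration of the inventory approach, the sketch below resolves the hosts of a service group from JSON in the standard Ansible dynamic-inventory layout (groups with "hosts" and "children" keys). The sample data is invented: the host names reuse the hiera example above, and the assumption that a haproxy group lists the Controller role as a child is illustrative, not taken from a real deployment.

```python
import json

# Invented sample in the standard Ansible dynamic-inventory JSON layout.
SAMPLE_INVENTORY = json.loads("""
{
  "_meta": {"hostvars": {}},
  "Controller": {"hosts": ["ctrl-1-0", "ctrl-2-0", "ctrl-3-0"]},
  "Compute": {"hosts": ["compute-0"]},
  "haproxy": {"children": ["Controller"]}
}
""")


def hosts_in_group(inventory, group, _seen=None):
    """Return the sorted hosts of a group, following child groups."""
    seen = set() if _seen is None else _seen
    if group in seen:
        # Guard against cyclic group definitions.
        return []
    seen.add(group)
    entry = inventory.get(group, {})
    if isinstance(entry, list):
        # A group may also be given as a bare list of hosts.
        return sorted(entry)
    hosts = set(entry.get("hosts", []))
    for child in entry.get("children", []):
        hosts.update(hosts_in_group(inventory, child, seen))
    return sorted(hosts)


if __name__ == "__main__":
    print(hosts_in_group(SAMPLE_INVENTORY, "haproxy"))
```

A validation that targets the service group this way would follow composable roles automatically, without relying on role names.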


Comment 12 Cédric Jeanneret 2021-09-20 13:48:18 UTC
*** Bug 2005904 has been marked as a duplicate of this bug. ***

Comment 23 errata-xmlrpc 2022-03-23 22:10:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001