Bug 1769872 - OSP16 | services using service_names are broken
Summary: OSP16 | services using service_names are broken
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: beta
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Emilien Macchi
QA Contact: David Rosenfeld
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-07 16:21 UTC by Leonid Natapov
Modified: 2020-02-06 14:43 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191205213518.5e2fc47.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-06 14:42:51 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1855138 0 None None None 2019-12-04 15:52:19 UTC
OpenStack gerrit 697320 0 'None' MERGED tripleo-hieradata: re-enable "service_names" 2020-02-27 13:14:22 UTC
OpenStack gerrit 697323 0 'None' MERGED Re-enable "service_names" hieradata 2020-02-27 13:14:22 UTC
OpenStack gerrit 697325 0 'None' MERGED Revert "Replace hiera('service_names') by hiera('enabled_services')" 2020-02-27 13:14:22 UTC
Red Hat Product Errata RHEA-2020:0283 0 None None None 2020-02-06 14:43:22 UTC

Description Leonid Natapov 2019-11-07 16:21:25 UTC
Basically, the problem is that nova-compute-container-puppet.yaml should ensure that the tripleo.collectd.plugins.nova_compute hiera variable is created only where nova-compute is deployed together with collectd.

This does not happen now, so it seems that service_config_settings no longer works.

service_config_settings:
        rsyslog:
          tripleo_logging_sources_nova_compute:
            - {get_param: NovaComputeLoggingSource}
        collectd:
          tripleo.collectd.plugins.nova_compute:
            - virt
          collectd::plugin::virt::connection: 'qemu:///system'

Comment 5 Martin Magr 2019-11-27 12:29:45 UTC
Looking at this problem, I see "nova_compute" in the list of "enabled_services" in the hieradata on the Controller node, even though OS::TripleO::Services::NovaCompute is not in the Controller role in roles_data.yaml, as can be seen below. This pretty much breaks any puppet-side logic that depends on the services enabled on a node, including collectd plugin configuration and rsyslog logging source configuration. Sadly, I have not been able to find the proper place to fix this regression.


[root@controller-0 hieradata]# grep enabled_services -A58 all_nodes.json
    "enabled_services": [
        "keystone_admin_api",
        "keystone_public_api",
        "boot_params_service",
        "ca_certs",
        "certmonger_user",
        "cinder_api",
        "cinder_scheduler",
        "cinder_volume",
        "clustercheck",
        "collectd",
        "container_image_prepare",
        "glance_api",
        "haproxy",
        "heat_api",
        "heat_api_cloudwatch_disabled",
        "heat_api_cfn",
        "heat_engine",
        "horizon",
        "iscsid",
        "kernel",
        "keystone",
        "memcached",
        "metrics-qdr",
        "mysql",
        "mysql_client",
        "neutron_api",
        "neutron_plugin_ml2_ovn",
        "nova_api",
        "nova_conductor",
        "nova_metadata",
        "nova_scheduler",
        "nova_vnc_proxy",
        "logrotate_crond",
        "ovn_dbs",
        "ovn_controller",
        "pacemaker",
        "placement",
        "oslo_messaging_rpc",
        "oslo_messaging_notify",
        "podman",
        "redis",
        "rsyslog",
        "snmp",
        "sshd",
        "swift_proxy",
        "swift_ringbuilder",
        "swift_storage",
        "chrony",
        "timezone",
        "tripleo_firewall",
        "tripleo_packages",
        "tuned",
        "nova_compute",
        "nova_libvirt",
        "nova_libvirt_guests",
        "nova_migration_target",
        "ovn_metadata"
    ],
[root@controller-0 hieradata]# exit
exit
[heat-admin@controller-0 ~]$ exit
logout
Connection to 192.168.24.52 closed.
(undercloud) [stack@undercloud-0 ~]$ grep "name: Controller" /usr/share/openstack-tripleo-heat-templates/roles_data.yaml -A180 > the.log
(undercloud) [stack@undercloud-0 ~]$ tail -3 the.log 
    - OS::TripleO::Services::Zaqar
###############################################################################
# Role: Compute                                                               #
(undercloud) [stack@undercloud-0 ~]$ grep -i compute the.log 
# Role: Compute                                                               #
(undercloud) [stack@undercloud-0 ~]$
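
To illustrate why this matters, here is a minimal Python sketch of the kind of puppet-side lookup that depends on the service list. It is not the puppet-tripleo code: the function name `collectd_plugins_for` and the lookup rule (one `tripleo.collectd.plugins.<service>` key per service) are assumptions inferred from the key name in the description, and the service lists are abbreviated or hypothetical.

```python
def collectd_plugins_for(services, hieradata):
    """Collect plugins from hiera keys named tripleo.collectd.plugins.<service>.

    Assumed lookup rule, for illustration only.
    """
    plugins = []
    for service in services:
        for plugin in hieradata.get(f"tripleo.collectd.plugins.{service}", []):
            if plugin not in plugins:
                plugins.append(plugin)
    return plugins


# Key and value taken from the service_config_settings in the description.
hieradata = {"tripleo.collectd.plugins.nova_compute": ["virt"]}

# Abbreviated from the enabled_services output on controller-0 above.
controller_enabled_services = ["collectd", "rsyslog", "nova_api", "nova_compute"]

# Hypothetical list of what the Controller role itself deploys.
controller_role_services = ["collectd", "rsyslog", "nova_api"]

# The global list pulls in the virt plugin on the controller.
print(collectd_plugins_for(controller_enabled_services, hieradata))

# A role-scoped list would not.
print(collectd_plugins_for(controller_role_services, hieradata))
```

With the global list the lookup returns the virt plugin on the controller, and with a role-scoped list it returns nothing.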

Comment 6 David Peacock 2019-12-04 15:14:33 UTC
I can see nova_compute in my lab too, so this can be considered reproduced.  I'm continuing to look at this.  A question in the meantime: surely collectd *can* run on controllers, right?  As in, collecting controller metrics is something we should be able to do on a controller, no?

Comment 7 Alex Schultz 2019-12-04 15:25:39 UTC
enabled_services is a global configuration (as indicated by its presence in all_nodes.json). This was also true in previous versions.  The controller specifically needs to know which services are enabled globally in the cloud so that things like keystone, haproxy, and mysql users can be properly configured. This seems to be a bug in how the collectd plugin configuration chooses which services are available locally. I am not sure whether a per-node representation of the locally installed services exists. I believe that historically we've controlled that via the services in a role rather than a hiera key.  You could add a special key via the NovaCompute service that would only be configured in service_config.json, so that collectd could key off of that.

Comment 9 Alex Schultz 2019-12-04 15:41:53 UTC
Upon further review, it looks as though https://review.opendev.org/#/q/topic:bug/1835551+(status:open+OR+status:merged) caused the issue.  service_names existed until Train, when it was removed. That removal was incorrect because service_names was scoped to the role, whereas enabled_services is global. We'll need to revert these changes.
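
The difference between the two keys can be sketched in a few lines of Python. This is an illustration only, not how tripleo-heat-templates builds its hieradata: the helper names and the two abbreviated role definitions are hypothetical, with service names borrowed from the output in comment 5.

```python
def service_names(roles, role):
    """Role-scoped: only the services deployed by the given role."""
    return list(roles[role])


def enabled_services(roles):
    """Global: every service deployed by any role in the cloud."""
    seen = []
    for services in roles.values():
        for service in services:
            if service not in seen:
                seen.append(service)
    return seen


# Hypothetical, heavily abbreviated role definitions.
roles = {
    "Controller": ["keystone", "haproxy", "mysql", "collectd", "rsyslog"],
    "Compute": ["nova_compute", "nova_libvirt", "collectd", "rsyslog"],
}

# nova_compute shows up in the global list that every node receives.
print("nova_compute" in enabled_services(roles))

# It does not show up in the Controller's role-scoped list.
print("nova_compute" in service_names(roles, "Controller"))
```

Any check of the form "is service X on this node" therefore needs the role-scoped list. Against the global list it succeeds on every node.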

Comment 15 Leonid Natapov 2019-12-17 11:02:06 UTC
Verified. The BZ was opened because the collectd virt plugin appeared on controller nodes, while it should appear only on compute nodes where libvirt is running.

Currently the virt plugin runs only on compute nodes. I also no longer see nova_compute on the controller node as described in the BZ.

Comment 18 errata-xmlrpc 2020-02-06 14:42:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

