Bug 1769872

Summary: OSP16 | services using service_names are broken
Product: Red Hat OpenStack
Reporter: Leonid Natapov <lnatapov>
Component: openstack-tripleo-heat-templates
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: David Rosenfeld <drosenfe>
Severity: high
Priority: urgent
Version: 16.0 (Train)
CC: aschultz, cjeanner, dpeacock, emacchi, mburns, mmagr, mrunge, pkilambi, sclewis
Target Milestone: beta
Keywords: Regression, Triaged
Target Release: 16.0 (Train on RHEL 8.1)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-11.3.1-0.20191205213518.5e2fc47.el8ost
Doc Type: No Doc Update
Last Closed: 2020-02-06 14:42:51 UTC
Type: Bug

Description Leonid Natapov 2019-11-07 16:21:25 UTC
The problem is that nova-compute-container-puppet.yaml should ensure that the tripleo.collectd.plugins.nova_compute hiera variable is created only on nodes where nova-compute is deployed together with collectd.

This does not happen now, so it seems that service_config_settings no longer works.

service_config_settings:
        rsyslog:
          tripleo_logging_sources_nova_compute:
            - {get_param: NovaComputeLoggingSource}
        collectd:
          tripleo.collectd.plugins.nova_compute:
            - virt
          collectd::plugin::virt::connection: 'qemu:///system'
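
A minimal Python sketch of the expectation stated in this description: the collectd keys should exist only on a node that runs both nova_compute and collectd. It models the reporter's expectation only, not how tripleo-heat-templates actually merges service_config_settings; the function name and the service lists are hypothetical.

```python
# Hypothetical model of the reporter's expectation; not tripleo-heat-templates code.
COLLECTD_KEY = "tripleo.collectd.plugins.nova_compute"


def expected_collectd_hiera(node_services):
    """Return the collectd hiera keys expected on a node with these services."""
    services = set(node_services)
    # The keys should appear only where nova_compute and collectd are co-located.
    if {"nova_compute", "collectd"} <= services:
        return {
            COLLECTD_KEY: ["virt"],
            "collectd::plugin::virt::connection": "qemu:///system",
        }
    return {}


if __name__ == "__main__":
    # Assumed service lists, for illustration only.
    print(expected_collectd_hiera(["nova_compute", "nova_libvirt", "collectd"]))
    print(expected_collectd_hiera(["keystone", "haproxy", "collectd"]))
```

Under this model a controller that runs collectd but not nova_compute gets no virt plugin configuration, which is the behaviour the report says was lost.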

Comment 5 Martin Magr 2019-11-27 12:29:45 UTC
Looking into this, I see "nova_compute" in the "enabled_services" list in the hieradata on the Controller node, even though OS::TripleO::Services::NovaCompute is not in the Controller role in roles_data.yaml, as can be seen below. This breaks any Puppet-side logic that depends on the services enabled on a node, including the collectd plugin configuration and the rsyslog logging sources configuration. Sadly, I have not been able to find the proper place to fix this regression.


[root@controller-0 hieradata]# grep enabled_services -A58 all_nodes.json
    "enabled_services": [
        "keystone_admin_api",
        "keystone_public_api",
        "boot_params_service",
        "ca_certs",
        "certmonger_user",
        "cinder_api",
        "cinder_scheduler",
        "cinder_volume",
        "clustercheck",
        "collectd",
        "container_image_prepare",
        "glance_api",
        "haproxy",
        "heat_api",
        "heat_api_cloudwatch_disabled",
        "heat_api_cfn",
        "heat_engine",
        "horizon",
        "iscsid",
        "kernel",
        "keystone",
        "memcached",
        "metrics-qdr",
        "mysql",
        "mysql_client",
        "neutron_api",
        "neutron_plugin_ml2_ovn",
        "nova_api",
        "nova_conductor",
        "nova_metadata",
        "nova_scheduler",
        "nova_vnc_proxy",
        "logrotate_crond",
        "ovn_dbs",
        "ovn_controller",
        "pacemaker",
        "placement",
        "oslo_messaging_rpc",
        "oslo_messaging_notify",
        "podman",
        "redis",
        "rsyslog",
        "snmp",
        "sshd",
        "swift_proxy",
        "swift_ringbuilder",
        "swift_storage",
        "chrony",
        "timezone",
        "tripleo_firewall",
        "tripleo_packages",
        "tuned",
        "nova_compute",
        "nova_libvirt",
        "nova_libvirt_guests",
        "nova_migration_target",
        "ovn_metadata"
    ],
[root@controller-0 hieradata]# exit
exit
[heat-admin@controller-0 ~]$ exit
logout
Connection to 192.168.24.52 closed.
(undercloud) [stack@undercloud-0 ~]$ grep "name: Controller" /usr/share/openstack-tripleo-heat-templates/roles_data.yaml -A180 > the.log
(undercloud) [stack@undercloud-0 ~]$ tail -3 the.log 
    - OS::TripleO::Services::Zaqar
###############################################################################
# Role: Compute                                                               #
(undercloud) [stack@undercloud-0 ~]$ grep -i compute the.log 
# Role: Compute                                                               #
(undercloud) [stack@undercloud-0 ~]$
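
The manual grep check above could also be scripted. The following Python sketch assumes only that all_nodes.json is a JSON document with a top-level "enabled_services" list, as the output above suggests; the function name is hypothetical and the file path is taken from the command line rather than guessed.

```python
import json
import sys


def service_enabled(all_nodes_json, service):
    """Return True if `service` is listed under "enabled_services" in the JSON text."""
    data = json.loads(all_nodes_json)
    return service in data.get("enabled_services", [])


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_service.py <path to all_nodes.json> nova_compute
    path, service = sys.argv[1], sys.argv[2]
    with open(path) as handle:
        found = service_enabled(handle.read(), service)
    print(f"{service}: {'present' if found else 'absent'} in enabled_services")
```

Because enabled_services describes the whole cloud (see the later comments), a positive result here does not by itself show that the service runs on the local node.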

Comment 6 David Peacock 2019-12-04 15:14:33 UTC
I can see nova_compute in my lab too, so this can be considered reproduced. I'm continuing to look at this. A question in the meantime: surely collectd *can* run on controllers, right? As in, collecting controller metrics is something we should be able to do on a controller, no?

Comment 7 Alex Schultz 2019-12-04 15:25:39 UTC
enabled_services is a global configuration (as indicated by its existence in all_nodes.json). This was true in previous versions.  The controller specifically needs to know what services are enabled on a global level in the cloud so that things like keystone, haproxy, mysql users can be properly configured. This seems to be a bug in how the collectd plugin configuration is choosing what services are available locally. I am uncertain if there exists a local representation of the services installed locally. I believe historically we've controlled that via services in a role rather than a hiera key.  You could add a special key via the NovaCompute service that would only be configured in the service_config.json so that the collectd would key off of that.

Comment 9 Alex Schultz 2019-12-04 15:41:53 UTC
Upon further review, it looks as though https://review.opendev.org/#/q/topic:bug/1835551+(status:open+OR+status:merged) caused the issue. service_names existed until Train, when it was removed. That removal was incorrect because service_names was scoped to the role. We'll need to revert these changes.
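
To illustrate why the scope of the list matters, here is a simplified Python model of plugin selection that keys off a list of service names. It is an assumption about the shape of the Puppet-side logic, not the actual puppet-tripleo code; the function name, the service lists, and the premise that the tripleo.collectd.plugins.nova_compute key is visible on the controller are all hypothetical.

```python
# Simplified, hypothetical model; not the actual puppet-tripleo implementation.
def collectd_plugins(service_list, hiera):
    """Collect plugins for every listed service that has a plugin key in hiera."""
    plugins = []
    for service in service_list:
        plugins.extend(hiera.get(f"tripleo.collectd.plugins.{service}", []))
    return plugins


# Assumed hieradata and service lists for a controller node.
hiera = {"tripleo.collectd.plugins.nova_compute": ["virt"]}
controller_service_names = ["collectd", "keystone", "rsyslog"]  # role-scoped
enabled_services = controller_service_names + ["nova_compute", "nova_libvirt"]  # cloud-wide

if __name__ == "__main__":
    print("keyed on enabled_services:", collectd_plugins(enabled_services, hiera))
    print("keyed on service_names:", collectd_plugins(controller_service_names, hiera))
```

In this model the cloud-wide list selects the virt plugin on the controller, while the role-scoped list does not, which matches the symptom reported in this bug.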

Comment 15 Leonid Natapov 2019-12-17 11:02:06 UTC
Verified. The BZ was opened because the collectd virt plugin appeared on controller nodes, while it should appear only on compute nodes where libvirt is running.

Currently the virt plugin runs only on compute nodes. I also don't see the nova_compute role on the controller node as described in the BZ.

Comment 18 errata-xmlrpc 2020-02-06 14:42:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283