
Bug 1744706

Summary: [Cloudops] there is no amqp1 installed on Compute nodes in Spine-Leaf network topology
Product: Red Hat OpenStack
Reporter: Yuri Obshansky <yobshans>
Component: openstack-tripleo-heat-templates
Assignee: Emma Foley <efoley>
Status: CLOSED NOTABUG
QA Contact: Leonid Natapov <lnatapov>
Severity: high
Priority: medium
Version: 13.0 (Queens)
CC: aschultz, mburns, mmagr, mrunge
Target Milestone: z9
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-09-25 11:39:45 UTC
Type: Bug

Description Yuri Obshansky 2019-08-22 17:24:36 UTC
Description of problem:
An OSP 13 overcloud with a Spine-Leaf network topology was deployed with Metrics_QDR.
The metrics_qdr containers were installed on all nodes using the workaround proposed by Harald Jensås:

parameter_defaults:
  ...
  Compute1ExtraConfig:
    ...
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api1')}"
  Compute2ExtraConfig:
    ...
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api2')}"

Unfortunately, data from the Compute nodes running on leafs is not transferred to Prometheus. There is no data from overcloud-compute1-0 and overcloud-compute2-0:

sa_collectd_metric_per_host{endpoint="prom-http",exported_instance="172.25.1.5",service="white-smartgateway"}	0
sa_collectd_metric_per_host{endpoint="prom-http",exported_instance="172.25.1.6",service="white-smartgateway"}	0
sa_collectd_metric_per_host{endpoint="prom-http",exported_instance="172.25.1.7",service="white-smartgateway"}	0
sa_collectd_metric_per_host{endpoint="prom-http",exported_instance="overcloud-compute0-0.localdomain",service="white-smartgateway"}	277
sa_collectd_metric_per_host{endpoint="prom-http",exported_instance="overcloud-controller0-0.localdomain",service="white-smartgateway"}	360
sa_collectd_metric_per_host{endpoint="prom-http",exported_instance="overcloud-controller0-1.localdomain",service="white-smartgateway"}	203
sa_collectd_metric_per_host{endpoint="prom-http",exported_instance="overcloud-controller0-2.localdomain",service="white-smartgateway"}
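
As a rough way to triage output like the samples above, the following Python sketch lists the instances whose sample value is exactly 0. The function name `zero_valued_instances` and the regular expression are illustrative assumptions, not part of any Smart Gateway or Prometheus tooling. Lines that carry no value are skipped rather than guessed at.

```python
import re

# Hypothetical pattern: captures the exported_instance label and an
# optional value that follows the closing brace of the label set.
SAMPLE_RE = re.compile(r'exported_instance="([^"]+)"[^}]*\}\s*(\S+)?')


def zero_valued_instances(lines):
    """Return exported_instance labels whose sample value equals 0."""
    zero = []
    for line in lines:
        match = SAMPLE_RE.search(line)
        if match is None or match.group(2) is None:
            continue  # not a sample line, or the value is missing
        if float(match.group(2)) == 0:
            zero.append(match.group(1))
    return zero


if __name__ == "__main__":
    samples = [
        'sa_collectd_metric_per_host{endpoint="prom-http",'
        'exported_instance="172.25.1.5",service="white-smartgateway"}\t0',
        'sa_collectd_metric_per_host{endpoint="prom-http",'
        'exported_instance="overcloud-compute0-0.localdomain",'
        'service="white-smartgateway"}\t277',
    ]
    print(zero_valued_instances(samples))
```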

This happened because the deployment process did not install the collectd amqp1 configuration (10-amqp1.conf) on the Compute nodes running on leafs.

Difference in the collectd configuration files between a Compute node on the central site and a Compute node on a leaf:
COMPUTE-1 (ON LEAF)
----------
()[root@overcloud-compute2-0 collectd.d]# ls -al
total 64
drwxr-x---. 2 root root 4096 Aug 22 15:55 .
drwxr-xr-x. 1 root root   45 Aug 22 15:55 ..
-rw-r-----. 1 root root  197 Aug 20 03:23 05-logfile.conf
-rw-r-----. 1 root root  231 Aug 20 03:23 10-cpu.conf
-rw-r-----. 1 root root  205 Aug 20 03:23 10-df.conf
-rw-r-----. 1 root root  119 Aug 20 03:23 10-disk.conf
-rw-r-----. 1 root root  215 Aug 20 03:23 10-hugepages.conf
-rw-r-----. 1 root root  151 Aug 20 03:23 10-interface.conf
-rw-r-----. 1 root root  119 Aug 20 03:23 10-load.conf
-rw-r-----. 1 root root  147 Aug 20 03:23 10-memory.conf
-rw-r-----. 1 root root  110 Aug 20 03:23 10-ovs_events.conf
-rw-r-----. 1 root root  108 Aug 20 03:23 10-ovs_stats.conf
-rw-r-----. 1 root root   77 Aug 20 03:23 10-processes.conf
-rw-r-----. 1 root root  105 Aug 20 03:23 10-tcpconns.conf
-rw-r-----. 1 root root  209 Aug 20 03:23 10-unixsock.conf
-rw-r-----. 1 root root   74 Aug 20 03:23 10-uptime.conf
-rw-r-----. 1 root root  126 Aug 20 03:23 10-virt.conf


COMPUTE-0 (CENTRAL)
--------
drwxr-x---. 2 root root 4096 Aug 22 15:04 .
drwxr-xr-x. 1 root root   45 Aug 22 15:04 ..
-rw-r-----. 1 root root  197 Aug 20 03:23 05-logfile.conf
-rw-r-----. 1 root root  438 Aug 20 03:23 10-amqp1.conf
-rw-r-----. 1 root root  231 Aug 20 03:23 10-cpu.conf
-rw-r-----. 1 root root  205 Aug 20 03:23 10-df.conf
-rw-r-----. 1 root root  119 Aug 20 03:23 10-disk.conf
-rw-r-----. 1 root root  215 Aug 20 03:23 10-hugepages.conf
-rw-r-----. 1 root root  151 Aug 20 03:23 10-interface.conf
-rw-r-----. 1 root root  119 Aug 20 03:23 10-load.conf
-rw-r-----. 1 root root  147 Aug 20 03:23 10-memory.conf
-rw-r-----. 1 root root  110 Aug 20 03:23 10-ovs_events.conf
-rw-r-----. 1 root root  108 Aug 20 03:23 10-ovs_stats.conf
-rw-r-----. 1 root root   77 Aug 20 03:23 10-processes.conf
-rw-r-----. 1 root root  105 Aug 20 03:23 10-tcpconns.conf
-rw-r-----. 1 root root  209 Aug 20 03:23 10-unixsock.conf
-rw-r-----. 1 root root   74 Aug 20 03:23 10-uptime.conf
-rw-r-----. 1 root root  126 Aug 20 03:23 10-virt.conf
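
The comparison of the two listings above can be sketched in Python as a set difference over the file names. The helper names `conf_files` and `missing_on_leaf` are hypothetical and exist only for illustration. The sketch assumes `ls -al` style output in which the file name is the last whitespace-separated field.

```python
def conf_files(ls_output):
    """Extract the *.conf file names from `ls -al` style output."""
    return {
        line.split()[-1]
        for line in ls_output.splitlines()
        if line.strip().endswith(".conf")
    }


def missing_on_leaf(central_ls, leaf_ls):
    """Return files present on the central node but absent on the leaf node."""
    return sorted(conf_files(central_ls) - conf_files(leaf_ls))


if __name__ == "__main__":
    # Shortened listings, taken from the output pasted in this report.
    central = """\
-rw-r-----. 1 root root  197 Aug 20 03:23 05-logfile.conf
-rw-r-----. 1 root root  438 Aug 20 03:23 10-amqp1.conf
-rw-r-----. 1 root root  231 Aug 20 03:23 10-cpu.conf
"""
    leaf = """\
total 64
-rw-r-----. 1 root root  197 Aug 20 03:23 05-logfile.conf
-rw-r-----. 1 root root  231 Aug 20 03:23 10-cpu.conf
"""
    print(missing_on_leaf(central, leaf))
```

Applied to the full listings, the only file expected in the difference is 10-amqp1.conf, which matches the observation in this report.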
 

Version-Release number of selected component (if applicable):
[stack@site-undercloud-0 ~]$ rpm -qa | grep tripleo
openstack-tripleo-validations-8.4.5-2.el7ost.noarch
puppet-tripleo-8.4.1-23.el7ost.noarch
openstack-tripleo-heat-templates-8.3.1-72.el7ost.noarch
python-tripleoclient-9.2.7-11.el7ost.noarch
openstack-tripleo-common-8.6.8-13.el7ost.noarch
openstack-tripleo-common-containers-8.6.8-13.el7ost.noarch
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost.noarch
openstack-tripleo-image-elements-8.0.2-2.el7ost.noarch
openstack-tripleo-ui-8.3.2-3.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.2-3.el7ost.noarch

How reproducible:

1. Install OpenStack DCN with a Spine-Leaf network topology:
https://docs.google.com/document/d/1UuIqXI_HMpKFjA2yHei0m7KvlWdJUJA3KQHUtMFOoNk/edit#
2. Add the following service to the custom roles file:
    - OS::TripleO::Services::MetricsQdr 
3. Create the file metrics-qdr-collectd.yml:
---
tripleo_heat_templates:
    - /usr/share/openstack-tripleo-heat-templates/environments/metrics-collectd-qdr.yaml

custom_templates:
    parameter_defaults:
        DnsServers: ["10.19.110.3","10.19.110.3","10.46.0.31"]
        CollectdAmqpInterval: 1
        CollectdConnectionType: amqp1
        CollectdExtraPlugins:
            - cpu
            - df
            - hugepages
            - ovs_events
            - ovs_stats
            - load
            - uptime
        CollectdAmqpInstances:
            telemetry:
                format: JSON
                presettle: true
        MetricsQdrConnectors:
            - host: qdr-white-port-5671-sa-telemetry.apps.dev7.nfvpe.site
              port: 443
              role: edge
              verifyHostname: false
              sslProfile: sslProfile
        MetricsQdrSSLProfiles:
            - name: sslProfile
4. Add the following to the overcloud deployment command:
-e /home/stack/osp-13-spine-leaf-2-compute-on-leaf/metrics-qdr-collectd.yaml \
--environment-file /usr/share/openstack-tripleo-heat-templates/environments/metrics-collectd-qdr.yaml \

5. Update container_images.yaml/docker-images.yaml with the images:
  DockerMetricsQdrConfigImage: 192.168.24.1:8787/rhosp13/openstack-qdrouterd:2019-06-27.1
  DockerMetricsQdrImage: 192.168.24.1:8787/rhosp13/openstack-qdrouterd:2019-06-27.1
  DockerQdrouterdConfigImage: 192.168.24.1:8787/rhosp13/openstack-qdrouterd:2019-06-27.1
  DockerQdrouterdImage: 192.168.24.1:8787/rhosp13/openstack-qdrouterd:2019-06-27.1

6. Update nodes-data.yaml with:
parameter_defaults:
  ...
  Compute1ExtraConfig:
    ...
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api1')}"
  Compute2ExtraConfig:
    ...
    tripleo::profile::base::metrics::qdr::listener_addr: "%{hiera('internal_api2')}"

7. Start the deployment.

Actual results:
Deployment passed. No data transferred to Prometheus.

Expected results:
Deployment passed. All data transferred to Prometheus.

Additional info:

Comment 5 Yuri Obshansky 2019-09-11 14:24:48 UTC
As a workaround, I use nodes-data.yaml with:
parameter_defaults:
  ...
  Compute1ExtraConfig:
    ...
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api1')}"
  Compute2ExtraConfig:
    ...
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api2')}"

With this change, the amqp1 configuration has been installed on the Compute nodes on the leafs:
[root@overcloud-compute1-0 ~]# cd /var/lib/config-data/collectd/etc/collectd.d
[root@overcloud-compute1-0 collectd.d]# ll
total 64
-rw-r-----. 1 root root 197 Sep 11 13:14 05-logfile.conf
-rw-r-----. 1 root root 450 Sep 11 13:14 10-amqp1.conf
-rw-r-----. 1 root root 231 Sep 11 13:14 10-cpu.conf
-rw-r-----. 1 root root 205 Sep 11 13:14 10-df.conf
-rw-r-----. 1 root root 119 Sep 11 13:14 10-disk.conf
-rw-r-----. 1 root root 215 Sep 11 13:14 10-hugepages.conf
-rw-r-----. 1 root root 151 Sep 11 13:14 10-interface.conf
-rw-r-----. 1 root root 119 Sep 11 13:14 10-load.conf
-rw-r-----. 1 root root 147 Sep 11 13:14 10-memory.conf
-rw-r-----. 1 root root 110 Sep 11 13:14 10-ovs_events.conf
-rw-r-----. 1 root root 108 Sep 11 13:14 10-ovs_stats.conf
-rw-r-----. 1 root root  77 Sep 11 13:14 10-processes.conf
-rw-r-----. 1 root root 105 Sep 11 13:14 10-tcpconns.conf
-rw-r-----. 1 root root 209 Sep 11 13:14 10-unixsock.conf
-rw-r-----. 1 root root  74 Sep 11 13:14 10-uptime.conf
-rw-r-----. 1 root root 126 Sep 11 13:14 10-virt.conf

Comment 6 Emma Foley 2019-09-12 09:53:09 UTC
The following changes were made to nodes-data.yaml to get collectd-amqp configured correctly:
 
parameter_defaults:
  ...
  Compute1ExtraConfig:
    ...
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api1')}"
  Compute2ExtraConfig:
    ...
    tripleo::profile::base::metrics::collectd::amqp_host: "%{hiera('internal_api2')}"
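
For deployments with more leaf roles, the per-role override above could be generated rather than typed by hand. The following Python sketch builds the same structure. It assumes that every leaf role is named ComputeN and has a matching internal_apiN hiera key, as in this report. The function name `leaf_extra_config` is hypothetical, and any other settings elided with "..." above are not reproduced.

```python
import json

# Hiera key used in the workaround above.
AMQP_HOST_KEY = "tripleo::profile::base::metrics::collectd::amqp_host"


def leaf_extra_config(leaf_numbers):
    """Build ComputeNExtraConfig entries pointing amqp_host at internal_apiN."""
    return {
        f"Compute{n}ExtraConfig": {
            # Doubled braces produce literal { and } in the f-string.
            AMQP_HOST_KEY: f"%{{hiera('internal_api{n}')}}"
        }
        for n in leaf_numbers
    }


if __name__ == "__main__":
    # JSON is a subset of YAML, so the printed document can be used
    # as a starting point for a parameter_defaults section.
    print(json.dumps({"parameter_defaults": leaf_extra_config([1, 2])}, indent=2))
```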

Comment 7 Emma Foley 2019-09-12 10:29:32 UTC
Logged a bug for the required documentation: https://bugzilla.redhat.com/show_bug.cgi?id=1751649

Comment 8 Martin Magr 2019-09-25 11:39:45 UTC
Closing this BZ as the issue has been solved on the engineering side.