Bug 1584762 - Undercloud instances don't have hardware.* metrics enabled in polling.yaml
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z2
Target Release: 13.0 (Queens)
Assigned To: Mehdi ABAAKOUK
QA Contact: Sasha Smolyak
Keywords: Triaged, ZStream
Depends On:
Blocks: 1622839 1590114
Reported: 2018-05-31 11:15 EDT by Sasha Smolyak
Modified: 2018-08-29 12:37 EDT
CC List: 14 users

See Also:
Fixed In Version: openstack-tripleo-heat-templates-8.0.3-2.el7ost
Doc Type: Known Issue
Doc Text:
If Telemetry is manually enabled on the undercloud, `hardware.*` metrics do not work because the firewall on each of the nodes is misconfigured. As a workaround, manually set the `snmpd` subnet to the control plane network by adding an extra template for the undercloud deployment, as follows:

    parameter_defaults:
      SnmpdIpSubnet: 192.168.24.0/24
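The workaround comes down to putting the control plane *network* (not a single host address) into `SnmpdIpSubnet`. A minimal Python sketch, assuming the control plane address is known in CIDR form (192.168.24.1/24 below is only an example, and both helper names are hypothetical), that derives the network and renders the extra template:

```python
import ipaddress


def ctlplane_subnet(interface_cidr: str) -> str:
    """Return the network containing an address, e.g. 192.168.24.1/24 -> 192.168.24.0/24."""
    # ip_interface keeps the prefix length; .network masks off the host bits.
    return str(ipaddress.ip_interface(interface_cidr).network)


def render_snmpd_env(interface_cidr: str) -> str:
    """Render a parameter_defaults template that sets SnmpdIpSubnet (hypothetical helper)."""
    subnet = ctlplane_subnet(interface_cidr)
    return "parameter_defaults:\n" f"  SnmpdIpSubnet: {subnet}\n"


if __name__ == "__main__":
    # Example address only; use the real control plane address of your deployment.
    print(render_snmpd_env("192.168.24.1/24"), end="")
```

Deriving the value from an address avoids the kind of mistake seen in this bug, where the rule ended up on a different network (172.17.1.0/24) than the one the poller uses.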
Clone Of:
Clones: 1590114 1622839
Last Closed: 2018-08-29 12:36:45 EDT
Type: Bug


External Trackers
* OpenStack gerrit 572108 (master: MERGED): tripleo-heat-templates: snmp: listen on ctrlplane (Ia310e02d30ce037c2cc7fec146f27fbd0f8055f4); last updated 2018-07-18 21:55 EDT
* OpenStack gerrit 575543; last updated 2018-07-23 22:31 EDT
* Red Hat Product Errata RHBA-2018:2574; last updated 2018-08-29 12:37 EDT

Description Sasha Smolyak 2018-05-31 11:15:13 EDT
Description of problem:
Undercloud instances don't have hardware.* metrics enabled in polling.yaml

Version-Release number of selected component (if applicable):

Performed an update from the 2018-05-15.2 puddle to the 2018-05-29.2 puddle. Package versions after the update:
puppet-gnocchi-12.4.0-0.20180329032858.5dfa350.el7ost.noarch
gnocchi-metricd-4.2.3-2.el7ost.noarch
python-gnocchi-4.2.3-2.el7ost.noarch
gnocchi-api-4.2.3-2.el7ost.noarch
python2-gnocchiclient-7.0.1-1.el7ost.noarch
gnocchi-statsd-4.2.3-2.el7ost.noarch
gnocchi-common-4.2.3-2.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Install 2018-05-15.2 
2. Update to 2018-05-29.2 
3. Check /etc/ceilometer/polling.yaml on the undercloud machine

Actual results:
No hardware.* metrics are present, so no CPU or memory measures of the undercloud machine are collected.

Expected results:
hardware.* metrics are present in the file.
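Step 3 and the expected result can be checked mechanically. A minimal sketch, assuming polling.yaml has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, not imported here) and follows the usual Ceilometer `sources` / `meters` layout; `polls_meter` and the sample data are hypothetical:

```python
from fnmatch import fnmatchcase
from typing import Any, Dict


def polls_meter(polling: Dict[str, Any], meter: str = "hardware.cpu.util") -> bool:
    """Return True if any source in a parsed polling.yaml has a pattern matching `meter`.

    Simplification: entries starting with "!" (exclusions) are skipped rather
    than evaluated, so this is not the full Ceilometer selection logic.
    """
    for source in polling.get("sources") or []:
        for pattern in source.get("meters") or []:
            if pattern.startswith("!"):
                continue  # exclusions are not evaluated in this sketch
            if fnmatchcase(meter, pattern):
                return True
    return False


# Hypothetical parsed contents, shaped like a Ceilometer polling.yaml.
without_hardware = {"sources": [{"name": "some_pollsters", "interval": 300,
                                 "meters": ["cpu", "memory.usage"]}]}
with_hardware = {"sources": [{"name": "some_pollsters", "interval": 300,
                              "meters": ["cpu", "memory.usage", "hardware.*"]}]}

print(polls_meter(without_hardware), polls_meter(with_hardware))
```

Matching against a concrete meter name such as `hardware.cpu.util` (the meter seen failing in central.log) also catches broader patterns like `*`, which a plain substring search for "hardware" would miss.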

Additional info:
I first noticed this after the update, but it is worth checking whether the metrics were dropped from polling.yaml in the build at some point.
I do know they were present after an upgrade from 12.
Comment 3 Mehdi ABAAKOUK 2018-06-04 08:54:22 EDT
My current investigation points to an snmpd issue or misconfiguration:

* /etc/ceilometer/polling.yaml has the hardware metrics
* We see 3 instances instead of 8 in /var/log/ceilometer/agent-notification.log
* The 3 ceph nodes have the hardware.* metrics, but not the compute and controller nodes
* /var/log/ceilometer/central.log is full of:

2018-06-04 08:48:23.838 2952 WARNING ceilometer.hardware.pollsters.generic [-] inspector call failed for hardware.cpu.util host 192.168.24.11: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.11, No SNMP response received before timeout: SNMPException: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.11, No SNMP response received before timeout
2018-06-04 08:48:30.033 2952 WARNING ceilometer.hardware.pollsters.generic [-] inspector call failed for hardware.cpu.util host 192.168.24.19: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.19, No SNMP response received before timeout: SNMPException: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.19, No SNMP response received before timeout



This suggests that snmpd on the controller and compute nodes is not answering ceilometer-agent-central.
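To see at a glance which nodes are not answering, the timeout warnings can be counted per host. A minimal sketch; the regular expression is written against the two sample lines quoted above, not against a documented log format, and `snmp_timeouts_by_host` is a hypothetical helper:

```python
import re
from collections import Counter
from typing import Iterable

# Pattern derived from the "inspector call failed" warnings in central.log
# shown in this comment; other warning shapes are not matched.
_FAILURE = re.compile(
    r"inspector call failed for (?P<meter>\S+) host (?P<host>[\d.]+):.*"
    r"No SNMP response received before timeout"
)


def snmp_timeouts_by_host(lines: Iterable[str]) -> Counter:
    """Count SNMP timeout warnings per polled host address."""
    counts: Counter = Counter()
    for line in lines:
        match = _FAILURE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

On the undercloud it could be fed the log file directly, for example `snmp_timeouts_by_host(open("/var/log/ceilometer/central.log")).most_common()`. Hosts that appear here (controllers and computes in this report) are the ones whose snmpd or firewall needs checking, while nodes that never appear (the Ceph nodes) are answering.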
Comment 4 Mehdi ABAAKOUK 2018-06-04 09:16:55 EDT
This is a firewall issue. The firewall rules created on the controller and compute nodes don't use the correct subnet (they use 172.17.1.0/24 instead of 192.168.24.0/24):

* Broken node:

[root@compute-1 ~]# iptables -nL |grep 161
ACCEPT     udp  --  172.17.1.0/24        0.0.0.0/0            multiport dports 161 state NEW /* 124 snmp ipv4 */

* Working node:

[root@ceph-0 snmp]# iptables -nL |grep 161
ACCEPT     udp  --  192.168.24.0/24      0.0.0.0/0            multiport dports 161 state NEW /* 124 snmp ipv4 */
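The comparison above can be turned into a quick per-node check: extract the source network of the SNMP ACCEPT rule and test whether the polling address falls inside it. A minimal sketch; the regex only handles the `multiport dports 161` rule shape shown above, the helper names are hypothetical, and 192.168.24.1 is just an example address on the control plane network:

```python
import ipaddress
import re
from typing import List

# Matches "iptables -nL" ACCEPT lines for UDP port 161 in the shape quoted
# above; other rule formats (e.g. "udp dpt:161") are deliberately not handled.
_SNMP_RULE = re.compile(
    r"^ACCEPT\s+udp\s+--\s+(?P<source>\S+)\s+\S+\s+multiport dports 161\b"
)


def snmp_sources(iptables_output: str) -> List[str]:
    """Return the source networks of ACCEPT rules for UDP/161."""
    sources = []
    for line in iptables_output.splitlines():
        match = _SNMP_RULE.search(line)
        if match:
            sources.append(match.group("source"))
    return sources


def snmp_allows(iptables_output: str, poller_ip: str) -> bool:
    """True if some UDP/161 ACCEPT rule admits traffic from poller_ip."""
    addr = ipaddress.ip_address(poller_ip)
    return any(addr in ipaddress.ip_network(src, strict=False)
               for src in snmp_sources(iptables_output))
```

Applied to the two outputs above, the compute-1 rule (172.17.1.0/24) does not admit a 192.168.24.x poller, while the ceph-0 rule (192.168.24.0/24) does, which matches which nodes reported hardware.* metrics.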
Comment 22 Sasha Smolyak 2018-08-23 05:36:53 EDT
The workaround works. It is now important to add it to the documentation for installation, update, and upgrade.
Comment 23 Mehdi ABAAKOUK 2018-08-28 03:47:30 EDT
I confirm the regression has been backported to OSP10, so I have cloned this bug for OSP10 as #1622839.
Comment 25 errata-xmlrpc 2018-08-29 12:36:45 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574
