Bug 1584762
Summary: | Undercloud instances don't have hardware.* metrics enabled in polling.yaml | |||
---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Sasha Smolyak <ssmolyak> | |
Component: | openstack-tripleo-heat-templates | Assignee: | Mehdi ABAAKOUK <mabaakou> | |
Status: | CLOSED ERRATA | QA Contact: | Sasha Smolyak <ssmolyak> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 13.0 (Queens) | CC: | apannu, apevec, dnavale, joflynn, jruzicka, jschluet, lhh, mabaakou, maufart, mburns, pkilambi, srevivo, takito, vfarias | |
Target Milestone: | z2 | Keywords: | Triaged, ZStream | |
Target Release: | 13.0 (Queens) | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | openstack-tripleo-heat-templates-8.0.3-2.el7ost | Doc Type: | Known Issue | |
Doc Text: |
If Telemetry is manually enabled on the undercloud, `hardware.*` metrics do not work because the firewall is misconfigured on each of the nodes. As a workaround, manually set the `snmpd` subnet to the control plane network by adding an extra template for the undercloud deployment as follows:
parameter_defaults:
  SnmpdIpSubnet: 192.168.24.0/24
|
Story Points: | --- | |
Clone Of: | ||||
: | 1590114 1622839 | Environment: | ||
Last Closed: | 2018-08-29 16:36:45 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1590114, 1622839 |
Description
Sasha Smolyak
2018-05-31 15:15:13 UTC
My current investigation points to an snmpd issue or misconfiguration:
* /etc/ceilometer/polling.yaml has the hardware metrics.
* We see 3 instances instead of 8 in /var/log/ceilometer/agent-notification.log.
* The 3 Ceph nodes have the hardware.* metrics, but the compute and controller nodes do not.
* /var/log/ceilometer/central.log is full of:

2018-06-04 08:48:23.838 2952 WARNING ceilometer.hardware.pollsters.generic [-] inspector call failed for hardware.cpu.util host 192.168.24.11: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.11, No SNMP response received before timeout: SNMPException: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.11, No SNMP response received before timeout
2018-06-04 08:48:30.033 2952 WARNING ceilometer.hardware.pollsters.generic [-] inspector call failed for hardware.cpu.util host 192.168.24.19: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.19, No SNMP response received before timeout: SNMPException: An error occurred, oids ['1.3.6.1.4.1.2021.11.9.0'], host 192.168.24.19, No SNMP response received before timeout

This suggests that snmpd on the controller and compute nodes is not answering ceilometer-agent-central. This is a firewall issue.
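The central.log warnings quoted above name the host that timed out, so a short script can list which nodes are affected. The Python sketch below is illustrative only: the regular expression assumes the warning wording shown in the two sample lines of this report, and the helper name `snmp_timeout_hosts` is hypothetical, not part of Ceilometer.

```python
import re
import sys
from collections import Counter

# Matches ceilometer-agent-central warnings like the samples in this report:
# "... inspector call failed for hardware.cpu.util host 192.168.24.11: ...
#  No SNMP response received before timeout ..."
# The exact wording is an assumption taken from those two sample lines.
WARNING_RE = re.compile(
    r"inspector call failed for \S+ host (?P<host>[\d.]+):"
    r".*No SNMP response received before timeout"
)


def snmp_timeout_hosts(lines):
    """Count SNMP timeout warnings per polled host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python snmp_timeouts.py /var/log/ceilometer/central.log
    with open(sys.argv[1]) as log:
        for host, count in sorted(snmp_timeout_hosts(log).items()):
            print(f"{host}: {count} SNMP timeouts")
```

Hosts that appear in this output but are missing from the metrics are the candidates for the firewall check.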
The firewall rules created on the controller and compute nodes don't use the correct subnet (they use 172.17.1.0/24 instead of 192.168.24.0/24):
* Broken node:
[root@compute-1 ~]# iptables -nL |grep 161
ACCEPT udp -- 172.17.1.0/24 0.0.0.0/0 multiport dports 161 state NEW /* 124 snmp ipv4 */
* Working node:
[root@ceph-0 snmp]# iptables -nL |grep 161
ACCEPT udp -- 192.168.24.0/24 0.0.0.0/0 multiport dports 161 state NEW /* 124 snmp ipv4 */

This was broken by: https://github.com/openstack/tripleo-heat-templates/commit/43155ed1462a8e27c9efdbb345bfc5832c50bd2f

The workaround works. It is now important to add it to the documentation for installation, update, and upgrade.

I confirm the regression has been backported to OSP10, so I have cloned this bug for OSP10 as bug 1622839.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2574
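The broken and working iptables rules quoted in this report differ only in their source subnet. The Python sketch below checks whether the SNMP rule on a node accepts traffic from the poller's address. It assumes the `iptables -nL` column layout shown in the samples above and uses 192.168.24.1 as a hypothetical undercloud control plane address; `snmp_rule_sources` and `poller_allowed` are illustrative helpers, not part of TripleO.

```python
import ipaddress
import re
import subprocess

# Matches the SNMP rule as printed by `iptables -nL` in this report, e.g.
# "ACCEPT udp -- 172.17.1.0/24 0.0.0.0/0 multiport dports 161 state NEW /* 124 snmp ipv4 */"
# Column order (target, prot, opt, source, destination) is assumed from those samples.
SNMP_RULE_RE = re.compile(r"^ACCEPT\s+udp\s+--\s+(?P<source>\S+)\s+\S+\s+.*\bdports 161\b")

# Hypothetical undercloud control plane address (host running ceilometer-agent-central).
UNDERCLOUD_IP = "192.168.24.1"


def snmp_rule_sources(iptables_output):
    """Return the source networks of ACCEPT rules for UDP port 161."""
    sources = []
    for line in iptables_output.splitlines():
        match = SNMP_RULE_RE.search(line.strip())
        if match:
            sources.append(ipaddress.ip_network(match.group("source"), strict=False))
    return sources


def poller_allowed(iptables_output, poller_ip=UNDERCLOUD_IP):
    """True if some SNMP rule accepts traffic coming from poller_ip."""
    address = ipaddress.ip_address(poller_ip)
    return any(address in network for network in snmp_rule_sources(iptables_output))


if __name__ == "__main__":
    # Run as root on the overcloud node being checked.
    output = subprocess.run(["iptables", "-nL"], capture_output=True, text=True, check=True).stdout
    print("snmpd reachable from poller:", poller_allowed(output))
```

On the broken compute node above, the only SNMP source is 172.17.1.0/24, so the check fails for a 192.168.24.x poller; on the Ceph node it passes.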