Bug 1659453

Summary: SNMP-based metrics are missing on undercloud
Product: Red Hat OpenStack
Reporter: Marek Aufart <maufart>
Component: gnocchi
Assignee: Martin Magr <mmagr>
Status: CLOSED EOL
QA Contact: Leonid Natapov <lnatapov>
Severity: medium
Priority: high
Version: 14.0 (Rocky)
CC: alolivei, apannu, apevec, iovadia, jhajyahy, jjoyce, jschluet, lhh, lmadsen, mmagr, pkilambi, pveiga, rchincho, spower, ssmolyak, tuado
Target Milestone: z3
Keywords: Reopened, Triaged, ZStream
Target Release: 14.0 (Rocky)
Flags: maufart: needinfo? (jhajyahy), maufart: needinfo? (spower)
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2020-01-24 12:28:36 UTC
Type: Bug
Bug Depends On: 1709277
Bug Blocks: 1626151, 1753950

Description Marek Aufart 2018-12-14 12:37:04 UTC
Description of problem:
Gnocchi on the undercloud should provide hardware.* metrics. These metrics are collected via SNMP from the undercloud nodes. Currently, no such metrics appear in the output of gnocchi metric list.

Version-Release number of selected component (if applicable):
OSP14


How reproducible:
always


Steps to Reproduce:
1. Deploy OSP14 with telemetry enabled on the undercloud
2. $ gnocchi metric list

Actual results:
no metrics from "hardware" category are present

Expected results:
hardware metrics should be present, e.g. hardware.cpu.util
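
The check in the steps above can be scripted. The following is a minimal sketch, not a verified tool: it assumes the gnocchi client accepts the cliff-style "-f json" output option and that each listed metric carries a "name" field. The helper names hardware_metric_names and list_metrics are hypothetical.

```python
import json
import subprocess


def hardware_metric_names(metrics):
    """Return the sorted names of metrics in the hardware.* namespace.

    `metrics` is a list of dicts, each assumed to carry a "name" key.
    """
    names = {m.get("name") or "" for m in metrics}
    return sorted(n for n in names if n.startswith("hardware."))


def list_metrics():
    """Run the gnocchi CLI and parse its output.

    Assumption: `gnocchi metric list -f json` prints a JSON array.
    """
    out = subprocess.check_output(
        ["gnocchi", "metric", "list", "-f", "json"],
        universal_newlines=True,
    )
    return json.loads(out) if out.strip() else []


# Usage on the undercloud (with credentials sourced):
#   found = hardware_metric_names(list_metrics())
#   print(found or "no hardware.* metrics found")
```

An empty result from hardware_metric_names corresponds to the actual results described here; a list containing names such as hardware.cpu.util corresponds to the expected results.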

Additional info:
This issue was initially discussed on the rhos-mm mailing list [1], where it looked like just a configuration issue, so we raised a docs BZ to document how to set it up [2]. After several days of debugging, we are _not_ able to get it working, so we are raising this BZ.

This issue introduces a regression in the CloudForms integration (missing metrics for undercloud nodes).

[1] http://post-office.corp.redhat.com/archives/rhos-mm/2018-December/msg00002.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1655958

Comment 2 Martin Magr 2018-12-17 15:49:52 UTC
OK, maybe we missed the SNMP configuration when moving to the containerized undercloud. We have to investigate this further.

Comment 6 Martin Magr 2019-05-14 09:24:46 UTC
Following Alex's comment on patch #656760, I deployed Queens with telemetry enabled and I also get an empty metric list. I found the following in the logs, which is probably related. I'm continuing the investigation. Marek, are you sure that this is a 13-to-14 regression?

2019-05-14 09:17:52.632 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.memory.used, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.632 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.system_stats.io.outgoing.blocks, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.632 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.memory.buffer, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.632 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.memory.swap.avail, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.632 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.cpu.util, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.633 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.memory.total, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.633 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.system_stats.io.incoming.blocks, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.633 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.memory.cached, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.633 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.network.ip.incoming.datagrams, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.633 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.network.ip.outgoing.datagrams, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
2019-05-14 09:17:52.633 63609 DEBUG ceilometer.polling.manager [-] Skip pollster hardware.memory.swap.total, no resources found this cycle poll_and_notify /usr/lib/python2.7/site-packages/ceilometer/polling/manager.py:175
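
To see at a glance which pollsters are being skipped, the log lines above can be summarized with a short script. This is only a sketch: it assumes the message format stays exactly as in the excerpt ("Skip pollster <name>, no resources found this cycle"), and the names SKIP_RE and skipped_pollsters are hypothetical.

```python
import re

# Matches the DEBUG message format seen in the excerpt above.
SKIP_RE = re.compile(
    r"Skip pollster (?P<name>[^,\s]+), no resources found this cycle"
)


def skipped_pollsters(lines):
    """Return the sorted, de-duplicated names of skipped pollsters."""
    found = set()
    for line in lines:
        match = SKIP_RE.search(line)
        if match:
            found.add(match.group("name"))
    return sorted(found)


# Usage: pass the lines of the polling agent's log, for example
#   with open(path_to_polling_log) as f:
#       print("\n".join(skipped_pollsters(f)))
```

If every hardware.* pollster shows up in the result, the polling agent is not discovering any resources to poll, which matches the empty metric list.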

Comment 7 Martin Magr 2019-05-14 14:01:32 UTC
The same happens on Pike:

(undercloud) [stack@undercloud ~]$ grep telemetry undercloud.conf                                               
enable_telemetry = True
(undercloud) [stack@undercloud ~]$ ps -ef | grep ceilometer
ceilome+ 20087     1  0 13:38 ?        00:00:09 ceilometer-agent-notification: master process [/usr/bin/ceilometer-agent-notification --logfile /var/log/ceilometer/agent-notification.log]
ceilome+ 20165     1  0 13:38 ?        00:00:10 ceilometer-polling: master process [/usr/bin/ceilometer-polling --polling-namespaces central --logfile /var/log/ceilometer/central.log]
ceilome+ 20262 20087  3 13:38 ?        00:00:46 ceilometer-agent-notification: NotificationService worker(0)
ceilome+ 20279 20165  0 13:38 ?        00:00:00 ceilometer-polling: AgentManager worker(0)
stack    26553 15685  0 13:59 pts/0    00:00:00 grep --color=auto ceilometer
(undercloud) [stack@undercloud ~]$ gnocchi metric list

(undercloud) [stack@undercloud ~]$ rpm -qa openstack-ceilometer*
openstack-ceilometer-polling-9.0.8-0.20190511015415.4a82ac5.el7.noarch
openstack-ceilometer-common-9.0.8-0.20190511015415.4a82ac5.el7.noarch
openstack-ceilometer-notification-9.0.8-0.20190511015415.4a82ac5.el7.noarch
openstack-ceilometer-central-9.0.8-0.20190511015415.4a82ac5.el7.noarch
(undercloud) [stack@undercloud ~]$ 

With what version of OpenStack did you have the hardware metrics working?

Comment 8 Marek Aufart 2019-05-14 14:19:51 UTC
Let's sync with QE on this. Ido, do I remember correctly that the first failure of undercloud Hosts metrics appeared in OSP14 (containerized undercloud) and that they were working in OSP13 (or was some workaround needed)?

Comment 9 Rahul Chincholkar 2019-10-02 09:47:46 UTC
Reopening as this blocks BZ #1753950

Comment 14 Stephen 2019-11-07 20:39:23 UTC
*** Bug 1540950 has been marked as a duplicate of this bug. ***