Bug 1476452 - Host metrics are not being generated by gnocchi [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 10.0 (Newton)
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: z5
Target Release: 10.0 (Newton)
Assigned To: Mehdi ABAAKOUK
QA Contact: Sasha Smolyak
Keywords: OtherQA, Triaged, ZStream
Depends On:
Blocks:
Reported: 2017-07-29 03:05 EDT by Nilesh
Modified: 2018-03-05 07:51 EST

See Also:
Fixed In Version: openstack-ceilometer-7.1.0-3
Doc Type: Bug Fix
Doc Text:
This update fixes a bug that prevented hosts from providing total memory through SNMP, which in turn prevented gnocchi from generating correct host metrics.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-28 12:34:13 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
pablo.iranzo: needinfo? (eglynn)



External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1707859 None None None 2017-08-01 05:09 EDT
OpenStack gerrit 489535 None None None 2017-08-01 05:13 EDT
OpenStack gerrit 493288 None None None 2017-08-28 08:11 EDT
Red Hat Product Errata RHBA-2017:2822 normal SHIPPED_LIVE openstack-ceilometer bug fix advisory 2017-09-28 16:32:29 EDT

Description Nilesh 2017-07-29 03:05:03 EDT
Description of problem:

Host metrics are not being displayed by gnocchi.
`gnocchi resource list | grep host` generates no output.

1) Verify that the configuration for gnocchi looks OK
~~~
[nchandek@collab-shell etc]$ grep -i "coordination_url" gnocchi/gnocchi.conf
#coordination_url = <None>
coordination_url = redis://:ZB8HeMtNqKYyhxQDvgktqGPb8@200.200.3.2:6379/
[nchandek@collab-shell etc]$ 
~~~

~~~
[nchandek@collab-shell etc]$ grep -i "archive_policy_name" gnocchi/gnocchi.conf
#archive_policy_name = <None>
archive_policy_name = low
[nchandek@collab-shell etc]$ 
~~~

2) Verify the config file for ceilometer looks ok 
~~~
[nchandek@collab-shell etc]$ grep -i "meter_dispatchers" ceilometer/ceilometer.conf 
#meter_dispatchers = database
meter_dispatchers=gnocchi
[nchandek@collab-shell etc]$ 
~~~

~~~
[nchandek@collab-shell etc]$ grep -i "event_dispatchers" ceilometer/ceilometer.conf 
#event_dispatchers =
event_dispatchers=database
[nchandek@collab-shell etc]$ 
~~~

~~~
[nchandek@collab-shell etc]$ grep -i "filter_project" ceilometer/ceilometer.conf 
#filter_project = gnocchi
filter_project=service
[nchandek@collab-shell etc]$ 
~~~

~~~
[nchandek@collab-shell etc]$ grep -i "archive_policy" ceilometer/ceilometer.conf 
#archive_policy = <None>
archive_policy=low
[nchandek@collab-shell etc]$ 
~~~

~~~
[nchandek@collab-shell etc]$ grep -i "esources_definition_file" ceilometer/ceilometer.conf 
#resources_definition_file = gnocchi_resources.yaml
resources_definition_file=gnocchi_resources.yaml
[nchandek@collab-shell etc]$ 
~~~
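
The grep checks above can be collected into one pass. The following is a minimal sketch, not a ceilometer or gnocchi tool: `effective_options` and the `sample` text are hypothetical names, and the parser only looks at uncommented `option = value` lines, ignoring section headers, so it mirrors the grep output rather than a full config parse.

```python
def effective_options(conf_text, wanted):
    """Return {option: value} for uncommented `option = value` lines.

    Section headers are ignored; if an option appears more than once,
    the last uncommented occurrence is kept.
    """
    found = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, commented-out defaults and [section] headers.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in wanted:
            found[key.strip()] = value.strip()
    return found


# Sample text copied from the grep output in this report.
sample = """\
#meter_dispatchers = database
meter_dispatchers=gnocchi
#event_dispatchers =
event_dispatchers=database
#archive_policy = <None>
archive_policy=low
"""

print(effective_options(sample, {"meter_dispatchers", "archive_policy"}))
```

In practice the text would be read from ceilometer/ceilometer.conf or gnocchi/gnocchi.conf instead of the inline sample.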

2) Error log // less ceilometer/central.log
~~~
2017-07-28 12:05:27.118 134327 INFO ceilometer.agent.manager [-] Polling pollster hardware.memory.swap.total in the context of meter_snmp
2017-07-28 12:05:27.258 134327 INFO ceilometer.agent.manager [-] Polling pollster hardware.memory.used in the context of meter_snmp
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic [-] inspector call failed for hardware.memory.used host overcloud-compute-0.localdomain: invalid literal for int() with base 10: ''
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic Traceback (most recent call last):
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/ceilometer/hardware/pollsters/generic.py", line 159, in get_samples
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic     param=inspector_param))
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/ceilometer/hardware/inspector/snmp.py", line 269, in inspect_generic
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic     suffix)
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/ceilometer/hardware/inspector/snmp.py", line 277, in _post_op_memory_avail_to_used
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic     value = int(cache[self._CACHE_KEY_OID][_memory_total_oid]) - value
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/pyasn1/type/univ.py", line 476, in __int__
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic     def __int__(self): return int(self._value)
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic ValueError: invalid literal for int() with base 10: ''
2017-07-28 12:05:27.290 134327 ERROR ceilometer.hardware.pollsters.generic 
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic [-] inspector call failed for hardware.memory.used host overcloud-controller-0.localdomain: invalid literal for int() with base 10: ''
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic Traceback (most recent call last):
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/ceilometer/hardware/pollsters/generic.py", line 159, in get_samples
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic     param=inspector_param))
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/ceilometer/hardware/inspector/snmp.py", line 269, in inspect_generic
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic     suffix)
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/ceilometer/hardware/inspector/snmp.py", line 277, in _post_op_memory_avail_to_used
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic     value = int(cache[self._CACHE_KEY_OID][_memory_total_oid]) - value
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/pyasn1/type/univ.py", line 476, in __int__
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic     def __int__(self): return int(self._value)
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic ValueError: invalid literal for int() with base 10: ''
~~~

3) # less gnocchi/app.log // We checked `swift` service is running
~~~
2017-07-28 12:10:33.634 16221 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-07-28 12:10:33.634 16221 ERROR swiftclient [-] Container GET failed: http://10.137.66.223:8080/v1/AUTH_14df692169d14714a774ecf44e092b70/gnocchi.669d78fc-0a8e-443a-8f8e-53daf349fb9a?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-07-28 12:10:33.634 16221 ERROR swiftclient Traceback (most recent call last):
2017-07-28 12:10:33.634 16221 ERROR swiftclient   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1647, in _retry
2017-07-28 12:10:33.634 16221 ERROR swiftclient     service_token=self.service_token, **kwargs)
2017-07-28 12:10:33.634 16221 ERROR swiftclient   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 873, in get_container
2017-07-28 12:10:33.634 16221 ERROR swiftclient     service_token=service_token, headers=headers)
2017-07-28 12:10:33.634 16221 ERROR swiftclient   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 917, in get_container
2017-07-28 12:10:33.634 16221 ERROR swiftclient     raise ClientException.from_response(resp, 'Container GET failed', body)
2017-07-28 12:10:33.634 16221 ERROR swiftclient ClientException: Container GET failed: http://10.137.66.223:8080/v1/AUTH_14df692169d14714a774ecf44e092b70/gnocchi.669d78fc-0a8e-443a-8f8e-53daf349fb9a?format=json 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-07-28 12:10:33.634 16221 ERROR swiftclient 
~~~

4) # less gnocchi/metricd.log // swift is not processing the request. 
~~~
2017-07-28 12:09:12.917 928233 INFO swiftclient [-] REQ: curl -i http://10.137.66.223:8080/v1/AUTH_14df692169d14714a774ecf44e092b70/measure/328a2b89-fec3-4e7b-9df3-45f6ec9b67fb/b1eb8035-5581-43a1-9f38-be0d9b2584f8_20170728_08%3A33%3A09 -X GET -H "X-Auth-Token: 95a302f61f5046b8..."
2017-07-28 12:09:12.917 928233 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-07-28 12:09:12.918 928233 INFO swiftclient [-] RESP HEADERS: {u'Date': u'Fri, 28 Jul 2017 12:09:12 GMT', u'Content-Length': u'70', u'Content-Type': u'text/html; charset=UTF-8', u'X-Trans-Id': u'txcf196812b6844995ae448-00597b2968'}
2017-07-28 12:09:12.918 928233 INFO swiftclient [-] RESP BODY: <html><h1>Not Found</h1><p>The resource could not be found.</p></html>
2017-07-28 12:09:12.918 928233 ERROR swiftclient [-] Object GET failed: http://10.137.66.223:8080/v1/AUTH_14df692169d14714a774ecf44e092b70/measure/328a2b89-fec3-4e7b-9df3-45f6ec9b67fb/b1eb8035-5581-43a1-9f38-be0d9b2584f8_20170728_08%3A33%3A09 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-07-28 12:09:12.918 928233 ERROR swiftclient Traceback (most recent call last):
2017-07-28 12:09:12.918 928233 ERROR swiftclient   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1647, in _retry
2017-07-28 12:09:12.918 928233 ERROR swiftclient     service_token=self.service_token, **kwargs)
2017-07-28 12:09:12.918 928233 ERROR swiftclient   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1139, in get_object
2017-07-28 12:09:12.918 928233 ERROR swiftclient     raise ClientException.from_response(resp, 'Object GET failed', body)
2017-07-28 12:09:12.918 928233 ERROR swiftclient ClientException: Object GET failed: http://10.137.66.223:8080/v1/AUTH_14df692169d14714a774ecf44e092b70/measure/328a2b89-fec3-4e7b-9df3-45f6ec9b67fb/b1eb8035-5581-43a1-9f38-be0d9b2584f8_20170728_08%3A33%3A09 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-07-28 12:09:12.918 928233 ERROR swiftclient 
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara [-] Error processing new measures
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara Traceback (most recent call last):
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/gnocchi/storage/_carbonara.py", line 508, in process_new_measures
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara     with self._process_measure_for_metric(metric) as measures:
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara     return self.gen.next()
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/gnocchi/storage/swift.py", line 196, in _process_measure_for_metric
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara     self.MEASURE_PREFIX, f['name'])
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1753, in get_object
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara     headers=headers)
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1647, in _retry
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara     service_token=self.service_token, **kwargs)
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1139, in get_object
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara     raise ClientException.from_response(resp, 'Object GET failed', body)
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara ClientException: Object GET failed: http://10.137.66.223:8080/v1/AUTH_14df692169d14714a774ecf44e092b70/measure/328a2b89-fec3-4e7b-9df3-45f6ec9b67fb/b1eb8035-5581-43a1-9f38-be0d9b2584f8_20170728_08%3A33%3A09 404 Not Found  [first 60 chars of response] <html><h1>Not Found</h1><p>The resource could not be found.<
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara 
2017-07-28 12:09:19.388 928190 WARNING gnocchi.cli [-] Metric processing lagging scheduling rate. It is recommended to increase the number of workers or to lengthen processing interval.
(END)
~~~
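
When several logs are involved, as here, it can help to see which loggers actually raise errors before reading individual tracebacks. The sketch below is an illustration only: `summarize` and `LINE_RE` are hypothetical names, and the regular expression assumes the "timestamp PID LEVEL logger message" prefix seen in the excerpts in this report. It counts only lines that start a new message (marked by `[-]`), so traceback continuation lines are not counted.

```python
import re
from collections import Counter

# Prefix of the log lines quoted in this report:
# "<date> <time> <pid> <LEVEL> <logger> <message>"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ \d+ (\w+) (\S+) (.*)$"
)


def summarize(log_text):
    """Count ERROR/CRITICAL messages per logger, skipping traceback lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        level, logger, rest = match.groups()
        if level in ("ERROR", "CRITICAL") and rest.startswith("[-] "):
            counts[logger] += 1
    return counts


# Lines copied from the excerpts in this report.
sample = """\
2017-07-28 12:09:12.917 928233 INFO swiftclient [-] RESP STATUS: 404 Not Found
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara [-] Error processing new measures
2017-07-28 12:09:12.919 928233 ERROR gnocchi.storage._carbonara Traceback (most recent call last):
2017-07-28 08:50:17.572 16796 CRITICAL gnocchi [-] ClientException: Authorization Failure. Authorization Failed: Service Unavailable (HTTP 503)
"""

for logger, count in summarize(sample).items():
    print(logger, count)
```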

5) We observed that the service is unavailable
~~~
2017-07-28 08:50:17.572 16796 CRITICAL gnocchi [-] ClientException: Authorization Failure. Authorization Failed: Service Unavailable (HTTP 503)
2017-07-28 08:50:17.572 16796 ERROR gnocchi Traceback (most recent call last):
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/bin/gnocchi-statsd", line 10, in <module>
2017-07-28 08:50:17.572 16796 ERROR gnocchi     sys.exit(statsd())
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/cli.py", line 74, in statsd
2017-07-28 08:50:17.572 16796 ERROR gnocchi     statsd_service.start()
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 174, in start
2017-07-28 08:50:17.572 16796 ERROR gnocchi     stats = Stats(conf)
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/statsd.py", line 38, in __init__
2017-07-28 08:50:17.572 16796 ERROR gnocchi     self.storage = storage.get_driver(self.conf)
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/storage/__init__.py", line 158, in get_driver
2017-07-28 08:50:17.572 16796 ERROR gnocchi     return get_driver_class(conf)(conf.storage)
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/gnocchi/storage/swift.py", line 98, in __init__
2017-07-28 08:50:17.572 16796 ERROR gnocchi     self.swift.put_container(self.MEASURE_PREFIX)
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1728, in put_container
2017-07-28 08:50:17.572 16796 ERROR gnocchi     query_string=query_string)
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1635, in _retry
2017-07-28 08:50:17.572 16796 ERROR gnocchi     self.url, self.token = self.get_auth()
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1587, in get_auth
2017-07-28 08:50:17.572 16796 ERROR gnocchi     timeout=self.timeout)
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 662, in get_auth
2017-07-28 08:50:17.572 16796 ERROR gnocchi     auth_version=auth_version)
2017-07-28 08:50:17.572 16796 ERROR gnocchi   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 582, in get_auth_keystone
2017-07-28 08:50:17.572 16796 ERROR gnocchi     raise ClientException('Authorization Failure. %s' % err)
2017-07-28 08:50:17.572 16796 ERROR gnocchi ClientException: Authorization Failure. Authorization Failed: Service Unavailable (HTTP 503
~~~


We performed the steps below. // Note: gnocchi packages are up to date.

b) Updated the swift client packages as well.

14) Modify the parameters below and restart the services. Please do this on all controllers.

a) Disable the Ceilometer Swift Middleware completely in /etc/swift/proxy-server.conf as below: Remove the Ceilometer middleware from the [pipeline:main] section
~~~
[....]
[pipeline:main]
#pipeline = catch_errors healthcheck proxy-logging cache ratelimit bulk tempurl formpost authtoken keystone staticweb versioned_writes ceilometer proxy-logging proxy-server

pipeline = catch_errors healthcheck proxy-logging cache ratelimit bulk tempurl formpost authtoken keystone staticweb versioned_writes proxy-logging proxy-server
[....]
~~~
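
Because this change has to be repeated on every controller, the pipeline edit can be expressed as a small text transformation. This is a sketch under assumptions: `drop_filter` is a hypothetical helper, not part of swift or ceilometer, and it operates on a single `pipeline = ...` line rather than on the whole proxy-server.conf file.

```python
def drop_filter(pipeline_line, name):
    """Return the `pipeline = ...` line with every `name` entry removed."""
    key, sep, value = pipeline_line.partition("=")
    if not sep:
        raise ValueError("expected a 'pipeline = ...' line")
    kept = [item for item in value.split() if item != name]
    return "{} = {}".format(key.strip(), " ".join(kept))


# The original pipeline line from /etc/swift/proxy-server.conf in this report.
before = (
    "pipeline = catch_errors healthcheck proxy-logging cache ratelimit bulk "
    "tempurl formpost authtoken keystone staticweb versioned_writes "
    "ceilometer proxy-logging proxy-server"
)
after = drop_filter(before, "ceilometer")
print(after)
```

The result matches the uncommented pipeline line shown in the step above; all other filters keep their order.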

b) Comment out the [filter:ceilometer] section

~~~
[filter:ceilometer]
#paste.filter_factory = ceilometermiddleware.swift:filter_factory

#url = rabbit://guest:cP6xVkMPRYbCfhQK9usDjPqA8@XXX.X.X.XX:5672,guest:cP6xVkMPRYbCfhQK9usDjPqA8@XXX.X.X.XX:5672,guest:cP6xVkMPRYbCfhQK9usDjPqA8@XXX.X.X.XX:5672//
~~~

c) Restart the services. 
~~~
systemctl restart openstack-ceilometer* && systemctl restart openstack-gnocchi* && systemctl restart openstack-swift* && systemctl restart openstack-httpd*
~~~


Version-Release number of selected component (if applicable):


How reproducible:

OSP-10 

[root@overcloud-controller-0 ceilometer]# rpm -qa|grep gnocchi
                openstack-gnocchi-indexer-sqlalchemy-3.0.11-1.el7ost.noarch
                openstack-gnocchi-metricd-3.0.11-1.el7ost.noarch
                puppet-gnocchi-9.4.1-1.el7ost.noarch
                python-gnocchi-3.0.11-1.el7ost.noarch
                openstack-gnocchi-statsd-3.0.11-1.el7ost.noarch
                openstack-gnocchi-common-3.0.11-1.el7ost.noarch
                python-gnocchi-tests-3.0.11-1.el7ost.noarch
                python-gnocchiclient-2.8.2-2.el7ost.noarch
                openstack-gnocchi-carbonara-3.0.11-1.el7ost.noarch
                openstack-gnocchi-api-3.0.11-1.el7ost.noarch
                [root@overcloud-controller-0 ceilometer]#



Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 7 Mehdi ABAAKOUK 2017-08-03 04:40:20 EDT
The issue comes from this backtrace:

2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic     value = int(cache[self._CACHE_KEY_OID][_memory_total_oid]) - value
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic   File "/usr/lib/python2.7/site-packages/pyasn1/type/univ.py", line 476, in __int__
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic     def __int__(self): return int(self._value)
2017-07-28 12:05:27.329 134327 ERROR ceilometer.hardware.pollsters.generic ValueError: invalid literal for int() with base 10: ''

It shows an issue when a host doesn't provide the total memory via SNMP: the cached value is empty, so int() raises the ValueError above.

I proposed a fix upstream two days ago; it's currently under review.
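
The failure mode in the backtrace can be reproduced and guarded against in isolation. The following is only an illustration of the idea, not the upstream patch: `memory_used` is a hypothetical function, and a plain empty string stands in for the empty SNMP value that the pyasn1 object wrapped in the traceback.

```python
def memory_used(total_raw, avail):
    """Return total - avail, or None when the host reported no total memory.

    `total_raw` stands in for the cached SNMP value. An empty string models
    the case in the traceback, where int() raised ValueError.
    """
    try:
        total = int(total_raw)
    except (TypeError, ValueError):
        # The host did not provide total memory; skip this sample
        # instead of failing the whole pollster.
        return None
    return total - avail


print(memory_used("2048", 512))  # normal case
print(memory_used("", 512))      # host without a total memory value
```

Returning `None` here is a design choice for the sketch; the real pollster could equally log a warning and emit no sample for that host.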
Comment 10 Mehdi ABAAKOUK 2017-08-28 08:28:30 EDT
The change has been merged upstream. I will backport it downstream and build a package.
Comment 19 Sasha Smolyak 2017-09-25 04:31:26 EDT
The hotfix was supplied to the customer, waiting for customer to verify the fix.
Comment 21 errata-xmlrpc 2017-09-28 12:34:13 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2822