Bug 1484078

Summary: Panko: mysql bigint support
Product: Red Hat OpenStack Reporter: AlbertoG <agarciam>
Component: openstack-pankoAssignee: Julien Danjou <jdanjou>
Status: CLOSED ERRATA QA Contact: Sasha Smolyak <ssmolyak>
Severity: high Docs Contact:
Priority: high    
Version: 11.0 (Ocata)CC: jdanjou, mschuppe, pkilambi, rcernin, ssmolyak
Target Milestone: rcKeywords: Triaged
Target Release: 12.0 (Pike)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-panko-3.0.0-2.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1495270 (view as bug list) Environment:
Last Closed: 2017-12-13 21:53:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1495270    

Description AlbertoG 2017-08-22 15:48:25 UTC
Description of problem:

Value column of panko trait_int table is int, which is not big enough to store network usage data. Therefore out of range errors like the one pasted below are raised constantly:

ERROR ceilometer.event.storage.impl_sqlalchemy [-] Failed to record event: (_mysql_exceptions.DataError) (1264, "Out of range value for column 'value' at row 1") [SQL: u'INSERT INTO trait_int (event_id, `key`, value) VALUES (%s, %s, %s)'] [parameters: (256869L, 'bytes_sent', 2644705280)]

Version-Release number of selected component (if applicable):


How reproducible:
With ceilometer events enabled and panko configured as event backend, create a virtual machine which sends traffic for a considerable amount of time.

Steps to Reproduce:
1.
2.
3.

Actual results:
Out of range errors in event collector

Expected results:
Event stored correctly in panko database

Additional info:

Comment 5 Martin Schuppert 2017-09-04 06:27:29 UTC
When we have the situation, messages got stuck in the event.sample queue:

# rabbitmqctl list_queues name consumers messages | grep event
event.sample	3	43492

Strangely, just purging the queue left 3 messages on it (repeated the purge 2-3 times).  Only after stopping the ceilometer-collector services on all controllers did result in being able to purge the remaining messages. It seems those messages are the messages result in the bigint error, like:

2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy [-] Failed to record event: (pymysql.err.DataError) (1264, u"Out of range value for column 'value' at row 1") [SQL: u'INSERT INTO trait_int (event_id, `key`, value) VALUES (%(event_id)s, %(key)s, %(value)s)'] [parameters: {'event_id': 68727326, 'value': 3280076800, 'key': u'bytes_sent'}]
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy Traceback (most recent call last):
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/panko/storage/impl_sqlalchemy.py", line 197, in record_events
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     trait_map[dtype])
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 1046, in execute
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     bind, close_with_result=True).execute(clause, params or {})
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     return meth(self, multiparams, params)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     return connection._execute_clauseelement(self, multiparams, params)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     compiled_sql, distilled_params
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     context)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     util.raise_from_cause(newraise, exc_info)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     reraise(type(exception), exception, tb=exc_tb)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     context)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     cursor.execute(statement, parameters)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 166, in execute
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     result = self._query(query)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 322, in _query
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     conn.query(q)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 841, in query
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1029, in _read_query_result
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     result.read()
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1312, in read
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     first_packet = self.connection._read_packet()
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 991, in _read_packet
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     packet.check_error()
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 393, in check_error
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     err.raise_mysql_exception(self._data)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy   File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy     raise errorclass(errno, errval)
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy DBDataError: (pymysql.err.DataError) (1264, u"Out of range value for column 'value' at row 1") [SQL: u'INSERT INTO trait_int (event_id, `key`, value) VALUES (%(event_id)s, %(key)s, %(value)s)'] [parameters: {'event_id': 68727326, 'value': 3280076800, 'key': u'bytes_sent'}]
2017-09-01 12:16:13.891 58769 ERROR panko.storage.impl_sqlalchemy
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector [-] Dispatcher failed to handle the notification, re-queuing it.
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector Traceback (most recent call last):
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector   File "/usr/lib/python2.7/site-packages/ceilometer/collector.py", line 177, in sample
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector     self.dispatcher_manager.map_method(self.method, goods)
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 282, in map_method
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector     method_name, *args, **kwds)
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 255, in map
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector     self._invoke_one_plugin(response.append, func, e, args, kwds)
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 286, in _invoke_one_plugin
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector     response_callback(func(e, *args, **kwds))
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector   File "/usr/lib/python2.7/site-packages/stevedore/extension.py", line 260, in _call_extension_method
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector     return getattr(extension.obj, method_name)(*args, **kwds)
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector   File "/usr/lib/python2.7/site-packages/panko/dispatcher/database.py", line 68, in record_events
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector     self.conn.record_events(event_list)
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector   File "/usr/lib/python2.7/site-packages/panko/storage/impl_sqlalchemy.py", line 206, in record_events
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector     raise error
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector DBDataError: (pymysql.err.DataError) (1264, u"Out of range value for column 'value' at row 1") [SQL: u'INSERT INTO trait_int (event_id, `key`, value) VALUES (%(event_id)s, %(key)s, %(value)s)'] [parameters: {'event_id': 68727326, 'value': 3280076800, 'key': u'bytes_sent'}]
2017-09-01 12:16:13.894 58769 ERROR ceilometer.collector

This time we see more than just the same bigint in the logs, e.g.:

[root@gotpospctrl1 ceilometer]# grep "Out of range" collector.log | sed "s/.*'value': \([0-9]*\).*/\1/" | sort -u
11045634048
2207252480
2361524224
2422603776
3280076800

Trying to get the message from the queue manually to check them failed with 'No items' output.  We had to again stop all openstack-ceilometer-collector services (on all controllers) for the queue to be "released", then we see e.g. the can get the message:

+--------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+-------------+
| routing_key  |  exchange  | message_count |                                                                                                                                                                                                                                                                                                                                                                                                                                                                      payload                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | payload_bytes | payload_encoding | redelivered |
+--------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+-------------+
| event.sample | ceilometer | 15            | {"oslo.message": "{\"priority\": \"SAMPLE\", \"_unique_id\": \"3788b4c22d594004a0b523ac53df4ec6\", \"event_type\": \"event\", \"timestamp\": \"2017-08-30 13:31:32.701755\", \"publisher_id\": \"telemetry.publisher.controller3.localdomain\", \"payload\": [{\"event_type\": \"image.send\", \"traits\": [[\"user_id\", 1, \"a36cfa19dcc9473a9988bb0e9c243b5a\"], [\"service\", 1, \"image.localhost\"], [\"bytes_sent\", 2, 3280076800], [\"receiver_project\", 1, \"92b71e9e47de4881a2ba517ebecb7f3d\"], [\"receiver_user\", 1, \"6335a8bf979043e08a88061f881af92a\"], [\"image_id\", 1, \"e6732b61-f5ab-4eb3-8e9c-d0a3442ee909\"]], \"message_signature\": \"a37939219ae78c7c3b62a2ed5e20da003e7f66489ea43ed5c113531fef7711f3\", \"raw\": {}, \"generated\": \"2017-08-30T13:31:30.835734\", \"message_id\": \"ec472fcf-7cf8-4bcc-919b-2c1f19a1469b\"}], \"message_id\": \"2428730e-18b1-4fe8-857c-bd6f0d773f17\"}", "oslo.version": "2.0"} | 913           | string           | True        |
+--------------+------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+------------------+-------------+

So to cleanup rabbit from this issue we need to stop openstack-ceilometer-collector and then we can purge the messages using a bigint value.

Comment 22 errata-xmlrpc 2017-12-13 21:53:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462