Description of problem:
heat_tempest_plugin.tests.scenario.test_aodh_alarm.AodhAlarmTest.test_alarm[id-fc0f18a6-f65c-4df1-b9c5-e160dea59849] fails to receive the expected alarm.

http://staging-jenkins2-qe-playground.usersys.redhat.com/view/DFG/view/cloud_apps/view/heat/job/DFG-cloud_apps-heat-15_director-rhel-virthost-3cont_1comp_3ceph-ipv4-geneve-poc/12/testReport/heat_tempest_plugin.tests.scenario.test_aodh_alarm/AodhAlarmTest/test_alarm_id_fc0f18a6_f65c_4df1_b9c5_e160dea59849_/

WARNING [heat_tempest_plugin.tests.scenario.test_aodh_alarm] check_instance_count exp:2, act:1
WARNING [heat_tempest_plugin.tests.scenario.test_aodh_alarm] check_instance_count exp:2, act:1
WARNING [heat_tempest_plugin.tests.scenario.test_aodh_alarm] check_instance_count exp:2, act:1

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/heat_tempest_plugin/tests/scenario/test_aodh_alarm.py", line 64, in test_alarm
    120, 2, self.check_instance_count, stack_identifier, 2))
  File "/usr/lib/python3.6/site-packages/unittest2/case.py", line 705, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

Version-Release number of selected component (if applicable):

How reproducible:
Run the regression tempest job on heat.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
This seems to be a python3 issue in aodh: as you can see in the traceback, the entries in the member list carry a b' prefix because they are converted to bytes in python3, while "Me" is printed without it. The alarm evaluators can't join the group, hence alarms are not evaluated. I don't know whether we still support aodh in OSP15. Anyway, this should go to DFG:MetMon.

: 3a2912f6-031f-45f2-9bc4-0ef0e1b49845 extract_my_subset /usr/lib/python3.6/site-packages/aodh/coordination.py:227
2019-05-30 07:41:43.998 20 WARNING aodh.coordination [-] Cannot extract tasks because agent failed to join group properly. Rejoining group.
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator [-] alarm evaluation cycle failed: aodh.coordination.MemberNotInGroupError: Group ID: alarm_evaluator, Members: {b'3a2912f6-031f-45f2-9bc4-0ef0e1b49845', b'd58cfc59-daaf-4b7d-8476-19f7e18ddb77', b'0d5e8674-1c2e-4884-910f-887dd62fc53c'}, Me: 3a2912f6-031f-45f2-9bc4-0ef0e1b49845: Current agent is not part of group and cannot take tasks
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator Traceback (most recent call last):
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/aodh/evaluator/__init__.py", line 249, in _evaluate_assigned_alarms
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     alarms = self._assigned_alarms()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/aodh/evaluator/__init__.py", line 278, in _assigned_alarms
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     self.PARTITIONING_GROUP_NAME, all_alarm_ids)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     return self.call(f, *args, **kw)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     do = self.iter(retry_state=retry_state)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 331, in iter
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     raise retry_exc.reraise()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 167, in reraise
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     raise self.last_attempt.result()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     return self.__get_result()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     raise self._exception
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     result = fn(*args, **kwargs)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator   File "/usr/lib/python3.6/site-packages/aodh/coordination.py", line 234, in extract_my_subset
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator     raise MemberNotInGroupError(group_id, members, self._my_id)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator aodh.coordination.MemberNotInGroupError: Group ID: alarm_evaluator, Members: {b'3a2912f6-031f-45f2-9bc4-0ef0e1b49845', b'd58cfc59-daaf-4b7d-8476-19f7e18ddb77', b'0d5e8674-1c2e-4884-910f-887dd62fc53c'}, Me: 3a2912f6-031f-45f2-9bc4-0ef0e1b49845: Current agent is not part of group and cannot take tasks

I did fix this issue in your setup by encoding the UUIDs, but it then failed with another error:
2019-05-30 08:38:37.181 19 DEBUG aodh.coordination [-] My subset: ['0fa4768d-6793-4bb7-9410-db47f807ebdf'] extract_my_subset /usr/lib/python3.6/site-packages/aodh/coordination.py:240
2019-05-30 08:38:37.181 19 INFO aodh.evaluator [-] initiating evaluation cycle on 1 alarms
2019-05-30 08:38:37.181 19 DEBUG aodh.evaluator [-] evaluating alarm 0fa4768d-6793-4bb7-9410-db47f807ebdf _evaluate_alarm /usr/lib/python3.6/site-packages/aodh/evaluator/__init__.py:263
2019-05-30 08:38:37.182 19 DEBUG aodh.evaluator.threshold [-] query stats from 2019-05-30 08:36:37.182117 to 2019-05-30 08:38:37.182117 _bound_duration /usr/lib/python3.6/site-packages/aodh/evaluator/threshold.py:71
2019-05-30 08:38:37.726 19 WARNING aodh.evaluator.gnocchi [-] alarm statistics retrieval failed: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or misconfiguration and was unable to complete your request.</p>
<p>Please contact the server administrator at [no address given] to inform them of the time this error occurred, and the actions you performed just before this error.</p>
<p>More information about this error may be available in the server error log.</p>
</body></html> (HTTP 500): gnocchiclient.exceptions.ClientException: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
2019-05-30 08:38:37.727 19 DEBUG aodh.queue [-] alarm 0fa4768d-6793-4bb7-9410-db47f807ebdf has no action configured for state transition from insufficient data to state insufficient data, skipping the notification. notify /usr/lib/python3.6/site-packages/aodh/queue.py:48

This tells me there is some config issue, as aodh can't retrieve the alarm statistics: the request to Gnocchi comes back with HTTP 500.
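For reference, the first failure (MemberNotInGroupError) can be reproduced outside aodh. The snippet below is only a minimal sketch of the bytes/str mismatch seen in the log, not the aodh code and not necessarily the exact change applied on the setup; the helper name is_member is made up for illustration, and it assumes the member IDs are plain ASCII UUIDs.

```python
# Illustrative sketch only: reproduces the bytes/str mismatch from the log.
# This is NOT the aodh implementation; is_member() is a hypothetical helper.

# Member IDs as reported in the error message (bytes under Python 3).
members = {
    b"3a2912f6-031f-45f2-9bc4-0ef0e1b49845",
    b"d58cfc59-daaf-4b7d-8476-19f7e18ddb77",
    b"0d5e8674-1c2e-4884-910f-887dd62fc53c",
}

# The agent's own ID, shown as "Me" in the error message (a str).
my_id = "3a2912f6-031f-45f2-9bc4-0ef0e1b49845"


def to_bytes(value):
    """Encode str IDs to bytes; leave bytes untouched (assumes ASCII UUIDs)."""
    return value.encode("ascii") if isinstance(value, str) else value


def is_member(member_id, group_members):
    """Compare IDs after normalizing both sides to bytes."""
    return to_bytes(member_id) in {to_bytes(m) for m in group_members}


# In Python 3 a str never compares equal to bytes, so the naive check fails
# even though the UUID text is identical.
naive_result = my_id in members

# Encoding the UUID before the comparison makes the check succeed.
fixed_result = is_member(my_id, members)

print("naive check:", naive_result)
print("after encoding:", fixed_result)
```

Under Python 2 both sides would have been str, so the naive membership test would have matched; that fits the report that the problem only shows up with python3.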
*** This bug has been marked as a duplicate of bug 1716350 ***