Bug 1715114
| Summary: | tests.scenario.test_aodh_alarm.AodhAlarmTest.test_alarm fails to receive expected alarm | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Victor Voronkov <vvoronko> |
| Component: | openstack-aodh | Assignee: | Vinay Kapalavai <vkapalav> |
| Status: | CLOSED DUPLICATE | QA Contact: | Nataf Sharabi <nsharabi> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 15.0 (Stein) | CC: | apevec, jschluet, lhh, mburns, mcornea, mmagr, ramishra, sbaker, shardy |
| Target Milestone: | ga | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-17 14:37:45 UTC | Type: | Bug |
Description
Victor Voronkov
2019-05-29 15:29:57 UTC
This seems to be a Python 3 issue in aodh: as the traceback shows, the member list contains b'...' entries, because the IDs are converted to bytes in Python 3. Alarm evaluators can't join the group, hence alarms are not evaluated.
I don't know whether we still support aodh in OSP15. Anyway, this should go to DFG:MetMon.
: 3a2912f6-031f-45f2-9bc4-0ef0e1b49845 extract_my_subset /usr/lib/python3.6/site-packages/aodh/coordination.py:227
2019-05-30 07:41:43.998 20 WARNING aodh.coordination [-] Cannot extract tasks because agent failed to join group properly. Rejoining group.
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator [-] alarm evaluation cycle failed: aodh.coordination.MemberNotInGroupError: Group ID: alarm_evaluator, Members: {b'3a2912f6-031f-45f2-9bc4-0ef0e1b49845', b'd58cfc59-daaf-4b7d-8476-19f7e18ddb77', b'0d5e8674-1c2e-4884-910f-887dd62fc53c'}, Me: 3a2912f6-031f-45f2-9bc4-0ef0e1b49845: Current agent is not part of group and cannot take tasks
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator Traceback (most recent call last):
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/aodh/evaluator/__init__.py", line 249, in _evaluate_assigned_alarms
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator alarms = self._assigned_alarms()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/aodh/evaluator/__init__.py", line 278, in _assigned_alarms
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator self.PARTITIONING_GROUP_NAME, all_alarm_ids)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 292, in wrapped_f
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator return self.call(f, *args, **kw)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 358, in call
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator do = self.iter(retry_state=retry_state)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 331, in iter
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator raise retry_exc.reraise()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 167, in reraise
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator raise self.last_attempt.result()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator return self.__get_result()
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator raise self._exception
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/tenacity/__init__.py", line 361, in call
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator result = fn(*args, **kwargs)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator File "/usr/lib/python3.6/site-packages/aodh/coordination.py", line 234, in extract_my_subset
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator raise MemberNotInGroupError(group_id, members, self._my_id)
2019-05-30 07:41:44.004 20 ERROR aodh.evaluator aodh.coordination.MemberNotInGroupError: Group ID: alarm_evaluator, Members: {b'3a2912f6-031f-45f2-9bc4-0ef0e1b49845', b'd58cfc59-daaf-4b7d-8476-19f7e18ddb77', b'0d5e8674-1c2e-4884-910f-887dd62fc53c'}, Me: 3a2912f6-031f-45f2-9bc4-0ef0e1b49845: Current agent is not part of group and cannot take tasks
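The mismatch visible in the exception above (members printed with a b'' prefix, "Me" printed without one) can be reproduced outside aodh. The snippet below is only a minimal sketch, not the aodh code: it assumes the agent ID is a str and the member set holds bytes, as the log message suggests, and `is_member` is a hypothetical helper written for this illustration.

```python
# Member IDs copied from the MemberNotInGroupError message above (bytes).
members = {
    b"3a2912f6-031f-45f2-9bc4-0ef0e1b49845",
    b"d58cfc59-daaf-4b7d-8476-19f7e18ddb77",
    b"0d5e8674-1c2e-4884-910f-887dd62fc53c",
}

# Assumption: the agent's own ID is a str, since the log prints it without b''.
my_id = "3a2912f6-031f-45f2-9bc4-0ef0e1b49845"


def to_bytes(value):
    """Encode str values to bytes; leave bytes values unchanged."""
    return value.encode("ascii") if isinstance(value, str) else value


def is_member(agent_id, group_members):
    """Hypothetical helper: compare IDs after normalising both sides to bytes."""
    return to_bytes(agent_id) in {to_bytes(m) for m in group_members}


# In Python 3 a str never compares equal to bytes, so the plain check fails.
naive = my_id in members

# After encoding the ID, the same agent is found in the group.
normalised = is_member(my_id, members)

print("naive check:", naive)
print("normalised check:", normalised)
```

Under Python 2 the plain check would have succeeded, because str and bytes were the same type there.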
I fixed this issue in your setup by encoding the UUIDs, but then it failed with another error:
2019-05-30 08:38:37.181 19 DEBUG aodh.coordination [-] My subset: ['0fa4768d-6793-4bb7-9410-db47f807ebdf'] extract_my_subset /usr/lib/python3.6/site-packages/aodh/coordination.py:240
2019-05-30 08:38:37.181 19 INFO aodh.evaluator [-] initiating evaluation cycle on 1 alarms
2019-05-30 08:38:37.181 19 DEBUG aodh.evaluator [-] evaluating alarm 0fa4768d-6793-4bb7-9410-db47f807ebdf _evaluate_alarm /usr/lib/python3.6/site-packages/aodh/evaluator/__init__.py:263
2019-05-30 08:38:37.182 19 DEBUG aodh.evaluator.threshold [-] query stats from 2019-05-30 08:36:37.182117 to 2019-05-30 08:38:37.182117 _bound_duration /usr/lib/python3.6/site-packages/aodh/evaluator/threshold.py:71
2019-05-30 08:38:37.726 19 WARNING aodh.evaluator.gnocchi [-] alarm statistics retrieval failed: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at
[no address given] to inform them of the time this error occurred,
and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
(HTTP 500): gnocchiclient.exceptions.ClientException: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
2019-05-30 08:38:37.727 19 DEBUG aodh.queue [-] alarm 0fa4768d-6793-4bb7-9410-db47f807ebdf has no action configured for state transition from insufficient data to state insufficient data, skipping the notification. notify /usr/lib/python3.6/site-packages/aodh/queue.py:48
This tells me there is some configuration issue, since the alarm statistics retrieval from Gnocchi fails with an HTTP 500.
*** This bug has been marked as a duplicate of bug 1716350 ***