Bug 660289
| Summary: | [RFE] Configurable queue threshold alerts | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Gordon Sim <gsim> | ||||
| Component: | qpid-cpp | Assignee: | Gordon Sim <gsim> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Frantisek Reznicek <freznice> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 1.3 | CC: | esammons, freznice, iboverma, jneedle, mhusnain, tross | ||||
| Target Milestone: | 2.0 | Keywords: | FutureFeature | ||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | qpid-cpp-mrg-0.9.1073306-1 | Doc Type: | Enhancement | ||||
| Doc Text: |
Cause:
Growing queue depth is one of the common issues that needs to be monitored for by those responsible for the smooth running of messaging based systems.
Consequence:
Previously there was no way to get warned of growing queues (either had to wait until a limit was hit and the application had to deal with the brokers refusal to accept any more, or needed to continually poll the depth).
Change:
The broker now allows QMF events to be generated when the queue depth reaches particular pre-configured limits.
Result:
Systems can now more easily incorporate early warnings of message build up in queues.
Release Note Entry:
Previously, messaging was unable to monitor the growing queue depth aside from polling constantly or waiting till the maximum level was reached to issue a warning. The broker now allows QMF events to be generated when the queue depth reaches a previously configured threshold, which provides an early warning for elongated messaging queues.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-06-23 15:48:27 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | 683599 | ||||||
| Bug Blocks: | |||||||
| Attachments: |
|
||||||
|
Description
Gordon Sim
2010-12-06 11:52:38 UTC
For an example/test, see http://svn.apache.org/viewvc/qpid/trunk/qpid/tests/src/py/qpid_tests/broker_0_10/threshold.py?view=markup There are two new queue arguments - qpid.alert_count and qpid.alert_size - which can be used to control the point at which an alert is generated. An alert is generated when a message is added to the queue and the size or count is at or above the configured alert limit. By default an alert will be generated when reaching 80% of any configured queue limit. To avoid excess traffic, once an alert has been raised a further event will not be reissued for a queue for a configurable period. This can be controlled via another queue argument - qpid.alert_repeat_gap - which specifies the minimum interval in seconds between events. The default value for this is one minute (60 seconds). (To comply with the argument naming already supported in the java broker the following aliases for these arguments are also recognised: x-qpid-maximum-message-count, x-qpid-maximum-message-size, x-qpid-minimum-alert-repeat-gap).
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause:
Growing queue depth is one of the common issues that needs to be monitored for by those responsible for the smooth running of messaging based systems.
Consequence:
Previously there was no way to get warned of growing queues (either had to wait until a limit was hit and the application had to deal with the brokers refusal to accept any more, or needed to continually poll the depth).
Change:
The broker now allows QMF events to be generated when the queue depth reaches particular pre-configured limits.
Result:
Systems can now more easily incorporate early warnings of message build up in queues.
The feature was tested on RHEL 4.9 / 5.6 / 6.1snap4 i[36]86 / x86_64 on packages: python-qpid-0.10-1.el5.noarch python-qpid-qmf-0.10-5.el5.x86_64 qpid-cpp-*-0.10-3.el5.x86_64 qpid-dotnet-0.4.738274-2.el5.x86_64 qpid-java-*-0.10-2.el5.noarch qpid-qmf*0.10-5.el5.x86_64 qpid-tests-0.10-1.el5.noarch qpid-tools-0.10-3.el5.noarch rh-qpid-cpp-tests-0.10-3.el5.x86_64 ruby-qpid-qmf-0.10-5.el5.x86_64 sesame-0.10-1.el5.x86_64 There are still 4 open questions which need to be clarified: 1] Alert is triggered once when a condition is met If more than one alert condition is given, then first one which is met triggers the alert, but second one does not. Is this intentional? 2] An alert is triggered when the queue grows first time over the set threshold, no alert triggered for the next occurrences of such threshold limit cross. Is it intentional to not trigger the queue alert when the queue crosses the threshold for second / N-times? Example: Let's assume configured queue alert threshold for 2 messages in the queue 'AQ': - AQ message depth 0 - AQ message depth 1 - AQ message depth 2 -> an queue alert triggered - AQ message depth 1 - AQ message depth 2 -> no queue alert triggered, is this correct? 3] The feature specification (DOC-51693) mentions that broker log file should contain the message about triggered queue alert. I ca see some messages at trace+ log level, but this is related more to QMF traffic, no queue alert messages at debug+ level, so is this requirement still valid? Could you possibly review my observation points, please? 1] No distinction is made as to whether it was a count or a size limit that was exceeded. Once the limit is exceeded based on one of these values, the fact that it is then exceeded also on the other value is not a significant event in the current implementation. 2] The example given *should* trigger an alert on recrossing the threshold after dipping back below. Indeed this is the behaviour I see. Following your example: observe events with: drain -f --count 0 'qmf.default.topic/agent.ind.event.org_apache_qd_broker.queueThresholdExceeded.#' then: qpid-config add queue my-queue --argument 'qpid.alert_count=2' spout my-queue spout my-queue //event observed drain my-queue spout my-queue //further event observed Does this not match what you are seeing? 3] Unfortunately an obvious log message is not generated in the currently packaged code. That was an oversight on my part that has now been addressed upstream (https://issues.apache.org/jira/browse/QPID-3204). Hi Gordon, thanks very much for your reply. ad 1] I understand, I'm going to highlight that for doc bug 683599 ad 2] I might be doing something wrong, but I still can find case when it is not working as expected. 2a] your case mentioned in comment 5 with drain and spout clients is working well and showing that queue alerts are re-triggered. 2b] my unit test does (I believe) the same scenario and I cannot see re-triggered alert, see details below. ad 3] Agreed, logging should show the alerts. I'm going to create defect matching QPID-3204 If customers are still requesting the logging for this feature, then I propose to mark the new bug as dependency for this defect, otherwise w/o dependency. Would you agree? Created attachment 493148 [details]
Custom alert unit tests
The log transcript shows that if the queue threshold is crossed again then no qmf alert is triggered... (case 2b])
I'm suspicious that the difference between yout case 2a] and my 2b] is in the fact that 2b] case use single session and even single sender and receiver.
May I possibly ask you for double-check of my two units HLAPITests.test_qa__A and HLAPITests.test_qa__B, please?
[freznice@dhcp-26-251 qpid_ptest_unit_tests]$ PYTHONPATH=${PYTHONPATH}:custom_py_units qpid-python-test -m custom_py_units custom_py_units.queue.HLAPITests.test_qa__A
custom_py_units.queue.HLAPITests.test_qa__A .............................................................................................. start
-A
None
None
Message(properties={u'qmf.agent': u'apache.org:qpidd:a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.routing-key': u'agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.warn.apache_org.qpidd.a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.app-id': 'qmf2', u'qmf.content': u'_event', u'qmf.opcode': u'_data_indication', u'method': u'indication'}, content=[{u'_schema_id': {u'_package_name': 'org.apache.qpid.broker', u'_class_name': 'queueThresholdExceeded', u'_type': '_event', u'_hash': UUID('9d1754c2-79bc-3767-df74-92d7bd1caa0e')}, u'_timestamp': 1303208400120574878, u'_values': {u'byteDepth': 9, u'msgDepth': 3, u'qName': 'AQA'}, u'_severity': 4}])
None
None
-B
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi1')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi2')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi3')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi4')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi5')
-C
None
None
None
None
None
-D
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi1')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi2')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi3')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi4')
Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi5')
[freznice@dhcp-26-251 qpid_ptest_unit_tests]$ PYTHONPATH=${PYTHONPATH}:custom_py_units qpid-python-test -m custom_py_units custom_py_units.queue.HLAPITests.test_qa__B
custom_py_units.queue.HLAPITests.test_qa__B .............................................................................................. start
-A
None
None
Message(properties={u'qmf.agent': u'apache.org:qpidd:a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.routing-key': u'agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.warn.apache_org.qpidd.a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.app-id': 'qmf2', u'qmf.content': u'_event', u'qmf.opcode': u'_data_indication', u'method': u'indication'}, content=[{u'_schema_id': {u'_package_name': 'org.apache.qpid.broker', u'_class_name': 'queueThresholdExceeded', u'_type': '_event', u'_hash': UUID('9d1754c2-79bc-3767-df74-92d7bd1caa0e')}, u'_timestamp': 1303208429720923060, u'_values': {u'byteDepth': 9, u'msgDepth': 3, u'qName': 'AQB'}, u'_severity': 4}])
None
None
-B
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi1')
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi2')
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi3')
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi4')
-C
None
None
None
None
-D
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi5')
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi2')
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi3')
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi4')
Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi5')
custom_py_units.queue.HLAPITests.test_qa__B .............................................................................................. pass
The above point 3], from comment 5 is now tracked separately as bug 697816. Regarding the unit test failures, I *think* your test may not be acknowledging messages(?), thus the queue depth is not decreased as you expect. Certainly I can't see where the acknowledgement is occurring. Regarding the lack of logging, I myself would say that this need not depend on that. DOC-51693 was a very loose statement of requirements. While having the log entry is important (and should and will be added), this BZ does not in my view mandate it. Thanks again, you're indeed again right with missing message acks. With acknowledges it works as expected. I believe I'll be able to finish this defect in a day or two. Keeping bug 697816 separate from this defect. This is the testing status: - raising single queue event works just fine - message count threshold - message[s] byte size threshold - message count & byte size thresholds together - raising multiple queue event by multiple crossing the threshold work fine - message count threshold - message[s] byte size threshold - message count & byte size thresholds together - raising default alert for queue linked to flow control stop works fine There is one more feature mentioned by documentation (bug 683599): - alert repetition specified by qpid.alert_repeat_gap / x-qpid-minimum-alert-repeat-gap (which should be by default 60 secs) The alert repetition seems to not work as expected: I tested by both approaches (unit test in python and by c++ spout / drain): <term-A> .../drain -f --count 0 'qmf.default.topic/agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.#' ->one alert seen <term-B> # (*) qpid-config add queue my-queue --argument 'qpid.alert_count=2' .../spout my-queue .../spout my-queue # first alert triggered in term-A Even with qpid-config my-queue --argument 'qpid.alert_count=2' --argument 'qpid.alert_repeat_gap=5' I can see just one alert triggered. The documentation mentiones that qpid.alert_repeat_gap / x-qpid-minimum-alert-repeat-gap functionality, so it should work, which is in contrary in my observations. Could you possibly confirm that my scenario is correct, please? The alert repeat gap is the time for which no repeat alert will be issued. Before that time expires extra messages will not trigger the alert; after that time expires a further enqueue will trigger a further alert (followed again by a period where no alerts are generated). An alert is not automatically issued when that time expires however; alerts are always only triggered by an enqueue. The alert repeat gap merely prevents large numbers of events being triggered when lots of enqueues occur when the limit is exceeded. The feature is implemented and works as expected, tested on RHEL 4.9 / 5.6 / 6.1snap4 i[36]86 / x86_64 on packages: python-qpid-0.10-1.el5.noarch python-qpid-qmf-0.10-5.el5.x86_64 qpid-cpp-*-0.10-3.el5.x86_64 qpid-java-*-0.10-2.el5.noarch qpid-qmf*0.10-5.el5.x86_64 qpid-tests-0.10-1.el5.noarch qpid-tools-0.10-3.el5.noarch rh-qpid-cpp-tests-0.10-3.el5.x86_64 ruby-qpid-qmf-0.10-5.el5.x86_64 sesame-0.10-1.el5.x86_64 The testing is complete, waiting for cross-linked documentation bug. The feature is functional (comment 13) and properly documented (bug 683599). -> VERIFIED
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -8,4 +8,8 @@
The broker now allows QMF events to be generated when the queue depth reaches particular pre-configured limits.
Result:
-Systems can now more easily incorporate early warnings of message build up in queues.+Systems can now more easily incorporate early warnings of message build up in queues.
+
+Release Note Entry:
+
+Previously, messaging was unable to monitor the growing queue depth aside from polling constantly or waiting till the maximum level was reached to issue a warning. The broker now allows QMF events to be generated when the queue depth reaches a previously configured threshold, which provides an early warning for elongated messaging queues.
Technical note can be viewed in the release notes for 2.0 at the documentation stage here: http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-RHM_Update_Notes-RHM_Update_Notes An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2011-0890.html |