Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 660289

Summary: [RFE] Configurable queue threshold alerts
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-cppAssignee: Gordon Sim <gsim>
Status: CLOSED ERRATA QA Contact: Frantisek Reznicek <freznice>
Severity: medium Docs Contact:
Priority: medium    
Version: 1.3CC: esammons, freznice, iboverma, jneedle, mhusnain, tross
Target Milestone: 2.0Keywords: FutureFeature
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-cpp-mrg-0.9.1073306-1 Doc Type: Enhancement
Doc Text:
Cause: Growing queue depth is one of the common issues that needs to be monitored for by those responsible for the smooth running of messaging based systems. Consequence: Previously there was no way to get warned of growing queues (either had to wait until a limit was hit and the application had to deal with the brokers refusal to accept any more, or needed to continually poll the depth). Change: The broker now allows QMF events to be generated when the queue depth reaches particular pre-configured limits. Result: Systems can now more easily incorporate early warnings of message build up in queues. Release Note Entry: Previously, messaging was unable to monitor the growing queue depth aside from polling constantly or waiting till the maximum level was reached to issue a warning. The broker now allows QMF events to be generated when the queue depth reaches a previously configured threshold, which provides an early warning for elongated messaging queues.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-23 15:48:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 683599    
Bug Blocks:    
Attachments:
Description Flags
Custom alert unit tests none

Description Gordon Sim 2010-12-06 11:52:38 UTC
Allow a configurable threshold to be set, such that a QMF alert is generated when that threshold is exceeded.

Comment 2 Gordon Sim 2011-02-22 12:02:48 UTC
For an example/test, see http://svn.apache.org/viewvc/qpid/trunk/qpid/tests/src/py/qpid_tests/broker_0_10/threshold.py?view=markup

There are two new queue arguments - qpid.alert_count and qpid.alert_size - which can be used to control the point at which an alert is generated. An alert is generated when a message is added to the queue and the size or count is at or above the configured alert limit. By default an alert will be generated when reaching 80% of any configured queue limit.

To avoid excess traffic, once an alert has been raised a further event will not be reissued for a queue for a configurable period. This can be controlled via another queue argument - qpid.alert_repeat_gap - which specifies the minimum interval in seconds between events. The default value for this is one minute (60 seconds).

(To comply with the argument naming already supported in the java broker the following aliases for these arguments are also recognised: x-qpid-maximum-message-count, x-qpid-maximum-message-size, x-qpid-minimum-alert-repeat-gap).

Comment 3 Gordon Sim 2011-03-07 13:31:41 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause:
Growing queue depth is one of the common issues that needs to be monitored for by those responsible for the smooth running of messaging based systems.

Consequence:
Previously there was no way to get warned of growing queues (either had to wait until a limit was hit and the application had to deal with the brokers refusal to accept any more, or needed to continually poll the depth).

Change:
The broker now allows QMF events to be generated when the queue depth reaches particular pre-configured limits.

Result:
Systems can now more easily incorporate early warnings of message build up in queues.

Comment 4 Frantisek Reznicek 2011-04-19 08:43:54 UTC
The feature was tested on RHEL 4.9 / 5.6 / 6.1snap4 i[36]86 / x86_64 on packages:
python-qpid-0.10-1.el5.noarch
python-qpid-qmf-0.10-5.el5.x86_64
qpid-cpp-*-0.10-3.el5.x86_64
qpid-dotnet-0.4.738274-2.el5.x86_64
qpid-java-*-0.10-2.el5.noarch
qpid-qmf*0.10-5.el5.x86_64
qpid-tests-0.10-1.el5.noarch
qpid-tools-0.10-3.el5.noarch
rh-qpid-cpp-tests-0.10-3.el5.x86_64
ruby-qpid-qmf-0.10-5.el5.x86_64
sesame-0.10-1.el5.x86_64

There are still 4 open questions which need to be clarified:
1] Alert is triggered once when a condition is met

  If more than one alert condition is given, then first one which is met triggers the alert, but second one does not. Is this intentional?

2] An alert is triggered when the queue grows first time over the set threshold, no alert triggered for the next occurrences of such threshold limit cross.

  Is it intentional to not trigger the queue alert when the queue crosses the threshold for second / N-times? Example:
  Let's assume configured queue alert threshold for 2 messages in the queue 'AQ':
  - AQ message depth 0
  - AQ message depth 1
  - AQ message depth 2 -> an queue alert triggered
  - AQ message depth 1
  - AQ message depth 2 -> no queue alert triggered, is this correct?

3] The feature specification (DOC-51693) mentions that broker log file should contain the message about triggered queue alert.
  I ca see some messages at trace+ log level, but this is related more to QMF traffic, no queue alert messages at debug+ level, so  is this requirement still valid?


Could you possibly review my observation points, please?

Comment 5 Gordon Sim 2011-04-19 09:36:35 UTC
1] No distinction is made as to whether it was a count or a size limit that was exceeded. Once the limit is exceeded based on one of these values, the fact that it is then exceeded also on the other value is not a significant event in the current implementation.

2] The example given *should* trigger an alert on recrossing the threshold after dipping back below. Indeed this is the behaviour I see. Following your example:

  observe events with:

  drain -f --count 0 'qmf.default.topic/agent.ind.event.org_apache_qd_broker.queueThresholdExceeded.#'

  then:

  qpid-config add queue my-queue --argument 'qpid.alert_count=2'
  spout my-queue
  spout my-queue //event observed
  drain my-queue
  spout my-queue //further event observed

Does this not match what you are seeing?

3] Unfortunately an obvious log message is not generated in the currently packaged code. That was an oversight on my part that has now been addressed upstream (https://issues.apache.org/jira/browse/QPID-3204).

Comment 6 Frantisek Reznicek 2011-04-19 10:44:04 UTC
Hi Gordon, thanks very much for your reply.

ad 1] I understand, I'm going to highlight that for doc bug 683599

ad 2] I might be doing something wrong, but I still can find case when it is not working as expected.

  2a] your case mentioned in comment 5 with drain and spout clients is working well and showing that queue alerts are re-triggered.
  2b] my unit test does (I believe) the same scenario and I cannot see re-triggered alert, see details below.


ad 3] Agreed, logging should show the alerts.
  I'm going to create defect matching QPID-3204
  If customers are still requesting the logging for this feature, then I propose to mark the new bug as dependency for this defect, otherwise w/o dependency. Would you agree?

Comment 7 Frantisek Reznicek 2011-04-19 10:48:52 UTC
Created attachment 493148 [details]
Custom alert unit tests

The log transcript shows that if the queue threshold is crossed again then no qmf alert is triggered... (case 2b])

I'm suspicious that the difference between yout case 2a] and my 2b] is in the fact that 2b] case use single session and even single sender and receiver.

May I possibly ask you for double-check of my two units HLAPITests.test_qa__A and HLAPITests.test_qa__B, please?


[freznice@dhcp-26-251 qpid_ptest_unit_tests]$ PYTHONPATH=${PYTHONPATH}:custom_py_units qpid-python-test -m custom_py_units custom_py_units.queue.HLAPITests.test_qa__A
custom_py_units.queue.HLAPITests.test_qa__A .............................................................................................. start
  -A
  None
  None
  Message(properties={u'qmf.agent': u'apache.org:qpidd:a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.routing-key': u'agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.warn.apache_org.qpidd.a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.app-id': 'qmf2', u'qmf.content': u'_event', u'qmf.opcode': u'_data_indication', u'method': u'indication'}, content=[{u'_schema_id': {u'_package_name': 'org.apache.qpid.broker', u'_class_name': 'queueThresholdExceeded', u'_type': '_event', u'_hash': UUID('9d1754c2-79bc-3767-df74-92d7bd1caa0e')}, u'_timestamp': 1303208400120574878, u'_values': {u'byteDepth': 9, u'msgDepth': 3, u'qName': 'AQA'}, u'_severity': 4}])
  None
  None
  -B
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi1')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi2')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi3')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi4')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi5')
  -C
  None
  None
  None
  None
  None
  -D
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi1')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi2')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi3')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi4')
  Message(properties={'x-amqp-0-10.routing-key': u'AQA'}, content='Hi5')

[freznice@dhcp-26-251 qpid_ptest_unit_tests]$ PYTHONPATH=${PYTHONPATH}:custom_py_units qpid-python-test -m custom_py_units custom_py_units.queue.HLAPITests.test_qa__B
custom_py_units.queue.HLAPITests.test_qa__B .............................................................................................. start
  -A
  None
  None
  Message(properties={u'qmf.agent': u'apache.org:qpidd:a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.routing-key': u'agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.warn.apache_org.qpidd.a5dc8ec5-9faa-4843-8df0-60cef488649a', 'x-amqp-0-10.app-id': 'qmf2', u'qmf.content': u'_event', u'qmf.opcode': u'_data_indication', u'method': u'indication'}, content=[{u'_schema_id': {u'_package_name': 'org.apache.qpid.broker', u'_class_name': 'queueThresholdExceeded', u'_type': '_event', u'_hash': UUID('9d1754c2-79bc-3767-df74-92d7bd1caa0e')}, u'_timestamp': 1303208429720923060, u'_values': {u'byteDepth': 9, u'msgDepth': 3, u'qName': 'AQB'}, u'_severity': 4}])
  None
  None
  -B
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi1')
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi2')
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi3')
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi4')
  -C
  None
  None
  None
  None
  -D
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi5')
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi2')
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi3')
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi4')
  Message(properties={'x-amqp-0-10.routing-key': u'AQB'}, content='Hi5')
custom_py_units.queue.HLAPITests.test_qa__B .............................................................................................. pass

Comment 8 Frantisek Reznicek 2011-04-19 11:26:44 UTC
The above point 3], from comment 5 is now tracked separately as bug 697816.

Comment 9 Gordon Sim 2011-04-19 11:39:57 UTC
Regarding the unit test failures, I *think* your test may not be acknowledging
messages(?), thus the queue depth is not decreased as you expect. Certainly I
can't see where the acknowledgement is occurring.

Regarding the lack of logging, I myself would say that this need not depend on
that. DOC-51693 was a very loose statement of requirements. While having the
log entry is important (and should and will be added), this BZ does not in my
view mandate it.

Comment 10 Frantisek Reznicek 2011-04-19 12:13:00 UTC
Thanks again,
you're indeed again right with missing message acks. With acknowledges it works as expected.
I believe I'll be able to finish this defect in a day or two.

Keeping bug 697816 separate from this defect.

Comment 11 Frantisek Reznicek 2011-04-20 12:41:11 UTC
This is the testing status:
- raising single queue event works just fine
  - message count threshold
  - message[s] byte size threshold
  - message count & byte size thresholds together
- raising multiple queue event by multiple crossing the threshold work fine
  - message count threshold
  - message[s] byte size threshold
  - message count & byte size thresholds together
- raising default alert for queue linked to flow control stop works fine

There is one more feature mentioned by documentation (bug 683599):
- alert repetition specified by qpid.alert_repeat_gap / x-qpid-minimum-alert-repeat-gap (which should be by default 60 secs)


The alert repetition seems to not work as expected:

I tested by both approaches (unit test in python and by c++ spout / drain):

<term-A>
  .../drain -f --count 0 'qmf.default.topic/agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.#'

->one alert seen

<term-B>
  # (*)
  qpid-config add queue my-queue --argument 'qpid.alert_count=2'
  .../spout my-queue
  .../spout my-queue # first alert triggered in term-A

Even with
  qpid-config  my-queue --argument 'qpid.alert_count=2' --argument 'qpid.alert_repeat_gap=5'
 
I can see just one alert triggered.

The documentation mentiones that qpid.alert_repeat_gap / x-qpid-minimum-alert-repeat-gap functionality, so it should work, which is in contrary in my observations.

Could you possibly confirm that my scenario is correct, please?

Comment 12 Gordon Sim 2011-04-20 12:58:04 UTC
The alert repeat gap is the time for which no repeat alert will be issued. Before that time expires extra messages will not trigger the alert; after that time expires a further enqueue will trigger a further alert (followed again by a period where no alerts are generated).  An alert is not automatically issued when that time expires however; alerts are always only triggered by an enqueue. 

The alert repeat gap merely prevents large numbers of events being triggered when lots of enqueues occur when the limit is exceeded.

Comment 13 Frantisek Reznicek 2011-04-21 10:05:56 UTC
The feature is implemented and works as expected, tested on RHEL 4.9 / 5.6 / 6.1snap4 i[36]86 / x86_64 on packages:

python-qpid-0.10-1.el5.noarch
python-qpid-qmf-0.10-5.el5.x86_64
qpid-cpp-*-0.10-3.el5.x86_64
qpid-java-*-0.10-2.el5.noarch
qpid-qmf*0.10-5.el5.x86_64
qpid-tests-0.10-1.el5.noarch
qpid-tools-0.10-3.el5.noarch
rh-qpid-cpp-tests-0.10-3.el5.x86_64
ruby-qpid-qmf-0.10-5.el5.x86_64
sesame-0.10-1.el5.x86_64


The testing is complete, waiting for cross-linked documentation bug.

Comment 15 Frantisek Reznicek 2011-05-17 10:24:45 UTC
The feature is functional (comment 13) and properly documented (bug 683599).

-> VERIFIED

Comment 16 Misha H. Ali 2011-05-31 06:49:57 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -8,4 +8,8 @@
 The broker now allows QMF events to be generated when the queue depth reaches particular pre-configured limits.
 
 Result:
-Systems can now more easily incorporate early warnings of message build up in queues.+Systems can now more easily incorporate early warnings of message build up in queues.
+
+Release Note Entry:
+
+Previously, messaging was unable to monitor the growing queue depth aside from polling constantly or waiting till the maximum level was reached to issue a warning. The broker now allows QMF events to be generated when the queue depth reaches a previously configured threshold, which provides an early warning for elongated messaging queues.

Comment 17 Misha H. Ali 2011-06-06 03:16:16 UTC
Technical note can be viewed in the release notes for 2.0 at the documentation stage here:

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/2.0/html-single/MRG_Release_Notes/index.html#tabl-MRG_Release_Notes-RHM_Update_Notes-RHM_Update_Notes

Comment 18 errata-xmlrpc 2011-06-23 15:48:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html