Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 876468

Summary: qpidd alert timing specified via 'qpid.alert_repeat_gap' or x-qpid-minimum-alert-repeat-gap is not always met
Product: Red Hat Enterprise MRG Reporter: Frantisek Reznicek <freznice>
Component: qpid-cppAssignee: Gordon Sim <gsim>
Status: CLOSED UPSTREAM QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high Docs Contact:
Priority: low    
Version: DevelopmentCC: jross
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2025-02-10 03:26:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Less complex and more focused reproducer none

Description Frantisek Reznicek 2012-11-14 08:54:48 UTC
Description of problem:

qpidd alert timing specified via 'qpid.alert_repeat_gap' or x-qpid-minimum-alert-repeat-gap is not always met.

qpidd sends various queue alerts via listening for instance to 'qmf.default.topic/agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.#'
Client which may trigger event may specify qpidd trigger count /timing using:
 * '"qpid.alert_count"', 'x-qpid-maximum-message-count'
 * '"qpid.alert_size"', 'x-qpid-maximum-message-size' 
 * '"qpid.alert_repeat_gap"','x-qpid-minimum-alert-repeat-gap'
properties.

Repeat gap property should define minimal time between two alerts raised for single event.

qpid-0.14 strictly followed this timing, but with qpid-0.18 there are cases when qpidd violates that timing. So it seems it is a regression.


Giving an example:

Continuous sending ~ 10000 messages to address test_queue_alerts-address; {node: {x-declare: {auto-delete: False, exclusive: False, arguments: {x-qpid-maximum-message-count: 40, 'qpid.alert_repeat_gap': 2, 'qpid.alert_size': 320}}, durable: False}, create: sender, delete: receiver}" should trigger
'qmf.default.topic/agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.#' events with periodicity lower than 1/2 Hz (period >= 2secs).

In reality I have detected that duration between two events might be less than 2 seconds, sometimes just slightly less (~1.7 secs, but there was case with gap lower than 0.1 secs).

Seen on all supported variants RHEL5.9/6.3 i[36]86/x86_64.


Version-Release number of selected component (if applicable):
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-6.el6.x86_64
qpid-cpp-client-0.18-7.el6.x86_64
qpid-cpp-client-devel-0.18-7.el6.x86_64
qpid-cpp-client-devel-docs-0.18-7.el6.noarch
qpid-cpp-client-rdma-0.18-7.el6.x86_64
qpid-cpp-client-ssl-0.18-7.el6.x86_64
qpid-cpp-debuginfo-0.18-7.el6.x86_64
qpid-cpp-server-0.18-7.el6.x86_64
qpid-cpp-server-cluster-0.18-7.el6.x86_64
qpid-cpp-server-devel-0.18-7.el6.x86_64
qpid-cpp-server-ha-0.18-7.el6.x86_64
qpid-cpp-server-rdma-0.18-7.el6.x86_64
qpid-cpp-server-ssl-0.18-7.el6.x86_64
qpid-cpp-server-store-0.18-7.el6.x86_64
qpid-cpp-server-xml-0.18-7.el6.x86_64
qpid-java-client-0.18-5.el6.noarch
qpid-java-common-0.18-5.el6.noarch
qpid-java-example-0.18-5.el6.noarch
qpid-jca-0.18-5.el6.noarch
qpid-jca-xarecovery-0.18-5.el6.noarch
qpid-jca-zip-0.18-5.el6.noarch
qpid-qmf-0.18-6.el6.x86_64
qpid-qmf-debuginfo-0.18-6.el6.x86_64
qpid-qmf-devel-0.18-6.el6.x86_64
qpid-tests-0.18-2.el6.noarch
qpid-tools-0.18-5.el6.noarch
rh-qpid-cpp-tests-0.18-7.el6.x86_64
ruby-qpid-qmf-0.18-6.el6.x86_64


How reproducible:
80%

Steps to Reproduce:
1. run custom_py_units.queue.HLAPITests.test_queue_alerts_count_size_single_repeated in the loop waiting for failure (alert period:Tx expected to be > Ty)
  
Actual results:
Time gap between alerts is smaller than expected/requested.

Expected results:
Time gap between alerts has to be greather or equal than expected/requested.

Additional info:

custom_py_units.queue.HLAPITests.test_queue_alerts_count_size_single_repeated ....... start
  setup: loops:10, retrigger_cnt:0 queue_cnt_alerts_ena:True, queue_size_alerts_ena:True alert_min_period:2, alert_periodicity_check_cnt:1 msg_layout:None
custom_py_units.queue.HLAPITests.test_queue_alerts_count_size_single_repeated ....... fail
Error during test:  Traceback (most recent call last):
    File "/usr/bin/qpid-python-test", line 340, in run
      phase()
    File "/root/MRG/Messaging/qpid_ptest_unit_tests/custom_py_units/queue.py", line 1360, in test_queue_alerts_count_size_single_repeated
      in_queue_size_alerts_ena=True);
    File "/root/MRG/Messaging/qpid_ptest_unit_tests/custom_py_units/queue.py", line 1208, in do_test_queue_alerts
      ((ats2 - ats), (0.95 * in_alert_min_period)) );
    File "/usr/lib64/python2.6/unittest.py", line 324, in failUnless
      if not expr: raise self.failureException, msg
  AssertionError: alert period:1.72144 expected to be > 1.9


TX .check
ERROR: 1352740442.206227-1352740443.740597=1.534370
RX .check
[1352740374.94011, 1352740376.9492371, 1352740378.9835031, 1352740380.9989109, 1352740383.0171089, 1352740385.0527799, 1352740387.0898049, 1352740389.130404, 1352740391.1595359, 1352740393.1954119, 1352740395.243114, 1352740397.2437739, 1352740399.2847321, 1352740401.3046629, 1352740403.3166671, 1352740405.371567, 1352740407.39025, 1352740409.394731, 1352740411.428196, 1352740413.4556551, 1352740415.478524, 1352740417.495559, 1352740419.5170541, 1352740421.5502999, 1352740423.56301, 1352740425.578898, 1352740427.6195259, 1352740429.6266351, 1352740431.659626, 1352740433.6676769, 1352740435.6712511, 1352740437.7090001, 1352740439.7297721, 1352740442.2062271, 1352740443.740597, 1352740445.7513189, 1352740447.75245, 1352740449.774286, 1352740451.783339] []
1

TX .check
ERROR: 1352738983.971076-1352738984.008094=0.037018
RX .check
[1352738916.6770949, 1352738918.7008519, 1352738920.7145491, 1352738922.7184529, 1352738924.7311239, 1352738926.7565031, 1352738928.8097019, 1352738930.829648, 1352738932.8630049, 1352738934.8913989, 1352738936.9275041, 1352738938.9416499, 1352738940.949106, 1352738942.964879, 1352738945.011595, 1352738947.0185161, 1352738949.033437, 1352738951.035413, 1352738953.1025219, 1352738955.128267, 1352738957.146085, 1352738959.1750431, 1352738961.2078609, 1352738963.2547669, 1352738965.280633, 1352738967.326597, 1352738969.3462911, 1352738971.3691511, 1352738973.4169919, 1352738975.4798441, 1352738983.971076, 1352738984.0080941, 1352738986.0171051, 1352738988.04809, 1352738990.091434, 1352738992.1075189, 1352738994.116466, 1352738996.1290891, 1352738998.1318369, 1352739000.136611, 1352739002.1409161] []
1

TX .check
ERROR: 1352730746.503045-1352730748.046758=1.543713
RX .check
[1352730709.520736, 1352730711.528244, 1352730713.558624, 1352730715.618119, 1352730717.632519, 1352730719.671787, 1352730721.690572, 1352730723.7260039, 1352730725.7264349, 1352730727.7647519, 1352730729.776211, 1352730731.784663, 1352730733.802218, 1352730735.8115871, 1352730737.840585, 1352730739.8709581, 1352730741.9377589, 1352730743.991215, 1352730746.5030451, 1352730748.0467579, 1352730750.0613911, 1352730752.0781219, 1352730754.0843389, 1352730756.0878749, 1352730758.1123199, 1352730760.128511, 1352730762.1863461, 1352730764.2038059, 1352730766.21913, 1352730768.2629449, 1352730770.305311, 1352730772.3167281, 1352730774.35061, 1352730776.389415, 1352730778.4026909, 1352730780.4264591, 1352730782.441673, 1352730784.448782, 1352730786.4621229] []
1

Comment 2 Frantisek Reznicek 2012-11-14 09:04:02 UTC
Created attachment 644678 [details]
Less complex and more focused reproducer

Attached reproducer shows the qpidd event timestamps and compute the gaps.

How to run:
service qpidd restart ; while true; do python ./bz876468.py; ecode=$?; echo  $ecode; [ $ecode != 0 ] && break; done

Testing ends when detected event gap is lower than 90% of the expected time, detected gaps in interval 90% ... 100% are printed as warnings only.

Case should be triggerable within an hour.

Comment 3 Justin Ross 2012-11-14 14:30:20 UTC
We've never specified the minimum repeat gap timing as a hard requirement.  You point out that it's now (versus 0.14) being enforced as a soft requirement, with occasional variability below the threshold.  That's not quite a regression, imo.  It's a change in behavior that may or may not be desirable, all other constraints taking into consideration.

And I think that consideration ought to take place for 2.4.  Nacking this for 2.3.

Comment 4 Gordon Sim 2012-11-15 15:21:56 UTC
Two comments, neither of which may be in fact relevant :-)

(1) the timer is reset whenever the depth falls below the threshold, so it is possible that events are generated closer together than the minimum interval where the depth has dropped below the threshold then risen above it in between

(2) there has been a recent change to the way in which events are actually sent out. This doesn't affect the timing of them being generated, but could well affect the timing in which they are sent out (it would for example I think be more prone to batching events).

Comment 8 Red Hat Bugzilla 2025-02-10 03:26:56 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.