Bug 876468
| Summary: | qpidd alert timing specified via 'qpid.alert_repeat_gap' or x-qpid-minimum-alert-repeat-gap is not always met | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Frantisek Reznicek <freznice> | ||||
| Component: | qpid-cpp | Assignee: | Gordon Sim <gsim> | ||||
| Status: | CLOSED UPSTREAM | QA Contact: | MRG Quality Engineering <mrgqe-bugs> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | low | ||||||
| Version: | Development | CC: | jross | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2025-02-10 03:26:56 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Created attachment 644678 [details] Less complex and more focused reproducer Attached reproducer shows the qpidd event timestamps and compute the gaps. How to run: service qpidd restart ; while true; do python ./bz876468.py; ecode=$?; echo $ecode; [ $ecode != 0 ] && break; done Testing ends when detected event gap is lower than 90% of the expected time, detected gaps in interval 90% ... 100% are printed as warnings only. Case should be triggerable within an hour. We've never specified the minimum repeat gap timing as a hard requirement. You point out that it's now (versus 0.14) being enforced as a soft requirement, with occasional variability below the threshold. That's not quite a regression, imo. It's a change in behavior that may or may not be desirable, all other constraints taking into consideration. And I think that consideration ought to take place for 2.4. Nacking this for 2.3. Two comments, neither of which may be in fact relevant :-) (1) the timer is reset whenever the depth falls below the threshold, so it is possible that events are generated closer together than the minimum interval where the depth has dropped below the threshold then risen above it in between (2) there has been a recent change to the way in which events are actually sent out. This doesn't affect the timing of them being generated, but could well affect the timing in which they are sent out (it would for example I think be more prone to batching events). This product has been discontinued or is no longer tracked in Red Hat Bugzilla. |
Description of problem: qpidd alert timing specified via 'qpid.alert_repeat_gap' or x-qpid-minimum-alert-repeat-gap is not always met. qpidd sends various queue alerts via listening for instance to 'qmf.default.topic/agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.#' Client which may trigger event may specify qpidd trigger count /timing using: * '"qpid.alert_count"', 'x-qpid-maximum-message-count' * '"qpid.alert_size"', 'x-qpid-maximum-message-size' * '"qpid.alert_repeat_gap"','x-qpid-minimum-alert-repeat-gap' properties. Repeat gap property should define minimal time between two alerts raised for single event. qpid-0.14 strictly followed this timing, but with qpid-0.18 there are cases when qpidd violates that timing. So it seems it is a regression. Giving an example: Continuous sending ~ 10000 messages to address test_queue_alerts-address; {node: {x-declare: {auto-delete: False, exclusive: False, arguments: {x-qpid-maximum-message-count: 40, 'qpid.alert_repeat_gap': 2, 'qpid.alert_size': 320}}, durable: False}, create: sender, delete: receiver}" should trigger 'qmf.default.topic/agent.ind.event.org_apache_qpid_broker.queueThresholdExceeded.#' events with periodicity lower than 1/2 Hz (period >= 2secs). In reality I have detected that duration between two events might be less than 2 seconds, sometimes just slightly less (~1.7 secs, but there was case with gap lower than 0.1 secs). Seen on all supported variants RHEL5.9/6.3 i[36]86/x86_64. Version-Release number of selected component (if applicable): python-qpid-0.18-4.el6.noarch python-qpid-qmf-0.18-6.el6.x86_64 qpid-cpp-client-0.18-7.el6.x86_64 qpid-cpp-client-devel-0.18-7.el6.x86_64 qpid-cpp-client-devel-docs-0.18-7.el6.noarch qpid-cpp-client-rdma-0.18-7.el6.x86_64 qpid-cpp-client-ssl-0.18-7.el6.x86_64 qpid-cpp-debuginfo-0.18-7.el6.x86_64 qpid-cpp-server-0.18-7.el6.x86_64 qpid-cpp-server-cluster-0.18-7.el6.x86_64 qpid-cpp-server-devel-0.18-7.el6.x86_64 qpid-cpp-server-ha-0.18-7.el6.x86_64 qpid-cpp-server-rdma-0.18-7.el6.x86_64 qpid-cpp-server-ssl-0.18-7.el6.x86_64 qpid-cpp-server-store-0.18-7.el6.x86_64 qpid-cpp-server-xml-0.18-7.el6.x86_64 qpid-java-client-0.18-5.el6.noarch qpid-java-common-0.18-5.el6.noarch qpid-java-example-0.18-5.el6.noarch qpid-jca-0.18-5.el6.noarch qpid-jca-xarecovery-0.18-5.el6.noarch qpid-jca-zip-0.18-5.el6.noarch qpid-qmf-0.18-6.el6.x86_64 qpid-qmf-debuginfo-0.18-6.el6.x86_64 qpid-qmf-devel-0.18-6.el6.x86_64 qpid-tests-0.18-2.el6.noarch qpid-tools-0.18-5.el6.noarch rh-qpid-cpp-tests-0.18-7.el6.x86_64 ruby-qpid-qmf-0.18-6.el6.x86_64 How reproducible: 80% Steps to Reproduce: 1. run custom_py_units.queue.HLAPITests.test_queue_alerts_count_size_single_repeated in the loop waiting for failure (alert period:Tx expected to be > Ty) Actual results: Time gap between alerts is smaller than expected/requested. Expected results: Time gap between alerts has to be greather or equal than expected/requested. Additional info: custom_py_units.queue.HLAPITests.test_queue_alerts_count_size_single_repeated ....... start setup: loops:10, retrigger_cnt:0 queue_cnt_alerts_ena:True, queue_size_alerts_ena:True alert_min_period:2, alert_periodicity_check_cnt:1 msg_layout:None custom_py_units.queue.HLAPITests.test_queue_alerts_count_size_single_repeated ....... fail Error during test: Traceback (most recent call last): File "/usr/bin/qpid-python-test", line 340, in run phase() File "/root/MRG/Messaging/qpid_ptest_unit_tests/custom_py_units/queue.py", line 1360, in test_queue_alerts_count_size_single_repeated in_queue_size_alerts_ena=True); File "/root/MRG/Messaging/qpid_ptest_unit_tests/custom_py_units/queue.py", line 1208, in do_test_queue_alerts ((ats2 - ats), (0.95 * in_alert_min_period)) ); File "/usr/lib64/python2.6/unittest.py", line 324, in failUnless if not expr: raise self.failureException, msg AssertionError: alert period:1.72144 expected to be > 1.9 TX .check ERROR: 1352740442.206227-1352740443.740597=1.534370 RX .check [1352740374.94011, 1352740376.9492371, 1352740378.9835031, 1352740380.9989109, 1352740383.0171089, 1352740385.0527799, 1352740387.0898049, 1352740389.130404, 1352740391.1595359, 1352740393.1954119, 1352740395.243114, 1352740397.2437739, 1352740399.2847321, 1352740401.3046629, 1352740403.3166671, 1352740405.371567, 1352740407.39025, 1352740409.394731, 1352740411.428196, 1352740413.4556551, 1352740415.478524, 1352740417.495559, 1352740419.5170541, 1352740421.5502999, 1352740423.56301, 1352740425.578898, 1352740427.6195259, 1352740429.6266351, 1352740431.659626, 1352740433.6676769, 1352740435.6712511, 1352740437.7090001, 1352740439.7297721, 1352740442.2062271, 1352740443.740597, 1352740445.7513189, 1352740447.75245, 1352740449.774286, 1352740451.783339] [] 1 TX .check ERROR: 1352738983.971076-1352738984.008094=0.037018 RX .check [1352738916.6770949, 1352738918.7008519, 1352738920.7145491, 1352738922.7184529, 1352738924.7311239, 1352738926.7565031, 1352738928.8097019, 1352738930.829648, 1352738932.8630049, 1352738934.8913989, 1352738936.9275041, 1352738938.9416499, 1352738940.949106, 1352738942.964879, 1352738945.011595, 1352738947.0185161, 1352738949.033437, 1352738951.035413, 1352738953.1025219, 1352738955.128267, 1352738957.146085, 1352738959.1750431, 1352738961.2078609, 1352738963.2547669, 1352738965.280633, 1352738967.326597, 1352738969.3462911, 1352738971.3691511, 1352738973.4169919, 1352738975.4798441, 1352738983.971076, 1352738984.0080941, 1352738986.0171051, 1352738988.04809, 1352738990.091434, 1352738992.1075189, 1352738994.116466, 1352738996.1290891, 1352738998.1318369, 1352739000.136611, 1352739002.1409161] [] 1 TX .check ERROR: 1352730746.503045-1352730748.046758=1.543713 RX .check [1352730709.520736, 1352730711.528244, 1352730713.558624, 1352730715.618119, 1352730717.632519, 1352730719.671787, 1352730721.690572, 1352730723.7260039, 1352730725.7264349, 1352730727.7647519, 1352730729.776211, 1352730731.784663, 1352730733.802218, 1352730735.8115871, 1352730737.840585, 1352730739.8709581, 1352730741.9377589, 1352730743.991215, 1352730746.5030451, 1352730748.0467579, 1352730750.0613911, 1352730752.0781219, 1352730754.0843389, 1352730756.0878749, 1352730758.1123199, 1352730760.128511, 1352730762.1863461, 1352730764.2038059, 1352730766.21913, 1352730768.2629449, 1352730770.305311, 1352730772.3167281, 1352730774.35061, 1352730776.389415, 1352730778.4026909, 1352730780.4264591, 1352730782.441673, 1352730784.448782, 1352730786.4621229] [] 1