Bug 1494952
Summary: | Sending durable messages via AMQP 1.0 on 1.36.0-9 broker gets stuck when destination queue is suddenly empty | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Pavel Moravec <pmoravec> |
Component: | qpid-cpp | Assignee: | Kim van der Riet <kim.vdriet> |
Status: | CLOSED ERRATA | QA Contact: | Petr Matousek <pematous> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 3.2 | CC: | jross, kim.vdriet, mcressma, pematous |
Target Milestone: | 3.2.6 | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | qpid-cpp-1.36.0-11 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2017-10-24 20:24:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1487729 |
Description
Pavel Moravec
2017-09-24 10:17:30 UTC
I can confirm the same behaviour when using Java client (as sender). When using journal-flush-timeout=1ms , broker stops accepting messages from the sender after random hundreds of messages (why hundreds?). If not using that option, the "no more ACK" comes everytime after 14*capacity messages received. (In reply to Pavel Moravec from comment #1) > I can confirm the same behaviour when using Java client (as sender). > Just for the sake of completeness, I run: msgSize=700 capacity=1 messages=1000000 sndRate=10 msgContent="" for i in $(seq 1 $msgSize); do msgContent="${msgContent}X" done java -jar /root/dtests/node_data/clients/java/aac/target/aac-staging-0.21.0.redhat-1.jar sender -a PerfQueue --capacity=$capacity -c $messages --msg-content=$msgContent -d $((messages/sndRate)) instead of the qpid-send. I have tracked this problem down to a race condition in linearstore. When the store flush timer is set to a low value, it triggers the race condition which has the effect of causing the flush timer to stop flushing. When using AMQP 1.0, this will have the effect of hanging the application. I have checked in a fix and tested it using the method above (https://bugzilla.redhat.com/show_bug.cgi?id=1494952#c0) and it works without a problem. The checkins for this issue can be seen at QPID-7895 (https://issues.apache.org/jira/browse/QPID-7895). (In reply to Kim van der Riet from comment #4) > I have tracked this problem down to a race condition in linearstore. When > the store flush timer is set to a low value, it triggers the race condition > which has the effect of causing the flush timer to stop flushing. When > using AMQP 1.0, this will have the effect of hanging the application. > > I have checked in a fix and tested it using the method above > (https://bugzilla.redhat.com/show_bug.cgi?id=1494952#c0) and it works > without a problem. > > The checkins for this issue can be seen at QPID-7895 > (https://issues.apache.org/jira/browse/QPID-7895). Successfully tested in qpid-cpp-server-1.36.0-11.el6.x86_64 build. Both reproducers (consumer stopped => producer blocked very soon; consumer running => producer stuck after hundreds of msgs) using both clients (C++ either Java). The issue has been fixed. Verified with both above available reproducers. Internal test case has been linked. packages under test: * qpid-cpp-1.36.0-11 -> VERIFIED Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2892 |