Bug 1095849

Summary: JAVA Client Can not recieve message with qpid ha cluster "Session exception occured while trying to commit"
Product: Red Hat Enterprise MRG Reporter: masterQi
Component: qpid-cppAssignee: messaging-bugs <messaging-bugs>
Status: CLOSED UPSTREAM QA Contact: Michal Toth <mtoth>
Severity: high Docs Contact:
Priority: high    
Version: DevelopmentCC: jross, masterQi, mtoth, zkraus
Target Milestone: 3.2   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qpid-cpp-0.34-4 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2025-02-10 03:35:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Simple reproducer
none
Final reproducer with script to measure tx times.
none
Qpid broker ha logs
none
Updated reproducer, not able to reproduce the issue. none

Description masterQi 2014-05-08 16:31:28 UTC
Description of problem:

the java client can not recieve message in qpid ha cluster
but drain message is work fine 
if i remove ha cluster ,all are OK


2014-05-09 00:20:18,695 WARN  [org.springframework.jms.listener.DefaultMessageListenerContainer] Execution of JMS message listener failed, and no ErrorHandler has been set.
"javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 8, point = 10
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3465)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:843)
	at org.springframework.jms.support.JmsUtils.commitIfNecessary(JmsUtils.java:217)
	at org.springframework.jms.listener.AbstractMessageListenerContainer.commitIfNecessary(AbstractMessageListenerContainer.java:575)
	at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:481)
	at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:325)
	at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:263)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1058)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1050)
	at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:947)
	at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 8, point = 10
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1037)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:830)
	... 9 more
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
throw excpeion

Expected results:
no exception


Additional info:

Comment 1 masterQi 2014-05-08 16:37:17 UTC
there are manay qpid-ha tx queue


qpid: list
Summary of Objects by Type:
    Package                      Class         Active  Deleted
    ============================================================
    org.apache.qpid.broker       binding       17      0
    org.apache.qpid.broker       broker        1       0
    org.apache.qpid.broker       memory        1       0
    org.apache.qpid.broker       system        1       0
    org.apache.qpid.ha           habroker      1       0
    org.apache.qpid.legacystore  store         1       0
    org.apache.qpid.broker       subscription  8       0
    org.apache.qpid.broker       connection    3       0
    org.apache.qpid.legacystore  journal       2       0
    org.apache.qpid.broker       session       4       0
    org.apache.qpid.broker       queue         42      0
    org.apache.qpid.broker       exchange      9       0
    org.apache.qpid.broker       vhost         1       0
qpid: list queue
Object Summary:
    ID   Created   Destroyed  Index
    ==================================================================================================================================================================================
    132  16:01:41  -          org.apache.qpid.broker:queue:log
    133  16:01:41  -          org.apache.qpid.broker:queue:meizu.game.order.notify
    134  16:34:38  -          org.apache.qpid.broker:queue:qmfc-v2-GZ-GAME-19.65.19797.1
    135  16:34:38  -          org.apache.qpid.broker:queue:qmfc-v2-hb-GZ-GAME-19.65.19797.1
    136  16:34:38  -          org.apache.qpid.broker:queue:qmfc-v2-ui-GZ-GAME-19.65.19797.1
    137  16:01:52  -          org.apache.qpid.broker:queue:qpid.bridge_queue_qpid.broker-replicator.bridge.55f21fec-fb5d-49e2-92e3-45c3b8c6a31f_58b2be25-5a48-4691-8dcb-2f05e22af240
    138  16:25:18  -          org.apache.qpid.broker:queue:qpid.ha-tx:050dee8e-eb28-43ae-8853-6b8774ec467a
    139  16:31:23  -          org.apache.qpid.broker:queue:qpid.ha-tx:16df28dd-47f0-4a39-a96f-c7e07fd79c05
    140  16:28:18  -          org.apache.qpid.broker:queue:qpid.ha-tx:1b2d1353-8c42-484b-b231-25964a5f6453
    141  16:16:18  -          org.apache.qpid.broker:queue:qpid.ha-tx:26239421-986e-4dc3-b22a-243e44d36a40
    142  16:01:53  -          org.apache.qpid.broker:queue:qpid.ha-tx:2dfbb304-2020-464a-a14f-c87418956867
    143  16:01:53  -          org.apache.qpid.broker:queue:qpid.ha-tx:2e54c358-c200-4d4e-b4ac-ef8edb7c6b9d
    144  16:16:18  -          org.apache.qpid.broker:queue:qpid.ha-tx:30815be3-714d-4e4b-9733-b61359120ac3
    145  16:14:12  -          org.apache.qpid.broker:queue:qpid.ha-tx:31521fd7-5308-4a08-8882-bd8aefa0050e
    146  16:25:23  -          org.apache.qpid.broker:queue:qpid.ha-tx:34fe2710-9243-4884-8abb-85ab54685b1e
    147  16:34:19  -          org.apache.qpid.broker:queue:qpid.ha-tx:4c4dcb65-7a2c-4ee0-92a1-567da5ae5aba
    148  16:01:53  -          org.apache.qpid.broker:queue:qpid.ha-tx:5b0fa3f5-264c-4477-b6a0-46c0b39a6a79
    149  16:19:18  -          org.apache.qpid.broker:queue:qpid.ha-tx:5b8eb722-1a3d-4c46-97b1-26b108703044
    150  16:12:06  -          org.apache.qpid.broker:queue:qpid.ha-tx:6457ac02-fcb7-4144-b0c2-3339cfa2cf4f
    151  16:04:54  -          org.apache.qpid.broker:queue:qpid.ha-tx:701ac0c7-b3fd-41da-8832-70dc700cf0f3
    152  16:31:18  -          org.apache.qpid.broker:queue:qpid.ha-tx:70b0f4a0-2de0-49f7-ad6b-08506dbd211f
    153  16:09:06  -          org.apache.qpid.broker:queue:qpid.ha-tx:7b171b6a-8f17-4519-9fe5-204fc5626529
    154  16:34:25  -          org.apache.qpid.broker:queue:qpid.ha-tx:7c003fc5-9986-4598-8950-c409ff86b709
    155  16:09:06  -          org.apache.qpid.broker:queue:qpid.ha-tx:7c6c8f9a-8eab-4446-9bc2-50deaf66d627
    156  16:15:06  -          org.apache.qpid.broker:queue:qpid.ha-tx:80c0e04c-5f80-4f75-95ba-81091292cd4d
    157  16:12:06  -          org.apache.qpid.broker:queue:qpid.ha-tx:8c3aea7e-f333-48c8-829b-564f89d600ba
    158  16:22:23  -          org.apache.qpid.broker:queue:qpid.ha-tx:a11e7cbd-d5c0-4a50-b94d-d6bd8f36d8bc
    159  16:17:12  -          org.apache.qpid.broker:queue:qpid.ha-tx:ad1ab060-2a6f-41e8-80c2-16aabdfcf8de
    160  16:28:23  -          org.apache.qpid.broker:queue:qpid.ha-tx:b0454339-4759-44de-906f-d1bf3465a2cb
    161  16:10:00  -          org.apache.qpid.broker:queue:qpid.ha-tx:b4c5ad63-ec07-4619-b45c-35593894e072
    162  16:04:54  -          org.apache.qpid.broker:queue:qpid.ha-tx:d57c3e52-8f65-44e4-9595-6e8440d39978
    163  16:19:23  -          org.apache.qpid.broker:queue:qpid.ha-tx:da34735a-6e9c-44c0-bd8c-c8368ab74912
    164  16:22:18  -          org.apache.qpid.broker:queue:qpid.ha-tx:dad4938e-7308-4c9e-b648-49494da41682
    165  16:04:54  -          org.apache.qpid.broker:queue:qpid.ha-tx:e82f5575-9e2b-4567-be77-fb2a766fa74b
    166  16:07:54  -          org.apache.qpid.broker:queue:qpid.ha-tx:ead7752b-54f0-4e5b-a389-5d2b46b8949a
    167  16:34:23  -          org.apache.qpid.broker:queue:qpid.ha-tx:ecd4b7e9-1199-44be-927a-b5230ed8aa3c
    168  16:07:00  -          org.apache.qpid.broker:queue:qpid.ha-tx:ed388289-24b8-4415-9b3e-3568874c6f39
    169  16:34:19  -          org.apache.qpid.broker:queue:qpid.ha-tx:ed4417bf-ee74-4928-ad2d-989b3ed6b44a
    170  16:12:06  -          org.apache.qpid.broker:queue:qpid.ha-tx:f64dd47f-c174-40a9-aba9-f39c4ce9f5c6
    171  16:01:53  -          org.apache.qpid.broker:queue:qpid.ha-tx:fc85c5e7-be8e-44fd-8521-cf99dcfc3f96
    172  16:34:38  -          org.apache.qpid.broker:queue:reply-GZ-GAME-19.65.19797.1
    173  16:34:38  -          org.apache.qpid.broker:queue:topic-GZ-GAME-19.65.19797.1
    195  16:34:53  16:34:53   org.apache.qpid.broker:queue:64fdbc38-22a9-4950-91d6-28bdc4e9cf8d:0.0

Comment 2 masterQi 2014-05-08 16:43:31 UTC
broker:
   qpid-cpp-0.26
client version:
  qpid-client-0.26.jar
  qpid-common-0.26.jar

Comment 3 masterQi 2014-05-19 11:36:57 UTC
when i set "sessionTransacted" false the it's OK.

Comment 6 Alan Conway 2014-06-23 19:50:32 UTC
Created attachment 911563 [details]
Simple reproducer

The attached modified version of the Spout example reproduces the problem quickly. I have a 3 broker local cluster (same machine) with configuration:

auth=no
log-enable=info+
log-enable=debug+:ha
ha-cluster=yes
ha-replicate=all
ha-brokers-url=localhost:6000,localhost:6001,localhost:6002

Run the attached as follows:

java org/apache/qpid/example/Spout  -b x:y@localhost:6000 --content FOOBAR -c 100000 'q;{create:always}'

In a few seconds it hangs, then after another minute it exits with

Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 107, point = 109
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3465)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:843)
	at org.apache.qpid.example.Spout.<init>(Spout.java:102)
	at org.apache.qpid.example.Spout.main(Spout.java:149)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 107, point = 109
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1037)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:830)
	... 2 more

The stack trace matches that from the bug report

Comment 7 Alan Conway 2014-06-26 22:28:20 UTC
I have a tentative fix but I want to consult a bit to see if there's a better one before I claim to have fixed this.

The fix, with reproducer, is up for review at https://reviews.apache.org/r/23096/

Comment 8 Alan Conway 2014-08-14 08:03:56 UTC
(In reply to Alan Conway from comment #7)
> I have a tentative fix but I want to consult a bit to see if there's a
> better one before I claim to have fixed this.
> 
> The fix, with reproducer, is up for review at
> https://reviews.apache.org/r/23096/

This fix is not acceptable. It masks the problem by making the client force an extra sync on the broker but this should not be necessary so we need a broker side fix. There appears to be a race condition where the broker sometimes fails to send a completion for the transaction commit command.

Comment 9 Alan Conway 2014-08-14 08:09:44 UTC
*** Bug 1117863 has been marked as a duplicate of this bug. ***

Comment 10 Alan Conway 2014-08-14 08:15:10 UTC
Bug 1117863 shows that this problem occurs with C++ client also which confirms it is a broker bug. It also shows that in the C++ case the completion IS sent eventually (c++ client is more patient) and client can continue. That should help track down the problem.

Comment 11 Leonid Zhaldybin 2014-08-14 10:34:02 UTC
*** Bug 1083676 has been marked as a duplicate of this bug. ***

Comment 12 Alan Conway 2014-08-20 21:37:33 UTC
Updated review board with broker-side fix, I think this solves the problem but will wait for review feedback. https://reviews.apache.org/r/23096

Comment 13 Alan Conway 2014-08-22 14:15:10 UTC
------------------------------------------------------------------------
r1619816 | aconway | 2014-08-22 10:13:14 -0400 (Fri, 22 Aug 2014) | 15 lines

QPID-5855: JAVA Client Can not recieve message with qpid ha cluster "Session exception occured while trying to commit"

The problem: the java client sets the sync flag on tx.commit and then waits for
completion of the entire transaction. According to the 0-10 spec, this is
correct, the commit (or rollback) will not complete until all of the
transactional commands have completed. However the C++ broker was sometimes
completing a commit *before* one of the the corresponding enqueues. It issued
the completions up to the commit (because the commit is makred sync) but there
is a "hole" for the incomplete enqueue. The enqueue is not marked sync so when
this hole is filled no completion is sent and the client hangs.

Fix: make tx.commit a "sync point", that is it behaves like execution.sync and
is not completed till all preceeding commands are complete. Note tx.rollback
does not need modification as it is never completed asynchronously.

------------------------------------------------------------------------

Comment 14 Alan Conway 2014-08-22 14:23:53 UTC
Created attachment 929663 [details]
Final reproducer with script to measure tx times.

Comment 15 Alan Conway 2014-08-29 13:48:09 UTC
This fix causes a regression: failure of qpid-txtest2 when run with linear store. Will post additional fix shortly.

Comment 16 Alan Conway 2014-08-29 19:29:57 UTC
Fix requires the following additional commit:

------------------------------------------------------------------------
r1621368 | aconway | 2014-08-29 15:29:06 -0400 (Fri, 29 Aug 2014) | 12 lines

QPID-5855: Fix to JAVA Client Can not recieve message with qpid ha cluster.

The original fix for this introduced a regression, running the qpid-txttest2
test against a cluster with the linear store failed. This fixes the fix.

- Run transaction commit logic when the commit completes. Report completion to the user only when
  all prior commands have completed (sync point)
- Fix missing initializer in client/amqp0_10/SessionImpl.cpp for transaction committing flag.
- Remove annoying log messages from IdSetter.h
- Skip transactional messages in prepare, don't wait till commit.
- Added fetch-timeout option to qpid-txtest2

------------------------------------------------------------------------

Comment 18 Alan Conway 2014-11-12 22:24:18 UTC
Fixed on branch 0.30-mrg-acownway-bz1095849

Comment 23 Michal Toth 2015-03-27 11:29:10 UTC
Created attachment 1007187 [details]
Qpid broker ha logs

Comment 24 Michal Toth 2015-03-27 11:31:55 UTC
Same error showed up, when sending a large amount of messages, using provided "simple reproducer" - modified Spout.java from example package, using default qpid.sync_op_timeout=60000.
Test was conducted on 3x rhel6x VMs set up for qpid-ha cluster.

Same result with "qpid.sync_op_timeout=5000" (error showed up sooner).

Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 2893, point = 2895
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3439)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:843)
	at org.apache.qpid.example.Spout.<init>(Spout.java:102)
	at org.apache.qpid.example.Spout.main(Spout.java:150)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 2893, point = 2895
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1022)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:830)
	... 2 more


Following log data is from "qpid.sync_op_timeout=60000".

./run_example.sh org.apache.qpid.example.Spout -b guest/guest.6.251:5672 --content FOOBAR -c 1000000 --broker-option=sasl_mechs=ANONYMOUS 'q;{create:always}'

Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 472555, point = 472557
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3439)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:843)
	at org.apache.qpid.example.Spout.<init>(Spout.java:102)
	at org.apache.qpid.example.Spout.main(Spout.java:149)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 472555, point = 472557
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1022)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:830)
	... 2 more

---------------------------

qpid-stat -q -b 192.168.6.251
Queues
  queue                                                                                                                      dur  autoDel  excl  msg    msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ===========================================================================================================================================================================================================
  1937b4b8-9381-40f1-9455-4b41622be0bf:0.0                                                                                        Y        Y        0      0      0       0      0        0         1     2
  fb9b1ef0-f159-4fd7-8265-10828e012fe8:0.0                                                                                        Y        Y        0      0      0       0      0        0         1     2
  q                                                                                                                                               256k   256k     0    33.1m  33.1m       0         0     1
  qpid.bridge_queue_qpid.broker-replicator.bridge.431bb174-7fac-4390-abcd-efd1f368eb8d_aac99a43-3c08-4a10-98cc-1075c683f31f       Y        Y        0    311    311       0    224k     224k        1     3
  qpid.bridge_queue_qpid.broker-replicator.bridge.87e4e771-de29-49ab-a27b-c652d711f537_9c39b6c7-73b2-488b-a39d-f335a64bcbd6       Y        Y        0    301    301       0    221k     221k        1     3

--------------------------

Tested on packages
qpid-cpp-server-0.30-7.el6.x86_64
qpid-cpp-server-ha-0.30-7.el6.x86_64

Provided attachment 1007187 [details] are qpid-logs from all nodes.
Moving back to assigned.

Comment 26 Alan Conway 2015-03-30 19:06:45 UTC
Created attachment 1008641 [details]
Updated reproducer, not able to reproduce the issue.

I have not been able to reproduce the problem, have been running all day.

My current reproducer is attached, please compare to yours and let me know of any differences.

How long does it take for you to reproduce the failure (time or message count)?

I'm running on 3 RHEL 6.4 bare metal boxes, with a 0.30-mrg-checkpoint7 release build which should correspond to yours.

Can you think of any other details of your environment that I might need to reproduce? Is there any possibility you could give me access to your VMs so I can investigate directly there?

Comment 27 Michal Toth 2015-03-31 10:21:08 UTC
[0,root@mrg-qe-26 ~/bz1095849]$ ./runme 
./runme: line 18: /root/rh-qpid/release/src/tests/test_env.sh: No such file or directory
[1,root@mrg-qe-26 ~/bz1095849]$ ls
qpid-brk-ha-logs.tar  qpidd.log_1_primary  qpidd.log_2  qpidd.log_3  runme  Spout.java


So I am back to using only provided Spout.java from example package, with "transacted session" & "ssn.commit()" after each message.
[0,root@mrg-qe-26 ~/bz1095849]$ rpm -qa qpid-java*
qpid-java-client-0.30-4.el6.noarch
qpid-java-common-0.30-4.el6.noarch
qpid-java-example-0.30-4.el6.noarch


[0,root@mrg-qe-26 /usr/share/doc/qpid-java-0.30/examples]$ ./run_example.sh org.apache.qpid.example.Spout -b=guest:guest.6.251:5672 -c 1000000 "test;{create:always}"
{b=guest:guest.6.251:5672, c=1000000}
Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 1040079, point = 1040081
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3688)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:971)
	at org.apache.qpid.example.Spout.<init>(Spout.java:102)
	at org.apache.qpid.example.Spout.main(Spout.java:148)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 1040079, point = 1040081
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1033)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:958)
	... 2 more


130root@ha_rhel6i_1:~# rpm -qa |grep qpid-java
qpid-java-client-0.30-4.el6.noarch
qpid-java-common-0.30-4.el6.noarch
qpid-java-example-0.30-4.el6.noarch
root@ha_rhel6i_1:~# rpm -qa |grep qpid-cpp
qpid-cpp-client-devel-docs-0.30-7.el6.noarch
qpid-cpp-client-rdma-0.30-7.el6.i686
qpid-cpp-server-0.30-7.el6.i686
qpid-cpp-client-devel-0.30-7.el6.i686
qpid-cpp-server-xml-0.30-7.el6.i686
qpid-cpp-server-rdma-0.30-7.el6.i686
qpid-cpp-client-0.30-7.el6.i686
qpid-cpp-server-devel-0.30-7.el6.i686
qpid-cpp-server-linearstore-0.30-7.el6.i686
qpid-cpp-server-ha-0.30-7.el6.i686
root@ha_rhel6i_1:~# ip r
192.168.6.0/24 dev eth1  proto kernel  scope link  src 192.168.6.11 
...



root@ha_rhel6i_2:~# qpid-stat -q -b 192.168.6.251:5672 
Queues
  queue                                                                                                                      dur  autoDel  excl  msg    msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ===========================================================================================================================================================================================================
  1c8b3172-a86e-4223-9c65-f9cd1267e0e5:0.0                                                                                        Y        Y        0      0      0       0      0        0         1     2
  qpid.bridge_queue_qpid.broker-replicator.bridge.7b55a587-aa72-4780-bff3-0d5201f406e0_2983b6bd-f92b-4ee4-9f36-c5a3d5a3b406       Y        Y        0     49     49       0   46.3k    46.3k        1     3
  test                                                                                                                                            520k   520k     0    68.1m  68.1m       0         0     1




------------------------------------------------------------------------------------------------------------------------------------------------------
2nd try added following line to Spout client (3 changes in total to example Spout.java client)
System.setProperty("qpid.sync_op_timeout", "5000");


[0,root@mrg-qe-26 /usr/share/doc/qpid-java-0.30/examples]$ ./run_example.sh org.apache.qpid.example.Spout -b=guest:guest.6.251:5672 -c 1000000 "test2;{create:always}"
{b=guest:guest.6.251:5672, c=1000000}
Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 818269, point = 818271
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3688)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:971)
	at org.apache.qpid.example.Spout.<init>(Spout.java:102)
	at org.apache.qpid.example.Spout.main(Spout.java:148)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 818269, point = 818271
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1033)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:958)
	... 2 more



root@ha_rhel6i_2:~# clustat 
Cluster Status for dtests_ha @ Tue Mar 31 12:01:56 2015
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 192.168.6.11                                                        1 Online, rgmanager
 192.168.6.12                                                        2 Online, Local, rgmanager
 192.168.6.13                                                        3 Online, rgmanager

 Service Name                                                     Owner (Last)                                                     State         
 ------- ----                                                     ----- ------                                                     -----         
 service:qpidd_1                                                  192.168.6.11                                                     started       
 service:qpidd_2                                                  (192.168.6.12)                                                   failed        
 service:qpidd_3                                                  192.168.6.13                                                     started       
 service:qpidd_primary                                            192.168.6.11                                                     started       

root@ha_rhel6i_2:~# qpid-stat -q -b 192.168.6.251:5672 
Queues
  queue                                                                                                                      dur  autoDel  excl  msg    msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
  ===========================================================================================================================================================================================================
  79bc2e25-c996-44e8-987b-c4c81f9dc37e:0.0                                                                                        Y        Y        0      0      0       0      0        0         1     2
  qpid.bridge_queue_qpid.broker-replicator.bridge.ef461632-409f-4f1d-9382-ba00697112a2_172066a3-252f-4c93-89f5-f91f84adaa02       Y        Y        0   3.19k  3.19k      0   2.18m    2.18m        1     3
  qpid.ha-tx:805d9081-0cfb-46dc-bc5d-68ef02595b97                                                                                 Y                 0      0      0       0      0        0         1     1
  test                                                                                                                                            520k   520k     0    45.8m  45.8m       0         0     1
  test2                                                                                                                                           409k   409k     0    54.0m  54.0m       0         0     1


I have also noticed, that both times one of the cluster nodes died.
I _think_ that in previous runs, the brokers did not died after error reported from client.
I will make another few runs and look into broker debug logs for possible detailed errors...

I will provide you credentials/ips information on machines via email.

Comment 30 Alan Conway 2015-04-13 17:11:01 UTC
Yes but only on Michal Toth's VMs and it is very slow to reproduce. I will get to the bottom of it but no idea how long it will take.

Comment 32 Alan Conway 2015-05-21 13:21:55 UTC
Root cause appears to be mis-handling of backup connections that are timed out due to heartbeat, resulting in primary believing the backups are still connected but backups not receiving data. This needs to be fixed.

Not clear why heartbeat timeout is being triggered, that also needs to be understood - are they simply set too low or is something holding up processing.

Comment 33 Alan Conway 2015-06-01 21:10:47 UTC
I believe I finally understand the cause of this bug and have a fix. The fix is not trivial but it is isolated to transactions under HA, it will simplify the code and I think will give a substantial performance improvement. I hope to have it implemented by end of the week.

Comment 34 Alan Conway 2015-09-03 18:10:42 UTC
http://git.app.eng.bos.redhat.com/git/rh-qpid.git/commit/?h=0.30-mrg-aconwy-bz1095849

Bug 1095849 - JAVA Client Can not recieve message with qpid ha cluster "Session exception occured while trying to commit"0.30-mrg-aconwy-bz1095849
Removed complex and incorrect HA+TX logic, reverted to the following limitation:

You can use transactions in a HA cluster, but there are limitations on the
transactional guarantees. Transactions function normally with the *primary*
broker but replication to the backups is not coverted by the atomic guarantee.

The following situations are all safe:
- Client rolls back a transaction.
- Client successfully commits a transaction.
- Primary fails during a transaction *before* the client sends a commit.
- Transaction contains only one message.

The problem case is when all of the following occur:
- transaction contains multiple actions (enqueues or dequeues)
- primary fails between client sending commit and receiving commit-complete.

In this case it is possible that only part of the transaction was replicated to
the backups, so on fail-over partial transaction results may be visible.

Comment 35 Alan Conway 2015-09-09 15:20:17 UTC
Rebased to 0.34-mrg branch

http://git.app.eng.bos.redhat.com/git/rh-qpid.git/log/?h=0.34-mrg-aconway-bz1095849


Also on trunk as 

r1701109 | QPID-5855 - Simplified HA transaction logic.

Comment 37 Michal Toth 2015-09-16 08:08:41 UTC
I regrettably report, that I have been able to reproduce the issue on the latest packages (0.34-4 qpid-cpp-server).
Client received either the same, or similar exception. I have been testing on rhel6 32 and 64 bit architectures and on two different
physical machines hosting 2 different HA clusters. Minority of test runs (1 million of messages) have passed. Most of them failed with either
Exception [1],[2] or [3]. None of the HA nodes have been killed or failed.

[1] - 
Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 341865, point = 341867
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3688)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:971)
	at org.apache.qpid.example.Spout.<init>(Spout.java:99)
	at org.apache.qpid.example.Spout.main(Spout.java:147)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 341865, point = 341867
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1033)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:958)

[2] - 
IoReceiver - /192.168.6.251:5672 2015-09-15 16:40:03,743 ERROR [apache.qpid.client.AMQConnection] Throwable Received but no listener set: org.apache.qpid.AMQException: ch=0 id=1 ExecutionException(errorCode=INTERNAL_ERROR, commandId=492129, classCode=5, commandCode=2, fieldIndex=0, description=internal-error: Transaction prepare failed., errorInfo={}) [error code 541: internal error]
Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: ch=0 id=1 ExecutionException(errorCode=INTERNAL_ERROR, commandId=492129, classCode=5, commandCode=2, fieldIndex=0, description=internal-error: Transaction prepare failed., errorInfo={})
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3688)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:971)
	at org.apache.qpid.example.Spout.<init>(Spout.java:99)
	at org.apache.qpid.example.Spout.main(Spout.java:147)
Caused by: org.apache.qpid.transport.SessionException: ch=0 id=1 ExecutionException(errorCode=INTERNAL_ERROR, commandId=492129, classCode=5, commandCode=2, fieldIndex=0, description=internal-error: Transaction prepare failed., errorInfo={})
	at org.apache.qpid.transport.Session.sync(Session.java:876)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1033)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:958)
  
[3] - 
Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: Session sync was interrupted by failover.
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3688)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:971)
	at org.apache.qpid.example.Spout.<init>(Spout.java:99)
	at org.apache.qpid.example.Spout.main(Spout.java:147)
Caused by: org.apache.qpid.transport.SessionException: Session sync was interrupted by failover.
	at org.apache.qpid.transport.Session.checkFailoverRequired(Session.java:826)
	at org.apache.qpid.transport.Session.sync(Session.java:855)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1033)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:958)

----
Reproducer is fairly easy. Execute commit on session after each sent message.
I think that setting property "qpid.sync_op_timeout" to "5000" does not make difference.
Here are some of the executions:

(Upstream spout example client committing after each sent message)
$ ./run_example.sh org.apache.qpid.example.Spout -b guest:guest.6.251:5672 -c 1000000 "test;{create:always}"
{b=guest:guest.6.251:5672, c=1000000}
0
1000
2000
...
170000
Exception in thread "main" javax.jms.JMSException: Session exception occured while trying to commit: timed out waiting for sync: complete = 341865, point = 341867
	at org.apache.qpid.client.AMQSession.toJMSException(AMQSession.java:3688)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:971)
	at org.apache.qpid.example.Spout.<init>(Spout.java:99)
	at org.apache.qpid.example.Spout.main(Spout.java:147)
Caused by: org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 341865, point = 341867
	at org.apache.qpid.transport.Session.sync(Session.java:867)
	at org.apache.qpid.transport.Session.sync(Session.java:837)
	at org.apache.qpid.transport.Session.invoke(Session.java:789)
	at org.apache.qpid.transport.Session.invoke(Session.java:627)
	at org.apache.qpid.transport.SessionInvoker.txCommit(SessionInvoker.java:145)
	at org.apache.qpid.client.AMQSession_0_10.commitImpl(AMQSession_0_10.java:1033)
	at org.apache.qpid.client.AMQSession.commit(AMQSession.java:958)
	... 2 more

qpid-cpp-server-0.34-4.el6.i686
qpid-cpp-server-ha-0.34-4.el6.i686
qpid-java-client-0.30-7.el6.noarch
qpid-java-common-0.30-7.el6.noarch
==> Not OK -> bug is still present in the latest packages


root@ha_rhel6i_1:/var/dtests/node_data/clients# ./qc2_spout.java.sh  -b guest/guest.6.251:5672 -c 1000000 --tx-batch-size 1 "test;{create:always}"
main 2015-09-16 09:40:02,336 WARN [apache.qpid.example.CoreClient] Connection=amqp://guest:guest@test/test?sync_ack='true'&sync_publish='all'&brokerlist='tcp://192.168.6.251:5672'
main 2015-09-16 09:40:02,336 WARN [apache.qpid.example.CoreClient] Connection=amqp://guest:guest@test/test?sync_ack='true'&sync_publish='all'&brokerlist='tcp://192.168.6.251:5672'
IoReceiver - /192.168.6.251:5672 2015-09-16 09:52:05,525 ERROR [apache.qpid.example.MessagingExceptionListener] Error
javax.jms.JMSException: Exception thrown against AMQConnection:
Host: 192.168.6.251
Port: 5672
Virtual Host: test
Client ID: test
Active session count: 0: org.apache.qpid.AMQException: ch=0 id=1 ExecutionException(errorCode=INTERNAL_ERROR, commandId=1735025, classCode=5, commandCode=2, fieldIndex=0, description=internal-error: Transaction prepare failed., errorInfo={}) [error code 541: internal error]
	at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297)
	at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1097)
	at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:935)
	at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:182)
	at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32)
	at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
	at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:55)
	at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50)
	at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32)
	at org.apache.qpid.transport.Method.delegate(Method.java:159)
	at org.apache.qpid.transport.Session.received(Session.java:596)
	at org.apache.qpid.transport.Connection.dispatch(Connection.java:451)
	at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64)
	at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40)
	at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110)
	at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:103)
	at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54)
	at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40)
	at org.apache.qpid.transport.Method.delegate(Method.java:159)
	at org.apache.qpid.transport.Connection.received(Connection.java:404)
	at org.apache.qpid.transport.Connection.received(Connection.java:69)
	at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:97)
	at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:198)
	at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:131)
	at org.apache.qpid.transport.network.Frame.delegate(Frame.java:128)
	at org.apache.qpid.transport.network.Assembler.received(Assembler.java:102)
	at org.apache.qpid.transport.network.Assembler.received(Assembler.java:44)
	at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:199)
	at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:114)
	at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:46)
	at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:161)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.qpid.AMQException: ch=0 id=1 ExecutionException(errorCode=INTERNAL_ERROR, commandId=1735025, classCode=5, commandCode=2, fieldIndex=0, description=internal-error: Transaction prepare failed., errorInfo={}) [error code 541: internal error]
	at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1081)
	... 30 more
main 2015-09-16 09:52:05,541 ERROR [apache.qpid.example.CoreClient] Error while sending message!
javax.jms.JMSException: Session exception occured while trying to commit: ch=0 id=1 ExecutionException(errorCode=INTERNAL_ERROR, commandId=1735025, classCode=5, commandCode=2, fieldIndex=0, description=internal-error: Transaction prepare failed., errorInfo={})
main 2015-09-16 09:52:05,541 ERROR [apache.qpid.example.CoreClient] Error while sending message!
javax.jms.JMSException: Session exception occured while trying to commit: ch=0 id=1 ExecutionException(errorCode=INTERNAL_ERROR, commandId=1735025, classCode=5, commandCode=2, fieldIndex=0, description=internal-error: Transaction prepare failed., errorInfo={})

Comment 50 Red Hat Bugzilla 2025-02-10 03:35:48 UTC
This product has been discontinued or is no longer tracked in Red Hat Bugzilla.