854666 – cluster initial update stall when a queue has >10k messages with message groups set

Keywords:
Status:	CLOSED UPSTREAM
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	qpid-cpp
Sub Component:
Version:	2.1
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	messaging-bugs
QA Contact:	MRG Quality Engineering
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-09-05 13:56 UTC by Pavel Moravec
Modified:	2025-02-10 03:20 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2025-02-10 03:20:45 UTC
Target Upstream Version:
Embargoed:

Attachments
Patch proposal (11.65 KB, patch) 2012-09-24 15:26 UTC, Pavel Moravec	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Apache JIRA	QPID-4343	0	None	None	None	2012-09-24 15:29:55 UTC
Red Hat Knowledge Base (Solution)	960903	0	None	None	None	Never

Description Pavel Moravec 2012-09-05 13:56:34 UTC

Description of problem:
When a qpid broker runs in a cluster and message groups are in use, a new member joining the cluster stalls the whole cluster during the initial update if any queue holds more than 10k messages with message groups set.

The reason is that the updater node sends the message group information in a ClusterConnectionQueueObserverStateBody message (exactly one message per queue). If a queue has too many messages with message groups, that ClusterConnectionQueueObserverStateBody message does not fit into a single AMQP frame and is silently(!) dropped by the updater.

The updatee node then waits for that message, while the updater node (and consequently the whole cluster) waits for the updatee to mark itself as ready.
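
The size problem described above can be illustrated with a small model. This is only a sketch: the byte layout in `encode_group_state`, the helper names, the group size of 10 and the `GROUP-` prefix are all assumptions made for illustration, and `MAX_FRAME_BYTES` is an assumed single-frame limit, not a value taken from the qpid-cpp sources. It shows only that a per-queue state blob grows with the number of grouped messages and eventually exceeds any fixed frame limit.

```python
# Illustrative model only; the real qpid-cpp wire encoding is different.
MAX_FRAME_BYTES = 65535  # assumed single-frame limit, for illustration


def encode_group_state(groups):
    """Encode {group_name: [message positions]} as ONE blob (hypothetical format)."""
    out = bytearray()
    for name, positions in groups.items():
        raw = name.encode("utf-8")
        out += len(raw).to_bytes(2, "big") + raw      # group name
        out += len(positions).to_bytes(4, "big")      # number of messages in group
        for pos in positions:
            out += pos.to_bytes(8, "big")             # one position per message
    return bytes(out)


def make_groups(n_messages, group_size=10, prefix="GROUP-"):
    """Assign consecutive messages to groups of group_size (hypothetical helper)."""
    groups = {}
    for i in range(n_messages):
        groups.setdefault(f"{prefix}{i // group_size}", []).append(i)
    return groups


def fits_in_one_frame(groups):
    """True if the whole queue's group state fits in a single frame."""
    return len(encode_group_state(groups)) <= MAX_FRAME_BYTES
```

In this model a queue with 100 grouped messages fits in one frame, while a queue with 10000 does not. The exact threshold depends on the real encoding and on the group names, so the numbers here should not be compared with the thresholds observed in this bug.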


Version-Release number of selected component (if applicable):
0.14-21; almost surely also present in 0.18


How reproducible:
100%


Steps to Reproduce:
1. Have a two-node cluster with one node running.
2. Produce at least 10k messages with message groups on it:
qpid-send --group-key "GROUP_KEY" -m 10000 -a "groupQ; {create:always, node:{type:queue, x-declare:{ arguments:{'qpid.group_header_key':'GROUP_KEY', 'qpid.shared_msg_group':1 }}}}"
3. (Re)start the second node twice. For an unknown reason, the first start succeeds while the second does not.


Actual results:
New joiner stalls the cluster.


Expected results:
A broker joining a cluster must not be able to stall the cluster.


Additional info:

Comment 1 Pavel Moravec 2012-09-05 14:38:46 UTC

(FYI: sending 6000 messages is enough to trigger the bug in the scenario above, whereas resetting the group prefix with --group-prefix "" lets the 6k messages pass in a single ClusterConnectionQueueObserverStateBody message.)

Comment 3 Pavel Moravec 2012-09-24 15:26:30 UTC

Created attachment 616619
Patch proposal

Instead of having UpdateClient send a single AMQP message that is too large to encode in order to update the state of MessageGroupManager, several state updates are sent, one per message group. Since a message group consists of only a few messages, this approach should no longer hit the original problem.

a/src/qpid/cluster/UpdateClient.cpp has to be changed so that one StatefulQueueObserver can send potentially several updates.

The change to a/src/qpid/broker/QueueFlowLimit.h is a direct consequence of that.

MessageGroupManager::getState and MessageGroupManager::setState in fact do the same as before, but without the "for (GroupMap::const_iterator .." loop, which is done from UpdateClient.
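
The per-group approach of the patch proposal can be sketched as follows. This is a minimal model, not the attached patch: the byte layout, the function names `encode_group`, `per_group_updates` and `apply_updates`, and the `MAX_FRAME_BYTES` limit are assumptions for illustration. Raising an error on an oversized update, instead of dropping it silently, is a choice made in this sketch and is not stated in the patch description.

```python
# Illustrative model only; names and encoding are hypothetical, not qpid-cpp code.
MAX_FRAME_BYTES = 65535  # assumed single-frame limit, for illustration


def encode_group(name, positions):
    """Encode the state of one message group (hypothetical format)."""
    raw = name.encode("utf-8")
    return (len(raw).to_bytes(2, "big") + raw
            + len(positions).to_bytes(4, "big")
            + b"".join(p.to_bytes(8, "big") for p in positions))


def per_group_updates(groups):
    """Updater side: yield one small update per group instead of one big blob."""
    for name, positions in groups.items():
        blob = encode_group(name, positions)
        if len(blob) > MAX_FRAME_BYTES:
            # Fail loudly so the updatee is never left waiting for a lost update.
            raise ValueError(f"state of group {name!r} is {len(blob)} bytes")
        yield blob


def apply_updates(blobs):
    """Updatee side: rebuild the group map from the individual updates."""
    groups = {}
    for blob in blobs:
        n = int.from_bytes(blob[:2], "big")
        name = blob[2:2 + n].decode("utf-8")
        count = int.from_bytes(blob[2 + n:6 + n], "big")
        body = blob[6 + n:]
        groups[name] = [int.from_bytes(body[i * 8:(i + 1) * 8], "big")
                        for i in range(count)]
    return groups
```

With 1000 groups of 10 messages each, this produces 1000 updates that are each far below the assumed limit, and the updatee reconstructs the same group map. A single group that is itself very large would still exceed the limit, which is why the sketch reports that case explicitly.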

Comment 4 Justin Ross 2013-02-22 15:51:02 UTC

Involves the clustering implementation, which is not present in 2.4/0.22.

Comment 6 Red Hat Bugzilla 2025-02-10 03:20:45 UTC

This product has been discontinued or is no longer tracked in Red Hat Bugzilla.
