505175 – cluster node hangs when updating second member if store contains a message larger than the max frame size

Bug 505175 - cluster node hangs when updating second member if store contains a message larger than the max frame size


Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	qpid-cpp
Sub Component:
Version:	1.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	1.1.2
Target Release:	---
Assignee:	Gordon Sim
QA Contact:	Jan Sarenik
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2009-06-10 21:29 UTC by Gordon Sim
Modified:	2009-06-12 17:39 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-06-12 17:39:22 UTC
Target Upstream Version:
Embargoed:

Attachments
Fix (1.28 KB, patch) 2009-06-10 23:26 UTC, Gordon Sim	no flags
Revised fix (1.14 KB, patch) 2009-06-11 01:48 UTC, Gordon Sim	no flags
Not so automated (but very helpful) test sandbox (3.45 KB, application/gzip) 2009-06-11 13:08 UTC, Jan Sarenik	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:1097	0	normal	SHIPPED_LIVE	Red Hat Enterprise MRG Messaging bug fixing update	2009-06-12 17:38:48 UTC

Description Gordon Sim 2009-06-10 21:29:56 UTC

Description of problem:

If a message larger than the maximum frame size (64k) is recovered from the store by the first node in the cluster, then when another node joins, the first node hangs while trying to transfer that message to the new member.
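
A rough way to see which messages are affected: the hypothetical shell helpers below (not part of qpid) compute a lower bound on the number of frames a message body needs and report whether a file is too big for a single frame. They assume the 64k limit above means 65535 bytes per frame and ignore frame header overhead, so a real broker splits content somewhat earlier than these numbers suggest.

```shell
# Hypothetical helper: minimum number of frames needed to carry a body of
# SIZE bytes when each frame holds at most MAX bytes of content.
# ASSUMPTION: MAX defaults to 65535 and frame header overhead is ignored,
# so the result is a lower bound, not what qpidd actually sends.
min_frames() {
    size=$1
    max=${2:-65535}
    # Integer ceiling division: (size + max - 1) / max
    echo $(( ($size + $max - 1) / $max ))
}

# Hypothetical helper: exit status 0 if FILE needs more than one frame,
# i.e. it is the kind of message that triggered the hang in this bug.
exceeds_one_frame() {
    # wc -c may pad its output with spaces on some platforms; strip them.
    bytes=$(wc -c < "$1" | tr -d ' ')
    [ "$(min_frames "$bytes" "${2:-65535}")" -gt 1 ]
}
```

For example, the 1,000,001-byte input file from Comment 2 gives `min_frames 1000001` = 16, whereas any body of 65535 bytes or less fits in one frame.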

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-14.el5

How reproducible:

100%

Steps to Reproduce:
1. start one node
2. create durable queue
3. send large durable message (>64k) to that queue
4. stop node
5. start node again in cluster mode using store from steps above
6. start second node for that cluster
   
Actual results:

first node hangs

Expected results:

all state transferred to the second node; all cluster nodes then respond to requests as usual

Additional info:

Comment 1 Gordon Sim 2009-06-10 23:26:53 UTC

Created attachment 347311
Fix

Comment 2 Gordon Sim 2009-06-10 23:35:28 UTC

Example test case:

1. create data file with one very long line:

  for i in `seq 1 1000000`; do echo x; done | tr -d '\n' > /tmp/input
  echo '' >> /tmp/input # add new line to end of single line

2. start cluster node

   qpidd --auth no --cluster-name test-cluster

3. create durable queue:

  qpid-config add queue test-queue --durable  

4. send large message:

  sender --send-eos 1 --durable true < /tmp/input

5. stop and restart node started in step 2

6. start new cluster node

  qpidd --auth no --cluster-name test-cluster --port 5673 --data-dir data-5673

7. check that the message was correctly transferred to the new node:

  receiver -p 5673 > /tmp/output
  diff /tmp/input /tmp/output
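
Steps 1 and 7 can also be scripted. The sketch below is a hypothetical pair of shell helpers, not part of the qpid test suite: `make_large_line` writes the same single-line file as step 1 (COUNT copies of `x` followed by one newline) using `head -c` and `tr` instead of a million-iteration loop, and `check_transfer` wraps the comparison in step 7 so a script can act on the result. It assumes a `head` that supports `-c` (GNU and BSD both do); the qpidd, qpid-config, sender and receiver steps stay as written above.

```shell
# Hypothetical helper: write COUNT 'x' characters followed by a single
# newline to FILE; equivalent to the seq/echo/tr loop in step 1.
make_large_line() {
    count=$1
    file=$2
    # /dev/zero supplies COUNT NUL bytes; tr turns each one into 'x'.
    head -c "$count" /dev/zero | tr '\0' 'x' > "$file"
    # Terminate the single line, as the `echo ''` in step 1 does.
    echo '' >> "$file"
}

# Hypothetical helper: compare the sent and received files byte for byte.
# Prints OK and returns 0 if they match; prints MISMATCH and returns 1
# otherwise.
check_transfer() {
    if cmp -s "$1" "$2"; then
        echo OK
    else
        echo MISMATCH
        return 1
    fi
}
```

In the test case above this would be `make_large_line 1000000 /tmp/input` for step 1 and `check_transfer /tmp/input /tmp/output` after running the receiver in step 7.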

Comment 3 Gordon Sim 2009-06-10 23:39:48 UTC

Fixed on trunk as r783571.

Comment 4 Gordon Sim 2009-06-11 01:48:48 UTC

Created attachment 347324
Revised fix

The previous patch broke transfer of messages whose content was released.

Comment 5 Jan Sarenik 2009-06-11 07:50:58 UTC

Reproduced on qpidd-0.5.752581-14.el5
using the steps above. Thanks for that!

Comment 6 Jan Sarenik 2009-06-11 12:35:31 UTC

Reproduced on RHEL5-i386 <=> RHEL5-i386

Verified on the qpidc-16 build in these scenarios:

  First in cluster | Second in cluster
 ^^^^^^^^^^^^^^^^^^|^^^^^^^^^^^^^^^^^^^
   RHEL5-i386      |  RHEL5-i386
   RHEL5-x86_64    |  RHEL5-i386
   RHEL5-i386      |  RHEL5-x86_64
   RHEL5-x86_64    |  RHEL5-x86_64

Thanks for the early build of packages!

Comment 7 Jan Sarenik 2009-06-11 13:08:04 UTC

Created attachment 347396
Not so automated (but very helpful) test sandbox

Comment 9 errata-xmlrpc 2009-06-12 17:39:22 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html
