Bug 505175 - cluster node hangs when updating second member if store contains a message larger than the max frame size
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.1.2
Target Release: ---
Assignee: Gordon Sim
QA Contact: Jan Sarenik
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-10 21:29 UTC by Gordon Sim
Modified: 2009-06-12 17:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-12 17:39:22 UTC
Target Upstream Version:
Embargoed:


Attachments
Fix (1.28 KB, patch)
2009-06-10 23:26 UTC, Gordon Sim
Revised fix (1.14 KB, patch)
2009-06-11 01:48 UTC, Gordon Sim
Not so automated (but very helpful) test sandbox (3.45 KB, application/gzip)
2009-06-11 13:08 UTC, Jan Sarenik


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:1097 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise MRG Messaging bug fixing update | 2009-06-12 17:38:48 UTC

Description Gordon Sim 2009-06-10 21:29:56 UTC
Description of problem:

If a message larger than 64k is recovered from the store by the first node in the cluster, then when another node joins, the first node hangs while trying to transfer that message.
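
To make the size relationship concrete, here is a minimal sketch that counts how many pieces a payload needs if no single piece may exceed a size limit. It is not taken from the attached patches, which this report does not describe. Both variables are illustrative: max_frame=65536 is only an assumed byte value for the "64k" above, and payload_size matches the 1,000,001-byte input file built in the test case in comment 2.

```shell
# Illustrative values only; neither is read from qpidd.
max_frame=65536        # assumed byte value for "64k"
payload_size=1000001   # 1,000,000 characters plus a trailing newline

# Ceiling division: number of pieces of at most max_frame bytes each.
pieces=$(( (payload_size + max_frame - 1) / max_frame ))

# Size of the final piece (equals max_frame when the payload divides evenly).
last=$(( payload_size - (pieces - 1) * max_frame ))

echo "$pieces pieces, last one $last bytes"
```

With these values the payload is far larger than one 64k unit, so the reproduction steps below comfortably exceed the limit.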

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-14.el5

How reproducible:

100%

Steps to Reproduce:
1. start one node
2. create durable queue
3. send large durable message (>64k) to that queue
4. stop node
5. start node again in cluster mode using store from steps above
6. start second node for that cluster
   
Actual results:

first node hangs

Expected results:

all state transferred to second node; all cluster nodes then respond to requests as usual

Additional info:

Comment 1 Gordon Sim 2009-06-10 23:26:53 UTC
Created attachment 347311 [details]
Fix

Comment 2 Gordon Sim 2009-06-10 23:35:28 UTC
Example test case:

1. create data file with one very long line:

  for i in `seq 1 1000000`; do echo x; done | tr -d '\n' > /tmp/input
  echo '' >> /tmp/input # add new line to end of single line

2. start cluster node

   qpidd --auth no --cluster-name test-cluster

3. create durable queue:

  qpid-config add queue test-queue --durable  

4. send large message:

  sender --send-eos 1 --durable true < /tmp/input

5. stop and restart node started in step 2

6. start new cluster node

  qpidd --auth no --cluster-name test-cluster --port 5673 --data-dir data-5673

7. test message was correctly transferred to this new node

  receiver -p 5673 > /tmp/output
  diff /tmp/input /tmp/output
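
The payload from step 1 and the comparison from step 7 could also be sketched as below. This is a convenience sketch under assumptions, not part of the original test: the temporary file, the 65536-byte threshold used for "64k", and the same_content helper are all hypothetical, and head -c is assumed to be available (GNU and BSD userlands provide it).

```shell
# Hypothetical helper: succeeds only if both files are byte-for-byte identical.
same_content() {
  cmp -s "$1" "$2"
}

# One line of 1,000,000 'x' characters followed by a newline, as in step 1.
input=$(mktemp)
head -c 1000000 /dev/zero | tr '\0' 'x' > "$input"
echo '' >> "$input"

# Confirm the payload is over the assumed 64k (65536-byte) threshold.
size=$(wc -c < "$input" | tr -d ' ')
if [ "$size" -gt 65536 ]; then
  echo "payload is $size bytes"
fi
```

If "$input" is fed to sender in step 4 in place of /tmp/input, then after step 7 a check such as `same_content "$input" /tmp/output && echo "transferred intact"` would report success only when the received message matches what was sent.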

Comment 3 Gordon Sim 2009-06-10 23:39:48 UTC
Fixed on trunk as r783571.

Comment 4 Gordon Sim 2009-06-11 01:48:48 UTC
Created attachment 347324 [details]
Revised fix

Previous patch broke transfer of messages whose content was released.

Comment 5 Jan Sarenik 2009-06-11 07:50:58 UTC
Reproduced on qpidd-0.5.752581-14.el5
using the steps written above. Thanks for that!

Comment 6 Jan Sarenik 2009-06-11 12:35:31 UTC
Reproduced on RHEL5-i386 <=> RHEL5-i386

Verified on qpidc-16 build in these scenarios

  First in cluster | Second in cluster
 ^^^^^^^^^^^^^^^^^^|^^^^^^^^^^^^^^^^^^^
   RHEL5-i386      |  RHEL5-i386
   RHEL5-x86_64    |  RHEL5-i386
   RHEL5-i386      |  RHEL5-x86_64
   RHEL5-x86_64    |  RHEL5-x86_64

Thanks for early build of packages!

Comment 7 Jan Sarenik 2009-06-11 13:08:04 UTC
Created attachment 347396 [details]
Not so automated (but very helpful) test sandbox

Comment 9 errata-xmlrpc 2009-06-12 17:39:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html

