Bug 505175 - cluster node hangs when updating second member if store contains a message larger than the max frame size
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Hardware: All Linux
Priority: high, Severity: high
: 1.1.2
Assigned To: Gordon Sim
Jan Sarenik
Reported: 2009-06-10 17:29 EDT by Gordon Sim
Modified: 2009-06-12 13:39 EDT (History)
Doc Type: Bug Fix
Last Closed: 2009-06-12 13:39:22 EDT

Attachments
Fix (1.28 KB, patch)
2009-06-10 19:26 EDT, Gordon Sim
Revised fix (1.14 KB, patch)
2009-06-10 21:48 EDT, Gordon Sim
Not so automated (but very helpful) test sandbox (3.45 KB, application/gzip)
2009-06-11 09:08 EDT, Jan Sarenik

Description Gordon Sim 2009-06-10 17:29:56 EDT
Description of problem:

If a message larger than 64k is recovered from the store by the first node in the cluster, then when another node joins, the first node will hang while trying to transfer that message.
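
To give a sense of scale, here is a rough sketch of the arithmetic. It assumes the 64k limit corresponds to a maximum frame size of 65535 bytes and ignores per-frame header overhead; `frames_needed` is a hypothetical helper, not part of qpid.

```shell
# Hypothetical helper: minimum number of frames needed to carry a body of
# $1 bytes when each frame holds at most $2 bytes (ceiling division).
# Assumption: frame header overhead is ignored.
frames_needed() {
    size=$1
    max=$2
    echo $(( (size + max - 1) / max ))
}

frames_needed 65535 65535     # prints 1: fits in a single frame
frames_needed 65536 65535     # prints 2: one byte over the limit
frames_needed 1000001 65535   # prints 16: a message of roughly 1 MB
```

Any message over the limit therefore cannot go out as a single frame and has to be split before it is transferred to the joining node.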

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. start one node
2. create durable queue
3. send large durable message (>64k) to that queue
4. stop node
5. start node again in cluster mode using store from steps above
6. start second node for that cluster

Actual results:

first node hangs

Expected results:

all state transferred to second node, all cluster nodes then responding to requests as usual

Additional info:
Comment 1 Gordon Sim 2009-06-10 19:26:53 EDT
Created attachment 347311
Fix
Comment 2 Gordon Sim 2009-06-10 19:35:28 EDT
Example test case:

1. create data file with one very long line:

  for i in `seq 1 1000000`; do echo x; done | tr -d '\n' > /tmp/input
  echo '' >> /tmp/input # add new line to end of single line

2. start cluster node

   qpidd --auth no --cluster-name test-cluster

3. create durable queue:

  qpid-config add queue test-queue --durable  

4. send large message:

  sender --send-eos 1 --durable true < /tmp/input

5. stop and restart node started in step 2

6. start new cluster node

  qpidd --auth no --cluster-name test-cluster --port 5673 --data-dir data-5673

7. test that the message was correctly transferred to this new node

  receiver -p 5673 > /tmp/output
  diff /tmp/input /tmp/output
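
The data file from step 1 and the comparison from step 7 could also be wrapped in two small shell functions, sketched below. The names `make_input` and `check_roundtrip` are hypothetical, and `head -c` and `cmp -s` are swapped in here for the `seq` loop and `diff` (faster to generate, and a clean exit status to script against); they are not part of the original test.

```shell
# Hypothetical helper: write a file of $2 'x' characters followed by one newline.
make_input() {
    out=$1
    count=$2
    head -c "$count" /dev/zero | tr '\0' 'x' > "$out"
    echo '' >> "$out"   # add new line to end of single line
}

# Hypothetical helper: succeed only if the received file ($2) is
# byte-identical to the file that was sent ($1).
check_roundtrip() {
    if cmp -s "$1" "$2"; then
        echo "message transferred intact"
    else
        echo "message differs or is missing" >&2
        return 1
    fi
}
```

Usage would be `make_input /tmp/input 1000000` before step 2, and `check_roundtrip /tmp/input /tmp/output` after the `receiver` command in step 7.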
Comment 3 Gordon Sim 2009-06-10 19:39:48 EDT
Fixed on trunk as r783571.
Comment 4 Gordon Sim 2009-06-10 21:48:48 EDT
Created attachment 347324
Revised fix

Previous patch broke transfer of messages whose content was released.
Comment 5 Jan Sarenik 2009-06-11 03:50:58 EDT
Reproduced on qpidd-0.5.752581-14.el5
using the steps above. Thanks for that!
Comment 6 Jan Sarenik 2009-06-11 08:35:31 EDT
Reproduced on RHEL5-i386 <=> RHEL5-i386

Verified on qpidc-16 build in these scenarios

  First in cluster | Second in cluster
   RHEL5-i386      |  RHEL5-i386
   RHEL5-x86_64    |  RHEL5-i386
   RHEL5-i386      |  RHEL5-x86_64
   RHEL5-x86_64    |  RHEL5-x86_64

Thanks for early build of packages!
Comment 7 Jan Sarenik 2009-06-11 09:08:04 EDT
Created attachment 347396
Not so automated (but very helpful) test sandbox
Comment 9 errata-xmlrpc 2009-06-12 13:39:22 EDT
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

