Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1063700

Summary:

[linearstore] broker restart fails under stress test

Product:

Red Hat Enterprise MRG

Reporter:

Pavel Moravec <pmoravec>

Component:

qpid-cpp

Assignee:

Kim van der Riet <kim.vdriet>

Status:

CLOSED CURRENTRELEASE

QA Contact:

Zdenek Kraus <zkraus>

Severity:

high

Docs Contact:

Priority:

high

Version:

3.0

CC:

esammons, iboverma, jross, zkraus

Target Milestone:

3.0

Keywords:

Reproducer

Target Release:

---

Hardware:

All

OS:

All

Whiteboard:

Fixed In Version:

qpid-cpp-0.22-36

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2015-01-21 12:55:59 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Bug Depends On:

Bug Blocks:

709325

Attachments:

Description	Flags
reproducer script	none
/var/lib/qpidd from failing broker start	none
/var/lib/qpidd_bckp before starting the broker and /var/lib/qpidd after the broker startup failure	none

Description Pavel Moravec 2014-02-11 09:56:38 UTC

Description of problem:
Running a stress test where many messages are purged/consumed from a queue at the same time (just to try returning many EPL together) while new messages are still coming, broker restart sometime fails with JERR_JREC_BADRECTAIL.


Version-Release number of selected component (if applicable):
qpid-cpp-server-linearstore-0.22-35.el6.x86_64


How reproducible:
100% within 15 minutes, but in non-deterministic scenario


Steps to Reproduce:
Run attached script


Actual results:
Script terminates after several broker restart attempts (usually within 5, sometimes much more), with error log:

2014-02-11 09:19:45 [Broker] critical Unexpected error: Daemon startup failed: Queue testQueue: recoverMessages() failed: jexception 0x0701 RecoveryManager::readNextRemainingRecord() threw JERR_JREC_BADRECTAIL: Invalid record tail. (Bad record tail:
  Magic: expected 0x9aacb3ae; found 0xae
  Serial: expected 0xdfd9ef00f95ec0e8; found 0x3bd597e6c9
  Record Id: expected 0x200422; found 0x0
  Checksum: expected 0xfa45392f; found 0x10) (/builddir/build/BUILD/qpid-0.22/cpp/src/qpid/linearstore/MessageStoreImpl.cpp:1004)


Expected results:
Script never terminates


Additional info:
Attaching also one /var/lib/qpidd directory where the broker cant restart

Comment 1 Pavel Moravec 2014-02-11 09:58:43 UTC

Created attachment 861729 [details]
reproducer script

Comment 2 Pavel Moravec 2014-02-11 10:16:34 UTC

Created attachment 861741 [details]
/var/lib/qpidd from failing broker start

Broker logs:

2014-02-11 10:13:46 [Broker] critical Unexpected error: Queue testQueue: recoverMessages() failed: jexception 0x0701 RecoveryManager::readNextRemainingRecord() threw JERR_JREC_BADRECTAIL: Invalid record tail. (Bad record tail:
  Serial: expected 0xb0b6a3e94bf87441; found 0x3bd597e6c9
  Record Id: expected 0x1978c; found 0x0
  Checksum: expected 0x42a237f9; found 0x10) (/builddir/build/BUILD/qpid-0.22/cpp/src/qpid/linearstore/MessageStoreImpl.cpp:1004)

Comment 3 Kim van der Riet 2014-02-11 18:12:42 UTC

If you have the store files from the failure, this would be most valuable. Presumably your script preserved the store before the recovery was attempted; replacing the qls store directory with the saved store and restarting the broker should make the error 100% reproducible.

If you have the store, please send it to me or attach it to this bug.

Comment 4 Pavel Moravec 2014-02-12 08:14:24 UTC

Created attachment 862147 [details]
/var/lib/qpidd_bckp before starting the broker and /var/lib/qpidd after the broker startup failure

See /var/lib/qpidd_bckp as backed up directory before invoking "service qpiddd start".

Broker logged meantime:

2014-02-12 08:06:07 [Broker] critical Unexpected error: Daemon startup failed: Queue testQueue: recoverMessages() failed: jexception 0x0701 RecoveryManager::readNextRemainingRecord() threw JERR_JREC_BADRECTAIL: Invalid record tail. (Bad record tail:
  Magic: expected 0x9aacb3ae; found 0x6
  Serial: expected 0x3e8b1a565b177139; found 0x3bd597e6c9
  Record Id: expected 0x1978b; found 0x0
  Checksum: expected 0x7f893667; found 0x10) (/builddir/build/BUILD/qpid-0.22/cpp/src/qpid/linearstore/MessageStoreImpl.cpp:1004)

Comment 5 Kim van der Riet 2014-03-05 13:56:45 UTC

Upstream bug at https://issues.apache.org/jira/browse/QPID-5603

Comment 6 Kim van der Riet 2014-03-05 14:00:15 UTC

The attached store journal contains a record in which the record tail is exactly divided by a file boundary from the rest of the record. This meets the conditions of the upstream bug in comment #5.

Comment 7 Kim van der Riet 2014-03-05 14:53:19 UTC

Fixed in r.1574513. See upstream bug for further comments.

Comment 8 Zdenek Kraus 2014-03-18 08:21:35 UTC

this issue was tested on RHEL 6.5 i686 & x86_64 with following packages:

perl-qpid-0.22-11.el6
python-qpid-0.22-12.el6
python-qpid-qmf-0.22-28.el6
qpid-cpp-client-0.22-36.el6
qpid-cpp-client-devel-0.22-36.el6
qpid-cpp-client-devel-docs-0.22-36.el6
qpid-cpp-debuginfo-0.22-36.el6
qpid-cpp-server-0.22-36.el6
qpid-cpp-server-devel-0.22-36.el6
qpid-cpp-server-ha-0.22-36.el6
qpid-cpp-server-linearstore-0.22-36.el6
qpid-cpp-server-xml-0.22-36.el6
qpid-java-client-0.22-6.el6
qpid-java-common-0.22-6.el6
qpid-java-example-0.22-6.el6
qpid-jca-0.22-2.el6
qpid-jca-xarecovery-0.22-2.el6
qpid-proton-c-0.6-1.el6
qpid-proton-c-devel-0.6-1.el6
qpid-proton-debuginfo-0.6-1.el6
qpid-qmf-0.22-28.el6
qpid-qmf-debuginfo-0.22-28.el6
qpid-snmpd-1.0.0-16.el6
qpid-snmpd-debuginfo-1.0.0-16.el6
qpid-tools-0.22-9.el6
ruby-qpid-qmf-0.22-28.el6


-> VERIFIED