Bug 470228

Summary: Abort on starting journal with changed sizing params
Product: Red Hat Enterprise MRG
Reporter: Gordon Sim <gsim>
Component: qpid-cpp
Assignee: Kim van der Riet <kim.vdriet>
Status: CLOSED ERRATA
QA Contact: Kim van der Riet <kim.vdriet>
Severity: high
Priority: high
Version: 1.0
CC: freznice
Target Milestone: 1.1   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-02-04 15:36:32 UTC
Attachments:
  journal files from failure scenario (flags: none)

Description Gordon Sim 2008-11-06 12:03:18 UTC
I started a broker with the store module loaded and ran a durable perftest. I then stopped the broker and restarted it with different journal sizing parameters (--num-jfiles 4 --jfile-size-pgs 4, as shown below), and the broker aborted during recovery.

[gordon@thinkpad cpp]$ ./src/qpidd --auth no --load-module ~/work/bdbstore/cpp/lib/.libs/msgstore.so --jfile-size-pgs 4 --wcache-page-size 8 --num-jfiles 4 --log-enable info+
2008-nov-06 12:01:31 info Loaded Module: /home/gordon/work/bdbstore/cpp/lib/.libs/msgstore.so
2008-nov-06 12:01:31 info Management enabled
2008-nov-06 12:01:32 notice Journal "TplStore": Created
2008-nov-06 12:01:32 notice Store module initialized; dir=/home/gordon/.qpidd
2008-nov-06 12:01:32 info > Default files per journal: 4
2008-nov-06 12:01:32 info > Auto-expand enabled
2008-nov-06 12:01:32 info > Max auto-expand journal files: 16
2008-nov-06 12:01:32 info > Default jrournal file size: 4 (wpgs)
2008-nov-06 12:01:32 info > Default write cache page size: 8 (Kib)
2008-nov-06 12:01:32 info > Default number of write cache pages: 64
2008-nov-06 12:01:32 info > TPL files per journal: 8
2008-nov-06 12:01:32 info > TPL jrournal file size: 24 (wpgs)
2008-nov-06 12:01:32 info > TPL write cache page size: 4 (Kib)
2008-nov-06 12:01:32 info > TPL number of write cache pages: 64
2008-nov-06 12:01:32 notice Journal "perftest0": Created
2008-nov-06 12:01:32 warning Journal "perftest0": Recovery found 8 files (different from --num-jfiles value of 4).
2008-nov-06 12:01:32 warning Journal "perftest0": Recovery found file size = 24 (different from --jfile-size-pgs value of 4).
lt-qpidd: jrnl/arr_cnt.cpp:81: u_int32_t mrg::journal::arr_cnt::incr(u_int16_t): Assertion `_size == 0 || index < _size' failed.
Aborted (core dumped)


(gdb) bt
#0  0x45b8b410 in __kernel_vsyscall ()
#1  0x45bd1069 in raise () from /lib/libc.so.6
#2  0x45bd2671 in abort () from /lib/libc.so.6
#3  0x45bca9d9 in __assert_fail () from /lib/libc.so.6
#4  0x00437e1f in mrg::journal::arr_cnt::incr (this=Variable "this" is not available.
) at jrnl/arr_cnt.cpp:81
#5  0x0043cbf9 in mrg::journal::enq_map::insert_fid (this=0x9829868, rid=1070828, fid=4, locked=false)
    at jrnl/enq_map.cpp:83
#6  0x0043cd4b in mrg::journal::enq_map::insert_fid (this=0x9829868, rid=1070828, fid=4) at jrnl/enq_map.cpp:65
#7  0x0044ac40 in mrg::journal::jcntl::rcvr_get_next_record (this=0x9829834, fid=@0xbf83b630, ifsp=0xbf83b3b8,
    lowi=@0xbf83b632, rd=@0x9829c58) at jrnl/jcntl.cpp:684
#8  0x0044b5f3 in mrg::journal::jcntl::rcvr_janalyze (this=0x9829834, rd=@0x9829c58, prep_txn_list_ptr=0xbf83b8f4)
    at jrnl/jcntl.cpp:600
#9  0x0044c15e in mrg::journal::jcntl::recover (this=0x9829834, num_jfiles=4, auto_expand=true, ae_max_jfiles=16,
    jfsize_sblks=512, wcache_num_pages=64, wcache_pgsize_sblks=16, rd_cb=0,
    wr_cb=0x3f7100 <mrg::msgstore::JournalImpl::aio_wr_callback(mrg::journal::jcntl*, std::vector<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >&)>, prep_txn_list_ptr=0xbf83b8f4, highest_rid=@0xbf83bb50) at jrnl/jcntl.cpp:159
#10 0x003f8b32 in mrg::msgstore::JournalImpl::recover (this=0x9829830, num_jfiles=4, auto_expand=true, ae_max_jfiles=16,
    jfsize_sblks=512, wcache_num_pages=64, wcache_pgsize_sblks=16, rd_cb=0,
    wr_cb=0x3f7100 <mrg::msgstore::JournalImpl::aio_wr_callback(mrg::journal::jcntl*, std::vector<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >&)>, prep_tx_list_ptr=0xbf83bf44, highest_rid=@0xbf83bb50, queue_id=2)
    at JournalImpl.cpp:194
#11 0x00413657 in mrg::msgstore::MessageStoreImpl::recoverQueues (this=0x9824598, txn=@0xbf83be28, registry=@0xbf83c330,
    queue_index=@0xbf83bea4, prepared=@0xbf83bf44, messages=@0xbf83be74) at JournalImpl.h:142
#12 0x004201aa in mrg::msgstore::MessageStoreImpl::recover (this=0x9824598, registry=@0xbf83c330)
    at MessageStoreImpl.cpp:593
#13 0x00978893 in qpid::broker::MessageStoreModule::recover (this=0x98286a0, registry=@0xbf83c330)
    at qpid/broker/MessageStoreModule.cpp:87
#14 0x009113af in Broker (this=0x98237a8, conf=@0x98211c0) at qpid/broker/Broker.cpp:209
#15 0x0804e600 in QpiddBroker::execute (this=0xbf83c603, options=0x9821108) at posix/QpiddBroker.cpp:157
#16 0x0804c749 in main (argc=13, argv=0xbf83c6c4) at qpidd.cpp:76

Comment 1 Gordon Sim 2008-11-06 12:06:20 UTC
Created attachment 322703 [details]
journal files from failure scenario

Journal files attached

Comment 2 Gordon Sim 2008-11-06 12:07:29 UTC
Store: 
  Last Changed Rev: 2694
Qpid:
  Last Changed Rev: 711708

Comment 3 Kim van der Riet 2008-11-14 18:25:08 UTC
This error occurs only when the number of files specified is smaller than the original count.
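
For illustration, here is a minimal C++ sketch of why a smaller file count can trip the assertion seen in the backtrace (fid=4 against num_jfiles=4, while recovery found 8 files). FileCounters and effective_num_files are hypothetical names invented for this example; they are not the real mrg::journal::arr_cnt code, and the sizing rule is only an assumed mitigation, not necessarily what the actual fix does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a per-file counter array. This is NOT the real
// mrg::journal::arr_cnt, whose implementation is not shown in this report.
class FileCounters {
public:
    explicit FileCounters(std::uint16_t num_files) : counts_(num_files, 0) {}

    // Models the `index < _size` part of the failed assertion.
    // (The `_size == 0` escape allowed by the real assertion is not modelled.)
    bool in_range(std::uint16_t fid) const {
        return static_cast<std::size_t>(fid) < counts_.size();
    }

    // Increment the counter for journal file `fid` and return the new count.
    // The precondition corresponds to the check that aborted the broker.
    std::uint32_t incr(std::uint16_t fid) {
        assert(in_range(fid));
        return ++counts_[fid];
    }

private:
    std::vector<std::uint32_t> counts_;
};

// Assumed mitigation (illustrative only): when recovery finds more files on
// disk than were requested on the command line, size the counters by the
// number found.
inline std::uint16_t effective_num_files(std::uint16_t requested,
                                         std::uint16_t found) {
    return found > requested ? found : requested;
}
```

With counters sized from the requested value of 4, a record located in file 4 of an 8-file journal is out of range; sized from effective_num_files(4, 8), the same index is valid.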

Comment 4 Kim van der Riet 2008-11-14 19:32:25 UTC
Fixed in r.2804

QA: This is easy to reproduce: run perftest against a default (8-file) journal, then stop the broker and restart (recover) it using --num-jfiles 4.

Comment 6 Frantisek Reznicek 2008-11-21 09:05:08 UTC
Validated that the issue has been fixed on RHEL 5.2 x86_64, comparing packages:
qpidd-0.3.709187-1.el5, qpidd-rdma-0.3.709187-1.el5, rhm-0.2.2694-1.el5
vs.
qpidd-0.3.714072-1.el5, qpidd-rdma-0.3.714072-1.el5, rhm-0.3.2804-1.el5
->VERIFIED

Comment 8 errata-xmlrpc 2009-02-04 15:36:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html