Bug 451432

Summary: jexception 0x0b01 txn_map::set_aio_compl() threw JERR_MAP_NOTFOUND: Key not found in map
Product: Red Hat Enterprise MRG Reporter: Gordon Sim <gsim>
Component: qpid-cppAssignee: messaging-bugs <messaging-bugs>
Status: CLOSED CURRENTRELEASE QA Contact: Kim van der Riet <kim.vdriet>
Severity: high Docs Contact:
Priority: urgent    
Version: beta   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-12-02 16:06:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gordon Sim 2008-06-14 13:09:37 UTC
Ran 

txtest --queues 5 --total-messages 5000 --messages-per-tx 500 --tx-count 100000
--size 64 

then killed this test after some time and ran 

src/tests/txtest --queues 5 --total-messages 5000 --size 64 --check yes --init
no --transfer no

Expected all messages to be recovered, got:

Queue tx-test-3: async_dequeue() failed: jexception 0x0804 wmgr::dequeue() threw
JERR_WMGR_DEQDISCONT: Dequeued new dtok when previous dequeue returned partly
completed (state DEQ_PART). (This data_tok: id=739132 state=ENQ)
(BdbMessageStore.cpp:1282)

Comment 1 Gordon Sim 2008-06-14 13:23:03 UTC
Same test on another occasion gave:

Unexpected I/O response (RHM_IORES_BUSY) on queue tx-test-3". (JournalImpl.cpp:484)


Comment 2 Gordon Sim 2008-06-14 16:49:47 UTC
The following is some output from the broker log during the run of the initial
txtest (with args as above) before it was killed:

2008-jun-14 12:30:03 error Failed to prepare: Queue tx-test-2: store() failed:
jexception 0x0b01 txn_map::set_aio_compl() threw JERR_MAP_NOTFOUND: Key not
found in map. (xid=rhm-tid0x2aaab822f6b0 rid=0x151f019) (BdbMessageStore.cpp:1194)
2008-jun-14 12:30:03 error Journal "tx-test-2": Unexpected I/O response
(RHM_IORES_BUSY) on queue tx-test-2".
2008-jun-14 12:30:03 error Commit failed with exception: Unexpected I/O response
(RHM_IORES_BUSY) on queue tx-test-2". (JournalImpl.cpp:484)
2008-jun-14 12:30:03 error Journal "tx-test-2": Unexpected I/O response
(RHM_IORES_BUSY) on queue tx-test-2".
2008-jun-14 12:30:03 error Failed to prepare: Unexpected I/O response
(RHM_IORES_BUSY) on queue tx-test-2". (JournalImpl.cpp:484)
2008-jun-14 12:30:03 error Journal "tx-test-2": Unexpected I/O response
(RHM_IORES_BUSY) on queue tx-test-2".
2008-jun-14 12:30:03 error Commit failed with exception: Unexpected I/O response
(RHM_IORES_BUSY) on queue tx-test-2". (JournalImpl.cpp:484)


Comment 3 Gordon Sim 2008-06-15 13:04:05 UTC
It looks like the root cause of the problem is that when processing a completion
event, the record of the operation can no longer be found in the txn_map and so
cannot be marked complete (which causes the follow on errors).

Comment 4 Kim van der Riet 2008-06-16 23:50:07 UTC
The primary cause of this bug was the failure to first synchronize the store
before writing the transactional commit record. The code was adjusted to first
sync, then commit, then flush and wait for the commit to return. A secondary bug
was also uncovered by this fix - recovered transactions were not placed in the
transaction map with their enqueue-complete flags set. This prevented certain
transaction/recover tests from passing. This was also fixed. The sleep times in
the AIO wait loop were shortened from 10ms to 1ms, which improved performance.

r.2153

Comment 5 Mike Bonnet 2008-06-20 03:53:45 UTC
qpidc-0.2.667603-1.el5, qpidc-perftest-0.2.667603-1.el5, qpidd-0.2.667603-1.el5, and rhm-0.2.2153-1.el5 have been pushed to the staging repo for testing