Created attachment 332928
bz481822 reproducer

Description of problem:
During the bz480964 validation I created a test that performs the needed actions and ran into this issue. When messages are larger than the --staging-threshold value, qpidd exits with the message:

2009-feb-23 14:10:05 error Connection 127.0.0.1:39934 closed by error: Queue tx-test-2: loadContent() failed: jexception 0x0a00 data_tok::set_rstate( ) threw JERR_MTOK_ILLEGALSTATE: Attempted to change to illegal state. (Attempted to change read state to READ while write state is not enqueued (wstate ENQ); wstate=<wstate unknown>.) (MessageStoreImpl.cpp:1288)(320)
slock::slock(): pthread_mutex_lock: Invalid argument

The test, including logs and journals, is in mrg3.lab.bos.redhat.com:/root/bz481822_fails090223.tar.bz2

Version-Release number of selected component (if applicable):
qpidd-0.4.744917-1.el5, rhm-0.4.3116-3.el5

How reproducible:
100%

Steps to Reproduce:
Run the test stored on mrg3, or do the following:
1. Start qpidd with --staging-threshold 500 -p ${QPIDD_PORT}
2. Run txtest --messages-per-tx 10 --tx-count 1000 --total-messages 100 \
       -p ${QPIDD_PORT} --dtx yes --size 2000 --queues 4 \
       --durable yes --log-enable info+

Actual results:
qpidd exits after step 2 with the JERR_MTOK_ILLEGALSTATE message.

Expected results:
qpidd should stay alive and running.

Additional info:
Reproducer attached.

# reproducer transcript
[root@dhcp-lab-200 bz481822]# ./run.sh
[14:12:08] Stopping all running qpidd instances...
[14:12:08] qpidd_stop: No qpidd broker found to stop!
[14:12:08] .qpidd stopped ok
[14:12:08] qpidd_start: qpidd launched normal bg way (port:64002,log:qpidd.transcript.log, params: --auth no --staging-threshold 500 --log-enable debug+)
[14:12:09] qpidd_wait_on_settle: qpidd started-up (dur:0sec)
[14:12:09] .qpidd settled
[14:12:09] 1st txtest > threshold (i=0, durable=yes, dtx:yes, j=0)
run 0 [durable=yes] - ecode=2
[14:12:11] .qpidd status check
[14:12:11] qpidd_status: 0 instance[s] running (pids:,ports:)
[14:12:11] ..ERROR:qpidd not running fine! (pid[s]:, port[s]:, inst_cnt:0)
[14:12:11] 1st txtest < threshold (i=0, durable=yes, dtx:yes, j=0)
run 0 [durable=yes] - ecode=2
[14:12:11] .qpidd status check
[14:12:11] qpidd_status: 0 instance[s] running (pids:,ports:)
[14:12:11] ..ERROR:qpidd not running fine! (pid[s]:, port[s]:, inst_cnt:0)
done (err_cnt=4)
[root@dhcp-lab-200 bz481822]#
This is a logic bug in the enqueue record decode path, which did not account for the possibility of a record being external (which flow-to-disk triggers). Fixed in r.3128.

QA: The reproducer above should be included in some of the soak tests with randomized parameters.
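
For the QA note on randomized soak parameters, here is a minimal sketch of how the txtest arguments might be generated. It is not part of the attached reproducer. The function name `random_txtest_args`, the value ranges, the "no" values for --dtx and --durable, and keeping --total-messages a multiple of --messages-per-tx are all assumptions made for illustration; only the flag names are taken from the steps in this report. Sizes are drawn from both sides of the staging threshold so that both paths get exercised.

```python
import random


def random_txtest_args(rng, port, staging_threshold=500):
    """Build one txtest argument list with randomized parameters.

    Hypothetical helper: the ranges below are illustrative assumptions,
    not values taken from the real soak tests.
    """
    # Draw the message size from either side of the staging threshold.
    size = rng.choice([
        rng.randint(1, staging_threshold),                          # at or below
        rng.randint(staging_threshold + 1, 4 * staging_threshold),  # above
    ])
    messages_per_tx = rng.randint(1, 20)
    # Assumption: keep the total a whole multiple of the per-tx count.
    total_messages = messages_per_tx * rng.randint(1, 20)
    return [
        "txtest",
        "--messages-per-tx", str(messages_per_tx),
        "--tx-count", str(rng.randint(1, 1000)),
        "--total-messages", str(total_messages),
        "-p", str(port),
        "--dtx", rng.choice(["yes", "no"]),      # "no" is an assumed value
        "--size", str(size),
        "--queues", str(rng.randint(1, 8)),
        "--durable", rng.choice(["yes", "no"]),  # "no" is an assumed value
        "--log-enable", "info+",
    ]


if __name__ == "__main__":
    # A fixed seed lets a failing parameter set be replayed exactly.
    rng = random.Random(481822)
    for _ in range(3):
        print(" ".join(random_txtest_args(rng, port=64002)))
```

Logging the seed with each soak run would make it possible to regenerate the exact command line that took the broker down.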
The issue has been fixed. Validated on RHEL 4.7 / 5.3, i386 / x86_64, with packages: qpidd-0.4.750054-1.el5, rhm-0.4.3138-2.el5.

->VERIFIED
Fixed and verified; closing.