Bug 691708

Summary: qpidd aborts in mrg::journal::wmgr::get_events when broker killed during operation
Product: Red Hat Enterprise MRG Reporter: Frantisek Reznicek <freznice>
Component: qpid-cppAssignee: Kim van der Riet <kim.vdriet>
Status: NEW --- QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: high Docs Contact:
Priority: medium    
Version: DevelopmentCC: gsim, iboverma, jross
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Frantisek Reznicek 2011-03-29 10:13:40 UTC
Description of problem:

Qpidd broker ocassionly aborts during shutdown when txtest running.
The broker shutdown was done by kill -2 <pid> and after some time kill -15 <pid> because qpidd did not start.


This behavior was observed once on RHEL 5.6 i386.

Version-Release number of selected component (if applicable):
> python-qpid-0.10-1.el5
> python-qpid-qmf-0.10-2.el5
> python-saslwrapper-0.1.934605-2.el5
> qpid-cpp-client-0.10-1.el5
> qpid-cpp-client-devel-0.10-1.el5
> qpid-cpp-client-devel-docs-0.10-1.el5
> qpid-cpp-client-ssl-0.10-1.el5
> qpid-cpp-mrg-debuginfo-0.10-1.el5
> qpid-cpp-server-0.10-1.el5
> qpid-cpp-server-cluster-0.10-1.el5
> qpid-cpp-server-devel-0.10-1.el5
> qpid-cpp-server-ssl-0.10-1.el5
> qpid-cpp-server-store-0.10-1.el5
> qpid-cpp-server-xml-0.10-1.el5
> qpid-java-client-0.10-1.el5
> qpid-java-common-0.10-1.el5
> qpid-java-example-0.10-1.el5
> qpid-qmf-0.10-2.el5
> qpid-qmf-devel-0.10-2.el5
> qpid-tests-0.10-1.el5
> qpid-tools-0.10-1.el5
> ruby-qpid-0.7.946106-2.el5
> ruby-saslwrapper-0.1.934605-2.el5
> saslwrapper-0.1.934605-2.el5
> saslwrapper-devel-0.1.934605-2.el5
> sesame-0.10-1.el5


How reproducible:
N/A

Steps to Reproduce:
1. run 
2.
3.
  
Actual results:
qpidd aborts during shutdown.

Expected results:
qpidd should not abort during shutdown.

Additional info:
[22:23:30] run(A) 61/100--------------------------[err_cnt:0]
[22:23:40] .starting qpidd - 1]
[22:23:40] .waiting for qpidd start - 2]
.
[22:23:43] ..qpidd just started
[22:23:45] ..qpidd started on port 5672, parent lock file exists
[22:23:45] ..qpidd process detected (11485)
[22:23:45] .running txtest - 3]
[22:23:45] ..config:[msg_per_tx:185, tx_cnt:10518, msg_cnt:3279, msg_size:223, q_cnt:13, --dtx no]
..
[22:23:49] ..txtest just started (and transfering)
[22:23:49] ..qpidd & txtest up and running
[22:23:49] .wait some rand time & kill the qpidd (now waiting 8sec) - 4]
[22:23:57] INFO:mrg_kill_process_id:processe[s] 11485  did not stop yet!
[22:24:07] .pack the journal (before qpidd tries to recover) - 5]
[22:24:37] ..qpidd journal saved in qpidd_journal_a0061.tar.bz2
[22:24:37] .dummy step (journal lock removal disabled) - 6]
[22:24:42] ..qpidd & txtest and not running at the moment
[22:24:42] .re-start qpidd for journal recovery - 7]
[22:24:42] .wait for qpidd journal recovery settles - 8]
.
[22:24:45] ..qpidd just started
[22:24:45] ..qpidd started on port 5672, lock is present
[22:24:45] .re-running txtest - 9]
[22:24:51] ..txtest went OK (ecode:0;nr.lines:16 txtest.transcript.log)
[22:24:56] .stop qpidd/txtest (if needed) - 10]
[22:24:56] ..killing /usr/sbin/qpidd (PID:12146)
./qpidd_runner.sh: line 27: 12146 Killed                  ${QPIDD_DIR}/qpidd ${RUNNER_QPIDD_PARAMS} >> ${QPIDD_TRANSCRIPT} 2>&1
[22:24:56] mrg_kill_process_id:processes 12146  stopped by signal[s] 9
[22:25:07] run(A) 62/100--------------------------[err_cnt:0]

MARK-LWD-LOOP -- 2011-03-27 22:25:11 --
[22:25:17] .starting qpidd - 1]
[22:25:17] .waiting for qpidd start - 2]
.
[22:25:19] ..qpidd just started
[22:25:21] ..qpidd started on port 5672, parent lock file exists
[22:25:21] ..qpidd process detected (12568)
[22:25:21] .running txtest - 3]
[22:25:21] ..config:[msg_per_tx:237, tx_cnt:12182, msg_cnt:3215, msg_size:261, q_cnt:6, --dtx no]
.
[22:25:24] ..txtest just started (and transfering)
[22:25:24] ..qpidd & txtest up and running
[22:25:24] .wait some rand time & kill the qpidd (now waiting 6sec) - 4]
[22:25:30] INFO:mrg_kill_process_id:processe[s] 12568  did not stop yet!
./qpidd_runner.sh: line 27: 12568 Aborted                 (core dumped) ${QPIDD_DIR}/qpidd ${RUNNER_QPIDD_PARAMS} >> ${QPIDD_TRANSCRIPT} 2>&1
[22:25:40] .pack the journal (before qpidd tries to recover) - 5]
[22:26:02] ..qpidd journal saved in qpidd_journal_a0062.tar.bz2
[22:26:02] .dummy step (journal lock removal disabled) - 6]
[22:26:07] ..qpidd & txtest and not running at the moment
[22:26:07] .re-start qpidd for journal recovery - 7]
[22:26:07] .wait for qpidd journal recovery settles - 8]
.
[22:26:10] ..qpidd just started
[22:26:10] ..qpidd started on port 5672, lock is present
[22:26:10] .re-running txtest - 9]
[22:26:16] ..txtest went OK (ecode:0;nr.lines:9 txtest.transcript.log)
[22:26:21] .stop qpidd/txtest (if needed) - 10]
[22:26:21] ..killing /usr/sbin/qpidd (PID:13123)
./qpidd_runner.sh: line 27: 13123 Killed                  ${QPIDD_DIR}/qpidd ${RUNNER_QPIDD_PARAMS} >> ${QPIDD_TRANSCRIPT} 2>&1
[22:26:21] mrg_kill_process_id:processes 13123  stopped by signal[s] 9
[22:26:32] run(A) 63/100--------------------------[err_cnt:0]


  (*): Shared library is missing debugging information.
  (gdb)   8 Thread 12568  0x0038e410 in __kernel_vsyscall ()
    7 Thread 12570  0x0038e410 in __kernel_vsyscall ()
    6 Thread 12576  0x002064ca in ?? () from /usr/lib/libaio.so.1
    5 Thread 12577  0x002064ca in ?? () from /usr/lib/libaio.so.1
    4 Thread 12582  0x0038e410 in __kernel_vsyscall ()
    3 Thread 12583  0x0038e410 in __kernel_vsyscall ()
    2 Thread 12589  0x0038e410 in __kernel_vsyscall ()
  * 1 Thread 12569  0x0038e410 in __kernel_vsyscall ()
  Thread 1 (Thread 12569):
  #0  0x0038e410 in __kernel_vsyscall ()
  #1  0x009dedf0 in raise () from /lib/libc.so.6
  #2  0x009e0701 in abort () from /lib/libc.so.6
  #3  0x0051bb50 in __gnu_cxx::__verbose_terminate_handler() ()
     from /usr/lib/libstdc++.so.6
  #4  0x00519555 in ?? () from /usr/lib/libstdc++.so.6
  #5  0x00519592 in std::terminate() () from /usr/lib/libstdc++.so.6
  #6  0x005196ca in __cxa_throw () from /usr/lib/libstdc++.so.6
  #7  0x001c4e2c in mrg::journal::wmgr::get_events (this=0x9dc460c, 
      state=UNUSED, timeout=0x0, flush=false) at jrnl/wmgr.cpp:676
  #8  0x001929a9 in mrg::journal::jcntl::get_wr_events (this=0x9dc43cc, 
      timeout=0x0) at jrnl/jcntl.cpp:363
  #9  0x00140799 in mrg::msgstore::JournalImpl::getEventsFire (this=0x9dc43c8)
      at JournalImpl.cpp:505
  #10 0x00140827 in mrg::msgstore::GetEventsFireEvent::fire (this=0x9dc41b8)
      at JournalImpl.cpp:52
  #11 0x0077e245 in qpid::sys::TimerTask::fireTask (this=0x9dc41b8)
      at qpid/sys/Timer.cpp:57
  #12 0x0077e2af in qpid::sys::Timer::fire (this=0x9d39048, t=...)
      at qpid/sys/Timer.cpp:188
  #13 0x0077f693 in qpid::sys::Timer::run (this=0x9d39048)
      at qpid/sys/Timer.cpp:123
  #14 0x00684341 in qpid::sys::(anonymous namespace)::runRunnable (p=0x9d39048)
      at qpid/sys/posix/Thread.cpp:35
  #15 0x00b48832 in start_thread () from /lib/libpthread.so.0
  #16 0x00a880ae in clone () from /lib/libc.so.6
  (gdb) quit