Bug 485754

Summary: openais-0.80.5-1 sporadic deadlocks and errors
Product: Red Hat Enterprise Linux 5
Reporter: Alan Conway <aconway>
Component: openais
Assignee: Steven Dake <sdake>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: urgent
Priority: urgent
Version: 5.3
CC: cluster-maint, edamato, iboverma
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-02-17 23:32:03 UTC

Description Alan Conway 2009-02-16 17:51:38 UTC
Description of problem:

After working correctly for a while, I am seeing two errors in qpid:

1. cpg_dispatch returns the CPG_ERR_LIBRARY error code (2).

2. qpidd deadlocks with two threads stuck in semop() under calls to cpg_dispatch and cpg_mcast_joined:

Thread 4 (Thread 0x42cd8940 (LWP 22617)):
#0  0x00000036facd4477 in semop () from /lib64/libc.so.6
#1  0x00002b48f1e0ed1b in openais_dispatch_recv ()
#2  0x00002b48f1e0f9ba in cpg_dispatch () from /usr/lib64/openais/libcpg.so.2
#3  0x00002b48f1ba3833 in qpid::cluster::Cpg::dispatchAll ()
#4  0x00002b48f1bbca43 in qpid::cluster::PollerDispatch::dispatch ()
#5  0x00002b48f188745f in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() ()
#6  0x00002b48f1885448 in qpid::sys::DispatchHandle::processEvent ()
#7  0x00002b48f183d498 in qpid::sys::Poller::run ()
#8  0x00002b48f183546a in qpid::sys::(anonymous namespace)::runRunnable ()
#9  0x00000036fb806367 in start_thread () from /lib64/libpthread.so.0
#10 0x00000036facd30ad in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x436d9940 (LWP 22618)):
#0  0x00000036facd4477 in semop () from /lib64/libc.so.6
#1  0x00002b48f1e0ebc4 in openais_msg_send_reply_receive ()
#2  0x00002b48f1e0f65e in cpg_mcast_joined ()
#3  0x00002b48f1ba53a7 in qpid::cluster::Cpg::mcast ()
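When chasing the first symptom it helps to log both the symbolic name and the raw numeric code that cpg_dispatch returns. The following is a minimal C sketch, assuming a local mirror of a few openais cpg_error_t values: CPG_OK = 1 and CPG_ERR_LIBRARY = 2 match the code reported above, while the CPG_ERR_TRY_AGAIN value (6) is an assumption to check against the installed cpg.h. The enum names, the functions, and the retry/report policy are all hypothetical and are not part of libcpg or qpid.

```c
#include <stdio.h>

/* Hypothetical local mirror of some openais cpg_error_t values.
 * 1 and 2 match the report; 6 (TRY_AGAIN) is an assumption. */
enum mirrored_cpg_error {
    MIRROR_CPG_OK = 1,
    MIRROR_CPG_ERR_LIBRARY = 2,
    MIRROR_CPG_ERR_TRY_AGAIN = 6
};

/* Hypothetical next step for a dispatch loop after cpg_dispatch(). */
enum dispatch_action {
    ACTION_CONTINUE,       /* call succeeded: keep polling */
    ACTION_RETRY,          /* transient condition: call again later */
    ACTION_REPORT_FAILURE  /* library error or unknown code: log and stop */
};

/* Map a raw return code to a readable name for log messages. */
const char *cpg_result_name(int rc)
{
    switch (rc) {
    case MIRROR_CPG_OK:            return "CPG_OK";
    case MIRROR_CPG_ERR_LIBRARY:   return "CPG_ERR_LIBRARY";
    case MIRROR_CPG_ERR_TRY_AGAIN: return "CPG_ERR_TRY_AGAIN";
    default:                       return "unknown";
    }
}

/* Decide what the caller should do next (policy is an assumption). */
enum dispatch_action classify_dispatch_result(int rc)
{
    switch (rc) {
    case MIRROR_CPG_OK:            return ACTION_CONTINUE;
    case MIRROR_CPG_ERR_TRY_AGAIN: return ACTION_RETRY;
    default:                       return ACTION_REPORT_FAILURE;
    }
}

/* Log one result, keeping the raw numeric code visible. */
void log_dispatch_result(FILE *out, int rc)
{
    fprintf(out, "cpg_dispatch returned %s (%d)\n", cpg_result_name(rc), rc);
}
```

With this helper, the failure described in item 1 would be logged as "cpg_dispatch returned CPG_ERR_LIBRARY (2)". Keeping the raw number next to the name makes it possible to match log lines against cpg.h even if the mirrored names turn out to be wrong.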


Version-Release number of selected component (if applicable):  openais-0.80.5-1

How reproducible:

One of the two errors generally occurs within 4-5 runs on mrg10.lab.bos.redhat.com.
The CPG_ERR_LIBRARY condition is more common; I have seen the deadlock only once.

Steps to Reproduce:

On mrg10.lab.bos.redhat.com, put /home/aconway/bin in your PATH, then:
1. startais # starts openais on mrg7,8,9,10
2. restartcluster # starts qpidd on same hosts
3. while benchmark ; do true; done

Comment 1 Steven Dake 2009-02-17 23:32:03 UTC

*** This bug has been marked as a duplicate of bug 474277 ***