Bug 748351 - corosync unloading hang on "__lll_lock_wait ()"
Summary: corosync unloading hang on "__lll_lock_wait ()"
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: corosync
Version: rawhide
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Steven Dake
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-10-24 08:51 UTC by Shining
Modified: 2016-04-26 17:42 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-11 16:31:59 UTC
Type: ---


Attachments

Description Shining 2011-10-24 08:51:44 UTC
Description of problem:
  corosync hangs while unloading its service engines at shutdown.
  Log from corosync.log:
   --------------------------------
Oct 24 15:45:02 corosync [SERV  ] Unloading all Corosync service engines.
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: corosync extended virtual synchrony service
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: corosync configuration service
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: corosync cluster closed process group service v1.01
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: corosync cluster config database access v1.01
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: corosync profile loading service
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: corosync cluster quorum service v0.1
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: gcw cluster membership service A.01.01
Oct 24 15:45:02 corosync [CIB   ] [DEBUG]: cib_exec_exit_fn [ENTER]
Oct 24 15:45:02 corosync [CIB   ] [DEBUG]: cib_exec_exit_fn [LEAVE]
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: gcw cib service 1.0.0
Oct 24 15:45:02 corosync [CRM   ] [DEBUG]: crm_exec_exit_fn [ENTER]
Oct 24 15:45:02 corosync [CRM   ] [DEBUG]: crm_exec_exit_fn [LEAVE]
Oct 24 15:45:02 corosync [SERV  ] Service engine unloaded: gcw crm service 1.0.0
Oct 24 15:45:02 corosync [TOTEM ] sending join/leave message
Oct 24 15:45:02 corosync [MAIN  ] Corosync Cluster Engine exiting with status 0 at main.c:1810.
   --------------------------------
   (I suspect the second-to-last line in corosync.log, "[TOTEM ] sending join/leave message", points to the cause of the problem.)

   gdb output:
   --------------------------------
0x000000371c20d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
   --------------------------------


Version-Release number of selected component (if applicable):
  corosync v1.3.4
  OS: CentOS 5.6 x86_64

How reproducible:
  I have four nodes. Each one runs a simple shell script that repeatedly starts and stops the corosync service on that node. After several rounds of start/stop, corosync hangs on some of the nodes.
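The reporter's script was not attached. Below is a minimal sketch of such a start/stop loop, assuming the stock init script ("service corosync start|stop") and pidof are available; the function name stress_corosync and the variables ROUNDS, CTL, PAUSE and STOP_WAIT are hypothetical, and their defaults are illustrative only. It flags a round as hung when corosync is still running after the stop grace period.

```shell
#!/bin/sh
# Hypothetical start/stop stress loop; not the reporter's actual script.
# CTL defaults to the corosync init script; all defaults are illustrative.
stress_corosync() {
    rounds="${ROUNDS:-20}"
    ctl="${CTL:-service corosync}"
    pause="${PAUSE:-5}"            # seconds to stay up before stopping
    stop_wait="${STOP_WAIT:-60}"   # seconds allowed for a clean exit
    i=1
    while [ "$i" -le "$rounds" ]; do
        # $ctl is left unquoted on purpose so "service corosync" splits into words.
        if ! $ctl start >/dev/null 2>&1; then
            echo "round $i: start failed"
            return 1
        fi
        sleep "$pause"
        # Stop in the background so a hung shutdown cannot block the loop.
        $ctl stop >/dev/null 2>&1 &
        waited=0
        # Poll until no corosync process is left; one that outlives
        # stop_wait is reported as hung so it can be inspected with gdb.
        while pidof corosync >/dev/null 2>&1; do
            if [ "$waited" -ge "$stop_wait" ]; then
                echo "round $i: corosync still running after ${stop_wait}s (pid $(pidof corosync))"
                return 2
            fi
            sleep 1
            waited=$((waited + 1))
        done
        wait    # reap the background stop command
        i=$((i + 1))
    done
    echo "completed $rounds rounds"
}
```

For example, after sourcing the file, "ROUNDS=50 stress_corosync" runs 50 rounds on a node and stops at the first round whose shutdown does not finish, leaving the hung process in place for debugging.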

Steps to Reproduce:
1.
2.
3.
  
Actual results:



Expected results:


Additional info:

Comment 1 Steven Dake 2011-10-24 14:31:02 UTC
Please run "thread apply all bt" in gdb so we can get a proper backtrace of all threads in the process. You will need the debuginfo packages installed for this to work properly.
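A minimal non-interactive sketch of collecting that trace, assuming gdb is installed and the corosync debuginfo packages are available (for example via "debuginfo-install corosync" from yum-utils). The helper name dump_all_threads and the default output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: capture "thread apply all bt" from a hung process
# without an interactive gdb session. Without debuginfo packages the
# frames show "??" instead of function names.
dump_all_threads() {
    pid="$1"
    out="${2:-corosync-bt-$pid.txt}"
    if [ -z "$pid" ]; then
        echo "usage: dump_all_threads PID [OUTFILE]" >&2
        return 1
    fi
    # -batch makes gdb exit after running the -ex command; on exit gdb
    # detaches, so the process is only stopped while the dump is taken.
    gdb -p "$pid" -batch -ex "thread apply all bt" >"$out" 2>&1
    echo "$out"
}
```

Usage, assuming a single corosync process is running: dump_all_threads "$(pidof corosync)", then attach the printed file to the bug.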

Comment 2 Steven Dake 2012-06-11 16:31:59 UTC
This has been fixed in upstream pacemaker and corosync.

