Description of problem:
corosync hangs when unloading its service engines.

Log in corosync.log:
--------------------------------
Oct 24 15:45:02 corosync [SERV ] Unloading all Corosync service engines.
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: corosync extended virtual synchrony service
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: corosync configuration service
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: corosync cluster config database access v1.01
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: corosync profile loading service
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: gcw cluster membership service A.01.01
Oct 24 15:45:02 corosync [CIB ] [DEBUG]: cib_exec_exit_fn [ENTER]
Oct 24 15:45:02 corosync [CIB ] [DEBUG]: cib_exec_exit_fn [LEAVE]
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: gcw cib service 1.0.0
Oct 24 15:45:02 corosync [CRM ] [DEBUG]: crm_exec_exit_fn [ENTER]
Oct 24 15:45:02 corosync [CRM ] [DEBUG]: crm_exec_exit_fn [LEAVE]
Oct 24 15:45:02 corosync [SERV ] Service engine unloaded: gcw crm service 1.0.0
Oct 24 15:45:02 corosync [TOTEM ] sending join/leave message
Oct 24 15:45:02 corosync [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:1810.
--------------------------------
(I think the second-to-last line in corosync.log, "sending join/leave message", points to the cause of the problem.)

gdb info:
--------------------------------
0x000000371c20d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
--------------------------------

Version-Release number of selected component (if applicable):
corosync v1.3.4
OS: CentOS 5.6 x86_64

How reproducible:
I have four nodes. Each one runs a simple shell script that repeatedly starts and stops the corosync service on that node. After several rounds of corosync start/stop, corosync hangs on some of the nodes.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
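
The reporter's script is not included in this report. As a rough illustration of the start/stop loop described under "How reproducible", here is a minimal Python sketch. It assumes corosync is controlled through the init-script wrapper `service corosync start|stop`; the function names, the 60-second timeout, and the idea of treating a timed-out command as a hang are all assumptions made for this sketch and do not come from the report.

```python
import subprocess
import time


def restart_cycle_commands(service="corosync"):
    """Return the two commands that make up one start/stop round.

    Assumption: the service is managed with `service <name> start|stop`.
    """
    return [["service", service, "start"], ["service", service, "stop"]]


def run_cycles(rounds, run=subprocess.run, stop_timeout=60.0, pause=0.0):
    """Run up to `rounds` start/stop rounds.

    Returns the 1-based round in which a command exceeded `stop_timeout`
    (treated here as a hang), or None if every round completed.
    `run` is injectable so the loop can be exercised without corosync.
    """
    for n in range(1, rounds + 1):
        for cmd in restart_cycle_commands():
            try:
                run(cmd, timeout=stop_timeout, check=False)
            except subprocess.TimeoutExpired:
                return n
        time.sleep(pause)
    return None
```

Running something like `run_cycles(100, pause=5.0)` on each of the four nodes would approximate the scenario described. If it returns a round number, the hung corosync process should still be around to attach gdb to.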
Please run "thread apply all bt" in gdb so we can get a proper backtrace of every thread in the hung process. You will need the debuginfo packages installed for this to work properly.
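
One possible way to capture that backtrace non-interactively is sketched below in Python. It relies on the standard gdb options `-p` (attach to a PID), `-batch`, and `-ex` (execute a command). The helper names are hypothetical, and the PID of the hung corosync process has to be supplied by the caller. The sketch needs Python 3.7 or later for `capture_output`.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, prints a
    backtrace of every thread, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtrace(pid, run=subprocess.run):
    """Run gdb against `pid` and return its standard output as text.

    `run` is injectable so the helper can be tested without gdb.
    """
    result = run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

The returned text can be saved to a file and attached to the bug. Without the debuginfo packages, the frames will mostly show raw addresses rather than function names and line numbers.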
This has been fixed in upstream pacemaker and corosync.