Bug 771187

Summary: Corosync segfault on *client* exit (probably libqb error)
Product: [Retired] Corosync Cluster Engine
Component: unknown
Version: 1.4
Status: CLOSED UPSTREAM
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Reporter: Jan Friesse <jfriesse>
Assignee: Angus Salkeld <asalkeld>
CC: asalkeld, jfriesse, sdake
Doc Type: Bug Fix
Last Closed: 2012-01-11 07:38:12 UTC

Description Jan Friesse 2012-01-02 11:31:59 UTC
Steps to Reproduce:
1. valgrind corosync -f
2. while true; do ./testcpg2; done
3. Press Ctrl+C to interrupt the loop from step 2.


Actual results:
Intermittently, but very often:
Jan 02 12:26:35 debug   [CPG   ] got procleave message from cluster node 1797661194
Jan 02 12:26:36 info    [MAIN  ] cs_ipcs_connection_closed()
Jan 02 12:26:36 debug   [CPG   ] exit_fn for conn=0x72b6f40
==5984== Invalid read of size 4
==5984==    at 0x989DF21: cpg_lib_exit_fn (cpg.c:963)
==5984==    by 0x4087E4: cs_ipcs_connection_closed (ipc_glue.c:444)
==5984==    by 0x547BAEF: qb_ipcs_disconnect (ipcs.c:512)
==5984==    by 0x547DF15: handle_new_connection (ipc_us.c:623)
==5984==    by 0x547E302: qb_ipcs_us_connection_acceptor (ipc_us.c:798)
==5984==    by 0x54782BD: _poll_dispatch_and_take_back_ (loop_poll.c:208)
==5984==    by 0x5477B8C: qb_loop_run_level (loop.c:43)
==5984==    by 0x5477F97: qb_loop_run (loop.c:150)
==5984==    by 0x407A65: main (main.c:1295)
==5984==  Address 0x3c is not stack'd, malloc'd or (recently) free'd
==5984==
Ringbuffer:
 ->OVERWRITE
 ->write_pt [12936]
 ->read_pt [0]
 ->size [2097152 words]
 =>free [8336860 bytes]
 =>used [51744 bytes]


Expected results:
Corosync does not crash when a client exits.

Additional info:
Latest corosync and libqb git heads (c8e97a1c2e260c65eed520e5fe04995f74ab562a and 39ff78c803e76559396bf6707ebd84eac85a4335).

The problem probably occurs when:
- the IPC connection is initialized,
- the client is interrupted (Ctrl+C) before initialization completes, so the corosync service's lib_init function is never called,
- lib_exit_fn is then called with uninitialized conn private data.
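
The suspected sequence can be illustrated with a minimal C sketch. Everything here (demo_conn, demo_private_data, demo_lib_init, demo_lib_exit) is a hypothetical stand-in, not the real corosync or libqb code, and it does not reproduce the actual fix. It only shows why an exit handler that runs without a preceding init would read through a NULL private-data pointer, which would be consistent with the invalid read at the small address 0x3c in the valgrind output, and how a guard avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-connection private data (NOT the real cpg.c struct). */
struct demo_private_data {
    int joined_groups;
};

/* Hypothetical connection object (NOT the real libqb type). */
struct demo_conn {
    struct demo_private_data *pd; /* stays NULL until demo_lib_init() runs */
};

/* Stand-in for a service lib_init callback: allocates the private data.
 * Returns 0 on success, -1 if the allocation fails. */
static int demo_lib_init(struct demo_conn *conn)
{
    assert(conn != NULL);
    conn->pd = calloc(1, sizeof(*conn->pd));
    return conn->pd == NULL ? -1 : 0;
}

/* Stand-in for a service lib_exit callback.
 * Returns 1 if init never ran (nothing to clean up), 0 after cleanup. */
static int demo_lib_exit(struct demo_conn *conn)
{
    assert(conn != NULL);
    if (conn->pd == NULL) {
        /* Without this guard, the field access below would dereference
         * NULL plus a small offset, as in the reported invalid read. */
        return 1;
    }
    conn->pd->joined_groups = 0;
    free(conn->pd);
    conn->pd = NULL;
    return 0;
}
```

In this sketch, a connection that is closed before demo_lib_init() runs takes the guarded path, while a fully initialized connection is cleaned up normally. An alternative design, under the same assumptions, is for the IPC layer to skip the exit callback entirely for connections whose init callback was never invoked.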

Comment 1 Angus Salkeld 2012-01-03 02:41:35 UTC
Yes, confirmed to be a libqb bug:

https://github.com/asalkeld/libqb/issues/25

Comment 2 Angus Salkeld 2012-01-11 06:22:08 UTC
Fixed in libqb >= 0.8.1