Bug 771187

Summary:	Corosync segfault on client exit (probably libqb error)
Product:	[Retired] Corosync Cluster Engine
Component:	unknown
Version:	1.4
Status:	CLOSED UPSTREAM
Severity:	unspecified
Priority:	unspecified
Hardware:	Unspecified
OS:	Unspecified
Reporter:	Jan Friesse <jfriesse>
Assignee:	Angus Salkeld <asalkeld>
CC:	asalkeld, jfriesse, sdake
Doc Type:	Bug Fix
Last Closed:	2012-01-11 07:38:12 UTC

Description Jan Friesse 2012-01-02 11:31:59 UTC

Steps to Reproduce:
1. valgrind corosync -f
2. while true;do ./testcpg2;done
3. Press Ctrl+C in the terminal running the loop from step 2.


Actual results:
Intermittently, but very often, corosync crashes with:
Jan 02 12:26:35 debug   [CPG   ] got procleave message from cluster node 1797661194
Jan 02 12:26:36 info    [MAIN  ] cs_ipcs_connection_closed()
Jan 02 12:26:36 debug   [CPG   ] exit_fn for conn=0x72b6f40
==5984== Invalid read of size 4
==5984==    at 0x989DF21: cpg_lib_exit_fn (cpg.c:963)
==5984==    by 0x4087E4: cs_ipcs_connection_closed (ipc_glue.c:444)
==5984==    by 0x547BAEF: qb_ipcs_disconnect (ipcs.c:512)
==5984==    by 0x547DF15: handle_new_connection (ipc_us.c:623)
==5984==    by 0x547E302: qb_ipcs_us_connection_acceptor (ipc_us.c:798)
==5984==    by 0x54782BD: _poll_dispatch_and_take_back_ (loop_poll.c:208)
==5984==    by 0x5477B8C: qb_loop_run_level (loop.c:43)
==5984==    by 0x5477F97: qb_loop_run (loop.c:150)
==5984==    by 0x407A65: main (main.c:1295)
==5984==  Address 0x3c is not stack'd, malloc'd or (recently) free'd
==5984==
Ringbuffer:
 ->OVERWRITE
 ->write_pt [12936]
 ->read_pt [0]
 ->size [2097152 words]
 =>free [8336860 bytes]
 =>used [51744 bytes]


Expected results:
Corosync should not crash when a client exits.

Additional info:
Tested with the latest corosync and libqb from git (c8e97a1c2e260c65eed520e5fe04995f74ab562a and 39ff78c803e76559396bf6707ebd84eac85a4335).

The problem probably occurs when:
- the IPC connection is initialized,
- initialization does not complete and the corosync service lib_init function is never called, because the client was interrupted with Ctrl+C,
- lib_exit_fn is then called with uninitialized connection private data.
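
The suspected sequence can be illustrated with a minimal C sketch. All names here (struct connection, struct conn_private, service_init, service_exit) are hypothetical stand-ins, not the real corosync or libqb types or callbacks, and this is not the fix that went into libqb. It only shows why an exit callback has to tolerate a connection whose init callback never ran. The faulting address 0x3c in the valgrind output would be consistent with reading a field at a small offset from a NULL private-data pointer, though that is an inference from the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real corosync or libqb structures. */
struct conn_private {
    int initialized;   /* set by the service init callback */
    int group_count;   /* example per-connection state */
};

struct connection {
    struct conn_private *priv;  /* stays NULL until service_init() has run */
};

/* Hypothetical init callback: runs only once the client handshake completes. */
static int service_init(struct connection *c)
{
    assert(c != NULL);
    c->priv = calloc(1, sizeof(*c->priv));
    if (c->priv == NULL)
        return -1;
    c->priv->initialized = 1;
    return 0;
}

/*
 * Hypothetical exit callback. Returns 1 if per-connection cleanup ran,
 * 0 if it was skipped because the connection was never fully initialized.
 */
static int service_exit(struct connection *c)
{
    assert(c != NULL);
    if (c->priv == NULL) {
        /* Client went away before init: nothing to clean up, and
         * dereferencing c->priv here would be an invalid read. */
        return 0;
    }
    free(c->priv);
    c->priv = NULL;
    return 1;
}
```

In this sketch the guard lives in the exit callback. An equally plausible design is for the IPC layer to skip the exit callback entirely for connections that never completed initialization.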

Comment 1 Angus Salkeld 2012-01-03 02:41:35 UTC

Yes, confirmed to be a libqb bug:

https://github.com/asalkeld/libqb/issues/25

Comment 2 Angus Salkeld 2012-01-11 06:22:08 UTC

Fixed in libqb >= 0.8.1