Bug 771187 - Corosync segfault on *client* exit (probably libqb error)
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Corosync Cluster Engine
Classification: Retired
Component: unknown
Version: 1.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Angus Salkeld
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-01-02 11:31 UTC by Jan Friesse
Modified: 2012-01-11 07:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-11 07:38:12 UTC



Description Jan Friesse 2012-01-02 11:31:59 UTC
Steps to Reproduce:
1. valgrind corosync -f
2. while true;do ./testcpg2;done
3. Press Ctrl+C in the shell running step 2.


Actual results:
Sometimes (very often):
Jan 02 12:26:35 debug   [CPG   ] got procleave message from cluster node 1797661194
Jan 02 12:26:36 info    [MAIN  ] cs_ipcs_connection_closed()
Jan 02 12:26:36 debug   [CPG   ] exit_fn for conn=0x72b6f40
==5984== Invalid read of size 4
==5984==    at 0x989DF21: cpg_lib_exit_fn (cpg.c:963)
==5984==    by 0x4087E4: cs_ipcs_connection_closed (ipc_glue.c:444)
==5984==    by 0x547BAEF: qb_ipcs_disconnect (ipcs.c:512)
==5984==    by 0x547DF15: handle_new_connection (ipc_us.c:623)
==5984==    by 0x547E302: qb_ipcs_us_connection_acceptor (ipc_us.c:798)
==5984==    by 0x54782BD: _poll_dispatch_and_take_back_ (loop_poll.c:208)
==5984==    by 0x5477B8C: qb_loop_run_level (loop.c:43)
==5984==    by 0x5477F97: qb_loop_run (loop.c:150)
==5984==    by 0x407A65: main (main.c:1295)
==5984==  Address 0x3c is not stack'd, malloc'd or (recently) free'd
==5984==
Ringbuffer:
 ->OVERWRITE
 ->write_pt [12936]
 ->read_pt [0]
 ->size [2097152 words]
 =>free [8336860 bytes]
 =>used [51744 bytes]


Expected results:
Corosync should not crash when a client exits.

Additional info:
Latest corosync and libqb from git (c8e97a1c2e260c65eed520e5fe04995f74ab562a and 39ff78c803e76559396bf6707ebd84eac85a4335).

The problem probably occurs when:
- the IPC connection is initialized
- initialization does not complete, so the corosync service's lib_init function is never called (because of the Ctrl+C)
- lib_exit_fn is then called with uninitialized conn private data
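
The suspected sequence can be illustrated with a minimal sketch. All names here (struct conn, struct conn_private, conn_init, conn_exit) are hypothetical and are not the real corosync or libqb types or functions. The sketch assumes the private data pointer is still NULL when the exit hook runs, which would be consistent with a 4-byte read at address 0x3c (a field at offset 0x3c from a NULL base), but that is an inference and not something the trace confirms. The guard shown is one possible defensive check, not necessarily the fix that went into libqb.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-connection private data (NOT the real corosync struct).
 * The padding is an assumption chosen so that `state` lands at offset 0x3c. */
struct conn_private {
    char padding[0x3c];
    int state;
};

/* Hypothetical connection object. */
struct conn {
    struct conn_private *priv; /* stays NULL until the init hook has run */
};

/* Init hook: allocates the private data. Returns 0 on success, -1 on error. */
int conn_init(struct conn *c)
{
    c->priv = calloc(1, sizeof(*c->priv));
    if (c->priv == NULL) {
        return -1;
    }
    c->priv->state = 1;
    return 0;
}

/* Exit hook: may be called even if conn_init() never ran, for example when
 * the client is killed during connection setup.
 * Returns 1 if private data was released, 0 if there was nothing to do. */
int conn_exit(struct conn *c)
{
    if (c->priv == NULL) {
        /* Without this check, the write to c->priv->state below would
         * access address 0x3c and crash the server process. */
        return 0;
    }
    c->priv->state = 0;
    free(c->priv);
    c->priv = NULL;
    return 1;
}
```

With this guard, calling conn_exit() on a connection whose conn_init() was never reached returns 0 instead of dereferencing a NULL pointer. An alternative design is for the IPC layer to skip the exit callback entirely for connections that never completed initialization.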

Comment 1 Angus Salkeld 2012-01-03 02:41:35 UTC
Yes, confirmed to be libqb

https://github.com/asalkeld/libqb/issues/25

Comment 2 Angus Salkeld 2012-01-11 06:22:08 UTC
Fixed in libqb >= 0.8.1

