Description of problem: If you run "corosync-cfgtool -r" and hit CTRL-C to terminate it (may require quick fingers), corosync will segfault. Version-Release number of selected component (if applicable): 1.3.0 How reproducible: Reasonably easy to reproduce. Steps to Reproduce: 1. Configure two rings (rrp_mode: passive, if that matters). 2. Start corosync. 3. Drop traffic to one of the rings - in my case, ring 0, via "iptables -A OUTPUT -d 192.168.4.186 -j DROP". This will tend to make "corosync-cfgtool -r" block a bit. 4. Run "corosync-cfgtool -r" and immediately hit CTRL-C. You might have to do this a couple of times (or put it in a while loop in the shell and hit CTRL-C at random), but if you do terminate corosync-cfgtool, corosync itself will immediately segfault. Actual results: Corosync segfaults with the following backtrace: (gdb) bt #0 memcpy () at ../sysdeps/i386/i686/memcpy.S:100 #1 0xb774ded3 in coroipcs_response_send (conn=0x80f6f88, msg=0xbf7f7c90, mlen=24) at /usr/include/bits/string3.h:52 #2 0xada9dab4 in message_handler_req_exec_cfg_ringreenable (message=0xbf7f7d20, nodeid=956606656) at cfg.c:583 #3 0x0804de75 in deliver_fn (nodeid=956606656, msg=0xbf7f7d20, msg_len=32, endian_conversion_required=0) at main.c:852 #4 0xb77793c0 in app_deliver_fn (nodeid=956606656, msg=<value optimized out>, msg_len=37, endian_conversion_required=0) at totempg.c:506 #5 0xb777991e in totempg_deliver_fn (nodeid=956606656, msg=0x80f6fea, msg_len=37, endian_conversion_required=0) at totempg.c:618 #6 0xb7777a25 in totemmrp_deliver_fn (nodeid=956606656, msg=0x80f6fea, msg_len=47, endian_conversion_required=0) at totemmrp.c:98 #7 0xb776ed81 in messages_deliver_to_app (instance=0xb5bb4008, skip=0, end_point=40) at totemsrp.c:3704 #8 0xb77752a9 in message_handler_orf_token (instance=0xb5bb4008, msg=0x80c9dbc, msg_len=71, endian_conversion_needed=0) at totemsrp.c:3577 #9 0xb776d3c9 in main_deliver_fn (context=0xb5bb4008, msg=0x80c9dbc, msg_len=71) at totemsrp.c:4356 #10 0xb776b27b in passive_token_recv (rrp_instance=0x8082be8, iface_no=1, context=0xb5bb4008, msg=0x80c9dbc, msg_len=71, token_seq=539) at totemrrp.c:876 #11 0xb776a4a3 in rrp_deliver_fn (context=0x8089358, msg=0x80c9dbc, msg_len=71) at totemrrp.c:1500 #12 0xb77651d2 in net_deliver_fn (handle=5880381755227111424, fd=13, revents=1, data=0x80c9758) at totemudp.c:1244 #13 0xb7761100 in poll_run (handle=5880381755227111424) at coropoll.c:510 #14 0x0804f508 in main (argc=Cannot access memory at address 0x6) at main.c:1813 Expected results: No segfault. Additional info: The relevant code is: int coroipcs_response_send (void *conn, const void *msg, size_t mlen) { struct conn_info *conn_info = (struct conn_info *)conn; memcpy (conn_info->response_buffer, msg, mlen); ipc_sem_post (conn_info->control_buffer, SEMAPHORE_RESPONSE); api->stats_increment_value (conn_info->stats_handle, "responses"); return (0); } At this point, 'conn' points to junk, presumably because corosync-cfgtool was terminated after sending the ring re-enable message but before receiving a response, so there's no "other end" for the IPC.
Created attachment 491962 [details] proposed patch I had some discussion about this issue with asalkeld on IRC, and he suggested this patch, which seems to work for me. I wondered though if the refcounting should go further out, say, in cfg_lib_init_fn() and cfg_lib_exit_fn() in services/cfg.c (similar to the init and exit functions in services/cpg.c)? Or would you like me to take this discussion back to the mailing list? Thanks, Tim
Tim, Patch looks good as is. Please post to ml with signoff and I'll get it merged. I'd rather not use refcount in lib_init_fn and lib_exit_fn but it was necessary because of the design of cpg. Regards -steve
Patch committed in upstream git as 5b92829d6c20ab550aca6c0b273618577ea92b23