Red Hat Bugzilla – Bug 696887
segfault due to invalid 'conn' pointer after killing "corosync-cfgtool -r"
Last modified: 2011-12-06 06:50:28 EST
+++ This bug was initially created as a clone of Bug #695191 +++ Description of problem: If you run "corosync-cfgtool -r" and hit CTRL-C to terminate it (may require quick fingers), corosync will segfault. Version-Release number of selected component (if applicable): 1.3.0 How reproducible: Reasonably easy to reproduce. Steps to Reproduce: 1. Configure two rings (rrp_mode: passive, if that matters). 2. Start corosync. 3. Drop traffic to one of the rings - in my case, ring 0, via "iptables -A OUTPUT -d 192.168.4.186 -j DROP". This will tend to make "corosync-cfgtool -r" block a bit. 4. Run "corosync-cfgtool -r" and immediately hit CTRL-C. You might have to do this a couple of times (or put it in a while loop in the shell and hit CTRL-C at random), but if you do terminate corosync-cfgtool, corosync itself will immediately segfault. Actual results: Corosync segfaults with the following backtrace: (gdb) bt #0 memcpy () at ../sysdeps/i386/i686/memcpy.S:100 #1 0xb774ded3 in coroipcs_response_send (conn=0x80f6f88, msg=0xbf7f7c90, mlen=24) at /usr/include/bits/string3.h:52 #2 0xada9dab4 in message_handler_req_exec_cfg_ringreenable (message=0xbf7f7d20, nodeid=956606656) at cfg.c:583 #3 0x0804de75 in deliver_fn (nodeid=956606656, msg=0xbf7f7d20, msg_len=32, endian_conversion_required=0) at main.c:852 #4 0xb77793c0 in app_deliver_fn (nodeid=956606656, msg=<value optimized out>, msg_len=37, endian_conversion_required=0) at totempg.c:506 #5 0xb777991e in totempg_deliver_fn (nodeid=956606656, msg=0x80f6fea, msg_len=37, endian_conversion_required=0) at totempg.c:618 #6 0xb7777a25 in totemmrp_deliver_fn (nodeid=956606656, msg=0x80f6fea, msg_len=47, endian_conversion_required=0) at totemmrp.c:98 #7 0xb776ed81 in messages_deliver_to_app (instance=0xb5bb4008, skip=0, end_point=40) at totemsrp.c:3704 #8 0xb77752a9 in message_handler_orf_token (instance=0xb5bb4008, msg=0x80c9dbc, msg_len=71, endian_conversion_needed=0) at totemsrp.c:3577 #9 0xb776d3c9 in main_deliver_fn (context=0xb5bb4008, msg=0x80c9dbc, msg_len=71) at totemsrp.c:4356 #10 0xb776b27b in passive_token_recv (rrp_instance=0x8082be8, iface_no=1, context=0xb5bb4008, msg=0x80c9dbc, msg_len=71, token_seq=539) at totemrrp.c:876 #11 0xb776a4a3 in rrp_deliver_fn (context=0x8089358, msg=0x80c9dbc, msg_len=71) at totemrrp.c:1500 #12 0xb77651d2 in net_deliver_fn (handle=5880381755227111424, fd=13, revents=1, data=0x80c9758) at totemudp.c:1244 #13 0xb7761100 in poll_run (handle=5880381755227111424) at coropoll.c:510 #14 0x0804f508 in main (argc=Cannot access memory at address 0x6) at main.c:1813 Expected results: No segfault. Additional info: The relevant code is: int coroipcs_response_send (void *conn, const void *msg, size_t mlen) { struct conn_info *conn_info = (struct conn_info *)conn; memcpy (conn_info->response_buffer, msg, mlen); ipc_sem_post (conn_info->control_buffer, SEMAPHORE_RESPONSE); api->stats_increment_value (conn_info->stats_handle, "responses"); return (0); } At this point, 'conn' points to junk, presumably because corosync-cfgtool was terminated after sending the ring re-enable message but before receiving a response, so there's no "other end" for the IPC. --- Additional comment from tserong@novell.com on 2011-04-14 01:40:05 EDT --- Created attachment 491962 [details] proposed patch I had some discussion about this issue with asalkeld on IRC, and he suggested this patch, which seems to work for me. I wondered though if the refcounting should go further out, say, in cfg_lib_init_fn() and cfg_lib_exit_fn() in services/cfg.c (similar to the init and exit functions in services/cpg.c)? Or would you like me to take this discussion back to the mailing list? Thanks, Tim --- Additional comment from sdake@redhat.com on 2011-04-14 11:43:20 EDT --- Tim, Patch looks good as is. Please post to ml with signoff and I'll get it merged. I'd rather not use refcount in lib_init_fn and lib_exit_fn but it was necessary because of the design of cpg. Regards -steve --- Additional comment from jfriesse@redhat.com on 2011-04-15 03:15:32 EDT --- Patch committed in upstream git as 5b92829d6c20ab550aca6c0b273618577ea92b23
Patch committed in upstream git as 5b92829d6c20ab550aca6c0b273618577ea92b23
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause: Reference counting on the cfg server in corosync was incorrect. Consequence: killing corosync-cfgtool -r before it completed would result in Corosync segfault. Fix: Add proper reference counting as per architecture. Result: Corosync no longer segfaults with this test case.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1515.html