Doesn't happen 100%, but close. Here are some symptoms.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb3eb7fe700 (LWP 680)]
0x00007fb4074ceefb in __cds_list_del (prev=0x21e45b000, next=0xdeadc0de00) at /usr/include/urcu/list.h:73
73              next->prev = prev;
(gdb) bt
#0  0x00007fb4074ceefb in __cds_list_del (prev=0x21e45b000, next=0xdeadc0de00) at /usr/include/urcu/list.h:73
#1  0x00007fb4074cef33 in cds_list_del (elem=0x7fb3d80120d0) at /usr/include/urcu/list.h:81
#2  0x00007fb4074cef4e in cds_list_del_init (elem=0x7fb3d80120d0) at /usr/include/urcu/list.h:88
#3  0x00007fb4074d1b4d in glusterd_friend_sm () at glusterd-sm.c:1348
#4  0x00007fb4074c732b in __glusterd_handle_incoming_unfriend_req (
    req=0x7fb3d800f4ac) at glusterd-handler.c:2670
(gdb) p event
$1 = (glusterd_friend_sm_event_t *) 0x7fb3d80120d0
(gdb) p event->list
$2 = {next = 0xdeadc0de00, prev = 0x21e45b000}
(gdb) p tmp
$3 = (glusterd_friend_sm_event_t *) 0xdeadc0de00
(gdb) p *event
$4 = {list = {next = 0xdeadc0de00, prev = 0x21e45b000},
  peerid = "\000\241\000\000\000\264\177\000\000\070\000\000\000\000\000",
  peername = 0x7fb3d80120d000 <error: Cannot access memory at address 0x7fb3d80120d000>,
  ctx = 0xffffffffffffff00, event = 4294967295}

It looks like list/memory corruption, probably due to improper usage of RCU functions. I have a patch that makes the problem go away, which I'll post as soon as I get this bug number.
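For anyone reading the backtrace who is unfamiliar with this kind of list code, here is a minimal single-threaded sketch of the unlink step that faults in frame #0, plus a drain loop that saves the next pointer before unlinking. This is not the patch and does not use liburcu or any RCU primitives. Every name in it (demo_list, demo_event, demo_drain, and so on) is hypothetical and exists only for illustration. It only shows why a stale next pointer in an event's list head makes the unlink store crash.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a circular doubly linked list head. */
struct demo_list {
    struct demo_list *next, *prev;
};

/* Hypothetical event; the list head is the first member so a pointer
 * to the list head is also a pointer to the event. */
struct demo_event {
    struct demo_list list;
    int id;
    int handled;
};

static void demo_list_init(struct demo_list *h)
{
    h->next = h;
    h->prev = h;
}

static void demo_list_add_tail(struct demo_list *n, struct demo_list *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n, then point it at itself so a repeated delete is harmless. */
static void demo_list_del_init(struct demo_list *n)
{
    n->next->prev = n->prev; /* this store faults if n->next is stale */
    n->prev->next = n->next;
    demo_list_init(n);
}

/* Drain the queue. The next pointer is read before the current node is
 * unlinked, so the loop never follows a pointer out of a removed node. */
static int demo_drain(struct demo_list *head)
{
    int count = 0;
    struct demo_list *pos;

    assert(head != NULL);
    pos = head->next;
    while (pos != head) {
        struct demo_list *next = pos->next;
        struct demo_event *ev = (struct demo_event *)pos;

        demo_list_del_init(pos);
        ev->handled = 1;
        count++;
        pos = next;
    }
    return count;
}
```

In the crash above, event->list.next was 0xdeadc0de00, so the equivalent of the first store in demo_list_del_init wrote through an invalid pointer. The sketch says nothing about how the list head got into that state; concurrent access is deliberately left out.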
REVIEW: http://review.gluster.org/14893 (glusterd: fix glusterd_friend_sm usage of SM functions) posted (#1) for review on master by Jeff Darcy (jdarcy)
REVIEW: http://review.gluster.org/14893 (glusterd: fix glusterd_friend_sm usage of RCU functions) posted (#2) for review on master by Jeff Darcy (jdarcy)
REVIEW: http://review.gluster.org/14893 (glusterd: fix glusterd_friend_sm usage of RCU functions) posted (#3) for review on master by Jeff Darcy (jdarcy)
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.