Bug 1354621

Summary: bug-1293414-import-brickinfo-uuid.t crashes in glusterd_friend_sm/cds_list_del_init
Product: [Community] GlusterFS
Reporter: Jeff Darcy <jdarcy>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Version: mainline
CC: bugs, jthottan
Keywords: Triaged
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-4.1.3 (or later)
Last Closed: 2018-08-29 03:35:54 UTC
Type: Bug

Description Jeff Darcy 2016-07-11 18:19:35 UTC
It doesn't happen every time, but it comes close. Here are some symptoms:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fb3eb7fe700 (LWP 680)]
0x00007fb4074ceefb in __cds_list_del (prev=0x21e45b000, next=0xdeadc0de00)
    at /usr/include/urcu/list.h:73
73		next->prev = prev;

(gdb) bt
#0  0x00007fb4074ceefb in __cds_list_del (prev=0x21e45b000, next=0xdeadc0de00)
    at /usr/include/urcu/list.h:73
#1  0x00007fb4074cef33 in cds_list_del (elem=0x7fb3d80120d0)
    at /usr/include/urcu/list.h:81
#2  0x00007fb4074cef4e in cds_list_del_init (elem=0x7fb3d80120d0)
    at /usr/include/urcu/list.h:88
#3  0x00007fb4074d1b4d in glusterd_friend_sm () at glusterd-sm.c:1348
#4  0x00007fb4074c732b in __glusterd_handle_incoming_unfriend_req (
    req=0x7fb3d800f4ac) at glusterd-handler.c:2670

(gdb) p event
$1 = (glusterd_friend_sm_event_t *) 0x7fb3d80120d0
(gdb) p event->list
$2 = {next = 0xdeadc0de00, prev = 0x21e45b000}
(gdb) p tmp
$3 = (glusterd_friend_sm_event_t *) 0xdeadc0de00
(gdb) p *event
$4 = {list = {next = 0xdeadc0de00, prev = 0x21e45b000}, 
  peerid = "\000\241\000\000\000\264\177\000\000\070\000\000\000\000\000", 
  peername = 0x7fb3d80120d000 <error: Cannot access memory at address 0x7fb3d80120d000>, ctx = 0xffffffffffffff00, event = 4294967295}

This looks like list or memory corruption, probably caused by improper use of the RCU functions. I have a patch that makes the problem go away, and I'll post it as soon as this bug has a number.

Comment 1 Vijay Bellur 2016-07-11 18:27:19 UTC
REVIEW: http://review.gluster.org/14893 (glusterd: fix glusterd_friend_sm usage of SM functions) posted (#1) for review on master by Jeff Darcy (jdarcy)

Comment 2 Vijay Bellur 2016-07-11 18:29:40 UTC
REVIEW: http://review.gluster.org/14893 (glusterd: fix glusterd_friend_sm usage of RCU functions) posted (#2) for review on master by Jeff Darcy (jdarcy)

Comment 3 Vijay Bellur 2016-07-12 12:18:02 UTC
REVIEW: http://review.gluster.org/14893 (glusterd: fix glusterd_friend_sm usage of RCU functions) posted (#3) for review on master by Jeff Darcy (jdarcy)

Comment 4 Amar Tumballi 2018-08-29 03:35:54 UTC
This update was done in bulk based on the state of the patch and the time since the last activity. If the issue is still seen, please reopen the bug.