Bug 765250 (GLUSTER-3518)

Summary: filesystem crashes when built with optimization flags
Product: [Community] GlusterFS Reporter: Anand Avati <aavati>
Component: build Assignee: Amar Tumballi <amarts>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: chrisw, gluster-bugs, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix

Description Anand Avati 2011-09-06 13:31:18 UTC
It seems that for gluster 3.2.2 it's enough to add the '-g2' flag to prevent the crash. That is, gluster is now built with optimization enabled and full debugging information (I think that should be enough for initial performance testing).

I think that you should still check why it crashes.
I'd assume that you have a bug where you are using an uninitialized variable or making some illegal assumption about the packing of struct elements in your code.

FYI, I am building it as follows (without the CFLAGS line it crashes on 'peer probe'):

./configure -C \
    --prefix=/usr \
    --libdir=/usr/lib64 \
    --localstatedir=/var \
    --sysconfdir=/etc \
    --disable-dependency-tracking \
    CFLAGS='-g2' \
    && make -j 8

Comment 1 Amar Tumballi 2011-09-06 13:48:37 UTC
Are there two machines I can use for testing this? On a single machine things seem to work fine.

Comment 2 Anand Avati 2011-09-06 16:30:41 UTC
In our build environment we are using gcc v4.3.4 (gcc-4_3-branch revision 152973) 

With this gcc I have to compile gluster with optimizations disabled (-O0). When compiled with -O2 (the default), glusterd crashes when I try to 'peer probe'; see the stack trace and several values I printed in gdb below.
I've tried both gluster v3.2.1 and 3.2.3.
Obviously, I can't get far with performance testing using a gluster built with -O0 (switching to another compiler is a big issue for us).

Can you please check this? 

>>>>>>>>>>> 
Loaded symbols for /lib64/libgcc_s.so.1 
Core was generated by `/usr/sbin/glusterd'. 
Program terminated with signal 11, Segmentation fault. 
#0  0x00007f644be00bcc in rpc_transport_connect (this=0x639750, port=0) at rpc-transport.c:810 
810     rpc-transport.c: No such file or directory. 
        in rpc-transport.c 
(gdb) bt 
#0  0x00007f644be00bcc in rpc_transport_connect (this=0x639750, port=0) at rpc-transport.c:810 
#1  0x00007f644be062ab in rpc_clnt_submit (rpc=0x639590, prog=0x7f644a583140, procnum=1, cbkfn=0x7f644a355030 <glusterd3_1_probe_cbk>, proghdr=0x7fffedc8b1e0, proghdrcount=1, progpayload=0x0, 
    progpayloadcount=0, iobref=0x63a310, frame=0x62e6dc, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1362 
#2  0x00007f644a34fa9b in glusterd_submit_request (rpc=0x639590, req=0x7fffedc8b250, frame=0x62e6dc, prog=0x7f644a583140, procnum=1, iobref=0x63a310, 
    sfunc=0x7f644bbf0fc0 <gd_xdr_from_mgmt_probe_req>, this=0x6330b0, cbkfn=0x7f644a355030 <glusterd3_1_probe_cbk>) at glusterd-utils.c:351 
#3  0x00007f644a3549d5 in glusterd3_1_probe (frame=0x62e6dc, this=0x6330b0, data=0x63a3d0) at glusterd-rpc-ops.c:1340 
#4  0x00007f644a330861 in glusterd_ac_friend_probe (event=<value optimized out>, ctx=0x63a390) at glusterd-sm.c:364 
#5  0x00007f644a3309e5 in glusterd_friend_sm () at glusterd-sm.c:958 
#6  0x00007f644a3610c1 in glusterd_peer_dump_version_cbk (req=0x0, iov=0x7f644c26b304, count=<value optimized out>, myframe=0x7f644aa4ff98) at glusterd-handshake.c:378 
#7  0x00007f644be05b34 in rpc_clnt_handle_reply (clnt=0x639590, pollin=0x63a090) at rpc-clnt.c:736 
#8  0x00007f644be05d78 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x6395c0, event=<value optimized out>, data=0x1) at rpc-clnt.c:849 
#9  0x00007f644be00a57 in rpc_transport_notify (this=0x639750, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:918 
#10 0x00007f644a0f9eef in socket_event_poll_in (this=0x639750) at socket.c:1647 
#11 0x00007f644a0fa058 in socket_event_handler (fd=<value optimized out>, idx=2, data=0x639750, poll_in=1, poll_out=0, poll_err=0) at socket.c:1762 
#12 0x00007f644c049f67 in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:794 
#13 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:856 
#14 0x000000000040622a in main (argc=1, argv=0x7fffedc8b678) at glusterfsd.c:1488 
(gdb) print this 
$1 = (rpc_transport_t *) 0x639750 
(gdb) print *this 
$2 = {ops = 0x0, listener = 0x0, private = 0x0, xl_private = 0x0, xl = 0x0, mydata = 0x0, lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = { 
        __prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, refcount = 0, ctx = 0x0, options = 0x0, name = 0x0, dnscache = 0x0, buf = 0x0, init = 0, fini = 0, 
  validate_options = 0, reconfigure = 0, notify = 0, notify_data = 0x0, peerinfo = {sockaddr = {ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, sockaddr_len = 0, 
    identifier = '\000' <repeats 72 times>, "@1XJd\177\000\000\001\000\000\000\000\000\000\000\060P5Jd\177", '\000' <repeats 13 times>}, myinfo = {sockaddr = {ss_family = 2, __ss_align = 0, 
      __ss_padding = '\000' <repeats 111 times>}, sockaddr_len = 16, identifier = "14.10.12.12:1021", '\000' <repeats 91 times>}, total_bytes_read = 288, total_bytes_write = 140, list = {next = 0x0, 
    prev = 0x0}, client_bind_insecure = 0} 
(gdb) print this->ops 
$3 = (struct rpc_transport_ops *) 0x0 
(gdb) frame 3 
#3  0x00007f644a3549d5 in glusterd3_1_probe (frame=0x62e6dc, this=0x6330b0, data=0x63a3d0) at glusterd-rpc-ops.c:1340 
1340    glusterd-rpc-ops.c: No such file or directory. 
        in glusterd-rpc-ops.c 
(gdb) print hostname 
$4 = 0x63a3b0 "module-2" 
(gdb) print port     
$5 = 0 
(gdb) print peerinfo 
$6 = (glusterd_peerinfo_t *) 0x638e60 
(gdb) print *peerinfo 
$7 = {uuid = '\000' <repeats 15 times>, uuid_str = '\000' <repeats 49 times>, state = {state = GD_FRIEND_STATE_DEFAULT, transition_time = {tv_sec = 0, tv_usec = 0}}, hostname = 0x638340 "module-2", 
  port = 0, uuid_list = {next = 0x635980, prev = 0x635980}, op_peers_list = {next = 0x0, prev = 0x0}, rpc = 0x639590, mgmt = 0x7f644a583140, connected = 1, shandle = 0x639d30, sm_log = { 
    transitions = 0x638f50, current = 0, size = 50, count = 0, state_name_get = 0x7f644a32f630 <glusterd_friend_sm_state_name_get>, 
    event_name_get = 0x7f644a32f650 <glusterd_friend_sm_event_name_get>}}

Comment 3 Anand Avati 2011-09-13 16:03:31 UTC
This is a compiler bug. A standalone test case that misbehaves on SLES SP1:

https://github.com/avati/gcc-bug

Bug needs to be raised with Novell.

Comment 4 Anand Avati 2011-10-03 02:08:05 UTC
CHANGE: http://review.gluster.com/549 (now returns 'true(1)' is gfid is root, 'false(0)' if not.) merged in master by Vijay Bellur (vijay)

Comment 5 Amar Tumballi 2011-10-03 02:11:05 UTC
(In reply to comment #4)
> CHANGE: http://review.gluster.com/549 (now returns 'true(1)' is gfid is root,
> 'false(0)' if not.) merged in master by Vijay Bellur (vijay)

The above commit is for bug-3158 :p

Comment 6 Anand Avati 2011-11-16 08:44:35 UTC
CHANGE: http://review.gluster.com/522 (Change-Id: I0f078d1753db65d2f2e0380d1b0450c114cf40dd) merged in master by Vijay Bellur (vijay)

Comment 7 Anand Avati 2011-11-16 08:45:14 UTC
CHANGE: http://review.gluster.com/523 (Change-Id: I53b007fbdb42313d207d5d63fbfaaa6aaf033f95) merged in master by Vijay Bellur (vijay)