| Summary: | filesystem crashes when built with optimization flags | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Anand Avati <aavati> |
| Component: | build | Assignee: | Amar Tumballi <amarts> |
| Status: | CLOSED NOTABUG | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | mainline | CC: | chrisw, gluster-bugs, vraman |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Are there any two machines I can use for testing this? On a single machine, things seem to work fine.

In our build environment we are using gcc v4.3.4 (gcc-4_3-branch revision 152973).
With this gcc, I have to compile Gluster with optimizations disabled (-O0). When it is compiled with -O2 (the default), glusterd crashes when I run 'peer probe'; see the stack trace and several values I printed in gdb below.
I've tried both Gluster v3.2.1 and v3.2.3.
Obviously, I can't get far with performance testing on a Gluster build compiled with -O0 (switching to another compiler is a big issue for us).
Can you please check this?
```
Loaded symbols for /lib64/libgcc_s.so.1
Core was generated by `/usr/sbin/glusterd'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f644be00bcc in rpc_transport_connect (this=0x639750, port=0) at rpc-transport.c:810
810 rpc-transport.c: No such file or directory.
in rpc-transport.c
(gdb) bt
#0 0x00007f644be00bcc in rpc_transport_connect (this=0x639750, port=0) at rpc-transport.c:810
#1 0x00007f644be062ab in rpc_clnt_submit (rpc=0x639590, prog=0x7f644a583140, procnum=1, cbkfn=0x7f644a355030 <glusterd3_1_probe_cbk>, proghdr=0x7fffedc8b1e0, proghdrcount=1, progpayload=0x0,
progpayloadcount=0, iobref=0x63a310, frame=0x62e6dc, rsphdr=0x0, rsphdr_count=0, rsp_payload=0x0, rsp_payload_count=0, rsp_iobref=0x0) at rpc-clnt.c:1362
#2 0x00007f644a34fa9b in glusterd_submit_request (rpc=0x639590, req=0x7fffedc8b250, frame=0x62e6dc, prog=0x7f644a583140, procnum=1, iobref=0x63a310,
sfunc=0x7f644bbf0fc0 <gd_xdr_from_mgmt_probe_req>, this=0x6330b0, cbkfn=0x7f644a355030 <glusterd3_1_probe_cbk>) at glusterd-utils.c:351
#3 0x00007f644a3549d5 in glusterd3_1_probe (frame=0x62e6dc, this=0x6330b0, data=0x63a3d0) at glusterd-rpc-ops.c:1340
#4 0x00007f644a330861 in glusterd_ac_friend_probe (event=<value optimized out>, ctx=0x63a390) at glusterd-sm.c:364
#5 0x00007f644a3309e5 in glusterd_friend_sm () at glusterd-sm.c:958
#6 0x00007f644a3610c1 in glusterd_peer_dump_version_cbk (req=0x0, iov=0x7f644c26b304, count=<value optimized out>, myframe=0x7f644aa4ff98) at glusterd-handshake.c:378
#7 0x00007f644be05b34 in rpc_clnt_handle_reply (clnt=0x639590, pollin=0x63a090) at rpc-clnt.c:736
#8 0x00007f644be05d78 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x6395c0, event=<value optimized out>, data=0x1) at rpc-clnt.c:849
#9 0x00007f644be00a57 in rpc_transport_notify (this=0x639750, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:918
#10 0x00007f644a0f9eef in socket_event_poll_in (this=0x639750) at socket.c:1647
#11 0x00007f644a0fa058 in socket_event_handler (fd=<value optimized out>, idx=2, data=0x639750, poll_in=1, poll_out=0, poll_err=0) at socket.c:1762
#12 0x00007f644c049f67 in event_dispatch_epoll_handler (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:794
#13 event_dispatch_epoll (i=<value optimized out>, events=<value optimized out>, event_pool=<value optimized out>) at event.c:856
#14 0x000000000040622a in main (argc=1, argv=0x7fffedc8b678) at glusterfsd.c:1488
(gdb) print this
$1 = (rpc_transport_t *) 0x639750
(gdb) print *this
$2 = {ops = 0x0, listener = 0x0, private = 0x0, xl_private = 0x0, xl = 0x0, mydata = 0x0, lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {
__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, refcount = 0, ctx = 0x0, options = 0x0, name = 0x0, dnscache = 0x0, buf = 0x0, init = 0, fini = 0,
validate_options = 0, reconfigure = 0, notify = 0, notify_data = 0x0, peerinfo = {sockaddr = {ss_family = 0, __ss_align = 0, __ss_padding = '\000' <repeats 111 times>}, sockaddr_len = 0,
identifier = '\000' <repeats 72 times>, "@1XJd\177\000\000\001\000\000\000\000\000\000\000\060P5Jd\177", '\000' <repeats 13 times>}, myinfo = {sockaddr = {ss_family = 2, __ss_align = 0,
__ss_padding = '\000' <repeats 111 times>}, sockaddr_len = 16, identifier = "14.10.12.12:1021", '\000' <repeats 91 times>}, total_bytes_read = 288, total_bytes_write = 140, list = {next = 0x0,
prev = 0x0}, client_bind_insecure = 0}
(gdb) print this->ops
$3 = (struct rpc_transport_ops *) 0x0
(gdb) frame 3
#3 0x00007f644a3549d5 in glusterd3_1_probe (frame=0x62e6dc, this=0x6330b0, data=0x63a3d0) at glusterd-rpc-ops.c:1340
1340 glusterd-rpc-ops.c: No such file or directory.
in glusterd-rpc-ops.c
(gdb) print hostname
$4 = 0x63a3b0 "module-2"
(gdb) print port
$5 = 0
(gdb) print peerinfo
$6 = (glusterd_peerinfo_t *) 0x638e60
(gdb) print *peerinfo
$7 = {uuid = '\000' <repeats 15 times>, uuid_str = '\000' <repeats 49 times>, state = {state = GD_FRIEND_STATE_DEFAULT, transition_time = {tv_sec = 0, tv_usec = 0}}, hostname = 0x638340 "module-2",
port = 0, uuid_list = {next = 0x635980, prev = 0x635980}, op_peers_list = {next = 0x0, prev = 0x0}, rpc = 0x639590, mgmt = 0x7f644a583140, connected = 1, shandle = 0x639d30, sm_log = {
transitions = 0x638f50, current = 0, size = 50, count = 0, state_name_get = 0x7f644a32f630 <glusterd_friend_sm_state_name_get>,
event_name_get = 0x7f644a32f650 <glusterd_friend_sm_event_name_get>}}
```
This is a compiler bug. A standalone test case that misbehaves on SLES SP1 is at https://github.com/avati/gcc-bug. The bug needs to be raised with Novell.

CHANGE: http://review.gluster.com/549 (now returns 'true(1)' is gfid is root, 'false(0)' if not.) merged in master by Vijay Bellur (vijay)

(In reply to comment #4)
> CHANGE: http://review.gluster.com/549 (now returns 'true(1)' is gfid is root,
> 'false(0)' if not.) merged in master by Vijay Bellur (vijay)

The above commit is for bug-3158 :p

CHANGE: http://review.gluster.com/522 (Change-Id: I0f078d1753db65d2f2e0380d1b0450c114cf40dd) merged in master by Vijay Bellur (vijay)

CHANGE: http://review.gluster.com/523 (Change-Id: I53b007fbdb42313d207d5d63fbfaaa6aaf033f95) merged in master by Vijay Bellur (vijay)
It seems that for Gluster 3.2.2 it's enough to add the '-g2' flag to prevent the crash. That is, Gluster is now built with optimization enabled and full debugging information (I think that should be enough for initial performance testing). I still think you should check why it crashes. I'd assume that you have a bug where you are using an uninitialized variable or making some illegal assumption about the packing of struct elements in your code.

FYI, I am building it as follows (without the CFLAGS line it crashes on 'peer probe'):

```sh
./configure -C \
    --prefix=/usr \
    --libdir=/usr/lib64 \
    --localstatedir=/var \
    --sysconfdir=/etc \
    --disable-dependency-tracking \
    CFLAGS='-g2' \
  && make -j 8
```