Bug 1607888
| Summary: | backtrace seen in glusterd log when triggering glusterd restart on issuing of index heal (TC#RHG3-13523) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Nag Pavan Chilakam <nchilaka> |
| Component: | glusterd | Assignee: | Atin Mukherjee <amukherj> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Bala Konda Reddy M <bmekala> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.4 | CC: | nchilaka, rhs-bugs, sankarshan, sheggodu, storage-qa-internal, vbellur |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-08-12 05:30:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Nag Pavan Chilakam
2018-07-24 13:41:11 UTC
Logs available @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1607888/

glusterd log for the below trace @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1607888/rhs-client18.lab.eng.blr.redhat.com_glusterd_backtrace_node/glusterfs/glusterd.log

Below is the complete backtrace dumped in the glusterd log:

```
[2018-07-24 13:30:03.141052] I [MSGID: 106493] [glusterd-handler.c:3890:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to rhs-client30.lab.eng.blr.redhat.com (0), ret: 0, op_ret: -1
[2018-07-24 13:30:03.163881] W [glusterfsd.c:1367:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f28b8dd5dd5] -->/usr/sbin/glusterd(glusterfs_sigwaiter+0xe5) [0x55d19b27b575] -->/usr/sbin/glusterd(cleanup_and_exit+0x6b) [0x55d19b27b3eb] ) 0-: received signum (15), shutting down
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2018-07-24 13:30:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.2
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f28b9f76cc0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f28b9f80c04]
/lib64/libc.so.6(+0x36280)[0x7f28b85d6280]
/lib64/liburcu-bp.so.1(rcu_read_lock_bp+0x2d)[0x7f28ae46f0ad]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x30b81)[0x7f28aea0db81]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x29844)[0x7f28aea06844]
/usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so(+0x23a5e)[0x7f28aea00a5e]
/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325)[0x7f28b9d38955]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x10b)[0x7f28b9d38b3b]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f28b9d3aa73]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x7566)[0x7f28abc4a566]
/usr/lib64/glusterfs/3.12.2/rpc-transport/socket.so(+0x9b0c)[0x7f28abc4cb0c]
[2018-07-24 13:30:05.187577] I [MSGID: 100030] [glusterfsd.c:2504:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)
[2018-07-24 13:30:05.192508] I [MSGID: 106478] [glusterd.c:1451:init] 0-management: Maximum allowed open file descriptors set to 65536
```

Also, I only see the backtrace; no core was found. If I need to change the title (remove the word "crash"), kindly feel free to correct me.

Atin Mukherjee

How can we have a crash without having a core file? That is only possible if we have messed up the core pattern, isn't it?

Also, from the backtrace in the logs it does look like glusterd crashed during cleanup_and_exit(), and I believe this is similar to the urcu cleanup related crash we observed earlier. OTOH, restarting a glusterd instance within a matter of 3 seconds in a loop is also not something that is recommended. We could run into issues like port 24007 not being free in such a short interval, and glusterd can also go down before finishing its handshake with the other peers, which can lead to urcu cleanup races.

Nag Pavan Chilakam

(In reply to Atin Mukherjee from comment #6)
> How can we have a crash without having a core file? That is only possible
> if we have messed up the core pattern, isn't it?

As mentioned in c#5, there was no crash, only a backtrace. Changing the title too.

Atin Mukherjee

"There was no crash, only a backtrace" - what does this mean? Can this ever happen? We dump the backtrace in the log file when there's a SEGV.
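As context for the core-file question above: whether a SIGSEGV actually produces a core depends on the kernel's core pattern and the process's core-size limit. A minimal sketch of how to verify both on the affected node (standard Linux interfaces; the only assumption is that glusterd is the process of interest):

```sh
# Where the kernel writes cores; on RHEL 7 this often pipes to an abrt helper
cat /proc/sys/kernel/core_pattern

# Core-size limit in the current shell; 0 means no core is written
ulimit -c

# Core-size limit of the running glusterd process itself
grep "Max core file size" /proc/$(pidof glusterd)/limits
```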
Nag Pavan Chilakam

Core was not generated, and I haven't changed any settings with respect to core generation. These machines have been in use for quite some time, and I have never seen a core fail to generate on them before. For now, I am OK with the current resolution of closing this as insufficient data, as it is not possible for dev to debug in this case.
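A note on the trace itself: the glusterd.so frames in the backtrace are raw offsets, which can be resolved to function names when debuginfo matching this exact glusterfs 3.12.2 build is installed. A sketch, assuming the matching debuginfo package is available from the RHGS channel (the offset shown is taken from the backtrace above):

```sh
# Install debug symbols matching the installed glusterfs build
debuginfo-install -y glusterfs

# Resolve one frame from the trace, e.g. glusterd.so(+0x30b81)
addr2line -f -C -e /usr/lib64/glusterfs/3.12.2/xlator/mgmt/glusterd.so 0x30b81
```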
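On the restart-interval point raised in comment #6: if a test must restart glusterd in a loop, waiting for the management port to be released before the next start avoids the bind failures and mid-handshake shutdowns described there. A minimal sketch, assuming systemd manages glusterd and a 30-second wait ceiling is acceptable (the timeout is an arbitrary choice, not from this bug):

```sh
#!/bin/bash
# Stop glusterd, wait until port 24007 is actually released,
# then start it again, so we don't race the previous instance's cleanup.
systemctl stop glusterd

for _ in $(seq 1 30); do
    # An empty listing means nothing is bound to 24007 any more
    ss -ltn "sport = :24007" | grep -q 24007 || break
    sleep 1
done

systemctl start glusterd
```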