Bug 1473637 - glusterd crash
Summary: glusterd crash
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-07-21 10:53 UTC by florian
Modified: 2018-10-05 04:05 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-10-05 04:05:37 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
cmd_history_all_nodes (3.92 MB, application/x-gzip)
2017-07-25 06:51 UTC, florian

Description florian 2017-07-21 10:53:42 UTC
Description of problem:
The gluster daemon crashed.


Version-Release number of selected component (if applicable):
3.7.6

How reproducible:
Cannot reproduce

Additional info:
OS: Ubuntu 16.04.2 LTS

Setup:

Distributed replicated volume.
6 nodes, each with 6 bricks

log from etc-glusterfs-glusterd.vol.log:

[2017-07-21 06:50:19.234951] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x2c) [0x7f3754aff2cc] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x927) [0x7f3754aff1f7] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x54d) [0x7f3754b0275d] ) 0-management: Lock for Volume1 held by aa66f2f9-8fb9-4669-852c-df0fba010395
[2017-07-21 06:50:19.237358] E [MSGID: 106119] [glusterd-syncop.c:1823:gd_sync_task_begin] 0-management: Unable to acquire lock for Volume1
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 3 times between [2017-07-21 06:50:05.082063] and [2017-07-21 06:51:05.177239]
The message "I [MSGID: 106487] [glusterd-handler.c:1411:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req" repeated 3 times between [2017-07-21 06:49:54.642058] and [2017-07-21 06:51:05.274672]
The message "I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Volume1" repeated 3 times between [2017-07-21 06:50:05.492489] and [2017-07-21 06:51:16.106958]
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 
2017-07-21 06:51:38
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x7e)[0x7f3759a68e6e]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f3759a8354d]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f3758e634b0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f3758e63428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f3758e6502a]
/lib/x86_64-linux-gnu/libc.so.6(+0x777ea)[0x7f3758ea57ea]
/lib/x86_64-linux-gnu/libc.so.6(+0x7e6f8)[0x7f3758eac6f8]
/lib/x86_64-linux-gnu/libc.so.6(+0x813be)[0x7f3758eaf3be]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0xba)[0x7f3758eb221a]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(__gf_calloc+0x6a)[0x7f3759a9903a]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so(+0x8a1b)[0x7f3750a9aa1b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7bbea)[0x7f3759ac4bea]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f37591fe6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f3758f3482d]
---------

Comment 1 Atin Mukherjee 2017-07-21 11:15:42 UTC
Please provide the output of 't a a bt' of gdb session along with attaching the core file.

Comment 2 florian 2017-07-21 11:35:52 UTC
(In reply to Atin Mukherjee from comment #1)
> Please provide the output of 't a a bt' of gdb session along with attaching
> the core file.

I'm sorry, I'm not used to gdb. As far as I understand, it is a debugger for C/C++ programs. At the moment of the crash, glusterfsd had no debugger attached.

Did I misunderstand something?

Comment 3 Atin Mukherjee 2017-07-24 04:57:20 UTC
If the program crashes, you should see a core file generated in the location determined by your core_pattern setting. We need at least the core file attached to the bug to figure out the reason for the crash. Additionally, the backtrace of the core through gdb (commands below) would help us narrow down the issue faster. Please also attach the glusterd and cmd_history logs from all the nodes.

# gdb glusterd <core file>
t a a bt
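
For reference, the same steps can be run non-interactively so that gdb's pager does not cut the output short. The following is only a sketch, not an official GlusterFS procedure: it assumes gdb and Python 3.7+ are installed, and the helper names (read_core_pattern, gdb_backtrace_command, collect_backtrace) are made up for illustration. The executable and core file paths are whatever applies on your node.

```python
import subprocess
from pathlib import Path

# Standard Linux location of the kernel's core file naming pattern.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def read_core_pattern(path=CORE_PATTERN):
    """Return the core_pattern setting, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def gdb_backtrace_command(executable, core_file):
    """Build a gdb command line that dumps all thread backtraces and exits."""
    return [
        "gdb", "-batch",                 # run the commands, then exit
        "-ex", "set pagination off",     # no "Type <return> to continue" prompt
        "-ex", "thread apply all bt",    # long form of "t a a bt"
        str(executable), str(core_file), # executable first, core file second
    ]


def collect_backtrace(executable, core_file, out_path):
    """Run gdb and save its output (stdout and stderr) to out_path."""
    cmd = gdb_backtrace_command(executable, core_file)
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    Path(out_path).write_text(result.stdout + result.stderr)
    return result.returncode
```

For example, collect_backtrace("/usr/sbin/glusterd", "/path/to/core", "bt.txt") would write the full backtrace to bt.txt, which can then be attached to the bug. Both paths are placeholders.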

Comment 4 florian 2017-07-25 06:40:00 UTC
The output of gdb:

(gdb) core /root/core_glusterfs_crash 
warning: core file may not match specified executable file.
[New LWP 13967]
[New LWP 13965]
[New LWP 28763]
[New LWP 13886]
[New LWP 13885]
[New LWP 13966]
[New LWP 13887]
[New LWP 13888]
[New LWP 13884]
[New LWP 13964]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f3758e63428 in sigandset (dest=0x363c, left=0x368f, right=0x6) at sigandset.c:33
33        return __sigandset (dest, left, right);
[Current thread is 1 (Thread 0x7f374a7fc700 (LWP 13967))]

Comment 5 Atin Mukherjee 2017-07-25 06:42:37 UTC
Can you please provide the output of "t a a bt"?

Comment 6 florian 2017-07-25 06:51:11 UTC
Created attachment 1304037
cmd_history_all_nodes

Comment 7 florian 2017-07-25 06:52:46 UTC
output of "t a a bt"

(gdb) t a a bt

Thread 10 (Thread 0x7f374bfff700 (LWP 13964)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f3754b004e3 in ?? () from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so
#2  0x00007f37591fe6ba in start_thread (arg=0x7f374bfff700) at pthread_create.c:333
#3  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4  0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f3759f23780 (LWP 13884)):
#0  0x00007f37591ff98d in pthread_join (threadid=139875458209536, thread_return=0x0) at pthread_join.c:90
#1  0x00007f3759ac4eeb in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x0000000000405501 in main ()

Thread 8 (Thread 0x7f37555b5700 (LWP 13888)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007f3759aa8d98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007f3759aa9970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007f37591fe6ba in start_thread (arg=0x7f37555b5700) at pthread_create.c:333
#4  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f3755db6700 (LWP 13887)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007f3759aa8d98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007f3759aa9970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007f37591fe6ba in start_thread (arg=0x7f3755db6700) at pthread_create.c:333
#4  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f374affd700 (LWP 13966)):
#0  0x00007f3758f34e23 in vmsplice () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f3759ac4a58 in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007f37591fe6ba in start_thread (arg=0x7f374affd700) at pthread_create.c:333
#3  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f3756db8700 (LWP 13885)):
#0  0x00007f3759207c1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f3759a86744 in gf_timer_proc () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
---Type <return> to continue, or q <return> to quit---

Comment 8 florian 2017-07-25 07:05:10 UTC
Missing piece of "t a a bt" 


Thread 5 (Thread 0x7f3756db8700 (LWP 13885)):
#0  0x00007f3759207c1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f3759a86744 in gf_timer_proc () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007f37591fe6ba in start_thread (arg=0x7f3756db8700) at pthread_create.c:333
#3  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f37565b7700 (LWP 13886)):
#0  do_sigwait (sig=0x7f37565b6e3c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1  __sigwait (set=<optimized out>, sig=0x7f37565b6e3c) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x00000000004080bf in glusterfs_sigwaiter ()
#3  0x00007f37591fe6ba in start_thread (arg=0x7f37565b7700) at pthread_create.c:333
#4  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f3749ffb700 (LWP 28763)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007f3759aa8d98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007f3759aa9970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3  0x00007f37591fe6ba in start_thread (arg=0x7f3749ffb700) at pthread_create.c:333
#4  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f374b7fe700 (LWP 13965)):
#0  0x00007f3758f34e23 in vmsplice () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f3759ac4a58 in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2  0x00007f37591fe6ba in start_thread (arg=0x7f374b7fe700) at pthread_create.c:333
#3  0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f374a7fc700 (LWP 13967)):
#0  0x00007f3758e63428 in sigandset (dest=0x363c, left=0x368f, right=0x6) at sigandset.c:33
#1  0x0000000000000020 in ?? ()
#2  0x0000000000000000 in ?? ()

Comment 9 Atin Mukherjee 2018-10-05 04:05:37 UTC
The symbol tables are missing, so the backtrace gives us no clue about the complete call path. If you're able to reproduce the crash with the latest release, please let us know and we'll be happy to debug it further. For now, since this version is quite old and no reproducer is mentioned in the bug, I'm closing it.
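
To see how much of a backtrace is affected by missing symbols, the unresolved frames (printed by gdb as "??") can be listed per thread. This is a rough sketch that assumes the gdb output format shown in the comments above; the function name unresolved_frames and the regular expressions are illustrative only.

```python
import re

# "Thread 1 (Thread 0x7f374a7fc700 (LWP 13967)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# "#1  0x0000000000000020 in ?? ()" or "#0  pthread_cond_wait@@GLIBC_2.3.2 () at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \(")


def unresolved_frames(gdb_output):
    """Map each thread number to the frame numbers gdb printed as '??'."""
    result = {}
    current = None
    for line in gdb_output.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            result[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and frame.group(2) == "??":
            result[current].append(int(frame.group(1)))
    return result


# Excerpt of the crashing thread, copied from the output in this bug.
SAMPLE = """\
Thread 1 (Thread 0x7f374a7fc700 (LWP 13967)):
#0  0x00007f3758e63428 in sigandset (dest=0x363c, left=0x368f, right=0x6) at sigandset.c:33
#1  0x0000000000000020 in ?? ()
#2  0x0000000000000000 in ?? ()
"""

print(unresolved_frames(SAMPLE))
```

On the Thread 1 excerpt this prints {1: [1, 2]}: every frame below the top one in the crashing thread is unresolved, which is why the trace cannot be followed any further.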

