Description of problem:
The gluster daemon crashed.

Version-Release number of selected component (if applicable):
3.7.6

How reproducible:
Can not reproduce

Additional info:
OS: Ubuntu 16.04.2 LTS
Setup: distributed replicated. 6 nodes, each has 6 bricks

log from etc-glusterfs-glusterd.vol.log:

[2017-07-21 06:50:19.234951] W [glusterd-locks.c:577:glusterd_mgmt_v3_lock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_op_begin_synctask+0x2c) [0x7f3754aff2cc] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(gd_sync_task_begin+0x927) [0x7f3754aff1f7] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_lock+0x54d) [0x7f3754b0275d] ) 0-management: Lock for Volume1 held by aa66f2f9-8fb9-4669-852c-df0fba010395
[2017-07-21 06:50:19.237358] E [MSGID: 106119] [glusterd-syncop.c:1823:gd_sync_task_begin] 0-management: Unable to acquire lock for Volume1
The message "I [MSGID: 106488] [glusterd-handler.c:1472:__glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req" repeated 3 times between [2017-07-21 06:50:05.082063] and [2017-07-21 06:51:05.177239]
The message "I [MSGID: 106487] [glusterd-handler.c:1411:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req" repeated 3 times between [2017-07-21 06:49:54.642058] and [2017-07-21 06:51:05.274672]
The message "I [MSGID: 106499] [glusterd-handler.c:4267:__glusterd_handle_status_volume] 0-management: Received status volume req for volume Volume1" repeated 3 times between [2017-07-21 06:50:05.492489] and [2017-07-21 06:51:16.106958]
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2017-07-21 06:51:38
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x7e)[0x7f3759a68e6e]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f3759a8354d]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f3758e634b0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f3758e63428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f3758e6502a]
/lib/x86_64-linux-gnu/libc.so.6(+0x777ea)[0x7f3758ea57ea]
/lib/x86_64-linux-gnu/libc.so.6(+0x7e6f8)[0x7f3758eac6f8]
/lib/x86_64-linux-gnu/libc.so.6(+0x813be)[0x7f3758eaf3be]
/lib/x86_64-linux-gnu/libc.so.6(__libc_calloc+0xba)[0x7f3758eb221a]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(__gf_calloc+0x6a)[0x7f3759a9903a]
/usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/rpc-transport/socket.so(+0x8a1b)[0x7f3750a9aa1b]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x7bbea)[0x7f3759ac4bea]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f37591fe6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f3758f3482d]
---------
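The crash log lists its frames in the form module(symbol+offset)[address], and several of them carry only an offset, for example libc.so.6(+0x777ea). As a minimal sketch, assuming every frame follows that layout, the frames can be split into their parts so the unnamed ones can be looked up later. The names `Frame` and `parse_frame` are hypothetical and not part of any gluster tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed frame layout, taken from the log in this report:
#   /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f3758e63428]
#   /lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f3758e634b0]
FRAME_RE = re.compile(
    r"^(?P<module>\S+?)\((?P<symbol>[^+()]*)\+(?P<offset>0x[0-9a-fA-F]+)\)"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


class Frame(NamedTuple):
    module: str            # path of the shared object or binary
    symbol: Optional[str]  # None when the log shows only "+offset"
    offset: int            # offset from the symbol, or from the module base
    address: int           # absolute address in the crashed process


def parse_frame(line: str) -> Optional[Frame]:
    """Return the parsed frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return Frame(
        module=m["module"],
        symbol=m["symbol"] or None,
        offset=int(m["offset"], 16),
        address=int(m["address"], 16),
    )
```

Frames whose `symbol` is None are the ones that need debug symbols. With matching debug symbols installed, such an offset could probably be resolved with something like `addr2line -e <module> <offset>`, but that step is not verified here.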
Please provide the output of 't a a bt' from a gdb session, and attach the core file.
(In reply to Atin Mukherjee from comment #1)
> Please provide the output of 't a a bt' of gdb session along with attaching
> the core file.

I'm sorry, I'm not used to gdb. As far as I understand, this is a debugger for C/C++ programs. At the moment of the crash, glusterfsd had no debugger attached. Did I misunderstand something?
If the program crashes, you should find a core file generated and placed in a location that depends on your core_pattern setting. We need the core file attached to the bug (at least) to figure out the reason for the crash. Additionally, a backtrace of the core taken through gdb (commands below) would help us narrow down the issue faster. Please also attach the glusterd and cmd_history logs from all the nodes.

# gdb glusterd <core file>
t a a bt
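For anyone not used to gdb, the request above can also be run non-interactively. The sketch below is one possible way to do it, not an official gluster procedure: it builds a gdb invocation in batch mode that runs `thread apply all bt` (the long form of `t a a bt`) against a core file. The helper names are hypothetical. The binary path `/usr/sbin/glusterd` and the core path `/root/core_glusterfs_crash` are taken from this report and will differ on other systems.

```python
import subprocess
from pathlib import Path
from typing import List


def core_pattern() -> str:
    """Return the kernel setting that decides where core files are written (Linux only)."""
    return Path("/proc/sys/kernel/core_pattern").read_text().strip()


def gdb_backtrace_command(binary: str, core: str) -> List[str]:
    """Build a gdb command line that prints the backtrace of every thread and exits."""
    # gdb expects the executable first and the core file second.
    return ["gdb", "--batch", "-ex", "thread apply all bt", binary, core]


def dump_backtrace(binary: str, core: str, out_file: str) -> None:
    """Run gdb and save whatever it prints to out_file (requires gdb to be installed)."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    Path(out_file).write_text(result.stdout)


# Example call, with the paths from this report:
# dump_backtrace("/usr/sbin/glusterd", "/root/core_glusterfs_crash", "backtrace.txt")
```

Batch mode also disables pagination, so the output is not interrupted by the "---Type <return> to continue" prompt and can be attached to the bug as a single file.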
The output of gdb:

(gdb) core /root/core_glusterfs_crash
warning: core file may not match specified executable file.
[New LWP 13967]
[New LWP 13965]
[New LWP 28763]
[New LWP 13886]
[New LWP 13885]
[New LWP 13966]
[New LWP 13887]
[New LWP 13888]
[New LWP 13884]
[New LWP 13964]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f3758e63428 in sigandset (dest=0x363c, left=0x368f, right=0x6) at sigandset.c:33
33 return __sigandset (dest, left, right);
[Current thread is 1 (Thread 0x7f374a7fc700 (LWP 13967))]
Can you please provide the output of "t a a bt"?
Created attachment 1304037 [details] cmd_history_all_nodes
Output of "t a a bt":

(gdb) t a a bt

Thread 10 (Thread 0x7f374bfff700 (LWP 13964)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007f3754b004e3 in ?? () from /usr/lib/x86_64-linux-gnu/glusterfs/3.7.6/xlator/mgmt/glusterd.so
#2 0x00007f37591fe6ba in start_thread (arg=0x7f374bfff700) at pthread_create.c:333
#3 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4 0x0000000000000000 in ?? ()

Thread 9 (Thread 0x7f3759f23780 (LWP 13884)):
#0 0x00007f37591ff98d in pthread_join (threadid=139875458209536, thread_return=0x0) at pthread_join.c:90
#1 0x00007f3759ac4eeb in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2 0x0000000000405501 in main ()

Thread 8 (Thread 0x7f37555b5700 (LWP 13888)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1 0x00007f3759aa8d98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2 0x00007f3759aa9970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3 0x00007f37591fe6ba in start_thread (arg=0x7f37555b5700) at pthread_create.c:333
#4 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5 0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f3755db6700 (LWP 13887)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1 0x00007f3759aa8d98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2 0x00007f3759aa9970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3 0x00007f37591fe6ba in start_thread (arg=0x7f3755db6700) at pthread_create.c:333
#4 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5 0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f374affd700 (LWP 13966)):
#0 0x00007f3758f34e23 in vmsplice () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f3759ac4a58 in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2 0x00007f37591fe6ba in start_thread (arg=0x7f374affd700) at pthread_create.c:333
#3 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f3756db8700 (LWP 13885)):
#0 0x00007f3759207c1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f3759a86744 in gf_timer_proc () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
---Type <return> to continue, or q <return> to quit---
Missing piece of "t a a bt":

Thread 5 (Thread 0x7f3756db8700 (LWP 13885)):
#0 0x00007f3759207c1d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f3759a86744 in gf_timer_proc () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
---Type <return> to continue, or q <return> to quit---
#2 0x00007f37591fe6ba in start_thread (arg=0x7f3756db8700) at pthread_create.c:333
#3 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f37565b7700 (LWP 13886)):
#0 do_sigwait (sig=0x7f37565b6e3c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1 __sigwait (set=<optimized out>, sig=0x7f37565b6e3c) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2 0x00000000004080bf in glusterfs_sigwaiter ()
#3 0x00007f37591fe6ba in start_thread (arg=0x7f37565b7700) at pthread_create.c:333
#4 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f3749ffb700 (LWP 28763)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1 0x00007f3759aa8d98 in syncenv_task () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2 0x00007f3759aa9970 in syncenv_processor () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#3 0x00007f37591fe6ba in start_thread (arg=0x7f3749ffb700) at pthread_create.c:333
#4 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#5 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f374b7fe700 (LWP 13965)):
#0 0x00007f3758f34e23 in vmsplice () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f3759ac4a58 in ?? () from /usr/lib/x86_64-linux-gnu/libglusterfs.so.0
#2 0x00007f37591fe6ba in start_thread (arg=0x7f374b7fe700) at pthread_create.c:333
#3 0x00007f3758f3482d in capget () at ../sysdeps/unix/syscall-template.S:84
#4 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f374a7fc700 (LWP 13967)):
#0 0x00007f3758e63428 in sigandset (dest=0x363c, left=0x368f, right=0x6) at sigandset.c:33
#1 0x0000000000000020 in ?? ()
#2 0x0000000000000000 in ?? ()
The symbol tables are missing, so the backtrace is incomplete and gives us no clue about the cause of the crash. If you're able to reproduce the crash with the latest release, please let us know and we'll be happy to debug it further. For now, since this version is quite old and the bug has no reproducer, I'm closing it.
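Before attaching a backtrace, it may help to check how much of it gdb could resolve. The following is a rough sketch, assuming frame lines shaped like the gdb output pasted in this bug (`#N [address in] function (...)`): it counts the frames and how many of them appear as `??`. The function name `symbol_coverage` is hypothetical.

```python
import re
from typing import Tuple

# Assumed frame shapes, as seen in the gdb output in this report:
#   #0 0x00007f3758e63428 in sigandset (dest=0x363c, ...) at sigandset.c:33
#   #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/...
#   #1 0x0000000000000020 in ?? ()
FRAME_LINE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(?P<func>\S+) \(")


def symbol_coverage(gdb_output: str) -> Tuple[int, int]:
    """Return (total frames, frames that gdb printed as '??')."""
    total = 0
    unresolved = 0
    for line in gdb_output.splitlines():
        m = FRAME_LINE.match(line.strip())
        if m is None:
            continue  # thread headers, pager prompts, blank lines
        total += 1
        if m["func"] == "??":
            unresolved += 1
    return total, unresolved
```

A high share of `??` frames, especially in the crashing thread, suggests that debug symbols for the installed packages are missing or that the core does not match the binary.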