Description of problem:
While trying to verify glusto-test patches we had observed that glusterd isn't starting during the installation. This was observed for the last 2 nightly builds.
https://ci.centos.org/job/gluster_glusto-patch-check/1537/consoleFull

# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-08-21 12:19:20 IST; 24h ago
     Docs: man:glusterd(8)
  Process: 13852 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=1/FAILURE)

Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: spinlock 1
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: epoll.h 1
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: xattr.h 1
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: st_atim.tv_nsec 1
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: package-string: glusterfs 20190820.95f71df
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: ---------
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=1
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server.
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state.
Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed.

Program terminated with signal 6, Aborted.
#0  0x00007fcb85868207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fcb85868207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007fcb858698f8 in __GI_abort () at abort.c:90
#2  0x00007fcb858aad27 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fcb859bb312 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007fcb859499e7 in __GI___fortify_fail (msg=msg@entry=0x7fcb859bb2b8 "buffer overflow detected") at fortify_fail.c:30
#4  0x00007fcb85947b62 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007fcb8594727b in ___vsnprintf_chk (s=<optimized out>, maxlen=<optimized out>, flags=<optimized out>, slen=<optimized out>, format=<optimized out>, args=args@entry=0x7fffd7e97708) at vsnprintf_chk.c:37
#6  0x00007fcb85947198 in ___snprintf_chk (s=s@entry=0x7fffd7e97a40 "", maxlen=maxlen@entry=4096, flags=flags@entry=1, slen=slen@entry=3776, format=format@entry=0x7fcb7b48dd4b "%s") at snprintf_chk.c:35
#7  0x00007fcb7b34efb9 in snprintf (__fmt=0x7fcb7b48dd4b "%s", __n=4096, __s=0x7fffd7e97a40 "") at /usr/include/bits/stdio2.h:64
#8  init (this=0x561402cb9520) at glusterd.c:1450
#9  0x00007fcb87222ea1 in __xlator_init (xl=0x561402cb9520) at xlator.c:597
#10 xlator_init (xl=xl@entry=0x561402cb9520) at xlator.c:623
#11 0x00007fcb8725fb29 in glusterfs_graph_init (graph=graph@entry=0x561402cb50f0) at graph.c:422
#12 0x00007fcb87260195 in glusterfs_graph_activate (graph=graph@entry=0x561402cb50f0, ctx=ctx@entry=0x561402c70010) at graph.c:776
#13 0x00005614017c3182 in glusterfs_process_volfp (ctx=ctx@entry=0x561402c70010, fp=fp@entry=0x561402cb4e70) at glusterfsd.c:2728
#14 0x00005614017c333d in glusterfs_volumes_init (ctx=ctx@entry=0x561402c70010) at glusterfsd.c:2800
#15 0x00005614017bea3a in main (argc=4, argv=<optimized out>) at glusterfsd.c:2962

(gdb) t a a bt

Thread 7 (Thread 0x7fcb7c002700 (LWP 13727)):
#0  0x00007fcb85926f73 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fcb872a4224 in runner (arg=0x561402cb2bf0) at ../../contrib/timer-wheel/timer-wheel.c:186
#2  0x00007fcb86067dd5 in start_thread (arg=0x7fcb7c002700) at pthread_create.c:307
#3  0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 6 (Thread 0x7fcb7e807700 (LWP 13722)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fcb872337ab in gf_timer_proc (data=0x561402cae280) at timer.c:140
#2  0x00007fcb86067dd5 in start_thread (arg=0x7fcb7e807700) at pthread_create.c:307
#3  0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x7fcb7e006700 (LWP 13723)):
#0  0x00007fcb8606f361 in do_sigwait (sig=0x7fcb7e0050dc, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:60
#1  __sigwait (set=set@entry=0x7fcb7e0050e0, sig=sig@entry=0x7fcb7e0050dc) at ../sysdeps/unix/sysv/linux/sigwait.c:95
#2  0x00005614017c277b in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2463
#3  0x00007fcb86067dd5 in start_thread (arg=0x7fcb7e006700) at pthread_create.c:307
#4  0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7fcb7d805700 (LWP 13724)):
#0  0x00007fcb858f6e2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fcb858f6cc4 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007fcb87250868 in pool_sweeper (arg=<optimized out>) at mem-pool.c:446
#3  0x00007fcb86067dd5 in start_thread (arg=0x7fcb7d805700) at pthread_create.c:307
#4  0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x7fcb7d004700 (LWP 13725)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fcb872659b0 in syncenv_task (proc=proc@entry=0x561402caea70) at syncop.c:517
#2  0x00007fcb87266860 in syncenv_processor (thdata=0x561402caea70) at syncop.c:584
#3  0x00007fcb86067dd5 in start_thread (arg=0x7fcb7d004700) at pthread_create.c:307
#4  0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7fcb7c803700 (LWP 13726)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fcb872659b0 in syncenv_task (proc=proc@entry=0x561402caee30) at syncop.c:517
#2  0x00007fcb87266860 in syncenv_processor (thdata=0x561402caee30) at syncop.c:584
#3  0x00007fcb86067dd5 in start_thread (arg=0x7fcb7c803700) at pthread_create.c:307
#4  0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7fcb8772a4c0 (LWP 13721)):
#0  0x00007fcb85868207 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007fcb858698f8 in __GI_abort () at abort.c:90
#2  0x00007fcb858aad27 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7fcb859bb312 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007fcb859499e7 in __GI___fortify_fail (msg=msg@entry=0x7fcb859bb2b8 "buffer overflow detected") at fortify_fail.c:30
#4  0x00007fcb85947b62 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007fcb8594727b in ___vsnprintf_chk (s=<optimized out>, maxlen=<optimized out>, flags=<optimized out>, slen=<optimized out>, format=<optimized out>, args=args@entry=0x7fffd7e97708) at vsnprintf_chk.c:37
#6  0x00007fcb85947198 in ___snprintf_chk (s=s@entry=0x7fffd7e97a40 "", maxlen=maxlen@entry=4096, flags=flags@entry=1, slen=slen@entry=3776, format=format@entry=0x7fcb7b48dd4b "%s") at snprintf_chk.c:35
#7  0x00007fcb7b34efb9 in snprintf (__fmt=0x7fcb7b48dd4b "%s", __n=4096, __s=0x7fffd7e97a40 "") at /usr/include/bits/stdio2.h:64
#8  init (this=0x561402cb9520) at glusterd.c:1450
#9  0x00007fcb87222ea1 in __xlator_init (xl=0x561402cb9520) at xlator.c:597
#10 xlator_init (xl=xl@entry=0x561402cb9520) at xlator.c:623
#11 0x00007fcb8725fb29 in glusterfs_graph_init (graph=graph@entry=0x561402cb50f0) at graph.c:422
#12 0x00007fcb87260195 in glusterfs_graph_activate (graph=graph@entry=0x561402cb50f0, ctx=ctx@entry=0x561402c70010) at graph.c:776
#13 0x00005614017c3182 in glusterfs_process_volfp (ctx=ctx@entry=0x561402c70010, fp=fp@entry=0x561402cb4e70) at glusterfsd.c:2728
#14 0x00005614017c333d in glusterfs_volumes_init (ctx=ctx@entry=0x561402c70010) at glusterfsd.c:2800
#15 0x00005614017bea3a in main (argc=4, argv=<optimized out>) at glusterfsd.c:2962

Version-Release number of selected component (if applicable):
Whatever is the version in upstream.

How reproducible:
Always

Steps to Reproduce:
1. service glusterd start

Actual results:
glusterd crashing with core dump.

Expected results:
glusterd shouldn't crash and core files shouldn't be created.

Additional info:
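A note on reading frames #6 and #7 of the backtrace: the fortified snprintf was called with maxlen=4096 while the compiler-computed size of the destination object was slen=3776. glibc's __snprintf_chk aborts with "buffer overflow detected" whenever slen is smaller than maxlen, whether or not the formatted string would actually have overflowed. One plausible reading (an assumption, the code at glusterd.c:1450 is not shown in this report) is that the destination pointer sits 4096 - 3776 = 320 bytes into a 4096-byte buffer and the full buffer size was passed as the limit. The sketch below shows a bounded append under that assumption; append_at is a hypothetical helper, not a glusterfs function, and this is not necessarily what the posted fix does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from glusterd.c): append `src` into `buf`
 * starting at byte offset `off`, where `cap` is the total size of `buf`.
 *
 * The size handed to snprintf is the space that is really left
 * (cap - off), not the full capacity. Passing `cap` here is the pattern
 * that a fortified build rejects when off > 0.
 *
 * Returns the number of characters written (excluding the NUL),
 * or -1 if `off` is out of range, on a formatting error, or if the
 * output had to be truncated.
 */
static int
append_at(char *buf, size_t cap, size_t off, const char *src)
{
    assert(buf != NULL && src != NULL);

    if (off >= cap)
        return -1;

    size_t room = cap - off; /* bytes actually available at buf + off */
    int n = snprintf(buf + off, room, "%s", src);

    if (n < 0 || (size_t)n >= room)
        return -1; /* error, or output was truncated to fit */

    return n;
}
```

With the numbers from the backtrace, such a helper would be called with cap=4096 and off=320, so snprintf would receive 3776 as its limit, which matches slen and passes the fortify check.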
Kshithij,

Can you please mention all steps of reproducer? i.e, what are the steps performed on the cluster before glusterd crashed?

I'm not sure but this might be having a relationship with the bugs that are filed under shd-multiplexing feature (saying this just because this is also a sigabrt).

Thanks,
Sanju
(In reply to Sanju from comment #1)
> Kshithij,
>
> Can you please mention all steps of reproducer? i.e, what are the steps
> performed on the cluster before glusterd crashed?

Just installed glusterfs using the nightly builds and tried to start glusterd. That's it! I didn't perform any other steps.

> I'm not sure but this might be having a relationship with the bugs that are
> filed under shd-multiplexing feature (saying this just because this is also
> a sigabrt).
>
> Thanks,
> Sanju
I too have seen this, last week, but Kshithij, can you try restarting glusterd? It worked fine after restart.
(In reply to Amar Tumballi from comment #3)
> I too have seen this, last week, but Kshithij, can you try restarting
> glusterd? It worked fine after restart.

I tried restarting as well; it didn't help, Amar.
So does this mean it happens every time we try to start glusterd? If so can you please pass the setup (ping offline) ?
(In reply to Atin Mukherjee from comment #5)
> So does this mean it happens every time we try to start glusterd?

Yes! This happens every time.

> If so can
> you please pass the setup (ping offline) ?

Have shared the details with you offline.
Fix posted: https://review.gluster.org/#/c/glusterfs/+/23309