Description of problem:

glusterd is crashing as described in the thread at
https://www.gluster.org/pipermail/gluster-users/2015-October/023783.html
Community members looked at the core dump and said it looks like glibc-detected
memory corruption. Vijay Bellur requested a bug report be opened.

Version-Release number of selected component (if applicable):

# rpm -qa | grep gluster
glusterfs-geo-replication-3.7.4-2.el6.x86_64
glusterfs-client-xlators-3.7.4-2.el6.x86_64
glusterfs-3.7.4-2.el6.x86_64
glusterfs-libs-3.7.4-2.el6.x86_64
glusterfs-api-3.7.4-2.el6.x86_64
glusterfs-fuse-3.7.4-2.el6.x86_64
glusterfs-server-3.7.4-2.el6.x86_64
glusterfs-cli-3.7.4-2.el6.x86_64

How reproducible:
It has core dumped on multiple nodes multiple times.

Steps to Reproduce:
Not sure how to reproduce.

Actual results:
glusterd crashing.

Expected results:
glusterd to keep running.

Additional info:

# gluster volume info

Volume Name: gv0
Type: Replicate
Volume ID: fc50d049-cebe-4a3f-82a6-748847226099
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: eapps-gluster01.uwg.westga.edu:/export/sdb1/gv0
Brick2: eapps-gluster02.uwg.westga.edu:/export/sdb1/gv0
Brick3: eapps-gluster03.uwg.westga.edu:/export/sdb1/gv0
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.drc: off

# gluster volume status
Status of volume: gv0
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick eapps-gluster01.uwg.westga.edu:/export/sdb1/gv0   49152     0          Y       36149
Brick eapps-gluster02.uwg.westga.edu:/export/sdb1/gv0   49152     0          Y       24797
Brick eapps-gluster03.uwg.westga.edu:/export/sdb1/gv0   N/A       N/A        N       N/A
NFS Server on localhost                                 2049      0          Y       26812
Self-heal Daemon on localhost                           N/A       N/A        Y       26820
NFS Server on eapps-gluster03.uwg.westga.edu            2049      0          Y       47314
Self-heal Daemon on eapps-gluster03.uwg.westga.edu      N/A       N/A        Y       47322
NFS Server on eapps-gluster02.uwg.westga.edu            2049      0          Y       52522
Self-heal Daemon on eapps-gluster02.uwg.westga.edu      N/A       N/A        Y       52535

Task Status of Volume gv0
------------------------------------------------------------------------------
There are no active volume tasks

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)

Core dump info requested in the thread (both of the requested trace commands are below):

Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
Program terminated with signal 6, Aborted.
#0  0x0000003b91432625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);

(gdb) bt
#0  0x0000003b91432625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003b91433e05 in abort () at abort.c:92
#2  0x0000003b91470537 in __libc_message (do_abort=2, fmt=0x3b915588c0 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x0000003b91475f4e in malloc_printerr (action=3, str=0x3b9155687d "corrupted double-linked list", ptr=<value optimized out>, ar_ptr=<value optimized out>) at malloc.c:6350
#4  0x0000003b914763d3 in malloc_consolidate (av=0x7fee90000020) at malloc.c:5216
#5  0x0000003b91479c28 in _int_malloc (av=0x7fee90000020, bytes=<value optimized out>) at malloc.c:4415
#6  0x0000003b9147a7ed in __libc_calloc (n=<value optimized out>, elem_size=<value optimized out>) at malloc.c:4093
#7  0x0000003b9345c81f in __gf_calloc (nmemb=<value optimized out>, size=<value optimized out>, type=59, typestr=0x7fee9ed2d708 "gf_common_mt_rpc_trans_t") at mem-pool.c:117
#8  0x00007fee9ed2830b in socket_server_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0xf3eca0, poll_in=1, poll_out=<value optimized out>, poll_err=<value optimized out>) at socket.c:2622
#9  0x0000003b9348b0a0 in event_dispatch_epoll_handler (data=0xf408b0) at event-epoll.c:575
#10 event_dispatch_epoll_worker (data=0xf408b0) at event-epoll.c:678
#11 0x0000003b91807a51 in start_thread (arg=0x7fee9db3b700) at pthread_create.c:301
#12 0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(gdb) t a a bt

Thread 9 (Thread 0x7fee9e53c700 (LWP 37122)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
#1  0x00007fee9fffcf93 in hooks_worker (args=<value optimized out>) at glusterd-hooks.c:534
#2  0x0000003b91807a51 in start_thread (arg=0x7fee9e53c700) at pthread_create.c:301
#3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 8 (Thread 0x7feea0c99700 (LWP 36996)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
#1  0x0000003b9346cbdb in syncenv_task (proc=0xefa8c0) at syncop.c:607
#2  0x0000003b93472cb0 in syncenv_processor (thdata=0xefa8c0) at syncop.c:699
#3  0x0000003b91807a51 in start_thread (arg=0x7feea0c99700) at pthread_create.c:301
#4  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 7 (Thread 0x7feea209b700 (LWP 36994)):
#0  do_sigwait (set=<value optimized out>, sig=0x7feea209ae5c) at ../sysdeps/unix/sysv/linux/sigwait.c:65
#1  __sigwait (set=<value optimized out>, sig=0x7feea209ae5c) at ../sysdeps/unix/sysv/linux/sigwait.c:100
#2  0x0000000000405dfb in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1989
#3  0x0000003b91807a51 in start_thread (arg=0x7feea209b700) at pthread_create.c:301
#4  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 6 (Thread 0x7feea2a9c700 (LWP 36993)):
#0  0x0000003b9180efbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000003b934473ea in gf_timer_proc (ctx=0xecc010) at timer.c:205
#2  0x0000003b91807a51 in start_thread (arg=0x7feea2a9c700) at pthread_create.c:301
#3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 5 (Thread 0x7feea9e04740 (LWP 36992)):
#0  0x0000003b918082ad in pthread_join (threadid=140662814254848, thread_return=0x0) at pthread_join.c:89
#1  0x0000003b9348ab4d in event_dispatch_epoll (event_pool=0xeeb5b0) at event-epoll.c:762
#2  0x0000000000407b24 in main (argc=2, argv=0x7fff5294adc8) at glusterfsd.c:2333

Thread 4 (Thread 0x7feea169a700 (LWP 36995)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
#1  0x0000003b9346cbdb in syncenv_task (proc=0xefa500) at syncop.c:607
#2  0x0000003b93472cb0 in syncenv_processor (thdata=0xefa500) at syncop.c:699
#3  0x0000003b91807a51 in start_thread (arg=0x7feea169a700) at pthread_create.c:301
#4  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 3 (Thread 0x7fee9d13a700 (LWP 37124)):
#0  0x0000003b914e8f33 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000003b9348aed1 in event_dispatch_epoll_worker (data=0xf405b0) at event-epoll.c:668
#2  0x0000003b91807a51 in start_thread (arg=0x7fee9d13a700) at pthread_create.c:301
#3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 2 (Thread 0x7fee97fff700 (LWP 37125)):
#0  0x0000003b914e8f33 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000003b9348aed1 in event_dispatch_epoll_worker (data=0xf6b4d0) at event-epoll.c:668
#2  0x0000003b91807a51 in start_thread (arg=0x7fee97fff700) at pthread_create.c:301
#3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Thread 1 (Thread 0x7fee9db3b700 (LWP 37123)):
#0  0x0000003b91432625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003b91433e05 in abort () at abort.c:92
#2  0x0000003b91470537 in __libc_message (do_abort=2, fmt=0x3b915588c0 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x0000003b91475f4e in malloc_printerr (action=3, str=0x3b9155687d "corrupted double-linked list", ptr=<value optimized out>, ar_ptr=<value optimized out>) at malloc.c:6350
#4  0x0000003b914763d3 in malloc_consolidate (av=0x7fee90000020) at malloc.c:5216
#5  0x0000003b91479c28 in _int_malloc (av=0x7fee90000020, bytes=<value optimized out>) at malloc.c:4415
#6  0x0000003b9147a7ed in __libc_calloc (n=<value optimized out>, elem_size=<value optimized out>) at malloc.c:4093
#7  0x0000003b9345c81f in __gf_calloc (nmemb=<value optimized out>, size=<value optimized out>, type=59, typestr=0x7fee9ed2d708 "gf_common_mt_rpc_trans_t") at mem-pool.c:117
#8  0x00007fee9ed2830b in socket_server_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0xf3eca0, poll_in=1, poll_out=<value optimized out>, poll_err=<value optimized out>) at socket.c:2622
#9  0x0000003b9348b0a0 in event_dispatch_epoll_handler (data=0xf408b0) at event-epoll.c:575
#10 event_dispatch_epoll_worker (data=0xf408b0) at event-epoll.c:678
#11 0x0000003b91807a51 in start_thread (arg=0x7fee9db3b700) at pthread_create.c:301
#12 0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
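The interactive gdb session above can also be captured non-interactively, which is handy when collecting traces from several nodes. A hedged sketch (the core file path here is hypothetical; substitute the real one, and note a matching debuginfo package is needed for symbol names):

```shell
#!/bin/sh
# Sketch: dump both requested backtraces from a glusterd core in one batch run.
# CORE is a hypothetical path -- substitute the actual core file.
CORE="${CORE:-/core.36992}"
BIN="${BIN:-/usr/sbin/glusterd}"
if [ -f "$CORE" ] && command -v gdb >/dev/null 2>&1; then
    # -batch exits after running the -ex commands, so output can be redirected.
    gdb "$BIN" "$CORE" -batch -ex "bt" -ex "thread apply all bt"
else
    echo "gdb or core file not found; set CORE/BIN first" >&2
fi
```

Redirecting this to a file (`> glusterd-bt.txt 2>&1`) gives an attachable trace without paging prompts.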
Hello guys,

I've been experiencing the same issue lately. I've got a 255 GB core dump... that I couldn't exploit. The symptoms are much the same as those described in the original report (https://www.gluster.org/pipermail/gluster-users/2015-October/023784.html):

glustershd.log.2.gz:[2015-11-26 06:47:59.053991] W [glusterfsd.c:1236:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fbe53b8f182] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fbe548cc7c5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7fbe548cc659] ) 0-: received signum (15), shutting down

After that message, glusterd is crashed.

I'm running glusterfs on Ubuntu 14.04:

ii  glusterfs-client  3.7.6-ubuntu1~trusty1  amd64  clustered file-system (client package)
ii  glusterfs-common  3.7.6-ubuntu1~trusty1  amd64  GlusterFS common libraries and translator modules
ii  glusterfs-server  3.7.6-ubuntu1~trusty1  amd64  clustered file-system (server package)

I will follow this thread; if you need more input, feel free to let me know.
Hello guys,

Any updates on this bug? What do you suggest? Should I wait for an imminent patch or downgrade my servers to a prior version (<= 3.7.4)?
Could you mention the configured values for the following options in the glusterd.vol file?

ping-timeout
event-threads

We observed a few crashes when multi-threaded epoll support was enabled in glusterd, and I suspect this could be one of them. We had decided to revert the setting; you shouldn't be seeing this crash from 3.7.6 onwards.
Hello, thanks for your quick answers. Here's a sample of glusterd.vol:

volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
    option transport.socket.read-fail-log off
    option ping-timeout 30
#   option base-port 49152
end-volume
Are you ok to upgrade to 3.7.6 and try it out?
Hello,

The main problem is that I'm already using that version (see comment #1). Should I downgrade?
(In reply to florian.leduc from comment #6)
> Hello,
>
> The main problem is that I'm already using that version (see comment #1).
> Should I downgrade ?

Hi Florian,

Could you make a configuration change in the glusterd.vol file, present at /usr/local/etc/glusterfs/glusterd.vol? In that file, add/modify the entries below:

option event-threads 1
option ping-timeout 0

Then restart glusterd, and let me know if you face the glusterd crash problem again.
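The workaround above can be scripted. A hedged sketch, with two assumptions: the glusterd.vol path varies by install (/etc/glusterfs/glusterd.vol on most packaged installs, /usr/local/etc/glusterfs/glusterd.vol for source builds), and GNU sed is available (the `\n` in the replacement is a GNU extension):

```shell
#!/bin/sh
# Sketch: add the workaround options before end-volume, then restart glusterd.
# The default path below is an assumption for packaged installs.
GLUSTERD_VOL="${GLUSTERD_VOL:-/etc/glusterfs/glusterd.vol}"
if [ -w "$GLUSTERD_VOL" ]; then
    # Insert event-threads only if not already configured.
    grep -q 'option event-threads' "$GLUSTERD_VOL" || \
        sed -i 's/^end-volume/    option event-threads 1\n&/' "$GLUSTERD_VOL"
    # Rewrite ping-timeout if present, otherwise insert it.
    if grep -q 'option ping-timeout' "$GLUSTERD_VOL"; then
        sed -i 's/option ping-timeout.*/option ping-timeout 0/' "$GLUSTERD_VOL"
    else
        sed -i 's/^end-volume/    option ping-timeout 0\n&/' "$GLUSTERD_VOL"
    fi
    service glusterd restart 2>/dev/null || systemctl restart glusterd
else
    echo "cannot write $GLUSTERD_VOL; run as root or set GLUSTERD_VOL" >&2
fi
```

Editing the file by hand as described in the comment is equally valid; the script only avoids duplicate option lines on repeat runs.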
I have one more point to share. I thought we had already disabled multi-threaded epoll support in GlusterD, but it seems we missed doing so; we will surely do it in the next 3.7.x release. Comment #7 is actually a workaround to disable it.
Hello Guys, I'll do that today or tomorrow. I'll keep you up to date.
Hi Florian,

Patch http://review.gluster.org/#/c/12874/ will be available soon in the gluster codebase. In the meantime you can apply the glusterd.vol configuration manually, and let us know whether the issue still reproduces afterwards.
Perfect, I've just modified the settings. We will monitor our systems intensively and let you know if the crashes still occur. Thanks for your quick replies.
Hello Guys, No glusterd crashing during the whole weekend :). Should I maintain those options in my CMDB or should I wait for the next patch to get it? Regards,
(In reply to florian.leduc from comment #12)
> Hello Guys,
>
> No glusterd crashing during the whole weekend :). Should I maintain those
> options in my CMDB or should I wait for the next patch to get it?
>
> Regards,

Florian,

We'd encourage you to maintain the same configuration until we release 3.7.7.

Thanks,
Atin
Hello guys,

For some time, no crashes occurred, but after enabling the quota feature we started to see crashes of glusterfsd (though no longer of glusterd), and we experienced weird behavior:

1. glusterfsd crashes from time to time (see backtrace below).
2. After enabling quotas, a lot of CPU was consumed (around 60% of 32 vCPUs).
3. A lot of split-brain and unsynced entries appeared in gluster volume heal info.

[2015-12-15 17:35:54.236684] I [glusterfsd-mgmt.c:57:mgmt_cbk_spec] 0-mgmt: Volume file changed
[2015-12-15 17:35:54.241767] I [graph.c:269:gf_add_cmdline_options] 0-data-01-server: adding option 'listen-port' for volume 'data-01-server' with value '49154'
[2015-12-15 17:35:54.241810] I [graph.c:269:gf_add_cmdline_options] 0-data-01-posix: adding option 'glusterd-uuid' for volume 'data-01-posix' with value 'e2a44035-0e7d-4796-819a-062f916b0d49'
[2015-12-15 17:35:54.248617] I [MSGID: 121037] [changetimerecorder.c:1686:reconfigure] 0-data-01-changetimerecorder: set!
[2015-12-15 17:35:54.249140] W [socket.c:3636:reconfigure] 0-data-01-quota: NBIO on -1 failed (Bad file descriptor)
[2015-12-15 17:35:54.249388] I [MSGID: 115034] [server.c:403:_check_for_auth_option] 0-/var/opt/hosting/data/volume_data-01: skip format check for non-addr auth option auth.login./var/opt/hosting/data/volume_data-01.allow
[2015-12-15 17:35:54.249442] I [MSGID: 115034] [server.c:403:_check_for_auth_option] 0-/var/opt/hosting/data/volume_data-01: skip format check for non-addr auth option auth.login.8d63107f-2fe9-40ce-99e6-6a7a6ac0d49e.password
[2015-12-15 17:35:54.249648] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 8d63107f-2fe9-40ce-99e6-6a7a6ac0d49e
[2015-12-15 17:35:54.249686] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 8d63107f-2fe9-40ce-99e6-6a7a6ac0d49e
[2015-12-15 17:35:54.249713] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 8d63107f-2fe9-40ce-99e6-6a7a6ac0d49e
[2015-12-15 17:35:54.249741] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 8d63107f-2fe9-40ce-99e6-6a7a6ac0d49e
[2015-12-15 17:35:54.249771] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 8d63107f-2fe9-40ce-99e6-6a7a6ac0d49e
[2015-12-15 17:35:54.249795] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 8d63107f-2fe9-40ce-99e6-6a7a6ac0d49e
pending frames:
frame : type(0) op(14)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-12-15 17:35:54
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.6
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x92)[0x7f9aced33562]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_print_trace+0x31d)[0x7f9aced4f51d]
/lib/x86_64-linux-gnu/libc.so.6(+0x36d40)[0x7f9ace131d40]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7f9ace4cd0f0]
---------
Hi Vijaikumar, Can you please take a look at it?
Hi Florian,

Could you please provide the stack trace from the glusterfsd core dump?

Thanks,
Vijay
Hello Vijaikumar, thanks for your reply.

After a quick look at the system, I couldn't find any core dumps; can you give me a hint of where they should be located? (I tried to google it, but no luck so far.) I once got a core dump in the brick directory, which is /var/opt/hosting/data/volume_data-01.
BTW, here's our configuration:

Volume Name: data-01
Type: Replicate
Volume ID: 4b2b4dbe-a8dd-4988-b76e-0e1fc7c0dda9
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.234.208.154:/var/opt/hosting/data/volume_data-01
Brick2: 10.234.208.155:/var/opt/hosting/data/volume_data-01
Options Reconfigured:
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
cluster.self-heal-window-size: 128
cluster.data-self-heal-algorithm: diff
cluster.min-free-disk: 5
network.frame-timeout: 600
network.ping-timeout: 60
performance.write-behind-window-size: 128MB
performance.cache-max-file-size: 100MB
performance.cache-min-file-size: 1KB
performance.cache-size: 10GB
performance.cache-refresh-timeout: 5
cluster.self-heal-daemon: on
Hi Florian,

Usually the core file will be generated under the root dir '/' (which is the cwd of a brick process). If the kernel core pattern parameter is set to generate core files in a directory other than the cwd, they will be in the specified dir. In RHEL, the core pattern may be set to '/var/crash' or '/var/log/crash'.

Command to check the core pattern: 'sysctl kernel.core_pattern'

Also check 'ulimit -c'; if it is zero, then a core file would not have been generated.

We will also try to re-create this problem in-house.

Thanks,
Vijay
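The checks described above can be combined into one quick pass (a sketch for standard Linux; the `ulimit` value applies only to the shell it is run from, so it reflects glusterd's limit only if glusterd was started from an equivalent environment):

```shell
#!/bin/sh
# Sketch: see where the kernel would write a core file, and whether the
# size limit allows one to be written at all.
pattern=$(cat /proc/sys/kernel/core_pattern)   # same value as: sysctl kernel.core_pattern
echo "core pattern: $pattern"
limit=$(ulimit -c)
echo "core size limit: $limit"                 # "0" means no core file is written
# To allow unlimited cores for processes started from this shell:
#   ulimit -c unlimited
```

If the pattern starts with `|`, cores are piped to a helper (e.g. abrt or systemd-coredump) rather than written to a plain file, which would explain finding no `core` file on disk.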
Hi,

I haven't found any trace of core files on that system (they should be named "core" according to sysctl). I'll do more searching on the next crash. Here's a pastebin of the alerts sent through syslog: http://pastebin.com/1JZZuz86
Hi everyone,

We're still experiencing a lot of severe crashes (no trace of a core dump on the volume) and then a lot of unsynced entries after healing has passed, even after reinstalling the whole volume from scratch.

==== Logs:

red-ack Dec 22 21:10:30: Program: ssh%3A%2F%2Froot%4010.234.208.15, Facility: daemon, Level: crit
[2015-12-22 20:10:30.601517] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-01-client-0: server 10.234.240.57:49153 has not responded in the last 60 seconds, disconnecting.

red-ack Dec 22 21:10:21: Program: glustershd[40694], Facility: daemon, Level: crit
[2015-12-22 20:10:21.209994] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-01-client-0: server 10.234.240.57:49153 has not responded in the last 60 seconds, disconnecting.

red-ack Dec 22 21:10:15: Program: ssh%3A%2F%2Froot%4010.234.144.57, Facility: daemon, Level: crit
[2015-12-22 20:10:15.976956] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-01-client-0: server 10.234.240.57:49153 has not responded in the last 60 seconds, disconnecting.

red-ack Dec 22 21:09:30: Program: var-opt-hosting-shared-volumes-d, Facility: daemon, Level: crit
[2015-12-22 20:09:30.414887] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-data-01-client-0: server 10.234.240.57:49153 has not responded in the last 60 seconds, disconnecting.

==== Volume Heal info output:

....
<gfid:e2d18ab9-a607-499d-babf-8fdaa90dd0bb>
<gfid:199ba193-0788-4e3b-8951-26f0841c7e45>
<gfid:77e2401a-2b98-4713-99b3-444bff26a222>
<gfid:aa47948d-cd91-4d70-941d-21342d4acf06>
<gfid:ef1f3a4f-6c7b-4741-a846-e8e78174369a>
<gfid:38856f67-d776-4000-ab42-e548a0ab5f09>
<gfid:7aa8f688-a53b-4962-81da-ffe5c45ac025>
<gfid:b9d4bef4-bdee-45dc-bac5-85fdb45f6f41>
<gfid:ba930fd2-3f46-4c32-99f4-6b6f344b649b>
<gfid:4d6b8109-cf72-4837-bc48-45158785227a>
<gfid:62025fc2-e011-4ce0-a3bb-2815bceaaac4>
Number of entries: 853

Could you please advise. Thanks.
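The `<gfid:...>` entries above can be mapped back to brick paths by hand: for regular files, GlusterFS keeps a hard link under the brick's .glusterfs directory, bucketed by the first two byte-pairs of the gfid (for directories the entry is a symlink instead). A hedged sketch, using the brick path and first gfid from this report, assuming GNU find for `-samefile`:

```shell
#!/bin/sh
# Sketch: resolve one heal-info gfid to a user-visible path on the brick.
BRICK=/var/opt/hosting/data/volume_data-01     # brick path from this report
GFID=e2d18ab9-a607-499d-babf-8fdaa90dd0bb      # first gfid from the list above
PFX=$(printf '%s' "$GFID" | cut -c1-2)         # first bucket: "e2"
SUB=$(printf '%s' "$GFID" | cut -c3-4)         # second bucket: "d1"
LINK="$BRICK/.glusterfs/$PFX/$SUB/$GFID"
if [ -e "$LINK" ]; then
    # Find the path sharing the same inode, skipping .glusterfs itself.
    find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -samefile "$LINK" -print
else
    echo "no hardlink for $GFID under $BRICK/.glusterfs" >&2
fi
```

This only works for entries still present on the brick; gfids whose backing file was deleted on one replica will show no hard link there.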
Is the crash from glusterd or brick process?
Hello Atin,

I'd say the brick process, but I have the feeling that ping-timeout set to 0 may be related to those crashes/timeouts.

What do you suggest? Keep feeding this thread or open a new one?
(In reply to florian.leduc from comment #23)
> Hello Atin,
>
> I'd say the brick process but I have the feeling that ping-timeout set to 0
> may be related to those crashes/timeouts.

I don't think ping timeout will contribute to it.

> What do you suggest ? keep feeding this thread or opening a new one ?

I highly recommend opening a new bug for this, as otherwise it will be misleading, since this bug talks about a crash in the glusterd process.
Since I've not received any further details around this bug, I am closing it right now, feel free to reopen if the issue persists.