Description of problem:
=======================
Created 201 EC volumes and was starting and stopping them for 10 loops. While that was going on, a core was dumped on the node.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-6.0-7.el7rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
==================
1. Create 201 EC volumes on a brick-mux setup.
2. Start and stop all the volumes in a loop:

   for i in {1..10}; do
       for z in $(gluster v list); do
           gluster v stop $z --mode=script
           sleep 2
       done
       sleep 60
       echo
       for y in $(gluster v list | grep vol_); do
           gluster v start $y
       done
       sleep 60
   done

3. A core file is generated on the node where the step 2 command was running.

Actual results:
===============
Core file generated.

Expected results:
=================
Core file should not be generated.

Additional info:
================
[root@dhcp43-44 /]# gdb ./core.31291
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 788] [New LWP 4000] [New LWP 1737] [New LWP 1956] [New LWP 2394]
[New LWP 2753] [New LWP 2754] [New LWP 2755] [New LWP 2937] [New LWP 3999]
[New LWP 5055] [New LWP 5940] [New LWP 6184] [New LWP 6185] [New LWP 6186]
[New LWP 6188] [New LWP 6190] [New LWP 6191] [New LWP 31291] [New LWP 31292]
[New LWP 31295] [New LWP 31296] [New LWP 31298] [New LWP 31299] [New LWP 31308]
[New LWP 31337] [New LWP 31338] [New LWP 31351] [New LWP 31360] [New LWP 31502]
[New LWP 31503] [New LWP 32728] [New LWP 470] [New LWP 31293] [New LWP 31294]
[New LWP 6187] [New LWP 31297] [New LWP 6189] [New LWP 6183]
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/fd/8aae983dfbac2604017d27f4f3ead73b598514
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.70.43.44 --volfile-id vol_1-1.10.70.43.44.gluster-br'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2ed9e394c1 in posix_janitor_task (data=0x7f2ea1686570) at posix-helpers.c:1460
1460        if ((now - priv->last_landfill_check) > priv->janitor_sleep_duration) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libacl-2.2.51-14.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 sqlite-3.7.17-8.el7.x86_64 sssd-client-1.16.4-21.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb)

Crash logs:
===========
[2019-07-15 11:08:59.466031] I [barrier.c:648:fini] 0-vol_2-98-barrier: Disabling barriering and dequeuing all the queued fops
[2019-07-15 11:08:59.466114] I [io-stats.c:4027:fini] 0-vol_2-98-io-stats: io-stats translator unloaded
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-07-15 11:08:59
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27210)[0x7f2ee8782210]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2ee878cc34]
/lib64/libc.so.6(+0x363f0)[0x7f2ee6dbe3f0]
/usr/lib64/glusterfs/6.0/xlator/storage/posix.so(+0x64c1)[0x7f2ed9e394c1]
/lib64/libglusterfs.so.0(+0x65c60)[0x7f2ee87c0c60]
/lib64/libc.so.6(+0x48180)[0x7f2ee6dd0180]
---------
[2019-07-15 11:10:08.817635] I [MSGID: 100030] [glusterfsd.c:2819:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 6.0 (args: /usr/sbin/glusterfsd -s 10.70.43.44 --volfile-id vol_1-1.10.70.43.44.gluster-brick1-vol1-1 -p /var/run/gluster/vols/vol_1-1/10.70.43.44-gluster-brick1-vol1-1.pid -S /var/run/gluster/66c054dd1baae0bd.socket --brick-name /gluster/brick1/vol1-1 -l /var/log/glusterfs/bricks/gluster-brick1-vol1-1.log --xlator-option *-posix.glusterd-uuid=cf15b682-0080-43bb-b9b9-f7d71b5b0e76 --process-name brick --brick-port 49152 --xlator-option vol_1-1-server.listen-port=49152 --brick-mux)
[2019-07-15 11:10:08.818215] I [glusterfsd.c:2546:daemonize
RCA: The brick crashed while accessing posix_priv members in the janitor_task code path. Janitor tasks are managed by synctask, and currently the timer is deleted only in posix_fini, so a janitor task can still run and dereference priv after fini has torn it down. To avoid the crash, delete the timer when the PARENT_DOWN event is received, before fini runs.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:3249