Bug 1729971 - core file generated - when EC volume stop and start is executed for 10 loops on an EC+Brickmux setup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.5.0
Assignee: Mohit Agrawal
QA Contact: Upasana
URL:
Whiteboard:
Depends On:
Blocks: 1696809 1730409
 
Reported: 2019-07-15 12:46 UTC by Upasana
Modified: 2019-11-06 09:15 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1730409
Environment:
Last Closed: 2019-10-30 12:22:15 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2019:3249 | 0 | None | None | None | 2019-10-30 12:22:51 UTC

Description Upasana 2019-07-15 12:46:11 UTC
Description of problem:
========================
Created 201 EC volumes and was stopping and starting them in a loop of 10 iterations. While the loop was running, a core was dumped on the node.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-6.0-7.el7rhgs.x86_64


How reproducible:
=================
1/1

Steps to Reproduce:
==================
1. Created 201 EC volumes on a brick-mux setup.
2. Stopped and started all the volumes in a loop:
for i in {1..10};do for z in $(gluster v list) ;do gluster v stop $z --mode=script;sleep 2;done;sleep 60;echo;for y in $(gluster v list|grep vol_);do gluster v start $y;done;sleep 60;done

3. A core file was generated on the node where the step 2 command was running.

Actual results:
================
Core file generated

Expected results:
=================
Core file should not be generated

Additional info:
=================


[root@dhcp43-44 /]# gdb ./core.31291 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 788]
[New LWP 4000]
[New LWP 1737]
[New LWP 1956]
[New LWP 2394]
[New LWP 2753]
[New LWP 2754]
[New LWP 2755]
[New LWP 2937]
[New LWP 3999]
[New LWP 5055]
[New LWP 5940]
[New LWP 6184]
[New LWP 6185]
[New LWP 6186]
[New LWP 6188]
[New LWP 6190]
[New LWP 6191]
[New LWP 31291]
[New LWP 31292]
[New LWP 31295]
[New LWP 31296]
[New LWP 31298]
[New LWP 31299]
[New LWP 31308]
[New LWP 31337]
[New LWP 31338]
[New LWP 31351]
[New LWP 31360]
[New LWP 31502]
[New LWP 31503]
[New LWP 32728]
[New LWP 470]
[New LWP 31293]
[New LWP 31294]
[New LWP 6187]
[New LWP 31297]
[New LWP 6189]
[New LWP 6183]
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/fd/8aae983dfbac2604017d27f4f3ead73b598514
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.70.43.44 --volfile-id vol_1-1.10.70.43.44.gluster-br'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f2ed9e394c1 in posix_janitor_task (data=0x7f2ea1686570) at posix-helpers.c:1460
1460	    if ((now - priv->last_landfill_check) > priv->janitor_sleep_duration) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libacl-2.2.51-14.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcom_err-1.42.9-16.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-61.el7.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 sqlite-3.7.17-8.el7.x86_64 sssd-client-1.16.4-21.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) 

Crash logs ---
[2019-07-15 11:08:59.466031] I [barrier.c:648:fini] 0-vol_2-98-barrier: Disabling barriering and dequeuing all the queued fops
[2019-07-15 11:08:59.466114] I [io-stats.c:4027:fini] 0-vol_2-98-io-stats: io-stats translator unloaded
pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2019-07-15 11:08:59
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.0
/lib64/libglusterfs.so.0(+0x27210)[0x7f2ee8782210]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2ee878cc34]
/lib64/libc.so.6(+0x363f0)[0x7f2ee6dbe3f0]
/usr/lib64/glusterfs/6.0/xlator/storage/posix.so(+0x64c1)[0x7f2ed9e394c1]
/lib64/libglusterfs.so.0(+0x65c60)[0x7f2ee87c0c60]
/lib64/libc.so.6(+0x48180)[0x7f2ee6dd0180]
---------
[2019-07-15 11:10:08.817635] I [MSGID: 100030] [glusterfsd.c:2819:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 6.0 (args: /usr/sbin/glusterfsd -s 10.70.43.44 --volfile-id vol_1-1.10.70.43.44.gluster-brick1-vol1-1 -p /var/run/gluster/vols/vol_1-1/10.70.43.44-gluster-brick1-vol1-1.pid -S /var/run/gluster/66c054dd1baae0bd.socket --brick-name /gluster/brick1/vol1-1 -l /var/log/glusterfs/bricks/gluster-brick1-vol1-1.log --xlator-option *-posix.glusterd-uuid=cf15b682-0080-43bb-b9b9-f7d71b5b0e76 --process-name brick --brick-port 49152 --xlator-option vol_1-1-server.listen-port=49152 --brick-mux)
[2019-07-15 11:10:08.818215] I [glusterfsd.c:2546:daemonize

Comment 8 Mohit Agrawal 2019-07-16 15:44:53 UTC
RCA: The brick crashed while accessing posix_priv members in the
     janitor_task code path. Janitor tasks are managed by synctask, and
     currently posix_fini deletes the timer. To avoid the crash, delete the
     timer when the PARENT_DOWN event is received.
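
The ordering problem described in the RCA can be modelled with a small sketch. This is not GlusterFS code: the class Brick, its methods parent_down and fini, and the interval value are hypothetical names chosen for illustration, and Python's threading.Timer merely stands in for the timer/synctask machinery. It only shows the idea of the fix, which is to stop the periodic janitor when the teardown notification arrives, before the private state is released.

```python
import threading
import time


class Brick:
    """Toy model of a brick with a periodic janitor (all names hypothetical)."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.lock = threading.Lock()
        # Stand-in for the private data the janitor reads.
        self.priv = {"last_landfill_check": 0.0, "janitor_sleep_duration": 0.0}
        self.timer = None
        self.stopping = False
        self.runs = 0      # janitor runs that saw valid private data
        self.errors = []   # janitor runs that found the data already released
        self._arm()

    def _arm(self):
        # Schedule the next janitor run unless shutdown has begun.
        with self.lock:
            if self.stopping:
                return
            self.timer = threading.Timer(self.interval, self._janitor)
            self.timer.daemon = True
            self.timer.start()

    def _janitor(self):
        with self.lock:
            if self.priv is None:
                # In the real C code this is where an invalid access would occur.
                self.errors.append("janitor ran after private data was released")
                return
            now = time.monotonic()
            if now - self.priv["last_landfill_check"] > self.priv["janitor_sleep_duration"]:
                self.priv["last_landfill_check"] = now
            self.runs += 1
        self._arm()

    def parent_down(self):
        # Modelled fix: cancel the janitor as soon as teardown is announced.
        with self.lock:
            self.stopping = True
            timer = self.timer
        if timer is not None:
            timer.cancel()
            timer.join()   # wait for a run that is already in flight

    def fini(self):
        # Release the private data; safe only once the janitor has stopped.
        with self.lock:
            self.priv = None
```

In this model, calling parent_down() before fini() leaves errors empty, whereas calling fini() while the janitor is still armed eventually records an error, which corresponds to the segmentation fault seen in posix_janitor_task.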

Comment 17 errata-xmlrpc 2019-10-30 12:22:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:3249

