Bug 1678232 - [geo-rep]: Cores seen on slave during add-brick
Summary: [geo-rep]: Cores seen on slave during add-brick
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 4
Assignee: Susant Kumar Palai
QA Contact: Rochelle
URL:
Whiteboard:
Duplicates: 1712871
Depends On:
Blocks: 1884244
 
Reported: 2019-02-18 10:49 UTC by Rochelle
Modified: 2020-10-01 12:35 UTC
CC List: 16 users

Fixed In Version: glusterfs-3.12.2-45
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1884244
Environment:
Last Closed: 2019-03-27 03:43:40 UTC
Embargoed:


Attachments


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:0658  0        None      None    None     2019-03-27 03:44:49 UTC

Description Rochelle 2019-02-18 10:49:44 UTC
Description of problem:
=======================

(gdb) bt
#0  0x00007fc668299469 in __inode_get_xl_index (xlator=0x7fc64c03ed20, inode=0x7fc638015a28) at inode.c:549
#1  __inode_unref (inode=0x7fc638015a28, clear=_gf_false) at inode.c:589
#2  0x00007fc668299ef3 in inode_unref (inode=0x7fc638015a28) at inode.c:670
#3  0x00007fc668287a2c in loc_wipe (loc=loc@entry=0x7fc6544e7bc0) at xlator.c:777
#4  0x00007fc6599f1fb9 in dht_heal_path (this=this@entry=0x7fc64c03ed20, path=0x7fc64c1ef170 "/thread5/level00", itable=itable@entry=0x7fc654099260)
    at dht-helper.c:2019
#5  0x00007fc6599f2318 in dht_heal_full_path (data=<optimized out>) at dht-helper.c:2067
#6  0x00007fc6682c3800 in synctask_wrap () at syncop.c:375
#7  0x00007fc6668d8010 in ?? () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()


(gdb) bt
#0  __check_cycle (data=<optimized out>, a_dentry=<optimized out>) at inode.c:292
#1  __foreach_ancestor_dentry (dentry=dentry@entry=0x2, data=data@entry=0x7f1cc4032518, per_dentry_fn=0x7f1cf4035680 <__check_cycle>) at inode.c:259
#2  0x00007f1cf4035fa3 in __foreach_ancestor_dentry (dentry=dentry@entry=0x7f1cd8097138, data=data@entry=0x7f1cc4032518, 
    per_dentry_fn=0x7f1cf4035680 <__check_cycle>) at inode.c:276
#3  0x00007f1cf4035fa3 in __foreach_ancestor_dentry (dentry=dentry@entry=0x7f1cc803cd38, data=data@entry=0x7f1cc4032518, 
    per_dentry_fn=0x7f1cf4035680 <__check_cycle>) at inode.c:276
#4  0x00007f1cf4035fa3 in __foreach_ancestor_dentry (dentry=dentry@entry=0x7f1cc403f538, data=data@entry=0x7f1cc4032518, 
    per_dentry_fn=0x7f1cf4035680 <__check_cycle>) at inode.c:276
#5  0x00007f1cf403750e in __is_dentry_cyclic (dentry=0x7f1cc403f538) at inode.c:306
#6  __inode_link (inode=inode@entry=0x7f1cc803d748, parent=parent@entry=0x7f1cc803f248, name=name@entry=0x7f1cc803f019 "level24", 
    iatt=iatt@entry=0x7f1cd8545f70) at inode.c:1174
#7  0x00007f1cf4037a09 in inode_link (inode=0x7f1cc803d748, parent=0x7f1cc803f248, name=name@entry=0x7f1cc803f019 "level24", 
    iatt=iatt@entry=0x7f1cd8545f70) at inode.c:1207
#8  0x00007f1ce578f134 in dht_heal_path (this=this@entry=0x7f1ce00dc100, 
    path=0x7f1cd814bec0 "/thread2/level04/level14/level24/level34/level44/level54", itable=itable@entry=0x7f1cd811b660) at dht-helper.c:2005
#9  0x00007f1ce578f318 in dht_heal_full_path (data=<optimized out>) at dht-helper.c:2067
#10 0x00007f1cf4060800 in synctask_wrap () at syncop.c:375
#11 0x00007f1cf2675010 in ?? () from /lib64/libc.so.6
#12 0x0000000000000000 in ?? ()
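
For readers unfamiliar with these frames: frames #0-#5 of the second trace are the dentry-cycle check that __inode_link() runs before inserting a new dentry, walking every ancestor dentry of the target parent. Below is a minimal sketch of that ancestor walk, under a deliberately simplified single-parent dentry model; it is illustrative only, not the inode.c implementation, and all sk_* names are made up for this example.

/* cycle_check_sketch.c -- illustrative only, NOT the inode.c code.
 * Mirrors the shape of frames #0-#5 above: before inserting a new
 * dentry, walk every ancestor of the target parent and refuse the
 * link if the inode being linked already appears on that path. */
#include <stdbool.h>
#include <stdio.h>

struct sk_inode;

struct sk_dentry {
        struct sk_inode *parent;        /* inode this dentry hangs off */
        const char      *name;
};

struct sk_inode {
        struct sk_dentry *dentry;       /* single parent dentry, for simplicity */
};

/* Analogue of the __foreach_ancestor_dentry() + __check_cycle() walk:
 * visit parent, parent's parent, ... and report whether `linking` is found. */
static bool
creates_cycle(const struct sk_inode *parent, const struct sk_inode *linking)
{
        const struct sk_inode *anc = parent;

        while (anc != NULL) {
                if (anc == linking)
                        return true;    /* linking under a descendant => loop */
                anc = anc->dentry ? anc->dentry->parent : NULL;
        }
        return false;
}

int
main(void)
{
        /* Namespace /a/b: root -> a -> b, plus an unlinked inode c. */
        struct sk_inode root = {0}, a = {0}, b = {0}, c = {0};
        struct sk_dentry d_a = { .parent = &root, .name = "a" };
        struct sk_dentry d_b = { .parent = &a,    .name = "b" };

        a.dentry = &d_a;
        b.dentry = &d_b;

        printf("link c under b -> cycle? %s\n",
               creates_cycle(&b, &c) ? "yes" : "no");   /* no  */
        printf("link a under b -> cycle? %s\n",
               creates_cycle(&b, &a) ? "yes" : "no");   /* yes */
        return 0;
}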


Version-Release number of selected component (if applicable):
=============================================================
[root@dhcp42-46 master]# rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-rdma-3.12.2-43.el7rhgs.x86_64
python2-gluster-3.12.2-43.el7rhgs.x86_64
glusterfs-server-3.12.2-43.el7rhgs.x86_64
glusterfs-fuse-3.12.2-43.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-43.el7rhgs.x86_64
glusterfs-api-3.12.2-43.el7rhgs.x86_64
glusterfs-events-3.12.2-43.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-10.el7rhgs.noarch
glusterfs-client-xlators-3.12.2-43.el7rhgs.x86_64
glusterfs-cli-3.12.2-43.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-libs-3.12.2-43.el7rhgs.x86_64
glusterfs-3.12.2-43.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.3.x86_64


How reproducible:
=================
1/1


Steps to Reproduce:
===================
1. Setup geo-rep session between master and slave volume
2. for i in {create,chmod,symlink,chown,chmod,rename,create,chmod,chgrp,create,truncate,hardlink,create,chmod,chown,symlink,chgrp,create,create}; do crefi --multi -n 10 -b 10 -d 10 --max=10K --min=500 --random -T 10 -t text --fop=$i /mnt/master/ ; sleep 10 ; done

3. Add bricks to the master volume and the slave volume, and start rebalance while geo-rep is syncing files to the slave.
4. Wait for all files to sync to the slave.
5. Check the arequal checksum between master and slave; it currently matches.


Actual results:
===============
Cores seen on slave

Expected results:
=================
There should be no cores 



Additional info:
================
No functional impact

Comment 13 Susant Kumar Palai 2019-02-20 06:07:14 UTC
Update:

 Could reproduce this today on a FUSE mount. The crash happened in md-cache. Here is the backtrace:
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.26-28.fc27.x86_64 openssl-libs-1.1.0h-3.fc27.x86_64 sssd-client-1.16.2-4.fc27.x86_64
(gdb) bt
#0  0x00007fb76d2ba660 in raise () from /lib64/libc.so.6
#1  0x00007fb76d2bbc41 in abort () from /lib64/libc.so.6
#2  0x00007fb76d2b2f7a in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007fb76d2b2ff2 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fb76ecfce71 in __inode_unref (inode=0x7fb75400c4f8, clear=_gf_false) at inode.c:585
#5  0x00007fb76ecfd108 in inode_unref (inode=0x7fb75400c4f8) at inode.c:670
#6  0x00007fb76ece8855 in loc_wipe (loc=0x7fb750009720) at xlator.c:777
#7  0x00007fb76073481c in mdc_local_wipe (this=0x7fb75c017fc0, local=0x7fb750009720) at md-cache.c:310
#8  0x00007fb7607375e6 in mdc_lookup_cbk (frame=0x7fb7500062b8, cookie=0x7fb750001538, this=0x7fb75c017fc0, op_ret=0, op_errno=0, inode=0x7fb748001ed8, stbuf=0x7fb75c0533d0, dict=0x7fb754016278, 
    postparent=0x7fb75c053670) at md-cache.c:1248
#9  0x00007fb760b5e36a in qr_lookup_cbk (frame=0x7fb750001538, cookie=0x7fb750001838, this=0x7fb75c015230, op_ret=0, op_errno=0, inode_ret=0x7fb748001ed8, buf=0x7fb75c0533d0, xdata=0x7fb754016278, 
    postparent=0x7fb75c053670) at quick-read.c:449
#10 0x00007fb760d69da1 in ioc_lookup_cbk (frame=0x7fb750001838, cookie=0x7fb75000f548, this=0x7fb75c013c00, op_ret=0, op_errno=0, inode=0x7fb748001ed8, stbuf=0x7fb75c0533d0, xdata=0x7fb754016278, 
    postparent=0x7fb75c053670) at io-cache.c:267
#11 0x00007fb76119c0ab in wb_lookup_cbk (frame=0x7fb75000f548, cookie=0x7fb750001f18, this=0x7fb75c010f40, op_ret=0, op_errno=0, inode=0x7fb748001ed8, buf=0x7fb75c0533d0, xdata=0x7fb754016278, 
    postparent=0x7fb75c053670) at write-behind.c:2390
#12 0x00007fb7613b7a10 in dht_heal_full_path_done (op_ret=0, heal_frame=0x7fb75c052248, data=0x7fb75c052248) at dht-helper.c:2113
#13 0x00007fb76ed34c41 in synctask_wrap () at syncop.c:377
#14 0x00007fb76d2cfd00 in ?? () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()

This is the same inode_unref issue: the extra unref that happened in dht_heal_path led to this crash. After applying the patch https://review.gluster.org/#/c/glusterfs/+/21998/, the crash is resolved.
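
To make the extra-unref failure mode concrete, here is a hypothetical, self-contained sketch; it is not the dht-helper.c code and not the content of the patch above, and all sk_* names are invented for illustration. A loc-style holder takes its own reference on an inode, an extra unconditional unref slips in before cleanup, and the later loc_wipe-style unref is the one that underflows the count, which is the shape of frames #4-#6 above and of the loc_wipe frames in the first trace in the description.

/* extra_unref_sketch.c -- hypothetical illustration, NOT dht-helper.c
 * and NOT the fix from the patch linked above.  Shows how one extra,
 * unbalanced unref surfaces later, when the loc's own reference is
 * dropped during cleanup and the count goes below zero. */
#include <assert.h>
#include <stdio.h>

struct sk_inode {
        int ref;                        /* simplified stand-in for inode->ref */
};

static struct sk_inode *
sk_inode_ref(struct sk_inode *in)
{
        in->ref++;
        return in;
}

static void
sk_inode_unref(struct sk_inode *in)
{
        assert(in->ref > 0);            /* an unref past zero is caught here */
        in->ref--;
}

/* A loc_t-like holder that owns one reference while it is set. */
struct sk_loc {
        struct sk_inode *inode;
};

static void
sk_loc_set(struct sk_loc *loc, struct sk_inode *in)
{
        loc->inode = sk_inode_ref(in);  /* the loc takes its own reference */
}

static void
sk_loc_wipe(struct sk_loc *loc)
{
        if (loc->inode) {
                sk_inode_unref(loc->inode);   /* drop the loc's reference */
                loc->inode = NULL;
        }
}

int
main(void)
{
        struct sk_inode comp = { .ref = 1 };  /* reference held elsewhere, e.g. the itable */
        struct sk_loc   loc  = { 0 };

        sk_loc_set(&loc, &comp);        /* ref: 1 -> 2 */

        /* BUG pattern: an extra, unconditional unref of the loc's inode.
         * Remove this line and the program exits cleanly. */
        sk_inode_unref(loc.inode);      /* ref: 2 -> 1 */

        sk_inode_unref(&comp);          /* the other owner drops its ref: 1 -> 0 */
        sk_loc_wipe(&loc);              /* loc's own unref: assert(ref > 0) fails,
                                         * matching frames #4-#6 above */

        printf("not reached while the extra unref is present\n");
        return 0;
}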

Comment 26 errata-xmlrpc 2019-03-27 03:43:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0658

Comment 27 Nithya Balachandran 2019-05-27 05:18:14 UTC
*** Bug 1712871 has been marked as a duplicate of this bug. ***

