Bug 1488152
| Summary: | gluster-blockd process crashed and core generated | | |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | krishnaram Karthick <kramdoss> |
| Component: | sharding | Assignee: | Krutika Dhananjay <kdhananj> |
| Status: | CLOSED ERRATA | QA Contact: | krishnaram Karthick <kramdoss> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.3 | CC: | amukherj, asrivast, kramdoss, mchangir, pkarampu, rcyriac, rhs-bugs, sanandpa, storage-qa-internal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.3.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.8.4-44 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 1488354 1488381 1488860 (view as bug list) | Environment: | |
| Last Closed: | 2017-09-21 05:06:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1417151, 1488354, 1488381, 1488387, 1488391, 1488860 | | |
Description
krishnaram Karthick
2017-09-04 13:08:46 UTC
Based on the core-file, the only way this can happen is because it doesn't create all the shards.

(gdb) fr 2
#2  0x00007f54869d1927 in shard_common_inode_write_do (frame=0x7f548c0dbbe0, this=0x7f54800120d0) at shard.c:3883
3883        anon_fd = fd_anonymous (local->inode_list[i]);
(gdb) p i
$1 = 255
(gdb) p local->inode_list[i]
$2 = (inode_t *) 0x0
(gdb) p lical->inode_list[i-1]
No symbol "lical" in current context.
(gdb) p local->inode_list[i-1]
$3 = (inode_t *) 0x7f5474765440
(gdb) p local->offset
$4 = 0
(gdb) p local->num_blocks
$5 = 256

Based on this data, I went through the code and I see two races:
1) In shard_common_mknod_cbk(), local->eexist_count is incremented without frame->lock.
2) In shard_common_lookup_shards_cbk(), local->create_count is incremented without frame->lock.

This can lead to the counts being less than what they need to be, so mknod is done on just 255 shards instead of 256.

(In reply to Pranith Kumar K from comment #4)
> Based on this data, I went through the code and I see two races: [...]

The race this particular crash hit is 1), but 2) is also present based on code reading.

(In reply to Pranith Kumar K from comment #6)
> The race this particular crash hit is 1), but 2) is also present based on code reading.

Sorry, I mixed up the numbers. This crash is hit because of 2), not 1).
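To illustrate the class of race described above (lost updates on counters incremented outside frame->lock), here is a minimal, self-contained C sketch. This is not GlusterFS code; the thread count, iteration count, and names are made up for the illustration. It contrasts an unprotected counter with one guarded by a mutex, which is the role frame->lock plays in the shard callbacks.

```c
/* Illustration only: lost updates on an unprotected counter, analogous to
 * local->create_count / local->eexist_count being incremented outside
 * frame->lock.  Not GlusterFS code.  Build: gcc -pthread counter_race.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 16          /* e.g. 16 client event threads */
#define NCALLS   100000      /* callbacks per thread */

static int racy_count;       /* incremented without any lock */
static int safe_count;       /* incremented under a mutex */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *cbk(void *arg)
{
    (void)arg;
    for (int i = 0; i < NCALLS; i++) {
        racy_count++;                   /* unsynchronized read-modify-write */

        pthread_mutex_lock(&lock);      /* what taking frame->lock provides */
        safe_count++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, cbk, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    /* racy_count typically ends up below NTHREADS * NCALLS; safe_count never does. */
    printf("expected %d  racy %d  locked %d\n",
           NTHREADS * NCALLS, racy_count, safe_count);
    return 0;
}
```

In this bug, a lost increment of this kind leaves the count one short, so mknod is issued for one shard fewer than num_blocks and local->inode_list[255] is still NULL when the write path calls fd_anonymous() on it.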
I am able to recreate the issue on my laptop with the following steps.

Create more distribute legs to have parallelization:

1) gluster --mode=script --wignore volume create r2 replica 2 localhost.localdomain:/home/gfs/r2_0 localhost.localdomain:/home/gfs/r2_1 localhost.localdomain:/home/gfs/r2_2 localhost.localdomain:/home/gfs/r2_3 localhost.localdomain:/home/gfs/r2_4 localhost.localdomain:/home/gfs/r2_5 localhost.localdomain:/home/gfs/r2_6 localhost.localdomain:/home/gfs/r2_7 localhost.localdomain:/home/gfs/r2_8 localhost.localdomain:/home/gfs/r2_9

2) Increase event threads to 16:
gluster v set r2 client.event-threads 16

3) Confirm the volume configuration:

15:12:05 :) ⚡ gluster v info r2

Volume Name: r2
Type: Distributed-Replicate
Volume ID: 64e7d1be-602f-4704-b010-cec8004d52d0
Status: Started
Snapshot Count: 0
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Bricks:
Brick1: localhost.localdomain:/home/gfs/r2_0
Brick2: localhost.localdomain:/home/gfs/r2_1
Brick3: localhost.localdomain:/home/gfs/r2_2
Brick4: localhost.localdomain:/home/gfs/r2_3
Brick5: localhost.localdomain:/home/gfs/r2_4
Brick6: localhost.localdomain:/home/gfs/r2_5
Brick7: localhost.localdomain:/home/gfs/r2_6
Brick8: localhost.localdomain:/home/gfs/r2_7
Brick9: localhost.localdomain:/home/gfs/r2_8
Brick10: localhost.localdomain:/home/gfs/r2_9
Options Reconfigured:
client.event-threads: 16
features.shard: on
transport.address-family: inet
nfs.disable: on

4) Run the following command:

root@dhcp35-190 - /mnt/r2
15:15:06 :) ⚡ for i in {1..100}; do fallocate -l 1073741824 b && rm -f b; done
fallocate: fallocate failed: Software caused connection abort
fallocate: cannot open b: Transport endpoint is not connected
fallocate: cannot open b: Transport endpoint is not connected

Here is a similar-looking trace:

package-string: glusterfs 4.0dev
/usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb3)[0x7f019abef451]
/usr/local/lib/libglusterfs.so.0(gf_print_trace+0x259)[0x7f019abf5709]
/usr/local/sbin/glusterfs(glusterfsd_print_trace+0x1f)[0x409f23]
/lib64/libc.so.6(+0x366f0)[0x7f01991f96f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f0199a28d90]
/usr/local/lib/libglusterfs.so.0(fd_anonymous+0x43)[0x7f019ac1ef57]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1f954)[0x7f018d65e954]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1fb7b)[0x7f018d65eb7b]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1b655)[0x7f018d65a655]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/distribute.so(+0x617ad)[0x7f018d8cf7ad]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd04a)[0x7f018db3c04a]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xb38e)[0x7f018db3a38e]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd0e2)[0x7f018db3c0e2]
/usr/local/lib/glusterfs/4.0dev/xlator/protocol/client.so(+0x1f60d)[0x7f018ddf660d]

Ran the same with the fix and didn't see any crash.

The following logs confirm the theory:

[2017-09-05 10:05:30.485831] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 255
[2017-09-05 10:05:32.484851] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 255
[2017-09-05 10:05:34.333780] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 255
[2017-09-05 10:05:36.081419] I [shard.c:1731:shard_common_lookup_shards_cbk] 0-r2-shard: create-count: 254 <<----- this should be 255

pending frames:
frame : type(1) op(FALLOCATE)
frame : type(1) op(FALLOCATE)
...
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2017-09-05 10:05:36
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 4.0dev
/usr/local/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb3)[0x7f1496271451]
/usr/local/lib/libglusterfs.so.0(gf_print_trace+0x259)[0x7f1496277709]
/usr/local/sbin/glusterfs(glusterfsd_print_trace+0x1f)[0x409f23]
/lib64/libc.so.6(+0x366f0)[0x7f149487b6f0]
/lib64/libpthread.so.0(pthread_mutex_lock+0x0)[0x7f14950aad90]
/usr/local/lib/libglusterfs.so.0(fd_anonymous+0x43)[0x7f14962a0f57]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1f99c)[0x7f1488ce099c]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1fbc3)[0x7f1488ce0bc3]
/usr/local/lib/glusterfs/4.0dev/xlator/features/shard.so(+0x1b69d)[0x7f1488cdc69d]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/distribute.so(+0x617ad)[0x7f1488f517ad]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd04a)[0x7f14891be04a]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xb38e)[0x7f14891bc38e]
/usr/local/lib/glusterfs/4.0dev/xlator/cluster/replicate.so(+0xd0e2)[0x7f14891be0e2]
/usr/local/lib/glusterfs/4.0dev/xlator/protocol/client.so(+0x1f60d)[0x7f148947860d]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1b7)[0x7f1496033cce]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2bc)[0x7f149603426c]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x10f)[0x7f14960302a0]
/usr/local/lib/glusterfs/4.0dev/rpc-transport/socket.so(+0xb12f)[0x7f148a55112f]
/usr/local/lib/glusterfs/4.0dev/rpc-transport/socket.so(+0xb754)[0x7f148a551754]
/usr/local/lib/libglusterfs.so.0(+0x9b7cb)[0x7f14962de7cb]
/usr/local/lib/libglusterfs.so.0(+0x9ba9f)[0x7f14962dea9f]
/lib64/libpthread.so.0(+0x736d)[0x7f14950a836d]
/lib64/libc.so.6(clone+0x3f)[0x7f1494955b8f]
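The top frames of both traces (pthread_mutex_lock sitting directly above fd_anonymous) follow from the missing shard inode: fd_anonymous() evidently takes a lock associated with the inode it is handed, so a NULL inode turns into a NULL-based pointer dereference. The following minimal, self-contained C sketch reproduces the same failure shape; it is not GlusterFS code, and struct obj and obj_ref() are invented for the illustration.

```c
/* Illustration only: why the backtrace ends in pthread_mutex_lock when the
 * real bug is a NULL object.  A helper that immediately locks a mutex
 * embedded in its argument dereferences NULL when handed a missing entry,
 * analogous to fd_anonymous() receiving the NULL inode_list[255].
 * Build: gcc -pthread null_lock_crash.c */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct obj {                        /* stand-in for inode_t */
    pthread_mutex_t lock;
    int             refcount;
};

static void obj_ref(struct obj *o)
{
    pthread_mutex_lock(&o->lock);   /* SIGSEGV here if o == NULL */
    o->refcount++;
    pthread_mutex_unlock(&o->lock);
}

int main(void)
{
    /* One valid entry and one missing entry, like inode_list with a gap. */
    struct obj *list[2] = { calloc(1, sizeof(struct obj)), NULL };

    pthread_mutex_init(&list[0]->lock, NULL);

    for (int i = 0; i < 2; i++) {
        obj_ref(list[i]);           /* i == 1 hits the NULL slot and crashes */
        printf("ref'd entry %d\n", i);
    }
    return 0;
}
```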
upstream master patch: https://review.gluster.org/18203
upstream 3.12 patch: https://review.gluster.org/#/c/18204/
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/117340/

Followed the steps given in comment 9 and did not see any crash with glusterfs build glusterfs-3.8.4-44.el7rhgs.x86_64.

Moving the bug to verified based on comments 14 and 16.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774