Bug 1371228 - Client side crash seen when writing to an 8x2 vol hosted on Sandisk IF150.
Summary: Client side crash seen when writing to an 8x2 vol hosted on Sandisk IF150.
Keywords:
Status: CLOSED DUPLICATE of bug 1305406
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Vijay Bellur
QA Contact: Sachin
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-29 16:23 UTC by Ben Turner
Modified: 2016-11-18 04:08 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-19 14:08:59 UTC
Embargoed:


Attachments

Description Ben Turner 2016-08-29 16:23:38 UTC
Description of problem:

I have two servers and 16 Sandisk IF150 disks, 8 disks zoned to each server.  The volume is set up as an 8x2 distributed-replicate (a creation sketch follows the volume info below):

[root@rhs-srv-09 ~]# gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 6a703fe5-f294-407d-8926-2a3999bfa369
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick1/gfsbrick
Brick2: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick1/gfsbrick
Brick3: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick2/gfsbrick
Brick4: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick2/gfsbrick
Brick5: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick3/gfsbrick
Brick6: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick3/gfsbrick
Brick7: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick4/gfsbrick
Brick8: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick4/gfsbrick
Brick9: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick5/gfsbrick
Brick10: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick5/gfsbrick
Brick11: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick6/gfsbrick
Brick12: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick6/gfsbrick
Brick13: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick7/gfsbrick
Brick14: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick7/gfsbrick
Brick15: rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick8/gfsbrick
Brick16: rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com:/brick8/gfsbrick
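
For context, an 8x2 layout like this is normally created with a "gluster volume create ... replica 2" command that lists the bricks in replica pairs (each adjacent pair of bricks above forms one replica set).  The commands below are only a sketch reconstructed from the brick list, not the exact commands run on these servers:

S1=rhs-srv-09-priv.ceph-dev.lab.eng.rdu2.redhat.com
S2=rhs-srv-10-priv.ceph-dev.lab.eng.rdu2.redhat.com

# Sketch only: recreates the 8x2 (replica 2, 8 distribute subvolumes) layout shown above.
gluster volume create testvol replica 2 \
    $S1:/brick1/gfsbrick $S2:/brick1/gfsbrick \
    $S1:/brick2/gfsbrick $S2:/brick2/gfsbrick \
    $S1:/brick3/gfsbrick $S2:/brick3/gfsbrick \
    $S1:/brick4/gfsbrick $S2:/brick4/gfsbrick \
    $S1:/brick5/gfsbrick $S2:/brick5/gfsbrick \
    $S1:/brick6/gfsbrick $S2:/brick6/gfsbrick \
    $S1:/brick7/gfsbrick $S2:/brick7/gfsbrick \
    $S1:/brick8/gfsbrick $S2:/brick8/gfsbrick
gluster volume start testvol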

I also have 4 clients accessing the volume and running performance tests.  During the write tests, when I write with a record size of 64k or smaller, one or more clients crash.

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-10.el7rhgs.x86_64

How reproducible:

The smaller the record size, the more reproducible it is.

Steps to Reproduce:
1.  Do sequential / random writes with a 4k record size (an illustrative workload is shown below).
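
The report does not name the benchmark tool; as an illustration only, a workload of this kind can be generated with fio (the mount point, file size, and job count below are assumptions, not taken from this setup):

# Illustrative only -- /mnt/testvol, 1g files, and 4 jobs are assumed values.
# Sequential 4k writes from a FUSE client:
fio --name=seqwrite --directory=/mnt/testvol --rw=write --bs=4k --size=1g --numjobs=4
# Random 4k writes:
fio --name=randwrite --directory=/mnt/testvol --rw=randwrite --bs=4k --size=1g --numjobs=4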

Actual results:

Client side crash.

Expected results:

Normal operation.

Additional info:

Adding crash info below.

Comment 2 Ben Turner 2016-08-29 16:24:36 UTC
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: pending frames:
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(1) op(WRITE)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(1) op(WRITE)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(1) op(WRITE)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: frame : type(0) op(0)
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: patchset: git://git.gluster.com/glusterfs.git
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: signal received: 6
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: time of crash:
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: 2016-08-27 20:46:16
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: configuration details:
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: argp 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: backtrace 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: dlfcn 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: libpthread 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: llistxattr 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: setfsid 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: spinlock 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: epoll.h 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: xattr.h 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: st_atim.tv_nsec 1
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: package-string: glusterfs 3.7.9
Aug 27 16:46:16 rhs-cli-10 gluster-mount[3458]: ---------

Comment 3 Ben Turner 2016-08-29 16:52:12 UTC
(gdb) bt
#0  0x00007f06fbfd15f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f06fbfd2ce8 in __GI_abort () at abort.c:90
#2  0x00007f06fc011317 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f06fc11a988 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f06fc018fe1 in malloc_printerr (ar_ptr=0x7f06d0000020, ptr=<optimized out>, str=0x7f06fc1180a1 "invalid fastbin entry (free)", action=3) at malloc.c:5013
#4  _int_free (av=0x7f06d0000020, p=<optimized out>, have_lock=0) at malloc.c:3835
#5  0x00007f06eb8ce812 in dht_local_wipe (this=0x7f06ec0204a0, local=0x7f06ea51798c) at dht-helper.c:627
#6  0x00007f06eb916566 in dht_writev_cbk (frame=0x7f06fb4009f0, cookie=<optimized out>, this=<optimized out>, op_ret=16384, op_errno=0, prebuf=0x7f06e9e90b90, postbuf=0x7f06e9e90c00, xdata=0x7f06fdbbf460)
    at dht-inode-write.c:111
#7  0x00007f06ebb66836 in afr_writev_unwind (frame=0x7f06fb3fbeb0, this=<optimized out>) at afr-inode-write.c:252
#8  0x00007f06ebb66b09 in afr_writev_wind_cbk (frame=0x7f06fb3f95b4, cookie=0x1, this=0x7f06ec01f6a0, op_ret=<optimized out>, op_errno=0, prebuf=0x7f06f087a940, postbuf=0x7f06f087a9b0, xdata=0x7f06fdbbdbc4)
    at afr-inode-write.c:377
#9  0x00007f06ebddc9b9 in client3_3_writev_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f06fb4047c0) at client-rpc-fops.c:912
#10 0x00007f06fd6b1990 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f06ec165600, pollin=pollin@entry=0x7f06ec3d1290) at rpc-clnt.c:764
#11 0x00007f06fd6b1c4f in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f06ec165630, event=<optimized out>, data=0x7f06ec3d1290) at rpc-clnt.c:905
#12 0x00007f06fd6ad793 in rpc_transport_notify (this=this@entry=0x7f06ec175300, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f06ec3d1290) at rpc-transport.c:546
#13 0x00007f06f23489a4 in socket_event_poll_in (this=this@entry=0x7f06ec175300) at socket.c:2353
#14 0x00007f06f234b5e4 in socket_event_handler (fd=fd@entry=10, idx=idx@entry=1, data=0x7f06ec175300, poll_in=1, poll_out=0, poll_err=0) at socket.c:2466
#15 0x00007f06fd951c4a in event_dispatch_epoll_handler (event=0x7f06f087ae80, event_pool=0x7f06fedaa5d0) at event-epoll.c:575
#16 event_dispatch_epoll_worker (data=0x7f06fee00840) at event-epoll.c:678
#17 0x00007f06fc74bdc5 in start_thread (arg=0x7f06f087b700) at pthread_create.c:308
#18 0x00007f06fc0921cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
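
For reference, a backtrace like the one above is typically obtained by loading the client core file into gdb against the glusterfs client binary; the core file path below is a placeholder, not the actual location used here:

gdb /usr/sbin/glusterfs /path/to/core
(gdb) bt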

Comment 7 Ben Turner 2016-09-19 14:08:59 UTC
Closed as DUP.

*** This bug has been marked as a duplicate of bug 1305406 ***

Comment 8 Florian Weimer 2016-09-19 14:17:09 UTC
This was fixed in glibc-2.17-106.el7_2.6 as bug 1313308.

