Bug 1262249 - Fuse mount crashes with quick read enabled
Summary: Fuse mount crashes with quick read enabled
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: quick-read
Version: 3.7.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-11 09:19 UTC by Francesco Tribioli
Modified: 2017-03-08 10:52 UTC
CC: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:52:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
volume profile (38.82 KB, text/plain)
2015-09-14 14:59 UTC, Francesco Tribioli

Description Francesco Tribioli 2015-09-11 09:19:40 UTC
Description of problem:
glusterfs crashes and the volume must be remounted


Version-Release number of selected component (if applicable):
glusterfs-fuse-3.7.4-2.el7.x86_64


How reproducible:
It happens randomly but quite frequently under medium load. 


Steps to Reproduce:
1. Create a two-server replicated volume with 3 bricks on each server
2. Mount the volume with FUSE
3. Set performance.quick-read on

Actual results:
The FUSE mount process crashes on the server under load; it keeps working on the other server.

Expected results:
glusterfs should not crash

Additional info:

(gdb) bt
#0  0x00007f44586825f6 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1  0x00007f4447563bc4 in memcpy (__len=<optimized out>,
    __src=<optimized out>, __dest=<optimized out>)
    at /usr/include/bits/string3.h:51
#2  qr_content_extract (xdata=xdata@entry=0x7f445a163774) at quick-read.c:278
#3  0x00007f4447563f94 in qr_lookup_cbk (frame=0x7f44579942c4,
    cookie=<optimized out>, this=0x7f4448016320, op_ret=0, op_errno=117,
    inode_ret=0x7f4444afd434, buf=0x7f444c0628f0, xdata=0x7f445a163774,
    postparent=0x7f444c062b20) at quick-read.c:422
#4  0x00007f444777095c in ioc_lookup_cbk (frame=0x7f44579a1dcc,
    cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>,
    op_errno=<optimized out>, inode=0x7f4444afd434, stbuf=0x7f444c0628f0,
    xdata=0x7f445a163774, postparent=0x7f444c062b20) at io-cache.c:260
#5  0x00007f4447dc4f7f in dht_discover_complete (
    this=this@entry=0x7f4448011220,
    discover_frame=discover_frame@entry=0x7f44579906f8) at dht-common.c:304
#6  0x00007f4447dc563a in dht_discover_cbk (frame=0x7f44579906f8,
    cookie=0x7f4457990fb4, this=0x7f4448011220, op_ret=<optimized out>,
    op_errno=117, inode=0x7f4444afd434, stbuf=0x7f4439b0c198,
    xattr=0x7f445a163774, postparent=0x7f4439b0c208) at dht-common.c:439
#7  0x00007f444c1a2bb7 in afr_discover_done (this=<optimized out>,
    frame=0x7f4457990fb4) at afr-common.c:2114
#8  afr_discover_cbk (frame=0x7f4457990fb4, cookie=<optimized out>,
    this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>,
    inode=<optimized out>, buf=0x7f444ce08930, xdata=0x7f445a162e28,
    postparent=0x7f444ce089a0) at afr-common.c:2149
#9  0x00007f444c3f1437 in client3_3_lookup_cbk (req=<optimized out>,
    iov=<optimized out>, count=<optimized out>, myframe=0x7f4457993e10)
    at client-rpc-fops.c:2978
#10 0x00007f4459c4eb10 in rpc_clnt_handle_reply (
    clnt=clnt@entry=0x7f44480fd310, pollin=pollin@entry=0x7f4448a51fd0)
    at rpc-clnt.c:766
#11 0x00007f4459c4edcf in rpc_clnt_notify (trans=<optimized out>,
    mydata=0x7f44480fd340, event=<optimized out>, data=0x7f4448a51fd0)
    at rpc-clnt.c:907
#12 0x00007f4459c4a903 in rpc_transport_notify (
    this=this@entry=0x7f444810d010,
    event=event@entry=RPC_TRANSPORT_MSG_RECEIVED,
    data=data@entry=0x7f4448a51fd0) at rpc-transport.c:544
#13 0x00007f444e8eb506 in socket_event_poll_in (this=this@entry=0x7f444810d010)
    at socket.c:2236
#14 0x00007f444e8ee3f4 in socket_event_handler (fd=fd@entry=17,
    idx=idx@entry=6, data=0x7f444810d010, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:2349
#15 0x00007f4459ee17ba in event_dispatch_epoll_handler (event=0x7f444ce08e80,
    event_pool=0x7f445abf2330) at event-epoll.c:575
#16 event_dispatch_epoll_worker (data=0x7f445ac3aeb0) at event-epoll.c:678
#17 0x00007f4458ce8df5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f445862f1ad in clone () from /lib64/libc.so.6
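
For orientation: frame #2 shows qr_content_extract() copying the file content that the bricks preload into the lookup reply's xdata (the quick-read GF_CONTENT_KEY entry) into a newly allocated buffer. The sketch below is not the GlusterFS source, only a minimal self-contained illustration of that extract-and-copy pattern (the struct and function names are made up); it shows why a stale pointer or an overstated length in that dict entry would make the memcpy fault with SIGSEGV at exactly frames #0 to #2 of the trace.

/* Toy illustration of the extract-and-copy pattern behind
 * qr_content_extract() (quick-read.c:278).  The types here are
 * simplified stand-ins, not the GlusterFS dict_t/data_t API. */
#include <stdlib.h>
#include <string.h>

struct content_value {
    const char *data;  /* file content preloaded by the bricks        */
    size_t      len;   /* its length as reported in the lookup reply  */
};

/* Copy the preloaded content into a buffer owned by the caller.
 * If value->data is stale/NULL or value->len overstates the real
 * size, the memcpy below reads unmapped memory and the process
 * dies with SIGSEGV, matching frames #0 to #2 of the backtrace. */
static void *content_extract(const struct content_value *value)
{
    if (value == NULL || value->data == NULL)
        return NULL;

    void *copy = calloc(1, value->len);
    if (copy == NULL)
        return NULL;

    memcpy(copy, value->data, value->len);
    return copy;
}

int main(void)
{
    struct content_value good = { "hello", 5 };  /* pointer and length agree */
    free(content_extract(&good));                /* copies safely            */
    return 0;
}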

----

Volume Name: home_gfs
Type: Distributed-Replicate
Volume ID: fa5aa52a-8105-47f1-b1d6-f10db8a11330
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: castore:/glusterfs/home_gfs/brick1
Brick2: polluce:/glusterfs/home_gfs/brick1
Brick3: castore:/glusterfs/home_gfs/brick2
Brick4: polluce:/glusterfs/home_gfs/brick2
Brick5: castore:/glusterfs/home_gfs/brick3
Brick6: polluce:/glusterfs/home_gfs/brick3
Options Reconfigured:
performance.quick-read: on
nfs.ports-insecure: on
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.self-heal-daemon: enable
nfs.disable: on
server.allow-insecure: on
client.bind-insecure: on
network.ping-timeout: 5

It is mounted this way:

castore:/home_gfs on /export/home/public type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

Comment 1 Sakshi 2015-09-14 11:50:55 UTC
Could you provide more information on the type of load when the crash happens: basically, which fops were running and how many files/directories were created? Also, could you upload the core file?

Comment 2 Francesco Tribioli 2015-09-14 14:17:13 UTC
The server is running our postfix+dovecot mail service, so it is under load but not an extremely heavy one: just many reads and writes of small files. I will attach a few minutes of profile output and the core dump.

Comment 3 Francesco Tribioli 2015-09-14 14:59:53 UTC
Created attachment 1073289 [details]
volume profile

Comment 4 Francesco Tribioli 2015-09-14 15:05:17 UTC
Unfortunately the core dump is 300 MB and the attachment limit seems to be 20 MB. Is there another way I can attach it?

Comment 5 Sakshi 2015-09-15 07:43:00 UTC
You could compress and attach it. Is it a VM? Can I access it to get the core file?
Also, what is the signal with which the process crashed?

Comment 6 Sakshi 2015-09-15 12:07:35 UTC
Please upload the logs from the server where the crash happened

Comment 7 Francesco Tribioli 2015-09-15 12:49:14 UTC
This refers to one of the crashes

pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(READDIRP)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-09-10 14:17:24
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7fb8f13dbf82]
/lib64/libglusterfs.so.0(gf_print_trace+0x31d)[0x7fb8f13f840d]
/lib64/libc.so.6(+0x35650)[0x7fb8efaca650]
/lib64/libc.so.6(+0x1495f6)[0x7fb8efbde5f6]
/usr/lib64/glusterfs/3.7.4/xlator/performance/quick-read.so(qr_content_extract+0x44)[0x7fb8e2c1fbc4]
/usr/lib64/glusterfs/3.7.4/xlator/performance/quick-read.so(qr_lookup_cbk+0x154)[0x7fb8e2c1ff94]
/usr/lib64/glusterfs/3.7.4/xlator/performance/io-cache.so(ioc_lookup_cbk+0x36c)[0x7fb8e2e2c95c]
/usr/lib64/glusterfs/3.7.4/xlator/cluster/distribute.so(dht_discover_complete+0x17f)[0x7fb8e3480f7f]
/usr/lib64/glusterfs/3.7.4/xlator/cluster/distribute.so(dht_discover_cbk+0x29a)[0x7fb8e348163a]
/usr/lib64/glusterfs/3.7.4/xlator/cluster/replicate.so(afr_discover_cbk+0x3a7)[0x7fb8e36febb7]
/usr/lib64/glusterfs/3.7.4/xlator/protocol/client.so(client3_3_lookup_cbk+0x707)[0x7fb8e394d437]
/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fb8f11aab10]
/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1bf)[0x7fb8f11aadcf]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fb8f11a6903]
/usr/lib64/glusterfs/3.7.4/rpc-transport/socket.so(+0x9506)[0x7fb8e5e47506]
/usr/lib64/glusterfs/3.7.4/rpc-transport/socket.so(+0xc3f4)[0x7fb8e5e4a3f4]
/lib64/libglusterfs.so.0(+0x877ba)[0x7fb8f143d7ba]
/lib64/libpthread.so.0(+0x7df5)[0x7fb8f0244df5]
/lib64/libc.so.6(clone+0x6d)[0x7fb8efb8b1ad]
---------

Comment 9 Kaushal 2017-03-08 10:52:04 UTC
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

