Bug 762787 (GLUSTER-1055)

Summary: Everyday crashes on some glusterfs clients
Product: [Community] GlusterFS Reporter: Thomas Séglard <tsg>
Component: replicateAssignee: Pavan Vilas Sondur <pavan>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: low    
Version: 3.0.4CC: gluster-bugs, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Vijay Bellur 2010-07-07 07:42:38 UTC
This looks to be a duplicate of bug 762600. Bug 868 has been fixed in the 3.0.5 release candidates. Can you please check if your crash is fixed by 3.0.5rc9 or by applying this patch on to GlusterFS 3.0.4?

http://patches.gluster.com/patch/3213

Comment 1 Thomas Séglard 2010-07-07 10:35:21 UTC
Replicated setup. 2 servers and 6 clients. On 4 clients everything is ok on 2 other clients I see multiple crashs during the day. On these 2 "bad" clients we are using a script which does many reads in one particular directory (mounted with glusterfs). This script is in PHP and uses threads.
I post below the (same) configuration file used on all clients and full gdb backtrace.

** Servers
[...]
#################
# CONF BK2010
#################
volume posix4
  type storage/posix
  option directory /data/exports/export_conf2010
end-volume

volume locks4
    type features/locks
    subvolumes posix4
end-volume

volume brick4
    type performance/io-threads
    option thread-count 8
    subvolumes locks4
end-volume
[...]


** Clients

## file auto generated by /bin/glusterfs-volgen (mount.vol)
# Cmd line:
# $ /bin/glusterfs-volgen --name bkconf2010 --raid 1 10.110.0.41:/data/exports/export_conf2010 10.110.0.4
2:/data/exports/export_conf2010

# RAID 1
# TRANSPORT-TYPE tcp
volume 10.110.0.42-4
    type protocol/client
    option transport-type tcp
    option remote-host 10.110.0.42
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brick4
end-volume

volume 10.110.0.41-4
    type protocol/client
    option transport-type tcp
    option remote-host 10.110.0.41
    option transport.socket.nodelay on
    option transport.remote-port 6996
    option remote-subvolume brick4
end-volume

volume mirror-4
    type cluster/replicate
    subvolumes 10.110.0.41-4 10.110.0.42-4
end-volume

volume readahead4
    type performance/read-ahead
    option page-count 4
    subvolumes mirror-4
end-volume

volume iocache4
    type performance/io-cache
    option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
    option cache-timeout 1
    subvolumes readahead4
end-volume

volume quickread4
    type performance/quick-read
    option cache-timeout 1
    option max-file-size 64kB
    subvolumes iocache4
end-volume

volume writebehind4
    type performance/write-behind
    option enable-trickling-writes false
    option flush-behind on
    option cache-size 4MB
    subvolumes quickread4
end-volume

volume statprefetch4
    type performance/stat-prefetch
    subvolumes writebehind4
end-volume


** Backtrace

shell# gdb /sbin/glusterfs /core.17091
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib/libglusterfs.so.0...done.
Loaded symbols for /lib/libglusterfs.so.0
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/glusterfs/3.0.4/xlator/protocol/client.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/protocol/client.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/cluster/replicate.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/cluster/replicate.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/read-ahead.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/read-ahead.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/io-cache.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/io-cache.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/quick-read.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/quick-read.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/write-behind.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/write-behind.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/performance/stat-prefetch.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/performance/stat-prefetch.so
Reading symbols from /lib/glusterfs/3.0.4/xlator/mount/fuse.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/xlator/mount/fuse.so
Reading symbols from /lib/glusterfs/3.0.4/transport/socket.so...done.
Loaded symbols for /lib/glusterfs/3.0.4/transport/socket.so
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Core was generated by `/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/bkconf2010.vol /dat'.
Program terminated with signal 11, Segmentation fault.
[New process 17097]
[New process 17091]
[New process 17092]
#0  0x00007f40d3fea114 in ra_fstat (frame=0x7f40c404f5b0, this=0x2534bd0, fd=0x25eaca0)
    at read-ahead.c:750
750     read-ahead.c: No such file or directory.
        in read-ahead.c

(gdb) thread apply all bt full

Thread 3 (process 17092):
#0  0x00007f40d4f12071 in nanosleep () from /lib/libc.so.6
No symbol table info available.
#1  0x00007f40d4f3d534 in usleep () from /lib/libc.so.6
No symbol table info available.
#2  0x00007f40d5610652 in gf_timer_proc (ctx=0x252d010) at timer.c:177
        now = 1278460080777811
        now_tv = {tv_sec = 1278460080, tv_usec = 777811}
        event = (gf_timer_t *) 0x253c550
        reg = (gf_timer_registry_t *) 0x25380a0
        __FUNCTION__ = "gf_timer_proc"
#3  0x00007f40d51cdfc7 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#4  0x00007f40d4f4364d in clone () from /lib/libc.so.6
No symbol table info available.
#5  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (process 17091):
#0  0x00007f40d51d2645 in ?? () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007f40d3dda73c in ioc_open_cbk (frame=0x7f40c404cf40, cookie=0x7f40c404bd80, this=0x25354b0, 
    op_ret=0, op_errno=117, fd=0x25903c0) at io-cache.c:474
        tmp_ioc_inode = 0
        local = (ioc_local_t *) 0x7f40c404bcb0
        table = (ioc_table_t *) 0x2538730
        ioc_inode = (ioc_inode_t *) 0x0
        inode = (inode_t *) 0x7f40cc000bb0
        weight = 4294967295
        path = 0x7f40c401a9c0 "/extMapping.php"
        __FUNCTION__ = "ioc_open_cbk"
#2  0x00007f40d3fe7454 in ra_open_cbk (frame=0x7f40c404bd80, cookie=0x7f40c404bde0, this=0x2534bd0, 
    op_ret=0, op_errno=117, fd=0x25903c0) at read-ahead.c:116
        fn = (fop_open_cbk_t) 0x7f40d3dda698 <ioc_open_cbk>
        _parent = (call_frame_t *) 0x7f40c404cf40
        old_THIS = (xlator_t *) 0x2534bd0
        conf = (ra_conf_t *) 0x2538810
        file = (ra_file_t *) 0x7f40c404c400
        ret = 0
        wbflags = 0
        __FUNCTION__ = "ra_open_cbk"
#3  0x00007f40d421a79a in afr_open_cbk (frame=0x7f40c404bde0, cookie=0x0, this=0x2534350, op_ret=0, 
    op_errno=0, fd=0x25903c0) at afr-open.c:140
        fn = (fop_open_cbk_t) 0x7f40d3fe7150 <ra_open_cbk>
        _parent = (call_frame_t *) 0x7f40c404bd80
        old_THIS = (xlator_t *) 0x2534350
        __local = (afr_local_t *) 0x7f40c404d040
        __this = (xlator_t *) 0x2534350
        local = (afr_local_t *) 0x7f40c404d040
        priv = (afr_private_t *) 0x25389c0
        child_index = 0
        ctx = 139916143515392
        fd_ctx = (afr_fd_ctx_t *) 0x7f40c4085700
        ret = 0
        call_count = 0
        __FUNCTION__ = "afr_open_cbk"
#4  0x00007f40d4462f49 in client_open_cbk (frame=0x7f40c401aab0, hdr=0x7f40c404c760, hdrlen=116, 
    iobuf=0x0) at client-protocol.c:4090
        fn = (ret_fn_t) 0x7f40d421a2f5 <afr_open_cbk>
        _parent = (call_frame_t *) 0x7f40c404bde0
        old_THIS = (xlator_t *) 0x2533dd0
        op_ret = 0
        op_errno = 0
        fd = (fd_t *) 0x25903c0
        remote_fd = 7
        rsp = (gf_fop_open_rsp_t *) 0x7f40c404c7cc
        local = (client_local_t *) 0x7f40c404b850
        conf = (client_conf_t *) 0x2538b80
        fdctx = (client_fd_ctx_t *) 0x7f40c404c7e0
        ino = 36135004
        gen = 5480397927761314920
#5  0x00007f40d446ac19 in protocol_client_interpret (this=0x2533dd0, trans=0x2539700, 
    hdr_p=0x7f40c404c760 "", hdrlen=116, iobuf=0x0) at client-protocol.c:6529
        ret = -1
        frame = (call_frame_t *) 0x7f40c401aab0
        hdr = (gf_hdr_common_t *) 0x7f40c404c760
        callid = 292901
        type = 4
        op = 10
        conn = (client_connection_t *) 0x2539b60
        __FUNCTION__ = "protocol_client_interpret"
#6  0x00007f40d446b8df in protocol_client_pollin (this=0x2533dd0, trans=0x2539700)
    at client-protocol.c:6827
        conf = (client_conf_t *) 0x2538b80
        ret = 0
        iobuf = (struct iobuf *) 0x0
        hdr = 0x7f40c404c760 ""
        hdrlen = 116
#7  0x00007f40d446bf53 in notify (this=0x2533dd0, event=2, data=0x2539700) at client-protocol.c:6946
        i = 0
        ret = -1
        child_down = 1
        was_not_down = 0
        trans = (transport_t *) 0x2539700
        conn = (client_connection_t *) 0x0
        conf = (client_conf_t *) 0x2538b80
        parent = (xlator_list_t *) 0x0
        __FUNCTION__ = "notify"
#8  0x00007f40d56015dc in xlator_notify (xl=0x2533dd0, event=2, data=0x2539700) at xlator.c:924
        old_THIS = (xlator_t *) 0x7f40d58417c0
        ret = 0
#9  0x00007f40d337f1ef in socket_event_poll_in (this=0x2539700) at socket.c:731
        ret = 0
#10 0x00007f40d337f4e9 in socket_event_handler (fd=9, idx=1, data=0x2539700, poll_in=1, poll_out=0, 
    poll_err=0) at socket.c:831
        this = (transport_t *) 0x2539700
        priv = (socket_private_t *) 0x2539a90
        ret = 0
#11 0x00007f40d5626203 in event_dispatch_epoll_handler (event_pool=0x252e900, events=0x253b790, i=0)
    at event.c:804
        event_data = (struct event_data *) 0x253b794
        handler = (event_handler_t) 0x7f40d337f432 <socket_event_handler>
        data = (void *) 0x2539700
        idx = 1
        ret = -1
        __FUNCTION__ = "event_dispatch_epoll_handler"
#12 0x00007f40d56263d2 in event_dispatch_epoll (event_pool=0x252e900) at event.c:867
        events = (struct epoll_event *) 0x253b790
        size = 1
        i = 0
        ret = 1
        __FUNCTION__ = "event_dispatch_epoll"
#13 0x00007f40d56266e3 in event_dispatch (event_pool=0x252e900) at event.c:975
        ret = -1
        __FUNCTION__ = "event_dispatch"
#14 0x00000000004063d3 in main (argc=4, argv=0x7fffdda5e748) at glusterfsd.c:1425
        ctx = (glusterfs_ctx_t *) 0x252d010
        cmd_args = (cmd_args_t *) 0x252d010
        pool = (call_pool_t *) 0x252e9a0
        stbuf = {st_dev = 139916439052352, st_ino = 15773923, st_nlink = 4294967487, 
  st_mode = 3584355008, st_uid = 32576, st_gid = 3584410456, pad0 = 32576, st_rdev = 140736912025280, 
  st_size = 139916439044096, st_blksize = 4200312, st_blocks = 0, st_atim = {tv_sec = 4429057797, 
    tv_nsec = 705412764499181569}, st_mtim = {tv_sec = 139916438987952, tv_nsec = 140736912025416}, 
  st_ctim = {tv_sec = 1970169159, tv_nsec = 11}, __unused = {32768, 4, 140736912025416}}
        tmp_logfile = '\0' <repeats 1023 times>
        tmp_logfile_dyn = 0x0
        tmp_logfilebase = 0x0
        timestr = '\0' <repeats 255 times>
        utime = 1278420147
        tm = (struct tm *) 0x0
        ret = 63
        lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
        specfp = (FILE *) 0x252ec60
        graph = (xlator_t *) 0x252eec0
        trav = (xlator_t *) 0x0
        fuse_volume_found = 0
        xl_count = 8
        process_mode = 1 '\001'
        pipe_fd = {6, 7}
        gf_success = 0
        gf_failure = -1
        __FUNCTION__ = "main"

Thread 1 (process 17097):
#0  0x00007f40d3fea114 in ra_fstat (frame=0x7f40c404f5b0, this=0x2534bd0, fd=0x25eaca0)
    at read-ahead.c:750
        file = (ra_file_t *) 0x7f40c404c400
        iter_fd = (fd_t *) 0x25903c0
        inode = (inode_t *) 0x7f40cc000bb0
        ret = 0
        tmp_file = 139916143281152
#1  0x00007f40d5606c1c in default_fstat (frame=0x7f40c404f550, this=0x25354b0, fd=0x25eaca0)
    at defaults.c:673
        _new = (call_frame_t *) 0x7f40c404f5b0
        old_THIS = (xlator_t *) 0x25354b0
        tmp_cbk = (fop_fstat_cbk_t) 0x7f40d56069c6 <default_fstat_cbk>
#2  0x00007f40d3bceca5 in qr_fstat (frame=0x7f40c404f4f0, this=0x2535e50, fd=0x25eaca0)
    at quick-read.c:1184
        _new = (call_frame_t *) 0x7f40c404f550
        old_THIS = (xlator_t *) 0x2535e50
        tmp_cbk = (fop_fstat_cbk_t) 0x7f40d3bce64f <qr_fstat_cbk>
        qr_fd_ctx = (qr_fd_ctx_t *) 0x253f600
        need_open = 0 '\0'
        can_wind = 1 '\001'
        need_unwind = 0 '\0'
        value = 39056896
        ret = 0
        op_ret = -1
        op_errno = -1
        stub = (call_stub_t *) 0x0
        loc = {path = 0x0, name = 0x0, ino = 0, inode = 0x0, parent = 0x0}
        path = 0x254f0e0 "/extMapping.php"
        flags = 32768
#3  0x00007f40d39be44d in wb_fstat_helper (frame=0x7f40c404c3a0, this=0x25367f0, fd=0x25eaca0)
    at write-behind.c:698
        _new = (call_frame_t *) 0x7f40c404f4f0
        old_THIS = (xlator_t *) 0x25367f0
        tmp_cbk = (fop_fstat_cbk_t) 0x7f40d39be170 <wb_fstat_cbk>
#4  0x00007f40d561bacd in call_resume_wind (stub=0x7f40c404f170) at call-stub.c:2607
        __FUNCTION__ = "call_resume_wind"
#5  0x00007f40d562180a in call_resume (stub=0x7f40c404f170) at call-stub.c:4304
        old_THIS = (xlator_t *) 0x25367f0
        __FUNCTION__ = "call_resume"
#6  0x00007f40d39c15ce in wb_resume_other_requests (frame=0x7f40c404c3a0, file=0x254f1c0, 
    other_requests=0x42656e20) at write-behind.c:1621
        ret = 0
        request = (wb_request_t *) 0x7f40c404f480
        dummy = (wb_request_t *) 0x42656df0
        fops_removed = 0
        wind = 1 '\001'
        stub = (call_stub_t *) 0x7f40c404f170
#7  0x00007f40d39c1678 in wb_do_ops (frame=0x7f40c404c3a0, file=0x254f1c0, winds=0x42656e40, 
    unwinds=0x42656e30, other_requests=0x42656e20) at write-behind.c:1649
        ret = 0
#8  0x00007f40d39c1cef in wb_process_queue (frame=0x7f40c404c3a0, file=0x254f1c0, flush_all=1 '\001')
    at write-behind.c:1817
        winds = {next = 0x42656e40, prev = 0x42656e40}
        unwinds = {next = 0x42656e30, prev = 0x42656e30}
        other_requests = {next = 0x7f40c404f4b0, prev = 0x7f40c404f4b0}
        size = 131072
        conf = (wb_conf_t *) 0x25384d0
        count = 1
        ret = -1
#9  0x00007f40d39be7b7 in wb_fstat (frame=0x7f40c404c3a0, this=0x25367f0, fd=0x25eaca0)
    at write-behind.c:755
        file = (wb_file_t *) 0x254f1c0
        local = (wb_local_t *) 0x7f40c404c570
        tmp_file = 39121344
        stub = (call_stub_t *) 0x7f40c404f170
        request = (wb_request_t *) 0x7f40c404f480
        ret = -1
        op_errno = 22
        __FUNCTION__ = "wb_fstat"
#10 0x00007f40d5606c1c in default_fstat (frame=0x7f40c404c9f0, this=0x2537270, fd=0x25eaca0)
    at defaults.c:673
        _new = (call_frame_t *) 0x7f40c404c3a0
        old_THIS = (xlator_t *) 0x2537270
        tmp_cbk = (fop_fstat_cbk_t) 0x7f40d56069c6 <default_fstat_cbk>
#11 0x00007f40d358cf34 in fuse_getattr (this=0x252eec0, finh=0x7f40c404cd40, msg=0x7f40c404cd68)
    at fuse-bridge.c:812
        _new = (call_frame_t *) 0x7f40c404c9f0
        old_THIS = (xlator_t *) 0x252eec0
        tmp_cbk = (fop_fstat_cbk_t) 0x7f40d358c057 <fuse_attr_cbk>
        frame = (call_frame_t *) 0x7f40c404c988
        xl = (xlator_t *) 0x2537270
        state = (fuse_state_t *) 0x7f40c404c830
        fd = (fd_t *) 0x25eaca0
        ret = 0
        __FUNCTION__ = "fuse_getattr"
#12 0x00007f40d3598cb9 in fuse_thread_proc (data=0x252eec0) at fuse-bridge.c:3190
        mount_point = 0x0
        this = (xlator_t *) 0x252eec0
        priv = (fuse_private_t *) 0x2538280
        res = 56
        iobuf = (struct iobuf *) 0x252e8c8
        finh = (fuse_in_header_t *) 0x7f40c404cd40
        iov_in = {{iov_base = 0x7f40c404cd40, iov_len = 80}, {iov_base = 0x7f40d4e54000, 
    iov_len = 131072}}
        msg = (void *) 0x7f40c404cd68
        msg0_size = 168
        ret = 0
        now = {tv_sec = 1278420147, tv_usec = 623243}
        timeout = {tv_sec = 1278420148, tv_nsec = 623243000}
        __FUNCTION__ = "fuse_thread_proc"
#13 0x00007f40d51cdfc7 in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#14 0x00007f40d4f4364d in clone () from /lib/libc.so.6
No symbol table info available.
#15 0x0000000000000000 in ?? ()
No symbol table info available.

Comment 2 Thomas Séglard 2010-07-09 12:15:34 UTC
(En réponse au commentaire 1)
> This looks to be a duplicate of bug 762600. Bug 868 has been fixed in the 3.0.5
> release candidates. Can you please check if your crash is fixed by 3.0.5rc9 or
> by applying this patch on to GlusterFS 3.0.4?
> 
> http://patches.gluster.com/patch/3213

I applied your patch on glusterfs 3.0.4 and now everything is perfect. I set this ticket has "solved". Thanks !