Bug 764219 (GLUSTER-2487)

Summary: Crash at rebuild
Product: [Community] GlusterFS
Reporter: Lukasz Jagiello <l.jagiello>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Version: mainline
CC: gluster-bugs
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix

Description Lukasz Jagiello 2011-03-03 14:56:40 UTC
Gluster version: 3.1.3qa2 (3.1.2 is also affected).

Configuration:
volume ftp-client-0
    type protocol/client
    option remote-host 10.0.2.10
    option remote-subvolume /d0/ftp
    option transport-type tcp
end-volume

volume ftp-client-1
    type protocol/client
    option remote-host 10.0.2.11
    option remote-subvolume /d0/ftp
    option transport-type tcp
end-volume

volume ftp-replicate-0
    type cluster/replicate
    subvolumes ftp-client-0 ftp-client-1
end-volume

volume ftp-write-behind
    type performance/write-behind
    subvolumes ftp-replicate-0
end-volume

volume ftp-read-ahead
    type performance/read-ahead
    subvolumes ftp-write-behind
end-volume

volume ftp-io-cache
    type performance/io-cache
    option cache-size 256MB
    subvolumes ftp-read-ahead
end-volume

volume ftp-quick-read
    type performance/quick-read
    option cache-size 256MB
    subvolumes ftp-io-cache
end-volume

volume ftp-stat-prefetch
    type performance/stat-prefetch
    subvolumes ftp-quick-read
end-volume

volume ftp
    type debug/io-stats
    subvolumes ftp-stat-prefetch
end-volume

client-1 has empty storage. When I start gluster, CPU usage goes to 100% and, after a random amount of time, the daemon crashes.

# gdb /opt/glusterfs/3.1.3/sbin/glusterfsd /core.9001 
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(no debugging symbols found)

warning: exec file is newer than core file.
Reading symbols from /opt/glusterfs/3.1.3/lib64/libglusterfs.so.0...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/libglusterfs.so.0
Reading symbols from /opt/glusterfs/3.1.3/lib64/libgfrpc.so.0...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/libgfrpc.so.0
Reading symbols from /opt/glusterfs/3.1.3/lib64/libgfxdr.so.0...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/libgfxdr.so.0
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/rpc-transport/socket.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/rpc-transport/socket.so
Reading symbols from /lib64/libnss_files.so.2...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/storage/posix.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/storage/posix.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/features/access-control.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/features/access-control.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/features/locks.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/features/locks.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/protocol/client.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/protocol/client.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/cluster/pump.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/cluster/pump.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/performance/io-threads.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/performance/io-threads.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/debug/io-stats.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/debug/io-stats.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/protocol/server.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/xlator/protocol/server.so
Reading symbols from /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/auth/addr.so...done.
Loaded symbols for /opt/glusterfs/3.1.3/lib64/glusterfs/3.1.3qa2/auth/addr.so
Core was generated by `/opt/glusterfs/3.1.3/sbin/glusterfsd --xlator-option ftp-server.listen-port=240'.
Program terminated with signal 11, Segmentation fault.
[New process 9015]
[New process 9033]
[New process 9028]
[New process 9018]
[New process 9017]
[New process 9016]
[New process 9014]
[New process 9012]
[New process 9010]
[New process 9009]
[New process 9008]
[New process 9004]
[New process 9002]
[New process 9001]
#0  0x00002aaaabe6dd10 in io_stats_setattr_cbk (frame=0x2b66cf2ff8a8, cookie=0x2b66cf281ec0, this=0x19679f70, op_ret=0, op_errno=0, preop=0x2aaab413d858, postop=0x2aaab413d8c8) at io-stats.c:572
572	io-stats.c: No such file or directory.
	in io-stats.c

backtrace:
#v+
#9022 0x00002aaaaba282bd in afr_unlock_inodelk (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:625
#9023 0x00002aaaaba28d6c in afr_unlock (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:1680
#9024 0x00002aaaaba11a8b in afr_changelog_post_op_cbk (frame=0x2b66cf40cbf0, cookie=<value optimized out>, this=0x19677dd0, op_ret=<value optimized out>, op_errno=<value optimized out>, xattr=<value optimized out>) at afr-transaction.c:399
#9025 0x00002b66ce304ca9 in default_xattrop_cbk (frame=0x2b66cf316df8, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=38, dict=0x2aaab0034240) at defaults.c:309
#9026 0x00002b66ce304ca9 in default_xattrop_cbk (frame=0x2b66cf30ff1c, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=38, dict=0x2aaab0034240) at defaults.c:309
#9027 0x00002aaaab196de4 in do_xattrop (frame=0x2b66cf3513c4, this=0x19673940, loc=0x2aaab00e5c38, fd=<value optimized out>, optype=GF_XATTROP_ADD_ARRAY, xattr=0x2aaab0034240) at posix.c:3694
#9028 0x00002aaaab197651 in posix_xattrop (frame=0x2b66cf2ff8a8, this=0x2b66cf281ec0, loc=0x19679f70, optype=GF_XATTROP_ADD_ARRAY, xattr=0x2aaab413d858) at posix.c:3703
#9029 0x00002b66ce2fe8e9 in default_xattrop (frame=<value optimized out>, this=0x19674b20, loc=0x2aaab00e5c38, flags=GF_XATTROP_ADD_ARRAY, dict=0x2aaab0034240) at defaults.c:1022
#9030 0x00002b66ce2fe8e9 in default_xattrop (frame=<value optimized out>, this=0x19675e60, loc=0x2aaab00e5c38, flags=GF_XATTROP_ADD_ARRAY, dict=0x2aaab0034240) at defaults.c:1022
#9031 0x00002aaaaba13807 in afr_changelog_post_op (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-transaction.c:648
#9032 0x00002aaaaba13be6 in afr_transaction_resume (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-transaction.c:1188
#9033 0x00002aaaaba0ae9c in afr_setattr_wind_cbk (frame=0x2b66cf40cbf0, cookie=0x0, this=0x19677dd0, op_ret=0, op_errno=0, preop=0x439ee390, postop=0x439ee320) at afr-inode-write.c:875
#9034 0x00002b66ce304248 in default_setattr_cbk (frame=0x2b66cf2fbf6c, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=0, statpre=0x439ee390, statpost=0x439ee320) at defaults.c:405
#9035 0x00002b66ce304248 in default_setattr_cbk (frame=0x2b66cf2f8000, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=0, statpre=0x439ee390, statpost=0x439ee320) at defaults.c:405
#9036 0x00002aaaab19100b in posix_setattr (frame=0x2b66cf35c83c, this=0x19673940, loc=<value optimized out>, stbuf=0x2aaab00e64a0, valid=48) at posix.c:709
#9037 0x00002aaaab3a4559 in ac_setattr_resume (frame=<value optimized out>, this=0x19674b20, loc=0x2aaab00e5c38, buf=0x2aaab00e64a0, valid=48) at access-control.c:1670
#9038 0x00002aaaab3a75c9 in ac_setattr (frame=0x2b66cf2f8000, this=0x19674b20, loc=0x2aaab00e5c38, buf=0x2aaab00e64a0, valid=48) at access-control.c:1751
#9039 0x00002b66ce2fd979 in default_setattr (frame=<value optimized out>, this=0x19675e60, loc=0x2aaab00e5c38, stbuf=0x2aaab00e64a0, valid=48) at defaults.c:1131
#9040 0x00002aaaaba0a17b in afr_setattr_wind (frame=0x2b66cf40cbf0, this=<value optimized out>) at afr-inode-write.c:905
#9041 0x00002aaaaba11bae in afr_changelog_pre_op_cbk (frame=0x2b66cf40cbf0, cookie=<value optimized out>, this=0x19677dd0, op_ret=<value optimized out>, op_errno=22, xattr=<value optimized out>) at afr-transaction.c:725
#9042 0x00002b66ce304ca9 in default_xattrop_cbk (frame=0x2b66cf30a144, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=22, dict=0x2aaab00399e0) at defaults.c:309
#9043 0x00002b66ce304ca9 in default_xattrop_cbk (frame=0x2b66cf30a7f8, cookie=<value optimized out>, this=<value optimized out>, op_ret=0, op_errno=22, dict=0x2aaab00399e0) at defaults.c:309
#9044 0x00002aaaab196de4 in do_xattrop (frame=0x2b66cf317d70, this=0x19673940, loc=0x2aaab00e5c38, fd=<value optimized out>, optype=GF_XATTROP_ADD_ARRAY, xattr=0x2aaab00399e0) at posix.c:3694
#9045 0x00002aaaab197651 in posix_xattrop (frame=0x2b66cf2ff8a8, this=0x2b66cf281ec0, loc=0x19679f70, optype=GF_XATTROP_ADD_ARRAY, xattr=0x2aaab413d858) at posix.c:3703
#9046 0x00002b66ce2fe8e9 in default_xattrop (frame=<value optimized out>, this=0x19674b20, loc=0x2aaab00e5c38, flags=GF_XATTROP_ADD_ARRAY, dict=0x2aaab00399e0) at defaults.c:1022
#9047 0x00002b66ce2fe8e9 in default_xattrop (frame=<value optimized out>, this=0x19675e60, loc=0x2aaab00e5c38, flags=GF_XATTROP_ADD_ARRAY, dict=0x2aaab00399e0) at defaults.c:1022
#9048 0x00002aaaaba123c4 in afr_changelog_pre_op (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-transaction.c:914
#9049 0x00002aaaaba12798 in afr_internal_lock_finish (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-transaction.c:1162
#9050 0x00002aaaaba12e0c in afr_post_blocking_inodelk_cbk (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-transaction.c:954
#9051 0x00002aaaaba2a20f in afr_lock_blocking (frame=0x2b66cf40cbf0, this=0x19677dd0, child_index=2) at afr-lk-common.c:992
#9052 0x00002aaaaba2a973 in afr_lock_cbk (frame=0x2b66cf40cbf0, cookie=<value optimized out>, this=0x19677dd0, op_ret=0, op_errno=0) at afr-lk-common.c:756
#9053 0x00002aaaaba2ab64 in afr_blocking_inodelk_cbk (frame=0x2b66cf40cbf0, cookie=0x0, this=0x19677dd0, op_ret=0, op_errno=0) at afr-lk-common.c:770
#9054 0x00002aaaab5b9c72 in pl_common_inodelk (frame=0x2b66cf2aebfc, this=0x19675e60, volume=0x19674a30 "ftp-pump", inode=0x2aaaac56bb4c, cmd=7, flock=0x439eeb50, loc=0x2aaab00e5c38, fd=0x0) at inodelk.c:649
#9055 0x00002aaaab5ba41d in pl_inodelk (frame=0x2b66cf2ff8a8, this=0x2b66cf281ec0, volume=0x19679f70 "�\230g\031", loc=<value optimized out>, cmd=0, flock=0x2aaab413d858) at inodelk.c:659
#9056 0x00002aaaaba2a657 in afr_lock_blocking (frame=0x2b66cf40cbf0, this=0x19677dd0, child_index=0) at afr-lk-common.c:1017
#9057 0x00002aaaaba2a8d5 in afr_blocking_lock (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:1119
#9058 0x00002aaaaba12db2 in afr_post_nonblocking_inodelk_cbk (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-transaction.c:975
#9059 0x00002aaaaba2830e in afr_unlock_inodelk (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:601
#9060 0x00002aaaaba28d6c in afr_unlock (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:1680
#9061 0x00002aaaaba2956e in afr_nonblocking_inodelk_cbk (frame=0x2b66cf40cbf0, cookie=<value optimized out>, this=0x19677dd0, op_ret=<value optimized out>, op_errno=11) at afr-lk-common.c:1351
#9062 0x00002aaaab5b9c72 in pl_common_inodelk (frame=0x2b66cf30e344, this=0x19675e60, volume=0x19674a30 "ftp-pump", inode=0x2aaaac56bb4c, cmd=6, flock=0x439eeed0, loc=0x2aaab00e5c38, fd=0x0) at inodelk.c:649
#9063 0x00002aaaab5ba41d in pl_inodelk (frame=0x2b66cf2ff8a8, this=0x2b66cf281ec0, volume=0x19679f70 "�\230g\031", loc=<value optimized out>, cmd=0, flock=0x2aaab413d858) at inodelk.c:659
#9064 0x00002aaaaba292ef in afr_nonblocking_inodelk (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:1440
#9065 0x00002aaaaba117b3 in afr_lock_rec (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-transaction.c:1110
#9066 0x00002aaaaba12942 in afr_transaction (frame=0x2b66cf40cbf0, this=0x19677dd0, type=AFR_METADATA_TRANSACTION) at afr-transaction.c:1237
#9067 0x00002aaaaba0dec0 in afr_setattr (frame=0x2b66cf3513c4, this=0x19677dd0, loc=0x2b66cf672030, buf=0x2b66cf672058, valid=48) at afr-inode-write.c:990
#9068 0x00002aaaaba35ec8 in pump_setattr (frame=0x2b66cf3513c4, this=0x2b66cf281ec0, loc=0x2b66cf672030, stbuf=0x2b66cf672058, valid=48) at pump.c:2216
#9069 0x00002aaaabc5d6a9 in iot_setattr_wrapper (frame=<value optimized out>, this=0x19678eb0, loc=0x2b66cf672030, stbuf=0x2b66cf672058, valid=48) at io-threads.c:260
#9070 0x00002b66ce30d6c1 in call_resume (stub=0x2b66cf671ff8) at call-stub.c:2480
#9071 0x00002aaaabc61119 in iot_worker (data=0x1967e270) at io-threads.c:129
#9072 0x0000003e7f406367 in start_thread () from /lib64/libpthread.so.0
#9073 0x0000003e7ecd309d in clone () from /lib64/libc.so.6
#v-
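
Note: the frame numbers above run past 9000, which points to very deep recursion rather than an ordinary call chain. A minimal sketch of how such a backtrace could be summarized is shown below. It assumes the backtrace has been saved as plain text; the function name `summarize_backtrace` and the regular expression are illustrative and not part of gdb or GlusterFS.

```python
import re
from collections import Counter

# Matches gdb frame lines such as:
#   #9023 0x00002aaaaba28d6c in afr_unlock (frame=..., this=...) at afr-lk-common.c:1680
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(")


def summarize_backtrace(text):
    """Return (deepest frame number, Counter of function names)."""
    counts = Counter()
    deepest = -1
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skips markers such as "#v+" and pager prompts
        deepest = max(deepest, int(match.group(1)))
        counts[match.group(2)] += 1
    return deepest, counts


# A few frames copied from the backtrace in this report.
sample = """\
#9059 0x00002aaaaba2830e in afr_unlock_inodelk (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:601
#9060 0x00002aaaaba28d6c in afr_unlock (frame=0x2b66cf40cbf0, this=0x19677dd0) at afr-lk-common.c:1680
#9072 0x0000003e7f406367 in start_thread () from /lib64/libpthread.so.0
#9073 0x0000003e7ecd309d in clone () from /lib64/libc.so.6
"""

deepest, counts = summarize_backtrace(sample)
print("deepest frame:", deepest)
print("most frequent functions:", counts.most_common(3))
```

Run against a full backtrace, a handful of functions with very high counts would suggest a repeating call cycle; the sample here is too short to show that by itself.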

Comment 1 Pranith Kumar K 2011-03-07 03:43:54 UTC
hi Lukasz,
    Could you please post the output of "volume info ftp"? You mentioned that client-1 does not have any data. Could you let us know how much data client-0 has, so that we can try to reproduce this in-house? Is the mount type fuse or nfs? Please also let us know whether you issued any commands before you observed the crash.

Pranith

Comment 2 Pranith Kumar K 2011-03-07 04:41:37 UTC
hi,
   I see that the client translator and the server translator are both loaded in the same glusterfs process that crashed. I am not sure you have attached the correct volfile to the bug. According to the backtrace, replicate, posix locks, and io-threads are all loaded in the same glusterfs process. In some cases this results in deep recursion that uses up all the stack space. Please make sure that glusterfs is using the correct volfiles for the client and the bricks.

pranith
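
Note: the check suggested in this comment can be sketched as a small script that lists the translator types in a volfile and flags a graph containing both `protocol/client` and `protocol/server`. This is only an illustration under stated assumptions: the helper names are made up, the parser handles just the `volume` / `type` / `end-volume` lines seen in this report, and the second sample volfile (including the volume name `example-server`) is hypothetical. A positive result is a prompt to inspect the volfile, not proof of a misconfiguration.

```python
def translator_types(volfile_text):
    """Map each volume name in a volfile to its translator type."""
    types = {}
    current = None
    for raw in volfile_text.splitlines():
        line = raw.strip()
        if line.startswith("volume "):
            current = line.split(None, 1)[1]
        elif line.startswith("type ") and current is not None:
            types[current] = line.split(None, 1)[1]
        elif line == "end-volume":
            current = None
    return types


def mixes_client_and_server(volfile_text):
    """True if one graph holds both protocol/client and protocol/server."""
    found = set(translator_types(volfile_text).values())
    return {"protocol/client", "protocol/server"} <= found


# Excerpt of the client volfile posted in the description.
CLIENT_VOLFILE = """\
volume ftp-client-0
    type protocol/client
    option remote-host 10.0.2.10
    option remote-subvolume /d0/ftp
    option transport-type tcp
end-volume

volume ftp-replicate-0
    type cluster/replicate
    subvolumes ftp-client-0 ftp-client-1
end-volume
"""

# Hypothetical graph with a server translator stacked on a client one.
MIXED_VOLFILE = CLIENT_VOLFILE + """
volume example-server
    type protocol/server
    subvolumes ftp-replicate-0
end-volume
"""

print(translator_types(CLIENT_VOLFILE))
print(mixes_client_and_server(CLIENT_VOLFILE))
print(mixes_client_and_server(MIXED_VOLFILE))
```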


Comment 3 Lukasz Jagiello 2011-03-07 07:54:32 UTC
(In reply to comment #1)
> hi Lukasz,
>     Could you please post the output of "volume info ftp". 

#v+
# gluster
gluster> volume info ftp

Volume Name: ftp
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.2.10:/d0/ftp
Brick2: 10.0.2.11:/d0/ftp
Options Reconfigured:
performance.cache-size: 256MB
nfs.enable-ino32: off
gluster> 
#v-

> You mentioned that
> client-1 does not have any data. Could you let us know how much data
> client-0 has, so that we can try to reproduce this in-house? Is the mount type
> fuse or nfs?

About 1.7 TB of data (many files of varying sizes).
Mounted over fuse.

> Please let us know if you have issued any commands before you
> observed the crash.

No, just /etc/init.d/glusterd start

After a few minutes of AFR activity (sometimes more, sometimes less), the daemon crashes.

Comment 4 Lukasz Jagiello 2011-03-07 08:38:05 UTC
(In reply to comment #2)
> hi,
>    I see that the client translator and the server translator are both loaded
> in the same glusterfs process that crashed. I am not sure you have attached
> the correct volfile to the bug. According to the backtrace, replicate, posix
> locks, and io-threads are all loaded in the same glusterfs process. In some
> cases this results in deep recursion that uses up all the stack space. Please
> make sure that glusterfs is using the correct volfiles for the client and the
> bricks.
> 
> pranith

It seems you're right. I noticed some stale entries left in the volfile by the 'replace-brick' I had started in order to replace the machine. It has now been running for 34 minutes without a crash, so hopefully it is fine.

Comment 5 Pranith Kumar K 2011-03-07 09:33:59 UTC
(In reply to comment #4)
> It seems you're right. I noticed some stale entries left in the volfile by
> the 'replace-brick' I had started in order to replace the machine. It has now
> been running for 34 minutes without a crash, so hopefully it is fine.

It happened because the setup was not in a correct state. Would it be OK if I mark this bug as RESOLVED - INVALID, since there isn't much to fix in the code?

Comment 6 Lukasz Jagiello 2011-03-07 09:39:14 UTC
> It happened because the setup was not in a correct state. Would it be OK if I
> mark this bug as RESOLVED - INVALID, since there isn't much to fix in the code?

Sure. It is still syncing but looks good; if something happens I will reopen the case.