Bug 810450

Summary: glusterfs process crashed while running parallel dbench on multiple clients
Product: [Community] GlusterFS
Reporter: Vijaykumar Koppad <vkoppad>
Component: fuse
Assignee: shishir gowda <sgowda>
Status: CLOSED CURRENTRELEASE
QA Contact: Vijaykumar Koppad <vkoppad>
Severity: high
Priority: unspecified
Version: mainline
CC: bbandari, gluster-bugs, nsathyan, shmohan, vbellur, vbhat
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Last Closed: 2013-07-24 17:37:58 UTC
Type: Bug
Bug Blocks: 817967    
Attachments:
Description: Client log file; Flags: none

Description Vijaykumar Koppad 2012-04-06 08:12:12 UTC
Created attachment 575653: Client log file

Description of problem:
The glusterfs client process crashed while dbench was running on multiple clients simultaneously, in a geo-replication setup with a distributed-stripe volume as master.

Version-Release number of selected component (if applicable):
[cc2e9ad0751da55dfdcd86fea2d5b312a1cbd1b5]

Steps to Reproduce:
1. Set up geo-replication over ssh with a distributed-stripe volume as master.
2. Run dbench on about 10 clients simultaneously.

Backtrace from gdb:
############################################################################


#0  0x00007f6583b1299b in stripe_writev (frame=0x7f658729ade4, this=0x16199d0, fd=0x16e4db4, vector=0x7f657401c240, count=1, offset=131072, flags=32770, iobref=0x7f65740330c0, xdata=0x0) at stripe.c:3508
#1  0x00007f65838d7469 in dht_writev (frame=0x7f6587291a14, this=0x161a590, fd=0x16e4db4, vector=0x7f657401c240, count=1, off=131072, flags=32770, iobref=0x7f65740330c0, xdata=0x0) at dht-inode-write.c:158
#2  0x00007f6583687251 in wb_sync (frame=0x7f6587093bf4, file=0x17b1420, winds=0x7f657bffe630) at write-behind.c:547
#3  0x00007f658368d8bd in wb_do_ops (frame=0x7f6587093bf4, file=0x17b1420, winds=0x7f657bffe630, unwinds=0x7f657bffe620, other_requests=0x7f657bffe610) at write-behind.c:1884
#4  0x00007f658368e131 in wb_process_queue (frame=0x7f6587093bf4, file=0x17b1420) at write-behind.c:2074
#5  0x00007f658368ebca in wb_writev (frame=0x7f658729a5d4, this=0x161b870, fd=0x16e4db4, vector=0x7f657400f6f0, count=1, offset=131072, flags=32770, iobref=0x7f657400fd90, xdata=0x0) at write-behind.c:2197
#6  0x00007f658347bea1 in ra_writev (frame=0x7f658729512c, this=0x161caa0, fd=0x16e4db4, vector=0x7f657400f6f0, count=1, offset=131072, flags=32770, iobref=0x7f657400fd90, xdata=0x0) at read-ahead.c:691
#7  0x00007f658326a258 in ioc_writev (frame=0x7f6587296758, this=0x161dc50, fd=0x16e4db4, vector=0x7f657400f6f0, count=1, offset=131072, flags=32770, iobref=0x7f657400fd90, xdata=0x0) at io-cache.c:1250
#8  0x00007f658304e884 in qr_writev (frame=0x7f6587298238, this=0x161ee00, fd=0x16e4db4, vector=0x7f657400f6f0, count=1, off=131072, wr_flags=32770, iobref=0x7f657400fd90, xdata=0x0) at quick-read.c:1544
#9  0x00007f6582e3e3d2 in mdc_writev (frame=0x7f658729ade4, this=0x1620010, fd=0x16e4db4, vector=0x7f657400f6f0, count=1, offset=131072, flags=32770, iobref=0x7f657400fd90, xdata=0x0) at md-cache.c:1342
#10 0x00007f6582c2c666 in io_stats_writev (frame=0x7f6587291a14, this=0x1621290, fd=0x16e4db4, vector=0x7f657400f6f0, count=1, offset=131072, flags=32770, iobref=0x7f657400fd90, xdata=0x0) at io-stats.c:2082
#11 0x00007f6586bb0b96 in fuse_write_resume (state=0x7f657400eff0) at fuse-bridge.c:2042
#12 0x00007f6586ba4164 in fuse_resolve_done (state=0x7f657400eff0) at fuse-resolve.c:453
#13 0x00007f6586ba423a in fuse_resolve_all (state=0x7f657400eff0) at fuse-resolve.c:482
#14 0x00007f6586ba412d in fuse_resolve (state=0x7f657400eff0) at fuse-resolve.c:439
#15 0x00007f6586ba4211 in fuse_resolve_all (state=0x7f657400eff0) at fuse-resolve.c:478
#16 0x00007f6586ba42b4 in fuse_resolve_continue (state=0x7f657400eff0) at fuse-resolve.c:498
#17 0x00007f6586ba3ef0 in fuse_resolve_fd (state=0x7f657400eff0) at fuse-resolve.c:351
#18 0x00007f6586ba40db in fuse_resolve (state=0x7f657400eff0) at fuse-resolve.c:428
#19 0x00007f6586ba41bc in fuse_resolve_all (state=0x7f657400eff0) at fuse-resolve.c:471
#20 0x00007f6586ba42f2 in fuse_resolve_and_resume (state=0x7f657400eff0, fn=0x7f6586bb05e5 <fuse_write_resume>) at fuse-resolve.c:511
#21 0x00007f6586bb0db3 in fuse_write (this=0x1601ad0, finh=0x7f6574002ce0, msg=0x7f658787c000) at fuse-bridge.c:2089
#22 0x00007f6586bba6e8 in fuse_thread_proc (data=0x1601ad0) at fuse-bridge.c:3962
#23 0x0000003259c077f1 in start_thread () from /lib64/libpthread.so.0
#24 0x00000032594e5ccd in clone () from /lib64/libc.so.6
(gdb) f 0 
#0  0x00007f6583b1299b in stripe_writev (frame=0x7f658729ade4, this=0x16199d0, fd=0x16e4db4, vector=0x7f657401c240, count=1, offset=131072, flags=32770, iobref=0x7f65740330c0, xdata=0x0) at stripe.c:3508
3508	                STACK_WIND (frame, stripe_writev_cbk, fctx->xl_array[idx],
(gdb) f 1
#1  0x00007f65838d7469 in dht_writev (frame=0x7f6587291a14, this=0x161a590, fd=0x16e4db4, vector=0x7f657401c240, count=1, off=131072, flags=32770, iobref=0x7f65740330c0, xdata=0x0) at dht-inode-write.c:158
158	        STACK_WIND (frame, dht_writev_cbk,
(gdb) f 3 
#3  0x00007f658368d8bd in wb_do_ops (frame=0x7f6587093bf4, file=0x17b1420, winds=0x7f657bffe630, unwinds=0x7f657bffe620, other_requests=0x7f657bffe610) at write-behind.c:1884
1884	        ret = wb_sync (frame, file, winds);
(gdb) f 2 
#2  0x00007f6583687251 in wb_sync (frame=0x7f6587093bf4, file=0x17b1420, winds=0x7f657bffe630) at write-behind.c:547
547	                        STACK_WIND (sync_frame, wb_sync_cbk,
#######################################################
Backtrace from the log:
#######################################################

[2012-04-05 04:10:25.870674] I [rpc-clnt.c:1669:rpc_clnt_reconfig] 0-doa-client-0: changing port to 24009 (from 0)
[2012-04-05 04:10:25.870873] I [rpc-clnt.c:1669:rpc_clnt_reconfig] 0-doa-client-1: changing port to 24010 (from 0)
[2012-04-05 04:10:25.871021] I [rpc-clnt.c:1669:rpc_clnt_reconfig] 0-doa-client-2: changing port to 24011 (from 0)
[2012-04-05 04:10:25.871178] I [rpc-clnt.c:1669:rpc_clnt_reconfig] 0-doa-client-3: changing port to 24012 (from 0)
[2012-04-05 04:10:25.871332] I [client.c:136:client_register_grace_timer] 0-doa-client-0: Registering a grace timer
[2012-04-05 04:10:25.871389] I [client.c:136:client_register_grace_timer] 0-doa-client-1: Registering a grace timer
[2012-04-05 04:10:25.871428] I [client.c:136:client_register_grace_timer] 0-doa-client-2: Registering a grace timer
[2012-04-05 04:10:25.871465] I [client.c:136:client_register_grace_timer] 0-doa-client-3: Registering a grace timer
[2012-04-05 04:10:29.833480] W [client.c:2078:client_rpc_notify] 0-doa-client-0: Cancelling the grace timer
[2012-04-05 04:10:29.833684] I [client-handshake.c:1632:select_server_supported_programs] 0-doa-client-0: Using Program GlusterFS 3git, Num (1298437), Version (330)
[2012-04-05 04:10:29.834137] I [client-handshake.c:1429:client_setvolume_cbk] 0-doa-client-0: Connected to 172.17.251.54:24009, attached to remote volume '/exportdir/d1'.
[2012-04-05 04:10:29.834170] I [client-handshake.c:1441:client_setvolume_cbk] 0-doa-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2012-04-05 04:10:29.892033] I [client-handshake.c:456:client_set_lk_version_cbk] 0-doa-client-0: Server lk version = 1
[2012-04-05 04:10:29.892582] W [client.c:2078:client_rpc_notify] 0-doa-client-1: Cancelling the grace timer
[2012-04-05 04:10:29.892777] I [client-handshake.c:1632:select_server_supported_programs] 0-doa-client-1: Using Program GlusterFS 3git, Num (1298437), Version (330)
[2012-04-05 04:10:29.893162] I [client-handshake.c:1429:client_setvolume_cbk] 0-doa-client-1: Connected to 172.17.251.54:24010, attached to remote volume '/exportdir/d2'.
[2012-04-05 04:10:29.893195] I [client-handshake.c:1441:client_setvolume_cbk] 0-doa-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2012-04-05 04:10:29.893858] I [client-handshake.c:456:client_set_lk_version_cbk] 0-doa-client-1: Server lk version = 1
[2012-04-05 04:10:29.899012] W [client.c:2078:client_rpc_notify] 0-doa-client-2: Cancelling the grace timer
[2012-04-05 04:10:29.899178] I [client-handshake.c:1632:select_server_supported_programs] 0-doa-client-2: Using Program GlusterFS 3git, Num (1298437), Version (330)
[2012-04-05 04:10:29.899548] I [client-handshake.c:1429:client_setvolume_cbk] 0-doa-client-2: Connected to 172.17.251.54:24011, attached to remote volume '/exportdir/d3'.
[2012-04-05 04:10:29.899591] I [client-handshake.c:1441:client_setvolume_cbk] 0-doa-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2012-04-05 04:10:29.899868] I [client-handshake.c:456:client_set_lk_version_cbk] 0-doa-client-2: Server lk version = 1
[2012-04-05 04:10:29.904640] W [client.c:2078:client_rpc_notify] 0-doa-client-3: Cancelling the grace timer
[2012-04-05 04:10:29.904841] I [client-handshake.c:1632:select_server_supported_programs] 0-doa-client-3: Using Program GlusterFS 3git, Num (1298437), Version (330)
[2012-04-05 04:10:29.905198] I [client-handshake.c:1429:client_setvolume_cbk] 0-doa-client-3: Connected to 172.17.251.54:24012, attached to remote volume '/exportdir/d4'.
[2012-04-05 04:10:29.905229] I [client-handshake.c:1441:client_setvolume_cbk] 0-doa-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2012-04-05 04:10:29.932341] I [fuse-bridge.c:4081:fuse_graph_setup] 0-fuse: switched to graph 0
[2012-04-05 04:10:29.932548] I [client-handshake.c:456:client_set_lk_version_cbk] 0-doa-client-3: Server lk version = 1
[2012-04-05 04:10:29.932877] I [fuse-bridge.c:3358:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.13
[2012-04-05 04:19:01.353877] W [client3_1-fops.c:263:client3_1_mknod_cbk] 0-doa-client-2: remote operation failed: File exists. Path: /clients/client13/~dmtmp/PM/MOVED.DOC
[2012-04-05 04:19:02.085299] W [client3_1-fops.c:263:client3_1_mknod_cbk] 0-doa-client-2: remote operation failed: File exists. Path: /clients/client10/~dmtmp/PM/T1.XLS
pending frames:
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
frame : type(1) op(WRITE)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-04-05 04:19:34
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x3259432900]
/usr/local/lib/glusterfs/3git/xlator/cluster/stripe.so(stripe_writev+0x798)[0x7f8fa8c7d99b]
/usr/local/lib/glusterfs/3git/xlator/cluster/distribute.so(dht_writev+0x50d)[0x7f8fa8a42469]
/usr/local/lib/glusterfs/3git/xlator/performance/write-behind.so(wb_sync+0x852)[0x7f8fa87f2251]
/usr/local/lib/glusterfs/3git/xlator/performance/write-behind.so(wb_do_ops+0x144)[0x7f8fa87f88bd]
/usr/local/lib/glusterfs/3git/xlator/performance/write-behind.so(wb_process_queue+0x2d1)[0x7f8fa87f9131]
/usr/local/lib/glusterfs/3git/xlator/performance/write-behind.so(wb_writev+0x8e0)[0x7f8fa87f9bca]
/usr/local/lib/glusterfs/3git/xlator/performance/read-ahead.so(ra_writev+0x407)[0x7f8fa85e6ea1]
/usr/local/lib/glusterfs/3git/xlator/performance/io-cache.so(ioc_writev+0x49a)[0x7f8fa83d5258]
/usr/local/lib/glusterfs/3git/xlator/performance/quick-read.so(qr_writev+0x790)[0x7f8fa81b9884]
/usr/local/lib/glusterfs/3git/xlator/performance/md-cache.so(mdc_writev+0x28f)[0x7f8fa3dfa3d2]
/usr/local/lib/glusterfs/3git/xlator/debug/io-stats.so(io_stats_writev+0x433)[0x7f8fa3be8666]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(fuse_write_resume+0x5b1)[0x7f8fabd1bb96]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x8164)[0x7f8fabd0f164]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x823a)[0x7f8fabd0f23a]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x812d)[0x7f8fabd0f12d]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x8211)[0x7f8fabd0f211]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(fuse_resolve_continue+0x24)[0x7f8fabd0f2b4]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x7ef0)[0x7f8fabd0eef0]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x80db)[0x7f8fabd0f0db]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x81bc)[0x7f8fabd0f1bc]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(fuse_resolve_and_resume+0x37)[0x7f8fabd0f2f2]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x14db3)[0x7f8fabd1bdb3]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x1e6e8)[0x7f8fabd256e8]
/lib64/libpthread.so.0[0x3259c077f1]
/lib64/libc.so.6(clone+0x6d)[0x32594e5ccd]

Comment 1 Anand Avati 2012-04-20 12:56:31 UTC
CHANGE: http://review.gluster.com/3190 (stripe: make sure we have complete set of subvolumes before making fop) merged in master by Vijay Bellur (vijay)
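
The change title above says the stripe translator now checks that it has a complete set of subvolumes before issuing a fop, and frame 0 of the gdb backtrace stops at a STACK_WIND that indexes fctx->xl_array[idx]. One plausible reading is that a slot in that array was not yet populated when the write was wound. The following is a minimal sketch of that kind of guard, not the merged patch: the struct names, the function names, and the choice of -EIO are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real GlusterFS types. */
struct subvol_sketch {
    const char *name;
};

struct stripe_fd_ctx_sketch {
    int stripe_count;                /* number of stripe children expected */
    struct subvol_sketch **xl_array; /* one slot per child; NULL if unknown */
};

/* Return 1 when every expected child slot is populated, 0 otherwise. */
int ctx_is_complete(const struct stripe_fd_ctx_sketch *fctx)
{
    if (fctx == NULL || fctx->xl_array == NULL || fctx->stripe_count <= 0)
        return 0;
    for (int i = 0; i < fctx->stripe_count; i++) {
        if (fctx->xl_array[i] == NULL)
            return 0;
    }
    return 1;
}

/*
 * Select the child that would receive a write.
 * Returns 0 and stores the child in *out when the context is complete and
 * idx is in range; returns -EIO otherwise (error code assumed for the sketch)
 * instead of dereferencing a missing slot.
 */
int pick_child(const struct stripe_fd_ctx_sketch *fctx, int idx,
               struct subvol_sketch **out)
{
    if (!ctx_is_complete(fctx) || idx < 0 || idx >= fctx->stripe_count)
        return -EIO;
    /* Invariant established by the check above. */
    assert(fctx->xl_array[idx] != NULL);
    *out = fctx->xl_array[idx];
    return 0;
}
```

In this sketch a caller would fail the write with an error when pick_child returns non-zero, rather than winding to a NULL child and taking signal 11.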

Comment 2 shishir gowda 2012-04-20 13:16:21 UTC
*** Bug 804274 has been marked as a duplicate of this bug. ***

Comment 3 shishir gowda 2012-04-27 05:37:24 UTC
*** Bug 786094 has been marked as a duplicate of this bug. ***