The write-behind translator (on the client) segfaults with the following backtrace, reproducible on an FTP server after some time of operation:

pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: v2.0.9
signal received: 11
time of crash: 2010-01-13 21:20:24
configuration details:
argp 1
backtrace 1
bdb->cursor->get 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.9
/lib/libc.so.6[0x7fe87b1a0f40]
/lib/libc.so.6(memcpy+0x15b)[0x7fe87b1ec03b]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(__wb_collapse_write_bufs+0x109)[0x7fe87a1366f9]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xb5)[0x7fe87a137035]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_writev+0x313)[0x7fe87a137af3]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_writev+0x124)[0x7fe879f2f744]
/usr/lib/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_cbk+0x1ee)[0x7fe879d19afe]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_frame_unwind+0x15e)[0x7fe879f30d9e]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_waitq_return+0x1d)[0x7fe879f30e9d]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_fault_cbk+0x175)[0x7fe879f31545]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_readv_cbk+0x8e)[0x7fe87a1377be]
/usr/lib/glusterfs/2.0.9/xlator/performance/quick-read.so(qr_readv_cbk+0x3c)[0x7fe87a33e81c]
/usr/lib/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_frame_return+0x263)[0x7fe87a54b6f3]
/usr/lib/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_waitq_return+0x1d)[0x7fe87a54b86d]
/usr/lib/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_fault_cbk+0x1f7)[0x7fe87a54c877]
/usr/lib/glusterfs/2.0.9/xlator/protocol/client.so(client_readv_cbk+0x1c7)[0x7fe87a7673b7]
/usr/lib/glusterfs/2.0.9/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7fe87a7570ba]
/usr/lib/glusterfs/2.0.9/xlator/protocol/client.so(notify+0xe0)[0x7fe87a75dd20]
/usr/lib/glusterfs/2.0.9/transport/socket.so(socket_event_handler+0xe0)[0x7fe87929f200]
/usr/lib/libglusterfs.so.0[0x7fe87b90d21f]
/usr/sbin/glusterfs(main+0x80b)[0x403fab]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7fe87b18d1a6]
/usr/sbin/glusterfs[0x402789]

Removing the translator "solves" the issue....
<merging Gordan Bobic's report - same bug> I've noticed a very high incidence of the problem I reported a while back, which manifests itself in open files getting corrupted on commit, possibly during conditions that involve server disconnections due to timeouts (very high disk load). Specifically, I've noticed that my .viminfo file got corrupted for the 3rd time today. Since this is root's .viminfo and I'm running glfs as root, though, I don't have the logs to verify the disconnections. From what I can tell, a chunk of a dll somehow ends up in .viminfo, but I'm not sure which one.

On a different volume, I'm seeing other weirdness under the same high disk load conditions (software RAID check/resync on all server nodes). This seems to be specifically related to using writebehind+iocache on the client side on one of the servers, exported via unfsd (the one from the gluster ftp site). What happens is that the /home volume simply seems to disappear underneath unfsd! The attached log indicates a glusterfsd crash. This doesn't happen if I remove the writebehind and io-cache translators.

Other notable things about the setup that might help figure out the cause of this:
- The other two servers are idle - they are not serving any requests. They are, however, also under the same high disk load.
- writebehind and io-cache are only applied on one server, the one being used to export via unfsd. The other servers do not have those translators applied.

The volume config is attached. It is called home-cache.vol, but it is the same file the log refers to, even though it is listed there as home.vol. The problem specifically occurs when the servers are undergoing high load of the described nature, which causes disk latencies to go up massively. I have not observed any instances of a similar crash happening without the writebehind and io-cache translators.

Gordan
================================================================================ Version : glusterfs 2.0.9 built on Dec 24 2009 23:13:15 git: v2.0.9 Starting Time: 2010-01-05 11:18:37 Command line : /usr/sbin/glusterfs --log-level=NORMAL --disable-direct-io-mode --volfile=/etc/glusterfs/home.vol /home PID : 14917 System name : Linux Nodename : raiden.shatteredsilicon.net Kernel Release : 2.6.18-164.9.1.el5 Hardware Identifier: x86_64 Given volfile: +------------------------------------------------------------------------------+ 1: volume home1 2: type protocol/client 3: option transport-type socket 4: option transport.address-family inet 5: option remote-host 10.2.0.11 6: option remote-port 6997 7: option remote-subvolume home1 8: end-volume 9: 10: volume home1-writebehind 11: type performance/write-behind 12: option cache-size 2MB # default is equal to aggregate-size 13: option flush-behind on # default is 'off' 14: option enable-O_SYNC on 15: subvolumes home1 16: end-volume 17: 18: volume home1-iocache 19: type performance/io-cache 20: option cache-size 64MB 21: option cache-timeout 2 # default is 1 second 22: subvolumes home1-writebehind 23: end-volume 24: 25: ############################################################################## 26: 27: volume home3 28: type protocol/client 29: option transport-type socket 30: option transport.address-family inet 31: option remote-host 10.2.0.13 32: option remote-port 6997 33: option remote-subvolume home3 34: end-volume 35: 36: volume home3-writebehind 37: type performance/write-behind 38: option cache-size 2MB # default is equal to aggregate-size 39: option flush-behind on # default is 'off' 40: option enable-O_SYNC on 41: subvolumes home3 42: end-volume 43: 44: volume home3-iocache 45: type performance/io-cache 46: option cache-size 64MB 47: option cache-timeout 2 # default is 1 second 48: subvolumes home3-writebehind 49: end-volume 50: 51: ############################################################################## 52: 53: volume home-store 54: type storage/posix 55: option directory /gluster/home 56: end-volume 57: 58: volume home2 59: type features/posix-locks 60: subvolumes home-store 61: end-volume 62: 63: volume home2-writebehind 64: type performance/write-behind 65: option cache-size 2MB # default is equal to aggregate-size 66: option flush-behind on # default is 'off' 67: option enable-O_SYNC on 68: subvolumes home2 69: end-volume 70: 71: volume home2-iocache 72: type performance/io-cache 73: option cache-size 64MB 74: option cache-timeout 2 # default is 1 second 75: subvolumes home2-writebehind 76: end-volume 77: 78: ############################################################################## 79: 80: volume server 81: type protocol/server 82: option transport-type socket 83: option transport.address-family inet 84: option transport.socket.listen-port 6997 85: subvolumes home2 86: option auth.addr.home2.allow 127.0.0.1,10.* 87: end-volume 88: 89: volume home 90: type cluster/replicate 91: subvolumes home2-iocache home1-iocache home3-iocache 92: option read-subvolume home2-iocache 93: option favorite-child home2-iocache 94: end-volume +------------------------------------------------------------------------------+ [2010-01-05 11:18:37] W [afr.c:2436:init] home: You have specified subvolume 'home2-iocache' as the 'favorite child'. This means that if a discrepancy in the content or attributes (ownership, permission, etc.) 
of a file is detected among the subvolumes, the file on 'home2-iocache' will be considered the definitive version and its contents will OVERWRITE the contents of the file on other subvolumes. All versions of the file except that on 'home2-iocache' WILL BE LOST. [2010-01-05 11:18:37] N [afr.c:2194:notify] home: Subvolume 'home2-iocache' came back up; going online. [2010-01-05 11:18:37] N [afr.c:2194:notify] home: Subvolume 'home2-iocache' came back up; going online. [2010-01-05 11:18:37] N [afr.c:2194:notify] home: Subvolume 'home2-iocache' came back up; going online. [2010-01-05 11:18:37] N [glusterfsd.c:1306:main] glusterfs: Successfully started [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home1: Connected to 10.2.0.11:6997, attached to remote volume 'home1'. [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home1: Connected to 10.2.0.11:6997, attached to remote volume 'home1'. [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home3: Connected to 10.2.0.13:6997, attached to remote volume 'home3'. [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home3: Connected to 10.2.0.13:6997, attached to remote volume 'home3'. [2010-01-05 11:18:37] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.13:1015 [2010-01-05 11:18:37] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.13:1014 [2010-01-05 11:18:47] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.11:1009 [2010-01-05 11:18:47] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.11:1008 [2010-01-05 11:21:05] E [posix.c:3156:do_xattrop] home-store: getxattr failed on /gordan/.gconf/apps/bluetooth-manager while doing xattrop: No such file or directory pending frames: frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) patchset: v2.0.9 signal received: 11 time of crash: 2010-01-05 11:21:06 configuration details: argp 1 backtrace 1 db.h 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 2.0.9 /lib64/libc.so.6[0x3f55e302d0] /lib64/libc.so.6(memcpy+0x15b)[0x3f55e7bf0b] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(__wb_collapse_write_bufs+0x105)[0x2ac6b23bc045] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xa8)[0x2ac6b23bcc68] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_writev+0x373)[0x2ac6b23bf0e3] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev+0x123)[0x2ac6b25c5a13] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_cbk+0x1d6)[0x2ac6b2e2dea6] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_frame_return+0x240)[0x2ac6b25c7390] 
/usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_dispatch_requests+0x25b)[0x2ac6b25c4f4b] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_readv+0x1fa)[0x2ac6b25c679a] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write+0xd7)[0x2ac6b2e2c1c7] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write_iter+0x5d)[0x2ac6b2e2dc8d] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_write_cbk+0x91)[0x2ac6b2e2dfb1] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev_cbk+0x7a)[0x2ac6b25c5aca] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_stack_unwind+0x6a)[0x2ac6b23bbe8a] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_do_ops+0x2c)[0x2ac6b23bcb7c] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xf4)[0x2ac6b23bccb4] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_writev+0x373)[0x2ac6b23bf0e3] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev+0x123)[0x2ac6b25c5a13] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_cbk+0x1d6)[0x2ac6b2e2dea6] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_frame_return+0x240)[0x2ac6b25c7390] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_dispatch_requests+0x25b)[0x2ac6b25c4f4b] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_readv+0x1fa)[0x2ac6b25c679a] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write+0xd7)[0x2ac6b2e2c1c7] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write_iter+0x5d)[0x2ac6b2e2dc8d] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_write_cbk+0x91)[0x2ac6b2e2dfb1] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev_cbk+0x7a)[0x2ac6b25c5aca] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_stack_unwind+0x6a)[0x2ac6b23bbe8a] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_do_ops+0x2c)[0x2ac6b23bcb7c] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xf4)[0x2ac6b23bccb4] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_sync_cbk+0xc5)[0x2ac6b23be2b5] /usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(client_write_cbk+0x14e)[0x2ac6b21b13de] /usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x2ac6b21a18aa] /usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(notify+0x212)[0x2ac6b21a84e2] /usr/lib64/glusterfs/2.0.9/transport/socket.so(socket_event_handler+0xd3)[0x2aaaaaaafe33] /usr/lib64/libglusterfs.so.0[0x3f56a27115] /usr/sbin/glusterfs(main+0xa06)[0x403e96] /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f55e1d994] /usr/sbin/glusterfs[0x402509] --------- volume home1 type protocol/client option transport-type socket option transport.address-family inet option remote-host 10.2.0.11 option remote-port 6997 option remote-subvolume home1 end-volume volume home1-writebehind type performance/write-behind option cache-size 2MB # default is equal to aggregate-size option flush-behind on # default is 'off' option enable-O_SYNC on subvolumes home1 end-volume volume home1-iocache type performance/io-cache option cache-size 64MB option cache-timeout 2 # default is 1 second subvolumes home1-writebehind end-volume ############################################################################## volume home3 type protocol/client option transport-type socket option transport.address-family inet option remote-host 
10.2.0.13 option remote-port 6997 option remote-subvolume home3 end-volume volume home3-writebehind type performance/write-behind option cache-size 2MB # default is equal to aggregate-size option flush-behind on # default is 'off' option enable-O_SYNC on subvolumes home3 end-volume volume home3-iocache type performance/io-cache option cache-size 64MB option cache-timeout 2 # default is 1 second subvolumes home3-writebehind end-volume ############################################################################## volume home-store type storage/posix option directory /gluster/home end-volume volume home2 type features/posix-locks subvolumes home-store end-volume volume home2-writebehind type performance/write-behind option cache-size 2MB # default is equal to aggregate-size option flush-behind on # default is 'off' option enable-O_SYNC on subvolumes home2 end-volume volume home2-iocache type performance/io-cache option cache-size 64MB option cache-timeout 2 # default is 1 second subvolumes home2-writebehind end-volume ############################################################################## volume server type protocol/server option transport-type socket option transport.address-family inet option transport.socket.listen-port 6997 subvolumes home2 option auth.addr.home2.allow 127.0.0.1,10.* end-volume volume home type cluster/replicate subvolumes home2-iocache home1-iocache home3-iocache option read-subvolume home2-iocache option favorite-child home2-iocache end-volume
<from email by Gordan Bobic> Core was generated by `/usr/sbin/glusterfs --log-level=NORMAL --disable-direct-io-mode --volfile=/etc/'. Program terminated with signal 11, Segmentation fault. [New process 14917] [New process 14921] [New process 14918] #0 0x0000003f55e7bf0b in memcpy () from /lib64/libc.so.6 Thread 3 (process 14918): #0 0x0000003f55e9a0b1 in nanosleep () from /lib64/libc.so.6 No symbol table info available. #1 0x0000003f55ecd684 in usleep () from /lib64/libc.so.6 No symbol table info available. #2 0x0000003f56a1a8b4 in gf_timer_proc (ctx=<value optimized out>) at timer.c:177 now = <value optimized out> now_tv = Could not find the frame base for "gf_timer_proc". event = <value optimized out> reg = <value optimized out> __FUNCTION__ = "gf_timer_proc" #3 0x0000003f56606617 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x0000003f55ed3c2d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 2 (process 14921): #0 0x0000003f5660d73b in read () from /lib64/libpthread.so.0 No symbol table info available. #1 0x0000003f5820ec9a in ?? () from /usr/lib64/libfuse.so.2 No symbol table info available. #2 0x0000003f58212650 in fuse_chan_receive () from /usr/lib64/libfuse.so.2 No symbol table info available. #3 0x00002ac6b304a230 in fuse_thread_proc (data=<value optimized out>) at fuse-bridge.c:2541 mount_point = <value optimized out> this = (xlator_t *) 0xa11cdc0 priv = (fuse_private_t *) 0xa12b050 res = 62 iobuf = (struct iobuf *) 0x2aaaac0836b0 chan_size = 135168 ret = <value optimized out> now = {tv_sec = 1262690317, tv_usec = 407596} timeout = {tv_sec = 1262690318, tv_nsec = 407596000} __FUNCTION__ = "fuse_thread_proc" #4 0x0000003f56606617 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #5 0x0000003f55ed3c2d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 1 (process 14917): #0 0x0000003f55e7bf0b in memcpy () from /lib64/libc.so.6 No symbol table info available. 
#1 0x00002ac6b23bc045 in __wb_collapse_write_bufs (requests=0x2aaab40055b8, page_size=131072) at /usr/include/bits/string3.h:51 offset_expected = <value optimized out> space_left = 65536 iov_len = (size_t *) 0x2aaab4012098 write_size = (size_t *) 0x2aaab4010328 ptr = 0x2aaab1ffd000 "\n\n<\001D\004\223" first_request = <value optimized out> request = (wb_request_t *) 0x2aaab402acb0 tmp = (wb_request_t *) 0x2aaab40055b8 #2 0x00002ac6b23bcc68 in wb_process_queue (frame=0x2aaab4020ba0, file=0x2aaab4005580, flush_all=0 '\0') at write-behind.c:1436 winds = {next = 0x7ffffccbff40, prev = 0x7ffffccbff40} unwinds = {next = 0x2aaab402acd0, prev = 0x2aaab402acd0} other_requests = {next = 0x7ffffccbff20, prev = 0x7ffffccbff20} size = 131072 conf = (wb_conf_t *) 0xa129b60 count = <value optimized out> #3 0x00002ac6b23bf0e3 in wb_writev (frame=0x2aaab402f630, this=<value optimized out>, fd=0x2aaab4004170, vector=0x2aaab4025260, count=1, offset=6553600, iobref=0x2aaab4025940) at write-behind.c:1550 file = (wb_file_t *) 0x2aaab1fbf000 process_frame = (call_frame_t *) 0x2aaab4020ba0 size = 169906720 tmp_file = 46912652727680 stub = <value optimized out> op_ret = <value optimized out> op_errno = 0 __FUNCTION__ = "wb_writev" #4 0x00002ac6b25c5a13 in ioc_writev (frame=0x2aaab40278a0, this=0xa1227a0, fd=0x2aaab4004170, vector=0x2aaab4025260, count=1, offset=6553600, iobref=0x2aaab4025940) at io-cache.c:944 _new = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652727808 #5 0x00002ac6b2e2dea6 in afr_sh_data_read_cbk (frame=0xa208c90, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, vector=0x2aaab4025260, count=1, buf=0x7ffffccc0120, iobref=0x2aaab4025940) at afr-self-heal-data.c:589 _new = (call_frame_t *) 0x2aaab2000000 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 call_count = 2 offset = 6553600 __FUNCTION__ = "afr_sh_data_read_cbk" #6 0x00002ac6b25c7390 in ioc_frame_return (frame=0x2aaab40275d0) at page.c:671 local = <value optimized out> wait_count = <value optimized out> __PRETTY_FUNCTION__ = "ioc_frame_return" __FUNCTION__ = "ioc_frame_return" #7 0x00002ac6b25c4f4b in ioc_dispatch_requests (frame=0x2aaab40275d0, ioc_inode=0x2aaab4005050, fd=0x2aaab4004170, offset=6553600, size=<value optimized out>) at io-cache.c:809 local = <value optimized out> table = (ioc_table_t *) 0xa128520 trav = (ioc_page_t *) 0x2aaab40258a0 waitq = (ioc_waitq_t *) 0x0 rounded_offset = 46912652859552 rounded_end = 6684672 trav_offset = 6684672 fault = 1 local_offset = 6553600 need_validate = 0 '\0' might_need_validate = 0 '\0' __FUNCTION__ = "ioc_dispatch_requests" #8 0x00002ac6b25c679a in ioc_readv (frame=0x2aaab40275d0, this=<value optimized out>, fd=0x2aaab4004170, size=65536, offset=6553600) at io-cache.c:882 tmp_ioc_inode = 46912652726352 ioc_inode = (ioc_inode_t *) 0x2aaab1fbf000 weight = 1 __FUNCTION__ = "ioc_readv" #9 0x00002ac6b2e2c1c7 in afr_sh_data_read_write (frame=0xa208c90, this=<value optimized out>) at afr-self-heal-data.c:616 _new = (call_frame_t *) 0x2aaab2000000 sh = (afr_self_heal_t *) 0xa206cb8 #10 0x00002ac6b2e2dc8d in afr_sh_data_read_write_iter (frame=0xa208c90, this=0xa126310) at afr-self-heal-data.c:652 local = (afr_local_t *) 0x10000 sh = (afr_self_heal_t *) 0x2aaab1ffd000 __FUNCTION__ = "afr_sh_data_read_write_iter" #11 0x00002ac6b2e2dfb1 in afr_sh_data_write_cbk (frame=0xa208c90, cookie=<value optimized out>, this=0xa126310, op_ret=<value optimized out>, op_errno=0, buf=<value optimized out>) at 
afr-self-heal-data.c:529 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 child_index = 2 call_count = -1308635136 __FUNCTION__ = "afr_sh_data_write_cbk" #12 0x00002ac6b25c5aca in ioc_writev_cbk (frame=0x2aaab4029c90, cookie=<value optimized out>, this=<value optimized out>, op_ret=65536, op_errno=0, stbuf=0x7ffffccc03e0) at io-cache.c:910 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652757216 #13 0x00002ac6b23bbe8a in wb_stack_unwind (unwinds=0x7ffffccc0500) at write-behind.c:1280 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}} request = (wb_request_t *) 0x2aaab400ab20 dummy = <value optimized out> frame = <value optimized out> local = <value optimized out> #14 0x00002ac6b23bcb7c in wb_do_ops (frame=0x2aaab4011010, file=0x2aaab1fbf000, winds=0x10000, unwinds=0xa00, other_requests=0x2aaab402acb0) at write-behind.c:1335 No locals. #15 0x00002ac6b23bccb4 in wb_process_queue (frame=0x2aaab4011010, file=0x2aaab400c6b0, flush_all=0 '\0') at write-behind.c:1451 winds = {next = 0x7ffffccc0510, prev = 0x7ffffccc0510} unwinds = {next = 0x2aaab400ab40, prev = 0x2aaab400ab40} other_requests = {next = 0x7ffffccc04f0, prev = 0x7ffffccc04f0} size = 131072 conf = (wb_conf_t *) 0xa12ab30 count = <value optimized out> #16 0x00002ac6b23bf0e3 in wb_writev (frame=0x2aaab4027750, this=<value optimized out>, fd=0x2aaab4004170, vector=0x2aaab4019420, count=1, offset=6488064, iobref=0x2aaab4006ed0) at write-behind.c:1550 file = (wb_file_t *) 0x2aaab1fbf000 process_frame = (call_frame_t *) 0x2aaab4011010 size = 169906720 tmp_file = 46912652756656 stub = <value optimized out> op_ret = <value optimized out> op_errno = 0 __FUNCTION__ = "wb_writev" #17 0x00002ac6b25c5a13 in ioc_writev (frame=0x2aaab4029c90, this=0xa123b00, fd=0x2aaab4004170, vector=0x2aaab4019420, count=1, offset=6488064, iobref=0x2aaab4006ed0) at io-cache.c:944 _new = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652757216 #18 0x00002ac6b2e2dea6 in afr_sh_data_read_cbk (frame=0xa208c90, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, vector=0x2aaab4019420, count=1, buf=0x7ffffccc06f0, iobref=0x2aaab4006ed0) at afr-self-heal-data.c:589 _new = (call_frame_t *) 0x2aaab2000000 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 call_count = 1 offset = 6488064 __FUNCTION__ = "afr_sh_data_read_cbk" #19 0x00002ac6b25c7390 in ioc_frame_return (frame=0x2aaab400aec0) at page.c:671 local = <value optimized out> wait_count = <value optimized out> __PRETTY_FUNCTION__ = "ioc_frame_return" __FUNCTION__ = "ioc_frame_return" #20 0x00002ac6b25c4f4b in ioc_dispatch_requests (frame=0x2aaab400aec0, ioc_inode=0x2aaab4005050, fd=0x2aaab4004170, offset=6488064, size=<value optimized out>) at io-cache.c:809 local = <value optimized out> table = (ioc_table_t *) 0xa128520 trav = (ioc_page_t *) 0x2aaab4024eb0 waitq = (ioc_waitq_t *) 0x2aaab401f380 rounded_offset = 46912652857008 rounded_end = 6553600 trav_offset = 6553600 fault = 0 local_offset = 6488064 need_validate = 0 '\0' 
might_need_validate = 0 '\0' __FUNCTION__ = "ioc_dispatch_requests" #21 0x00002ac6b25c679a in ioc_readv (frame=0x2aaab400aec0, this=<value optimized out>, fd=0x2aaab4004170, size=65536, offset=6488064) at io-cache.c:882 tmp_ioc_inode = 46912652726352 ioc_inode = (ioc_inode_t *) 0x2aaab1fbf000 weight = 1 __FUNCTION__ = "ioc_readv" #22 0x00002ac6b2e2c1c7 in afr_sh_data_read_write (frame=0xa208c90, this=<value optimized out>) at afr-self-heal-data.c:616 _new = (call_frame_t *) 0x2aaab2000000 sh = (afr_self_heal_t *) 0xa206cb8 #23 0x00002ac6b2e2dc8d in afr_sh_data_read_write_iter (frame=0xa208c90, this=0xa126310) at afr-self-heal-data.c:652 local = (afr_local_t *) 0x10000 sh = (afr_self_heal_t *) 0x2aaab1ffd000 __FUNCTION__ = "afr_sh_data_read_write_iter" #24 0x00002ac6b2e2dfb1 in afr_sh_data_write_cbk (frame=0xa208c90, cookie=<value optimized out>, this=0xa126310, op_ret=<value optimized out>, op_errno=0, buf=<value optimized out>) at afr-self-heal-data.c:529 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 child_index = 2 call_count = -1308635136 __FUNCTION__ = "afr_sh_data_write_cbk" #25 0x00002ac6b25c5aca in ioc_writev_cbk (frame=0x2aaab40290e0, cookie=<value optimized out>, this=<value optimized out>, op_ret=65536, op_errno=0, stbuf=0x7ffffccc09b0) at io-cache.c:910 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652757216 #26 0x00002ac6b23bbe8a in wb_stack_unwind (unwinds=0x7ffffccc0ad0) at write-behind.c:1280 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}} request = (wb_request_t *) 0x2aaab402f5c0 dummy = <value optimized out> frame = <value optimized out> local = <value optimized out> #27 0x00002ac6b23bcb7c in wb_do_ops (frame=0x2aaab4026150, file=0x2aaab1fbf000, winds=0x10000, unwinds=0xa00, other_requests=0x2aaab402acb0) at write-behind.c:1335 No locals. 
#28 0x00002ac6b23bccb4 in wb_process_queue (frame=0x2aaab4026150, file=0x2aaab400c6b0, flush_all=0 '\0') at write-behind.c:1451 winds = {next = 0x7ffffccc0ae0, prev = 0x7ffffccc0ae0} unwinds = {next = 0x2aaab402f5e0, prev = 0x2aaab402f5e0} other_requests = {next = 0x7ffffccc0ac0, prev = 0x7ffffccc0ac0} size = 131072 conf = (wb_conf_t *) 0xa12ab30 count = <value optimized out> #29 0x00002ac6b23be2b5 in wb_sync_cbk (frame=0x2aaab4026150, cookie=<value optimized out>, this=<value optimized out>, op_ret=131072, op_errno=0, stbuf=<value optimized out>) at write-behind.c:345 winds = (list_head_t *) 0x2aaab40260d0 file = (wb_file_t *) 0x2aaab400c6b0 request = <value optimized out> dummy = <value optimized out> per_request_local = <value optimized out> fd = (fd_t *) 0x2aaab4004170 #30 0x00002ac6b21b13de in client_write_cbk (frame=0x2aaab40261b0, hdr=0x2aaab40054b0, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4530 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 op_ret = 131072 op_errno = 0 stbuf = {st_dev = 2306, st_ino = 298025213, st_nlink = 1, st_mode = 33188, st_uid = 1001, st_gid = 1001, pad0 = 0, st_rdev = 0, st_size = 18092032, st_blksize = 4096, st_blocks = 35392, st_atim = {tv_sec = 1262641112, tv_nsec = 0}, st_mtim = { tv_sec = 1262690466, tv_nsec = 0}, st_ctim = {tv_sec = 1262690466, tv_nsec = 0}, __unused = {0, 0, 0}} #31 0x00002ac6b21a18aa in protocol_client_pollin (this=0xa123150, trans=0xa12a110) at client-protocol.c:6347 conf = (client_conf_t *) 0xa129c60 ret = 0 iobuf = (struct iobuf *) 0x0 hdr = 0x2aaab40054b0 "" hdrlen = 108 #32 0x00002ac6b21a84e2 in notify (this=0x2aaab2000000, event=2, data=0xa12a110) at client-protocol.c:6390 ret = <value optimized out> child_down = <value optimized out> was_not_down = <value optimized out> trans = (transport_t *) 0x2aaab1fbf000 conn = <value optimized out> conf = (client_conf_t *) 0xa129c60 parent = <value optimized out> __FUNCTION__ = "notify" #33 0x00002aaaaaaafe33 in socket_event_handler (fd=<value optimized out>, idx=2, data=0xa12a110, poll_in=1, poll_out=0, poll_err=0) at socket.c:814 this = (transport_t *) 0x2aaab2000000 priv = (socket_private_t *) 0xa12a410 ret = -1308635136 #34 0x0000003f56a27115 in event_dispatch_epoll (event_pool=<value optimized out>) at event.c:804 events = <value optimized out> i = <value optimized out> ret = <value optimized out> __FUNCTION__ = "event_dispatch_epoll" #35 0x0000000000403e96 in main (argc=5, argv=0x7ffffccc1ac8) at glusterfsd.c:1317 ctx = <value optimized out> cmd_args = <value optimized out> stbuf = {st_dev = 0, st_ino = 140737434619872, st_nlink = 0, st_mode = 1436594605, st_uid = 63, st_gid = 0, pad0 = 0, st_rdev = 1261711580, st_size = 0, st_blksize = 272019541654, st_blocks = 140737434621392, st_atim = {tv_sec = 272019534192, tv_nsec = 140737434621455}, st_mtim = {tv_sec = 140737434621440, tv_nsec = 140737434621432}, st_ctim = {tv_sec = 272021703448, tv_nsec = 1}, __unused = {0, 6323616, 272023768447}} tmp_logfile = '\0' <repeats 1023 times> timestr = '\0' <repeats 255 times> utime = 1262690317 tm = <value optimized out> ret = 0 lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615} specfp = <value optimized out> graph = (xlator_t *) 0x0 trav = <value optimized out> fuse_volume_found = 0 xl_count = <value optimized out> pipe_fd = {6, 7} gf_success = 0 gf_failure = -1 __FUNCTION__ = "main"
There are two variants of the core issue. The first variant is this crash. The second, scarier one is that data belonging to other files gets silently overwritten. The cause is write-behind being handed a write IO buffer that was served from io-cache; this throws write-behind's internal offset calculations off. In the normal scenario, IO buffers are always generated by fuse or protocol, and the write data coincides with the IO buffer (an assumption write-behind makes). In fact, just removing io-cache from below replicate should fix the corruption issue, since IO buffers will then be generated by protocol/client of the source server instead of being taken from a cached IO buffer.
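To make that workaround concrete against the attached home.vol: replicate would sit directly on the write-behind volumes, and the home1-iocache/home2-iocache/home3-iocache definitions would be dropped (or left unused). This is only an untested sketch of the change described above, not a configuration validated by the developers:

volume home
  type cluster/replicate
  # io-cache removed from below replicate; write buffers now come from
  # protocol/client (home1/home3) or the local posix stack (home2)
  subvolumes home2-writebehind home1-writebehind home3-writebehind
  option read-subvolume home2-writebehind
  option favorite-child home2-writebehind
end-volume

(The home1-writebehind, home2-writebehind and home3-writebehind volumes are assumed unchanged from the attached volfile.)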
Does that mean that this can (and will be) fixed? Or is the fix going to be detecting iocache below replicate and aborting glusterfsd start?
(In reply to comment #5)
> Does that mean that this can (and will be) fixed? Or is the fix going to be
> detecting iocache below replicate and aborting glusterfsd start?

We're working on a fix and the patch is getting reviewed internally (http://patches.gluster.com/patch/2740/).

Avati
PATCH: http://patches.gluster.com/patch/2807 in master (performance/write-behind: refactor __wb_collapse_write_bufs)
PATCH: http://patches.gluster.com/patch/2808 in master (performance/write-behind: fix data corruption while aggregating the adjacent contiguous iobufs into a single iobuf)
PATCH: http://patches.gluster.com/patch/2809 in release-2.0 (performance/write-behind: refactor __wb_collapse_write_bufs)
PATCH: http://patches.gluster.com/patch/2810 in release-2.0 (performance/write-behind: fix data corruption while aggregating the adjacent contiguous iobufs into a single iobuf)
PATCH: http://patches.gluster.com/patch/2807 in release-3.0 (performance/write-behind: refactor __wb_collapse_write_bufs)
PATCH: http://patches.gluster.com/patch/2808 in release-3.0 (performance/write-behind: fix data corruption while aggregating the adjacent contiguous iobufs into a single iobuf)