The write-behind translator (on the client) segfaults with the following backtrace, reproducible on an FTP server after some time of operation:

pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: v2.0.9
signal received: 11
time of crash: 2010-01-13 21:20:24
configuration details:
argp 1
backtrace 1
bdb->cursor->get 1
db.h 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 2.0.9
/lib/libc.so.6[0x7fe87b1a0f40]
/lib/libc.so.6(memcpy+0x15b)[0x7fe87b1ec03b]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(__wb_collapse_write_bufs+0x109)[0x7fe87a1366f9]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xb5)[0x7fe87a137035]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_writev+0x313)[0x7fe87a137af3]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_writev+0x124)[0x7fe879f2f744]
/usr/lib/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_cbk+0x1ee)[0x7fe879d19afe]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_frame_unwind+0x15e)[0x7fe879f30d9e]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_waitq_return+0x1d)[0x7fe879f30e9d]
/usr/lib/glusterfs/2.0.9/xlator/performance/read-ahead.so(ra_fault_cbk+0x175)[0x7fe879f31545]
/usr/lib/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_readv_cbk+0x8e)[0x7fe87a1377be]
/usr/lib/glusterfs/2.0.9/xlator/performance/quick-read.so(qr_readv_cbk+0x3c)[0x7fe87a33e81c]
/usr/lib/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_frame_return+0x263)[0x7fe87a54b6f3]
/usr/lib/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_waitq_return+0x1d)[0x7fe87a54b86d]
/usr/lib/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_fault_cbk+0x1f7)[0x7fe87a54c877]
/usr/lib/glusterfs/2.0.9/xlator/protocol/client.so(client_readv_cbk+0x1c7)[0x7fe87a7673b7]
/usr/lib/glusterfs/2.0.9/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7fe87a7570ba]
/usr/lib/glusterfs/2.0.9/xlator/protocol/client.so(notify+0xe0)[0x7fe87a75dd20]
/usr/lib/glusterfs/2.0.9/transport/socket.so(socket_event_handler+0xe0)[0x7fe87929f200]
/usr/lib/libglusterfs.so.0[0x7fe87b90d21f]
/usr/sbin/glusterfs(main+0x80b)[0x403fab]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7fe87b18d1a6]
/usr/sbin/glusterfs[0x402789]

Removing the translator "solves" the issue....
<merging Gordan Bobic's report - same bug> I've noticed a very high incidence of the problem I reported a while back, which manifests itself in open files getting corrupted on commit, possibly during conditions that involve server disconnections due to timeouts (very high disk load). Specifically, I've noticed that my .viminfo file got corrupted for the 3rd time today. Since this is root's .viminfo and I'm running glfs as root, though, I don't have the logs to verify the disconnections. From what I can tell, a chunk of a dll somehow ends up in .viminfo, but I'm not sure which one.

On a different volume, I'm seeing other weirdness under the same high disk load conditions (software RAID check/resync on all server nodes). This seems to be specifically related to using writebehind+iocache on the client side on one of the servers, exported via unfsd (the one from the gluster ftp site). What happens is that the /home volume simply seems to disappear underneath unfsd! The attached log indicates a glusterfsd crash. This doesn't happen if I remove the writebehind and io-cache translators.

Other notable things about the setup that might help figure out the cause of this:
- The other two servers are idle - they are not serving any requests. They are, however, also under the same high disk load.
- writebehind and io-cache are only applied on one server, the one being used to export via unfsd. The other servers do not have those translators applied.

The volume config is attached. It is called home-cache.vol, but it is the same file the log refers to, even though it is listed there as home.vol. The problem specifically occurs when the servers are undergoing high load of the described nature, which causes disk latencies to go up massively. I have not observed any instances of a similar crash happening without the writebehind and io-cache translators.

Gordan
================================================================================ Version : glusterfs 2.0.9 built on Dec 24 2009 23:13:15 git: v2.0.9 Starting Time: 2010-01-05 11:18:37 Command line : /usr/sbin/glusterfs --log-level=NORMAL --disable-direct-io-mode --volfile=/etc/glusterfs/home.vol /home PID : 14917 System name : Linux Nodename : raiden.shatteredsilicon.net Kernel Release : 2.6.18-164.9.1.el5 Hardware Identifier: x86_64 Given volfile: +------------------------------------------------------------------------------+ 1: volume home1 2: type protocol/client 3: option transport-type socket 4: option transport.address-family inet 5: option remote-host 10.2.0.11 6: option remote-port 6997 7: option remote-subvolume home1 8: end-volume 9: 10: volume home1-writebehind 11: type performance/write-behind 12: option cache-size 2MB # default is equal to aggregate-size 13: option flush-behind on # default is 'off' 14: option enable-O_SYNC on 15: subvolumes home1 16: end-volume 17: 18: volume home1-iocache 19: type performance/io-cache 20: option cache-size 64MB 21: option cache-timeout 2 # default is 1 second 22: subvolumes home1-writebehind 23: end-volume 24: 25: ############################################################################## 26: 27: volume home3 28: type protocol/client 29: option transport-type socket 30: option transport.address-family inet 31: option remote-host 10.2.0.13 32: option remote-port 6997 33: option remote-subvolume home3 34: end-volume 35: 36: volume home3-writebehind 37: type performance/write-behind 38: option cache-size 2MB # default is equal to aggregate-size 39: option flush-behind on # default is 'off' 40: option enable-O_SYNC on 41: subvolumes home3 42: end-volume 43: 44: volume home3-iocache 45: type performance/io-cache 46: option cache-size 64MB 47: option cache-timeout 2 # default is 1 second 48: subvolumes home3-writebehind 49: end-volume 50: 51: ############################################################################## 52: 53: volume home-store 54: type storage/posix 55: option directory /gluster/home 56: end-volume 57: 58: volume home2 59: type features/posix-locks 60: subvolumes home-store 61: end-volume 62: 63: volume home2-writebehind 64: type performance/write-behind 65: option cache-size 2MB # default is equal to aggregate-size 66: option flush-behind on # default is 'off' 67: option enable-O_SYNC on 68: subvolumes home2 69: end-volume 70: 71: volume home2-iocache 72: type performance/io-cache 73: option cache-size 64MB 74: option cache-timeout 2 # default is 1 second 75: subvolumes home2-writebehind 76: end-volume 77: 78: ############################################################################## 79: 80: volume server 81: type protocol/server 82: option transport-type socket 83: option transport.address-family inet 84: option transport.socket.listen-port 6997 85: subvolumes home2 86: option auth.addr.home2.allow 127.0.0.1,10.* 87: end-volume 88: 89: volume home 90: type cluster/replicate 91: subvolumes home2-iocache home1-iocache home3-iocache 92: option read-subvolume home2-iocache 93: option favorite-child home2-iocache 94: end-volume +------------------------------------------------------------------------------+ [2010-01-05 11:18:37] W [afr.c:2436:init] home: You have specified subvolume 'home2-iocache' as the 'favorite child'. This means that if a discrepancy in the content or attributes (ownership, permission, etc.) 
of a file is detected among the subvolumes, the file on 'home2-iocache' will be considered the definitive version and its contents will OVERWRITE the contents of the file on other subvolumes. All versions of the file except that on 'home2-iocache' WILL BE LOST. [2010-01-05 11:18:37] N [afr.c:2194:notify] home: Subvolume 'home2-iocache' came back up; going online. [2010-01-05 11:18:37] N [afr.c:2194:notify] home: Subvolume 'home2-iocache' came back up; going online. [2010-01-05 11:18:37] N [afr.c:2194:notify] home: Subvolume 'home2-iocache' came back up; going online. [2010-01-05 11:18:37] N [glusterfsd.c:1306:main] glusterfs: Successfully started [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home1: Connected to 10.2.0.11:6997, attached to remote volume 'home1'. [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home1: Connected to 10.2.0.11:6997, attached to remote volume 'home1'. [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home3: Connected to 10.2.0.13:6997, attached to remote volume 'home3'. [2010-01-05 11:18:37] N [client-protocol.c:5733:client_setvolume_cbk] home3: Connected to 10.2.0.13:6997, attached to remote volume 'home3'. [2010-01-05 11:18:37] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.13:1015 [2010-01-05 11:18:37] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.13:1014 [2010-01-05 11:18:47] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.11:1009 [2010-01-05 11:18:47] N [server-protocol.c:7065:mop_setvolume] server: accepted client from 10.2.0.11:1008 [2010-01-05 11:21:05] E [posix.c:3156:do_xattrop] home-store: getxattr failed on /gordan/.gconf/apps/bluetooth-manager while doing xattrop: No such file or directory pending frames: frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) frame : type(1) op(LOOKUP) patchset: v2.0.9 signal received: 11 time of crash: 2010-01-05 11:21:06 configuration details: argp 1 backtrace 1 db.h 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 2.0.9 /lib64/libc.so.6[0x3f55e302d0] /lib64/libc.so.6(memcpy+0x15b)[0x3f55e7bf0b] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(__wb_collapse_write_bufs+0x105)[0x2ac6b23bc045] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xa8)[0x2ac6b23bcc68] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_writev+0x373)[0x2ac6b23bf0e3] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev+0x123)[0x2ac6b25c5a13] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_cbk+0x1d6)[0x2ac6b2e2dea6] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_frame_return+0x240)[0x2ac6b25c7390] 
/usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_dispatch_requests+0x25b)[0x2ac6b25c4f4b] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_readv+0x1fa)[0x2ac6b25c679a] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write+0xd7)[0x2ac6b2e2c1c7] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write_iter+0x5d)[0x2ac6b2e2dc8d] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_write_cbk+0x91)[0x2ac6b2e2dfb1] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev_cbk+0x7a)[0x2ac6b25c5aca] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_stack_unwind+0x6a)[0x2ac6b23bbe8a] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_do_ops+0x2c)[0x2ac6b23bcb7c] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xf4)[0x2ac6b23bccb4] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_writev+0x373)[0x2ac6b23bf0e3] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev+0x123)[0x2ac6b25c5a13] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_cbk+0x1d6)[0x2ac6b2e2dea6] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_frame_return+0x240)[0x2ac6b25c7390] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_dispatch_requests+0x25b)[0x2ac6b25c4f4b] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_readv+0x1fa)[0x2ac6b25c679a] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write+0xd7)[0x2ac6b2e2c1c7] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_read_write_iter+0x5d)[0x2ac6b2e2dc8d] /usr/lib64/glusterfs/2.0.9/xlator/cluster/replicate.so(afr_sh_data_write_cbk+0x91)[0x2ac6b2e2dfb1] /usr/lib64/glusterfs/2.0.9/xlator/performance/io-cache.so(ioc_writev_cbk+0x7a)[0x2ac6b25c5aca] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_stack_unwind+0x6a)[0x2ac6b23bbe8a] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_do_ops+0x2c)[0x2ac6b23bcb7c] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_process_queue+0xf4)[0x2ac6b23bccb4] /usr/lib64/glusterfs/2.0.9/xlator/performance/write-behind.so(wb_sync_cbk+0xc5)[0x2ac6b23be2b5] /usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(client_write_cbk+0x14e)[0x2ac6b21b13de] /usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x2ac6b21a18aa] /usr/lib64/glusterfs/2.0.9/xlator/protocol/client.so(notify+0x212)[0x2ac6b21a84e2] /usr/lib64/glusterfs/2.0.9/transport/socket.so(socket_event_handler+0xd3)[0x2aaaaaaafe33] /usr/lib64/libglusterfs.so.0[0x3f56a27115] /usr/sbin/glusterfs(main+0xa06)[0x403e96] /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f55e1d994] /usr/sbin/glusterfs[0x402509] --------- volume home1 type protocol/client option transport-type socket option transport.address-family inet option remote-host 10.2.0.11 option remote-port 6997 option remote-subvolume home1 end-volume volume home1-writebehind type performance/write-behind option cache-size 2MB # default is equal to aggregate-size option flush-behind on # default is 'off' option enable-O_SYNC on subvolumes home1 end-volume volume home1-iocache type performance/io-cache option cache-size 64MB option cache-timeout 2 # default is 1 second subvolumes home1-writebehind end-volume ############################################################################## volume home3 type protocol/client option transport-type socket option transport.address-family inet option remote-host 
10.2.0.13 option remote-port 6997 option remote-subvolume home3 end-volume volume home3-writebehind type performance/write-behind option cache-size 2MB # default is equal to aggregate-size option flush-behind on # default is 'off' option enable-O_SYNC on subvolumes home3 end-volume volume home3-iocache type performance/io-cache option cache-size 64MB option cache-timeout 2 # default is 1 second subvolumes home3-writebehind end-volume ############################################################################## volume home-store type storage/posix option directory /gluster/home end-volume volume home2 type features/posix-locks subvolumes home-store end-volume volume home2-writebehind type performance/write-behind option cache-size 2MB # default is equal to aggregate-size option flush-behind on # default is 'off' option enable-O_SYNC on subvolumes home2 end-volume volume home2-iocache type performance/io-cache option cache-size 64MB option cache-timeout 2 # default is 1 second subvolumes home2-writebehind end-volume ############################################################################## volume server type protocol/server option transport-type socket option transport.address-family inet option transport.socket.listen-port 6997 subvolumes home2 option auth.addr.home2.allow 127.0.0.1,10.* end-volume volume home type cluster/replicate subvolumes home2-iocache home1-iocache home3-iocache option read-subvolume home2-iocache option favorite-child home2-iocache end-volume
<from email by Gordan Bobic> Core was generated by `/usr/sbin/glusterfs --log-level=NORMAL --disable-direct-io-mode --volfile=/etc/'. Program terminated with signal 11, Segmentation fault. [New process 14917] [New process 14921] [New process 14918] #0 0x0000003f55e7bf0b in memcpy () from /lib64/libc.so.6 Thread 3 (process 14918): #0 0x0000003f55e9a0b1 in nanosleep () from /lib64/libc.so.6 No symbol table info available. #1 0x0000003f55ecd684 in usleep () from /lib64/libc.so.6 No symbol table info available. #2 0x0000003f56a1a8b4 in gf_timer_proc (ctx=<value optimized out>) at timer.c:177 now = <value optimized out> now_tv = Could not find the frame base for "gf_timer_proc". event = <value optimized out> reg = <value optimized out> __FUNCTION__ = "gf_timer_proc" #3 0x0000003f56606617 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #4 0x0000003f55ed3c2d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 2 (process 14921): #0 0x0000003f5660d73b in read () from /lib64/libpthread.so.0 No symbol table info available. #1 0x0000003f5820ec9a in ?? () from /usr/lib64/libfuse.so.2 No symbol table info available. #2 0x0000003f58212650 in fuse_chan_receive () from /usr/lib64/libfuse.so.2 No symbol table info available. #3 0x00002ac6b304a230 in fuse_thread_proc (data=<value optimized out>) at fuse-bridge.c:2541 mount_point = <value optimized out> this = (xlator_t *) 0xa11cdc0 priv = (fuse_private_t *) 0xa12b050 res = 62 iobuf = (struct iobuf *) 0x2aaaac0836b0 chan_size = 135168 ret = <value optimized out> now = {tv_sec = 1262690317, tv_usec = 407596} timeout = {tv_sec = 1262690318, tv_nsec = 407596000} __FUNCTION__ = "fuse_thread_proc" #4 0x0000003f56606617 in start_thread () from /lib64/libpthread.so.0 No symbol table info available. #5 0x0000003f55ed3c2d in clone () from /lib64/libc.so.6 No symbol table info available. Thread 1 (process 14917): #0 0x0000003f55e7bf0b in memcpy () from /lib64/libc.so.6 No symbol table info available. 
#1 0x00002ac6b23bc045 in __wb_collapse_write_bufs (requests=0x2aaab40055b8, page_size=131072) at /usr/include/bits/string3.h:51 offset_expected = <value optimized out> space_left = 65536 iov_len = (size_t *) 0x2aaab4012098 write_size = (size_t *) 0x2aaab4010328 ptr = 0x2aaab1ffd000 "\n\n<\001D\004\223" first_request = <value optimized out> request = (wb_request_t *) 0x2aaab402acb0 tmp = (wb_request_t *) 0x2aaab40055b8 #2 0x00002ac6b23bcc68 in wb_process_queue (frame=0x2aaab4020ba0, file=0x2aaab4005580, flush_all=0 '\0') at write-behind.c:1436 winds = {next = 0x7ffffccbff40, prev = 0x7ffffccbff40} unwinds = {next = 0x2aaab402acd0, prev = 0x2aaab402acd0} other_requests = {next = 0x7ffffccbff20, prev = 0x7ffffccbff20} size = 131072 conf = (wb_conf_t *) 0xa129b60 count = <value optimized out> #3 0x00002ac6b23bf0e3 in wb_writev (frame=0x2aaab402f630, this=<value optimized out>, fd=0x2aaab4004170, vector=0x2aaab4025260, count=1, offset=6553600, iobref=0x2aaab4025940) at write-behind.c:1550 file = (wb_file_t *) 0x2aaab1fbf000 process_frame = (call_frame_t *) 0x2aaab4020ba0 size = 169906720 tmp_file = 46912652727680 stub = <value optimized out> op_ret = <value optimized out> op_errno = 0 __FUNCTION__ = "wb_writev" #4 0x00002ac6b25c5a13 in ioc_writev (frame=0x2aaab40278a0, this=0xa1227a0, fd=0x2aaab4004170, vector=0x2aaab4025260, count=1, offset=6553600, iobref=0x2aaab4025940) at io-cache.c:944 _new = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652727808 #5 0x00002ac6b2e2dea6 in afr_sh_data_read_cbk (frame=0xa208c90, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, vector=0x2aaab4025260, count=1, buf=0x7ffffccc0120, iobref=0x2aaab4025940) at afr-self-heal-data.c:589 _new = (call_frame_t *) 0x2aaab2000000 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 call_count = 2 offset = 6553600 __FUNCTION__ = "afr_sh_data_read_cbk" #6 0x00002ac6b25c7390 in ioc_frame_return (frame=0x2aaab40275d0) at page.c:671 local = <value optimized out> wait_count = <value optimized out> __PRETTY_FUNCTION__ = "ioc_frame_return" __FUNCTION__ = "ioc_frame_return" #7 0x00002ac6b25c4f4b in ioc_dispatch_requests (frame=0x2aaab40275d0, ioc_inode=0x2aaab4005050, fd=0x2aaab4004170, offset=6553600, size=<value optimized out>) at io-cache.c:809 local = <value optimized out> table = (ioc_table_t *) 0xa128520 trav = (ioc_page_t *) 0x2aaab40258a0 waitq = (ioc_waitq_t *) 0x0 rounded_offset = 46912652859552 rounded_end = 6684672 trav_offset = 6684672 fault = 1 local_offset = 6553600 need_validate = 0 '\0' might_need_validate = 0 '\0' __FUNCTION__ = "ioc_dispatch_requests" #8 0x00002ac6b25c679a in ioc_readv (frame=0x2aaab40275d0, this=<value optimized out>, fd=0x2aaab4004170, size=65536, offset=6553600) at io-cache.c:882 tmp_ioc_inode = 46912652726352 ioc_inode = (ioc_inode_t *) 0x2aaab1fbf000 weight = 1 __FUNCTION__ = "ioc_readv" #9 0x00002ac6b2e2c1c7 in afr_sh_data_read_write (frame=0xa208c90, this=<value optimized out>) at afr-self-heal-data.c:616 _new = (call_frame_t *) 0x2aaab2000000 sh = (afr_self_heal_t *) 0xa206cb8 #10 0x00002ac6b2e2dc8d in afr_sh_data_read_write_iter (frame=0xa208c90, this=0xa126310) at afr-self-heal-data.c:652 local = (afr_local_t *) 0x10000 sh = (afr_self_heal_t *) 0x2aaab1ffd000 __FUNCTION__ = "afr_sh_data_read_write_iter" #11 0x00002ac6b2e2dfb1 in afr_sh_data_write_cbk (frame=0xa208c90, cookie=<value optimized out>, this=0xa126310, op_ret=<value optimized out>, op_errno=0, buf=<value optimized out>) at 
afr-self-heal-data.c:529 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 child_index = 2 call_count = -1308635136 __FUNCTION__ = "afr_sh_data_write_cbk" #12 0x00002ac6b25c5aca in ioc_writev_cbk (frame=0x2aaab4029c90, cookie=<value optimized out>, this=<value optimized out>, op_ret=65536, op_errno=0, stbuf=0x7ffffccc03e0) at io-cache.c:910 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652757216 #13 0x00002ac6b23bbe8a in wb_stack_unwind (unwinds=0x7ffffccc0500) at write-behind.c:1280 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}} request = (wb_request_t *) 0x2aaab400ab20 dummy = <value optimized out> frame = <value optimized out> local = <value optimized out> #14 0x00002ac6b23bcb7c in wb_do_ops (frame=0x2aaab4011010, file=0x2aaab1fbf000, winds=0x10000, unwinds=0xa00, other_requests=0x2aaab402acb0) at write-behind.c:1335 No locals. #15 0x00002ac6b23bccb4 in wb_process_queue (frame=0x2aaab4011010, file=0x2aaab400c6b0, flush_all=0 '\0') at write-behind.c:1451 winds = {next = 0x7ffffccc0510, prev = 0x7ffffccc0510} unwinds = {next = 0x2aaab400ab40, prev = 0x2aaab400ab40} other_requests = {next = 0x7ffffccc04f0, prev = 0x7ffffccc04f0} size = 131072 conf = (wb_conf_t *) 0xa12ab30 count = <value optimized out> #16 0x00002ac6b23bf0e3 in wb_writev (frame=0x2aaab4027750, this=<value optimized out>, fd=0x2aaab4004170, vector=0x2aaab4019420, count=1, offset=6488064, iobref=0x2aaab4006ed0) at write-behind.c:1550 file = (wb_file_t *) 0x2aaab1fbf000 process_frame = (call_frame_t *) 0x2aaab4011010 size = 169906720 tmp_file = 46912652756656 stub = <value optimized out> op_ret = <value optimized out> op_errno = 0 __FUNCTION__ = "wb_writev" #17 0x00002ac6b25c5a13 in ioc_writev (frame=0x2aaab4029c90, this=0xa123b00, fd=0x2aaab4004170, vector=0x2aaab4019420, count=1, offset=6488064, iobref=0x2aaab4006ed0) at io-cache.c:944 _new = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652757216 #18 0x00002ac6b2e2dea6 in afr_sh_data_read_cbk (frame=0xa208c90, cookie=<value optimized out>, this=<value optimized out>, op_ret=<value optimized out>, op_errno=<value optimized out>, vector=0x2aaab4019420, count=1, buf=0x7ffffccc06f0, iobref=0x2aaab4006ed0) at afr-self-heal-data.c:589 _new = (call_frame_t *) 0x2aaab2000000 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 call_count = 1 offset = 6488064 __FUNCTION__ = "afr_sh_data_read_cbk" #19 0x00002ac6b25c7390 in ioc_frame_return (frame=0x2aaab400aec0) at page.c:671 local = <value optimized out> wait_count = <value optimized out> __PRETTY_FUNCTION__ = "ioc_frame_return" __FUNCTION__ = "ioc_frame_return" #20 0x00002ac6b25c4f4b in ioc_dispatch_requests (frame=0x2aaab400aec0, ioc_inode=0x2aaab4005050, fd=0x2aaab4004170, offset=6488064, size=<value optimized out>) at io-cache.c:809 local = <value optimized out> table = (ioc_table_t *) 0xa128520 trav = (ioc_page_t *) 0x2aaab4024eb0 waitq = (ioc_waitq_t *) 0x2aaab401f380 rounded_offset = 46912652857008 rounded_end = 6553600 trav_offset = 6553600 fault = 0 local_offset = 6488064 need_validate = 0 '\0' 
might_need_validate = 0 '\0' __FUNCTION__ = "ioc_dispatch_requests" #21 0x00002ac6b25c679a in ioc_readv (frame=0x2aaab400aec0, this=<value optimized out>, fd=0x2aaab4004170, size=65536, offset=6488064) at io-cache.c:882 tmp_ioc_inode = 46912652726352 ioc_inode = (ioc_inode_t *) 0x2aaab1fbf000 weight = 1 __FUNCTION__ = "ioc_readv" #22 0x00002ac6b2e2c1c7 in afr_sh_data_read_write (frame=0xa208c90, this=<value optimized out>) at afr-self-heal-data.c:616 _new = (call_frame_t *) 0x2aaab2000000 sh = (afr_self_heal_t *) 0xa206cb8 #23 0x00002ac6b2e2dc8d in afr_sh_data_read_write_iter (frame=0xa208c90, this=0xa126310) at afr-self-heal-data.c:652 local = (afr_local_t *) 0x10000 sh = (afr_self_heal_t *) 0x2aaab1ffd000 __FUNCTION__ = "afr_sh_data_read_write_iter" #24 0x00002ac6b2e2dfb1 in afr_sh_data_write_cbk (frame=0xa208c90, cookie=<value optimized out>, this=0xa126310, op_ret=<value optimized out>, op_errno=0, buf=<value optimized out>) at afr-self-heal-data.c:529 local = (afr_local_t *) 0xa205730 sh = (afr_self_heal_t *) 0xa206cb8 child_index = 2 call_count = -1308635136 __FUNCTION__ = "afr_sh_data_write_cbk" #25 0x00002ac6b25c5aca in ioc_writev_cbk (frame=0x2aaab40290e0, cookie=<value optimized out>, this=<value optimized out>, op_ret=65536, op_errno=0, stbuf=0x7ffffccc09b0) at io-cache.c:910 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 ioc_inode = 46912652757216 #26 0x00002ac6b23bbe8a in wb_stack_unwind (unwinds=0x7ffffccc0ad0) at write-behind.c:1280 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 buf = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0, st_gid = 0, pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0}, st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0}, __unused = {0, 0, 0}} request = (wb_request_t *) 0x2aaab402f5c0 dummy = <value optimized out> frame = <value optimized out> local = <value optimized out> #27 0x00002ac6b23bcb7c in wb_do_ops (frame=0x2aaab4026150, file=0x2aaab1fbf000, winds=0x10000, unwinds=0xa00, other_requests=0x2aaab402acb0) at write-behind.c:1335 No locals. 
#28 0x00002ac6b23bccb4 in wb_process_queue (frame=0x2aaab4026150, file=0x2aaab400c6b0, flush_all=0 '\0') at write-behind.c:1451 winds = {next = 0x7ffffccc0ae0, prev = 0x7ffffccc0ae0} unwinds = {next = 0x2aaab402f5e0, prev = 0x2aaab402f5e0} other_requests = {next = 0x7ffffccc0ac0, prev = 0x7ffffccc0ac0} size = 131072 conf = (wb_conf_t *) 0xa12ab30 count = <value optimized out> #29 0x00002ac6b23be2b5 in wb_sync_cbk (frame=0x2aaab4026150, cookie=<value optimized out>, this=<value optimized out>, op_ret=131072, op_errno=0, stbuf=<value optimized out>) at write-behind.c:345 winds = (list_head_t *) 0x2aaab40260d0 file = (wb_file_t *) 0x2aaab400c6b0 request = <value optimized out> dummy = <value optimized out> per_request_local = <value optimized out> fd = (fd_t *) 0x2aaab4004170 #30 0x00002ac6b21b13de in client_write_cbk (frame=0x2aaab40261b0, hdr=0x2aaab40054b0, hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4530 fn = (int32_t (*)(call_frame_t *, call_frame_t *, xlator_t *, int32_t, int32_t, ...)) 0x2aaab4005580 _parent = (call_frame_t *) 0x2aaab2000000 op_ret = 131072 op_errno = 0 stbuf = {st_dev = 2306, st_ino = 298025213, st_nlink = 1, st_mode = 33188, st_uid = 1001, st_gid = 1001, pad0 = 0, st_rdev = 0, st_size = 18092032, st_blksize = 4096, st_blocks = 35392, st_atim = {tv_sec = 1262641112, tv_nsec = 0}, st_mtim = { tv_sec = 1262690466, tv_nsec = 0}, st_ctim = {tv_sec = 1262690466, tv_nsec = 0}, __unused = {0, 0, 0}} #31 0x00002ac6b21a18aa in protocol_client_pollin (this=0xa123150, trans=0xa12a110) at client-protocol.c:6347 conf = (client_conf_t *) 0xa129c60 ret = 0 iobuf = (struct iobuf *) 0x0 hdr = 0x2aaab40054b0 "" hdrlen = 108 #32 0x00002ac6b21a84e2 in notify (this=0x2aaab2000000, event=2, data=0xa12a110) at client-protocol.c:6390 ret = <value optimized out> child_down = <value optimized out> was_not_down = <value optimized out> trans = (transport_t *) 0x2aaab1fbf000 conn = <value optimized out> conf = (client_conf_t *) 0xa129c60 parent = <value optimized out> __FUNCTION__ = "notify" #33 0x00002aaaaaaafe33 in socket_event_handler (fd=<value optimized out>, idx=2, data=0xa12a110, poll_in=1, poll_out=0, poll_err=0) at socket.c:814 this = (transport_t *) 0x2aaab2000000 priv = (socket_private_t *) 0xa12a410 ret = -1308635136 #34 0x0000003f56a27115 in event_dispatch_epoll (event_pool=<value optimized out>) at event.c:804 events = <value optimized out> i = <value optimized out> ret = <value optimized out> __FUNCTION__ = "event_dispatch_epoll" #35 0x0000000000403e96 in main (argc=5, argv=0x7ffffccc1ac8) at glusterfsd.c:1317 ctx = <value optimized out> cmd_args = <value optimized out> stbuf = {st_dev = 0, st_ino = 140737434619872, st_nlink = 0, st_mode = 1436594605, st_uid = 63, st_gid = 0, pad0 = 0, st_rdev = 1261711580, st_size = 0, st_blksize = 272019541654, st_blocks = 140737434621392, st_atim = {tv_sec = 272019534192, tv_nsec = 140737434621455}, st_mtim = {tv_sec = 140737434621440, tv_nsec = 140737434621432}, st_ctim = {tv_sec = 272021703448, tv_nsec = 1}, __unused = {0, 6323616, 272023768447}} tmp_logfile = '\0' <repeats 1023 times> timestr = '\0' <repeats 255 times> utime = 1262690317 tm = <value optimized out> ret = 0 lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615} specfp = <value optimized out> graph = (xlator_t *) 0x0 trav = <value optimized out> fuse_volume_found = 0 xl_count = <value optimized out> pipe_fd = {6, 7} gf_success = 0 gf_failure = -1 __FUNCTION__ = "main"
There are two variants of the core issue. The first variant is this crash. The second, scarier one is that data belonging to other files gets silently overwritten. The cause is write-behind being handed a write IO buffer that was served from io-cache; this throws write-behind's internal offset calculations off. In the normal scenario, IO buffers are always generated by fuse or protocol, and the write data coincides with the IO buffer (an assumption write-behind makes). In fact, just removing io-cache from below replicate should fix the corruption issue, since IO buffers will then be generated by protocol/client of the source server instead of being taken from a cached IO buffer.
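To make that workaround concrete against the attached home.vol: replicate would sit directly on the write-behind volumes, and the home1-iocache/home2-iocache/home3-iocache definitions would be dropped (or left unused). This is only an untested sketch of the change described above, not a configuration validated by the developers:

volume home
  type cluster/replicate
  # io-cache removed from below replicate; write buffers now come from
  # protocol/client (home1/home3) or the local posix stack (home2)
  subvolumes home2-writebehind home1-writebehind home3-writebehind
  option read-subvolume home2-writebehind
  option favorite-child home2-writebehind
end-volume

(The home1-writebehind, home2-writebehind and home3-writebehind volumes are assumed unchanged from the attached volfile.)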
Does that mean that this can (and will be) fixed? Or is the fix going to be detecting iocache below replicate and aborting glusterfsd start?
(In reply to comment #5)
> Does that mean that this can (and will be) fixed? Or is the fix going to be
> detecting iocache below replicate and aborting glusterfsd start?

We're working on a fix and the patch is getting reviewed internally (http://patches.gluster.com/patch/2740/).

Avati
PATCH: http://patches.gluster.com/patch/2807 in master (performance/write-behind: refactor __wb_collapse_write_bufs)
PATCH: http://patches.gluster.com/patch/2808 in master (performance/write-behind: fix data corruption while aggregating the adjacent contiguous iobufs into a single iobuf)
PATCH: http://patches.gluster.com/patch/2809 in release-2.0 (performance/write-behind: refactor __wb_collapse_write_bufs)
PATCH: http://patches.gluster.com/patch/2810 in release-2.0 (performance/write-behind: fix data corruption while aggregating the adjacent contiguous iobufs into a single iobuf)
PATCH: http://patches.gluster.com/patch/2807 in release-3.0 (performance/write-behind: refactor __wb_collapse_write_bufs)
PATCH: http://patches.gluster.com/patch/2808 in release-3.0 (performance/write-behind: fix data corruption while aggregating the adjacent contiguous iobufs into a single iobuf)