Created attachment 653473
The gdb log

Description of problem:
When the error-gen translator is configured as a subvolume of dht, errors are not handled properly by dht, which leads to a crash.

(gdb) f 0
#0  0x00007fb9b5924a8a in dht_inode_ctx_time_update (inode=0x7fb9acdb402c, this=<value optimized out>, stat=0x0, post=1) at dht-helper.c:944
944             DHT_UPDATE_TIME(time->mtime, time->mtime_nsec,
(gdb) l
939                     return -1;
940             }
941
942             time = &ctx->time;
943
944             DHT_UPDATE_TIME(time->mtime, time->mtime_nsec,
945                             stat->ia_mtime, stat->ia_mtime_nsec, inode, post);
946             DHT_UPDATE_TIME(time->ctime, time->ctime_nsec,
947                             stat->ia_ctime, stat->ia_ctime_nsec, inode, post);
948             DHT_UPDATE_TIME(time->atime, time->atime_nsec,
(gdb) down
(gdb) up
#1  0x00007fb9b593ebc1 in dht_revalidate_cbk (frame=0x7fb9b9417c44, cookie=0x7fb9b9417aec, this=0xe09140, op_ret=<value optimized out>, op_errno=<value optimized out>, inode=<value optimized out>, stbuf=0x0, xattr=0x0, postparent=0x0) at dht-common.c:663
663                     dht_inode_ctx_time_update (local->loc.parent, this,
(gdb) l
658                     local->op_ret = -1;
659                     local->op_errno = ESTALE;
660             }
661
662             if (local->loc.parent) {
663                     dht_inode_ctx_time_update (local->loc.parent, this,
664                                                postparent, 1);
665             }
666
667             DHT_STRIP_PHASE1_FLAGS (&local->stbuf);
(gdb) p postparent
$1 = (struct iatt *) 0x0
(gdb) p op_ret
$2 = <value optimized out>
(gdb) up
#2  0x00007fb9ae04f51d in error_gen_lookup (frame=0x7fb9b9417aec, this=<value optimized out>, loc=<value optimized out>, xdata=<value optimized out>) at error-gen.c:408
408             STACK_UNWIND_STRICT (lookup, frame, -1, op_errno, NULL, NULL, NULL,
(gdb)
#3  0x00007fb9b593d61f in dht_lookup (frame=0x7fb9b9417c44, this=<value optimized out>, loc=<value optimized out>, xattr_req=<value optimized out>) at dht-common.c:1451
1451                    STACK_WIND (frame, dht_revalidate_cbk,
(gdb) down
#2  0x00007fb9ae04f51d in error_gen_lookup (frame=0x7fb9b9417aec, this=<value optimized out>, loc=<value optimized out>, xdata=<value optimized out>) at error-gen.c:408
408             STACK_UNWIND_STRICT (lookup, frame, -1, op_errno, NULL, NULL, NULL,
(gdb) p op_errno
$3 = 11
(gdb) quit

Version-Release number of selected component (if applicable):

How reproducible:
Frequently

Steps to Reproduce:
1. Create a distributed-replicate volume.
2. Modify the vol file as follows:

volume test-vol-error-gen-0
    type debug/error-gen
    subvolumes test-vol-replicate-0
end-volume

volume test-vol-error-gen-1
    type debug/error-gen
    subvolumes test-vol-replicate-1
end-volume

volume test-vol-dht
    type cluster/distribute
    subvolumes test-vol-error-gen-0 test-vol-error-gen-1
end-volume

3. On the mount point, run the following command repeatedly:
   cat /etc/hosts > file1

Actual results:

Expected results:

Additional info:
Reducing the priority, as the issues are caused by having 'error-gen' in the picture. In general, though, it is good to fix it.
Patch is under review: http://review.gluster.com/#change,4338
CHANGE: http://review.gluster.org/4338 (cluster/dht: update ctx-time only if we receive the new iatt) merged in master by Anand Avati (avati)
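A minimal sketch of the kind of guard the merged change title describes ("update ctx-time only if we receive the new iatt"). It is not the actual patch: `struct iatt_sketch`, `struct ctx_time_sketch`, `update_time`, `ctx_time_update_sketch` and `revalidate_cbk_sketch` are hypothetical, heavily simplified stand-ins rather than GlusterFS types or functions, and the "newer timestamp wins" rule in `update_time` is an assumption about what DHT_UPDATE_TIME does. The sketch shows two checks: the callback skips the update when op_ret reports failure, and the helper itself refuses a NULL iatt, which is the stat=0x0 case from frame #0 above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct iatt (only the fields used here). */
struct iatt_sketch {
    int64_t  ia_mtime;
    uint32_t ia_mtime_nsec;
    int64_t  ia_ctime;
    uint32_t ia_ctime_nsec;
};

/* Hypothetical, simplified stand-in for the time cached in the dht inode ctx. */
struct ctx_time_sketch {
    int64_t  mtime;
    uint32_t mtime_nsec;
    int64_t  ctime;
    uint32_t ctime_nsec;
};

/* Assumed semantics: keep whichever of the cached and new timestamps is newer. */
static void update_time(int64_t *cached, uint32_t *cached_nsec,
                        int64_t new_sec, uint32_t new_nsec)
{
    assert(cached != NULL && cached_nsec != NULL);
    if (new_sec > *cached ||
        (new_sec == *cached && new_nsec > *cached_nsec)) {
        *cached = new_sec;
        *cached_nsec = new_nsec;
    }
}

/* Guarded update: returns -1 and leaves the ctx untouched if no iatt arrived. */
static int ctx_time_update_sketch(struct ctx_time_sketch *time,
                                  const struct iatt_sketch *stat)
{
    if (time == NULL || stat == NULL)
        return -1;

    update_time(&time->mtime, &time->mtime_nsec,
                stat->ia_mtime, stat->ia_mtime_nsec);
    update_time(&time->ctime, &time->ctime_nsec,
                stat->ia_ctime, stat->ia_ctime_nsec);
    return 0;
}

/* Callback side: only forward postparent when the fop succeeded and it is set. */
static void revalidate_cbk_sketch(struct ctx_time_sketch *parent_time,
                                  int op_ret, int op_errno,
                                  const struct iatt_sketch *postparent)
{
    (void)op_errno; /* error handling beyond the guard is out of scope here */
    if (op_ret == 0 && postparent != NULL)
        ctx_time_update_sketch(parent_time, postparent);
}
```

Calling `revalidate_cbk_sketch(&ctx, -1, EAGAIN, NULL)` mirrors the error-gen unwind in frame #2 (op_errno 11 is EAGAIN on Linux) and leaves the cached times untouched instead of dereferencing a NULL iatt. Checking in both places is a belt-and-braces choice: the callback check documents the contract that iatts are only meaningful on success, and the helper check protects any other caller that forgets it.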
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user