Bug 881013 - dht does not handle the errors properly and crashes.
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: vpshastry
Depends On:
Blocks:
Reported: 2012-11-28 07:18 EST by Avra Sengupta
Modified: 2014-04-17 07:39 EDT (History)

See Also:
Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-04-17 07:39:35 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
The gdb log (10.76 KB, application/octet-stream)
2012-11-28 07:18 EST, Avra Sengupta

Description Avra Sengupta 2012-11-28 07:18:48 EST
Created attachment 653473
The gdb log

Description of problem:
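When the error-gen translator is configured as a subvolume of dht, dht does not handle the resulting errors properly and crashes. In the trace below, error_gen_lookup unwinds the lookup with NULL arguments, and dht_revalidate_cbk passes the NULL postparent to dht_inode_ctx_time_update, which dereferences it (stat=0x0).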
(gdb) f 0
#0  0x00007fb9b5924a8a in dht_inode_ctx_time_update (inode=0x7fb9acdb402c, 
    this=<value optimized out>, stat=0x0, post=1) at dht-helper.c:944
944	        DHT_UPDATE_TIME(time->mtime, time->mtime_nsec,
(gdb) l
939	                        return -1;
940	        }
941	
942	        time = &ctx->time;
943	
944	        DHT_UPDATE_TIME(time->mtime, time->mtime_nsec,
945	                        stat->ia_mtime, stat->ia_mtime_nsec, inode, post);
946	        DHT_UPDATE_TIME(time->ctime, time->ctime_nsec,
947	                        stat->ia_ctime, stat->ia_ctime_nsec, inode, post);
948	        DHT_UPDATE_TIME(time->atime, time->atime_nsec,
(gdb) down
(gdb) up
#1  0x00007fb9b593ebc1 in dht_revalidate_cbk (frame=0x7fb9b9417c44, 
    cookie=0x7fb9b9417aec, this=0xe09140, op_ret=<value optimized out>, 
    op_errno=<value optimized out>, inode=<value optimized out>, stbuf=0x0, 
    xattr=0x0, postparent=0x0) at dht-common.c:663
663	                        dht_inode_ctx_time_update (local->loc.parent, this,
(gdb) l
658	                        local->op_ret = -1;
659	                        local->op_errno = ESTALE;
660	                }
661	
662	                if (local->loc.parent) {
663	                        dht_inode_ctx_time_update (local->loc.parent, this,
664	                                                   postparent, 1);
665	                }
666	
667	                DHT_STRIP_PHASE1_FLAGS (&local->stbuf);
(gdb) p postparent
$1 = (struct iatt *) 0x0
(gdb) p op_ret
$2 = <value optimized out>
(gdb) up
#2  0x00007fb9ae04f51d in error_gen_lookup (frame=0x7fb9b9417aec, 
    this=<value optimized out>, loc=<value optimized out>, 
    xdata=<value optimized out>) at error-gen.c:408
408			STACK_UNWIND_STRICT (lookup, frame, -1, op_errno, NULL, NULL, NULL,
(gdb) 
#3  0x00007fb9b593d61f in dht_lookup (frame=0x7fb9b9417c44, 
    this=<value optimized out>, loc=<value optimized out>, 
    xattr_req=<value optimized out>) at dht-common.c:1451
1451				STACK_WIND (frame, dht_revalidate_cbk,
(gdb) down
#2  0x00007fb9ae04f51d in error_gen_lookup (frame=0x7fb9b9417aec, 
    this=<value optimized out>, loc=<value optimized out>, 
    xdata=<value optimized out>) at error-gen.c:408
408			STACK_UNWIND_STRICT (lookup, frame, -1, op_errno, NULL, NULL, NULL,
(gdb) p op_errno
$3 = 11
(gdb) quit


Version-Release number of selected component (if applicable):


How reproducible: frequently


Steps to Reproduce:
1. Create a distributed-replicate volume.
2. Modify the vol file as follows:
volume test-vol-error-gen-0
    type debug/error-gen
    subvolumes test-vol-replicate-0
end-volume

volume test-vol-error-gen-1
    type debug/error-gen
    subvolumes test-vol-replicate-1
end-volume

volume test-vol-dht
    type cluster/distribute
    subvolumes test-vol-error-gen-0 test-vol-error-gen-1
end-volume

3. On the mount point, run the following command repeatedly:
"cat /etc/hosts > file1"
  
Actual results:


Expected results:


Additional info:
Comment 1 Amar Tumballi 2012-12-06 00:32:38 EST
Reducing the priority, as the issue is caused by having 'error-gen' in the graph. In general, though, it would be good to fix.
Comment 2 vpshastry 2012-12-21 01:52:54 EST
The patch is under review: http://review.gluster.com/#change,4338
Comment 3 Vijay Bellur 2013-01-17 03:28:22 EST
CHANGE: http://review.gluster.org/4338 (cluster/dht: update ctx-time only if we receive the new iatt) merged in master by Anand Avati (avati@redhat.com)
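
The change title suggests guarding the ctx-time update so that it runs only when an iatt was actually returned. The following is a minimal, self-contained sketch of that idea and is not the merged patch. The `sketch_*` types and functions are hypothetical, reduced stand-ins for `struct iatt`, the dht inode context, `dht_inode_ctx_time_update` and `dht_revalidate_cbk`; the "newer timestamp wins" comparison is an assumption about what `DHT_UPDATE_TIME` does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct iatt (only mtime is modelled). */
struct sketch_iatt {
        uint32_t ia_mtime;
        uint32_t ia_mtime_nsec;
};

/* Hypothetical, reduced stand-in for the time cached in the dht inode ctx. */
struct sketch_ctx_time {
        uint32_t mtime;
        uint32_t mtime_nsec;
};

/*
 * Update the cached time from stat.
 * Returns 0 if stat was examined, -1 if there was no iatt to apply.
 * Assumption: the cached value is replaced only by a newer timestamp.
 */
int
sketch_ctx_time_update (struct sketch_ctx_time *time,
                        const struct sketch_iatt *stat)
{
        assert (time != NULL);

        /* A failed lookup unwinds with a NULL iatt; never dereference it. */
        if (stat == NULL)
                return -1;

        if (stat->ia_mtime > time->mtime ||
            (stat->ia_mtime == time->mtime &&
             stat->ia_mtime_nsec > time->mtime_nsec)) {
                time->mtime      = stat->ia_mtime;
                time->mtime_nsec = stat->ia_mtime_nsec;
        }

        return 0;
}

/*
 * Reduced lookup callback: on failure (op_ret == -1) the iatt arguments
 * carry no data, so the parent's cached time is left untouched.
 */
int
sketch_revalidate_cbk (struct sketch_ctx_time *parent_time, int op_ret,
                       const struct sketch_iatt *postparent)
{
        if (op_ret == -1)
                return -1;

        return sketch_ctx_time_update (parent_time, postparent);
}
```

With this shape, an unwind like the one from error_gen_lookup in the trace (op_ret of -1 and NULL iatts) returns early instead of reaching the dereference at dht-helper.c:944. The NULL check inside the update function is kept as a second line of defence for callers that do not check op_ret.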
Comment 4 Niels de Vos 2014-04-17 07:39:35 EDT
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
