Bug 1188242
Summary: | Disperse volume: client crashed while running iozone | | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Bhaskarakiran <byarlaga>
Component: | disperse | Assignee: | Ashish Pandey <aspandey>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | mainline | CC: | bugs, byarlaga, iesool, mzywusko, pkarampu
Target Milestone: | --- | Keywords: | Reopened, Triaged
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.8rc2 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
Clones: | 1219358, 1224115, 1233632 | Environment: |
Last Closed: | 2016-06-16 12:41:05 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1192378, 1224118 | |
Bug Blocks: | 1186580, 1219358, 1224115, 1233632 | |
Description
Bhaskarakiran
2015-02-02 11:56:56 UTC
Created attachment 994660 (client corefile)
Log snippet:
============
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(FTRUNCATE)
frame : type(0) op(0)
frame : type(1) op(UNLINK)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(FLUSH)
frame : type(1) op(STAT)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-02-24 11:41:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7dev
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x306ae20aa6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x306ae3bdcf]
/lib64/libc.so.6[0x342d4326a0]
/usr/lib64/glusterfs/3.7dev/xlator/cluster/distribute.so(dht_writev_cbk+0x268)[0x7f300993cbf8]
/usr/lib64/libglusterfs.so.0(default_writev_cbk+0xcc)[0x306ae2e5ec]
/usr/lib64/glusterfs/3.7dev/xlator/cluster/disperse.so(ec_manager_writev+0x10d)[0x7f3009b8647d]
/usr/lib64/glusterfs/3.7dev/xlator/cluster/disperse.so(__ec_manager+0x34)[0x7f3009b6a654]
/usr/lib64/glusterfs/3.7dev/xlator/cluster/disperse.so(ec_resume+0x91)[0x7f3009b6a461]
/usr/lib64/glusterfs/3.7dev/xlator/cluster/disperse.so(ec_combine+0x196)[0x7f3009b88fa6]
/usr/lib64/glusterfs/3.7dev/xlator/cluster/disperse.so(ec_writev_cbk+0x27b)[0x7f3009b844bb]
/usr/lib64/glusterfs/3.7dev/xlator/protocol/client.so(client3_3_writev_cbk+0x6cc)[0x7f3009de301c]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x306aa0ea65]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x142)[0x306aa0ff02]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x306aa0b5f8]
/usr/lib64/glusterfs/3.7dev/rpc-transport/socket.so(+0x9759)[0x7f30103fc759]
/usr/lib64/glusterfs/3.7dev/rpc-transport/socket.so(+0xb1bd)[0x7f30103fe1bd]
/usr/lib64/libglusterfs.so.0[0x306ae78ffc]
/lib64/libpthread.so.0[0x342d8079d1]
/lib64/libc.so.6(clone+0x6d)[0x342d4e89dd]
---------

dht_fsync_cbk() is being called with op_ret = -1, op_errno = 2 (ENOENT), and both postbuf and prebuf are NULL. Inside dht_fsync_cbk(), the error handling is skipped for op_errno = ENOENT (if (op_ret == -1 && !dht_inode_missing(op_errno))), so control reaches if (IS_DHT_MIGRATION_PHASE1 (postbuf)). The IS_DHT_MIGRATION_PHASE1 macro then reads the file attributes through postbuf, which is NULL, and this leads to the crash.

The changes made for bug 960843 excluded op_errno = ENOENT from this error handling. We need to investigate why the ENOENT case is skipped, and also modify the macro definitions to handle NULL pointers properly.

Ashish, I just realized: on an active fd, fsync should never return ESTALE/ENOENT, because the fd is already open on the file. Why is EC returning this error? This could be an EC bug after all.

Pranith

REVIEW: http://review.gluster.org/10176 (cluster/ec: Use fd instead of loc for get_size_version) posted (#1) for review on master by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/10176 (cluster/ec: Use fd instead of loc for get_size_version) posted (#2) for review on master by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/10218 (Comments implemeted) posted (#1) for review on master by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/10176 (cluster/ec: Use fd instead of loc for get_size_version) posted (#3) for review on master by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/10176 (cluster/ec: Use fd instead of loc for get_size_version) posted (#4) for review on master by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/10176 (cluster/ec: Use fd instead of loc for get_size_version) posted (#5) for review on master by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/10176 (cluster/ec: Use fd instead of loc for get_size_version) posted (#6) for review on master by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/10176 (cluster/ec: Use fd instead of loc for get_size_version) posted (#7) for review on master by Ashish Pandey (aspandey)

COMMIT: http://review.gluster.org/10176 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 582b252e3a418ee332cf3d4b1a415520e242b599
Author: Ashish Pandey <aspandey>
Date: Thu Apr 9 17:27:46 2015 +0530

cluster/ec: Use fd instead of loc for get_size_version

Change-Id: Ia7d43cb3b222db34ecb0e35424f1766715ed8e6a
BUG: 1188242
Signed-off-by: Ashish Pandey <aspandey>
Reviewed-on: http://review.gluster.org/10176
Reviewed-by: Xavier Hernandez <xhernandez>
Tested-by: Gluster Build System <jenkins.com>

REVIEW: http://review.gluster.org/10625 (cluster/ec: Use fd instead of loc for get_size_version) posted (#1) for review on release-3.7 by Ashish Pandey (aspandey)

REVIEW: http://review.gluster.org/11097 (dht: error value check before performing rebalance complete) posted (#1) for review on master by Sakshi Bansal (sabansal)

REVIEW: http://review.gluster.org/11097 (dht : Error value check before performing rebalance complete) posted (#2) for review on master by Sakshi Bansal (sabansal)

REVIEW: http://review.gluster.org/11307 (quota: allow writes when with ENOENT/ESTALE on active fd) posted (#1) for review on master by Vijaikumar Mallikarjuna (vmallika)

REVIEW: http://review.gluster.org/11307 (quota: allow writes when with ENOENT/ESTALE on active fd) posted (#4) for review on master by Vijaikumar Mallikarjuna (vmallika)

COMMIT: http://review.gluster.org/11097 committed in master by Raghavendra G (rgowdapp)
------
commit c791765bc84b1ba62203b2b9c1e815944a39921c
Author: Sakshi <sabansal>
Date: Fri Jun 5 13:48:09 2015 +0530

dht : Error value check before performing rebalance complete

Change-Id: I7a0cd288d16f27b887c7820162efdbe99a039d95
BUG: 1188242
Signed-off-by: Sakshi <sabansal>
Reviewed-on: http://review.gluster.org/11097
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: N Balachandran <nbalacha>
Reviewed-by: Susant Palai <spalai>
Reviewed-by: Raghavendra G <rgowdapp>
Tested-by: NetBSD Build System <jenkins.org>
Tested-by: Raghavendra G <rgowdapp>

COMMIT: http://review.gluster.org/11307 committed in master by Raghavendra G (rgowdapp)
------
commit 142cbe0cfe1f0ff64d081f792e33337977ef5562
Author: vmallika <vmallika>
Date: Thu Jun 18 12:02:50 2015 +0530

quota: allow writes when with ENOENT/ESTALE on active fd

We may get ENOENT/ESTALE in case of below scenario
fd = open file.txt
unlink file.txt
write on fd
Here build_ancestry can fail as the file is removed.
For now ignore ENOENT/ESTALE on active fd with writev and fallocate.
We need to re-visit this code once we understand how other file-system behave in this scenario

Below patch fixes the issue in DHT:
http://review.gluster.org/#/c/11097

Change-Id: I7be683583b808c280e3ea2ddd036c1558a6d53e5
BUG: 1188242
Signed-off-by: vmallika <vmallika>
Reviewed-on: http://review.gluster.org/11307
Tested-by: NetBSD Build System <jenkins.org>
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Raghavendra G <rgowdapp>

A fix for this BZ is already present in a GlusterFS release. A clone of this BZ was fixed in a GlusterFS release and closed, so this mainline BZ is being closed as well.

This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
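The NULL dereference analysed in the dht_fsync_cbk() comment of this report can be illustrated with a minimal, self-contained C sketch. Everything in it is hypothetical: struct iatt_sketch stands in for GlusterFS's struct iatt, inode_missing_sketch() for dht_inode_missing(), and the three flags tested by phase1_unguarded() are placeholders, because the bits the real IS_DHT_MIGRATION_PHASE1 macro checks are not shown in this report. The checks are written as functions rather than macros for readability; the same buf != NULL guard can be prefixed to a macro.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for GlusterFS's struct iatt; the field
 * names are placeholders, not the real layout. */
struct iatt_sketch {
    bool is_regular;
    bool sticky;
    bool sgid;
};

enum cbk_action { CBK_UNWIND, CBK_MIGRATION_PHASE1 };

/* Hypothetical stand-in for dht_inode_missing(): true for ENOENT/ESTALE. */
static bool inode_missing_sketch(int op_errno)
{
    return op_errno == ENOENT || op_errno == ESTALE;
}

/* Modelled on the unguarded check described in the report: it reads the
 * attributes through buf unconditionally, so buf must not be NULL. The
 * assert states that precondition; without it a NULL buf is a SIGSEGV. */
static bool phase1_unguarded(const struct iatt_sketch *buf)
{
    assert(buf != NULL);
    return buf->is_regular && buf->sticky && buf->sgid;
}

/* NULL-safe variant: a missing attribute buffer never signals migration. */
static bool phase1_guarded(const struct iatt_sketch *buf)
{
    return buf != NULL && phase1_unguarded(buf);
}

/* Hypothetical decision flow of the callback as described in the report. */
static enum cbk_action fsync_cbk_sketch(int op_ret, int op_errno,
                                        const struct iatt_sketch *postbuf)
{
    /* Ordinary errors are unwound straight away... */
    if (op_ret == -1 && !inode_missing_sketch(op_errno))
        return CBK_UNWIND;

    /* ...but ENOENT/ESTALE fall through to the migration check, and on
     * that path postbuf may be NULL (the crash scenario). */
    if (phase1_guarded(postbuf))
        return CBK_MIGRATION_PHASE1;

    return CBK_UNWIND;
}
```

With the guard in place, fsync_cbk_sketch(-1, ENOENT, NULL) returns CBK_UNWIND instead of dereferencing NULL. A guard like this only removes the crash; the commits recorded in this report also changed cluster/ec to use the fd instead of the loc for get_size_version and added an error value check in DHT before performing rebalance complete.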
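The scenario in the quota commit message (fd = open file.txt, unlink file.txt, write on fd) also explains why an active fd should not see ENOENT/ESTALE: on a local POSIX filesystem, removing the name does not invalidate an open descriptor. The following is a minimal sketch, assuming Linux and a writable /tmp; mkstemp() replaces the fixed file.txt name, and write_after_unlink() is a hypothetical helper, not GlusterFS code.

```c
#define _XOPEN_SOURCE 700 /* expose mkstemp() and fsync() under strict -std modes */

#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper (not GlusterFS code): open a file, unlink it,
 * then write and fsync through the still-open fd. Assumes Linux and a
 * writable /tmp. Returns 0 if every step succeeds, -1 otherwise. */
static int write_after_unlink(void)
{
    char path[] = "/tmp/unlink-demo-XXXXXX";
    int fd = mkstemp(path);              /* fd = open file.txt */
    if (fd == -1)
        return -1;

    if (unlink(path) == -1) {            /* unlink file.txt */
        close(fd);
        return -1;
    }
    /* The name is gone, but the inode stays alive while fd is open. */
    assert(access(path, F_OK) == -1);

    static const char msg[] = "still writable\n";
    ssize_t n = write(fd, msg, sizeof msg - 1);   /* write on fd */
    int rc = (n == (ssize_t)(sizeof msg - 1) && fsync(fd) == 0) ? 0 : -1;

    /* No directory entry refers to the inode any more. */
    struct stat st;
    if (rc == 0 && (fstat(fd, &st) == -1 || st.st_nlink != 0))
        rc = -1;

    close(fd);                           /* last reference: inode is released */
    return rc;
}
```

On such a filesystem both the write and the fsync succeed and fstat() reports a link count of 0, which is the behaviour Pranith's comment expects of an active fd.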