Bug 1343906
Summary: | [Stress/Scale]: I/O errors out from gNFS mount points during high load on an erasure-coded volume; logs flooded with error messages | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Pranith Kumar K <pkarampu>
Component: | posix | Assignee: | Pranith Kumar K <pkarampu>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | mainline | CC: | asoman, bugs, jahernan, nbalacha, pkarampu, rcyriac, rhs-bugs
Target Milestone: | --- | Keywords: | Triaged
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.9.0 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | 1343178 | |
 | 1360138 (view as bug list) | Environment: |
Last Closed: | 2017-03-27 18:24:03 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1343178 | |
Bug Blocks: | 1343695, 1360138, 1360140 | |
Comment 1
Xavi Hernandez
2016-06-08 10:30:19 UTC
REVIEW: http://review.gluster.org/14669 (storage/posix: Give correct errno for anon-fd operations) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Description of problem:
-----------------------
Had a 1 x (4+2) volume. Added bricks and scaled it to 3 x (4+2), running rebalance after each expansion. In the meantime, I/O errored out on 2 of my clients:

dd: error writing ‘stress3’: Input/output error
8399+0 records in
8398+0 records out

Untarring the tarball failed as well. Details about the sos reports, the exact workload, and the error messages from the logs are in the comments.

How reproducible:
-----------------
Reporting the first occurrence.

Steps to Reproduce:
-------------------
1. Create an EC volume. Mount it on multiple clients via gNFS. Add bricks, then rebalance.
2. Run all kinds of I/O from the various mount points.
3. Check for errors in the logs and on the application side.

Actual results:
---------------
I/O errors out. Logs are flooded with error messages.

Expected results:
-----------------
I/O on the application side should not be affected.
Additional info:
----------------

[root@gqas013 glusterfs]# gluster v info

Volume Name: drogon
Type: Distributed-Disperse
Volume ID: 6d49ee45-1048-4325-96fb-c14ac5e278e8
Status: Started
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickA
Brick2: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickB
Brick3: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickC
Brick4: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickD
Brick5: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickE
Brick6: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickF
Brick7: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickG
Brick8: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickH
Brick9: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickI
Brick10: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickJ
Brick11: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickK
Brick12: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickL
Brick13: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickM
Brick14: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickN
Brick15: gqas005.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickO
Brick16: gqas006.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickP
Brick17: gqas013.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickQ
Brick18: gqas011.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brickR
Options Reconfigured:
performance.readdir-ahead: on
[root@gqas013 glusterfs]#

A) From the rebalance log, I see a lot of assertion-failed messages:

[2016-06-06 15:10:53.999653] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(ec_resume+0x91) [0x7fd263b5e621] -->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(__ec_manager+0x57) [0x7fd263b5e807] -->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(ec_manager_setattr+0x2c6) [0x7fd263b7be76] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-06-06 15:10:54.003509] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(ec_resume+0x91) [0x7fd263b5e621] -->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(__ec_manager+0x57) [0x7fd263b5e807] -->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(ec_manager_setattr+0x2c6) [0x7fd263b7be76] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-06-06 15:10:54.012540] E [ec-inode-write.c:395:ec_manager_setattr] (-->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(ec_resume+0x91) [0x7fd263b5e621] -->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(__ec_manager+0x57) [0x7fd263b5e807] -->/usr/lib64/glusterfs/3.7.9/xlator/cluster/disperse.so(ec_manager_setattr+0x2c6) [0x7fd263b7be76] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-06-06 15:13:15.333800] E [MSGID: 109023] [dht-rebalance.c:2330:gf_defrag_get_entry] 0-drogon-dht: Migrate file failed:/d4/linux-4.6.1/Documentation/arm/Porting lookup failed

B) The brick logs are literally flooded with bad-file-descriptor errors:

bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275139] E [MSGID: 115073] [server-rpc-fops.c:1907:server_fxattrop_cbk] 0-drogon-server: 16513: FXATTROP -2 (c812a0d6-a85d-41f6-a7e6-51630285ec60) ==> (File descriptor in bad state) [File descriptor in bad state]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275220] E [MSGID: 113040] [posix-helpers.c:1686:__posix_fd_ctx_get] 0-drogon-posix: Failed to get anonymous real_path: /bricks/testvol_brickA/.glusterfs/14/c7/14c711cb-11c7-4c8e-8893-de71849e9eb9 _fd = -1 [No such file or directory]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275290] E [MSGID: 115073] [server-rpc-fops.c:1907:server_fxattrop_cbk] 0-drogon-server: 16514: FXATTROP -2 (14c711cb-11c7-4c8e-8893-de71849e9eb9) ==> (File descriptor in bad state) [File descriptor in bad state]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275393] E [MSGID: 113040] [posix-helpers.c:1686:__posix_fd_ctx_get] 0-drogon-posix: Failed to get anonymous real_path: /bricks/testvol_brickA/.glusterfs/80/3b/803b05cb-4e6f-457c-88b1-2a608530757c _fd = -1 [No such file or directory]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275468] E [MSGID: 115073] [server-rpc-fops.c:1907:server_fxattrop_cbk] 0-drogon-server: 16515: FXATTROP -2 (803b05cb-4e6f-457c-88b1-2a608530757c) ==> (File descriptor in bad state) [File descriptor in bad state]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275652] E [MSGID: 113040] [posix-helpers.c:1686:__posix_fd_ctx_get] 0-drogon-posix: Failed to get anonymous real_path: /bricks/testvol_brickA/.glusterfs/8b/fe/8bfe4154-b137-4a84-a265-b541574adca9 _fd = -1 [No such file or directory]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275714] E [MSGID: 115073] [server-rpc-fops.c:1907:server_fxattrop_cbk] 0-drogon-server: 16516: FXATTROP -2 (8bfe4154-b137-4a84-a265-b541574adca9) ==> (File descriptor in bad state) [File descriptor in bad state]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275786] E [MSGID: 113040] [posix-helpers.c:1686:__posix_fd_ctx_get] 0-drogon-posix: Failed to get anonymous real_path: /bricks/testvol_brickA/.glusterfs/49/c3/49c3cbec-3b03-4ec2-968a-89e8bda401c6 _fd = -1 [No such file or directory]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275851] E [MSGID: 115073] [server-rpc-fops.c:1907:server_fxattrop_cbk] 0-drogon-server: 16518: FXATTROP -2 (49c3cbec-3b03-4ec2-968a-89e8bda401c6) ==> (File descriptor in bad state) [File descriptor in bad state]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.275979] E [MSGID: 113040] [posix-helpers.c:1686:__posix_fd_ctx_get] 0-drogon-posix: Failed to get anonymous real_path: /bricks/testvol_brickA/.glusterfs/f8/6f/f86fe79c-7ad0-4d9c-8e5d-26ec63f14696 _fd = -1 [No such file or directory]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.276039] E [MSGID: 115073] [server-rpc-fops.c:1907:server_fxattrop_cbk] 0-drogon-server: 16519: FXATTROP -2 (f86fe79c-7ad0-4d9c-8e5d-26ec63f14696) ==> (File descriptor in bad state) [File descriptor in bad state]
bricks/bricks-testvol_brickA.log:[2016-06-06 15:09:58.276223] E [MSGID: 113040] [posix-helpers.c:1686:__posix_fd_ctx_get] 0-drogon-posix: Failed to get anonymous real_path: /bricks/testvol_brickA/.glusterfs/c4/8a/c48aff49-de3a-46e8-9445-8064c4ca19c5 _fd = -1 [No such file or directory]

---

Xavi, I only fixed the EBADFD part of the bug. We still need to find why the assertions are coming, as well as the reason for the I/O failure.

Pranith

---

That assertion should be solved by http://review.gluster.org/14112. It was caused by the execution of a fop that required an inodelk while the inodelk was not really taken.

---

That build had this patch :-/

---

REVIEW: http://review.gluster.org/14669 (storage/posix: Give correct errno for anon-fd operations) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

---

COMMIT: http://review.gluster.org/14669 committed in master by Pranith Kumar Karampuri (pkarampu)

commit d5088c056d5aee1bda2997ad5835379465fed3a1
Author: Pranith Kumar K <pkarampu>
Date: Tue Jun 7 21:27:10 2016 +0530

storage/posix: Give correct errno for anon-fd operations

Change-Id: Ia9e61d3baa6881eb7dc03dd8ddb6bfdde5a01958
BUG: 1343906
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/14669
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Raghavendra G <rgowdapp>

---

This bug is getting closed because a release has been made available that should address the reported issue.
In case the problem is still not fixed with glusterfs-3.9.0, please open a new bug report.

glusterfs-3.9.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2016-November/029281.html
[2] https://www.gluster.org/pipermail/gluster-users/