+++ This bug was initially created as a clone of Bug #1331254 +++

+++ This bug was initially created as a clone of Bug #1330132 +++

Description of problem:

A distributed iozone test over multiple NFS mounts on different machines causes the test to fail, and some assertion failures appear in the logs:

[2016-04-21 19:29:58.096645] E [ec-inode-read.c:1157:ec_readv_rebuild] (-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x5b) [0x7f9e4e8f18bb] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_readv+0x107) [0x7f9e4e908197] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_readv_rebuild+0x236) [0x7f9e4e907f26] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->fd->inode, &cbk->iatt[0].ia_size)
[2016-04-21 19:29:58.126547] E [ec-common.c:1641:ec_lock_unfreeze] (-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_inodelk+0x155) [0x7f9e4e8fc305] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_unlocked+0x35) [0x7f9e4e8f3c25] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_lock_unfreeze+0x100) [0x7f9e4e8f3ab0] ) 0-: Assertion failed: list_empty(&lock->waiting) && list_empty(&lock->owners)
[2016-04-21 19:30:05.998568] E [ec-inode-read.c:1612:ec_manager_stat] (-->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_resume+0x88) [0x7f9e4e8f1a68] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(__ec_manager+0x5b) [0x7f9e4e8f18bb] -->/usr/lib64/glusterfs/3.7.10/xlator/cluster/disperse.so(ec_manager_stat+0x315) [0x7f9e4e905ed5] ) 0-: Assertion failed: ec_get_inode_size(fop, fop->locks[0].lock->loc.inode, &cbk->iatt[0].ia_size)
[2016-04-21 19:30:05.999146] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-8: remote operation failed [Invalid argument]
[2016-04-21 19:30:05.999132] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-10: remote operation failed [Invalid argument]
[2016-04-21 19:30:05.999237] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-11: remote operation failed [Invalid argument]
[2016-04-21 19:30:05.999259] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-7: remote operation failed [Invalid argument]
[2016-04-21 19:30:05.999326] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-9: remote operation failed [Invalid argument]
[2016-04-21 19:30:06.047496] E [MSGID: 114031] [client-rpc-fops.c:1624:client3_3_inodelk_cbk] 0-test-client-6: remote operation failed [Invalid argument]
[2016-04-21 19:30:06.047559] W [MSGID: 122015] [ec-common.c:1675:ec_unlocked] 0-test-disperse-1: entry/inode unlocking failed (FSTAT) [Invalid argument]

Version-Release number of selected component (if applicable):

mainline

How reproducible:

It happens randomly after some time running the distributed iozone test.

Steps to Reproduce:
1.
2.
3.

Actual results:

Volume access fails and iozone quits with an error.

Expected results:

iozone should complete the test successfully.

Additional info:

Probably related to a race when cancelling the lock release timeout while the callback is already executing. In this case the new fop is not placed in the right waiting list.

--- Additional comment from Vijay Bellur on 2016-04-29 11:19:27 CEST ---

REVIEW: http://review.gluster.org/14112 (cluster/ec: Fix issues with eager locking) posted (#1) for review on master by Xavier Hernandez (xhernandez)

--- Additional comment from Vijay Bellur on 2016-05-02 16:45:05 CEST ---

COMMIT: http://review.gluster.org/14112 committed in master by Jeff Darcy (jdarcy)
------
commit 209985e861f4d8a22bfdb457c0e8d7045ab44553
Author: Xavier Hernandez <xhernandez>
Date:   Thu Apr 28 08:42:40 2016 +0200

    cluster/ec: Fix issues with eager locking

    Due to a race in timer cancellation, in some cases it was possible to
    unlock the lock while another concurrent fop that needed it continues
    execution as if it were not released.

    This patch also fixes an issue that caused a lock to not be released if
    an error was found while preparing ec_update_size_version().

    Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1
    BUG: 1331254
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/14112
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Chen Chen <chenchen>
    NetBSD-regression: NetBSD Build System <jenkins.org>
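
For readers unfamiliar with this kind of race, the following is a minimal Python sketch of the general pattern, not the code from the patch. It assumes a toy lock whose release is delayed by a timer so that later operations can reuse it. All names (EagerLock, token, _timeout, demo) are hypothetical and do not exist in GlusterFS. The point it illustrates is that cancelling a timer does not tell the caller whether the callback has already started, so the decision about who owns the lock has to be made under a mutex, here with a per-timer token.

```python
import threading


class EagerLock:
    """Toy model (hypothetical) of a lock kept for a while after its last user."""

    def __init__(self, delay):
        self.mutex = threading.Lock()
        self.delay = delay
        self.held = False    # True while the underlying lock is acquired
        self.owners = 0      # operations currently using the lock
        self.timer = None    # pending delayed-release timer, if any
        self.token = None    # identifies the timer that is allowed to unlock
        self.events = []     # trace of what happened, for inspection

    def acquire(self):
        with self.mutex:
            if self.timer is not None:
                # cancel() cannot report whether the callback is already
                # running, so the token is also invalidated under the mutex.
                self.timer.cancel()
                self.timer = None
                self.token = None
            if self.held:
                self.events.append("reuse")
            else:
                self.held = True
                self.events.append("lock")
            self.owners += 1

    def release(self):
        with self.mutex:
            self.owners -= 1
            if self.owners == 0:
                # Last user: arm a delayed release instead of unlocking now.
                self.token = object()
                self.timer = threading.Timer(
                    self.delay, self._timeout, args=(self.token,))
                self.timer.start()

    def _timeout(self, token):
        with self.mutex:
            if self.token is not token:
                # The timer was cancelled after it fired; a new operation
                # owns the lock, so it must not be released here.
                self.events.append("stale-timeout")
                return
            self.token = None
            self.timer = None
            self.held = False
            self.events.append("unlock")


def demo():
    """Simulate a timeout callback that lost the race against a new operation."""
    lock = EagerLock(delay=60.0)
    lock.acquire()
    lock.release()            # arms the delayed release
    stale = lock.token
    lock.acquire()            # new operation cancels the pending release
    lock._timeout(stale)      # late callback, as if it had already fired
    still_held = lock.held
    lock.release()
    lock.timer.cancel()       # tidy up so no timer thread is left waiting
    return still_held, lock.events


if __name__ == "__main__":
    print(demo())
```

Without the token check, the late callback in demo() would clear held while the second operation still believes it owns the lock, which matches the symptom described in the commit message.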
REVIEW: http://review.gluster.org/14206 (cluster/ec: Fix issues with eager locking) posted (#1) for review on release-3.8 by Xavier Hernandez (xhernandez)
COMMIT: http://review.gluster.org/14206 committed in release-3.8 by Niels de Vos (ndevos)
------
commit d1e0200f7dbbc412b8fc0127be2324beaade1c78
Author: Xavier Hernandez <xhernandez>
Date:   Thu Apr 28 08:42:40 2016 +0200

    cluster/ec: Fix issues with eager locking

    Due to a race in timer cancellation, in some cases it was possible to
    unlock the lock while another concurrent fop that needed it continues
    execution as if it were not released.

    This patch also fixes an issue that caused a lock to not be released if
    an error was found while preparing ec_update_size_version().

    > Change-Id: I1344a3f5ecfc333f05a09e62653838264c9c26b1
    > BUG: 1331254
    > Signed-off-by: Xavier Hernandez <xhernandez>
    > Reviewed-on: http://review.gluster.org/14112
    > Smoke: Gluster Build System <jenkins.com>
    > CentOS-regression: Gluster Build System <jenkins.com>
    > Reviewed-by: Chen Chen <chenchen>
    > NetBSD-regression: NetBSD Build System <jenkins.org>

    Change-Id: I9ccd585a9b9952b6787cfca6720bc59b9c8ddab9
    BUG: 1332845
    Signed-off-by: Xavier Hernandez <xhernandez>
    Reviewed-on: http://review.gluster.org/14206
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Niels de Vos <ndevos>
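
The second issue mentioned in the commit message, a lock left held when a preparation step fails, can be sketched in general terms as follows. This is only an illustration under assumed names (ToyLock, prepare_update, finish_operation are hypothetical), not the actual change to ec_update_size_version(). It shows the usual remedy of releasing the lock on every exit path, including the error path.

```python
class ToyLock:
    """Hypothetical stand-in for a lock that must never be leaked."""

    def __init__(self):
        self.held = False

    def acquire(self):
        self.held = True

    def release(self):
        self.held = False


def prepare_update(fail):
    """Hypothetical preparation step that may fail before any update is sent."""
    if fail:
        raise OSError("preparation failed")


def finish_operation(lock, fail=False):
    """Run the post-operation update and release the lock on every path."""
    try:
        prepare_update(fail)
        return True
    except OSError:
        # Report the failure; the finally clause still releases the lock.
        return False
    finally:
        lock.release()
```

Calling finish_operation(lock, fail=True) on an acquired ToyLock returns False and leaves lock.held set to False, so a failed preparation no longer leaks the lock.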
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user