+++ This bug was initially created as a clone of Bug #1369077 +++

Description of problem:
Killed the data bricks that held the directory and its data, then renamed the directory from the mount point. The rename succeeded.
Note: see the steps to reproduce below for details.

Version-Release number of selected component (if applicable):
gluster --version
glusterfs 3.8.2 built on Aug 10 2016 15:34:37

How reproducible:
3/3

[root@dhcp43-223 new]# gluster vol info

Volume Name: arbiter
Type: Distributed-Replicate
Volume ID: 70c7113e-2223-4cd2-acfd-b08b1c376ea4
Status: Started
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.223:/bricks/brick0/abc
Brick2: 10.70.42.58:/bricks/brick0/abc
Brick3: 10.70.43.142:/bricks/brick0/abc (arbiter)
Brick4: 10.70.43.223:/bricks/brick1/abc
Brick5: 10.70.42.58:/bricks/brick1/abc
Brick6: 10.70.43.142:/bricks/brick1/abc (arbiter)
Brick7: 10.70.43.223:/bricks/brick2/abc
Brick8: 10.70.42.58:/bricks/brick2/abc
Brick9: 10.70.43.142:/bricks/brick2/abc (arbiter)
Brick10: 10.70.43.223:/bricks/brick3/abc
Brick11: 10.70.42.58:/bricks/brick3/abc
Brick12: 10.70.43.142:/bricks/brick3/abc (arbiter)
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on

Steps to Reproduce:
1. Create a 4 x (2 + 1) arbiter volume (volume name: arbiter) and mount it using FUSE.
2. On the mount point, create a directory "dir1" and create a file "abc" inside it.
3. Write 100M to the file using dd:
   dd if=/dev/urandom of=abc bs=1M count=100
4. Kill the data bricks that hold the file "abc". In this case, brick10 and brick11 were the data bricks and brick12 was the arbiter brick.
5. All other bricks remained online.
6. Rename the directory from dir1 to dir2 from the mount point using "mv dir1 dir2".

Actual results:
The directory was renamed even though mv failed with a read-only error:
#mv dir1 dir2
mv: cannot move ‘dir1’ to ‘dir2’: Read-only file system
#ls
dir2

Expected results:
The directory should not be renamed.

Additional info:
Tried the same on a plain distribute volume and on a plain 1 x 3 replicate volume; the issue was not reproducible.
Reproduced the same issue on a 2 x (2 + 1) volume and observed the following after renaming the directory:
[root@dhcp43-165 super]# mv new one
mv: cannot move ‘new’ to ‘one’: Read-only file system
[root@dhcp43-165 super]#
[root@dhcp43-165 super]# ls
ls: cannot access new: No such file or directory
new one
Two directories are created.

--- Additional comment from Karan Sandha on 2016-08-22 08:47 EDT ---

--- Additional comment from Karan Sandha on 2016-08-22 08:48 EDT ---

--- Additional comment from Karan Sandha on 2016-08-22 08:49 EDT ---

--- Additional comment from Karan Sandha on 2016-08-22 08:53 EDT ---

--- Additional comment from Ravishankar N on 2016-08-24 10:02:28 EDT ---
Changing the component to replicate, as it also occurs on distributed-replicate volumes. (Karan, feel free to correct me if I am wrong.) Also assigning it to Pranith, as he said he'd work on the fix.

Relevant technical discussions on IRC:
<itisravi> pranithk1: are you free to talk about the bug Karan raised?
<itisravi> its a day one issue IMO and not specific to afr.
<itisravi> s/afr/arbiter
<pranithk1> itisravi: He said the bug is not recreatable in 3-way replication?
<itisravi> pranithk1: It is..I've requested him to check again.
<itisravi> pranithk1: so if mkdir fails on one replica subvol due to quorum not met etc , dht has no roll back
<itisravi> thats the issue.
<pranithk1> itisravi: Does it happen on plain replicate?
<itisravi> pranithk1: no
<itisravi> pranithk1: its dht renamedir thing..
<pranithk1> itisravi: okay, assign the bug to DHT giving the reason
<itisravi> pranithk1: nithya was saying if afr_inodelk can also have quorum checks, then renamedir will not happen
<itisravi> so we will be good.
<itisravi> instead of partially creating it on the up subvols of DHT
<pranithk1> itisravi: That is not a bad idea, send out a patch. Please tell her it only prevents the odds, won't fix the problem completely
<itisravi> pranithk1: we can do it for afr_entrylk also then no?
<pranithk1> itisravi: Actually the inodelk/finodelk needs to be reworked. I will send the patch
<pranithk1> itisravi: yeah, that too
<itisravi> pranithk1: I see , okay.

--- Additional comment from Niels de Vos on 2016-09-12 01:39:42 EDT ---
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

--- Additional comment from Worker Ant on 2016-11-08 07:21:51 EST ---
REVIEW: http://review.gluster.org/15802 (cluster/afr: Fix bugs in [f]inodelk/[f]entrylk) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-11-25 05:14:30 EST ---
REVIEW: http://review.gluster.org/15802 (cluster/afr: Fix bugs in [f]inodelk/[f]entrylk) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-11-26 10:35:02 EST ---
COMMIT: http://review.gluster.org/15802 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 6be7bd936eb30aa8d2b908061f60e1534e797657
Author: Pranith Kumar K <pkarampu>
Date: Mon Nov 7 14:47:34 2016 +0530

cluster/afr: Fix bugs in [f]inodelk/[f]entrylk

Problems:
1) Inodelk is not taking quorum into account
2) finodelk, [f]entrylk are not implemented correctly
3) By default afr doesn't go for non-blocking parallel locks.

Fix:
Implemented a common framework which can be used by [f]inodelk/[f]entrylk.
Used quorum for the same.
Change-Id: I239f13875a065298630d266941df10cfa3addc85
BUG: 1369077
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/15802
Tested-by: Krutika Dhananjay <kdhananj>
Reviewed-by: Krutika Dhananjay <kdhananj>
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Ravishankar N <ravishankar>
CentOS-regression: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>

--- Additional comment from Worker Ant on 2016-12-01 01:24:01 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting locks on all subvols) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-12-01 03:38:11 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting locks on all subvols) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-12-01 03:46:21 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting locks on all subvols) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-12-01 06:48:43 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting locks on all subvols) posted (#4) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-12-06 07:42:04 EST ---
REVIEW: http://review.gluster.org/15984 (cluster/afr: Serialize conflicting locks on all subvols) posted (#5) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-12-06 07:42:08 EST ---
REVIEW: http://review.gluster.org/16044 (tests: test parallel rmdirs to be successful) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Worker Ant on 2016-12-07 01:47:45 EST ---
COMMIT: http://review.gluster.org/15984 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit a7d7ed90c9272a42168a91f92754d3a4be605da5
Author: Pranith Kumar K <pkarampu>
Date: Thu Dec 1 09:42:19 2016 +0530

cluster/afr: Serialize conflicting locks on all subvols

Problem:
1) When a blocking lock is issued and the parallel lock phase fails on all subvolumes with EAGAIN, it is not switching to serialized locking phase.
2) When quorum is enabled and locks fail partially it is better to give errno returned by brick rather than the default quorum errno.

Fix:
Handled this error case and changed op_errno to reflect the actual errno in case of quorum error.

BUG: 1369077
Change-Id: Ifac2e4a13686e9fde601873012700966d56a7f31
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/15984
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Ravishankar N <ravishankar>
REVIEW: http://review.gluster.org/16056 (cluster/afr: Fix bugs in [f]inodelk/[f]entrylk) posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
REVIEW: http://review.gluster.org/16057 (cluster/afr: Serialize conflicting locks on all subvols) posted (#1) for review on release-3.9 by Pranith Kumar Karampuri (pkarampu)
COMMIT: http://review.gluster.org/16056 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu)
------
commit 953917924a6298fb1deedf76feec354ee21dc373
Author: Pranith Kumar K <pkarampu>
Date: Mon Nov 7 14:47:34 2016 +0530

cluster/afr: Fix bugs in [f]inodelk/[f]entrylk

Problems:
1) Inodelk is not taking quorum into account
2) finodelk, [f]entrylk are not implemented correctly
3) By default afr doesn't go for non-blocking parallel locks.

Fix:
Implemented a common framework which can be used by [f]inodelk/[f]entrylk.
Used quorum for the same.

>Change-Id: I239f13875a065298630d266941df10cfa3addc85
>BUG: 1369077
>Signed-off-by: Pranith Kumar K <pkarampu>
>Reviewed-on: http://review.gluster.org/15802
>Tested-by: Krutika Dhananjay <kdhananj>
>Reviewed-by: Krutika Dhananjay <kdhananj>
>Smoke: Gluster Build System <jenkins.org>
>Reviewed-by: Ravishankar N <ravishankar>
>CentOS-regression: Gluster Build System <jenkins.org>
>NetBSD-regression: NetBSD Build System <jenkins.org>

BUG: 1402482
Change-Id: I0c5fed6ca87c6432bb20d00f76cdf5c328a52a85
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/16056
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Ravishankar N <ravishankar>
COMMIT: http://review.gluster.org/16057 committed in release-3.9 by Pranith Kumar Karampuri (pkarampu)
------
commit 2d321a770a38972bdc59a9308da791189ffa9823
Author: Pranith Kumar K <pkarampu>
Date: Thu Dec 1 09:42:19 2016 +0530

cluster/afr: Serialize conflicting locks on all subvols

Problem:
1) When a blocking lock is issued and the parallel lock phase fails on all subvolumes with EAGAIN, it is not switching to serialized locking phase.
2) When quorum is enabled and locks fail partially it is better to give errno returned by brick rather than the default quorum errno.

Fix:
Handled this error case and changed op_errno to reflect the actual errno in case of quorum error.

>BUG: 1369077
>Change-Id: Ifac2e4a13686e9fde601873012700966d56a7f31
>Signed-off-by: Pranith Kumar K <pkarampu>
>Reviewed-on: http://review.gluster.org/15984
>Smoke: Gluster Build System <jenkins.org>
>NetBSD-regression: NetBSD Build System <jenkins.org>
>CentOS-regression: Gluster Build System <jenkins.org>
>Reviewed-by: Ravishankar N <ravishankar>

BUG: 1402482
Change-Id: Ib1ca577bfa52ae537ab7186d10bfa2ae755813e3
Signed-off-by: Pranith Kumar K <pkarampu>
Reviewed-on: http://review.gluster.org/16057
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
Reviewed-by: Ravishankar N <ravishankar>
CentOS-regression: Gluster Build System <jenkins.org>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.9.1, please open a new bug report.

glusterfs-3.9.1 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html
[2] https://www.gluster.org/pipermail/gluster-users/