Bug 1393694

Summary: The directories get renamed when data bricks are offline in 4*(2+1) volume
Product: Red Hat Gluster Storage Reporter: Karan Sandha <ksandha>
Component: replicateAssignee: Pranith Kumar K <pkarampu>
Status: CLOSED ERRATA QA Contact: Karan Sandha <ksandha>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2CC: amukherj, bugs, nchilaka, pkarampu, ravishankar, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---Keywords: Triaged
Target Release: RHGS 3.2.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1369077 Environment:
Last Closed: 2017-03-23 06:17:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On: 1369077, 1402482, 1413032    
Bug Blocks: 1351528    

Description Karan Sandha 2016-11-10 07:50:11 UTC
+++ This bug was initially created as a clone of Bug #1369077 +++

Description of problem:

Killed the data bricks which had the directory and data and renamed the directory from mount pt. renaming was successfull. 
Note:- Read the steps from more information

Version-Release number of selected component (if applicable):
gluster --version
glusterfs 3.8.2 built on Aug 10 2016 15:34:37

How reproducible:
3/3
[root@dhcp43-223 new]# gluster vol info
 
Volume Name: arbiter
Type: Distributed-Replicate
Volume ID: 70c7113e-2223-4cd2-acfd-b08b1c376ea4
Status: Started
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.223:/bricks/brick0/abc
Brick2: 10.70.42.58:/bricks/brick0/abc
Brick3: 10.70.43.142:/bricks/brick0/abc (arbiter)
Brick4: 10.70.43.223:/bricks/brick1/abc
Brick5: 10.70.42.58:/bricks/brick1/abc
Brick6: 10.70.43.142:/bricks/brick1/abc (arbiter)
Brick7: 10.70.43.223:/bricks/brick2/abc
Brick8: 10.70.42.58:/bricks/brick2/abc
Brick9: 10.70.43.142:/bricks/brick2/abc (arbiter)
Brick10: 10.70.43.223:/bricks/brick3/abc
Brick11: 10.70.42.58:/bricks/brick3/abc
Brick12: 10.70.43.142:/bricks/brick3/abc (arbiter)
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on



Steps to Reproduce:
1. Create an arbiter volume 4 x (2 + 1) mount it using FUSE ( volume name -Arbiter)
2. On mount point create a directory "dir1" and create a file inside "abc"   
3. write 100M to the file using dd
 dd if=/dev/urandom of=abc bs=1M count=100
4. now kill the data bricks from the volume on which the data is present i.e "abc" file 
in my case:- brick10 , brick11  were data bricks , brick12 was the arbiter brick
5. Rest all bricks were online.
6. now change the directory name from dir1 to dir2 from mount point using "mv dir1 dir2"


Actual results:
The directory got renamed in-spite being in read only mode  
#mv dir1 dir2
mv: cannot move ‘dir1’ to ‘dir2’: Read-only file system
#ls
# dir2 
Expected results:
directory shouldn't be renamed. 

Additional info:
Tried the same on plain dist volume and plain replicate 1*3 volume. the issue was not reproducible.

Reproduced the same issue on 2 x (2 + 1) volume 
observed that after renaming the directory 

[root@dhcp43-165 super]# mv new one
mv: cannot move ‘new’ to ‘one’: Read-only file system
[root@dhcp43-165 super]# 
[root@dhcp43-165 super]# ls
ls: cannot access new: No such file or directory
new  one

two directories are created.

--- Additional comment from Karan Sandha on 2016-08-22 08:47 EDT ---



--- Additional comment from Karan Sandha on 2016-08-22 08:48 EDT ---



--- Additional comment from Karan Sandha on 2016-08-22 08:49 EDT ---



--- Additional comment from Karan Sandha on 2016-08-22 08:53 EDT ---



--- Additional comment from Ravishankar N on 2016-08-24 10:02:28 EDT ---

Changing the component to replicate as it occurs on distribute replicate also. (Karan, feel free to correct me if I am wrong). Also assigning it to Pranith as he said he'd work on the fix:

Relevant technical discussions on IRC:
<itisravi>        pranithk1: are you free to talk about the bug Karan raised?
<itisravi>        its a day one issue IMO and not specific to afr.
<itisravi>        s/afr/arbiter   
<pranithk1>       itisravi: He said the bug is not recreatable in 3-way replication?
<itisravi>        pranithk1: It is..I've requested him to check again.
<itisravi>        pranithk1: so if mkdir fails on one replica subvol due to quorum not met etc , dht has no roll back
<itisravi>        thats the issue.
<pranithk1>       itisravi: Does it happen on plain replicate?
<itisravi>        pranithk1: no   
<itisravi>        pranithk1: its dht renamedir thing..
<pranithk1>       itisravi: okay, assign the bug to DHT giving the reason
<itisravi>        pranithk1: nithya was saying  if afr_inodelk can also have quorum checks, then renamedir will not happen
<itisravi>        so we will be good.
<itisravi>        instead of partially creating it on the up subvols of DHT 
<pranithk1>       itisravi: That is not a bad idea, send out a patch. Please tell her it only prevents the odds, won't fix the problem completely
<itisravi>        pranithk1: we can do it for afr_entrylk also then no?
<pranithk1>       itisravi: Actually the inodelk/finodelk needs to be reworked. I will send the patch
<pranithk1>       itisravi: yeah, that too
<itisravi>        pranithk1: I see , okay.

--- Additional comment from Niels de Vos on 2016-09-12 01:39:42 EDT ---

All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

--- Additional comment from Worker Ant on 2016-11-08 07:21:51 EST ---

REVIEW: http://review.gluster.org/15802 (cluster/afr: Fix bugs in [f]inodelk/[f]entrylk) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

Comment 7 Atin Mukherjee 2016-12-01 10:12:04 UTC
Unfortunately the fix mentioned in comment 5 has introduced a regression and upstream mainline patch http://review.gluster.org/#/c/15984/ has been posted to address it. Moving this BZ back to Post.

Comment 9 Ravishankar N 2016-12-07 08:17:46 UTC
Downstream patch merged: https://code.engineering.redhat.com/gerrit/#/c/92316/

Comment 13 errata-xmlrpc 2017-03-23 06:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html