Bug 1393694 - The directories get renamed when data bricks are offline in 4*(2+1) volume
Summary: The directories get renamed when data bricks are offline in 4*(2+1) volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Pranith Kumar K
QA Contact: Karan Sandha
URL:
Whiteboard:
Depends On: 1369077 1402482 1413032
Blocks: 1351528
 
Reported: 2016-11-10 07:50 UTC by Karan Sandha
Modified: 2017-03-23 06:17 UTC
CC: 8 users

Fixed In Version: glusterfs-3.8.4-8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1369077
Environment:
Last Closed: 2017-03-23 06:17:59 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Karan Sandha 2016-11-10 07:50:11 UTC
+++ This bug was initially created as a clone of Bug #1369077 +++

Description of problem:

Killed the data bricks that held the directory and its data, then renamed the directory from the mount point; the rename was successful.
Note: see "Steps to Reproduce" and "Additional info" below.

Version-Release number of selected component (if applicable):
gluster --version
glusterfs 3.8.2 built on Aug 10 2016 15:34:37

How reproducible:
3/3
[root@dhcp43-223 new]# gluster vol info
 
Volume Name: arbiter
Type: Distributed-Replicate
Volume ID: 70c7113e-2223-4cd2-acfd-b08b1c376ea4
Status: Started
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.43.223:/bricks/brick0/abc
Brick2: 10.70.42.58:/bricks/brick0/abc
Brick3: 10.70.43.142:/bricks/brick0/abc (arbiter)
Brick4: 10.70.43.223:/bricks/brick1/abc
Brick5: 10.70.42.58:/bricks/brick1/abc
Brick6: 10.70.43.142:/bricks/brick1/abc (arbiter)
Brick7: 10.70.43.223:/bricks/brick2/abc
Brick8: 10.70.42.58:/bricks/brick2/abc
Brick9: 10.70.43.142:/bricks/brick2/abc (arbiter)
Brick10: 10.70.43.223:/bricks/brick3/abc
Brick11: 10.70.42.58:/bricks/brick3/abc
Brick12: 10.70.43.142:/bricks/brick3/abc (arbiter)
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on
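
For reference, a volume with this layout would typically be created and FUSE-mounted roughly as follows (a minimal sketch: hostnames and brick paths are taken from the vol info above, the mount point is arbitrary, and the create/mount commands are the standard arbiter-volume syntax rather than commands recorded in this report):

# Create a 4 x (2 + 1) distributed-replicate volume, one arbiter brick per subvolume
gluster volume create arbiter replica 3 arbiter 1 \
    10.70.43.223:/bricks/brick0/abc 10.70.42.58:/bricks/brick0/abc 10.70.43.142:/bricks/brick0/abc \
    10.70.43.223:/bricks/brick1/abc 10.70.42.58:/bricks/brick1/abc 10.70.43.142:/bricks/brick1/abc \
    10.70.43.223:/bricks/brick2/abc 10.70.42.58:/bricks/brick2/abc 10.70.43.142:/bricks/brick2/abc \
    10.70.43.223:/bricks/brick3/abc 10.70.42.58:/bricks/brick3/abc 10.70.43.142:/bricks/brick3/abc
gluster volume start arbiter
# FUSE mount on the client
mount -t glusterfs 10.70.43.223:/arbiter /mnt/arbiter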



Steps to Reproduce:
1. Create a 4 x (2 + 1) arbiter volume and mount it using FUSE (volume name: arbiter).
2. On the mount point, create a directory "dir1" and create a file "abc" inside it.
3. Write 100M to the file using dd:
   dd if=/dev/urandom of=abc bs=1M count=100
4. Kill the data bricks of the replica subvolume on which the "abc" file resides.
   In my case, brick10 and brick11 were the data bricks and brick12 was the arbiter brick.
5. Keep all other bricks online.
6. Rename the directory from dir1 to dir2 from the mount point using "mv dir1 dir2" (a condensed command sketch follows these steps).
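
A condensed command-line version of the steps above (a sketch; the brick PIDs are placeholders that can be read from the PID column of "gluster volume status" output, and the mount point is assumed to be /mnt/arbiter):

cd /mnt/arbiter
mkdir dir1
dd if=/dev/urandom of=dir1/abc bs=1M count=100
# Identify the two data bricks of the replica subvolume that holds "abc"
# (brick10 and brick11 in this run) and kill only those brick processes,
# leaving the arbiter brick (brick12) and all other subvolumes online.
gluster volume status arbiter
kill -9 <pid-of-brick10> <pid-of-brick11>
# Attempt the rename from the mount point
mv dir1 dir2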


Actual results:
The directory got renamed in spite of the rename being rejected with a read-only error:
#mv dir1 dir2
mv: cannot move ‘dir1’ to ‘dir2’: Read-only file system
#ls
# dir2 
Expected results:
The directory should not be renamed.

Additional info:
Tried the same on a plain distribute volume and a plain 1 x 3 replicate volume; the issue was not reproducible there.

Reproduced the same issue on a 2 x (2 + 1) volume and observed the following after renaming the directory:

[root@dhcp43-165 super]# mv new one
mv: cannot move ‘new’ to ‘one’: Read-only file system
[root@dhcp43-165 super]# 
[root@dhcp43-165 super]# ls
ls: cannot access new: No such file or directory
new  one

Two directories are created; the partial rename can be confirmed on the backend bricks as sketched below.
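
One way to confirm the partial rename is to list the directory directly on the backend brick filesystems. A rough sketch (brick paths follow the vol info above and are illustrative; which subvolume holds the directory varies per run):

# On the bricks whose processes were killed, the old directory name is still present on disk
ls /bricks/brick3/abc/
# On the bricks that stayed online, the directory appears under the new name
ls /bricks/brick0/abc/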

--- Additional comment from Ravishankar N on 2016-08-24 10:02:28 EDT ---

Changing the component to replicate as it occurs on distribute replicate also. (Karan, feel free to correct me if I am wrong). Also assigning it to Pranith as he said he'd work on the fix:

Relevant technical discussions on IRC:
<itisravi>        pranithk1: are you free to talk about the bug Karan raised?
<itisravi>        its a day one issue IMO and not specific to afr.
<itisravi>        s/afr/arbiter   
<pranithk1>       itisravi: He said the bug is not recreatable in 3-way replication?
<itisravi>        pranithk1: It is..I've requested him to check again.
<itisravi>        pranithk1: so if mkdir fails on one replica subvol due to quorum not met etc , dht has no roll back
<itisravi>        thats the issue.
<pranithk1>       itisravi: Does it happen on plain replicate?
<itisravi>        pranithk1: no   
<itisravi>        pranithk1: its dht renamedir thing..
<pranithk1>       itisravi: okay, assign the bug to DHT giving the reason
<itisravi>        pranithk1: nithya was saying  if afr_inodelk can also have quorum checks, then renamedir will not happen
<itisravi>        so we will be good.
<itisravi>        instead of partially creating it on the up subvols of DHT 
<pranithk1>       itisravi: That is not a bad idea, send out a patch. Please tell her it only prevents the odds, won't fix the problem completely
<itisravi>        pranithk1: we can do it for afr_entrylk also then no?
<pranithk1>       itisravi: Actually the inodelk/finodelk needs to be reworked. I will send the patch
<pranithk1>       itisravi: yeah, that too
<itisravi>        pranithk1: I see , okay.
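
For context, the client-quorum behaviour discussed above is governed by the standard AFR volume options, which can be inspected from the CLI, for example:

# Show the client-quorum settings in effect on the volume (output format varies by version)
gluster volume get arbiter cluster.quorum-type
gluster volume get arbiter cluster.quorum-count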

--- Additional comment from Niels de Vos on 2016-09-12 01:39:42 EDT ---

All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

--- Additional comment from Worker Ant on 2016-11-08 07:21:51 EST ---

REVIEW: http://review.gluster.org/15802 (cluster/afr: Fix bugs in [f]inodelk/[f]entrylk) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 7 Atin Mukherjee 2016-12-01 10:12:04 UTC
Unfortunately the fix mentioned in comment 5 has introduced a regression and upstream mainline patch http://review.gluster.org/#/c/15984/ has been posted to address it. Moving this BZ back to Post.

Comment 9 Ravishankar N 2016-12-07 08:17:46 UTC
Downstream patch merged: https://code.engineering.redhat.com/gerrit/#/c/92316/

Comment 13 errata-xmlrpc 2017-03-23 06:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

