Bug 1398554

Summary: Rename is failing with ENOENT while remove-brick start operation is in progress
Product: [Community] GlusterFS
Component: distribute
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: mainline
Hardware: x86_64
OS: Linux
Keywords: Triaged
Reporter: Mohit Agrawal <moagrawa>
Assignee: Mohit Agrawal <moagrawa>
CC: bugs, moagrawa, nbalacha, rhs-bugs, storage-qa-internal, tdesala
Fixed In Version: glusterfs-4.1.3 (or later)
Clone Of: 1395133
Type: Bug
Last Closed: 2018-08-29 03:34:46 UTC
Bug Depends On: 1286127, 1395133, 1395217    

Description Mohit Agrawal 2016-11-25 08:56:32 UTC
+++ This bug was initially created as a clone of Bug #1395133 +++

Description of problem:
=======================
Rename is failing with ENOENT while remove-brick start operation is in progress.

Version-Release number of selected component (if applicable):
3.8.4-5.el7rhgs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1) Create an EC volume and start it (the issue also reproduces with a distributed-replicate volume).
2) FUSE mount the volume on a client.
3) Create a big file of about 10GB on the FUSE mount:
dd if=/dev/urandom of=BIG bs=1024k count=10000
4) Identify the bricks on which the file 'BIG' is located and start a remove-brick operation on those bricks so that BIG gets migrated.
5) While the remove-brick start operation is in progress, rename the file from the mount point (both steps are sketched after the error output below).

The rename fails with the following error:
mv: cannot move ‘BIG’ to ‘BIG_rename’: No such file or directory
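
For reference, steps 4 and 5 amount to roughly the following. This is a sketch that assumes BIG landed on the first disperse subvolume; adjust the brick list to whatever the pathinfo query reports (node addresses are taken from the volume info further down).

# Step 4: locate the bricks that actually hold BIG; the pathinfo
# xattr lists the backend brick paths for the file.
getfattr -n trusted.glusterfs.pathinfo /mnt/fuse/BIG

# Step 4 (contd.): remove that disperse subvolume so BIG gets migrated.
gluster volume remove-brick ecvol \
    10.70.37.190:/bricks/brick0/b0 10.70.37.215:/bricks/brick0/b0 \
    10.70.37.44:/bricks/brick0/b0 10.70.37.190:/bricks/brick1/b1 \
    10.70.37.215:/bricks/brick1/b1 10.70.37.44:/bricks/brick1/b1 start

# Step 5: rename while the migration is still in progress.
mv /mnt/fuse/BIG /mnt/fuse/BIG_rename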

Actual results:
===============
Renaming fails with ENOENT.

Expected results:
=================
Rename should be successful.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-11-15 04:08:20 EST ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Prasad Desala on 2016-11-15 04:19:48 EST ---

sosreports@ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/Prasad/1395133/

Additional info:
===============
1) We are able to reproduce the issue even with a distributed-replicate volume without any md-cache settings.
2) After the remove-brick start operation completed, renaming the file succeeded (a quick verification sketch follows).
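
For point 2, a minimal verification along these lines works, assuming the same six bricks were removed as in the sketch above:

# Check migration state; the status column for each node should
# read "completed" before attempting the rename.
gluster volume remove-brick ecvol \
    10.70.37.190:/bricks/brick0/b0 10.70.37.215:/bricks/brick0/b0 \
    10.70.37.44:/bricks/brick0/b0 10.70.37.190:/bricks/brick1/b1 \
    10.70.37.215:/bricks/brick1/b1 10.70.37.44:/bricks/brick1/b1 status

# After completion, the rename goes through.
mv /mnt/fuse/BIG /mnt/fuse/BIG_rename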

Volname: ecvol
Client: 10.70.37.91 --->  mount -t glusterfs 10.70.37.190:/ecvol /mnt/fuse/
[root@dhcp37-190 glusterfs]# gluster v info
 
Volume Name: ecvol
Type: Distributed-Disperse
Volume ID: f90202e8-a36e-4d3d-a0e2-8fa93152c028
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.190:/bricks/brick0/b0
Brick2: 10.70.37.215:/bricks/brick0/b0
Brick3: 10.70.37.44:/bricks/brick0/b0
Brick4: 10.70.37.190:/bricks/brick1/b1
Brick5: 10.70.37.215:/bricks/brick1/b1
Brick6: 10.70.37.44:/bricks/brick1/b1
Brick7: 10.70.37.190:/bricks/brick2/b2
Brick8: 10.70.37.215:/bricks/brick2/b2
Brick9: 10.70.37.44:/bricks/brick2/b2
Brick10: 10.70.37.190:/bricks/brick3/b3
Brick11: 10.70.37.215:/bricks/brick3/b3
Brick12: 10.70.37.44:/bricks/brick3/b3
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.uss: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
disperse.shd-max-threads: 8
disperse.shd-wait-qlength: 1024

--- Additional comment from Nithya Balachandran on 2016-11-15 05:06:44 EST ---

This looks like the same issue as reported in BZ 1286127. Marking it as dependent on 1286127 for now.

--- Additional comment from Mohit Agrawal on 2016-11-25 03:55:58 EST ---

The RCA for rename failing with ENOENT is the same as in BZ 1286127:
rename fails with ENOENT because the file is no longer available on the cached subvolume, which has changed during the migration process.
To resolve it, on failure DHT passes a new rename continuation (dht_rename2) to dht_rebalance_complete_check, which calls dht_rename2 once the migration has completed.
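
The fix itself lives inside DHT, but the same retry-after-migration idea can be sketched at the CLI level. This is a minimal, hypothetical illustration (not part of any patch) that assumes the ecvol remove-brick from the steps above is in flight; it only mirrors the behaviour the patch adds internally, where dht_rename2 is invoked after dht_rebalance_complete_check confirms the migration finished.

# BRICKS: the same brick list that was passed to 'remove-brick start'.
BRICKS="10.70.37.190:/bricks/brick0/b0 10.70.37.215:/bricks/brick0/b0 \
10.70.37.44:/bricks/brick0/b0 10.70.37.190:/bricks/brick1/b1 \
10.70.37.215:/bricks/brick1/b1 10.70.37.44:/bricks/brick1/b1"

# If the rename fails with ENOENT while the rebalance is still running,
# wait for the migration to finish and retry once.
if ! mv /mnt/fuse/BIG /mnt/fuse/BIG_rename 2>/dev/null; then
    # Poll until no node reports the migration as "in progress".
    while gluster volume remove-brick ecvol $BRICKS status | grep -q "in progress"; do
        sleep 5
    done
    mv /mnt/fuse/BIG /mnt/fuse/BIG_rename
fi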

Comment 1 Worker Ant 2016-11-25 09:08:31 UTC
REVIEW: http://review.gluster.org/15928 (cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 2 Worker Ant 2016-11-29 12:21:34 UTC
REVIEW: http://review.gluster.org/15928 (WIP cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 3 Worker Ant 2016-12-01 05:59:29 UTC
REVIEW: http://review.gluster.org/15928 (cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 4 Worker Ant 2016-12-02 15:29:59 UTC
REVIEW: http://review.gluster.org/15928 (cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#4) for review on master by MOHIT AGRAWAL (moagrawa@redhat.com)

Comment 5 Amar Tumballi 2018-08-29 03:34:46 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.