Bug 1398554 - Rename is failing with ENOENT while remove-brick start operation is in progress
Summary: Rename is failing with ENOENT while remove-brick start operation is in progress
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1286127 1395133 1395217
Blocks:
 
Reported: 2016-11-25 08:56 UTC by Mohit Agrawal
Modified: 2018-08-29 03:34 UTC
CC: 6 users

Fixed In Version: glusterfs-4.1.3 (or later)
Clone Of: 1395133
Environment:
Last Closed: 2018-08-29 03:34:46 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Mohit Agrawal 2016-11-25 08:56:32 UTC
+++ This bug was initially created as a clone of Bug #1395133 +++

Description of problem:
=======================
Rename is failing with ENOENT while remove-brick start operation is in progress.

Version-Release number of selected component (if applicable):
3.8.4-5.el7rhgs.x86_64

How reproducible:
=================
Always

Steps to Reproduce:
===================
1) Create an EC volume and start it (the issue is also seen with a distributed-replicate volume, so one can be used here instead).
2) FUSE mount the volume on a client.
3) Create a large file of about 10 GB on the FUSE mount:
dd if=/dev/urandom of=BIG bs=1024k count=10000
4) Identify the bricks on which the file 'BIG' is located and start removing them (gluster volume remove-brick <VOLNAME> <BRICK>... start) so that the file gets migrated.
5) While remove-brick start operation is in progress, rename the file from the mount point. 

The rename fails with the following error:
mv: cannot move ‘BIG’ to ‘BIG_rename’: No such file or directory

Actual results:
===============
Rename fails with ENOENT.

Expected results:
=================
Rename should be successful.

--- Additional comment from Red Hat Bugzilla Rules Engine on 2016-11-15 04:08:20 EST ---

This bug is automatically being proposed for the current release of Red Hat Gluster Storage 3 under active development, by setting the release flag 'rhgs-3.2.0' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from Prasad Desala on 2016-11-15 04:19:48 EST ---

sosreports@ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/Prasad/1395133/

Additional info:
===============
1) The issue is reproducible even with a distributed-replicate volume and without any md-cache settings.
2) After the remove-brick start operation completed, renaming the file succeeded.

Volname: ecvol
Client: 10.70.37.91 --->  mount -t glusterfs 10.70.37.190:/ecvol /mnt/fuse/
[root@dhcp37-190 glusterfs]# gluster v info
 
Volume Name: ecvol
Type: Distributed-Disperse
Volume ID: f90202e8-a36e-4d3d-a0e2-8fa93152c028
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.37.190:/bricks/brick0/b0
Brick2: 10.70.37.215:/bricks/brick0/b0
Brick3: 10.70.37.44:/bricks/brick0/b0
Brick4: 10.70.37.190:/bricks/brick1/b1
Brick5: 10.70.37.215:/bricks/brick1/b1
Brick6: 10.70.37.44:/bricks/brick1/b1
Brick7: 10.70.37.190:/bricks/brick2/b2
Brick8: 10.70.37.215:/bricks/brick2/b2
Brick9: 10.70.37.44:/bricks/brick2/b2
Brick10: 10.70.37.190:/bricks/brick3/b3
Brick11: 10.70.37.215:/bricks/brick3/b3
Brick12: 10.70.37.44:/bricks/brick3/b3
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.uss: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
disperse.shd-max-threads: 8
disperse.shd-wait-qlength: 1024

--- Additional comment from Nithya Balachandran on 2016-11-15 05:06:44 EST ---

Looks like the same issue as reported by BZ 1286127. Marking it depends on 1286127 for now.

--- Additional comment from Mohit Agrawal on 2016-11-25 03:55:58 EST ---

The RCA for the rename failing with ENOENT is the same as described in BZ 1286127: the rename fails with ENOENT because the file is no longer present on the cached subvolume, which changed during the migration. To resolve this, on failure a new rename continuation (dht_rename2) is passed to dht_rebalance_complete_check(), which calls dht_rename2 once the migration has completed.
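
The concept behind the fix can be sketched in plain C, independent of the DHT code. This is only a minimal illustration of the "retry the rename once migration finishes" idea: migration_complete() and rename_with_retry() below are hypothetical stand-ins for dht_rebalance_complete_check() and the dht_rename/dht_rename2 path, not the actual patch.

/* Minimal, self-contained sketch of the retry idea only -- NOT the dht patch.
 * The real fix registers dht_rename2 as a continuation with
 * dht_rebalance_complete_check(); here the same idea is shown with POSIX calls. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: reports whether the file's migration has finished
 * (stand-in for dht_rebalance_complete_check()). */
static bool migration_complete(const char *path)
{
        (void)path;
        return true;   /* placeholder: assume migration eventually completes */
}

static int rename_with_retry(const char *oldpath, const char *newpath)
{
        if (rename(oldpath, newpath) == 0)
                return 0;

        /* The first attempt raced with migration: the cached subvolume changed
         * mid-flight, so the source was not found where it was expected. */
        if (errno == ENOENT) {
                while (!migration_complete(oldpath))
                        sleep(1);
                /* Re-issue the rename, which is what dht_rename2 does in the patch. */
                return rename(oldpath, newpath);
        }
        return -1;
}

int main(void)
{
        if (rename_with_retry("BIG", "BIG_rename") != 0)
                fprintf(stderr, "rename failed: %s\n", strerror(errno));
        return 0;
}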

Comment 1 Worker Ant 2016-11-25 09:08:31 UTC
REVIEW: http://review.gluster.org/15928 (cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#1) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 2 Worker Ant 2016-11-29 12:21:34 UTC
REVIEW: http://review.gluster.org/15928 (WIP cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#2) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 3 Worker Ant 2016-12-01 05:59:29 UTC
REVIEW: http://review.gluster.org/15928 (cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#3) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 4 Worker Ant 2016-12-02 15:29:59 UTC
REVIEW: http://review.gluster.org/15928 (cluster/dht: Rename is failing with ENOENT while migration is in progress) posted (#4) for review on master by MOHIT AGRAWAL (moagrawa)

Comment 5 Amar Tumballi 2018-08-29 03:34:46 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.

