Description of problem:
In DHT there is a scenario in which fd based fops may be sent to the dst subvolume after the file has been migrated but before the fd has been opened on it. This happens because certain operations update the cached subvol in the dht inode ctx without checking whether an fd opened on the original subvol has also been opened on the new one. Dht fd based fops currently rely on the phase1/phase2 migration checks to open fds on the dst subvol. However, no such check is made in this case, causing the fop to fail with EBADF. This is seen with dist-rep volumes.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 2x2 volume.
2. Create a file FILE1. Assume it is created on subvol1. Rename it to NFILE1 so that it hashes to subvol2.
3. Open an fd on NFILE1 (on subvol1).
4. Perform a rebalance so the file is migrated to subvol2.
5. On the same mount point, perform a lookup/readdirp so the cached subvol in the inode_ctx of NFILE1 is updated to subvol2.
6. Perform a write on the fd. The write is sent to subvol2 on an fd which has been opened only on subvol1. Since the migration phase checks don't kick in, the fd is not opened on subvol2 and the fop fails with EBADF.

Actual results:

Expected results:

Additional info:
This is being fixed by having every fd based fop check whether the fd has been opened on the cached subvol before winding the fop down. (The failing sequence in steps 3-6 is sketched below.)
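To make the failing sequence concrete, here is a minimal standalone C sketch of steps 3-6 above. This is not DHT code; the structs and names (fd_state, inode_state, server_write) are hypothetical stand-ins for the fd ctx, the inode ctx and the brick-side fop, and index 0/1 stand for subvol1/subvol2.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-fd state: which subvolumes this fd has been opened on. */
struct fd_state {
    bool open_on[2];
};

/* Hypothetical per-inode state: the cached subvol used to route fd based fops. */
struct inode_state {
    int cached_subvol;
};

/* Stand-in for the server-side fop: it fails with EBADF if the fd has
 * not been opened on that subvolume. */
static int server_write(const struct fd_state *fd, int subvol)
{
    if (!fd->open_on[subvol])
        return -EBADF;
    return 0;
}

int main(void)
{
    struct fd_state    fd  = { .open_on = { false, false } };
    struct inode_state ino = { .cached_subvol = 0 };

    /* Step 3: fd opened on NFILE1 on subvol1. */
    fd.open_on[0] = true;

    /* Steps 4-5: rebalance migrates the file and a lookup/readdirp updates
     * the cached subvol in the inode ctx to subvol2; nothing checks that
     * the fd is open there. */
    ino.cached_subvol = 1;

    /* Step 6: the write is routed to the cached subvol and fails. */
    int ret = server_write(&fd, ino.cached_subvol);
    printf("write returned %d (%s)\n", ret, ret == -EBADF ? "EBADF" : "ok");
    return 0;
}
```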
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#1) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#2) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#3) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#4) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#5) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#6) for review on master by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17630 committed in master by Raghavendra G (rgowdapp)
------
commit 91db0d47ca267aecfc6124a3f337a4e2f2c9f1e2
Author: N Balachandran <nbalacha>
Date: Mon Jun 26 21:12:56 2017 +0530

    cluster/dht: Check if fd is opened on dst subvol

    If an fd is opened on a file, and the file is then migrated and the
    cached subvol updated in the inode_ctx before an fd based fop is sent,
    the fop is wound to the dst subvol on which the fd has not been opened.
    This causes the fop to fail with EBADF.

    Now, every fd based fop will check that the fd has been opened on the
    dst subvol before winding it down.

    Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e
    BUG: 1465075
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17630
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Susant Palai <spalai>
    CentOS-regression: Gluster Build System <jenkins.org>
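A minimal sketch of the shape of this fix, assuming a per-fd context that records which subvolumes the fd is open on. The names (fd_state, open_fd_on_subvol, wind_fd_fop) are hypothetical and only illustrate the "check, open if needed, then wind" flow the commit describes, not the actual DHT helpers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-fd context: which subvolumes the fd is open on. */
struct fd_state {
    bool open_on[2];
};

/* Stand-in for opening the fd on a subvolume and recording it. */
static int open_fd_on_subvol(struct fd_state *fd, int subvol)
{
    printf("opening fd on subvol%d\n", subvol + 1);
    fd->open_on[subvol] = true;
    return 0;
}

/* Before an fd based fop is wound to the cached (dst) subvolume, make sure
 * the fd has been opened there; otherwise the brick would reject the fop
 * with EBADF. */
static int wind_fd_fop(struct fd_state *fd, int dst_subvol)
{
    if (!fd->open_on[dst_subvol]) {
        int ret = open_fd_on_subvol(fd, dst_subvol);
        if (ret < 0)
            return ret;   /* the follow-up fix below deals with this path */
    }
    printf("winding fop to subvol%d\n", dst_subvol + 1);
    return 0;
}

int main(void)
{
    /* fd is open on subvol1 only, but the cached subvol is now subvol2. */
    struct fd_state fd = { .open_on = { true, false } };
    return wind_fd_fop(&fd, 1);
}
```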
REVIEW: https://review.gluster.org/17731 (cluster/dht: Fix fd check race) posted (#1) for review on master by N Balachandran (nbalacha)
There is yet another race between the cached subvol being updated in the inode_ctx and the fd being opened on the target.

1. fop1 -> fd1 -> subvol0
2. file migrated from subvol0 to subvol1 and cached_subvol changed to subvol1 in inode_ctx
3. fop2 -> fd1 -> subvol1 [takes new cached subvol]
4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1
5. fop1 -> checks fd ctx (fd not open on subvol0) -> tries to open fd1 on subvol0 -> fails with "No such file or directory"

Possible fixes:
1. Keep track of all fds opened on all subvols in a list. This would also help fix the fd leaks that are currently present.
2. If dht_fd_open_on_dst fails with ENOENT, check if the cached subvol in the inode_ctx has changed. If yes, wind to the new cached subvol.
3. If dht_fd_open_on_dst fails with ENOENT, wind to the old subvol and let the phase1/phase2 checks handle it.

Option 3 is probably the safest at this point as the phase1/phase2 checks will kick in (see the sketch below).
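A minimal standalone sketch of option 3, which is the approach the follow-up patch takes (the committed fix also treats ESTALE like ENOENT): if the open on the target subvolume fails because the file is no longer there, wind the fop to the old subvolume the fd is already open on and rely on the phase1/phase2 checks to redirect it. All names below are hypothetical stand-ins, not the actual dht_fd_open_on_dst code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-fd context: which subvolumes the fd is open on. */
struct fd_state {
    bool open_on[2];
};

/* Stand-in for opening the fd on a subvolume; 'fail_errno' lets the example
 * simulate the race where the file has already been migrated away and the
 * open fails with ENOENT (or ESTALE). */
static int open_fd_on_subvol(struct fd_state *fd, int subvol, int fail_errno)
{
    if (fail_errno)
        return -fail_errno;
    fd->open_on[subvol] = true;
    return 0;
}

/* Option 3: if the open on the target subvol fails with ENOENT/ESTALE, wind
 * the fop to the old subvol (where the fd is already open) and let the
 * phase1/phase2 migration checks redirect it. */
static int wind_fd_fop(struct fd_state *fd, int dst, int old, int fail_errno)
{
    if (!fd->open_on[dst]) {
        int ret = open_fd_on_subvol(fd, dst, fail_errno);
        if (ret == -ENOENT || ret == -ESTALE) {
            printf("open on subvol%d failed; winding to subvol%d instead\n",
                   dst, old);
            return 0;
        }
        if (ret < 0)
            return ret;   /* any other error is returned to the caller */
    }
    printf("winding fop to subvol%d\n", dst);
    return 0;
}

int main(void)
{
    /* fd1 is open on subvol1 (opened by fop2); fop1 still sees subvol0 as
     * the cached subvol and the open there hits ENOENT. */
    struct fd_state fd = { .open_on = { false, true } };
    return wind_fd_fop(&fd, 0, 1, ENOENT);
}
```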
REVIEW: https://review.gluster.org/17731 (cluster/dht: Fix fd check race) posted (#2) for review on master by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17731 committed in master by Raghavendra G (rgowdapp)
------
commit f7a450c17fee7e43c544473366220887f0534ed7
Author: N Balachandran <nbalacha>
Date: Mon Jul 10 09:38:54 2017 +0530

    cluster/dht: Fix fd check race

    There is another race between the cached subvol being updated in the
    inode_ctx and the fd being opened on the target.

    1. fop1 -> fd1 -> subvol0
    2. file migrated from subvol0 to subvol1 and cached_subvol changed to
       subvol1 in inode_ctx
    3. fop2 -> fd1 -> subvol1 [takes new cached subvol]
    4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1
    5. fop1 -> checks fd ctx (fd not open on subvol0) -> tries to open fd1
       on subvol0 -> fails with "No such file or directory"

    Fix:
    If dht_fd_open_on_dst fails with ENOENT or ESTALE, wind to the old
    subvol and let the phase1/phase2 checks handle it.

    Change-Id: I34f8011574a8b72e3bcfe03b0cc4f024b352f225
    BUG: 1465075
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17731
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Amar Tumballi <amarts>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/