Description of problem:
In DHT there is a scenario in which fd based fops may be sent to the dst subvolume after the file has been migrated but before the fd has been opened on it. This happens because certain operations update the cached subvol in the dht inode ctx without checking whether an fd opened on the original subvol has also been opened on the new one. Dht fd based fops currently rely on the phase1/phase2 migration checks to open fds on the dst subvol. However, no such check is made in this case, causing the fop to fail with EBADF. This is seen with dist-rep volumes.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a 2x2 volume.
2. Create a file FILE1. Assume it is created on subvol1. Rename it to NFILE1 so that it hashes to subvol2.
3. Open an fd on NFILE1 (on subvol1).
4. Perform a rebalance so the file is migrated to subvol2.
5. On the same mount point, perform a lookup/readdirp so the cached subvol in the inode_ctx of NFILE1 is updated to subvol2.
6. Perform a write on the fd. The write is sent to subvol2 on an fd which has been opened only on subvol1. Since the migration phase checks don't kick in, the fd is not opened on subvol2 and the fop fails with EBADF.

Actual results:

Expected results:

Additional info:
This is being fixed by having every fd based fop check whether the fd has been opened on the cached subvol before winding the fop down. (The failing sequence in steps 3-6 is sketched below.)
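To make the failing sequence concrete, here is a minimal standalone C sketch of steps 3-6 above. This is not DHT code; the structs and names (fd_state, inode_state, server_write) are hypothetical stand-ins for the fd ctx, the inode ctx and the brick-side fop, and index 0/1 stand for subvol1/subvol2.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-fd state: which subvolumes this fd has been opened on. */
struct fd_state {
    bool open_on[2];
};

/* Hypothetical per-inode state: the cached subvol used to route fd based fops. */
struct inode_state {
    int cached_subvol;
};

/* Stand-in for the server-side fop: it fails with EBADF if the fd has
 * not been opened on that subvolume. */
static int server_write(const struct fd_state *fd, int subvol)
{
    if (!fd->open_on[subvol])
        return -EBADF;
    return 0;
}

int main(void)
{
    struct fd_state    fd  = { .open_on = { false, false } };
    struct inode_state ino = { .cached_subvol = 0 };

    /* Step 3: fd opened on NFILE1 on subvol1. */
    fd.open_on[0] = true;

    /* Steps 4-5: rebalance migrates the file and a lookup/readdirp updates
     * the cached subvol in the inode ctx to subvol2; nothing checks that
     * the fd is open there. */
    ino.cached_subvol = 1;

    /* Step 6: the write is routed to the cached subvol and fails. */
    int ret = server_write(&fd, ino.cached_subvol);
    printf("write returned %d (%s)\n", ret, ret == -EBADF ? "EBADF" : "ok");
    return 0;
}
```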
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#1) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#2) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#3) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#4) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#5) for review on master by N Balachandran (nbalacha)
REVIEW: https://review.gluster.org/17630 (cluster/dht: Check if fd is opened on dst subvol) posted (#6) for review on master by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17630 committed in master by Raghavendra G (rgowdapp)
------
commit 91db0d47ca267aecfc6124a3f337a4e2f2c9f1e2
Author: N Balachandran <nbalacha>
Date: Mon Jun 26 21:12:56 2017 +0530

    cluster/dht: Check if fd is opened on dst subvol

    If an fd is opened on a file, and the file is then migrated and the
    cached subvol updated in the inode_ctx before an fd based fop is sent,
    the fop is wound to the dst subvol on which the fd has not been opened.
    This causes the fop to fail with EBADF.

    Now, every fd based fop will check that the fd has been opened on the
    dst subvol before winding it down.

    Change-Id: Id92ef5eb7a5b5226688e2d2868b15e383f5f240e
    BUG: 1465075
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17630
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Susant Palai <spalai>
    CentOS-regression: Gluster Build System <jenkins.org>
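A minimal sketch of the shape of this fix, assuming a per-fd context that records which subvolumes the fd is open on. The names (fd_state, open_fd_on_subvol, wind_fd_fop) are hypothetical and only illustrate the "check, open if needed, then wind" flow the commit describes, not the actual DHT helpers.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-fd context: which subvolumes the fd is open on. */
struct fd_state {
    bool open_on[2];
};

/* Stand-in for opening the fd on a subvolume and recording it. */
static int open_fd_on_subvol(struct fd_state *fd, int subvol)
{
    printf("opening fd on subvol%d\n", subvol + 1);
    fd->open_on[subvol] = true;
    return 0;
}

/* Before an fd based fop is wound to the cached (dst) subvolume, make sure
 * the fd has been opened there; otherwise the brick would reject the fop
 * with EBADF. */
static int wind_fd_fop(struct fd_state *fd, int dst_subvol)
{
    if (!fd->open_on[dst_subvol]) {
        int ret = open_fd_on_subvol(fd, dst_subvol);
        if (ret < 0)
            return ret;   /* the follow-up fix below deals with this path */
    }
    printf("winding fop to subvol%d\n", dst_subvol + 1);
    return 0;
}

int main(void)
{
    /* fd is open on subvol1 only, but the cached subvol is now subvol2. */
    struct fd_state fd = { .open_on = { true, false } };
    return wind_fd_fop(&fd, 1);
}
```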
REVIEW: https://review.gluster.org/17731 (cluster/dht: Fix fd check race) posted (#1) for review on master by N Balachandran (nbalacha)
There is yet another race between the cached subvol being updated in the inode_ctx and the fd being opened on the target.

1. fop1 -> fd1 -> subvol0
2. file migrated from subvol0 to subvol1 and cached_subvol changed to subvol1 in inode_ctx
3. fop2 -> fd1 -> subvol1 [takes new cached subvol]
4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1
5. fop1 -> checks fd ctx (fd not open on subvol0) -> tries to open fd1 on subvol0 -> fails with "No such file or directory"

Possible fixes:
1. Keep track of all fds opened on all subvols in a list. This would also help fix the fd leaks that are currently present.
2. If dht_fd_open_on_dst fails with ENOENT, check if the cached subvol in the inode_ctx has changed. If yes, wind to the new cached subvol.
3. If dht_fd_open_on_dst fails with ENOENT, wind to the old subvol and let the phase1/phase2 checks handle it.

Option 3 is probably the safest at this point as the phase1/phase2 checks will kick in (see the sketch below).
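A minimal standalone sketch of option 3, which is the approach the follow-up patch takes (the committed fix also treats ESTALE like ENOENT): if the open on the target subvolume fails because the file is no longer there, wind the fop to the old subvolume the fd is already open on and rely on the phase1/phase2 checks to redirect it. All names below are hypothetical stand-ins, not the actual dht_fd_open_on_dst code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-fd context: which subvolumes the fd is open on. */
struct fd_state {
    bool open_on[2];
};

/* Stand-in for opening the fd on a subvolume; 'fail_errno' lets the example
 * simulate the race where the file has already been migrated away and the
 * open fails with ENOENT (or ESTALE). */
static int open_fd_on_subvol(struct fd_state *fd, int subvol, int fail_errno)
{
    if (fail_errno)
        return -fail_errno;
    fd->open_on[subvol] = true;
    return 0;
}

/* Option 3: if the open on the target subvol fails with ENOENT/ESTALE, wind
 * the fop to the old subvol (where the fd is already open) and let the
 * phase1/phase2 migration checks redirect it. */
static int wind_fd_fop(struct fd_state *fd, int dst, int old, int fail_errno)
{
    if (!fd->open_on[dst]) {
        int ret = open_fd_on_subvol(fd, dst, fail_errno);
        if (ret == -ENOENT || ret == -ESTALE) {
            printf("open on subvol%d failed; winding to subvol%d instead\n",
                   dst, old);
            return 0;
        }
        if (ret < 0)
            return ret;   /* any other error is returned to the caller */
    }
    printf("winding fop to subvol%d\n", dst);
    return 0;
}

int main(void)
{
    /* fd1 is open on subvol1 (opened by fop2); fop1 still sees subvol0 as
     * the cached subvol and the open there hits ENOENT. */
    struct fd_state fd = { .open_on = { false, true } };
    return wind_fd_fop(&fd, 0, 1, ENOENT);
}
```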
REVIEW: https://review.gluster.org/17731 (cluster/dht: Fix fd check race) posted (#2) for review on master by N Balachandran (nbalacha)
COMMIT: https://review.gluster.org/17731 committed in master by Raghavendra G (rgowdapp)
------
commit f7a450c17fee7e43c544473366220887f0534ed7
Author: N Balachandran <nbalacha>
Date: Mon Jul 10 09:38:54 2017 +0530

    cluster/dht: Fix fd check race

    There is another race between the cached subvol being updated in the
    inode_ctx and the fd being opened on the target.

    1. fop1 -> fd1 -> subvol0
    2. file migrated from subvol0 to subvol1 and cached_subvol changed to
       subvol1 in inode_ctx
    3. fop2 -> fd1 -> subvol1 [takes new cached subvol]
    4. fop2 -> checks fd ctx (fd not open on subvol1) -> opens fd1 on subvol1
    5. fop1 -> checks fd ctx (fd not open on subvol0) -> tries to open fd1
       on subvol0 -> fails with "No such file or directory"

    Fix:
    If dht_fd_open_on_dst fails with ENOENT or ESTALE, wind to the old
    subvol and let the phase1/phase2 checks handle it.

    Change-Id: I34f8011574a8b72e3bcfe03b0cc4f024b352f225
    BUG: 1465075
    Signed-off-by: N Balachandran <nbalacha>
    Reviewed-on: https://review.gluster.org/17731
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Amar Tumballi <amarts>
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/