+++ This bug was initially created as a clone of Bug #1711764 +++
Description of problem:
This is a consequence of https://review.gluster.org/#/c/glusterfs/+/17239/ and lookup-optimize being enabled.
Rebalance directory processing steps on each node:
1. Set the new layout on the directory without the commit hash.
2. List files on that local subvol. Migrate those files which fall into its bucket. A lookup is performed on a file only if the process determines that it is the one to migrate that file.
3. When done, update the layout on the local subvol with the layout containing the commit hash.
When there are multiple rebalance processes processing the same directory, they finish at different times and one process can update the layout with the commit hash before the others are done listing and migrating their files.
Clients will therefore see a complete layout before all files have been looked up according to the new layout, which causes file access to fail.
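The race can be reproduced in a toy model. This is a sketch, not GlusterFS code: hashed_subvol, in_bucket, rebalance and client_lookup are hypothetical names, the byte-sum hash stands in for the real DHT hash, and in_bucket stands in for the per-node decision made by gf_defrag_should_i_migrate. It assumes both nodes list the same files and that a client with lookup-optimize skips the search on other subvols once the commit hash is set.

```python
# Toy model of the race described above; all names are hypothetical.

def hashed_subvol(name, layout):
    """Stand-in for the DHT hash: map a file name to a subvol in the layout."""
    return layout[sum(name.encode()) % len(layout)]

def in_bucket(name, node, node_count):
    """Stand-in for the per-node 'should I migrate this file' decision."""
    return (sum(name.encode()) // 2) % node_count == node

def rebalance(node, node_count, layout, bricks):
    """Step 2 for one process: migrate only the files in this node's bucket."""
    names = sorted(n for files in bricks.values() for n in files)
    for name in names:
        if not in_bucket(name, node, node_count):
            continue  # left for the other process; no lookup is done
        src = next(s for s in layout if name in bricks[s])
        dst = hashed_subvol(name, layout)
        if dst != src:
            bricks[src].remove(name)
            bricks[dst].add(name)

def client_lookup(name, layout, bricks, commit_hash_set):
    """Client lookup with lookup-optimize enabled (simplified)."""
    target = hashed_subvol(name, layout)
    if name in bricks[target]:
        return target
    if commit_hash_set:
        # The layout looks complete, so the client does not search elsewhere.
        raise FileNotFoundError(name)
    for subvol in layout:
        if name in bricks[subvol]:
            return subvol
    raise FileNotFoundError(name)

files = [f"f{i}" for i in range(10)]
old_layout = ["s0", "s1"]
new_layout = ["s0", "s1", "s2"]

# Place files according to the old two-subvol layout.
bricks = {s: set() for s in new_layout}
for name in files:
    bricks[hashed_subvol(name, old_layout)].add(name)

# Process 0 completes steps 2 and 3; process 1 never gets to run.
rebalance(0, 2, new_layout, bricks)
commit_hash_set = True

failed = []
for name in files:
    try:
        client_lookup(name, new_layout, bricks, commit_hash_set)
    except FileNotFoundError:
        failed.append(name)
print(failed)
```

In this model the files that fail are exactly those in the second process's bucket that also hash to a different subvol under the new layout.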
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a 2x2 volume spanning 2 nodes. Create some directories and files on it.
2. Add 2 bricks to convert it to a 3x2 volume.
3. Start a rebalance on the volume and use gdb to break into one rebalance process before it starts processing the directories.
4. Allow the second rebalance process to complete. Kill the process that is blocked by gdb.
5. Mount the volume and try to stat the files without listing the directories.
The stat will fail for several files with the error:
stat: cannot stat ‘<filename>’: No such file or directory
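Step 5 can be scripted when the file names are known from step 1. The helper below is a minimal sketch; stat_without_listing is a hypothetical name, and the mount path and file names in the usage note are placeholders, not values from this report. It stats each path directly and never lists the directory.

```python
import os

def stat_without_listing(directory, names):
    """Stat each known name by full path and return the names that are missing.

    The directory is never listed, so the files are reached only through
    individual lookups.
    """
    missing = []
    for name in names:
        try:
            os.stat(os.path.join(directory, name))
        except FileNotFoundError:
            missing.append(name)
    return missing
```

For example, stat_without_listing("/mnt/testvol/dir1", ["file1", "file2"]) would return the subset of those names for which stat reports "No such file or directory".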
--- Additional comment from Nithya Balachandran on 2019-05-20 05:05:30 UTC ---
The easiest solution is to have each node do the file lookups before the call to gf_defrag_should_i_migrate.
Cons: This will introduce more lookups, but the number is pretty much the same as that seen before https://review.gluster.org/#/c/glusterfs/+/17239/
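The trade-off can be sketched in a toy model. This is not the actual patch: every name below is hypothetical, in_bucket stands in for gf_defrag_should_i_migrate, and the model assumes that a lookup which misses at the hashed subvol (before the commit hash is set) searches the other subvols and leaves a linkto pointer at the hashed subvol. That is my understanding of DHT behaviour and is not stated in this report.

```python
# Toy model of the proposed fix; all names are hypothetical.
DATA = "data"  # marks a real file; any other value is a linkto target subvol

def hashed_subvol(name, layout):
    """Stand-in for the DHT hash."""
    return layout[sum(name.encode()) % len(layout)]

def in_bucket(name, node, node_count):
    """Stand-in for the per-node migration decision."""
    return (sum(name.encode()) // 2) % node_count == node

def lookup(name, layout, bricks, trust_layout):
    """Return the subvol holding the data, or raise FileNotFoundError."""
    target = hashed_subvol(name, layout)
    entry = bricks[target].get(name)
    if entry == DATA:
        return target
    if entry is not None:
        return entry  # follow the linkto pointer
    if trust_layout:
        raise FileNotFoundError(name)  # commit hash set: no search elsewhere
    for subvol in layout:
        if bricks[subvol].get(name) == DATA:
            bricks[target][name] = subvol  # assumed: lookup leaves a linkto
            return subvol
    raise FileNotFoundError(name)

def process_directory(node, node_count, layout, bricks, lookup_all):
    """Step 2 for one process; returns the number of lookups it performed."""
    lookups = 0
    names = sorted(n for files in bricks.values()
                   for n, e in files.items() if e == DATA)
    for name in names:
        mine = in_bucket(name, node, node_count)
        if not (lookup_all or mine):
            continue  # old behaviour: skip files outside the bucket
        src = lookup(name, layout, bricks, trust_layout=False)
        lookups += 1
        dst = hashed_subvol(name, layout)
        if mine and dst != src:
            del bricks[src][name]
            bricks[dst][name] = DATA  # replaces the linkto with the data
    return lookups

def run(lookup_all):
    files = [f"f{i}" for i in range(10)]
    old_layout, new_layout = ["s0", "s1"], ["s0", "s1", "s2"]
    bricks = {s: {} for s in new_layout}
    for name in files:
        bricks[hashed_subvol(name, old_layout)][name] = DATA
    # Only process 0 runs, then sets the commit hash; process 1 is blocked.
    lookups = process_directory(0, 2, new_layout, bricks, lookup_all)
    failed = []
    for name in files:
        try:
            lookup(name, new_layout, bricks, trust_layout=True)
        except FileNotFoundError:
            failed.append(name)
    return lookups, failed

print(run(lookup_all=False))
print(run(lookup_all=True))
```

In this model, looking up every listed file raises the lookup count for the one process from 4 to 10, and no client lookup fails afterwards, even though the second process never ran.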
--- Additional comment from Worker Ant on 2019-05-20 10:01:20 UTC ---
REVIEW: https://review.gluster.org/22746 (cluster/dht: Lookup all files when processing directory) posted (#1) for review on master by N Balachandran
This bug is moved to https://github.com/gluster/glusterfs/issues/973, and will be tracked there from now on. Visit GitHub issues URL for further details