Most likely this bug is due to cached-subvol changes during rebalance. Since rhs-3.1, dht_rename on files acquires locks, so this bug is most likely fixed. It needs to be retested.
Please note that there is a small race window between the lookup on the file(s) and the rename fop. If the file gets migrated in this window, we can still run into rename errors (as the cached-subvol has changed). To fix this bug completely, rename also needs to handle cached-subvol changes, as open (dht_open2), stat (dht_stat2) etc. already do.
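The race window can be illustrated with a toy model. This is only a sketch and not GlusterFS code: `ToySubvol`, `rename_on` and `rename_with_recheck` are hypothetical names, a subvolume is reduced to a set of file names, and the "recheck" simply searches the other subvolumes, which is an assumption about what handling a cached-subvol change could look like.

```python
class ToySubvol:
    """Hypothetical stand-in for a DHT subvolume: just a set of file names."""

    def __init__(self, name):
        self.name = name
        self.files = set()


def rename_on(subvol, src, dst):
    """Rename src to dst on one subvol; fail if src is not stored there."""
    if src not in subvol.files:
        raise FileNotFoundError(f"{src} not on {subvol.name}")
    subvol.files.remove(src)
    subvol.files.add(dst)


def rename_with_recheck(cached, all_subvols, src, dst):
    """Try the cached subvol first; if the file moved, retry where it lives now."""
    try:
        rename_on(cached, src, dst)
        return cached
    except FileNotFoundError:
        for subvol in all_subvols:
            if subvol is not cached and src in subvol.files:
                rename_on(subvol, src, dst)
                return subvol
        raise  # the file is really gone


old, new = ToySubvol("subvol-0"), ToySubvol("subvol-1")
old.files.add("rename_0_file_32")

cached = old  # result of the lookup

# Rebalance migrates the file between the lookup and the rename.
old.files.remove("rename_0_file_32")
new.files.add("rename_0_file_32")

# A rename that trusts the stale cached subvol fails.
try:
    rename_on(cached, "rename_0_file_32", "rename_1_file_32")
    plain_result = "renamed"
except FileNotFoundError:
    plain_result = "ENOENT"

# A rename that rechecks the file's location succeeds.
final = rename_with_recheck(cached, [old, new],
                            "rename_0_file_32", "rename_1_file_32")
```

In this model the plain rename ends with ENOENT even though the file still exists on the volume, which matches the symptom reported here, while the rechecking variant finishes on the subvolume the file was migrated to.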
This can also happen through the following sequence:
1. The layout of the parent directory changes.
2. No lookup is sent on src/dst, so no entry corresponding to src/dst is present on the newly hashed-subvols.
3. A rename is issued. Since a rename expects an entry on the hashed-subvol, an attempt to unlink/rename/link might fail.

To summarize, this bug can happen because of:
1. changes in layout
2. migration of the file in the window between the lookup and rename fops.
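The layout-change case can be sketched the same way. Everything here is a simplified assumption made for illustration: `toy_hash` is a byte sum rather than the real DHT hash, the hashed subvol is picked with a modulo instead of layout hash ranges, and `entries` just records which names each subvol knows about.

```python
def toy_hash(name):
    # Stand-in for the real DHT hash; a byte sum is easy to check by hand.
    return sum(name.encode())


def hashed_subvol(name, subvols):
    # Assumption: modulo placement instead of real layout hash ranges.
    return subvols[toy_hash(name) % len(subvols)]


def lookup(name, subvols, entries):
    """Find the file anywhere and record an entry on its current hashed subvol."""
    if any(name in entries[s] for s in subvols):
        entries[hashed_subvol(name, subvols)].add(name)
        return True
    return False


def rename(src, dst, subvols, entries):
    """Rename that, like the case described above, expects an entry on the hashed subvol."""
    hashed = hashed_subvol(src, subvols)
    if src not in entries[hashed]:
        raise FileNotFoundError(f"{src}: no entry on {hashed}")
    for s in subvols:
        if src in entries[s]:
            entries[s].remove(src)
            entries[s].add(dst)


old_layout = ["subvol-0", "subvol-1"]
new_layout = old_layout + ["subvol-2", "subvol-3"]  # after add-brick
entries = {s: set() for s in new_layout}

name = "rename_0_file_32"
entries[hashed_subvol(name, old_layout)].add(name)  # created under the old layout

# Rename under the new layout without a fresh lookup: the hashed subvol has no entry.
try:
    rename(name, "rename_1_file_32", new_layout, entries)
    stale_result = "renamed"
except FileNotFoundError:
    stale_result = "ENOENT"

# After a lookup under the new layout, the same rename goes through.
lookup(name, new_layout, entries)
rename(name, "rename_1_file_32", new_layout, entries)
```

With this toy hash the file moves from subvol-0 to subvol-2 as its hashed subvol once the bricks are added, so the first rename fails and the second one, issued after a lookup, succeeds.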
Observed the same issue on glusterfs version 3.8.4-35.el7rhgs.x86_64.

Steps:
======
1) On an nfs-ganesha setup, create a distributed-replicate volume and start it.
2) NFS-mount it on multiple clients.
3) Create a few files from the mount point.
4) Add a few bricks and trigger rebalance.
5) From one client, start renaming the files; from another client, start changing file permissions and issuing continuous lookups.

The rename operation failed for a few files with the error 'No such file or directory', although a lookup from the mount point still finds those files.

Mount point:
=============
mv: cannot move ‘rename_0_file_32’ to ‘rename_1_file_32’: No such file or directory
mv: cannot move ‘rename_0_file_38’ to ‘rename_1_file_38’: No such file or directory
mv: cannot move ‘rename_0_file_66’ to ‘rename_1_file_66’: No such file or directory
mv: cannot move ‘rename_0_file_75’ to ‘rename_1_file_75’: No such file or directory
mv: cannot move ‘rename_0_file_79’ to ‘rename_1_file_79’: No such file or directory
mv: cannot move ‘rename_0_file_142’ to ‘rename_1_file_142’: No such file or directory
mv: cannot move ‘rename_0_file_218’ to ‘rename_1_file_218’: No such file or directory
mv: cannot move ‘rename_0_file_222’ to ‘rename_1_file_222’: No such file or directory
mv: cannot move ‘rename_0_file_239’ to ‘rename_1_file_239’: No such file or directory
mv: cannot move ‘rename_0_file_295’ to ‘rename_1_file_295’: No such file or directory
mv: cannot move ‘rename_0_file_300’ to ‘rename_1_file_300’: No such file or directory
mv: cannot move ‘rename_0_file_375’ to ‘rename_1_file_375’: No such file or directory
mv: cannot move ‘rename_0_file_400’ to ‘rename_1_file_400’: No such file or directory
mv: cannot move ‘rename_0_file_426’ to ‘rename_1_file_426’: No such file or directory
mv: cannot move ‘rename_0_file_514’ to ‘rename_1_file_514’: No such file or directory
mv: cannot move ‘rename_0_file_525’ to ‘rename_1_file_525’: No such file or directory
mv: cannot move ‘rename_0_file_556’ to ‘rename_1_file_556’: No such file or directory
mv: cannot move ‘rename_0_file_679’ to ‘rename_1_file_679’: No such file or directory
mv: cannot move ‘rename_0_file_809’ to ‘rename_1_file_809’: No such file or directory
mv: cannot move ‘rename_0_file_817’ to ‘rename_1_file_817’: No such file or directory
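For retesting, the client-side load in step 5 could be driven by a small script along these lines. This is a sketch, not the script used for the report above: the function names, the file count and the chmod mode are assumptions, and only the `rename_0_file_N` / `rename_1_file_N` naming is taken from the output.

```python
import os


def rename_files(directory, count,
                 src_prefix="rename_0_file_", dst_prefix="rename_1_file_"):
    """Rename src_prefix<i> to dst_prefix<i> for i in range(count).

    Returns a list of (source path, errno) for every rename that failed.
    """
    failures = []
    for i in range(count):
        src = os.path.join(directory, f"{src_prefix}{i}")
        dst = os.path.join(directory, f"{dst_prefix}{i}")
        try:
            os.rename(src, dst)
        except OSError as exc:
            failures.append((src, exc.errno))
    return failures


def poke_files(directory, mode=0o644):
    """One pass of chmod + stat over every entry, for the second client.

    Returns how many entries were touched; names that vanish mid-pass
    (because the other client renamed them) are skipped.
    """
    touched = 0
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            os.chmod(path, mode)
            os.stat(path)
            touched += 1
        except FileNotFoundError:
            pass
    return touched
```

One client would call `rename_files` on the mount point while rebalance is running, and another would call `poke_files` in a loop. Any entry in the returned failure list whose errno is `errno.ENOENT`, for a file that a later lookup still finds, corresponds to the failure shown above.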
Hit this issue on 3.4.0(3.12.2-7) while doing the same steps as in the description.