Description of problem:
We have an application which leverages POSIX atomic move semantics. Therefore, we allow the same file to be uploaded multiple times, since it can be committed atomically to the file system. However, when multiple clients try to upload the same file concurrently, some get an ESTALE error on the move operation.

Version-Release number of selected component (if applicable):
3.7.5, 3.8.4

How reproducible:
It can be reproduced by creating lots of temporary files concurrently, on multiple machines, and trying to move them to the same final location.

Steps to Reproduce:
1. Log on to multiple machines
2. On each machine, execute:
   while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f; done &
3. Wait until the move command fails

Actual results:
mv: cannot move ‘test5f4c981f-efcb-4ba8-b017-cf4acb76abcc’ to ‘test’: No such file or directory
mv: cannot move ‘test7cf00867-4982-4206-abcf-e5e836460eda’ to ‘test’: No such file or directory
mv: cannot move ‘testcacb6c40-c164-435f-b118-7a14687bf4bd’ to ‘test’: No such file or directory
mv: cannot move ‘test956ff19d-0a16-49bd-a877-df18311570dc’ to ‘test’: No such file or directory
mv: cannot move ‘test6e36eb01-9e54-4b50-8de8-cebb063554ba’ to ‘test’: Structure needs cleaning

Expected results:
No output, because no error should occur.

Additional info:
Du, Nitya,
Based on my debugging, inodelk keeps failing with ESTALE. When I checked dht_rename(), I saw that the inodelk is taken on both the source and destination inodes. Because the test above lets the other 'while ()...' process delete the file we are trying to lock, the inodelk fails with ESTALE. When I changed the test to rename to independent filenames, everything worked as expected.

On mount1:
while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test" -f; done

On mount2:
while true; do uuid="`uuidgen`"; echo "some data" > "test$uuid"; mv "test$uuid" "test2" -f; done

I am not sure how to fix this in DHT, though. For now, re-assigning the bug to DHT.
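For anyone re-running the comparison above, a bounded variant of the two loops might look like the sketch below. It is only an illustration: the rename_loop function, its DEST and COUNT arguments, and the uname-based temporary names (used in place of uuidgen) are assumptions, not part of the original test.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): run the rename loop a bounded
# number of times against a chosen destination and count the failed moves.
# usage: rename_loop DEST COUNT
rename_loop() {
    dest=$1
    count=$2
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # Host name + shell PID + counter keeps the source names unique
        # across machines and processes without depending on uuidgen.
        src="test.$(uname -n).$$.$i"
        echo "some data" > "$src"
        if ! mv -f "$src" "$dest" 2>/dev/null; then
            failures=$((failures + 1))
            rm -f "$src"    # do not leave the unmoved temporary file behind
        fi
        i=$((i + 1))
    done
    # Print the number of failed moves.
    echo "$failures"
}
```

Calling `rename_loop test 1000` from the volume directory on both mounts at the same time corresponds to the failing case; calling `rename_loop test 1000` on one mount and `rename_loop test2 1000` on the other corresponds to the independent-filename case. The function prints how many moves failed.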
(In reply to Pranith Kumar K from comment #1)

Locking in dht_rename has two purposes:
1. Serialize and ensure atomicity (of each rename) when two parallel renames are done on the same file.
2. Serialize a rename with file migration during rebalance.

The current use case falls into category 1. I think using entrylk instead of inodelk solves the problem. However, I need to think more about this.

Assigning the bug to Kotresh, as he is working on synchronization issues.
(In reply to Raghavendra G from comment #2)

Just a word of caution: it is important to do this in a backward-compatible way.
As Pranith explained, this is a bug in the dht_rename code. dht_rename expects the lock on "dst" in "mv src dst" to succeed, which is not POSIX-compliant.

<man 2 rename>
ENOENT The link named by oldpath does not exist; or, a directory component in newpath does not exist; or, oldpath or newpath is an empty string.
</man>

It should ignore ESTALE/ENOENT errors while trying to acquire the lock on the "dst" inode. The issue is that "dst" exists when the lookup happens, but has been deleted by the time the rename fop hits DHT. DHT, relying on the information it got from the lookup, sends a lock request on "dst", which fails with ESTALE.

As mentioned in the bz, exploring the use of entrylk instead of inodelk is one option. I'll get back to you on this. Sorry about the delay.
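The rename(2) semantics quoted above can be checked from the shell on a local file system. The sketch below is illustrative only: the check_rename_semantics function and the src/dst file names are hypothetical, and it exercises mv on a temporary directory, not the DHT locking code itself.

```shell
#!/bin/sh
# Hypothetical helper: show that a missing destination does not make a
# rename fail, while a missing source does (ENOENT on oldpath).
check_rename_semantics() {
    dir=$(mktemp -d) || return 1
    echo "some data" > "$dir/src"
    rm -f "$dir/dst"    # destination is absent, as after a concurrent unlink
    if mv -f "$dir/src" "$dir/dst"; then
        echo "missing dst: rename succeeded"
    else
        echo "missing dst: rename failed"
    fi
    # The source was consumed by the previous rename, so this one must fail.
    if mv -f "$dir/src" "$dir/dst" 2>/dev/null; then
        echo "missing src: rename succeeded"
    else
        echo "missing src: rename failed"
    fi
    rm -rf "$dir"
}
```

On a local POSIX file system the first line reports success and the second reports failure, which is the behaviour the application expects from the volume: a destination that has disappeared is not an error for the rename.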
Kotresh is working on synchronization issues on directories [1]. This issue is on files and won't be fixed by [1]. Hence resetting the assignee to the default owner of the component.

[1] https://review.gluster.org/15472
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.