storage/posix: Fix race in file creation when brick was offline during delete.

Summary:
If a file is deleted while a brick is offline, and a file by the same name is subsequently created once the brick has been brought back online but not fully healed, the brick returns the old, pre-deletion GFID for the file. This results in a lack of consensus about the GFID, which is bad, and usually manifests as EROFS.

Fix by making bricks honor the GFID requested by the server when a file is created, even if there is a pre-existing one. This fixes the race and has the additional benefit that any clients with the old GFID will get ESTALE when accessing the old file, instead of silently getting access to a file which differs from the one they thought they were accessing.

- This is a port of D3122637 to 3.8.

Test Plan:
- 12 hours of untarring the Linux source tarball onto a 2-way replicated volume while bringing the second replica server up and down every 30 seconds. Previously this consistently failed with EROFS within half an hour. Logs show that the new code is being invoked. A mass MD5 over the resulting directory shows that all files were correctly untarred and are readable.
- Prove tests

Signed-off-by: Shreyas Siravara <sshreyas>
Change-Id: I069e55a5bdfc8de81b1602a093d36fa82f38f9cd
Reviewed-on: https://review.gluster.org/16339
Reviewed-by: Kevin Vigor <kvigor>
Smoke: Gluster Build System <jenkins.org>
Tested-by: Shreyas Siravara <sshreyas>
CentOS-regression: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
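To make the race described in the commit message easier to follow, here is a minimal toy model of it. This is only a sketch under stated assumptions: `Brick`, `replay`, and the `honor_requested_gfid` flag are hypothetical names invented for illustration, a brick is reduced to a path-to-GFID dictionary, and none of this is the actual storage/posix code.

```python
import errno
import uuid


class Brick:
    """Hypothetical stand-in for a brick: maps a path to its stored GFID."""

    def __init__(self, honor_requested_gfid):
        # False models the behaviour before the patch, True the behaviour after.
        self.honor_requested_gfid = honor_requested_gfid
        self.gfid_by_path = {}

    def create(self, path, requested_gfid):
        existing = self.gfid_by_path.get(path)
        if existing is not None and not self.honor_requested_gfid:
            # Pre-patch: a stale entry wins and its old GFID is returned.
            return existing
        # Post-patch: the requested GFID replaces any stale entry.
        self.gfid_by_path[path] = requested_gfid
        return requested_gfid

    def unlink(self, path):
        self.gfid_by_path.pop(path, None)

    def open_by_gfid(self, gfid):
        # A GFID that no longer names any file is reported as ESTALE.
        if gfid not in self.gfid_by_path.values():
            raise OSError(errno.ESTALE, "stale file handle")
        return gfid


def replay(honor_requested_gfid):
    """Replay the race; return (GFID on A, GFID on B, old GFID, brick B)."""
    brick_a = Brick(honor_requested_gfid)
    brick_b = Brick(honor_requested_gfid)

    old_gfid = uuid.uuid4()
    brick_a.create("f", old_gfid)
    brick_b.create("f", old_gfid)

    # Brick B is offline, so the delete only reaches brick A.
    brick_a.unlink("f")

    # Brick B is back online but not healed; the file is created again.
    new_gfid = uuid.uuid4()
    gfid_a = brick_a.create("f", new_gfid)
    gfid_b = brick_b.create("f", new_gfid)
    return gfid_a, gfid_b, old_gfid, brick_b
```

In this model, `replay(False)` leaves the two bricks disagreeing about the GFID of `f`, which is the lack of consensus the commit message describes. With `replay(True)`, both bricks report the newly requested GFID, and calling `open_by_gfid` on brick B with the old GFID raises `OSError` with `errno.ESTALE`.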
REVIEW: https://review.gluster.org/16816 (storage/posix: Fix race in file creation when brick was offline during delete.) posted (#1) for review on master by Vijay Bellur (vbellur)
REVIEW: https://review.gluster.org/16816 (storage/posix: Fix race in file creation when brick was offline during delete.) posted (#2) for review on master by Vijay Bellur (vbellur)
One patch merged, one abandoned. What's the latest status?
Pranith - can you please provide your thoughts about what we want to do here?
I see that there is only one patch. The solution changes the GFID of a file, which has a lot of side effects. I requested more data to debug the problem further at the time. Since we don't have sufficient data to make progress on the bug, I am going to close it.