Description of problem:
========================
During geo-replication and tiering interop testing, a file was found on the slave that had been synced with the permissions "-r--r-S-wT.".

On Master:
==========
[root@dj scripts]# ll /mnt/master/thread5/level08/level18/
total 110
--w--wxrw-. 1 22974 54638 17345 Jan 27 2016 56a7f56c%%237OHICJOQ
--w-rwx-w-. 1 3425 18861 12836 Jan 27 2016 56a7f56c%%7RZMAIM1QZ
-r--r---w-. 1 39857 50648 11631 Jan 27 2016 56a7f56c%%A0LGBBEK3A
----r---wx. 1 41939 34071 19524 Jan 27 2016 56a7f56c%%E9YSMFSZTN
-r-xrwxr--. 1 10593 2812 13769 Jan 27 2016 56a7f56c%%SQXKUF0JJZ
d---r-xrwx. 4 25608 7913 8456 Jan 27 2016 level28
drwxr-xr-x. 2 root root 213 Jan 27 2016 symlink_to_files
[root@dj scripts]#

On Slave:
=========
[root@dj scripts]# ll /mnt/slave/thread5/level08/level18/
total 75
--w--wxrw-. 1 22974 54638 17345 Jan 27 2016 56a7f56c%%237OHICJOQ
--w-rwx-w-. 1 3425 18861 12836 Jan 27 2016 56a7f56c%%7RZMAIM1QZ
-r--r-S-wT. 1 39857 50648 11631 Jan 27 2016 56a7f56c%%A0LGBBEK3A
----r---wx. 1 41939 34071 19524 Jan 27 2016 56a7f56c%%E9YSMFSZTN
-r-xrwxr--. 1 10593 2812 13769 Jan 27 2016 56a7f56c%%SQXKUF0JJZ
d---r-xrwx. 4 25608 7913 528 Jan 27 2016 level28
drwxr-xr-x. 2 root root 225 Jan 27 2016 symlink_to_files
[root@dj scripts]#

While a file is being promoted by tiering, the glusterfs mount shows it with the sticky bit set. rsync, which syncs with permissions preserved, could have picked the file up at that moment and synced it to the slave with the sticky bit set. Appending to the file so that it is picked up for sync again resolves the issue, because the latest permissions are then synced.

A unit test on a developer's local system confirms the following: during promotion the sticky bit is shown on the mount, and during demotion it is not.

During promotion, on the mount (ll on the directory):
-rw-r-Sr-T. 1 root root 701000005 Jan 27 18:47 file1
[root@rafi 0]# ll
total 684571
-rw-r-Sr-T. 1 root root 701000005 Jan 27 18:47 file1
[root@rafi 0]# ll
total 684571
-rw-r-Sr-T. 1 root root 701000005 Jan 27 18:47 file1
[root@rafi 0]# ll
total 684571
-rw-r-Sr-T. 1 root root 701000005 Jan 27 18:47 file1
[root@rafi 0]# ll
total 684571
-rw-r-Sr-T. 1 root root 701000005 Jan 27 18:47 file1
[root@rafi 0]# ls -lrt
total 684571
-rw-r-Sr-T. 1 root root 701000005 Jan 27 18:47 file1

During demotion, on the mount (ll on the directory):
[root@rafi 0]# ll
total 684571
-rw-r--r--. 1 root root 701000000 Jan 27 18:46 file1
[root@rafi 0]# ll
total 684571
-rw-r--r--. 1 root root 701000000 Jan 27 18:46 file1
[root@rafi 0]# ll
total 684571
-rw-r--r--. 1 root root 701000000 Jan 27 18:46 file1
[root@rafi 0]# ll
total 684571

Raising the bug against geo-replication, as the consumer is rsync and the use case that fails is geo-replication.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-17.el7rhgs.x86_64

How reproducible:
=================
It's a race: a file needs to be picked up for syncing from master to slave while tiering is promoting it.

Steps to Reproduce:
===================
Found while testing geo-replication of different fops to the slave while promotions and demotions were in progress on the master.

Actual results:
===============
The file is synced with the S (setgid) and T (sticky) bits set.

Expected results:
=================
Files should sync as regular files without any S or T bit set.
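The slave's "-r--r-S-wT." differs from the master's "-r--r---w-." only in the special bits: a capital S in the group-execute slot is setgid without group execute, and a capital T in the others-execute slot is sticky without others execute. As an illustrative sketch only (not part of any Gluster or geo-replication tooling; `parse_ls_mode`, `has_sgid_and_sticky` and `find_suspect_files` are hypothetical helper names), the Python below decodes `ls -l` mode strings with the standard `stat` module and flags the setgid + sticky combination seen on the slave.

```python
import os
import stat

# Allowed characters for each of the nine permission positions in `ls -l` output.
_ALLOWED = ["r-", "w-", "xsS-", "r-", "w-", "xsS-", "r-", "w-", "xtT-"]
_PERM_BITS = [
    stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR,
    stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP,
    stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH,
]
# Special bit shown in each execute slot: setuid, setgid, sticky.
_SPECIAL = {2: stat.S_ISUID, 5: stat.S_ISGID, 8: stat.S_ISVTX}


def parse_ls_mode(mode_str: str) -> int:
    """Convert an `ls -l` string such as '-r--r-S-wT.' into permission bits."""
    perms = mode_str[1:10]  # skip the file-type char; ignore a trailing '.' or '+'
    if len(perms) != 9:
        raise ValueError(f"unexpected mode string: {mode_str!r}")
    mode = 0
    for i, ch in enumerate(perms):
        if ch not in _ALLOWED[i]:
            raise ValueError(f"bad character {ch!r} at position {i}")
        if ch in "rwxst":  # lower-case s/t also imply the execute bit
            mode |= _PERM_BITS[i]
        if ch in "sStT":  # setuid/setgid/sticky, with or without execute
            mode |= _SPECIAL[i]
    return mode


def has_sgid_and_sticky(mode: int) -> bool:
    """True when both setgid and sticky are set, as on the slave copy."""
    both = stat.S_ISGID | stat.S_ISVTX
    return (mode & both) == both


def find_suspect_files(root: str):
    """Yield regular files under `root` (e.g. a slave mount) carrying both bits.

    Files that legitimately use setgid + sticky would also be reported, so
    results should be compared with the master copy before acting on them.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and has_sgid_and_sticky(st.st_mode):
                yield path


if __name__ == "__main__":
    master = parse_ls_mode("-r--r---w-.")
    slave = parse_ls_mode("-r--r-S-wT.")
    print(oct(master), oct(slave), oct(slave & ~master))
    # prints: 0o442 0o3442 0o3000
    print(stat.filemode(stat.S_IFREG | slave))
    # prints: -r--r-S-wT
```

The extra bits on the slave copy come out as 0o3000, that is S_ISGID (0o2000) plus S_ISVTX (0o1000), and the same decoding gives 0o3644 for the "-rw-r-Sr-T." listing seen during promotion in the unit test.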
Minor comments; please feel free to edit as you wish.

During a file promotion, the rebalance process sets the sticky bit and the suid/sgid bit on the file, and strips these bits out when it hands the stat to the client. It removes the bits from the file when it completes the migration. However, when a file is being migrated and it is listed with a readdirp call, we missed the stripping step, so the two flags were passed to clients.

As a consequence, if rsync runs while the bits are applied, the bits remain applied to the file as it is synced to the destination, impairing accessibility on the destination. This can happen in any geo-replicated configuration, but the likelihood increases with tiering because the rebalance process runs continuously.
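The mechanism described above can be modelled in a few lines. This is a hedged Python sketch, not Gluster's C code: it assumes, based on the slave listing, that the in-migration marker is setgid plus sticky on a regular file, and the names `Entry`, `strip_migration_bits`, `lookup_reply` and `readdirp_reply` are hypothetical. It contrasts the stat path, which hides the marker, with the readdirp path before the fix, which passes it through.

```python
import stat
from dataclasses import dataclass, replace

# Assumption: a file under migration is marked with setgid + sticky (see the slave listing).
MIGRATION_BITS = stat.S_ISGID | stat.S_ISVTX


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for the stat data handed to a client."""
    name: str
    mode: int


def is_migrating(mode: int) -> bool:
    # Only regular files carrying both bits are treated as "migration in progress".
    return stat.S_ISREG(mode) and (mode & MIGRATION_BITS) == MIGRATION_BITS


def strip_migration_bits(entry: Entry) -> Entry:
    # Return a copy with the marker cleared; other entries pass through unchanged.
    if is_migrating(entry.mode):
        return replace(entry, mode=entry.mode & ~MIGRATION_BITS)
    return entry


def lookup_reply(entry: Entry) -> Entry:
    # stat/lookup path: the marker is hidden from the client.
    return strip_migration_bits(entry)


def readdirp_reply(entries, fixed: bool = True):
    # readdirp path: with fixed=False, entries go out unmodified (the reported bug).
    if not fixed:
        return list(entries)
    return [strip_migration_bits(e) for e in entries]


if __name__ == "__main__":
    on_brick = Entry("file1", stat.S_IFREG | 0o3644)  # file mid-promotion
    print(stat.filemode(lookup_reply(on_brick).mode))                      # -rw-r--r--
    print(stat.filemode(readdirp_reply([on_brick], fixed=False)[0].mode))  # -rw-r-Sr-T
    print(stat.filemode(readdirp_reply([on_brick])[0].mode))               # -rw-r--r--
```

Because `readdirp_reply(..., fixed=False)` returns the marked mode, a consumer such as rsync preserving permissions would copy `-rw-r-Sr-T` to the destination, which matches the listing seen during promotion.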
Looks good to me.