Description of problem:
We use a Windows client to access the gluster volume. After a copy -> paste -> rename of a file on the client, glusterfind pre for the session outputs the wrong list of modified files.

How reproducible:
Each time

Steps to Reproduce:
1. Create a backup session.
2. Create fileA on the client and check that it is in the brick dir of the gluster node.
3. Copy fileA by right-click and paste it into the same dir.
4. Rename fileA to fileB, and check that both fileA and fileB are in the brick dir.
5. Run glusterfind pre for the session to generate the output file.

Actual results:
Only "fileA - Copy" was in the output file, listed as a new file:
NEW test/fileA+-+Copy

Expected results:
Both fileA and fileB should be in the output, and "fileA - Copy" should not appear:
NEW test/fileA
NEW test/fileB

Additional info:
md-cache-timeout has been set to 0
Tentative upstream patch: https://review.gluster.org/17355
upstream patch : https://review.gluster.org/#/c/17439/
The summary of the bug is that there was a difference of expectation. Essentially this is NOT A BUG. However, there were two enhancements that got generated out of this exercise:

1. Add --end-time option to be used with --since-time for the "query" command.
   This option will help to specify the time up to which the change set is desired. An application which maintains the check-pointing time-stamp externally will find it helpful to control and supply the --end-time to be passed to glusterfind. The --end-time option is optional: the default is to take the current time, deduct the changelog roll-over time, and use the resulting value as the end-time.

2. Add --field-separator option, to be used with the "pre" and "query" commands.
   The glusterfind output file has a change tag followed by at most two file names, all of which are space separated. If the file name itself has embedded spaces and the user chooses to avoid file name encoding, then it becomes difficult to identify the file name string boundary. The --field-separator option can be used to pass a well-known string to be embedded between the string fields in the output file. The output file can then be processed to safely extract file names with embedded spaces by parsing out the field separator string.
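To illustrate the second enhancement, here is a minimal sketch of how a consumer might split one output line on the field separator. The function name parse_change_line and the "::" separator are hypothetical, chosen only for this example. Decoding with unquote_plus assumes that the encoded names use URL-style plus-encoding, which is what the "fileA+-+Copy" output in this report suggests; it is not confirmed here as the exact encoding glusterfind applies.

```python
from urllib.parse import unquote_plus


def parse_change_line(line, separator, encoded=True):
    """Split one glusterfind output line into (tag, [file names]).

    Hypothetical helper: the first field is taken as the change tag and
    the remaining fields (at most two, per the description above) as
    file names. The separator must not occur inside any file name.
    """
    fields = line.rstrip("\n").split(separator)
    tag, names = fields[0], fields[1:]
    if encoded:
        # Assumption: names are plus-encoded ("+" stands for a space).
        names = [unquote_plus(name) for name in names]
    return tag, names


# Encoded names with "=" as the separator.
rename = parse_change_line(
    "RENAME=Sales+Department+Budget.xls=Sales+Department+Budget-2017.xls", "="
)

# Unencoded names with embedded spaces and a hypothetical "::" separator.
new = parse_change_line("NEW::test/fileA - Copy", "::", encoded=False)
```

With space-separated, unencoded output the second line could not be split reliably, because the file name itself contains spaces; a separator string that never occurs in file names removes that ambiguity.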
downstream patches: https://code.engineering.redhat.com/gerrit/#/c/108767/ https://code.engineering.redhat.com/gerrit/#/c/108769/
Executed a basic test case to verify this RFE. Will execute some more cases to verify this RFE further.

Bug verified on build glusterfs-3.8.4-33

[root@dhcp35-199 yum.repos.d]# glusterfind query --field-separator "=" --since-time $((now - 3600)) --end-time $now vol0 /tmp/v1.log
Generated output file /tmp/v1.log
[root@dhcp35-199 yum.repos.d]# cat /tmp/v1.log
RENAME=Sales+Department+Budget.xls=Sales+Department+Budget-2017.xls
NEW=file1
NEW=file2
NEW=file3
NEW=file6
NEW=file9
NEW=file10
NEW=file4
NEW=file5
NEW=file7
NEW=file8

Generated output file /tmp/a1.log
[root@dhcp35-199 yum.repos.d]# cat /tmp/a1.log
NEW=test1
NEW=test2
NEW=test4
NEW=test5
NEW=test3
NEW=test8
NEW=test9
NEW=test6
NEW=test7
NEW=test10
NEW=Sales+Department+Budget-2017.xls
We can only keep the doc text short for the errata because of character limits, so the previous doc text is provided here:

Feature: glusterfind query

Reason: Backup and restore software usually maintains its own checkpoints/timestamps outside of glusterfind. The "query" command can be used to extract changed files by providing a timestamp as desired by the backup application.

Result: The synopsis of the glusterfind "query" command is as follows:

usage: glusterfind query [-h] [--since-time SINCE_TIME] [--end-time END_TIME]
                         [--no-encode] [--full] [--debug] [--disable-partial]
                         [--output-prefix OUTPUT_PREFIX] [-N]
                         [--tag-for-full-find TAG_FOR_FULL_FIND]
                         [--field-separator FIELD_SEPARATOR]
                         volume outfile

The glusterfind "query" command is similar to the "pre" command. It fetches the list of changed files.

The "query" command expects one of the following options:

1. --since-time <UNIX-time-stamp> --end-time <UNIX-time-stamp>
   where:
   --since-time: the starting time from which changes are desired
   --end-time: the time up to which changed files are desired

2. --full
   Lists all the files in the volume. This option does not look at the changelogs and instead runs a command at the bricks to fetch file names.

The "query" command does not create any internal session files to log checkpoint timestamps.

The --field-separator option accepts a string that is used to delimit the fields on a single line in the output file.
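As a rough sketch of how a backup application that keeps its own checkpoint might assemble a "query" invocation from the synopsis above: the helper name build_query_command and the example timestamp are hypothetical, and only options listed in the synopsis are used. The sketch only builds the argument list; running it (for example with subprocess.run) is left to the caller.

```python
def build_query_command(volume, outfile, since_time, end_time=None,
                        field_separator=None):
    """Return the argv list for a 'glusterfind query' run (hypothetical helper).

    since_time and end_time are UNIX timestamps. When end_time is None the
    --end-time option is omitted, so glusterfind applies its own default.
    """
    if end_time is not None and end_time < since_time:
        raise ValueError("end_time must not be earlier than since_time")

    cmd = ["glusterfind", "query", "--since-time", str(int(since_time))]
    if end_time is not None:
        cmd += ["--end-time", str(int(end_time))]
    if field_separator is not None:
        cmd += ["--field-separator", field_separator]
    cmd += [volume, outfile]
    return cmd


# Example: changes from the hour before an externally stored checkpoint.
checkpoint = 1500000000  # illustrative UNIX timestamp, not from this report
cmd = build_query_command("vol0", "/tmp/v1.log",
                          since_time=checkpoint - 3600,
                          end_time=checkpoint,
                          field_separator="=")
```

Because "query" keeps no session files, the application is responsible for storing the end time it used and passing it as the next --since-time.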
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2774