The problem with READDIR (the current default) is that expanding a distribute volume requires these steps:

1. Take a list of all files (using the find command).
2. Expand distribute.
3. Trigger the script that breaks up the hash ranges and reassigns them (as part of dht selfheal).
4. stat the files listed in step 1 to create link files.
5. "Defrag" the files so that data is moved to the subvol where the link file exists.

Between steps 1 and 4 no new files can be created, which will not be acceptable from a customer's point of view.

Can we change the default from READDIR to READDIRP and do the following? We can still avoid the disk seek overhead caused in posix_do_readdir() by having two loops instead of one. That is, break

loop { readdir(); stat(); }

into:

loop_1 { readdir(); }
loop_2 { stat(); }

Let me know if I am missing something.
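The two-loop idea above can be sketched as a standalone POSIX program fragment. This is not the actual posix_do_readdir() code: the names `entry_stat` and `list_dir_two_pass` are hypothetical, and the sketch only shows the shape of the split (collect names first, stat afterwards). Whether this really avoids the seek overhead would still have to be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result type: an entry name plus its stat data. */
struct entry_stat {
    char        name[256];
    struct stat st;
    int         stat_ok;   /* 1 if fstatat() succeeded, 0 otherwise */
};

/*
 * List dirpath in two passes. Returns the number of entries (excluding
 * "." and ".."), or -1 on error. The caller frees *out.
 */
static long list_dir_two_pass(const char *dirpath, struct entry_stat **out)
{
    assert(dirpath != NULL && out != NULL);
    *out = NULL;

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    struct entry_stat *entries = NULL;
    size_t count = 0, cap = 0;
    struct dirent *de;

    /* loop_1: readdir() only, so the directory is read sequentially. */
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (count == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            struct entry_stat *tmp = realloc(entries, ncap * sizeof(*tmp));
            if (tmp == NULL) {
                free(entries);
                closedir(dir);
                return -1;
            }
            entries = tmp;
            cap = ncap;
        }
        strncpy(entries[count].name, de->d_name,
                sizeof(entries[count].name) - 1);
        entries[count].name[sizeof(entries[count].name) - 1] = '\0';
        entries[count].stat_ok = 0;
        count++;
    }

    /* loop_2: stat each collected name, relative to the open directory. */
    int dfd = dirfd(dir);
    for (size_t i = 0; i < count; i++) {
        if (fstatat(dfd, entries[i].name, &entries[i].st,
                    AT_SYMLINK_NOFOLLOW) == 0)
            entries[i].stat_ok = 1;
    }

    closedir(dir);
    *out = entries;
    return (long)count;
}
```

An entry that disappears between the two loops simply ends up with `stat_ok == 0`, which is one behavioural difference from the single-loop form that a real implementation would have to handle.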
In the 3.1.x releases we already solved this with a 'use-readdirp' option in distribute. I think a similar option is the better approach for 3.0.x too, because we can't give up glusterfs performance all the time to solve an issue that comes up only once or twice.

Also, it's advised to run the defrag scripts from a separate mount point.

-Amar
(In reply to comment #1)
> In 3.1.x releases we already solved this by having an option 'use-readdirp'
> option in distribute. I guess similar option is better in 3.0.x too. Because to
> solve a one or two time issues we can't giveup performance of glusterfs all
> along.
>
> Also, its advised to run defrag scripts over separate mount point.

No. If readdirp can give almost the same performance as readdir (by having two loops instead of one, as mentioned) and we also get the advantage of readdirp (as mentioned regarding selfheal), why not make readdirp the default?
(In reply to comment #2)
> (In reply to comment #1)
> > In 3.1.x releases we already solved this by having an option 'use-readdirp'
> > option in distribute. I guess similar option is better in 3.0.x too. Because to
> > solve a one or two time issues we can't giveup performance of glusterfs all
> > along.
> >
> > Also, its advised to run defrag scripts over separate mount point.
>

Also, even if the defrag script is run from a separate mount point, a file created between step 1 and step 4 might not be visible after the defrag process completes. Let's say we create a file "foo" which goes to subvol-2. If foo is hashed to subvol-3 once the defrag process is complete, we will not see foo on the mount point, because dht_readdir_cbk will filter it out.

> No, if readdirp can give almost same performance as readdir (as mentioned by
> having two loops instead of one) and we have the advantage of readdirp also (as
> mentioned regarding selfheal) why not have readdirp by default?
Avati, Can you please take this decision?
(In reply to comment #4)
> Avati,
>
> Can you please take this decision?

We need to run benchmarks and get hard numbers before deciding.

Avati
The latest git master branch has readdirp as the default.