Bug 763381 (GLUSTER-1649)

Summary: can we make READDIRP by default?
Product: [Community] GlusterFS
Reporter: Krishna Srinivas <krishna>
Component: distribute
Assignee: Anand Avati <aavati>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Version: 3.0.5
CC: chrisw, gluster-bugs, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix

Description Krishna Srinivas 2010-09-20 22:18:03 UTC
The problem with READDIR (the current default) is that expanding a distribute volume requires these steps:

1. Take a list of all files (using the find command).
2. Expand distribute.
3. Trigger the script that breaks up the hash ranges and reassigns them (as part of dht selfheal).
4. stat the files (listed in step 1) to create link files.
5. "defrag" the files so that data is moved to the subvol where the link file exists.

Between steps 1 and 4 no new files can be created, which will not be acceptable from a customer's point of view.

Can we change the default from READDIR to READDIRP and do the following?

We can still avoid the disk seek overhead incurred in posix_do_readdir() by using two loops instead of one.

loop {
   readdir();
   stat();
}

break this into:

loop_1 {
   readdir();
}
loop_2 {
   stat();
}
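
A minimal sketch of the two-loop idea in plain POSIX C, to make the proposal concrete. This is not the posix_do_readdir() code: `struct entry` and `list_dir_two_pass()` are hypothetical names used only here, and whether splitting the loops actually reduces seeks would still have to be measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical entry type for this sketch; not a GlusterFS structure. */
struct entry {
    char name[256];
    struct stat st;
    int stat_ok; /* 1 if the stat in loop_2 succeeded */
};

/*
 * loop_1 collects every name with readdir(); loop_2 then stats them.
 * Returns the number of entries, or -1 on error. The caller frees *out.
 */
static long list_dir_two_pass(const char *path, struct entry **out)
{
    assert(path != NULL && out != NULL);
    *out = NULL;

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    struct entry *list = NULL;
    size_t count = 0, cap = 0;
    struct dirent *de;

    /* loop_1: readdir() only, no per-entry stat */
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (count == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            struct entry *tmp = realloc(list, ncap * sizeof(*list));
            if (tmp == NULL) {
                free(list);
                closedir(dir);
                return -1;
            }
            list = tmp;
            cap = ncap;
        }
        snprintf(list[count].name, sizeof(list[count].name), "%s", de->d_name);
        list[count].stat_ok = 0;
        count++;
    }

    /* loop_2: stat each collected name relative to the open directory */
    int dfd = dirfd(dir);
    for (size_t i = 0; i < count; i++) {
        if (fstatat(dfd, list[i].name, &list[i].st, AT_SYMLINK_NOFOLLOW) == 0)
            list[i].stat_ok = 1;
    }

    closedir(dir);
    *out = list;
    return (long)count;
}
```

The names are buffered between the two loops, so the cost is extra memory proportional to the number of entries returned per call.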

Let me know if I am missing something.

Comment 1 Amar Tumballi 2010-09-21 00:51:19 UTC
In the 3.1.x releases we already solved this with a 'use-readdirp' option in distribute. I guess a similar option would be better for 3.0.x too, because we can't give up GlusterFS performance all along just to solve a one- or two-time issue.

Also, it's advised to run defrag scripts from a separate mount point.

-Amar

Comment 2 Krishna Srinivas 2010-09-21 01:41:49 UTC
(In reply to comment #1)
> In 3.1.x releases we already solved this by having an option 'use-readdirp'
> option in distribute. I guess similar option is better in 3.0.x too. Because to
> solve a one or two time issues we can't giveup performance of glusterfs all
> along. 
> 
> Also, its advised to run defrag scripts over separate mount point.

No. If readdirp can give almost the same performance as readdir (by using two loops instead of one, as mentioned) and we also get the advantage of readdirp (as mentioned regarding selfheal), why not make readdirp the default?

Comment 3 Krishna Srinivas 2010-09-21 01:48:48 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > In 3.1.x releases we already solved this by having an option 'use-readdirp'
> > option in distribute. I guess similar option is better in 3.0.x too. Because to
> > solve a one or two time issues we can't giveup performance of glusterfs all
> > along. 
> > 
> > Also, its advised to run defrag scripts over separate mount point.
> 

Also, even if the defrag script is run from a separate mount point, a file created between step 1 and step 4 might not be visible after the defrag process completes.

Let's say we create a file "foo" which goes to subvol-2. We then complete the defrag process. If foo now hashes to subvol-3, we will not see foo on the mount point, because dht_readdir_cbk will filter it out.
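
A toy model of that filtering, as a sketch under stated assumptions: `toy_hashed_subvol()` is a made-up byte-sum hash, not the real DHT hash or layout logic, and `listed_by_readdir()` only mirrors the rule described above (an entry is listed when it is read from the subvol its name hashes to).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the DHT hash: NOT the real GlusterFS hash or layout. */
static unsigned toy_hashed_subvol(const char *name, unsigned nsubvols)
{
    assert(name != NULL && nsubvols > 0);
    unsigned sum = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        sum += *p;
    return sum % nsubvols;
}

/*
 * Model of the filter: an entry read from subvolume index `found_on` is
 * listed only if its name hashes to that same index under the current
 * layout of `nsubvols` subvolumes.
 */
static int listed_by_readdir(const char *name, unsigned found_on,
                             unsigned nsubvols)
{
    return toy_hashed_subvol(name, nsubvols) == found_on;
}
```

In this toy model a file named "foo5" lands on index 1 (subvol-2) while there are two subvols, but hashes to index 2 (subvol-3) once there are three, so `listed_by_readdir("foo5", 1, 3)` returns 0 even though the file still exists on subvol-2.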

> No, if readdirp can give almost same performance as readdir (as mentioned by
> having two loops instead of one) and we have the advantage of readdirp also (as
> mentioned regarding selfheal) why not have readdirp by default?

Comment 4 Amar Tumballi 2010-09-23 04:25:37 UTC
Avati,

Can you please take this decision?

Comment 5 Anand Avati 2010-09-23 04:36:12 UTC
(In reply to comment #4)
> Avati,
> 
> Can you please take this decision?

We need to run benchmarks and get hard numbers before deciding.

Avati

Comment 6 Anand Avati 2010-10-01 05:13:34 UTC
The latest git master branch has readdirp as the default.