Bug 1024465

Summary: Dist-geo-rep: Crawling + processing for 14 million pre-existing files take very long time
Product: [Community] GlusterFS Reporter: Venky Shankar <vshankar>
Component: geo-replication Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: mainline CC: aavati, avishwan, bengland, bugs, csaba, gluster-bugs, kcleveng, kparthas, nkhare, perfbz, racpatel, rhs-bugs, rwheeler, sdharane, vagarwal, vbhat
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1000948 Environment:
Last Closed: 2015-04-09 11:20:33 UTC Type: Bug
Bug Depends On: 1000948    
Bug Blocks: 957769    

Description Venky Shankar 2013-10-29 17:35:22 UTC
Description of problem:

On a 4x2 volume at the master site, 14 M files were created, a mix of small and large files. The average size of the small files was 32 K, and the large files were 10 GB each. After creation, geo-rep was started.

After the initial crawl, 6281134 entries were created in the XSYNC-CHANGELOG.1377249839 file on one of the master nodes. This file was last modified on 23 August at 16:26. As of today, 26 August at 3:20, geo-replication has not started transferring any files. As per my understanding, and from discussion with Venky, all this time has been spent processing the XSYNC-based changelog, which is nothing but "pick up an entry + stat + keep it in memory". Because of this, 2 of the python processes are consuming ~5.5 GB of memory.
 
Taking the gfid from the strace output and grepping for its line number in the XSYNC-CHANGELOG.1377249839 file, it looks like ~60% of the files have been processed so far. Similarly, the processing throughput works out to ~10 files/sec.
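
To put the figures above in perspective, here is a rough back-of-the-envelope sketch of how long the remaining entries would take at the observed rate. The entry count, the ~60% progress and the ~10 files/sec throughput are taken from this report; the function name is only illustrative, and the inputs are approximate, so the result is an order-of-magnitude estimate, not a measurement.

```python
# Rough estimate only: all three inputs are the approximate figures
# quoted in this report, not measured values.
TOTAL_ENTRIES = 6281134   # entries in XSYNC-CHANGELOG.1377249839
FRACTION_DONE = 0.60      # approximate progress from the strace/grep check
RATE_PER_SEC = 10.0       # approximate processing throughput (files/sec)


def remaining_hours(total, fraction_done, rate_per_sec):
    """Hours needed to process the entries that are still pending."""
    remaining = total * (1.0 - fraction_done)
    return remaining / rate_per_sec / 3600.0


if __name__ == "__main__":
    hours = remaining_hours(TOTAL_ENTRIES, FRACTION_DONE, RATE_PER_SEC)
    print("remaining: about %.0f hours" % hours)
```

With these inputs the remaining ~2.5 M entries would need roughly 70 more hours before any transfer starts.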


Actual results:
- Geo-rep does not start transferring files even after a long wait.
- The memory usage in the crawling + processing step is very high.
- There is no way to see the progress in this phase.

Expected results:
- The geo-rep should start transferring files without such long wait. 
- The memory footprint at the crawling + processing stage should be less. 
- There should be a way to see the progress in this phase.


Additional info:
- From the description of the problem, processing the XSYNC-based changelog file is taking a lot of time. From that, it looks like the "stat" call is taking most of the time.
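
If per-entry "stat" latency really is the bottleneck, one common way to hide it is to issue the calls from several threads. The following is a minimal sketch of that idea only; it is not the gsyncd code and not the patch under review, and the names stat_entry and stat_many and the worker count are hypothetical. It also assumes Python 3 (concurrent.futures), which may not match the Python version gsyncd ran on at the time.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def stat_entry(path):
    """Return (path, stat result), or (path, None) if the stat fails."""
    try:
        return path, os.lstat(path)
    except OSError:
        # The file may have been removed since the crawl listed it.
        return path, None


def stat_many(paths, workers=8):
    """Stat all paths using a pool of threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(stat_entry, paths))
```

Threads help here because each stat on a network mount spends most of its time waiting on I/O, during which other threads can run. Whether this gives a real speedup on a given volume would have to be measured.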

Comment 1 Anand Avati 2013-10-29 17:50:57 UTC
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#1) for review on master by Venky Shankar (vshankar)

Comment 2 Anand Avati 2013-10-29 18:15:42 UTC
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#2) for review on master by Venky Shankar (vshankar)

Comment 3 Ben England 2013-10-30 14:43:16 UTC
 cc'ing perfbz

Comment 4 Anand Avati 2013-11-02 14:40:11 UTC
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#3) for review on master by Venky Shankar (vshankar)

Comment 7 Aravinda VK 2015-04-09 11:20:33 UTC
XSync/Hybrid crawl now generates a changelog each time the crawl completes 8k entries, and processing of it starts immediately. This issue is no longer valid. Closing this bug. Please reopen if the issue is found again.
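
The batching behaviour described in this comment can be illustrated with a small generator. This is a sketch of the general technique, not the actual gsyncd implementation: the name batches is hypothetical, and reading "8k" as 8192 is an assumption.

```python
from itertools import islice

BATCH_SIZE = 8192  # assumption: "8k" in the comment is read as 8192


def batches(entries, size=BATCH_SIZE):
    """Yield lists of at most `size` entries from any iterable.

    Only one batch is held in memory at a time, so a consumer can start
    processing the first batch while the crawl is still producing entries.
    """
    it = iter(entries)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

A consumer would loop over batches(crawl_output) and hand each chunk to the processing step as soon as it is yielded, instead of waiting for all ~6 M entries to be collected first.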