Bug 1024465 - Dist-geo-rep: Crawling + processing for 14 million pre-existing files take very long time
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1000948
Blocks: 957769
Reported: 2013-10-29 17:35 UTC by Venky Shankar
Modified: 2015-04-09 11:20 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1000948
Environment:
Last Closed: 2015-04-09 11:20:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Venky Shankar 2013-10-29 17:35:22 UTC
Description of problem:

At the master site, 14 million files, a mix of small and large files, were created on a 4x2 volume. The small files averaged 32 KB and the large files were 10 GB in size. Geo-rep was started after the files had been created.

After the initial crawl, 6281134 entries had been written to the XSYNC-CHANGELOG.1377249839 file on one of the master nodes. This file was last modified on 23 August at 16:26. As of today, 26 August at 3:20, geo-replication has not started transferring any files. As per my understanding, and from discussion with Venky, the XSYNC-based changelog has been under processing during all this time, which is nothing but "pick up an entry + stat + keep it in memory". Because of this, two of the Python processes are consuming ~5.5 GB of memory.
 
Taking a gfid from the strace output and grepping for its line number in the XSYNC-CHANGELOG.1377249839 file suggests that ~60% of the files have been processed so far. Similarly, the processing throughput works out to ~10 files/sec.
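For a rough sense of scale, the figures quoted above can be turned into a time estimate. This is only back-of-the-envelope arithmetic, assuming the ~10 files/sec rate stays constant; the function names are made up for illustration and are not part of gsyncd.

```python
# Figures quoted in the report above.
TOTAL_ENTRIES = 6_281_134   # entries in XSYNC-CHANGELOG.1377249839
FRACTION_DONE = 0.60        # "~60%" of the entries processed so far
RATE_PER_SEC = 10.0         # "~10 files/sec" (assumed constant here)


def remaining_hours(total, fraction_done, rate_per_sec):
    """Hours needed to process the rest of the entries at a constant rate."""
    remaining_entries = total * (1.0 - fraction_done)
    return remaining_entries / rate_per_sec / 3600.0


def total_hours(total, rate_per_sec):
    """Hours needed to process every entry at a constant rate."""
    return total / rate_per_sec / 3600.0


if __name__ == "__main__":
    left = remaining_hours(TOTAL_ENTRIES, FRACTION_DONE, RATE_PER_SEC)
    whole = total_hours(TOTAL_ENTRIES, RATE_PER_SEC)
    print(f"remaining: {left:.1f} h, whole changelog: {whole:.1f} h")
```

At that rate the remaining ~40% of the entries would need roughly 70 more hours, and the whole changelog about a week, before any file transfer could begin.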


Actual results:
- Geo-rep does not start transferring files even after a long wait.
- Memory usage in the crawling + processing step is very high.
- There is no way to see the progress of this phase.

Expected results:
- Geo-rep should start transferring files without such a long wait.
- The memory footprint of the crawling + processing stage should be lower.
- There should be a way to see the progress of this phase.


Additional info:
- From the description of the problem, processing the XSYNC-based changelog file is taking a lot of time. From that, it looks like the "stat" call is taking most of the time.

Comment 1 Anand Avati 2013-10-29 17:50:57 UTC
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#1) for review on master by Venky Shankar (vshankar)

Comment 2 Anand Avati 2013-10-29 18:15:42 UTC
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#2) for review on master by Venky Shankar (vshankar)

Comment 3 Ben England 2013-10-30 14:43:16 UTC
 cc'ing perfbz

Comment 4 Anand Avati 2013-11-02 14:40:11 UTC
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#3) for review on master by Venky Shankar (vshankar)

Comment 7 Aravinda VK 2015-04-09 11:20:33 UTC
XSync/Hybrid crawl now generates a changelog each time the crawl completes 8k entries, and processing of that changelog starts immediately. This issue is no longer valid, so I am closing this bug. Please reopen if the issue is seen again.
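The batching idea described in this comment can be sketched in a few lines. This is a minimal illustration and not the actual gsyncd code: `batched`, `crawl_and_process` and the `process` callback are hypothetical names, and reading "8k" as 8192 is an assumption.

```python
from itertools import islice

# Assumption: "8k entries" is read as 8192; the exact value is not stated.
BATCH_SIZE = 8192


def batched(entries, size=BATCH_SIZE):
    """Lazily yield lists of at most `size` entries from any iterable."""
    it = iter(entries)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def crawl_and_process(entries, process, size=BATCH_SIZE):
    """Hand each batch to `process` as soon as it is full.

    Only one batch is held in memory at a time, so processing can begin
    long before the crawl has seen every entry. Returns the batch count.
    """
    count = 0
    for batch in batched(entries, size):
        process(batch)
        count += 1
    return count


if __name__ == "__main__":
    sizes = []
    n = crawl_and_process(range(20000), lambda b: sizes.append(len(b)))
    print(n, sizes)
```

Compared with the behaviour in the original report, where every entry was collected before any processing started, this keeps memory bounded by the batch size and lets transfers begin after the first batch.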

