Bug 1024465 - Dist-geo-rep: Crawling + processing for 14 million pre-existing files take very long time
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Assigned To: bugs@gluster.org
Depends On: 1000948
Blocks: 957769
Reported: 2013-10-29 13:35 EDT by Venky Shankar
Modified: 2015-04-09 07:20 EDT
CC List: 16 users

Doc Type: Bug Fix
Clone Of: 1000948
Last Closed: 2015-04-09 07:20:33 EDT
Type: Bug


Attachments: None
Description Venky Shankar 2013-10-29 13:35:22 EDT
Description of problem:

On the master site, 14 million files, a mix of small and large files, were created on a 4x2 (distributed-replicate) volume. Small files averaged 32 KB in size and large files were 10 GB each. Geo-replication was started after the files had been created.

After the initial crawl, the XSYNC-CHANGELOG.1377249839 file on one of the master nodes contained 6281134 entries. This file was last modified on 23 August at 16:26, yet as of today, 26 August at 3:20, geo-replication has still not started transferring any files. As I understand it (and from discussion with Venky), all of this time is being spent processing the XSYNC-based changelog, which amounts to "pick up an entry + stat + keep it in memory". As a result, two of the Python processes are consuming ~5.5 GB of memory.
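A minimal sketch of the "pick up an entry + stat + keep it in memory" pattern described above, assuming each entry is a plain filesystem path (the real XSYNC changelog records GFIDs, and this is not the actual gsyncd code; the function name is hypothetical). It illustrates why memory grows linearly with the number of entries and why no transfer can begin until the whole file has been processed:

```python
import os
from typing import Iterable, List, Tuple


def process_all_then_sync(entries: Iterable[str]) -> List[Tuple[str, os.stat_result]]:
    """Hypothetical sketch: stat every entry serially and keep every result.

    Nothing is returned, so nothing can be handed to the sync stage,
    until the last entry has been stat()ed.
    """
    pending: List[Tuple[str, os.stat_result]] = []
    for path in entries:
        st = os.lstat(path)          # one stat() per entry, strictly serial
        pending.append((path, st))   # memory grows with the entry count
    return pending                   # transfer could only start after this returns
```

With ~6.3 million entries on one node, both the serial stat() calls and the ever-growing `pending` list scale with the full changelog size.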
 
Taking the GFID seen in the strace output and grepping for its line number in XSYNC-CHANGELOG.1377249839, it looks like ~60% of the entries have been processed so far. The processing throughput works out to ~10 files/sec.
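A back-of-the-envelope estimate from the numbers above, written as a small hypothetical helper: with 6281134 entries, ~60% processed and ~10 entries/sec, the remaining ~2.5 million entries would need roughly another 70 hours.

```python
from typing import Tuple


def changelog_progress(processed_line: int, total_entries: int,
                       rate_per_sec: float) -> Tuple[float, int, float]:
    """Hypothetical helper: estimate progress through a changelog file.

    processed_line: line number of the entry currently being processed
                    (e.g. found by grepping for the GFID seen in strace)
    total_entries:  number of entries in the changelog file
    rate_per_sec:   observed processing rate
    Returns (fraction_done, remaining_entries, eta_hours).
    """
    fraction_done = processed_line / total_entries
    remaining = total_entries - processed_line
    eta_hours = remaining / rate_per_sec / 3600
    return fraction_done, remaining, eta_hours


# Numbers from the description: ~60% of 6281134 entries at ~10 entries/sec.
fraction, remaining, eta = changelog_progress(3768680, 6281134, 10.0)
# fraction is about 0.60, remaining == 2512454, eta is about 69.8 hours (~2.9 days)
```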


Actual results:
- Geo-rep does not start transferring files even after a long wait.
- Memory usage in the crawling + processing step is very high.
- There is no way to see the progress in this phase.

Expected results:
- Geo-rep should start transferring files without such a long wait.
- The memory footprint of the crawling + processing stage should be smaller.
- There should be a way to see the progress in this phase.


Additional info:
- From the description above, processing of the XSYNC-based changelog file is taking a long time, and it looks like the stat() calls account for most of that time.
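If stat() latency dominates, issuing the calls concurrently is one way to hide it. Review 6165 is titled a "threaded" hybrid crawl, but its design is not described in this bug, so the following is only a generic sketch of concurrent stat() using Python's ThreadPoolExecutor (the helper name and the default of 16 workers are assumptions). CPython releases the GIL while a stat() system call is blocked, so threads can overlap the waits.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable


def stat_concurrently(paths: Iterable[str], workers: int = 16) -> Dict[str, os.stat_result]:
    """Hypothetical helper: lstat() many paths using a pool of threads.

    The worker count of 16 is an arbitrary assumption, not a tuned value.
    """
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they line up with path_list.
        results = list(pool.map(os.lstat, path_list))
    return dict(zip(path_list, results))
```

This sketch still returns every result at once: it targets per-stat latency, not the memory growth.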
Comment 1 Anand Avati 2013-10-29 13:50:57 EDT
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#1) for review on master by Venky Shankar (vshankar@redhat.com)
Comment 2 Anand Avati 2013-10-29 14:15:42 EDT
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#2) for review on master by Venky Shankar (vshankar@redhat.com)
Comment 3 Ben England 2013-10-30 10:43:16 EDT
 cc'ing perfbz
Comment 4 Anand Avati 2013-11-02 10:40:11 EDT
REVIEW: http://review.gluster.org/6165 (gsyncd / geo-rep: "threaded" hybrid crawl) posted (#3) for review on master by Venky Shankar (vshankar@redhat.com)
Comment 7 Aravinda VK 2015-04-09 07:20:33 EDT
XSync/Hybrid crawl now generates a changelog each time the crawl completes 8k entries, and processing of that changelog starts immediately. This issue is therefore no longer valid; closing this bug. Please reopen if the issue is seen again.
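A minimal sketch of the batching behaviour described in this comment, assuming "8k" means 8192 entries (it could equally mean 8000) and using hypothetical function names rather than gsyncd's own: entries are consumed in fixed-size batches, each batch is handed off for processing as soon as it is full, and a running count provides the progress visibility requested in the description.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional

# Assumption: "8k entries" read as 8192; the real threshold may differ.
BATCH_SIZE = 8192


def batches(entries: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` entries."""
    it = iter(entries)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def crawl_and_process(
    entries: Iterable[str],
    process_batch: Callable[[List[str]], None],
    size: int = BATCH_SIZE,
    on_progress: Optional[Callable[[int], None]] = None,
) -> int:
    """Hypothetical sketch: hand each batch off as soon as it is collected.

    Only the current batch is held here, and processing of the first
    batch starts long before the crawl has finished.
    """
    done = 0
    for batch in batches(entries, size):
        process_batch(batch)          # e.g. stat + sync this batch right away
        done += len(batch)
        if on_progress is not None:
            on_progress(done)         # running count of processed entries
    return done
```

For example, crawling 20000 entries produces batches of 8192, 8192 and 3616, with progress reported after each one.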
