+++ This bug was initially created as a clone of Bug #1210965 +++
+++ This bug was initially created as a clone of Bug #1210719 +++

Description of problem:
=======================
Geo-replication is very slow and is not able to sync all the files to the slave. The geo-replication state moves to faulty very frequently.

Reproduction:
1. Create a 2-node replica master volume and a slave volume.
2. Set up geo-replication from the master volume to the slave volume.
3. Put numerous files into the master volume. (I am currently testing with 52 million files of 5 KB each. The customer's case has the same file size, with 45 million files.)
4. Check the geo-replication status and the 'df -h' and 'df -i' command output.
5. Initially, the geo-replication crawl status is Changelog Crawl.
6. After a while (in my testing it took 1-2 days in our lab environment), the crawl status changed to History Crawl and the file transfer to the slave volume appears to stop.

~~~
[Sample result]

<Master side>
[root@master1 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/vg_master1-lv_root   17G  4.8G   11G  31% /
tmpfs                           4.0G     0  4.0G   0% /dev/shm
/dev/vda1                       477M   28M  424M   7% /boot
/dev/vdb                        3.0T  130G  2.9T   5% /data

[root@master1 ~]# df -i
Filesystem                        Inodes    IUsed     IFree IUse% Mounted on
/dev/mapper/vg_master1-lv_root   1093440    64936   1028504    6% /
tmpfs                            1023965        2   1023963    1% /dev/shm
/dev/vda1                         128016       38    127978    1% /boot
/dev/vdb                       322122496 15507582 306614914    5% /data

<Slave side>
[root@slave1 geo-replication-slaves]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/vg_slave1-lv_root  139G   20G  112G  16% /
tmpfs                          4.0G     0  4.0G   0% /dev/shm
/dev/vda1                      477M   28M  424M   7% /boot
/dev/vdb                       3.0T   35G  3.0T   2% /data
localhost:*******              3.0T   35G  3.0T   2% /mnt

[root@slave1 geo-replication-slaves]# df -i
Filesystem                        Inodes   IUsed     IFree IUse% Mounted on
/dev/mapper/vg_slave1-lv_root    9224192   46456   9177736    1% /
tmpfs                            1023966       2   1023964    1% /dev/shm
/dev/vda1                         128016      38    127978    1% /boot
/dev/vdb                       322122496 5145872 316976624    2% /data
localhost:******               322122496 5145872 316976624    2% /mnt  <--- huge difference between master and slave
~~~

~~~
<Master side>
[root@master1 ~]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/vg_master1-lv_root   17G  3.6G   12G  23% /
tmpfs                           4.0G     0  4.0G   0% /dev/shm
/dev/vda1                       477M   28M  424M   7% /boot
/dev/vdb                        3.0T   78G  3.0T   3% /data
~~~

~~~
<Slave side>
[root@slave1 geo-replication-slaves]# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/vg_slave1-lv_root  139G  7.5G  124G   6% /
tmpfs                          4.0G     0  4.0G   0% /dev/shm
/dev/vda1                      477M   28M  424M   7% /boot
/dev/vdb                       3.0T   29G  3.0T   1% /data
~~~

Status shows History Crawl.

--- Additional comment from Bipin Kunal on 2015-04-10 08:31:35 EDT ---

RCA done by Aravinda:

Hi Bipin,

Thanks for the setup. I root-caused the issue.

Changelog processing is done in batches: if, for example, 10 changelogs are available for processing, all of them are processed at once. Entry, Meta and Data operations are collected separately; all the entry operations (CREATE, MKDIR, MKNOD, LINK, UNLINK) are executed first, then rsync is triggered for the whole batch. Stime is updated only once the complete batch is done.

In this setup, a large number of changelogs is available for processing during History Changelog processing (since the batch size is automatic). The entry operations complete, but while attempting rsync the worker goes faulty for some reason and is restarted. Since stime was not updated, it has to process all the changelogs again. While processing the same changelogs again, every CREATE gets EEXIST because all the files were already created in the previous run.

To understand this better, consider three changelogs in a batch - CHANGELOG.1428375039, CHANGELOG.1428375054 and CHANGELOG.1428375069 - each with 850 creates and 850 data operations recorded. Geo-rep picks all three files for processing and collects all the entry and data operations (2550 creates and 2550 data operations). If the worker fails after processing the 2550 entries, geo-rep has to process all 2550 entries again (but gets EEXIST).

Solution: Geo-rep should limit the batch size, either by the number of changelogs or by the number of records to be processed. Once this smaller, optimal batch is processed, stime is updated. Even if the worker crashes, geo-rep then has to repeat only a small delta instead of a large set.

Workaround: Not available. :( Geo-rep has not failed; it is just taking more time to come up to speed. If the workers do not crash, it will complete the sync. If workers crash in between, it retries the same work again, which is time-consuming. :(

I will work on this as a priority and will post the patch soon.

--
regards
Aravinda
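[Editorial note] The failure mode in the RCA above is easiest to see as code. Below is a minimal, self-contained Python sketch of the pre-fix flow - it is not the actual gsyncd implementation, and the changelog model and helper names (apply_entry_ops, trigger_rsync) are hypothetical stand-ins - showing why a worker crash between the entry operations and the stime update forces the whole batch to be replayed.

~~~
# Sketch of the pre-fix flow described in the RCA; NOT the actual gsyncd code.
# The changelog model and helper functions are hypothetical stand-ins.

stime = None  # timestamp of the last fully synced changelog


def apply_entry_ops(ops):
    # Execute CREATE/MKDIR/MKNOD/LINK/UNLINK on the slave. On a retry of the
    # same batch, every CREATE here would come back with EEXIST.
    for _op in ops:
        pass


def trigger_rsync(ops):
    # One rsync pass for the data operations of the whole batch. If the
    # worker goes faulty here, stime below is never updated.
    pass


def process_all_at_once(changelogs):
    """Pre-fix behaviour: every pending changelog goes into ONE batch."""
    global stime
    entry_ops = [op for _, ops in changelogs for op in ops if op != "DATA"]
    data_ops = [op for _, ops in changelogs for op in ops if op == "DATA"]

    apply_entry_ops(entry_ops)   # e.g. 2550 CREATEs for three changelogs
    trigger_rsync(data_ops)      # a worker crash here means...
    stime = changelogs[-1][0]    # ...stime never advances, so on restart the
                                 # worker replays all 2550 entries again


# Three changelogs of 850 creates and 850 data operations each, matching the
# example in the RCA.
logs = [(ts, ["CREATE"] * 850 + ["DATA"] * 850)
        for ts in (1428375039, 1428375054, 1428375069)]
process_all_at_once(logs)
~~~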
--- Additional comment from Anand Avati on 2015-04-11 12:02:24 EDT ---

REVIEW: http://review.gluster.org/10202 (geo-rep: Limit number of changelogs to process in batch) posted (#1) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-04-27 07:23:44 EDT ---

REVIEW: http://review.gluster.org/10202 (geo-rep: Limit number of changelogs to process in batch) posted (#2) for review on master by Aravinda VK (avishwan)

--- Additional comment from Anand Avati on 2015-04-28 13:39:45 EDT ---

COMMIT: http://review.gluster.org/10202 committed in master by Vijay Bellur (vbellur)
------
commit 428933dce2c87ea62b4f58af7d260064fade6a8b
Author: Aravinda VK <avishwan>
Date:   Sat Apr 11 20:03:47 2015 +0530

    geo-rep: Limit number of changelogs to process in batch

    Changelog processing is done in batches: if, for example, 10 changelogs
    are available for processing, all of them are processed at once. Entry,
    Meta and Data operations are collected separately; all the entry
    operations (CREATE, MKDIR, MKNOD, LINK, UNLINK) are executed first,
    then rsync is triggered for the whole batch. Stime is updated only
    once the complete batch is done.

    With a large number of changelogs in a batch, if geo-rep fails after
    the entry operations but before rsync, then on restart it starts from
    the beginning, since stime was not updated, and has to process all the
    changelogs again. While processing the same changelogs again, every
    CREATE gets EEXIST because all the files were already created in the
    previous run. This is a big performance hit.

    With this patch, geo-rep limits the number of changelogs per batch
    based on changelog file size, so that when geo-rep fails it only has
    to retry the last batch of changelogs, since stime is updated after
    each batch.

    BUG: 1210965
    Change-Id: I844448c4cdcce38a3a2e2cca7c9a50db8f5a9062
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/10202
    Reviewed-by: Kotresh HR <khiremat>
    Tested-by: NetBSD Build System
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
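[Editorial note] The commit above caps the batch by changelog file size so that stime can advance incrementally. The following Python sketch illustrates that idea under stated assumptions - MAX_BATCH_BYTES and the helper names are hypothetical, not the patch's actual constants or functions.

~~~
# Sketch of the size-limited batching idea from the patch above; NOT the
# actual gsyncd implementation. The budget and helpers are hypothetical.

MAX_BATCH_BYTES = 727040  # hypothetical per-batch size budget


def make_batches(changelogs):
    """Group (path, size) pairs so each batch stays within the budget."""
    batch, used = [], 0
    for path, size in changelogs:
        if batch and used + size > MAX_BATCH_BYTES:
            yield batch
            batch, used = [], 0
        batch.append(path)
        used += size
    if batch:
        yield batch


def process_batch(batch):
    pass  # entry operations + rsync for this batch only


def update_stime(changelog_path):
    pass  # persist the slave-time marker for the last changelog in the batch


def sync(changelogs):
    for batch in make_batches(changelogs):
        process_batch(batch)
        update_stime(batch[-1])  # stime advances after EVERY batch, so a
                                 # crash replays only the last small batch


# Example: 10 pending changelogs of ~500 KB each are split into several
# small batches instead of one large one.
sync([("CHANGELOG.%d" % (1428375039 + i * 15), 500 * 1024)
      for i in range(10)])
~~~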
REVIEW: http://review.gluster.org/10499 (geo-rep: Limit number of changelogs to process in batch) posted (#1) for review on release-3.7 by Aravinda VK (avishwan)
REVIEW: http://review.gluster.org/10499 (geo-rep: Limit number of changelogs to process in batch) posted (#2) for review on release-3.7 by Aravinda VK (avishwan)
COMMIT: http://review.gluster.org/10499 committed in release-3.7 by Venky Shankar (vshankar)
------
commit 613414e837cb5a09c3adbf2258ad691151f1c7e1
Author: Aravinda VK <avishwan>
Date:   Sat Apr 11 20:03:47 2015 +0530

    geo-rep: Limit number of changelogs to process in batch

    Changelog processing is done in batches: if, for example, 10 changelogs
    are available for processing, all of them are processed at once. Entry,
    Meta and Data operations are collected separately; all the entry
    operations (CREATE, MKDIR, MKNOD, LINK, UNLINK) are executed first,
    then rsync is triggered for the whole batch. Stime is updated only
    once the complete batch is done.

    With a large number of changelogs in a batch, if geo-rep fails after
    the entry operations but before rsync, then on restart it starts from
    the beginning, since stime was not updated, and has to process all the
    changelogs again. While processing the same changelogs again, every
    CREATE gets EEXIST because all the files were already created in the
    previous run. This is a big performance hit.

    With this patch, geo-rep limits the number of changelogs per batch
    based on changelog file size, so that when geo-rep fails it only has
    to retry the last batch of changelogs, since stime is updated after
    each batch.

    BUG: 1217930
    Change-Id: I844448c4cdcce38a3a2e2cca7c9a50db8f5a9062
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/10202
    Reviewed-by: Kotresh HR <khiremat>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/10499
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
    Reviewed-by: Venky Shankar <vshankar>
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user