Description of problem: Looks like the replication works properly. The status is "Active" and "Changelog Crawl", but in the gsyncd.log are warnings and errors logged every minute. Version-Release number of selected component (if applicable): 4.1.3 How reproducible: Create GlusterFS volumes (w/ one brick only) on master, interimmaster, and slaves. Create geo replication. Steps to Reproduce: 1. gluster volume geo-replication <vol> geoaccount@<slave_node>::<vol> create 2. gluster volume geo-replication <vol> geoaccount@<slave_node>::<vol> start Actual results: gsyncd.log [2018-09-12 15:24:59.966188] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrcnzj-sld2012.nibr.novartis.net_glustervol1/gsyncd.conf [2018-09-12 15:25:00.232584] I [gsyncd(status):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_nrcnzj-sld2012.nibr.novartis.net_glustervol1/gsyncd.conf [2018-09-12 15:25:28.981199] I [master(worker /bricks/brick1/brick):1460:crawl] _GMaster: slave's time stime=(1536765864, 0) [2018-09-12 15:25:30.617948] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4733 num_files=1 job=3 return_code=23 [2018-09-12 15:25:30.619462] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:32.924488] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4728 num_files=1 job=2 return_code=23 [2018-09-12 15:25:32.926064] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:34.969587] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5028 num_files=1 job=1 return_code=23 [2018-09-12 15:25:34.970601] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:37.21770] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.3958 num_files=1 job=3 return_code=23 [2018-09-12 15:25:37.23441] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:39.357233] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4250 num_files=1 job=2 return_code=23 [2018-09-12 15:25:39.358779] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:41.385746] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4088 num_files=1 job=1 return_code=23 [2018-09-12 15:25:41.387021] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:43.450726] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4084 num_files=1 job=3 return_code=23 [2018-09-12 15:25:43.451534] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:45.783606] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4186 num_files=1 job=2 return_code=23 [2018-09-12 15:25:45.785322] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:47.915402] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.5219 num_files=1 job=1 return_code=23 [2018-09-12 15:25:47.917021] W [master(worker /bricks/brick1/brick):1346:process] _GMaster: incomplete sync, retrying changelogs files=['CHANGELOG.1536765925'] [2018-09-12 15:25:49.887428] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1.4296 num_files=1 job=3 return_code=23 [2018-09-12 15:25:49.888284] E [master(worker /bricks/brick1/brick):1325:process] _GMaster: changelogs could not be processed completely - moving on... files=['CHANGELOG.1536765925'] [2018-09-12 15:25:50.135560] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=0 [2018-09-12 15:25:50.136039] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=1536765950.1360 DATA=0 XATT=0 [2018-09-12 15:25:50.136800] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1536765925 entry_stime=(1536756219, 0) changelog_start=1536765925 stime=(1536765924, 0) duration=21.0007 num_changelogs=1 mode=live_changelog Expected results: No warnings and errors every minute. Additional info: The same CHANGELOG.xxx files in the logs are created in .glusterfs/changelogs on the master and interimmaster nodes. There are lots of changelog files after some time.
This is not a bug: >changelogs could not be processed completely - moving on Those which are not processed are used in retrial and geo-rep never skips any change obviously to maintain data consistency.