Description of problem:
I was syncing files from master to slave with the tar+ssh option enabled, but somehow a few of the regular files were not synced to the slave. There is no "SKIPPED FILES" list in the status detail, and there is no "skipped files" list in the geo-rep log files either. Also, for some unknown reason, the active nodes have switched to xsync crawl instead of changelog crawl. There were no node reboots and no nodes went down.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.42rhs-1.el6rhs.x86_64

How reproducible:
Hit once in two tries. Not sure if it can be reproduced.

Steps to Reproduce:
1. Create a geo-rep session between a 2*2 dist-rep master volume and a 2*2 dist-rep slave volume.
2. Mount the volume from the node, copy /etc a few times, and start creating small files with the following command:
   time ./smallfile_cli.py --top /mnt/master/second-dir --threads 10 --file-size 200 --operation create --files 2000 --hash-into-dirs Y
3. Enable use-tarssh via the config command before starting geo-rep.
4. Start the geo-rep session.
5. Wait for files to get synced to slave.

Actual results:
Files are not synced to slave. arequal checksums from the master and slave mount points:

[root@lightning ]# /opt/qa/tools/arequal-checksum /mnt/master/

Entry counts
Regular files   : 25747
Directories     : 1382
Symbolic links  : 2940
Other           : 0
Total           : 30069

Metadata checksums
Regular files   : 47df85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 3fd5260d0515381fcef785a7fd0f29f7
Directories     : 1a242b4f61752e0f
Symbolic links  : a047250611b353f
Other           : 0
Total           : e102fab5f8740ad8

[root@lightning ~]# /opt/qa/tools/arequal-checksum /mnt/slave/

Entry counts
Regular files   : 25638
Directories     : 1382
Symbolic links  : 2940
Other           : 0
Total           : 29960

Metadata checksums
Regular files   : bbb0
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 2e8370e150161173baca58ca44aad97e
Directories     : 201d1c7e3e4e1e0f
Symbolic links  : a047250611b353f
Other           : 0
Total           : be5046054be9e33d

Status detail indicates that there is nothing left to be synced.

MASTER NODE                MASTER VOL    MASTER BRICK          SLAVE                 STATUS     CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com    master        /rhs/bricks/brick0    falcon::slave         Active     Hybrid Crawl    15663          0                0                0                  0
harrier.blr.redhat.com     master        /rhs/bricks/brick2    hornet::slave         Active     Hybrid Crawl    15790          0                0                0                  0
typhoon.blr.redhat.com     master        /rhs/bricks/brick3    lightning::slave      Passive    N/A             0              0                0                0                  0
mustang.blr.redhat.com     master        /rhs/bricks/brick1    interceptor::slave    Passive    N/A             0              0                0                0                  0

The slave has fewer regular files than the master (25638 vs. 25747, a difference of 109), even though the status reports 0 files pending and 0 files skipped.

For some unknown reason, the active nodes have started using hybrid crawl instead of changelog crawl. There were no node reboots, so I am not sure what could have triggered the switch from changelog to xsync (hybrid) crawl.

There also seems to be nothing in the log files about the skipped files:

[root@spitfire ~]# grep -i SKIPPED /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.42.224%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log
[root@spitfire ~]#

Expected results:
All the files should get synced to slave properly without any issues.

Additional info:
I will try to archive the logs, but this error seems to be silent. There are not many helpful messages in the logs.

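Since neither the status output nor the logs name the unsynced files, one way to identify them is to walk both mount points and list the regular files that exist on the master but not on the slave. The following is a minimal sketch only and was not part of the original test run. The function names `regular_files` and `missing_on_slave` are hypothetical helpers introduced for illustration, and the sketch compares paths only, not file contents.

```python
import os


def regular_files(root):
    """Return the set of regular-file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks: arequal counts them separately from regular files.
            if os.path.isfile(full) and not os.path.islink(full):
                found.add(os.path.relpath(full, root))
    return found


def missing_on_slave(master_root, slave_root):
    """Return sorted relative paths present on the master but absent on the slave."""
    return sorted(regular_files(master_root) - regular_files(slave_root))
```

Calling `missing_on_slave("/mnt/master", "/mnt/slave")` with the mount points used in this report would return the relative paths of the missing files, which could then be matched against the geo-rep logs on the active nodes.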
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.