Description of problem:
The checksums of master and slave mismatched when the ssh connection for one of the master-slave sessions was disconnected at regular intervals.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[root@shaktiman ~]# ./arequal-checksum /mnt/master/

Entry counts
Regular files   : 430248
Directories     : 101
Symbolic links  : 0
Other           : 0
Total           : 430349

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a8e1dc7a0748e73ab0a0eae943960164
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : 524b669f3db9a757

[root@spiderman ~]# ./arequal-checksum /mnt/slave/

Entry counts
Regular files   : 430248
Directories     : 101
Symbolic links  : 0
Other           : 0
Total           : 430349

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 260f13ccd7dde34db57eb67b06e22083
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : d97bf5bba85882c7
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Entry counts, metadata checksums and the directory checksum are identical on both sides; only the regular-file checksum (and therefore the total) differs.

Also, the session that was not being disconnected has not reached its checkpoint even after a day, and there are no files in the .processing directory of the geo-rep working_dir.
# gluster v geo master 10.70.43.159::slave status detail

MASTER NODE               MASTER VOL  MASTER BRICK    SLAVE                STATUS   CHECKPOINT STATUS                                                         CRAWL STATUS     FILES SYNCD  FILES PENDING  BYTES PENDING  DELETES PENDING  FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
shaktiman.blr.redhat.com  master      /bricks/brick1  10.70.42.171::slave  Active   checkpoint as of 2013-12-02 17:18:43 is not reached yet                   Changelog Crawl  141858       0              0              0                0
riverrun.blr.redhat.com   master      /bricks/brick2  10.70.42.225::slave  Passive  N/A                                                                       N/A              6332         0              0              0                0
targarean.blr.redhat.com  master      /bricks/brick3  10.70.43.159::slave  Active   checkpoint as of 2013-12-02 17:18:43 is completed at 2013-12-02 17:18:53  Changelog Crawl  126665       0              0              0                0
snow.blr.redhat.com       master      /bricks/brick4  10.70.42.229::slave  Passive  N/A                                                                       N/A              0            0              0              0                0

The checksum mismatch could be a result of the checkpoint not being reached, or both could be results of a common underlying problem.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.45geo-1

How reproducible:
Didn't try to reproduce

Steps to Reproduce:
1. Create and start a geo-rep relationship between master and slave.
2. Start creating files on the master using the command "./crefi.py -n 4000 --multi -b 10 -d 10 --random --max=100K --min=1K /mnt/master/"
3. In parallel, run the following command on one of the active master nodes: "while : ; do ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 100 ; ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 1000; done"
4. Wait for geo-rep to complete syncing and check the checksums of master and slave.

Actual results:
Checksums mismatched between master and slave.

Expected results:
Checksums should always match between master and slave.

Additional info:
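When re-checking this, it may help to save the two arequal-checksum reports to files and diff them, so the differing section is visible at a glance. The following is only a sketch and not something that was run for this report: the function name compare_arequal and the report file names are hypothetical, and it assumes both reports come from the same arequal-checksum build so that their layout is identical.

```shell
#!/usr/bin/env bash
# Sketch: compare two saved arequal-checksum reports.
# compare_arequal and the file names used below are hypothetical,
# introduced only for illustration.
#
# Prints a unified diff of the lines that differ, followed by
# MATCH or MISMATCH. Returns 0 on match, 1 on mismatch.
compare_arequal() {
    local master_report=$1
    local slave_report=$2

    # diff exits 0 when the files are identical, non-zero otherwise.
    if diff -u "$master_report" "$slave_report"; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Example usage (reports collected beforehand on each side):
#   ./arequal-checksum /mnt/master/ > master_report.txt
#   ./arequal-checksum /mnt/slave/  > slave_report.txt
#   compare_arequal master_report.txt slave_report.txt
```

With the two reports from the description, the only lines expected in the diff are "Regular files" and "Total" under "Checksums", which points at file contents rather than at missing entries or metadata.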
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.