Bug 1037438

Summary: dist-geo-rep : mismatch in checksum of master and slave when there were ssh disconnections during syncing.
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED EOL QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high Docs Contact:
Priority: high    
Version: 2.1 CC: avishwan, chrisw, csaba, david.macdonald
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: consistency
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-11-25 08:48:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Vijaykumar Koppad 2013-12-03 07:34:14 UTC
Description of problem: The checksums of master and slave did not match when there were regular ssh disconnections on one of the master-to-slave connections.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[root@shaktiman ~]# ./arequal-checksum  /mnt/master/
Entry counts
Regular files   : 430248
Directories     : 101
Symbolic links  : 0
Other           : 0
Total           : 430349

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a8e1dc7a0748e73ab0a0eae943960164
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : 524b669f3db9a757


[root@spiderman ~]# ./arequal-checksum /mnt/slave/

Entry counts
Regular files   : 430248
Directories     : 101
Symbolic links  : 0
Other           : 0
Total           : 430349

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 260f13ccd7dde34db57eb67b06e22083
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : d97bf5bba85882c7

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
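
The two outputs above can be compared field by field to see exactly where master and slave diverge. The following is a minimal Python sketch, not part of the original report: the helper names parse_arequal and diff_arequal are hypothetical, and the parser assumes the "section heading, then name : value lines" layout shown above. Only the "Checksums" sections are pasted in as sample data.

```python
def parse_arequal(text):
    """Parse arequal-checksum output into {(section, field): value}."""
    result = {}
    section = ""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            name, _, value = line.partition(":")
            result[(section, name.strip())] = value.strip()
        else:
            # A line without a colon is a section heading, e.g. "Checksums".
            section = line
    return result


def diff_arequal(master_text, slave_text):
    """Return the sorted (section, field) keys whose values differ."""
    master = parse_arequal(master_text)
    slave = parse_arequal(slave_text)
    keys = master.keys() | slave.keys()
    return sorted(k for k in keys if master.get(k) != slave.get(k))


# Values copied from the "Checksums" sections of the report above.
MASTER = """\
Checksums
Regular files   : a8e1dc7a0748e73ab0a0eae943960164
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : 524b669f3db9a757
"""

SLAVE = """\
Checksums
Regular files   : 260f13ccd7dde34db57eb67b06e22083
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : d97bf5bba85882c7
"""

for section, field in diff_arequal(MASTER, SLAVE):
    print(section, "/", field)
```

For the data in this report, only the regular-file checksum and the total differ. Entry counts, metadata checksums and the directory checksum are identical on both sides.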

In addition, the session that was not being disconnected has not reached its checkpoint even after a day, and there are no files in the .processing directory of the geo-rep working_dir.


# gluster v geo master 10.70.43.159::slave status detail

MASTER NODE                 MASTER VOL    MASTER BRICK      SLAVE                  STATUS     CHECKPOINT STATUS                                                           CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
shaktiman.blr.redhat.com    master        /bricks/brick1    10.70.42.171::slave    Active     checkpoint as of 2013-12-02 17:18:43 is not reached yet                     Changelog Crawl    141858         0                0                0                  0
riverrun.blr.redhat.com     master        /bricks/brick2    10.70.42.225::slave    Passive    N/A                                                                         N/A                6332           0                0                0                  0
targarean.blr.redhat.com    master        /bricks/brick3    10.70.43.159::slave    Active     checkpoint as of 2013-12-02 17:18:43 is completed at 2013-12-02 17:18:53    Changelog Crawl    126665         0                0                0                  0
snow.blr.redhat.com         master        /bricks/brick4    10.70.42.229::slave    Passive    N/A                                                                         N/A                0              0                0                0                  0


This mismatch could be a result of the checkpoint not being reached, or both could be results of a common underlying problem.
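
The bricks whose checkpoint is still pending can be picked out of the status detail output mechanically. The following is a minimal Python sketch under stated assumptions: columns are separated by two or more spaces and data rows have 12 columns, as in the output above. pending_checkpoints is a hypothetical helper, not a gluster command or API, and the sample holds the two Active rows from the report.

```python
import re


def pending_checkpoints(status_text):
    """Return (master node, brick) pairs whose checkpoint is not reached."""
    pending = []
    for line in status_text.splitlines():
        cols = re.split(r"\s{2,}", line.strip())
        # Skip the header, the dashed rule, and blank lines.
        if len(cols) != 12 or cols[0] == "MASTER NODE":
            continue
        node, _vol, brick, _slave, _status, checkpoint = cols[:6]
        if "is not reached yet" in checkpoint:
            pending.append((node, brick))
    return pending


# The two Active rows from the status output in this report.
SAMPLE = (
    "shaktiman.blr.redhat.com    master    /bricks/brick1    10.70.42.171::slave    Active    "
    "checkpoint as of 2013-12-02 17:18:43 is not reached yet    Changelog Crawl    "
    "141858    0    0    0    0\n"
    "targarean.blr.redhat.com    master    /bricks/brick3    10.70.43.159::slave    Active    "
    "checkpoint as of 2013-12-02 17:18:43 is completed at 2013-12-02 17:18:53    Changelog Crawl    "
    "126665    0    0    0    0\n"
)

print(pending_checkpoints(SAMPLE))
```

For the rows in this report, the only pending brick is /bricks/brick1 on shaktiman.blr.redhat.com.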

Version-Release number of selected component (if applicable): glusterfs-3.4.0.45geo-1


How reproducible: Didn't try to reproduce


Steps to Reproduce:
1. Create and start a geo-rep relationship between master and slave.
2. Start creating files on the master using the command "./crefi.py -n 4000 --multi -b 10 -d 10 --random --max=100K --min=1K   /mnt/master/"
3. In parallel, run the following command on one of the active master nodes: "while : ; do ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 100 ; ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 1000; done"
4. Wait for geo-rep to complete syncing, then check the checksums of master and slave.
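
Step 3 chooses the processes to kill with ps ax | grep "ssh " | awk '{print $1}'. The following Python sketch reproduces only that selection on captured ps ax text, so the matched PIDs can be inspected without killing anything. ssh_pids is a hypothetical helper and the process list is made-up sample data, not output from the affected systems. Unlike the shell pipeline, the sketch skips the grep process itself, which the original pattern can also match.

```python
def ssh_pids(ps_output):
    """Return PIDs of processes whose command line contains "ssh "."""
    pids = []
    for line in ps_output.splitlines():
        # ps ax columns: PID TTY STAT TIME COMMAND
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header or malformed line
        command = fields[4]
        # Match the same substring as grep "ssh ", but ignore grep itself.
        if "ssh " in command and not command.startswith("grep "):
            pids.append(int(fields[0]))
    return pids


# Made-up sample of `ps ax` output, for illustration only.
SAMPLE = """\
  PID TTY      STAT   TIME COMMAND
 1201 ?        Ss     0:00 /usr/sbin/sshd -D
 4312 ?        S      0:03 ssh root@10.70.42.171 true
 4400 pts/0    S+     0:00 grep ssh
"""

print(ssh_pids(SAMPLE))
```

Because the pattern is "ssh " with a trailing space, the sshd daemon line is not selected. The match still covers every ssh client on the node, not only the geo-rep connections.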

Actual results: checksum mismatched between master and slave


Expected results: checksum should always match between master and slave.


Additional info:

Comment 3 Aravinda VK 2015-11-25 08:48:51 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.

Comment 4 Aravinda VK 2015-11-25 08:50:44 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.