Bug 1037438 - dist-geo-rep : mismatch in checksum of master and slave when there were ssh disconnections during syncing.
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard: consistency
Depends On:
Blocks:
 
Reported: 2013-12-03 07:34 UTC by Vijaykumar Koppad
Modified: 2015-11-25 08:50 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-25 08:48:51 UTC
Target Upstream Version:



Description Vijaykumar Koppad 2013-12-03 07:34:14 UTC
Description of problem: The checksums of master and slave did not match after regular ssh disconnections on one of the master-slave connections during syncing.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[root@shaktiman ~]# ./arequal-checksum  /mnt/master/
Entry counts
Regular files   : 430248
Directories     : 101
Symbolic links  : 0
Other           : 0
Total           : 430349

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a8e1dc7a0748e73ab0a0eae943960164
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : 524b669f3db9a757


[root@spiderman ~]# ./arequal-checksum /mnt/slave/

Entry counts
Regular files   : 430248
Directories     : 101
Symbolic links  : 0
Other           : 0
Total           : 430349

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 260f13ccd7dde34db57eb67b06e22083
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : d97bf5bba85882c7

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
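
To see which fields of the two arequal-checksum outputs above actually differ, the outputs can be compared field by field. The following is a minimal sketch, not part of arequal itself: the helper names `parse_arequal` and `diff_arequal` are hypothetical, and the parser assumes the plain "section header, then `label : value` lines" layout shown in this report.

```python
# Sketch only: compares two arequal-checksum outputs pasted from this report.
# parse_arequal/diff_arequal are hypothetical helpers, not part of arequal.

MASTER = """
Entry counts
Regular files   : 430248
Directories     : 101
Symbolic links  : 0
Other           : 0
Total           : 430349

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : a8e1dc7a0748e73ab0a0eae943960164
Directories     : 4a0a500c79674109
Symbolic links  : 0
Other           : 0
Total           : 524b669f3db9a757
"""

# The slave output is identical except for the two checksum values below.
SLAVE = MASTER.replace(
    "a8e1dc7a0748e73ab0a0eae943960164", "260f13ccd7dde34db57eb67b06e22083"
).replace("524b669f3db9a757", "d97bf5bba85882c7")


def parse_arequal(text):
    """Parse the output into {section: {label: value}}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            if current is None:
                continue  # ignore stray lines before the first section
            label, value = line.split(":", 1)
            sections[current][label.strip()] = value.strip()
        else:
            # A line without a colon is treated as a section header.
            current = line
            sections[current] = {}
    return sections


def diff_arequal(master, slave):
    """Return (section, label, master_value, slave_value) for each mismatch."""
    diffs = []
    for section, fields in master.items():
        for label, value in fields.items():
            other = slave.get(section, {}).get(label)
            if other != value:
                diffs.append((section, label, value, other))
    return diffs


for section, label, m, s in diff_arequal(parse_arequal(MASTER), parse_arequal(SLAVE)):
    print(f"{section} / {label}: master={m} slave={s}")
```

For the outputs in this report, only the "Regular files" and "Total" entries under "Checksums" differ. Entry counts and metadata checksums match, which suggests that the same set of files exists on both sides but the content of at least one regular file differs.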

In addition, the session that was not being disconnected had not reached its checkpoint even after a day, and there were no files in the .processing directory of the geo-rep working_dir.


# gluster v geo master 10.70.43.159::slave status detail

MASTER NODE                 MASTER VOL    MASTER BRICK      SLAVE                  STATUS     CHECKPOINT STATUS                                                           CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
shaktiman.blr.redhat.com    master        /bricks/brick1    10.70.42.171::slave    Active     checkpoint as of 2013-12-02 17:18:43 is not reached yet                     Changelog Crawl    141858         0                0                0                  0
riverrun.blr.redhat.com     master        /bricks/brick2    10.70.42.225::slave    Passive    N/A                                                                         N/A                6332           0                0                0                  0
targarean.blr.redhat.com    master        /bricks/brick3    10.70.43.159::slave    Active     checkpoint as of 2013-12-02 17:18:43 is completed at 2013-12-02 17:18:53    Changelog Crawl    126665         0                0                0                  0
snow.blr.redhat.com         master        /bricks/brick4    10.70.42.229::slave    Passive    N/A                                                                         N/A                0              0                0                0                  0


This mismatch could be a result of the checkpoint not being reached, or both could be results of a common underlying problem.
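
When the status table is wide, the nodes whose checkpoint has not been reached can be picked out with a small script. This is a rough sketch under the assumption that columns in the "status detail" output are separated by at least two spaces and that no cell contains a double space, as in the paste above; the function names are hypothetical and the output format may differ between glusterfs versions.

```python
import re

# Sketch only: rows copied from this report, with column padding reduced.
STATUS = """
MASTER NODE  MASTER VOL  MASTER BRICK  SLAVE  STATUS  CHECKPOINT STATUS  CRAWL STATUS  FILES SYNCD  FILES PENDING  BYTES PENDING  DELETES PENDING  FILES SKIPPED
----------------------------------------------------------------------
shaktiman.blr.redhat.com  master  /bricks/brick1  10.70.42.171::slave  Active  checkpoint as of 2013-12-02 17:18:43 is not reached yet  Changelog Crawl  141858  0  0  0  0
riverrun.blr.redhat.com  master  /bricks/brick2  10.70.42.225::slave  Passive  N/A  N/A  6332  0  0  0  0
targarean.blr.redhat.com  master  /bricks/brick3  10.70.43.159::slave  Active  checkpoint as of 2013-12-02 17:18:43 is completed at 2013-12-02 17:18:53  Changelog Crawl  126665  0  0  0  0
snow.blr.redhat.com  master  /bricks/brick4  10.70.42.229::slave  Passive  N/A  N/A  0  0  0  0  0
"""


def parse_status(text):
    """Split the table into a list of {column name: cell} dictionaries."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed separator
        cells = re.split(r"\s{2,}", line)
        if header is None:
            header = cells
            continue
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def stalled_checkpoints(rows):
    """Return master nodes whose checkpoint status says 'not reached'."""
    return [r["MASTER NODE"] for r in rows if "not reached" in r["CHECKPOINT STATUS"]]


print(stalled_checkpoints(parse_status(STATUS)))
```

For the table in this report, the only node reported this way is shaktiman.blr.redhat.com.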

Version-Release number of selected component (if applicable): glusterfs-3.4.0.45geo-1


How reproducible: Didn't try to reproduce


Steps to Reproduce:
1. Create and start a geo-rep relationship between master and slave.
2. Start creating files on the master using the command "./crefi.py -n 4000 --multi -b 10 -d 10 --random --max=100K --min=1K   /mnt/master/"
3. In parallel, run the following command on one of the active master nodes: "while : ; do ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 100 ; ps ax | grep "ssh " | awk '{print $1}' | xargs kill ; sleep 1000; done"
4. Wait for geo-rep to complete syncing, then check the checksums of master and slave.
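
Because arequal-checksum reports only aggregate values, step 4 does not say which files differ. One way to narrow this down is a per-file comparison of the two mounts. The sketch below is an assumption-laden stand-in, not what arequal does: it hashes each regular file with SHA-256, the function names are hypothetical, and the mount points /mnt/master and /mnt/slave are taken from this report and would both need to be mounted on the same machine.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatched_files(master_root, slave_root):
    """List (relative path, reason) for master files missing or different on the slave."""
    bad = []
    for dirpath, _dirs, files in os.walk(master_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, master_root)
            dst = os.path.join(slave_root, rel)
            if not os.path.isfile(dst):
                bad.append((rel, "missing on slave"))
            elif file_digest(src) != file_digest(dst):
                bad.append((rel, "content differs"))
    return sorted(bad)


if __name__ == "__main__":
    # Mount points as used in this report; adjust as needed.
    for rel, reason in mismatched_files("/mnt/master", "/mnt/slave"):
        print(f"{reason}: {rel}")
```

This only walks the master side, so files that exist only on the slave are not reported. With roughly 430,000 files, as in this run, a full pass reads every file twice and can take a long time.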

Actual results: checksum mismatched between master and slave


Expected results: checksum should always match between master and slave.


Additional info:

Comment 3 Aravinda VK 2015-11-25 08:48:51 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.

Comment 4 Aravinda VK 2015-11-25 08:50:44 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.

