Bug 1029030

Summary: dist-geo-rep: Few files not synced to slave with tar+ssh on and there are no "skipped files" messages in the log files
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: geo-replication
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
Status: CLOSED EOL
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Priority: high
Docs Contact:
Version: 2.1
CC: avishwan, chrisw, csaba, mzywusko
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: consistency, status
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description M S Vishwanath Bhat 2013-11-11 14:13:23 UTC
Description of problem:
I was syncing files from master to slave with the tar+ssh option enabled, but somehow a few of the regular files were not synced to the slave. There is no "FILES SKIPPED" count in the status detail output and no "skipped files" list in the geo-rep log files either. Also, for some unknown reason the active nodes have switched to xsync crawl instead of changelog crawl, even though there were no node reboots and no node went down.


Version-Release number of selected component (if applicable):
glusterfs-3.4.0.42rhs-1.el6rhs.x86_64


How reproducible:
Hit once in two tries. Not sure if it can be reproduced consistently.

Steps to Reproduce:
1. Create a geo-rep session between a 2x2 dist-rep master volume and a 2x2 dist-rep slave volume.
2. Mount the master volume from a node, copy /etc a few times, and then start creating small files with the following command:
time ./smallfile_cli.py --top /mnt/master/second-dir --threads 10 --file-size 200 --operation create --files  2000 --hash-into-dirs Y
3. Now enable use-tarssh via the config command before starting geo-rep (see the command sketch after these steps).
4. Start the geo-rep session.
5. Wait for files to get synced to slave.
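
For reference, the config and start in steps 3 and 4 were roughly along these lines (a sketch using the master and slave names from the status output below; the config key is spelled use_tarssh in the builds I have used, so adjust if your build expects a different spelling):

[root@spitfire ~]# gluster volume geo-replication master falcon::slave config use_tarssh true
[root@spitfire ~]# gluster volume geo-replication master falcon::slave start
[root@spitfire ~]# gluster volume geo-replication master falcon::slave status detail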

Actual results:
Some of the files are not synced to the slave.

arequal checksum from master and slave mount points.

[root@lightning ]# /opt/qa/tools/arequal-checksum /mnt/master/

Entry counts
Regular files   : 25747
Directories     : 1382
Symbolic links  : 2940
Other           : 0
Total           : 30069

Metadata checksums
Regular files   : 47df85
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 3fd5260d0515381fcef785a7fd0f29f7
Directories     : 1a242b4f61752e0f
Symbolic links  : a047250611b353f
Other           : 0
Total           : e102fab5f8740ad8



[root@lightning ~]# /opt/qa/tools/arequal-checksum /mnt/slave/

Entry counts
Regular files   : 25638
Directories     : 1382
Symbolic links  : 2940
Other           : 0
Total           : 29960

Metadata checksums
Regular files   : bbb0
Directories     : 3e9
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 2e8370e150161173baca58ca44aad97e
Directories     : 201d1c7e3e4e1e0f
Symbolic links  : a047250611b353f
Other           : 0
Total           : be5046054be9e33d


Status detail indicates that there is nothing left to be synced.

MASTER NODE                MASTER VOL    MASTER BRICK          SLAVE                 STATUS     CRAWL STATUS    FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com    master        /rhs/bricks/brick0    falcon::slave         Active     Hybrid Crawl    15663          0                0                0                  0     
harrier.blr.redhat.com     master        /rhs/bricks/brick2    hornet::slave         Active     Hybrid Crawl    15790          0                0                0                  0     
typhoon.blr.redhat.com     master        /rhs/bricks/brick3    lightning::slave      Passive    N/A             0              0                0                0                  0     
mustang.blr.redhat.com     master        /rhs/bricks/brick1    interceptor::slave    Passive    N/A             0              0                0                0                  0     


You can see that the slave has fewer regular files than the master (25638 vs 25747).

For some unknown reason, the active nodes have started using hybrid (xsync) crawl instead of changelog crawl. There were no node reboots and no nodes went down, so it is not clear what triggered the switch from changelog to xsync.
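
A hedged way to check when the switch happened would be to grep the same worker log for crawl-related messages (the exact log text varies between builds, so this is only a sketch):

[root@spitfire ~]# grep -iE "xsync|crawl" /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.42.224%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log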

And there seems to be nothing in the log files about the skipped files.

[root@spitfire ~]# grep -i SKIPPED /var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Froot%4010.70.42.224%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log
[root@spitfire ~]# 
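
To pin down exactly which regular files are missing on the slave, something like the following comparison of the two mount points should work (standard tools only; paths as mounted above):

[root@lightning ~]# find /mnt/master -type f | sed 's|^/mnt/master||' | sort > /tmp/master-files.txt
[root@lightning ~]# find /mnt/slave -type f | sed 's|^/mnt/slave||' | sort > /tmp/slave-files.txt
[root@lightning ~]# comm -23 /tmp/master-files.txt /tmp/slave-files.txt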



Expected results:
All the files should get synced to slave properly without any issues.


Additional info:


I will try to archive the logs. But this failure seems to be silent; there are not many helpful messages in the logs.

Comment 2 Aravinda VK 2015-11-25 08:51:43 UTC
Closing this bug since the RHGS 2.1 release has reached EOL. Required bugs have been cloned to RHGS 3.1. Please re-open this issue if it is found again.