Bug 1597563

Summary: [geo-rep+tiering]: Hot and Cold tier brick changelogs report rsync failure
Product: [Community] GlusterFS Reporter: Sunny Kumar <sunkumar>
Component: geo-replication Assignee: Sunny Kumar <sunkumar>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: mainline CC: amukherj, bugs, csaba, khiremat, rallan, rhinduja, rhs-bugs, sankarshan, sheggodu, storage-qa-internal, vdas
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-5.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1581047 Environment:
Last Closed: 2018-10-23 15:12:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1581047    
Bug Blocks: 1293332    

Description Sunny Kumar 2018-07-03 08:40:36 UTC
+++ This bug was initially created as a clone of Bug #1581047 +++

Description of problem:
=======================
The error message "changelogs could not be processed completely" was logged for a brick in the hot tier as well as for a brick in the cold tier.

Master volume:
--------------
Volume Name: master
Type: Tier
Volume ID: c6233039-bcdc-4b3c-b2b9-e4d7a7bccb4e
Status: Started
Snapshot Count: 0
Number of Bricks: 9
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 3
Brick1: 10.70.43.190:/rhs/brick2/b9
Brick2: 10.70.42.58:/rhs/brick2/b8
Brick3: 10.70.42.29:/rhs/brick2/b7
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick4: 10.70.42.29:/rhs/brick1/b1
Brick5: 10.70.42.58:/rhs/brick1/b2
Brick6: 10.70.43.190:/rhs/brick1/b3
Brick7: 10.70.41.160:/rhs/brick1/b4
Brick8: 10.70.42.79:/rhs/brick1/b5
Brick9: 10.70.42.200:/rhs/brick1/b6
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
nfs.disable: on
cluster.enable-shared-storage: enable


Changelog not processed completely:
1. From hot tier

ssh%3A%2F%2Froot%4010.70.42.53%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log:[2018-05-18 05:58:58.644992] E [master(/rhs/brick2/b7):1249:process] _GMaster: changelogs could not be processed completely - moving on...	files=['CHANGELOG.1526621118', 'CHANGELOG.1526621133', 'CHANGELOG.1526621148', 'CHANGELOG.1526621163', 'CHANGELOG.1526621178', 'CHANGELOG.1526621194', 'CHANGELOG.1526621209', 'CHANGELOG.1526621224', 'CHANGELOG.1526621239', 'CHANGELOG.1526621254', 'CHANGELOG.1526621269', 'CHANGELOG.1526621284', 'CHANGELOG.1526621299', 'CHANGELOG.1526621314', 'CHANGELOG.1526621329']

2. From cold tier

ssh%3A%2F%2Froot%4010.70.42.53%3Agluster%3A%2F%2F127.0.0.1%3Aslave.log:[2018-05-18 06:09:00.369133] E [master(/rhs/brick1/b1):1249:process] _GMaster: changelogs could not be processed completely - moving on...	files=['CHANGELOG.1526621118', 'CHANGELOG.1526621133', 'CHANGELOG.1526621148', 'CHANGELOG.1526621163', 'CHANGELOG.1526621178', 'CHANGELOG.1526621194', 'CHANGELOG.1526621209']

--------------------------------------------------------------------------------

Version-Release number of selected component (if applicable):
=============================================================


How reproducible:
=================
1/1


Steps to Reproduce:
===================
1. Create a master and a slave cluster from 6 nodes (each)
2. Create and start the master volume (tiered: cold tier 1 x (4 + 2) and hot tier 1 x 3)
3. Create and start the slave volume (tiered: cold tier 1 x (4 + 2) and hot tier 1 x 3)
4. Enable quota on the master volume
5. Enable shared storage on the master volume
6. Set up a geo-rep session between the master and slave volumes
7. Mount the master volume on a client
8. Create data from the master client
9. Verify the sync: the arequal checksums matched, i.e. the data was synced

Actual results:
===============
Hot and cold tier brick changelogs report rsync failure

Expected results:
================
There should be no rsync failure

Comment 1 Worker Ant 2018-07-03 08:47:26 UTC
REVIEW: https://review.gluster.org/20450 (dht: delete tier related internal xattr in dht_getxattr_cbk) posted (#1) for review on master by Sunny Kumar

Comment 2 Worker Ant 2018-07-16 05:34:28 UTC
COMMIT: https://review.gluster.org/20450 committed in master by "N Balachandran" <nbalacha> with a commit message- dht: delete tier related internal xattr in dht_getxattr_cbk

Problem: Hot and cold tier brick changelogs report rsync failure.

Solution: The geo-rep session fails to sync directories from the master
          volume to the slave volume because of a large number of changelog
          retries. The fix is to ignore the tier-related internal xattrs
          trusted.tier.fix.layout.complete and trusted.tier.tier-dht.commithash
          in dht_getxattr_cbk.

Change-Id: I3530ffe7c4157584b439486f33ecd82ed8d66aee
fixes: bz#1597563
Signed-off-by: Sunny Kumar <sunkumar>
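
For context, the shape of the fix described above can be illustrated with a small, self-contained mock. This is not the GlusterFS dict API and not the actual patch; the helper names and the flat key list below are invented for illustration only. The idea is simply that the DHT getxattr reply is filtered so the two tier-internal keys never reach consumers such as geo-rep.

#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for a getxattr reply: a flat array of xattr names.
 * The real fix operates on GlusterFS dict_t objects inside
 * dht_getxattr_cbk(); this mock only demonstrates the filtering idea. */
static const char *tier_internal_xattrs[] = {
        "trusted.tier.fix.layout.complete",
        "trusted.tier.tier-dht.commithash",
};

static int
is_tier_internal (const char *key)
{
        size_t i;

        for (i = 0; i < sizeof (tier_internal_xattrs) /
                        sizeof (tier_internal_xattrs[0]); i++)
                if (strcmp (key, tier_internal_xattrs[i]) == 0)
                        return 1;
        return 0;
}

/* Drop tier-internal keys from the reply so that a consumer such as
 * geo-rep never tries to replay them on the slave volume.
 * Returns the number of keys kept. */
static size_t
filter_getxattr_reply (const char **keys, size_t n)
{
        size_t i, kept = 0;

        for (i = 0; i < n; i++)
                if (!is_tier_internal (keys[i]))
                        keys[kept++] = keys[i];
        return kept;
}

int
main (void)
{
        const char *reply[] = {
                "trusted.glusterfs.pathinfo",
                "trusted.tier.fix.layout.complete",
                "user.comment",
                "trusted.tier.tier-dht.commithash",
        };
        size_t n, i;

        n = filter_getxattr_reply (reply, sizeof (reply) / sizeof (reply[0]));
        for (i = 0; i < n; i++)
                printf ("kept: %s\n", reply[i]);
        return 0;
}

In the mock, only the user-visible keys survive; in the actual code path the same pruning would presumably happen on the reply dict before it is unwound to the client.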

Comment 3 Worker Ant 2018-07-16 10:02:13 UTC
REVIEW: https://review.gluster.org/20520 (dht: delete tier related internal xattr in dht_getxattr_cbk) posted (#1) for review on master by Sunny Kumar

Comment 4 Worker Ant 2018-07-17 18:04:05 UTC
COMMIT: https://review.gluster.org/20520 committed in master by "Amar Tumballi" <amarts> with a commit message- dht: delete tier related internal xattr in dht_getxattr_cbk

Use dict_del instead of GF_REMOVE_INTERNAL_XATTR.

For the problem description and the fix, see https://review.gluster.org/20450.

This patch carries some modifications requested by reviewers on the already
merged patch https://review.gluster.org/20450.

Change-Id: I50c263e3411354bb9c1e028b64b9ebfd755dfe37
fixes: bz#1597563
Signed-off-by: Sunny Kumar <sunkumar>
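
As a rough sketch of what the dict_del-based approach might look like inside dht_getxattr_cbk(): the declarations below are re-stated from memory of the libglusterfs dict API rather than copied from the patch, and the helper name is hypothetical.

/* Sketch only: dict_t and dict_del() are assumed to match the upstream
 * libglusterfs dict API; the signature is an approximation, and
 * dht_strip_tier_xattrs() is a hypothetical helper, not from the patch. */
typedef struct _dict dict_t;
extern int dict_del (dict_t *this, char *key);

/* Applied to the getxattr reply dict before it is unwound to the client,
 * so the tier-internal keys are never exposed to geo-rep. */
static void
dht_strip_tier_xattrs (dict_t *xattr)
{
        if (!xattr)
                return;

        dict_del (xattr, "trusted.tier.fix.layout.complete");
        dict_del (xattr, "trusted.tier.tier-dht.commithash");
}

Deleting the two exact keys with plain dict_del keeps the change narrowly scoped to the tier xattrs.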

Comment 5 Shyamsundar 2018-10-23 15:12:54 UTC
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/