Bug 1293332 - [geo-rep+tiering]: Hot tier bricks changelogs reports rsync failure
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 3.1
Hardware: x86_64 Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assigned To: Aravinda VK
QA Contact: Rochelle
rebase
: ZStream
Depends On: 1572043 1577627 1581047 1597563
Blocks: 1503134
Reported: 2015-12-21 07:51 EST by Rahul Hinduja
Modified: 2018-09-14 00:41 EDT
CC List: 8 users

See Also:
Fixed In Version: glusterfs-3.12.2-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 02:27:31 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker ID: Red Hat Product Errata RHSA-2018:2607
Priority: None   Status: None   Summary: None
Last Updated: 2018-09-04 02:29 EDT

Description Rahul Hinduja 2015-12-21 07:51:32 EST
Description of problem:
=======================

On a tiered volume setup, the changelogs of all hot tier bricks report incomplete sync and are retried.

The data does eventually sync completely to the slave, but it takes much longer than on a regular (non-tiered) volume. These retries are likely to slow down the overall sync, since the rsync retries happen as a batch.
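The retries can be seen both in the geo-rep session status and in the geo-replication logs on the master nodes. A minimal sketch of how to observe them, assuming the master and slave endpoints from the df output below and the default geo-replication log directory (the exact log file name depends on the session):

# Per-brick sync status of the geo-rep session
gluster volume geo-replication master 10.70.37.99::slave status detail

# Retry warnings logged by the workers on the hot tier bricks
grep "incomplete sync, retrying changelogs" /var/log/glusterfs/geo-replication/master/*.log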

Size comparison:
===============

[root@mia test]# df -h 
Filesystem                 Size  Used Avail Use% Mounted on
10.70.37.99:/slave         597G   26G  572G   5% /mnt/slave
10.70.37.165:/master       746G   26G  721G   4% /mnt/master
[root@mia test]#

Log Snippet:
============

[2015-12-21 11:37:07.551009] W [master(/rhs/brick3/hot-b2):1077:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1450696895 CHANGELOG.1450696929 CHANGELOG.1450696944 CHANGELOG.1450696961 CHANGELOG.1450696976 CHANGELOG.1450696995 CHANGELOG.1450697011 CHANGELOG.1450697028 CHANGELOG.1450697047 CHANGELOG.1450697063 CHANGELOG.1450697079 CHANGELOG.1450697094 CHANGELOG.1450697143 CHANGELOG.1450697160 CHANGELOG.1450697177 CHANGELOG.1450697192
[2015-12-21 11:37:25.350402] W [master(/rhs/brick3/hot-b2):1077:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1450696895 CHANGELOG.1450696929 CHANGELOG.1450696944 CHANGELOG.1450696961 CHANGELOG.1450696976 CHANGELOG.1450696995 CHANGELOG.1450697011 CHANGELOG.1450697028 CHANGELOG.1450697047 CHANGELOG.1450697063 CHANGELOG.1450697079 CHANGELOG.1450697094 CHANGELOG.1450697143 CHANGELOG.1450697160 CHANGELOG.1450697177 CHANGELOG.1450697192
[2015-12-21 11:37:40.441261] I [master(/rhs/brick1/ct-b4):571:crawlwrap] _GMaster: 1 crawls, 30 turns
[2015-12-21 11:37:40.561965] I [master(/rhs/brick1/ct-b4):1131:crawl] _GMaster: slave's time: (1450694460, 0)
[2015-12-21 11:37:42.528250] W [master(/rhs/brick3/hot-b2):1077:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1450696895 CHANGELOG.1450696929 CHANGELOG.1450696944 CHANGELOG.1450696961 CHANGELOG.1450696976 CHANGELOG.1450696995 CHANGELOG.1450697011 CHANGELOG.1450697028 CHANGELOG.1450697047 CHANGELOG.1450697063 CHANGELOG.1450697079 CHANGELOG.1450697094 CHANGELOG.1450697143 CHANGELOG.1450697160 CHANGELOG.1450697177 CHANGELOG.1450697192
[2015-12-21 11:51:24.521485] W [master(/rhs/brick3/hot-b2):1077:process] _GMaster: incomplete sync, retrying changelogs: CHANGELOG.1450696895 CHANGELOG.1450696929 CHANGELOG.1450696944 CHANGELOG.1450696961 CHANGELOG.1450696976 CHANGELOG.1450696995 CHANGELOG.1450697011 CHANGELOG.1450697028 CHANGELOG.1450697047 CHANGELOG.1450697063 CHANGELOG.1450697079 CHANGELOG.1450697094 CHANGELOG.1450697143 CHANGELOG.1450697160 CHANGELOG.1450697177 CHANGELOG.1450697192


Version-Release number of selected component (if applicable):
==============================================================

glusterfs-3.7.5-12.el7rhgs.x86_64

How reproducible:
=================

Always

Steps to Reproduce:
===================

1. Create Master cluster from 6 nodes
2. Create Slave cluster from 6 nodes
3. Create and Start master volume (Tiered: cold-tier 3x2 and hot-tier 2x2)
4. Create and Start slave volume (4x2)
5. Enable quota on master volume 
6. Enable shared storage on master volume
7. Set up a geo-rep session between the master and slave volumes (a command sketch for steps 3-8 follows this list)
8. Mount master volume on client 
9. Create data from the master client using the following:

> crefi --multi -n 50 -b 5 -d 5 --max=1024k --min=5k --random -T 5 -t text -I 5 --fop=create /mnt/master
> for i in {1..10}; do dd if=/dev/zero of=rs.$i bs=10M count=100 ; done
> for i in {1..999}; do dd if=/dev/zero of=file.$i bs=2M count=10 ; done

10. Monitor georep logs 
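
For reference, a minimal sketch of the setup in steps 3-8, assuming hypothetical node names (node1..node6 on the master side, snode1..snode6 on the slave side) and brick paths modelled on the ones in the log snippet; the actual brick layout of the reporter's clusters is not recorded here. The tier attach syntax differs slightly across GlusterFS 3.7 builds (older ones use gluster volume attach-tier):

# 3. Master volume: 3x2 cold tier, then attach a 2x2 hot tier (volume must be started first)
gluster volume create master replica 2 \
    node1:/rhs/brick1/ct-b1 node2:/rhs/brick1/ct-b2 \
    node3:/rhs/brick1/ct-b3 node4:/rhs/brick1/ct-b4 \
    node5:/rhs/brick1/ct-b5 node6:/rhs/brick1/ct-b6
gluster volume start master
gluster volume tier master attach replica 2 \
    node1:/rhs/brick3/hot-b1 node2:/rhs/brick3/hot-b2 \
    node3:/rhs/brick3/hot-b3 node4:/rhs/brick3/hot-b4

# 4. Slave volume: 4x2 (8 bricks across the 6 slave nodes)
gluster volume create slave replica 2 \
    snode1:/rhs/brick1/sl-b1 snode2:/rhs/brick1/sl-b2 \
    snode3:/rhs/brick1/sl-b3 snode4:/rhs/brick1/sl-b4 \
    snode5:/rhs/brick1/sl-b5 snode6:/rhs/brick1/sl-b6 \
    snode1:/rhs/brick2/sl-b7 snode2:/rhs/brick2/sl-b8
gluster volume start slave

# 5. Quota on the master volume
gluster volume quota master enable

# 6. Shared storage on the master cluster
gluster volume set all cluster.enable-shared-storage enable

# 7. Geo-rep session from master to slave (10.70.37.99 is the slave endpoint from the df output)
gluster system:: execute gsec_create
gluster volume geo-replication master 10.70.37.99::slave create push-pem
gluster volume geo-replication master 10.70.37.99::slave start

# 8. Mount the master volume on the client (10.70.37.165 from the df output)
mount -t glusterfs 10.70.37.165:/master /mnt/master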

Actual results:
===============

> Incomplete sync errors
> Changelog retries
> Rsync being very slow

Additional info:
================

A similar load was tried earlier on a regular volume, where the sync completed in about 15-20 minutes, whereas on the tiered volume it took a few hours. This comparison is based on historical data; I will update after retrying on a normal volume.
Comment 6 Kotresh HR 2017-09-21 15:09:49 EDT
The patch is already merged upstream and is in 3.12, hence moving it to POST.

https://review.gluster.org/#/c/16010/
Comment 12 errata-xmlrpc 2018-09-04 02:27:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
