Description of problem:
=======================
Tried deleting all the files from the master using rm -rf *; some of the files in the root of the slave did not get deleted. While rm -rf was running on the master there was a switch between active and passive bricks: for an unknown reason the active bricks went down and the passive bricks became active. The bricks which went down eventually came back online and became passive.

Files on Master:
================
[root@wingo master]# pwd
/mnt/master
[root@wingo master]# ls
[root@wingo master]#

Files on Slave:
===============
[root@wingo slave]# ls
101  37  60  92  environment              localtime     nsswitch.conf
104  41  71  asound.conf              fprintd.conf  mail.rc        sudo-ldap.conf
16   43  80  cgrules.conf             gshadow-      motd           updatedb.conf
22   56  83  csh.cshrc                kdump.conf    my.cnf
34   57  89  DIR_COLORS.lightbgcolor  krb5.conf     networks
[root@wingo slave]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.1-6.el6rhs.x86_64

How reproducible:
=================
Tried once

Steps to Reproduce:
===================
1. Create the master and slave clusters.
2. Create and start the master volume (2x2).
3. Create and start the slave volume (2x2).
4. Create and start the meta volume (1x2).
5. Create and start the geo-rep session.
6. Mount the master and slave volumes (FUSE and NFS).
7. Create a huge data set from the master FUSE and NFS mounts.
8. Let the sync to the slave complete; confirm using arequal.
9. Add 2 bricks to the master volume (2x2 => 3x2).
10. Start rebalance.
11. Once rebalance is complete, check arequal on master and slave; they should match.
12. Perform rm -rf * from the master mount. (A hedged command-line sketch of these steps follows the Expected results section below.)

Actual results:
===============
Some of the files never got deleted from the slave's root.

Expected results:
=================
All files should be deleted from the slave too.
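A minimal command-line sketch of the reproduction flow above, assuming hypothetical host names (master1..master6, slave1..slave4), brick paths, and volume names; these are not the exact commands or names used on this setup, and the meta-volume creation (step 4) is omitted for brevity:

# master volume, 2x2 (steps 1-2)
gluster volume create master replica 2 \
    master1:/rhs/brick1/b1 master2:/rhs/brick1/b1 \
    master3:/rhs/brick1/b1 master4:/rhs/brick1/b1
gluster volume start master

# slave volume, 2x2, on the slave cluster (step 3)
gluster volume create slave replica 2 \
    slave1:/rhs/brick1/b1 slave2:/rhs/brick1/b1 \
    slave3:/rhs/brick1/b1 slave4:/rhs/brick1/b1
gluster volume start slave

# geo-rep session using the meta volume (steps 4-5)
gluster system:: execute gsec_create
gluster volume geo-replication master slave1::slave create push-pem
gluster volume geo-replication master slave1::slave config use_meta_volume true
gluster volume geo-replication master slave1::slave start

# mount the master, create data, and compare checksums (steps 6-8)
mount -t glusterfs master1:/master /mnt/master
# ... create the data set, then run arequal-checksum on /mnt/master and /mnt/slave ...

# expand the master 2x2 -> 3x2 and rebalance (steps 9-11)
gluster volume add-brick master replica 2 \
    master5:/rhs/brick1/b1 master6:/rhs/brick1/b1
gluster volume rebalance master start

# once rebalance and the arequal checks are done, delete everything (step 12)
cd /mnt/master && rm -rf *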
From the logs we cannot say concretely what happened, but the following can be deduced from the changelogs that were processed, also taking into account the bricks going offline and coming back online. For one of the files that was not removed from the slave (asound.conf), the entries in the workers' .processed/.processing directories are:

georep1:
./764586b145d7206a154a778f64bd2f50/.processed/CHANGELOG.1435606117:E 988e518b-0766-473a-990c-4427577cc413 CREATE 384 0 0 00000000-0000-0000-0000-000000000001%2Fasound.conf
./764586b145d7206a154a778f64bd2f50/.processed/CHANGELOG.1435675375:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe UNLINK 00000000-0000-0000-0000-000000000001%2Fasound.conf

georep2:
[root@georep2 ssh%3A%2F%2Froot%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave]# find . | xargs grep 00000000-0000-0000-0000-000000000001%2Fasound 2>/dev/null
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1435675373:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe UNLINK 00000000-0000-0000-0000-000000000001%2Fasound.conf

georep3:
[root@georep3 ssh%3A%2F%2Froot%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave]# find . | xargs grep 00000000-0000-0000-0000-000000000001%2Fasound 2>/dev/null
Binary file ./764586b145d7206a154a778f64bd2f50/xsync/archive_201506.tar matches
./764586b145d7206a154a778f64bd2f50/xsync/XSYNC-CHANGELOG.1435666036:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe MKNOD 33188 0 0 00000000-0000-0000-0000-000000000001%2Fasound.conf

georep4:
[root@georep4 ssh%3A%2F%2Froot%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave]# find . | xargs grep 00000000-0000-0000-0000-000000000001%2Fasound 2>/dev/null
Binary file ./c19b89ac45352ab8c894d210d136dd56/.history/.processed/archive_201506.tar matches
./c19b89ac45352ab8c894d210d136dd56/.history/.processed/CHANGELOG.1435675319:E c2edfe25-3ba4-4d09-9f27-c9cd5e4c0bfe CREATE 384 0 0 00000000-0000-0000-0000-000000000001%2Fasound.conf

=========================================================================
Chronology of those entries for asound.conf:

CHANGELOG.1435606117        CREATE  georep1: b2  normal changelog  (.processed)
XSYNC-CHANGELOG.1435666036  MKNOD   georep3: b2  xsync             (likely added as part of replace brick)
CHANGELOG.1435675319        CREATE  georep4: b1  history           (.processed; the active worker flipped from georep3:/rhs/brick2/b2 to georep4:/rhs/brick1/b1, so the history crawl picked up and re-created the file)
CHANGELOG.1435675373        UNLINK  georep2: b2  normal changelog  (.processing)
CHANGELOG.1435675375        UNLINK  georep1: b2  normal changelog  (.processed)

Because of the ping-pong nature of bricks going down and coming back online, and because the stime query from the mount point returns the maximum across a replica set and the minimum across the distribute set, changelog processing can overlap: a worker may process a changelog that was already processed on another brick that subsequently went down, and if that worker also goes down while re-processing it, the sync can end up inconsistent. In the case above, if CHANGELOG.1435675319 (the CREATE picked up by the history crawl) was processed last, the file would remain on the slave, which is possibly what happened here. (A sketch for inspecting the per-brick stime xattrs follows below.)
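The stime bookkeeping referred to above is kept as an extended attribute on each brick root. A hedged sketch for inspecting it per brick: the exact xattr key embeds the master and slave volume UUIDs (the trusted.glusterfs.<MASTER_UUID>.<SLAVE_UUID>.stime form in the comment is an assumption), and the brick paths are the ones mentioned in this report:

# On each master node, dump any *.stime xattrs from the brick root.
# Assumed key form: trusted.glusterfs.<MASTER_UUID>.<SLAVE_UUID>.stime;
# the -m pattern matches it without spelling the UUIDs out.
getfattr -d -m '.*stime' -e hex /rhs/brick2/b2    # e.g. on georep3
getfattr -d -m '.*stime' -e hex /rhs/brick1/b1    # e.g. on georep4

# Comparing the decoded (sec, nsec) values across the replica pairs shows which
# worker believes it has synced further. With max-of-replica / min-of-distribute
# aggregation at the mount point, a lagging passive brick that becomes active can
# end up re-processing changelogs its partner had already handled.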
This bug was accidentally moved from POST to MODIFIED by an error in automation; please see mmccune with any questions.
*** This bug has been marked as a duplicate of bug 1400198 ***