Bug 1109557
| Field | Value |
|---|---|
| Summary | Dist-geo-rep: after renames on master, the slave has more files than the master when synced through history crawl |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | geo-replication |
| Version | rhgs-3.0 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | Vijaykumar Koppad <vkoppad> |
| Assignee | Kotresh HR <khiremat> |
| QA Contact | Bhaskar Bandari <bbandari> |
| CC | aavati, avishwan, bbandari, csaba, david.macdonald, nlevinki, nsathyan, ssamanta, vagarwal, vshankar |
| Target Milestone | --- |
| Target Release | RHGS 3.0.0 |
| Fixed In Version | glusterfs-3.6.0.27-1 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2014-09-22 19:41:36 UTC |
| Bug Depends On | 1111577 |
Description
Vijaykumar Koppad
2014-06-15 10:50:40 UTC
Created attachment 908894
sosreport of the master and slave nodes.
This issue was easily reproducible before commit 62265f4 was merged. Commit 62265f4 introduces a change in the mknod() path that ignores internal fops. With this change I was unable to reproduce the issue in @vkoppad's test setup with a fair amount of rename/hardlink workload, but for some reason this @ajha is still able to hit it in his setup. Therefore this needs some more debugging for RCA. Will update soon.

s/this @ajha/@ajha/ in comment #3 ;)

I think this bug is not related to capturing internal fops. It is related to a small window where we still do an xsync crawl, and that xsync crawl for some reason tries to sync some already-synced files, which ultimately results in more files on the slave than on the master, because xsync cannot handle renames (see Bug 984591).

We were seeing two issues with @ajha's setup:
1. Sticky-bit files on the mount.
2. Data loss during rename.
And a third could be Vijaykumar's prediction:
3. xsync came in between, and xsync does not handle deletes.

It turns out that after commit 62265f4 (http://review.gluster.org/#/c/8070/), the first issue is fixed.

The data loss during rename was happening because master and slave were deployed as a distributed-replicate (2x2) setup with both replicas on the same node. With this sub-optimal setup, the gsyncds of both replicas become active and both try to sync the changelogs collected on their replica pairs. This is equivalent to a test case where two renames of the same file happen simultaneously, and it results in data loss; there could be a race on simultaneous renames in dht causing the loss. The data loss is not reproducible at all with an optimal setup where the replica bricks fall on different nodes, so there is no data-loss issue in geo-replication itself.

The third issue could have been hit with the steps mentioned above only if the pause exceeded 120 sec. After a pause of more than 120 sec, geo-rep goes faulty on resume and restarts: first with a history crawl, then for a small window with xsync, and then with changelog. So it is possible that during that small xsync window the deletes were missed, hence the larger number of files on the slave mount. The fix for this is being done as a separate patch (http://review.gluster.org/8151) and is tracked by bug 1112238 (https://bugzilla.redhat.com/show_bug.cgi?id=1112238). Once that patch goes in, this will automatically be fixed.

Marking this ON_QA based on the following reasons for the three issues seen:
1. Sticky-bit files on the mount: http://review.gluster.org/#/c/8070/ fixes it.
2. Data loss during rename: this is caused by two gsyncds processing the same changelog entries in parallel from two replicas, an un-optimal setup with both replicas on a single node. I could hit this without geo-replication, and a corresponding dht bug exists: https://bugzilla.redhat.com/show_bug.cgi?id=1117135
3. The chance of xsync coming in between and missing renames/deletes: it is a known issue that xsync does not handle renames/deletes, but the problem of geo-rep falling back to xsync even when history crawl is available is tracked by a different bug, https://bugzilla.redhat.com/show_bug.cgi?id=1112238. I think that is a separate problem and should not block this bug.
Hence moving to ON_QA.

After much discussion, it was decided that this bug should depend on Bug 1112238, as the extra files on the slave volume are specifically seen with the history crawl. Hence moving it back to ASSIGNED and marking it as dependent.
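To make the xsync limitation discussed above concrete, here is a minimal Python sketch of why a filesystem crawl propagates a rename as a bare create, leaving the stale old name behind on the slave, while changelog replay removes it. This is an illustrative model only; the dictionaries, function names, and operation tuples are hypothetical simplifications, not gsyncd's actual code or changelog format.

```python
def crawl_sync(master, slave):
    # Crawl-based sync (like xsync): walks the *current* master tree and
    # copies whatever it finds. It never sees the rename itself, so an
    # old name already present on the slave is never removed.
    for name, data in master.items():
        slave[name] = data

def changelog_sync(changelog, slave):
    # Changelog-based sync: replays recorded operations, so a RENAME
    # entry removes the old name on the slave as well.
    for op in changelog:
        if op[0] == "CREATE":
            _, name, data = op
            slave[name] = data
        elif op[0] == "RENAME":
            _, old, new = op
            slave[new] = slave.pop(old)

# Master state after "f1" was created and then renamed to "f2".
master = {"f2": "contents"}
rename_entry = [("RENAME", "f1", "f2")]

# Two slaves that had already synced "f1" before the rename happened.
slave_crawl = {"f1": "contents"}
slave_changelog = {"f1": "contents"}

crawl_sync(master, slave_crawl)
changelog_sync(rename_entry, slave_changelog)

print(sorted(slave_crawl))      # ['f1', 'f2'] -> extra file, as in this bug
print(sorted(slave_changelog))  # ['f2']       -> matches the master
```

The same reasoning applies to deletes: a crawl of the current tree never learns that a file used to exist, so the slave's copy is never unlinked, which is why the small xsync window after a long pause can leave the slave with more files than the master.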
Verified on the build glusterfs-3.6.0.27-1.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2014-1278.html