Bug 1385045 - [DHT] A file and its hardlink are lost during rebalance+lookup
Summary: [DHT] A file and its hardlink are lost during rebalance+lookup
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-14 15:47 UTC by Prasad Desala
Modified: 2016-12-13 07:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-24 06:55:04 UTC
Target Upstream Version:



Description Prasad Desala 2016-10-14 15:47:04 UTC
Description of problem:
=======================
The file f1776 and its hardlink fl1776 are lost after performing the steps below.

Version-Release number of selected component (if applicable):

3.8.4-2.26.git0a405a4.el7rhgs.x86_64

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) Enable the md-cache settings on the volume (see the gluster volume info output for the enabled md-cache settings).
3) FUSE mount the volume on multiple clients.
4) Perform the following tasks simultaneously from multiple clients:
     a) From client-1, create the files: for i in {1..20000};do touch f$i;done
     b) From client-2, create hardlinks for the created files: for i in {1..20000};do ln f$i fl$i;done
     c) From client-3, change the permissions of the created files: for i in {1..20000};do chmod 660 f$i;done
     d) From client-4, do a continuous lookup.
5) While the tasks in step 4 are in progress, add a few bricks to the volume and start rebalance.
Wait until steps 4 and 5 complete.
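
The per-client loops in step 4 can be sketched as small shell functions so the file count and target directory are easy to vary. This is only an illustration: the function names and the dir/count parameters are hypothetical and not part of the original test, and the brick addition and rebalance in step 5 are not covered.

```shell
# Sketch of the step 4 workload. Each function body runs in a subshell,
# so the loop variables do not leak into the caller.

# Step 4a: create files f1..fN in the given directory.
create_files() (
    dir=$1 count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        touch "$dir/f$i"
        i=$((i + 1))
    done
)

# Step 4b: create hardlink fl<i> for every f<i>.
create_links() (
    dir=$1 count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        ln "$dir/f$i" "$dir/fl$i"
        i=$((i + 1))
    done
)

# Step 4c: change the permissions of every f<i> to 660.
change_perms() (
    dir=$1 count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        chmod 660 "$dir/f$i"
        i=$((i + 1))
    done
)
```

On a real setup each function would be run from a different client against the same FUSE mount point, for example `create_files /mnt/vol 20000` (the mount path is an assumption). When the loops run concurrently, `ln` and `chmod` can reach an index before `touch` has created it and will then report an error for that index.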

From the mount point, check that all the created files are present.

The following file and its hardlink are missing:

ls: cannot access f1776: No such file or directory
ls: cannot access fl1776: No such file or directory
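
A check like the one above can be automated rather than done by eye. The following is a minimal sketch, assuming the f<i>/fl<i> naming from the steps to reproduce; the function name list_missing is hypothetical.

```shell
# Print every expected name (f1..fN and fl1..flN) that is absent from dir.
# The body runs in a subshell, so its variables stay local.
list_missing() (
    dir=$1 count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        for name in "f$i" "fl$i"; do
            [ -e "$dir/$name" ] || printf '%s\n' "$name"
        done
        i=$((i + 1))
    done
)
```

Running `list_missing /mnt/vol 20000` from a client (the mount path is an assumption) prints one missing name per line and prints nothing when every file and hardlink is present.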

Actual results:

A file and its hardlink are missing.

Expected results:

All files should be present.

Comment 4 Prasad Desala 2016-10-17 08:51:31 UTC
In Description -> Steps to Reproduce -> 4 -> c, the command that was used for changing the file permissions is:
for i in {1..20000};do chmod 660 f$i;done

Comment 6 Poornima G 2016-10-19 05:30:10 UTC
Were the files present on the bricks? Does a fresh mount show the missing files?

Comment 7 Poornima G 2016-10-19 05:38:21 UTC
Also, is this test run on 3.2 build, with md-cache disabled? Was the issue not reproducible without md-cache?

Comment 8 Prasad Desala 2016-10-19 06:21:32 UTC
(In reply to Poornima G from comment #6)
> Were the files present on the bricks? Does a fresh mount show the
> missing files?
The files are not present on the bricks either.
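
A brick-side check of this kind can be sketched as follows. It assumes the file sits at the same relative path under each brick root; the function name find_on_bricks and any brick paths passed to it are hypothetical, and it would have to be run on each storage node for the bricks that node hosts.

```shell
# Print each brick directory that contains the given file name.
# Usage: find_on_bricks NAME BRICK_DIR...
find_on_bricks() (
    name=$1
    shift
    for brick in "$@"; do
        # -e follows symlinks; -L also catches dangling symlinks.
        if [ -e "$brick/$name" ] || [ -L "$brick/$name" ]; then
            printf '%s\n' "$brick"
        fi
    done
)
```

Empty output for both f1776 and fl1776 on every node would match the observation that the files are absent from the bricks.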

Comment 9 Prasad Desala 2016-10-19 06:26:43 UTC
(In reply to Poornima G from comment #7)
> Also, is this test run on 3.2 build, with md-cache disabled? Was the issue
> not reproducible without md-cache?

I repeated the same test twice without md-cache on another setup, but the issue was not reproduced. I have noted the same in Comment 5.

Comment 10 Nithya Balachandran 2016-10-27 11:10:56 UTC
Assigning this to Poornima based on comments #5 and #9.

Comment 11 Poornima G 2016-11-04 04:57:18 UTC
As discussed offline, there are 3 issues found with the above test case:
1. The hardlink and the file both disappear, but this is not reproducible.
2. The hardlink is lost but the file exists; this is consistently reproducible.
3. chmod at times fails with "No such file or directory".

I suppose this bug is to track issue 1? If so, from the initial analysis it is not related to md-cache, so I will move the component to dht.

Is there a BZ raised for issue 2? As discussed with the dht team, this is likely to happen in dht, hence unrelated to md-cache.

Issue 3 could be because of a stale layout. Could you please try the same test case without md-cache, as discussed?
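
To tell issue 1 (file and hardlink both gone) apart from issue 2 (only the hardlink gone) after a run, each f<i>/fl<i> pair can be classified. This is a rough sketch; the function name classify_pairs and the output labels are hypothetical, and it only inspects what is visible in the given directory.

```shell
# For each index, report pairs that are incomplete:
#   both-missing  -> neither f<i> nor fl<i> exists (issue 1)
#   link-missing  -> f<i> exists but fl<i> does not (issue 2)
#   file-missing  -> fl<i> exists but f<i> does not
# Complete pairs produce no output.
classify_pairs() (
    dir=$1 count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        f="$dir/f$i"
        l="$dir/fl$i"
        if [ ! -e "$f" ] && [ ! -e "$l" ]; then
            echo "$i both-missing"
        elif [ ! -e "$l" ]; then
            echo "$i link-missing"
        elif [ ! -e "$f" ]; then
            echo "$i file-missing"
        fi
        i=$((i + 1))
    done
)
```

Note that a link-missing result can also mean `ln` ran before `touch` had created the file during the concurrent run, so it is a starting point for investigation rather than proof of data loss.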

Comment 12 Prasad Desala 2016-11-08 13:21:06 UTC
(In reply to Poornima G from comment #11)
> As discussed offline, there are 3 issues found with the above test case:
> 1. The hardlink and the file both disappear, but this is not reproducible.
> 2. The hardlink is lost but the file exists; this is consistently reproducible.
> 3. chmod at times fails with "No such file or directory".
> 
> I suppose this bug is to track issue 1? If so, from the initial analysis it
> is not related to md-cache, so I will move the component to dht.
Yes, this bug was raised to track issue 1. I tried to reproduce it multiple times, with and without the md-cache settings, but it was not reproduced.
 
> Is there a BZ raised for issue 2? As discussed with the dht team, this is
> likely to happen in dht, hence unrelated to md-cache.
Yes, BZ 1392837 has been raised to track issue 2.

> Issue 3 could be because of a stale layout. Could you please try the same
> test case without md-cache, as discussed?
Issue 3 is seen without md-cache as well. BZ 1392837 has already been raised for the observed issue.

Comment 13 Prasad Desala 2016-11-08 13:25:29 UTC
(In reply to Prasad Desala from comment #12)
> (In reply to Poornima G from comment #11)
> > As discussed offline, there are 3 issues found with the above test case:
> > 1. The hardlink and the file both disappear, but this is not reproducible.
> > 2. The hardlink is lost but the file exists; this is consistently reproducible.
> > 3. chmod at times fails with "No such file or directory".
> > 
> > I suppose this bug is to track issue 1? If so, from the initial analysis it
> > is not related to md-cache, so I will move the component to dht.
> Yes, this bug was raised to track issue 1. I tried to reproduce it multiple
> times, with and without the md-cache settings, but it was not reproduced.
>  
> > Is there a BZ raised for issue 2? As discussed with the dht team, this is
> > likely to happen in dht, hence unrelated to md-cache.
> Yes, BZ 1392837 has been raised to track issue 2.
> 
> > Issue 3 could be because of a stale layout. Could you please try the same
> > test case without md-cache, as discussed?
> Issue 3 is seen without md-cache as well. BZ 1392837 has already been raised
> for the observed issue.
Correcting the bug ID for issue 3... 
The correct BZ is 1385072

Comment 14 Poornima G 2016-11-23 07:18:53 UTC
Since we are not able to reproduce it again, I am moving it out of 3.2.0.

Prasad,
If you are able to reproduce the loss of both the hardlink and the data file, we will take a look into this.

Comment 15 Nithya Balachandran 2016-11-24 06:55:04 UTC
As this is not reproducible anymore, I am closing this BZ. Prasad, please reopen if you see it again.

