Bug 1115428 - [DHT-REBALANCE]: Few files are missing after add-brick and rebalance
Summary: [DHT-REBALANCE]: Few files are missing after add-brick and rebalance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: distribute
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Raghavendra G
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks: 1116236
Reported: 2014-07-02 10:51 UTC by shylesh
Modified: 2015-05-13 16:56 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.6.0.25-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1116236
Environment:
Last Closed: 2014-09-22 19:43:37 UTC
Embargoed:


Attachments
rebalance logs (147.28 KB, application/gzip), 2014-07-02 10:51 UTC, shylesh


Links
Red Hat Product Errata RHEA-2014:1278 (Private: 0, Priority: normal, Status: SHIPPED_LIVE)
Summary: Red Hat Storage Server 3.0 bug fix and enhancement update
Last Updated: 2014-09-22 23:26:55 UTC

Description shylesh 2014-07-02 10:51:20 UTC
Created attachment 914098
rebalance logs

Description of problem:
After creating a few hidden files, an add-brick followed by a rebalance causes some of the files to go missing from the mount point.

Version-Release number of selected component (if applicable):

3.6.0.22-1.el6rhs.x86_64

How reproducible:
Not reproducible manually; reproducible only through automation.

Steps to Reproduce:
1. Create a 2-brick distribute volume.
2. Create some hidden files on the mount point.
3. Add one more brick and start a rebalance.
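Since the bug only reproduced through automation, a driver for the steps above might look like the sketch below. Only the volume name `testvol` and the `/hidden/.N_hidden` file naming come from the logs in this report; the host `server1`, the brick paths under `/bricks`, the mount point `/mnt/testvol`, the file count, and the helper names are hypothetical placeholders. By default it only prints the commands; `run(dry_run=False)` executes them on a test cluster. Rebalance runs asynchronously, so wait for `gluster volume rebalance testvol status` to report completion before listing the directory again (parsing that output is left out here).

```python
import subprocess

VOLUME = "testvol"                      # volume name seen in the logs
HOST = "server1"                        # hypothetical host name
BRICKS = [f"{HOST}:/bricks/b{i}" for i in range(1, 4)]  # hypothetical paths
MOUNT = "/mnt/testvol"                  # hypothetical mount point
NUM_FILES = 20                          # hypothetical number of hidden files


def reproduction_commands():
    """Return the reproduction steps as argv lists, in order."""
    cmds = [
        # Step 1: a plain (distribute) volume with two bricks.
        ["gluster", "volume", "create", VOLUME, *BRICKS[:2]],
        ["gluster", "volume", "start", VOLUME],
        ["mount", "-t", "glusterfs", f"{HOST}:/{VOLUME}", MOUNT],
        # Step 2: hidden files, named like /hidden/.16_hidden in the logs.
        ["mkdir", "-p", f"{MOUNT}/hidden"],
    ]
    cmds += [["touch", f"{MOUNT}/hidden/.{i}_hidden"] for i in range(NUM_FILES)]
    # Step 3: add a third brick, then rebalance.
    cmds += [
        ["gluster", "volume", "add-brick", VOLUME, BRICKS[2]],
        ["gluster", "volume", "rebalance", VOLUME, "start"],
    ]
    return cmds


def missing_files(before, after):
    """Names listed before the rebalance but absent once it has completed."""
    return sorted(set(before) - set(after))


def run(dry_run=True):
    for argv in reproduction_commands():
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run()
```

Comparing `os.listdir(f"{MOUNT}/hidden")` taken before step 3 and after the rebalance completes with `missing_files` would flag a file such as `.16_hidden` disappearing.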

Actual results:
One file is missing from the mount point.

From the rebalance logs
=======================
From node-0
===========

[2014-07-01 10:14:29.139736] I [dht-common.c:1113:dht_lookup_everywhere_cbk] 0-testvol-dht: deleting stale linkfile /hidden/.16_hidden on testvol-client-2

From node-2
===========
[2014-07-01 10:14:29.144045] I [dht-rebalance.c:823:dht_migrate_file] 0-testvol-dht: /hidden/.16_hidden: attempting to move from testvol-client-0 to testvol-client-2
[2014-07-01 10:14:29.166731] I [MSGID: 109022] [dht-rebalance.c:1067:dht_migrate_file] 0-testvol-dht: completed migration of /hidden/.16_hidden from subvolume testvol-client-0 to testvol-client-2
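Read together, the two excerpts show the same path being deleted as a stale linkfile on testvol-client-2 and being migrated to testvol-client-2. A small helper like the one below could pull such pairs out of the complete logs. It is a hypothetical sketch: the regular expressions are written only against the two INFO message formats quoted above and may not match other GlusterFS versions. Timestamps are reported but not compared, because the lines come from different nodes whose clocks may differ.

```python
import re

# Patterns for the two INFO messages quoted in this report.
TS = re.compile(r"^\[(?P<ts>[^\]]+)\]")
STALE = re.compile(r"deleting stale linkfile (?P<path>\S+) on (?P<subvol>\S+)")
DONE = re.compile(
    r"completed migration of (?P<path>\S+) from subvolume (?P<src>\S+) to (?P<dst>\S+)"
)


def suspects(lines):
    """Map (path, subvolume) to (deletion timestamp, migration timestamp)
    for every path whose linkfile was deleted on the subvolume that a
    completed migration targeted."""
    deleted, migrated = {}, {}
    for line in lines:
        ts = TS.match(line)
        stamp = ts.group("ts") if ts else None
        if m := STALE.search(line):
            deleted[(m["path"], m["subvol"])] = stamp
        elif m := DONE.search(line):
            migrated[(m["path"], m["dst"])] = stamp
    common = deleted.keys() & migrated.keys()
    return {key: (deleted[key], migrated[key]) for key in common}


LOG = [
    "[2014-07-01 10:14:29.139736] I [dht-common.c:1113:dht_lookup_everywhere_cbk] "
    "0-testvol-dht: deleting stale linkfile /hidden/.16_hidden on testvol-client-2",
    "[2014-07-01 10:14:29.166731] I [MSGID: 109022] [dht-rebalance.c:1067:dht_migrate_file] "
    "0-testvol-dht: completed migration of /hidden/.16_hidden from subvolume "
    "testvol-client-0 to testvol-client-2",
]
print(suspects(LOG))
```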

Attaching the complete logs.

Comment 2 Vivek Agarwal 2014-07-03 06:15:04 UTC
Per discussion, marking this as a blocker for Denali.

Comment 4 Raghavendra G 2014-07-03 15:24:19 UTC
Currently, lookup-everywhere deletes any linkfile it finds. To avoid deleting linkfiles that are under migration, it checks the number of open fds and deletes the file only if the count is zero. However, even this check is not foolproof: the check and the deletion are separate steps, so the outcome depends on a race condition. Consider the following scenario:

1. Rebalance process p1's lookup-everywhere returns success on a file.
2. Rebalance process p2 identifies the file for migration and initiates migration, opening an fd on the destination node.
3. p1 goes ahead and deletes the file, since it is a linkfile AND there are no open fds.
4. p2 completes migration without any errors, since its fd was opened before p1 deleted the file.
5. Although we do a lookup on the file after migration, the result is logged only at DEBUG log level, so it does not appear in the logs attached here. Also, the current code does not treat a failure of that lookup as a rebalance failure.
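The interleaving above can be replayed on an ordinary local filesystem to see why step 4 reports success. This is a minimal sketch, assuming a POSIX system (Linux or macOS), where an unlinked file stays writable through any fd opened before the unlink. The file name, the function name `replay_race`, and the modelling of p1's open-fd check as a counter read that happens before p2 opens its fd are illustrative assumptions, not GlusterFS code.

```python
import os
import tempfile


def replay_race(dirpath):
    """Replay the five steps above on a local POSIX filesystem.

    Hypothetical stand-ins: the file ".16_hidden" in dirpath plays the
    linkfile on the destination brick, and p1's open-fd check is modelled
    as a plain counter read that happens before p2 opens its fd.
    """
    dst = os.path.join(dirpath, ".16_hidden")
    open(dst, "w").close()            # linkfile present on the destination
    open_fds = 0                      # open-fd count as p1 observes it

    # Step 1: p1's lookup-everywhere finds a linkfile with no open fds.
    p1_may_delete = open_fds == 0

    # Step 2: p2 starts migrating and opens an fd on the destination.
    fd = os.open(dst, os.O_WRONLY)
    open_fds += 1

    # Step 3: p1 acts on its earlier, now stale, check and unlinks.
    if p1_may_delete:
        os.unlink(dst)

    # Step 4: p2 keeps writing through its fd. POSIX keeps the unlinked
    # inode alive while the fd is open, so the write succeeds.
    written = os.write(fd, b"file data")
    os.close(fd)
    open_fds -= 1

    # Step 5: a lookup after the "successful" migration finds nothing.
    return written, os.path.exists(dst)


with tempfile.TemporaryDirectory() as d:
    written, visible = replay_race(d)
    print(written, visible)           # 9 False
```

Every write p2 issues succeeds, yet the file is gone from the namespace, which matches the "completed migration" message followed by a missing file.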

The correct fix is to make the open-fd count check and the unlink of the linkfile a single atomic operation.
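The atomicity the fix calls for can be illustrated with a toy model in which one lock covers both opening an fd and the check-then-unlink. This is a sketch of the idea only: the class `Brick` and its methods are hypothetical, and this report does not describe how the shipped fix (glusterfs-3.6.0.25-1) implements the guarantee inside the brick.

```python
import threading


class Brick:
    """Toy model of one brick's view of a single linkfile (hypothetical).

    One lock covers both "open an fd" and "check the open-fd count, then
    unlink", so the check and the unlink behave as one atomic operation.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.exists = True
        self.open_fds = 0

    def open_fd(self):
        """Migration path: take an fd, failing if the file is already gone."""
        with self._lock:
            if not self.exists:
                raise FileNotFoundError("linkfile already unlinked")
            self.open_fds += 1

    def close_fd(self):
        with self._lock:
            self.open_fds -= 1

    def unlink_if_unused(self):
        """Stale-linkfile cleanup: unlink only if nobody holds an fd."""
        with self._lock:
            if self.exists and self.open_fds == 0:
                self.exists = False
                return True
            return False


# Migration wins the race: cleanup sees the fd and backs off.
b = Brick()
b.open_fd()
print(b.unlink_if_unused())       # False
b.close_fd()

# Cleanup wins the race: migration fails loudly instead of "succeeding".
b = Brick()
b.unlink_if_unused()
try:
    b.open_fd()
except FileNotFoundError as exc:
    print("migration aborted:", exc)
```

Either ordering now ends consistently: the file is never both unlinked and successfully opened, so a lost linkfile surfaces as a migration error rather than a silently missing file.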

Comment 6 shylesh 2014-09-19 12:19:58 UTC
Not seen on the latest build, glusterfs-3.6.0.28-1.

Comment 8 errata-xmlrpc 2014-09-22 19:43:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

