Bug 1110694
Summary: | [DHT:REBALANCE]: Rebalance failures are seen with error message " remote operation failed: File exists" | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | shylesh <shmohan> |
Component: | distribute | Assignee: | vsomyaju |
Status: | CLOSED ERRATA | QA Contact: | shylesh <shmohan> |
Severity: | high | Docs Contact: | |
Priority: | high | | |
Version: | rhgs-3.0 | CC: | amukherj, nbalacha, nsathyan, rgowdapp, smohan, ssamanta, surs, vagarwal, vsomyaju |
Target Milestone: | --- | | |
Target Release: | RHGS 3.0.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | glusterfs-3.6.0.28-1 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | | |
: | 1116150 | Environment: | |
Last Closed: | 2014-09-22 19:42:03 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1115937, 1116150, 1117661, 1138385, 1139995 | | |
Attachments: | | | |
Description
shylesh
2014-06-18 09:25:12 UTC
Created attachment 911931
Rebalane-Race
Added an attachment which describes the race condition. From the logs, it seems to be a race condition between two rebalance processes.

STATE 1: Only one brick, BRICK-1, in the system. It holds the cached file.
STATE 2: Add brick-2. The system now has BRICK-1 and BRICK-2.
STATE 3: Lookup of the file on brick-2 by this node's rebalance fails because the hashed file has not been created yet, so dht_lookup_everywhere is about to be called.
STATE 4: As part of the lookup, a link file is created at brick-2.
STATE 5: getxattr is done to check that the cached file belongs to this node.
STATE 6: dht_lookup_everywhere_cbk, running in the rebalance on node-2, detects the link file created by rebalance-1 and unlinks it.
STATE 7: getxattr on the link file with the "pathinfo" key is called and fails, because the link file has been deleted by the rebalance on node-2.

With the release, I see many more failures reported during remove-brick:

Node         Rebalanced-files  size         scanned      failures     skipped      status        run time in secs
---------    -----------       -----------  -----------  -----------  -----------  ------------  --------------
localhost    0                 0Bytes       458684       0            0            in progress   5790.00
172.17.69.1  114615            3.4MB        293904       30561        0            in progress   5790.00

30561 and counting... However, I do not see any errors reported in the logs.

Sent on downstream branch: https://code.engineering.redhat.com/gerrit/#/c/29357/

Additional patches need to be merged to complete the fix for this bug. They are currently being reviewed. Moving this back to POST.

Gluster-server version
======================
[root@rhssvm-swift2 ~]# gluster --version
glusterfs 3.6.0.28 built on Sep 3 2014 10:13:12
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
glusterfs-client version
========================
[root@rhs-client10 10]# glusterfs --version
glusterfs 3.6.0.28 built on Sep 3 2014 10:13:11
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

Tested the following steps:
1. Created a distributed volume with 2 bricks.
2. From the mount point, ran the following to create some data:
   for i in {1..10}; do mkdir $i; cd $i; cp -R /etc/* .; done
3. Added a brick, then executed rebalance and waited until the status became completed.
4. Checked the rebalance log for the error message:
   grep "remote operation failed: File exists" /var/log/glusterfs/vol1-rebalance.log | wc -l
   0

Hence this bug is verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html
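
For anyone repeating the verification, the log check from step 4 of the test steps could be wrapped in a small helper. This is only a sketch: the function name count_rebalance_errors is hypothetical and not part of GlusterFS, and the default log path is simply the one quoted in step 4 (it depends on the volume name, here assumed to be vol1).

```shell
#!/bin/sh
# Hypothetical helper: count lines in a rebalance log that contain the
# error message from this bug. Not part of GlusterFS.
count_rebalance_errors() {
    # Default path is the one from the verification step; it assumes a
    # volume named vol1. Pass another path as the first argument.
    log="${1:-/var/log/glusterfs/vol1-rebalance.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # -F treats the message as a fixed string, -c prints the number of
    # matching lines. grep exits with status 1 when the count is 0, so
    # that status is masked to keep "no errors" a successful result.
    grep -cF "remote operation failed: File exists" "$log" || true
}
```

Calling count_rebalance_errors with no argument after a rebalance completes should print 0 on a fixed build, which matches the result reported in step 4.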