Bug 1110311
Summary: BVT: Rebalance status reporting failure
Product: [Red Hat Storage] Red Hat Gluster Storage | Reporter: Lalatendu Mohanty <lmohanty>
Component: glusterfs | Assignee: vsomyaju
Status: CLOSED ERRATA | QA Contact: shylesh <shmohan>
Severity: urgent | Docs Contact:
Priority: urgent
Version: rhgs-3.0 | CC: amukherj, asrivast, lmohanty, nbalacha, nsathyan, shmohan, ssamanta, vsomyaju
Target Milestone: ---
Target Release: RHGS 3.0.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.6.0.25-1 | Doc Type: Bug Fix
Doc Text:
Cause:
The cause is described in https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4.
This bug is hit when two or more rebalance processes act on the same file.
After add-brick, if a file hashes to the newly added brick, the lookup on the newly added brick fails. In that case a lookup is performed on all the nodes, and if a linkto file is found it gets deleted.
If rebalance-1 creates a linkto file on the newly added brick as part of the everywhere-lookup, rebalance-2 may treat that linkto file as stale and delete it.
Consequence:
Rebalance may report failures, because the linkto file created by rebalance-1 is deleted by rebalance-2.
Fix:
Added more careful checks in the dht xlator, so that if an fd is open on a linkto file, or the file has already been migrated, the file is not deleted.
Result:
If a file is already under migration or has been migrated, it is not deleted.
Story Points: ---
Clone Of: | Environment:
Last Closed: 2014-09-22 19:41:51 UTC | Type: Bug
Regression: --- | Mount Type: ---
Documentation: --- | CRM:
Verified Versions: | Category: ---
oVirt Team: --- | RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- | Target Upstream Version:
Embargoed:
Attachments:
Description
Lalatendu Mohanty
2014-06-17 12:27:00 UTC
Created attachment 909580 [details]
Rebalance logs from the node where commands were executed
Created attachment 909582 [details]
Rebalance logs from all nodes
Can you please attach the glusterd log? The statistics in the log files and the CLI output don't match.

Lala, can you please upload the correct set of logs for this case?

Looking into the automation to see why it has the wrong logs. The good news is that I am not seeing the issue in the recent runs.

Found a similar issue in yesterday's BVT run: https://beaker.engineering.redhat.com/jobs/676309 with build glusterfs-3.6.0.20-1.el6rhs.x86_64.rpm. I can see failures in the rebalance log, i.e. rebalvol-rebalance.log:

[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)
[2014-06-22 22:50:26.336185] E [MSGID: 109023] [dht-rebalance.c:1416:gf_defrag_migrate_data] 0-rebalvol-dht: Migrate file failed:Failed to get node-uuid for /symlinks/23
[2014-06-22 22:50:26.606546] I [dht-rebalance.c:823:dht_migrate_file] 0-rebalvol-dht: /symlinks/16: attempting to move from rebalvol-client-1 to rebalvol-client-2
[2014-06-22 22:50:26.640725] E [dht-rebalance.c:1470:gf_defrag_migrate_data] 0-rebalvol-dht: /symlinks/22: failed to get trusted.distribute.linkinfo key - Stale file handle
[2014-06-22 22:50:27.802194] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 23, size: 485, lookups: 211, failures: 1, skipped: 0
[2014-06-22 22:50:28.134909] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 28, size: 630, lookups: 226, failures: 1, skipped: 0
[2014-06-22 22:50:28.818404] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 43, size: 1177, lookups: 385, failures: 1, skipped: 0

Created attachment 911310 [details]
Rebalance logs from node with failure
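To make the "Failed to get node-uuid" failure above easier to poke at outside a BVT run, here is a minimal standalone sketch (not rebalance or dht xlator code) that queries a node-uuid style virtual xattr on a file from a mount and maps an ESTALE/ENOENT result to the outcome reported in the log. The xattr key name below is an assumption for illustration; the actual key used by rebalance is handled inside the dht xlator.

```c
/* check_node_uuid.c - illustrative only, not GlusterFS source. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

#define NODE_UUID_KEY "trusted.glusterfs.node-uuid" /* assumed key name */

int main(int argc, char *argv[])
{
        char value[256] = {0};

        if (argc != 2) {
                fprintf(stderr, "usage: %s <path-on-gluster-mount>\n", argv[0]);
                return 1;
        }

        ssize_t len = getxattr(argv[1], NODE_UUID_KEY, value, sizeof(value) - 1);
        if (len < 0) {
                if (errno == ESTALE || errno == ENOENT)
                        /* The file may have been migrated or removed underneath
                         * us; this is the situation the rebalance log reports
                         * as "Failed to get node-uuid". */
                        fprintf(stderr, "stale/missing file: %s\n", strerror(errno));
                else
                        fprintf(stderr, "getxattr failed: %s\n", strerror(errno));
                return 1;
        }

        printf("%s = %.*s\n", argv[1], (int)len, value);
        return 0;
}
```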
Initial analysis as follows: migration of the file /symlinks/23 failed with the following log:

[2014-06-22 22:50:26.330680] W [MSGID: 109007] [dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup)
[2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23
[2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831
[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

This is why the failure count was incremented by 1. As far as glusterd is concerned, it has shown the correct stats. Assigning it to the DHT team.

There is another old bug on rebalance failure, bz 1053537, i.e. https://bugzilla.redhat.com/show_bug.cgi?id=1053537. Not sure whether the root cause of this bug is related to it.

Hi Atin,

The problem you mentioned is caused by a race in rebalance.

State 1: Brick-1 holds File-1.
State 2: After add-brick, the volume has Brick-1 and Brick-2; File-1 (cached) is still on Brick-1.
State 3: Let's assume the rebalance process on Brick-2 runs first. It does the lookup on the hashed subvolume, which is Brick-2, and because the file is not present on Brick-2, dht_lookup_everywhere gets called.

If, just before that lookup fop is sent over the wire, the rebalance process on Brick-1 runs, migrates the file, and sets the permissions from source to sink, then at that point in time the system has two data files and zero linkto files. When the lookup from the Brick-2 rebalance proceeds, it sees two files and the following is logged:

[dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup)

<< The mknod will fail because the file has already been migrated by the Brick-1 rebalance >>
[2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23

<< As mentioned, a non-linkfile will be present >>
[2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831

<< Just before the getxattr, the rebalance might have finished and the hashed file might have been removed, so this fails >>
[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

EDIT: Instead of the hashed file, it is the cached file initially present on Brick-1, which got migrated to Brick-2, that is removed.

<< Just before the getxattr, the rebalance might have finished and the "CACHED" file might have been removed from Brick-1, so the getxattr for node-uuid from Brick-1 fails >>
[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

Reproduced the issue again on build glusterfs-server-3.6.0.22-1. Attaching logs.

Created attachment 913381 [details]
Rebalance logs from node with failure
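The interleaving described above can be summarised as a toy, single-process model. The struct and flags below are purely illustrative and are not GlusterFS code; they only encode the order of events that leaves two data files and no linkto file, so the second rebalance's mknod fails and its later getxattr hits a file that is already gone.

```c
/* race_timeline.c - toy model of the two-rebalance interleaving, not dht code. */
#include <stdbool.h>
#include <stdio.h>

struct brick {
        bool has_data_file;   /* real file contents            */
        bool has_linkto_file; /* dht linkto (pointer) file     */
};

int main(void)
{
        struct brick brick1 = { .has_data_file = true }; /* State 1: File-1 cached here */
        struct brick brick2 = { 0 };                     /* State 2: newly added brick  */

        /* Rebalance on brick-2: lookup on the hashed subvol (brick-2) misses,
         * so it decides to do a lookup everywhere. */
        bool lookup_everywhere_pending = !brick2.has_data_file;

        /* Before that lookup goes out, rebalance on brick-1 migrates the file;
         * for a moment both copies exist and no linkto file is present. */
        brick2.has_data_file = true;

        if (lookup_everywhere_pending && brick1.has_data_file && brick2.has_data_file)
                printf("lookup_everywhere: multiple subvolumes have the file\n");

        /* Rebalance-2 now tries to create a linkto file on brick-2,
         * but a real file already sits there -> "File exists" in the log. */
        if (brick2.has_data_file)
                printf("mknod linkto on brick-2: File exists\n");

        /* Rebalance-1 finishes and removes the old cached copy on brick-1,
         * so rebalance-2's getxattr against brick-1 returns ESTALE. */
        brick1.has_data_file = false;
        if (!brick1.has_data_file)
                printf("getxattr node-uuid on brick-1: Stale file handle\n");

        printf("result: this file is counted as a migration failure\n");
        return 0;
}
```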
Venkatesh,

A couple of questions:

1. Why is the server returning ESTALE instead of ENOENT? If it is an ENOENT, can't we send another lookup to the hashed subvol, to account for the fact that the file might already have been migrated, and treat this as a non-failure case?
2. How are we handling races between rebalance and an unlink from the I/O path? Are we also treating the file not existing because of an unlink on the I/O path as a failure?

regards,
Raghavendra

Since this issue is getting reproduced in almost every BVT, couldn't we fix this race asap? It was reproduced again in today's BVT run: https://beaker.engineering.redhat.com/jobs/682433

We should mark it as a blocker, as we are able to reproduce it almost every time on build glusterfs-server-3.6.0.22-1, which was not the case in RHS 2.1.

The race gets fixed with patch: http://review.gluster.org/#/c/8231/
Race explanation: https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4
Similar race: in comment number 10. Patch: http://review.gluster.org/#/c/8231/

Not seeing this issue on glusterfs-3.6.0.25-1 and glusterfs-3.6.0.27-1. Marking it verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html
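For readers who do not want to open the review link, below is a minimal sketch of the kind of guard the fix introduces (skip deleting a linkto file that is in use or already migrated). The struct fields and helper name are hypothetical; this is not the actual dht xlator change from http://review.gluster.org/#/c/8231/.

```c
/* stale_linkto_guard.c - illustrative guard, not the real dht xlator code. */
#include <stdbool.h>
#include <stdio.h>

struct file_state {
        bool is_linkto;      /* file is a dht linkto (pointer) file          */
        bool fd_open;        /* some process still holds an open fd on it    */
        bool migration_done; /* data has already been moved to the target    */
};

/* Decide whether a linkto file found during lookup-everywhere may be deleted
 * as "stale". Without the extra checks, a linkto file created by another
 * rebalance could be removed; with them, in-use or already-migrated files
 * are left alone. */
static bool may_delete_stale_linkto(const struct file_state *st)
{
        if (!st->is_linkto)
                return false; /* never delete a real data file              */
        if (st->fd_open)
                return false; /* another rebalance/client is still using it */
        if (st->migration_done)
                return false; /* migration already completed                */
        return true;          /* genuinely stale, safe to clean up          */
}

int main(void)
{
        struct file_state in_use = { .is_linkto = true, .fd_open = true };
        struct file_state stale  = { .is_linkto = true };

        printf("linkto with open fd: %s\n",
               may_delete_stale_linkto(&in_use) ? "delete" : "keep");
        printf("stale linkto:        %s\n",
               may_delete_stale_linkto(&stale) ? "delete" : "keep");
        return 0;
}
```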