Bug 1110311 - BVT: Rebalance status reporting failure
Summary: BVT: Rebalance status reporting failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
urgent
urgent
Target Milestone: ---
: RHGS 3.0.0
Assignee: vsomyaju
QA Contact: shylesh
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-06-17 12:27 UTC by Lalatendu Mohanty
Modified: 2015-05-13 17:01 UTC (History)
8 users (show)

Fixed In Version: glusterfs-3.6.0.25-1
Doc Type: Bug Fix
Doc Text:
Cause: Cause is mentioned in link : https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4 This bug hits when two or more than two rebalance process is acting on same file. After add-brick, if a file hashes to newly added brick, lookup on newly added brick will fail. In such cases lookup is performed on all the nodes and if linkto file is found, it gets deleted. If rebalance-1 creates a linkto-file on newly added brick as a part of everywhere_lookup, this linkto-file will be deleted by rebalance-2 being treated as a stale file. Consequence: Rebalance may see failure because linkto-file created by rebalance-1 will be deleted by rebalance-2. Fix: Added more carefull checks in dht xlator, so that if fd-is open on a linkto file or file already got migrated then file wont be deleted. Result: If file is already under migration or migrated, is should not deleted.
Clone Of:
Environment:
Last Closed: 2014-09-22 19:41:51 UTC
Embargoed:


Attachments (Terms of Use)
Rebalance logs from the node where commands were executed (56.89 KB, text/x-log)
2014-06-17 12:28 UTC, Lalatendu Mohanty
no flags Details
Rebalance logs from all nodes (13.12 KB, application/x-gzip)
2014-06-17 12:37 UTC, Lalatendu Mohanty
no flags Details
Rebalance logs from node with failure (714.79 KB, text/x-log)
2014-06-23 07:20 UTC, Lalatendu Mohanty
no flags Details
Rebalance logs from node with failure (85.89 KB, application/x-gzip)
2014-06-30 11:45 UTC, Lalatendu Mohanty
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1278 0 normal SHIPPED_LIVE Red Hat Storage Server 3.0 bug fix and enhancement update 2014-09-22 23:26:55 UTC

Description Lalatendu Mohanty 2014-06-17 12:27:00 UTC
Description of problem:

The test is about "Peer probing while rebalance is in progress". Though peer probe works fine when rebalance is going on , but there is failure seen in rebalance status.

BVT ran twice on the same build and this test case passed once and failed once.

However I couldn't find any failure message in the rebalance logs.

Version-Release number of selected component (if applicable):
glusterfs-server-3.6.0.17-1.el6rhs.x86_64

How reproducible:
Intermittently

Steps to Reproduce:
1. Create a dht gluster volume with 2 bricks.
2. Mount it on a client and create data on the mount point.

qeCreateData --small-files $MOUNT_POINT 10000 10K
qeCreateData --medium-files  $MOUNT_POINT  1000 10M

mkdir -p ${MOUNT_POINT}/large
for i in `seq 1 5`
do
    rlRun "dd if=/dev/zero of=${MOUNT_POINT}/large/${i}.large bs=1M count=10000"
done

3. Add a new brick using "gluster volume add-brick"
4. Start the reblance process . "gluster volume rebalance  $VOL_NAME  start"
5. Perform a peer probe for a new node. "gluster peer probe $NEW_HOST"
6. Check rebalance status "gluster volume rebalance $VOL_NAME status "

Actual results:

:: [ 04:32:26 ] ::  The rebalance is complete, logging final status

Node Rebalanced-files  size   scanned  failures   skipped    status    run time in secs
--   -----------       ----   -------  --------   -------    ------    ----------------
localhost  19       190.0KB    11457    1          1827      completed  32.00
rhsauto057 1799     1.7GB      12823    0             2      completed  145.00

Expected results:

Rebalance should not fail.

Additional info:

Comment 1 Lalatendu Mohanty 2014-06-17 12:28:26 UTC
Created attachment 909580 [details]
Rebalance logs from the node where commands were executed

Comment 2 Lalatendu Mohanty 2014-06-17 12:37:38 UTC
Created attachment 909582 [details]
Rebalance logs from all nodes

Comment 3 Atin Mukherjee 2014-06-17 14:02:59 UTC
Can you please attach glusterd log?

Comment 4 Atin Mukherjee 2014-06-19 08:33:38 UTC
The statistics of the log files and cli output doesn't match. Lala, can you please upload the correct set of logs for this case?

Comment 5 Lalatendu Mohanty 2014-06-19 14:51:37 UTC
Looking in to the automation to see why it has wrong logs. The good news is I am not seeing the issue in the recent runs.

Comment 6 Lalatendu Mohanty 2014-06-23 07:19:40 UTC
Found similar issue in yesterday's BVT run: https://beaker.engineering.redhat.com/jobs/676309 with build: glusterfs-3.6.0.20-1.el6rhs.x86_64.rpm

I can see failures in rebalance log i.e. rebalvol-rebalance.log

[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)
[2014-06-22 22:50:26.336185] E [MSGID: 109023] [dht-rebalance.c:1416:gf_defrag_migrate_data] 0-rebalvol-dht: Migrate file failed:Failed to get node-uuid for /symlinks/23
[2014-06-22 22:50:26.606546] I [dht-rebalance.c:823:dht_migrate_file] 0-rebalvol-dht: /symlinks/16: attempting to move from rebalvol-client-1 to rebalvol-client-2
[2014-06-22 22:50:26.640725] E [dht-rebalance.c:1470:gf_defrag_migrate_data] 0-rebalvol-dht: /symlinks/22: failed to get trusted.distribute.linkinfo key - Stale file handle


[2014-06-22 22:50:27.802194] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 23, size: 485, lookups: 211, failures: 1, skipped: 0
[2014-06-22 22:50:28.134909] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 28, size: 630, lookups: 226, failures: 1, skipped: 0
[2014-06-22 22:50:28.818404] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 43, size: 1177, lookups: 385, failures: 1, skipped: 0

Comment 7 Lalatendu Mohanty 2014-06-23 07:20:42 UTC
Created attachment 911310 [details]
Rebalance logs from node with failure

Comment 8 Atin Mukherjee 2014-06-23 07:29:27 UTC
Initial analysis as follows:

/symlinks/23 file migration failed with the following log:

[2014-06-22 22:50:26.330680] W [MSGID: 109007] [dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup)
[2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23
[2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831
[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

And this is why the failures count was incremented by 1. 
As far as glusterd is concerned, it has shown the correct stats. 

Assigning it to the DHT team.

Comment 9 Lalatendu Mohanty 2014-06-23 07:30:58 UTC
There is another old bug on rebalance failure bz 1053537 i.e. https://bugzilla.redhat.com/show_bug.cgi?id=1053537. Not sure if the root cause of this bug is related with it.

Comment 10 vsomyaju 2014-06-24 12:24:30 UTC
Hi Atin,

The problem which you mentioned is because of race in rebalance.

State 1:
                             Brick-1    
                             File-1

State 2:     Add-Brick       Brick-1        Brick-2
                             File-1          
                            (Cached)


State 3:      Lets assume rebalance on brick-2 ran first, it will do the lookup
              on hashed which is brick-2 and becuase file is not present on brick-2, dht_lookup_everywhere should get called.


If just before sending the lookup fops over the wire, rebalance on brick-1 will get called and it will migrate the file, and sets the permission from source to sink, then at this point in time in the system there are two files and 0 link file. 

Now if lookup of rebalance on brick-2 proceeds, it will see two files and following log will be seen

dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup)

<<MKNOD WILL Fail as file is already migrated by brick-1 rebalance>>

[2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23

<<As I mentioned, a non-linkfile will be present >>
[2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831

<<Now just before getxattr, rebalance might have got finished, and hashed file might got have removed, and this will fail>>

[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

Comment 11 vsomyaju 2014-06-24 12:29:55 UTC
EDIT:

Instead of hased, cached file initally present at brick-1, which got migrated to 
brick-2  will be removed

<<Now just before getxattr, rebalance might have got finished, and "CACHED" file might got have removed from brick-1, and getxattr for node-uuid form brick-1 will fail>>

[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

Comment 12 Lalatendu Mohanty 2014-06-30 11:44:12 UTC
Reproduced the issue again on build glusterfs-server-3.6.0.22-1. Attaching logs.

Comment 13 Lalatendu Mohanty 2014-06-30 11:45:59 UTC
Created attachment 913381 [details]
Rebalance logs from node with failure

Comment 14 Raghavendra G 2014-07-01 07:25:18 UTC
Venkatesh,

Couple of questions:

1. Why is server returning ESTALE instead of ENOENT? If its an ENOENT, can't we send another lookup to hashed subvol, to account for the fact that the file might be migrated already and treat this as a non-failure case?

2. How are we handling races b/w rebalance vs unlink from I/O Path? Are we considering file not existing because of an unlink on I/O path also as a failure?

regards,
Raghavendra

Comment 15 Atin Mukherjee 2014-07-01 14:13:45 UTC
Since this issue is getting reproduced in almost every BVT, couldn't we fix this race asap?

Comment 16 Lalatendu Mohanty 2014-07-01 14:16:34 UTC
This again got reproduced in today's BVT run https://beaker.engineering.redhat.com/jobs/682433

Comment 17 Lalatendu Mohanty 2014-07-01 14:18:45 UTC
We should mark it as blocker as we are able to reproduce it almost every time on build glusterfs-server-3.6.0.22-1, which was not the case in rhs2.1

Comment 18 vsomyaju 2014-07-07 09:40:44 UTC
Race gets fixed with patch:  http://review.gluster.org/#/c/8231/


Race Explanation: https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4
Similar Race    : In comment number 10.

Patch:       http://review.gluster.org/#/c/8231/

Comment 22 Lalatendu Mohanty 2014-08-20 12:50:26 UTC
Not seeing this issue on glusterfs-3.6.0.25-1 and glusterfs-3.6.0.27-1. Marking it verified.

Comment 24 errata-xmlrpc 2014-09-22 19:41:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html


Note You need to log in before you can comment on or make changes to this bug.