Bug 1110311
Summary: BVT: Rebalance status reporting failure
Product: [Red Hat Storage] Red Hat Gluster Storage | Reporter: Lalatendu Mohanty <lmohanty>
Component: glusterfs | Assignee: vsomyaju
Status: CLOSED ERRATA | QA Contact: shylesh <shmohan>
Severity: urgent | Docs Contact:
Priority: urgent
Version: rhgs-3.0 | CC: amukherj, asrivast, lmohanty, nbalacha, nsathyan, shmohan, ssamanta, vsomyaju
Target Milestone: ---
Target Release: RHGS 3.0.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.6.0.25-1 | Doc Type: Bug Fix
Doc Text:
Cause:
The cause is described in https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4.
This bug is hit when two or more rebalance processes act on the same file.
After add-brick, if a file hashes to the newly added brick, the lookup on the newly added brick fails. In that case a lookup is performed on all the nodes, and if a linkto file is found it gets deleted.
If rebalance-1 creates a linkto file on the newly added brick as part of the everywhere-lookup, rebalance-2 may treat that linkto file as stale and delete it.
Consequence:
Rebalance may report failures, because the linkto file created by rebalance-1 is deleted by rebalance-2.
Fix:
Added more careful checks in the dht xlator, so that if an fd is open on a linkto file, or the file has already been migrated, the file is not deleted.
Result:
If a file is already under migration or has been migrated, it is not deleted.
Story Points: ---
Clone Of: | Environment:
Last Closed: 2014-09-22 19:41:51 UTC | Type: Bug
Regression: --- | Mount Type: ---
Documentation: --- | CRM:
Verified Versions: | Category: ---
oVirt Team: --- | RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- | Target Upstream Version:
Embargoed:
Attachments:
Description
Lalatendu Mohanty
2014-06-17 12:27:00 UTC
Created attachment 909580 [details]
Rebalance logs from the node where commands were executed
Created attachment 909582 [details]
Rebalance logs from all nodes
Can you please attach the glusterd log? The statistics in the log files and the CLI output don't match.

Lala, can you please upload the correct set of logs for this case?

Looking into the automation to see why it has the wrong logs. The good news is that I am not seeing the issue in the recent runs.

Found a similar issue in yesterday's BVT run: https://beaker.engineering.redhat.com/jobs/676309 with build glusterfs-3.6.0.20-1.el6rhs.x86_64.rpm. I can see failures in the rebalance log, i.e. rebalvol-rebalance.log:

[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)
[2014-06-22 22:50:26.336185] E [MSGID: 109023] [dht-rebalance.c:1416:gf_defrag_migrate_data] 0-rebalvol-dht: Migrate file failed:Failed to get node-uuid for /symlinks/23
[2014-06-22 22:50:26.606546] I [dht-rebalance.c:823:dht_migrate_file] 0-rebalvol-dht: /symlinks/16: attempting to move from rebalvol-client-1 to rebalvol-client-2
[2014-06-22 22:50:26.640725] E [dht-rebalance.c:1470:gf_defrag_migrate_data] 0-rebalvol-dht: /symlinks/22: failed to get trusted.distribute.linkinfo key - Stale file handle
[2014-06-22 22:50:27.802194] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 23, size: 485, lookups: 211, failures: 1, skipped: 0
[2014-06-22 22:50:28.134909] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 28, size: 630, lookups: 226, failures: 1, skipped: 0
[2014-06-22 22:50:28.818404] I [MSGID: 109028] [dht-rebalance.c:2008:gf_defrag_status_get] 0-glusterfs: Files migrated: 43, size: 1177, lookups: 385, failures: 1, skipped: 0

Created attachment 911310 [details]
Rebalance logs from node with failure
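To make the "Failed to get node-uuid" failure above easier to poke at outside a BVT run, here is a minimal standalone sketch (not rebalance or dht xlator code) that queries a node-uuid style virtual xattr on a file from a mount and maps an ESTALE/ENOENT result to the outcome reported in the log. The xattr key name below is an assumption for illustration; the actual key used by rebalance is handled inside the dht xlator.

```c
/* check_node_uuid.c - illustrative only, not GlusterFS source. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

#define NODE_UUID_KEY "trusted.glusterfs.node-uuid" /* assumed key name */

int main(int argc, char *argv[])
{
        char value[256] = {0};

        if (argc != 2) {
                fprintf(stderr, "usage: %s <path-on-gluster-mount>\n", argv[0]);
                return 1;
        }

        ssize_t len = getxattr(argv[1], NODE_UUID_KEY, value, sizeof(value) - 1);
        if (len < 0) {
                if (errno == ESTALE || errno == ENOENT)
                        /* The file may have been migrated or removed underneath
                         * us; this is the situation the rebalance log reports
                         * as "Failed to get node-uuid". */
                        fprintf(stderr, "stale/missing file: %s\n", strerror(errno));
                else
                        fprintf(stderr, "getxattr failed: %s\n", strerror(errno));
                return 1;
        }

        printf("%s = %.*s\n", argv[1], (int)len, value);
        return 0;
}
```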
Initial analysis as follows: migration of the file /symlinks/23 failed with the following log:

[2014-06-22 22:50:26.330680] W [MSGID: 109007] [dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup)
[2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23
[2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831
[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

This is why the failure count was incremented by 1. As far as glusterd is concerned, it has shown the correct stats. Assigning it to the DHT team.

There is another old bug on rebalance failure, bz 1053537, i.e. https://bugzilla.redhat.com/show_bug.cgi?id=1053537. Not sure whether the root cause of this bug is related to it.

Hi Atin,

The problem you mentioned is caused by a race in rebalance.

State 1: Brick-1 holds File-1.
State 2: After add-brick, the volume has Brick-1 and Brick-2; File-1 (cached) is still on Brick-1.
State 3: Let's assume the rebalance process on Brick-2 runs first. It does the lookup on the hashed subvolume, which is Brick-2, and because the file is not present on Brick-2, dht_lookup_everywhere gets called.

If, just before that lookup fop is sent over the wire, the rebalance process on Brick-1 runs, migrates the file, and sets the permissions from source to sink, then at that point in time the system has two data files and zero linkto files. When the lookup from the Brick-2 rebalance proceeds, it sees two files and the following is logged:

[dht-common.c:1099:dht_lookup_everywhere_cbk] 0-rebalvol-dht: multiple subvolumes (rebalvol-client-0 and rebalvol-client-2) have file /symlinks/23 (preferably rename the file in the backend, and do a fresh lookup)

<< The mknod will fail because the file has already been migrated by the Brick-1 rebalance >>
[2014-06-22 22:50:26.332788] W [client-rpc-fops.c:240:client3_3_mknod_cbk] 0-rebalvol-client-2: remote operation failed: File exists. Path: /symlinks/23

<< As mentioned, a non-linkfile will be present >>
[2014-06-22 22:50:26.334622] W [dht-linkfile.c:49:dht_linkfile_lookup_cbk] 0-rebalvol-dht: got non-linkfile rebalvol-client-2:/symlinks/23, gfid = 5dc75005-5a88-4097-a88a-3b4dc82ef831

<< Just before the getxattr, the rebalance might have finished and the hashed file might have been removed, so this fails >>
[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

EDIT: Instead of the hashed file, it is the cached file initially present on Brick-1, which got migrated to Brick-2, that is removed.

<< Just before the getxattr, the rebalance might have finished and the "CACHED" file might have been removed from Brick-1, so the getxattr for node-uuid from Brick-1 fails >>
[2014-06-22 22:50:26.335949] E [dht-common.c:2037:dht_vgetxattr_cbk] 0-rebalvol-dht: vgetxattr: Subvolume rebalvol-client-0 returned -1 (Stale file handle)

Reproduced the issue again on build glusterfs-server-3.6.0.22-1. Attaching logs.

Created attachment 913381 [details]
Rebalance logs from node with failure
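The interleaving described above can be summarised as a toy, single-process model. The struct and flags below are purely illustrative and are not GlusterFS code; they only encode the order of events that leaves two data files and no linkto file, so the second rebalance's mknod fails and its later getxattr hits a file that is already gone.

```c
/* race_timeline.c - toy model of the two-rebalance interleaving, not dht code. */
#include <stdbool.h>
#include <stdio.h>

struct brick {
        bool has_data_file;   /* real file contents            */
        bool has_linkto_file; /* dht linkto (pointer) file     */
};

int main(void)
{
        struct brick brick1 = { .has_data_file = true }; /* State 1: File-1 cached here */
        struct brick brick2 = { 0 };                     /* State 2: newly added brick  */

        /* Rebalance on brick-2: lookup on the hashed subvol (brick-2) misses,
         * so it decides to do a lookup everywhere. */
        bool lookup_everywhere_pending = !brick2.has_data_file;

        /* Before that lookup goes out, rebalance on brick-1 migrates the file;
         * for a moment both copies exist and no linkto file is present. */
        brick2.has_data_file = true;

        if (lookup_everywhere_pending && brick1.has_data_file && brick2.has_data_file)
                printf("lookup_everywhere: multiple subvolumes have the file\n");

        /* Rebalance-2 now tries to create a linkto file on brick-2,
         * but a real file already sits there -> "File exists" in the log. */
        if (brick2.has_data_file)
                printf("mknod linkto on brick-2: File exists\n");

        /* Rebalance-1 finishes and removes the old cached copy on brick-1,
         * so rebalance-2's getxattr against brick-1 returns ESTALE. */
        brick1.has_data_file = false;
        if (!brick1.has_data_file)
                printf("getxattr node-uuid on brick-1: Stale file handle\n");

        printf("result: this file is counted as a migration failure\n");
        return 0;
}
```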
Venkatesh,

A couple of questions:

1. Why is the server returning ESTALE instead of ENOENT? If it is an ENOENT, can't we send another lookup to the hashed subvol, to account for the fact that the file might already have been migrated, and treat this as a non-failure case?
2. How are we handling races between rebalance and an unlink from the I/O path? Are we also treating the file not existing because of an unlink on the I/O path as a failure?

regards,
Raghavendra

Since this issue is getting reproduced in almost every BVT, couldn't we fix this race asap? It was reproduced again in today's BVT run: https://beaker.engineering.redhat.com/jobs/682433

We should mark it as a blocker, as we are able to reproduce it almost every time on build glusterfs-server-3.6.0.22-1, which was not the case in RHS 2.1.

The race gets fixed with patch: http://review.gluster.org/#/c/8231/
Race explanation: https://bugzilla.redhat.com/show_bug.cgi?id=1110694#c4
Similar race: in comment number 10. Patch: http://review.gluster.org/#/c/8231/

Not seeing this issue on glusterfs-3.6.0.25-1 and glusterfs-3.6.0.27-1. Marking it verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html
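For readers who do not want to open the review link, below is a minimal sketch of the kind of guard the fix introduces (skip deleting a linkto file that is in use or already migrated). The struct fields and helper name are hypothetical; this is not the actual dht xlator change from http://review.gluster.org/#/c/8231/.

```c
/* stale_linkto_guard.c - illustrative guard, not the real dht xlator code. */
#include <stdbool.h>
#include <stdio.h>

struct file_state {
        bool is_linkto;      /* file is a dht linkto (pointer) file          */
        bool fd_open;        /* some process still holds an open fd on it    */
        bool migration_done; /* data has already been moved to the target    */
};

/* Decide whether a linkto file found during lookup-everywhere may be deleted
 * as "stale". Without the extra checks, a linkto file created by another
 * rebalance could be removed; with them, in-use or already-migrated files
 * are left alone. */
static bool may_delete_stale_linkto(const struct file_state *st)
{
        if (!st->is_linkto)
                return false; /* never delete a real data file              */
        if (st->fd_open)
                return false; /* another rebalance/client is still using it */
        if (st->migration_done)
                return false; /* migration already completed                */
        return true;          /* genuinely stale, safe to clean up          */
}

int main(void)
{
        struct file_state in_use = { .is_linkto = true, .fd_open = true };
        struct file_state stale  = { .is_linkto = true };

        printf("linkto with open fd: %s\n",
               may_delete_stale_linkto(&in_use) ? "delete" : "keep");
        printf("stale linkto:        %s\n",
               may_delete_stale_linkto(&stale) ? "delete" : "keep");
        return 0;
}
```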