1730175 – Seeing failure due to "getxattr err for dir [No data available]" in rebalance

Summary: Seeing failure due to "getxattr err for dir [No data available]" in rebalance

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	distribute
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Susant Kumar Palai
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-07-16 05:08 UTC by Susant Kumar Palai
Modified:	2019-07-18 10:19 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-07-18 10:19:57 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Gluster.org Gerrit	23053	0	None	Merged	dht: log getxattr failure for node-uuid at "DEBUG"	2019-07-18 10:19:56 UTC

Description Susant Kumar Palai 2019-07-16 05:08:33 UTC

Description of problem:
While running rebalance on a volume with heterogeneous brick sizes, saw a failure in rebalance due to the following error: "[2019-07-05 09:36:55.653538] E [MSGID: 109039] [dht-common.c:4245:dht_find_local_subvol_cbk] 0-vol6-dht: getxattr err for dir [No data available]".
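
For readers unfamiliar with the log format, here is a minimal sketch that splits the quoted line into its fields. The field layout is an assumption inferred only from this one line, and the names `LOG_LINE` and `parse_log_line` are hypothetical. It is not a general GlusterFS log parser.

```python
import re

# Assumed layout, inferred from the single log line quoted in this report:
# [timestamp] LEVEL [MSGID: id] [file:line:function] domain: message
LOG_LINE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[MSGID:\s*(?P<msgid>\d+)\]\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+"
    r"(?P<message>.*)"
)


def parse_log_line(text):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(text)
    return match.groupdict() if match else None


sample = (
    "[2019-07-05 09:36:55.653538] E [MSGID: 109039] "
    "[dht-common.c:4245:dht_find_local_subvol_cbk] 0-vol6-dht: "
    "getxattr err for dir [No data available]"
)
fields = parse_log_line(sample)
print(fields["level"], fields["function"], fields["message"])
```

The single-letter level field is "E" here, which is why the message reads as an error even though it comes from a per-attempt callback.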

How reproducible:
1/1

Steps to Reproduce:
1. Create a 2-brick volume where brick1 is 20G and brick2 is 5G.
2. Fuse mount the volume on a client node.
3. Check the hash layout on the bricks.
4. Start running I/O from the mount point.
5. While the I/O is still in progress, disable the cluster.weighted-rebalance volume option.
6. Let the I/O continue to run and add a 10G brick to the volume.
7. Trigger rebalance on the volume.
8. Check hash layout on the back-end bricks.
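
The steps above can be sketched as a dry-run script that only prints the commands, one per step. The host name `server1`, the brick directories, and the mount point are hypothetical placeholders, and the volume name `vol6` is borrowed from the log line. Brick sizes come from the underlying filesystems, which are assumed to be prepared beforehand. Step 4 (the I/O workload) is not specified in the report and is left out.

```python
import shlex


def reproduction_commands(volume="vol6", host="server1", mount_point="/mnt/vol6"):
    """Return the CLI commands for the reproduction steps, in order."""
    # Hypothetical brick directories; each is assumed to sit on its own
    # filesystem of the stated size (gluster itself does not set brick size).
    brick1 = "/bricks/brick1/data"  # 20G
    brick2 = "/bricks/brick2/data"  # 5G
    brick3 = "/bricks/brick3/data"  # 10G, added in step 6

    def layout(*paths):
        # Dump the DHT layout xattr of each brick root (run on the server).
        return ["getfattr", "-n", "trusted.glusterfs.dht", "-e", "hex", *paths]

    return [
        ["gluster", "volume", "create", volume, f"{host}:{brick1}", f"{host}:{brick2}"],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{host}:/{volume}", mount_point],
        layout(brick1, brick2),
        # Step 4 (start I/O on the mount point) is workload-specific and omitted.
        ["gluster", "volume", "set", volume, "cluster.weighted-rebalance", "off"],
        ["gluster", "volume", "add-brick", volume, f"{host}:{brick3}"],
        ["gluster", "volume", "rebalance", volume, "start"],
        layout(brick1, brick2, brick3),
    ]


for command in reproduction_commands():
    print(shlex.join(command))
```

Printing rather than executing keeps the sketch safe to run anywhere. On a real test cluster the same argument lists could be passed to `subprocess.run`.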

Actual results:
Seeing failure in rebalance 

Expected results:
Rebalance should complete successfully.

Comment 1 Worker Ant 2019-07-16 05:14:57 UTC

REVIEW: https://review.gluster.org/23053 (dht: log getxattr failure for node-uuid at "DEBUG") posted (#1) for review on master by Susant Palai

Comment 2 Nithya Balachandran 2019-07-16 05:16:15 UTC

This is because of the mismatch in the xattr when it is a pure dist versus a dist-rep or dist-disperse. This should not prevent the rebalance from proceeding.

Comment 3 Susant Kumar Palai 2019-07-16 05:18:47 UTC

(In reply to Nithya Balachandran from comment #2)
> This is because of the mismatch in the xattr when it is a pure dist versus a
> dist-rep or dist-disperse. This should not prevent the rebalance from
> proceeding.

Correct. But since an error was logged in dht_find_local_subvol_cbk, it creates confusion about whether it is a real error. I just moved the log to DEBUG, as the parent function logs an error if both attempts fail.
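
The logging pattern described here can be illustrated with a small sketch. This is not the DHT code. It assumes the two attempts query two different xattr keys, and the key names and function names below are hypothetical stand-ins. The per-attempt callback logs at DEBUG, and the parent logs an error only when every attempt has failed.

```python
import errno
import logging

log = logging.getLogger("dht-sketch")

# Hypothetical xattr key names, for illustration only.
NEW_KEY = "list-node-uuids"
OLD_KEY = "node-uuid"


def find_local_subvol_cbk(key, op_errno):
    """Per-attempt callback: a failure here may be expected, so log at DEBUG."""
    if op_errno:
        name = errno.errorcode.get(op_errno, str(op_errno))
        log.debug("getxattr err for dir, key=%s [%s]", key, name)
        return False
    return True


def find_local_subvols(getxattr):
    """Parent: try each key in turn; log an ERROR only if all attempts fail."""
    for key in (NEW_KEY, OLD_KEY):
        if find_local_subvol_cbk(key, getxattr(key)):
            return key
    log.error("getxattr failed for every key; cannot find local subvolumes")
    return None


def fake_getxattr(key):
    # Stand-in for the real call: the first key is missing (ENODATA),
    # the second one succeeds (errno 0).
    return errno.ENODATA if key == NEW_KEY else 0


logging.basicConfig(level=logging.INFO)
print(find_local_subvols(fake_getxattr))
```

With the log level at INFO, the expected first-attempt failure stays out of the log and the fallback key is returned. An error appears only if both attempts fail.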

Comment 4 Nithya Balachandran 2019-07-16 05:23:59 UTC

(In reply to Susant Kumar Palai from comment #3)
> (In reply to Nithya Balachandran from comment #2)
> > This is because of the mismatch in the xattr when it is a pure dist versus a
> > dist-rep or dist-disperse. This should not prevent the rebalance from
> > proceeding.
> 
> Correct. But since there was an error logged in dht_find_local_subvol_cbk,
> it creates confusion that if it is a real error. Just moved the log to DEBUG
> as the parent function logs if both attempts failed.

Then the description is incorrect. 

"Actual results:
Seeing failure in rebalance 

Expected results:
Rebalance should complete successfully."



Rebalance will complete successfully. This message does not cause it to stop.

Comment 5 Worker Ant 2019-07-18 10:19:57 UTC

REVIEW: https://review.gluster.org/23053 (dht: log getxattr failure for node-uuid at "DEBUG") merged (#3) on master by N Balachandran
