Bug 1401402 - [GSS] - DHT hash layout corrupt
Summary: [GSS] - DHT hash layout corrupt
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Susant Kumar Palai
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-05 07:22 UTC by Bipin Kunal
Modified: 2020-04-15 14:56 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-07 16:19:21 UTC
Target Upstream Version:



Description Bipin Kunal 2016-12-05 07:22:53 UTC
Description of problem: We have hit corruption in the DHT layout:
  * The DHT range start was found to be greater than the range end.
  * The DHT hash ranges are not evenly distributed across all the bricks.

This was observed after a rebalance failed when the bricks became 100% full.
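Both symptoms can be checked mechanically for a given directory. The following Python sketch is not part of GlusterFS and its function names are hypothetical. It assumes that the `trusted.glusterfs.dht` xattr, as printed by `getfattr -n trusted.glusterfs.dht -e hex <brick-dir>`, holds four big-endian 32-bit words with the range start and stop in the last two, and it treats a 0-0 range as "no range assigned to this brick".

```python
HASH_MAX = 0xFFFFFFFF  # DHT hash values are 32-bit


def decode_dht_xattr(hex_value):
    """Return (start, stop) from a trusted.glusterfs.dht value in hex.

    Assumption: 16 bytes = four big-endian 32-bit words, with the range
    start and stop in the last two words.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 16:
        raise ValueError("expected a 16-byte layout xattr")
    return int.from_bytes(raw[8:12], "big"), int.from_bytes(raw[12:16], "big")


def check_layout(ranges):
    """Check one directory's layout, given {brick: (start, stop)}.

    Returns a list of problems; an empty list means every range is well
    formed and together they cover 0..HASH_MAX exactly once.
    """
    problems, assigned = [], []
    for brick, (start, stop) in ranges.items():
        if start > stop:
            problems.append(f"{brick}: start 0x{start:08x} > stop 0x{stop:08x}")
        elif (start, stop) != (0, 0):  # 0-0 treated as "no range" (assumption)
            assigned.append((start, stop, brick))

    expected = 0  # next hash value that should be covered
    for start, stop, brick in sorted(assigned):
        if start > expected:
            problems.append(f"hole 0x{expected:08x}-0x{start - 1:08x} before {brick}")
        elif start < expected:
            problems.append(f"{brick}: overlaps previous range at 0x{start:08x}")
        expected = max(expected, stop + 1)
    if expected <= HASH_MAX:
        problems.append(f"hole 0x{expected:08x}-0x{HASH_MAX:08x} at end")
    return problems
```

For example, two bricks split at 0x7fffffff pass with an empty list, while a range such as (0x90000000, 0x10000000) is reported as start > stop.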


Version-Release number of selected component (if applicable):
3.7.1-11.el6rhs.x86_64


Additional info:

The bricks probably filled to 100% because rebalance was run with the "force" option.
Rebalance was executed after adding 4 new nodes with 5 bricks each, i.e. a total of 20 new bricks.

Current volume size: 16 nodes, 5 bricks each, i.e. a total of 80 bricks
Volume type: Distribute

Comment 8 Susant Kumar Palai 2016-12-07 09:28:20 UTC
Bipin,
  Can you provide the brick sizes from all the nodes?

Comment 10 Susant Kumar Palai 2016-12-07 10:48:17 UTC
RCA:

There was a bug in the weighted-rebalance option in version 3.7.1-11 where the sum of the sizes of all the bricks was stored in an unsigned 32-bit integer (uint32_t). For big clusters with large bricks, like this one where each brick is 55TB (a total of 55TB * 80 = 4,400TB, about 4.4PB), the value overflows, causing an incorrect chunk computation and giving rise to an overflowing layout every few bricks (hash ranges that wrap around the 32-bit hash space, so that start > end).
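The overflow can be reproduced with a small model. This Python sketch only approximates the weighted layout logic; it is not the GlusterFS source. It emulates C `uint32_t` wrap-around with `% 2**32` and assumes brick sizes are summed in MiB (treating 55TB as 55 TiB). The name `weighted_layout` and the rule that the last brick absorbs the rounding remainder are hypothetical. Under these assumptions the 80 bricks sum to 4,613,734,400 MiB, which exceeds UINT32_MAX (4,294,967,295). The stored total therefore wraps to 318,767,104, every brick is handed a range about 14.5 times too large, and the running start offset wraps past 0xFFFFFFFF roughly every 5.5 bricks.

```python
U32 = 1 << 32          # uint32_t wraps modulo 2**32
HASH_MAX = 0xFFFFFFFF  # top of the 32-bit DHT hash space


def u32(x):
    """Wrap an integer the way a C uint32_t would."""
    return x % U32


def weighted_layout(brick_sizes, total_in_u32):
    """Split the hash space across bricks in proportion to brick size.

    brick_sizes:  per-brick sizes (MiB in the example below, an assumption).
    total_in_u32: True mimics the bug by summing sizes in a wrapping uint32_t.
    Returns one (start, stop) pair per brick, both kept as uint32_t values.
    """
    total = 0
    for size in brick_sizes:
        total = u32(total + size) if total_in_u32 else total + size
    chunk = HASH_MAX / total  # hash values handed out per unit of size

    layout, start = [], 0
    for i, size in enumerate(brick_sizes):
        if i == len(brick_sizes) - 1:
            stop = HASH_MAX  # in this sketch the last brick takes the remainder
        else:
            stop = u32(start + int(chunk * size) - 1)
        layout.append((start, stop))
        start = u32(stop + 1)
    return layout


# 80 bricks of 55 TiB each, expressed in MiB.
sizes = [55 * 1024 * 1024] * 80
print(u32(sum(sizes)))  # 318767104: 4,613,734,400 wrapped past 2**32
good = weighted_layout(sizes, total_in_u32=False)
bad = weighted_layout(sizes, total_in_u32=True)
print("bricks with start > stop:", [i for i, (s, e) in enumerate(bad) if s > e])
```

With the wrapped total, brick 5 (counting from 0) already gets a start greater than its stop, which matches the "start range bigger than end range" symptom in the description; with a 64-bit total the ranges are contiguous and well formed.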

We hit a similar bug before: https://bugzilla.redhat.com/show_bug.cgi?id=1281946

This was fixed in RHGS 3.1.2.
Patch: https://code.engineering.redhat.com/gerrit/#/c/64630/

As a workaround, the customer can turn weighted-rebalance off and remount all the clients, or upgrade to 3.1.2.

Comment 13 Bipin Kunal 2016-12-07 16:19:21 UTC
For newly created directories, we saw that the DHT layout sometimes got proper hash ranges, with the gs9 bricks getting no hash range since they are almost 100% full, but we did see hash layout corruption for a few of them.

I am not sure under what condition DHT stops assigning a hash range to new directories. It might be min-free-disk, but I am not sure.


For now, we applied the workaround of turning weighted-rebalance off.

So far, running with weighted-rebalance off is working fine. We did lookups from new mounts (all the old mounts were unmounted) and saw the layout getting rectified.

We started a recursive lookup on all the directories in order to fix the layout everywhere.
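A recursive lookup like this can be scripted. The sketch below is a hedged example, assuming it runs on a client mount made after weighted-rebalance was turned off; the default path /mnt/glustervol and the function name are hypothetical. It relies only on the observation above that a lookup from a fresh mount lets DHT rectify a directory's layout.

```python
import os
import sys


def lookup_all_dirs(mount_root):
    """Stat every directory under mount_root and return how many were visited.

    On a GlusterFS FUSE client, resolving and stat()ing a directory normally
    sends a LOOKUP to the volume, which gives DHT the chance to fix that
    directory's layout.
    """
    visited = 0
    # onerror=print reports unreadable directories instead of silently skipping them
    for dirpath, _dirnames, _filenames in os.walk(mount_root, onerror=print):
        os.stat(dirpath)
        visited += 1
    return visited


if __name__ == "__main__":
    # Hypothetical default; pass the freshly made client mount point as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/glustervol"
    print(lookup_all_dirs(root), "directories looked up")
```

os.walk does not follow directory symlinks by default, so each real directory is visited once.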

I will close this bug as the fix is already available in newer releases.

Comment 14 Atin Mukherjee 2016-12-07 16:35:57 UTC
This should be marked as closed, current release.

