Bug 963896
Summary: | DHT - remove-brick - data loss in remove-brick because in DHT 'remove-brick start' makes hash - layout 0000000000000000 for some other brick, no migration and data written after start operation also goes to that brick so on commit it ends in data loss | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rachana Patel <racpatel> | |
Component: | glusterfs | Assignee: | shishir gowda <sgowda> | |
Status: | CLOSED ERRATA | QA Contact: | amainkar | |
Severity: | urgent | Docs Contact: | ||
Priority: | urgent | |||
Version: | 2.1 | CC: | aavati, amarts, nsathyan, rcyriac, rhs-bugs, vbellur | |
Target Milestone: | --- | Keywords: | Regression | |
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.4.0.10rhs | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 966845 | Environment: | ||
Last Closed: | 2013-09-23 22:29:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 923555, 961632, 966845, 996474 |
Description
Rachana Patel
2013-05-16 18:16:03 UTC
[2013-05-19 11:12:40.218890] C [dht-selfheal.c:559:dht_get_layout_count] 0-shishir: brick2: sng1-client-2 <=== subvolume being decommissioned
[2013-05-19 11:12:40.219014] C [dht-selfheal.c:781:dht_selfheal_layout_new_directory] 0-shishir: gave fix: 0 - 1431655764 on sng1-client-0 for /
[2013-05-19 11:12:40.219051] C [dht-selfheal.c:781:dht_selfheal_layout_new_directory] 0-shishir: gave fix: 1431655765 - 2863311529 on sng1-client-1 for /
[2013-05-19 11:12:40.219075] C [dht-selfheal.c:781:dht_selfheal_layout_new_directory] 0-shishir: gave fix: 2863311530 - 4294967294 on sng1-client-3 for / <=== no layout given for subvolume sng1-client-2 (this is the correct op)
[2013-05-19 11:12:40.219099] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 0 - 1431655764 on sng1-client-0 for /
[2013-05-19 11:12:40.219122] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 0 - 0 on sng1-client-1 for / <=== layout zeroed out for sng1-client-1 (incorrect)
[2013-05-19 11:12:40.219145] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 1431655765 - 2863311529 on sng1-client-2 for / <=== overlap op gives layout for subvolume sng1-client-2 (incorrect)
[2013-05-19 11:12:40.219168] C [dht-selfheal.c:736:dht_fix_layout_of_directory] 0-shishir: after overlapt: 2863311530 - 4294967295 on sng1-client-3 for /
[2013-05-19 11:12:40.219201] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 0 - 1431655764 (type 0) on subvolume sng1-client-0 for /
[2013-05-19 11:12:40.219544] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 0 - 0 (type 0) on subvolume sng1-client-1 for /
[2013-05-19 11:12:40.219677] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 1431655765 - 2863311529 (type 0) on subvolume sng1-client-2 for /
[2013-05-19 11:12:40.219996] C [dht-selfheal.c:170:dht_selfheal_dir_xattr_persubvol] 0-shishir: setting hash range 2863311530 - 4294967295 (type 0) on subvolume sng1-client-3 for / <=== layouts written to the disk

dht_selfheal_layout_maximize_overlap, called in dht_fix_layout_of_directory, overwrites the layouts for optimization without considering decommissioned nodes. As a result, the wrong subvolume gets the zeroed-out range: the active subvolume sng1-client-1 ends up with 0 - 0, while the subvolume being decommissioned, sng1-client-2, keeps a live range.

Suspect this is a regression caused by:

commit 4f87fd0ae2ce629576ca5f647a99888d31a46815
Author: Anand Avati <avati>
Date: Thu Aug 30 13:15:39 2012 -0700

dht: improve dht_fix_layout_of_directory for better re-assignment
.....
Change-Id: I0cbbf3bfa334645728072d66aaaa80120d0b295f
BUG: 853258
Signed-off-by: Anand Avati <avati>
Reviewed-on: http://review.gluster.org/3883
Tested-by: Gluster Build System <jenkins.com>

Verified on 3.4.0.9rhs-1.el6.x86_64.
Working as expected, hence moving it to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1262.html
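
The layout problem in the log above can be checked mechanically. The following is a minimal Python sketch, not GlusterFS code: `even_split` and `check_layout` are hypothetical helper names, and it assumes that a `0 - 0` range means "no hash range assigned", as the bug summary implies. It flags an active subvolume with a zeroed range, a decommissioned subvolume that still holds a range, and any gap in coverage of the 32-bit hash space.

```python
HASH_MAX = 0xFFFFFFFF  # the ranges in the log span a 32-bit hash space


def even_split(subvols):
    """Divide the hash space evenly; the last subvolume absorbs the remainder."""
    chunk = (HASH_MAX + 1) // len(subvols)
    layout, start = {}, 0
    for i, name in enumerate(subvols):
        stop = HASH_MAX if i == len(subvols) - 1 else start + chunk - 1
        layout[name] = (start, stop)
        start = stop + 1
    return layout


def check_layout(layout, decommissioned):
    """Return a list of human-readable problems found in a directory layout."""
    problems, active = [], []
    for name, (start, stop) in layout.items():
        empty = start == 0 and stop == 0  # assumption: 0 - 0 means "no range"
        if name in decommissioned:
            if not empty:
                problems.append(f"{name}: decommissioned but holds {start} - {stop}")
        elif empty:
            problems.append(f"{name}: active but range is zeroed out")
        else:
            active.append((start, stop))
    # Active ranges must tile 0..HASH_MAX with no gaps or overlaps.
    expected = 0
    for start, stop in sorted(active):
        if start != expected:
            problems.append(f"gap or overlap at hash {expected}")
        expected = stop + 1
    if expected != HASH_MAX + 1:
        problems.append("active ranges do not reach the end of the hash space")
    return problems


# Ranges as written to disk in the log ("setting hash range ...").
BUGGY = {
    "sng1-client-0": (0, 1431655764),
    "sng1-client-1": (0, 0),
    "sng1-client-2": (1431655765, 2863311529),
    "sng1-client-3": (2863311530, 4294967295),
}
DECOMMISSIONED = {"sng1-client-2"}

# Layout one would expect instead: only active subvolumes get ranges.
EXPECTED = even_split(["sng1-client-0", "sng1-client-1", "sng1-client-3"])
EXPECTED["sng1-client-2"] = (0, 0)

if __name__ == "__main__":
    for problem in check_layout(BUGGY, DECOMMISSIONED):
        print(problem)
```

Run against the on-disk ranges from the log, the check reports three problems; run against the expected layout, it reports none. The gap it finds at hash 1431655765 is the range that new files keep hashing to on the brick being removed, which is why data written after `remove-brick start` is lost on commit.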