Bug 1240333
| Field | Value |
|---|---|
| Summary: | [geo-rep]: original directory and renamed directory both at the slave after rename on master |
| Product: | [Red Hat Storage] Red Hat Gluster Storage |
| Component: | geo-replication |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Version: | rhgs-3.1 |
| Target Milestone: | --- |
| Target Release: | RHGS 3.2.0 |
| Hardware: | x86_64 |
| OS: | Linux |
| Whiteboard: | |
| Fixed In Version: | glusterfs-3.8.4-2 |
| Doc Type: | If docs needed, set a value |
| Doc Text: | Concurrent rename and lookup operations on a directory caused both old and new directories to be healed. At the end of the heal operation, both directories existed and had the same GFID. This meant that clients were sometimes unable to access the contents of the directory. The distributed hash table algorithm has been updated so that this issue no longer occurs. |
| Reporter: | Rahul Hinduja <rhinduja> |
| Assignee: | Kotresh HR <khiremat> |
| QA Contact: | Rahul Hinduja <rhinduja> |
| Docs Contact: | |
| CC: | amukherj, asriram, asrivast, avishwan, chrisw, csaba, khiremat, nbalacha, nlevinki, rgowdapp, sarumuga, smohan |
| Keywords: | ZStream |
| Flags: | nbalacha: needinfo- |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2017-03-23 05:21:59 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | 1118762 |
| Bug Blocks: | 1216951, 1351522, 1351530 |
**Description** (Rahul Hinduja, 2015-07-06 15:22:30 UTC)
RCA: Geo-rep rename of a directory just does an os.rename and nothing else. Since we see the RENAME entry in `.processed` on both master active bricks, it is clear that os.rename was issued for temp1. From the logs below, we see that a DHT self-heal happened on the directory 'temp1', so it appears 'temp1' got recreated as part of the DHT self-heal. I discussed this with Du (rgowdapp) and it seems this is a known issue.

LOGS:

```
[root@georep1 ssh%3A%2F%2Fgeoaccount%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave2]# find . | xargs grep RENAME 2>/dev/null | grep temp
./c19b89ac45352ab8c894d210d136dd56/.processing/CHANGELOG.1436013576:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2
./764586b145d7206a154a778f64bd2f50/.processed/CHANGELOG.1436013576:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2

[root@georep2 ssh%3A%2F%2Fgeoaccount%4010.70.46.101%3Agluster%3A%2F%2F127.0.0.1%3Aslave2]# find . | xargs grep RENAME 2>/dev/null | grep temp
./c19b89ac45352ab8c894d210d136dd56/.processed/CHANGELOG.1436013567:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2
./764586b145d7206a154a778f64bd2f50/.processing/CHANGELOG.1436013567:E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME 00000000-0000-0000-0000-000000000001%2Ftemp1 00000000-0000-0000-0000-000000000001%2Ftemp2
```

And the DHT logs say this:

```
[root@georep3 geo-replication-slaves]# grep temp1 *
d01c04e8-09af-4c02-920f-f4bd60433a9e:gluster%3A%2F%2F127.0.0.1%3Aslave2.gluster.log:[2015-07-04 12:39:29.566451] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-slave2-dht: renaming /.gfid/00000000-0000-0000-0000-000000000001/temp1 (hash=slave2-replicate-0/cache=slave2-replicate-1) => /.gfid/00000000-0000-0000-0000-000000000001/temp2 (hash=slave2-replicate-1/cache=<nul>)
grep: mbr: Is a directory

[root@georep4 geo-replication-slaves]# grep temp1 *
d01c04e8-09af-4c02-920f-f4bd60433a9e:gluster%3A%2F%2F127.0.0.1%3Aslave2.gluster.log:[2015-07-04 12:39:23.566466] I [MSGID: 109036] [dht-common.c:7106:dht_log_new_layout_for_dir_selfheal] 0-slave2-dht: Setting layout of /temp1 with [Subvol_name: slave2-replicate-0, Err: -1 , Start: 2147475392 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: slave2-replicate-1, Err: -1 , Start: 0 , Stop: 2147475391 , Hash: 1 ],
d01c04e8-09af-4c02-920f-f4bd60433a9e:gluster%3A%2F%2F127.0.0.1%3Aslave2.gluster.log:[2015-07-04 12:39:30.284235] I [MSGID: 109036] [dht-common.c:7106:dht_log_new_layout_for_dir_selfheal] 0-slave2-dht: Setting layout of /temp1 with [Subvol_name: slave2-replicate-0, Err: -1 , Start: 2147475392 , Stop: 4294967295 , Hash: 1 ], [Subvol_name: slave2-replicate-1, Err: -1 , Start: 0 , Stop: 2147475391 , Hash: 1 ],
```

Du, please provide the existing DHT bugs on the rename self-heal race and more info on this.

Doc text is edited. Please sign off to be included in Known Issues.
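As a rough illustration of the RCA above, the sketch below decodes a changelog RENAME entry of the form seen in the logs (`E <gfid> RENAME <pargfid>%2Foldname <pargfid>%2Fnewname`) and replays it as a bare os.rename, which is all the geo-rep worker does per the RCA. This is a minimal sketch, not the actual gsyncd code; the helper names `parse_rename_entry` and `replay_rename` are hypothetical.

```python
import os
import urllib.parse

def parse_rename_entry(line):
    """Decode a changelog RENAME entry (hypothetical helper, not gsyncd).

    Format as seen in the logs above:
        E <gfid> RENAME <pargfid>%2Foldname <pargfid>%2Fnewname
    """
    op_type, gfid, op, old_enc, new_enc = line.split()
    assert op_type == "E" and op == "RENAME"
    # %2F decodes to '/', separating the parent GFID from the basename.
    old_pargfid, old_name = urllib.parse.unquote(old_enc).split("/", 1)
    new_pargfid, new_name = urllib.parse.unquote(new_enc).split("/", 1)
    return gfid, (old_pargfid, old_name), (new_pargfid, new_name)

def replay_rename(mount, old_name, new_name):
    """Replay the entry as the RCA describes: just os.rename, nothing else.

    There is no follow-up check that the old path stays gone, so if a
    racing DHT self-heal on the slave recreates old_name, both the old
    and the renamed directory end up existing there.
    """
    os.rename(os.path.join(mount, old_name), os.path.join(mount, new_name))

entry = ("E 66f8112d-1857-4413-a9d2-bd6a7416162d RENAME "
         "00000000-0000-0000-0000-000000000001%2Ftemp1 "
         "00000000-0000-0000-0000-000000000001%2Ftemp2")
gfid, (_, old_name), (_, new_name) = parse_rename_entry(entry)
print(old_name, "->", new_name)  # temp1 -> temp2
```

The point of the sketch is the second helper: replaying RENAME as a single rename syscall is correct in isolation, but offers no protection against the slave-side self-heal recreating `temp1` concurrently, which is exactly the end state reported in this bug.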
This issue is similar to the one tracked by bz 1118762, hence marking this as a dependent bug. Also please note that bz 1118762 is fixed in RHGS 3.1.3, and hence I think the current bz is also fixed.

Carried out the same scenario on build:

```
glusterfs-server-3.8.4-2.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-2.el7rhgs.x86_64
```

Didn't hit the issue even after trying the scenario multiple times.

As mentioned in comment 9, this seems to be fixed with the latest downstream 3.2.0 build. Moving this BZ to On_QA.

Based on comments 9 and 10, moving this bug to the verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html