Bug 1344826
Summary: | [geo-rep]: Worker crashed with "KeyError: " | |||
---|---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> | |
Component: | geo-replication | Assignee: | Aravinda VK <avishwan> | |
Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | rhgs-3.1 | CC: | amukherj, asrivast, avishwan, csaba, khiremat, mnapolis, olim, pdhange, rabhat, rcyriac, rhinduja | |
Target Milestone: | --- | |||
Target Release: | RHGS 3.2.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.8.4-1 | Doc Type: | Bug Fix | |
Doc Text: |
When an rsync operation is retried, the geo-replication process attempted to clean up GFIDs from the rsync queue that were already unlinked during the previous sync attempt. This resulted in a KeyError. The geo-replication process now checks for the existence of a GFID before attempting to unlink a file and remove it from the rsync queue, preventing this failure.
|
Story Points: | --- | |
Clone Of: | ||||
: | 1345744 (view as bug list) | Environment: | ||
Last Closed: | 2017-03-23 05:35:53 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1345744, 1348085, 1348086, 1351515, 1351530 |
Description
Rahul Hinduja
2016-06-11 09:54:03 UTC
Upstream patch posted: http://review.gluster.org/#/c/14706/

Oonkwee Lim (comment #8):

Hello Aravinda,

The customer is still saying that the files are not renamed. From them:

"It looks like whatever rename process should have taken place, did not. The files are still in the limbo state. What are some next steps I can take? If I mount the slave bricks read-write and rename the files to match the master, will I create an inconsistent state that cannot be recovered from?"

Thanks & Regards,
Oonkwee
Emerging Technologies
Red Hat Global Support

(In reply to Oonkwee Lim_ from comment #8)

It looks like the files in the limbo state are due to earlier errors (before the upgrade). The safe workaround is:

- Delete the problematic file on the slave.
- Trigger a resync for the file using a virtual setxattr on the master mount:

      cd $MASTER_MOUNT/
      setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path-in-master-mount>

The virtual setxattr (glusterfs.geo-rep.trigger-sync) acts like a touch command that geo-replication understands. It should be set on each file or directory that needs a resync. If the problematic files are not deleted from the slave volume, resyncing may run into errors in either case.

Oonkwee Lim (comment #13):

Post glusterfs.geo-rep.trigger-sync update: the geo-rep status since performing this operation has been in a Crawl Status of 'History Crawl', and I can see that LAST_SYNCED is advancing, albeit at a snail's pace. Is there any way to gauge where in the process it might be?
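The per-file workaround above could be scripted along the following lines. This is only a sketch: the mount points and the file list are hypothetical placeholders, and DRY_RUN=1 (the default here) only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the geo-rep resync workaround. Placeholders, not real mounts.
MASTER_MOUNT=${MASTER_MOUNT:-/mnt/master}
SLAVE_MOUNT=${SLAVE_MOUNT:-/mnt/slave}
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command under dry run; execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# For each file stuck in the "limbo" state (hypothetical list):
for f in dir1/file-a dir2/file-b; do
    # 1. Delete the problematic copy on the slave.
    run rm -f "$SLAVE_MOUNT/$f"
    # 2. Trigger a resync via the virtual setxattr on the master mount.
    run setfattr -n glusterfs.geo-rep.trigger-sync -v "1" "$MASTER_MOUNT/$f"
done
```

Run with DRY_RUN=0 and real mount points only after verifying the file list, since deleting the wrong file on the slave is not recoverable without a resync.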
(In reply to Oonkwee Lim_ from comment #13)

History Crawl processes historical changelogs until it reaches the worker start time (the worker register time can be found in the respective worker's log). Once it crosses the register time, it starts consuming live changelogs. We do not have a way to estimate the pending sync time, since geo-rep has to reprocess all the changelogs up to the current time.

Upstream mainline: http://review.gluster.org/14706
Upstream 3.8: http://review.gluster.org/14767

The fix is available in rhgs-3.2.0 as part of the rebase to GlusterFS 3.8.4.

*** Bug 1400765 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html
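The fix described in the Doc Text — checking for a GFID's existence before unlinking the file and removing it from the rsync queue — can be sketched roughly as below. The names (`cleanup_gfids`, `rsync_queue`) are illustrative placeholders, not the actual gsyncd code.

```python
import errno
import os

def cleanup_gfids(rsync_queue, entries):
    """Remove synced entries from the rsync queue, tolerating files and
    GFIDs that were already unlinked/dequeued by a previous sync attempt.

    rsync_queue: dict mapping GFID -> path still pending sync
    entries:     iterable of (gfid, path) pairs to clean up
    """
    for gfid, path in entries:
        # Unlink only if the file still exists: on a retried rsync the
        # previous attempt may already have removed it.
        try:
            os.unlink(path)
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise
        # pop() instead of del: deleting a GFID that was already dequeued
        # is what raised the KeyError and crashed the worker in this bug.
        rsync_queue.pop(gfid, None)
```

The design point is simply that both removals are made idempotent, so replaying the cleanup on a retry is harmless.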