Bug 1365694
| Summary: | [GSS] Geo-Replication session faulty after running out of space on /var partition. | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Cal Calhoun <ccalhoun> |
| Component: | geo-replication | Assignee: | Bug Updates Notification Mailing List <rhs-bugs> |
| Status: | CLOSED UPSTREAM | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | | |
| Version: | rhgs-3.1 | CC: | atumball, csaba, rhs-bugs, storage-qa-internal, vnosov |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-10-11 10:30:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1472361 | | |
Description
Cal Calhoun
2016-08-09 22:55:17 UTC
Checked the log files. Observations:

gsyncd.conf file corruption: when the /var partition is full, glusterd corrupts gsyncd.conf (the geo-rep session conf) when it restarts and regenerates the file. As a workaround, the conf file was copied from a good peer node to the other nodes. To fix this properly, we should avoid regenerating the conf file on every glusterd start and handle write failures (a hedged sketch of an atomic-write approach is appended at the end of this comment).

Python traceback causing the geo-rep status to go Faulty: we found the following traceback in the slave log file:

    Traceback (most recent call last):
      File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
        res = getattr(self.obj, rmeth)(*in_data[2:])
      File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 772, in entry_ops
        os.unlink(entry)
    OSError: [Errno 21] Is a directory: '.gfid/12711ebf-7fdc-4f4b-9850-2d75581eb452/New folder'

During a rename, if the source and target have the same inode, geo-rep deletes the source, since no rename is required. This code path does not handle directories, so deleting a source directory fails with the error above. We will work on this fix (see the directory-aware removal sketch appended below).

As a workaround, on all brick backends of the slave, run:

    ls ".glusterfs/12/71/12711ebf-7fdc-4f4b-9850-2d75581eb452/New folder"

If it is empty, clean up/delete "New folder" on the slave so that geo-rep can continue past this error. If it is not empty, back up the files and then delete "New folder"; we can trigger a sync for any files from that directory afterwards.

Geo-replication support has been added to the Glusterd2 project, which will be available with the Gluster upstream 4.0 and 4.1 releases. Most of the issues are already fixed under https://github.com/gluster/glusterd2/issues/271 and the remaining fixes are tracked in https://github.com/gluster/glusterd2/issues/557. We can close these issues since we are not planning any fixes for the 3.x series.

I see the issues are fixed upstream right now, and the customer case is closed too. When GD2 comes to the product, this gets closed automatically. Please re-open the issue if the ask is to get this into GD1 itself, which means we would have to rescope the effort and see what can be done.
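To illustrate the conf-regeneration point above, here is a minimal sketch of writing the session conf atomically, assuming a write-then-rename scheme; this is not the actual glusterd code, and the function name and temp-file prefix are illustrative. The point is that running out of space on /var fails on the temporary file and leaves the existing gsyncd.conf untouched instead of truncating it.

```python
import os
import tempfile


def write_conf_atomically(path, contents):
    """Write `contents` to `path` so readers never see a partial file.

    If the filesystem is full, the write/fsync on the temporary file fails
    and the existing conf file is left untouched.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path),
                                    prefix=".gsyncd.conf.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
            f.flush()
            os.fsync(f.fileno())   # surfaces ENOSPC here, before the old file is touched
        os.rename(tmp_path, path)  # atomic on POSIX: readers see either old or new file
    except OSError:
        os.unlink(tmp_path)
        raise
```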
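For the unlink traceback, this is a hedged sketch of the kind of directory-aware removal the entry handling would need; the helper is illustrative, not the actual resource.py code. os.unlink() on a directory raises EISDIR (the error above), so directories have to go through os.rmdir(), and a non-empty directory is better skipped than faulting the whole session.

```python
import errno
import os
import stat


def remove_entry(entry):
    """Remove `entry`, using rmdir for directories instead of unlink."""
    if stat.S_ISDIR(os.lstat(entry).st_mode):
        try:
            os.rmdir(entry)
        except OSError as e:
            # A non-empty directory still has entries to sync; skip it
            # rather than marking the session Faulty.
            if e.errno != errno.ENOTEMPTY:
                raise
    else:
        os.unlink(entry)
```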
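The per-brick workaround check can also be expressed as a short script. This is only a sketch, assuming the slave brick root is known; the brick path below is a placeholder, and it is not a supported tool.

```python
import os

# Placeholder slave brick root; substitute each real brick path.
brick = "/bricks/slave-brick1"
gfid_dir = os.path.join(
    brick, ".glusterfs/12/71/12711ebf-7fdc-4f4b-9850-2d75581eb452/New folder")

if os.path.isdir(gfid_dir):
    entries = os.listdir(gfid_dir)
    if not entries:
        os.rmdir(gfid_dir)  # empty: safe to delete so geo-rep can continue
    else:
        # Not empty: back these files up first, then delete the directory
        # and trigger a resync for them.
        print("Back up before deleting:", entries)
```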