Bug 1462508
Summary: [Stress]: Worker crashed with "[Errno 16] Device or resource busy"
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: geo-replication
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: low
Priority: unspecified
Reporter: Ambarish <asoman>
Assignee: Sunny Kumar <sunkumar>
QA Contact: Rahul Hinduja <rhinduja>
CC: amukherj, csaba, khiremat, puebele, rhinduja, rhs-bugs, sabose, sanandpa, storage-qa-internal, sunkumar
Keywords: ZStream
Target Milestone: ---
Target Release: ---
Doc Type: If docs needed, set a value
Story Points: ---
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2020-03-11 09:54:07 UTC
Description (Ambarish, 2017-06-18 08:57:47 UTC)
I found these ones to be unique.

On master (gqas013):

Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 780, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1566, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 570, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1111, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 994, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 935, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 16] Device or resource busy

And this on one of the slaves (gqas014):

[2017-06-18 08:55:59.804363] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 725, in entry_ops
    er = entry_purge(entry, gfid)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 618, in entry_purge
    if not matching_disk_gfid(gfid, entry):
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 650, in matching_disk_gfid
    disk_gfid = cls.gfid_mnt(entry)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 392, in gfid_mnt
    cls.GX_GFID_CANONICAL_LEN], [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 495, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 55, in lgetxattr
    return cls._query_xattr(path, siz, 'lgetxattr', attr)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 47, in _query_xattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 16] Device or resource busy

I see the following errors in the slave mount logs during the same time:

[2017-06-16 18:05:19.761267] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 1069: MKDIR() /.gfid/01dd8259-061e-4cf5-be56-84a6f9d4c8d4 => -1 (Operation not permitted)
[2017-06-16 18:05:19.784205] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 1096: MKDIR() /.gfid/b9f67339-5128-46ca-84a3-fb7d27748503 => -1 (Operation not permitted)

Not sure which xlator returned EBUSY; strangely, that is not logged. But this happens in rsync mode, which tries to create the entry when the entry has not been created. We can avoid this by taking in patch [1]:

1. Take in patch [1].
2. Also add logic to retry on EBUSY during lgetxattr in the above path.

[1] https://review.gluster.org/#/c/16010/

Aravinda, Rahul, what do you think?

*** Bug 1599175 has been marked as a duplicate of this bug. ***

The patch referenced in comment 4 is merged. Is there anything left to address on this bug?
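The second proposal above (retry on EBUSY during lgetxattr) could be sketched roughly as follows. This is only an illustration, not the merged fix: in the real gsyncd code, xattr calls already go through errno_wrap() in syncdutils.py, so the actual change would extend that helper's retryable-errno handling rather than add a new function. The helper name, retry count, and delay below are assumptions.

```python
import errno
import time


def retry_on_ebusy(call, args, retries=5, delay=0.5):
    """Invoke call(*args), retrying a few times on transient EBUSY.

    Hypothetical helper mirroring the gist of gsyncd's errno_wrap():
    retry on a transient errno, re-raise anything else or give up
    after the last attempt.
    """
    for attempt in range(retries):
        try:
            return call(*args)
        except OSError as e:
            if e.errno != errno.EBUSY or attempt == retries - 1:
                raise
            time.sleep(delay)


# Illustrative usage in the failing path: fetch the gfid xattr the way
# matching_disk_gfid() does, tolerating a transient EBUSY from the brick.
# (os.getxattr with follow_symlinks=False is the stdlib lgetxattr analogue.)
#
# disk_gfid = retry_on_ebusy(
#     lambda p: os.getxattr(p, "glusterfs.gfid.string",
#                           follow_symlinks=False),
#     [entry_path])
```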