Bug 1022582
Summary: dist-geo-rep: Worker process crashing because of "Invalid Argument" error in slave

| Field | Value | Field | Value |
| --- | --- | --- | --- |
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | M S Vishwanath Bhat <vbhat> |
| Component: | geo-replication | Assignee: | Ajeet Jha <ajha> |
| Status: | CLOSED ERRATA | QA Contact: | M S Vishwanath Bhat <vbhat> |
| Severity: | urgent | Priority: | high |
| Version: | 2.1 | Keywords: | ZStream |
| Hardware: | x86_64 | OS: | Linux |
| Fixed In Version: | glusterfs-3.4.0.38rhs-1 | Doc Type: | Bug Fix |
| Last Closed: | 2013-11-27 15:43:46 UTC | Type: | Bug |
| CC: | aavati, ajha, amarts, csaba, grajaiya, mzywusko, nsathyan, vagarwal, vshankar | | |
Created attachment 815448 [details]
Log from slave node
It's a regression. I just tried with the 35rhs build and it works there; only 36rhs has this issue.

The issue is not reproduced with the glusterfs-3.4.0.38rhs-1.el6rhs.x86_64 build. I followed the same steps mentioned earlier in the bug description and the issue is not hit. Moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html
Created attachment 815447 [details]
Logs from master node

Description of problem:
After I started the geo-rep session and copied some files onto the master, the sessions on two of the nodes went into the faulty state. The workers on both of those machines are crashing because of an Errno 22 ("Invalid argument") raised on the slave.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.36rhs-1.el6rhs.x86_64

How reproducible:
Hit once out of many tries.

Steps to Reproduce:
1. Create and start a geo-rep session between 2*2 dist-rep master and slave volumes.
2. Now cp -r /etc/ <master_mount_point>
3. Run geo-rep status or geo-rep status detail.

Actual results:

# gluster v geo master hornet::slave status detail

MASTER: master    SLAVE: hornet::slave

NODE                       HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING
------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com    faulty    N/A         N/A            N/A              N/A              N/A
typhoon.blr.redhat.com     Stable    02:20:19    0              0                0Bytes           0
mustang.blr.redhat.com     Stable    02:20:19    0              0                0Bytes           0
harrier.blr.redhat.com     faulty    N/A         N/A            N/A              N/A              N/A

Expected results:
Status should not go into the 'faulty' state.

Additional info:

Logs in master node:

[2013-10-23 20:20:43.561261] I [master(/rhs/bricks/brick2):345:crawlwrap] _GMaster: crawl interval: 3 seconds
[2013-10-23 20:20:51.155394] E [repce(/rhs/bricks/brick2):188:__call__] RepceClient: call 31994:139915356534528:1382539850.02 (meta_ops) failed on peer with OSError
[2013-10-23 20:20:51.156507] E [syncdutils(/rhs/bricks/brick2):207:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 530, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1074, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 369, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 799, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 760, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 740, in process_change
    self.slave.server.meta_ops(meta_entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 22] Invalid argument: '.gfid/1da8224d-aa34-433b-8abf-b07a13e5cfd2'
[2013-10-23 20:20:51.159435] I [syncdutils(/rhs/bricks/brick2):159:finalize] <top>: exiting.
[2013-10-23 20:21:01.268341] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-10-23 20:21:01.269103] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-10-23 20:21:01.531034] I [gsyncd(/rhs/bricks/brick2):520:main_i] <top>: syncing: gluster://localhost:master -> ssh://root@hornet:gluster://localhost:slave
[2013-10-23 20:21:04.483725] I [master(/rhs/bricks/brick2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-10-23 20:21:04.487170] I [master(/rhs/bricks/brick2):57:gmaster_builder] <top>: setting up changelog change detection mode
[2013-10-23 20:21:04.490206] I [master(/rhs/bricks/brick2):835:register] _GMaster: xsync temp directory: /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.194%3Agluster%3A%2F%2F127.0.0.1%3Aslave/68fa5cc90f61530aea097cdc78c2b376/xsync
[2013-10-23 20:21:04.659682] I [master(/rhs/bricks/brick2):335:crawlwrap] _GMaster: primary master with volume id c2ba3dc1-58f2-4dad-93af-d08c249923d2 ...

Logs from slave node:

[2013-10-23 20:21:33.999643] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-10-23 20:21:34.35] I [syncdutils(slave):159:finalize] <top>: exiting.
[2013-10-23 20:21:45.735359] I [gsyncd(slave):520:main_i] <top>: syncing: gluster://localhost:slave
[2013-10-23 20:21:46.827519] I [resource(slave):642:service_loop] GLUSTER: slave listening
[2013-10-23 20:21:50.195013] E [repce(slave):103:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 99, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 545, in meta_ops
    errno_wrap(os.chmod, [go, mode], [ENOENT])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 383, in errno_wrap
    return call(*arg)
OSError: [Errno 22] Invalid argument: '.gfid/0e3d546c-ca85-4913-8e9f-e3ed822fcf46'
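For context on the failing frame above: meta_ops invokes os.chmod through gsyncd's errno_wrap helper with only ENOENT in its tolerated list, so the EINVAL ("Invalid argument") raised for the .gfid aux-mount path is re-raised and the worker goes faulty. The following is a minimal, hypothetical sketch of that pattern, not the actual syncdutils.py source; chmod_like_failure is an invented stand-in for the os.chmod call failing as in the logs:

```python
import errno

def errno_wrap(call, args=(), tolerated=()):
    """Sketch of the errno_wrap pattern visible in the slave traceback:
    run call(*args) and swallow OSErrors whose errno is in `tolerated`;
    any other errno is re-raised and, in gsyncd, kills the worker."""
    try:
        return call(*args)
    except OSError as ex:
        if ex.errno not in tolerated:
            raise
        # a tolerated errno (e.g. ENOENT) is treated as a harmless no-op

def chmod_like_failure(path, mode):
    """Hypothetical stand-in for the os.chmod call failing as logged."""
    raise OSError(errno.EINVAL, "Invalid argument", path)

# With only ENOENT tolerated, as in meta_ops, the EINVAL escapes:
try:
    errno_wrap(chmod_like_failure,
               ('.gfid/0e3d546c-ca85-4913-8e9f-e3ed822fcf46', 0o644),
               (errno.ENOENT,))
except OSError as ex:
    print("worker would crash on:", ex)  # [Errno 22] Invalid argument
```

Whether the fix in glusterfs-3.4.0.38rhs-1 tolerates EINVAL at this call site or avoids issuing the invalid chmod in the first place is not stated in this report; the sketch only illustrates why an errno outside the tolerated list turns into a worker crash.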