Description of problem:
gsyncd worker crashed in syncdutils with "OSError: [Errno 22] Invalid argument"
The following logs are observed:
[2018-05-22 11:59:52.196463] I [master(/rhgs/brick1/data):83:gmaster_builder] <top>: setting up xsync change detection mode
[2018-05-22 11:59:52.197062] I [master(/rhgs/brick1/data):369:__init__] _GMaster: using 'rsync' as the sync engine
[2018-05-22 11:59:52.197985] I [master(/rhgs/brick1/data):83:gmaster_builder] <top>: setting up changelog change detection mode
[2018-05-22 11:59:52.198193] I [master(/rhgs/brick1/data):369:__init__] _GMaster: using 'rsync' as the sync engine
[2018-05-22 11:59:52.198868] I [master(/rhgs/brick1/data):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2018-05-22 11:59:52.199088] I [master(/rhgs/brick1/data):369:__init__] _GMaster: using 'rsync' as the sync engine
[2018-05-22 11:59:54.232836] I [monitor(monitor):274:monitor] Monitor: ------------------------------------------------------------
[2018-05-22 11:59:54.233115] I [monitor(monitor):275:monitor] Monitor: starting gsyncd worker
[2018-05-22 11:59:54.273042] I [master(/rhgs/brick1/data):1253:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/upprodvarum/ssh%3A%2F%2Fgeoaccount%4010.127.28.25%3Agluster%3A%2F%2F127.0.0.1%3Aupprodvarum_rep/19b6ede62a50cc554027d6a4416c4fef/xsync
[2018-05-22 11:59:54.273315] I [resource(/rhgs/brick1/data):1533:service_loop] GLUSTER: Register time: 1526990394
[2018-05-22 11:59:54.276341] I [master(/rhgs/brick1/data):512:crawlwrap] _GMaster: primary master with volume id 2fbee41f-9473-47c4-83f7-12b617ef5a4b ...
[2018-05-22 11:59:54.286515] I [master(/rhgs/brick1/data):521:crawlwrap] _GMaster: crawl interval: 1 seconds
[2018-05-22 11:59:54.289128] I [master(/rhgs/brick1/data):468:mgmt_lock] _GMaster: Got lock : /rhgs/brick1/data : Becoming ACTIVE
[2018-05-22 11:59:54.294075] I [master(/rhgs/brick1/data):1167:crawl] _GMaster: starting history crawl... turns: 1, stime: (1512658469, 0), etime: 1526990394
[2018-05-22 11:59:54.346426] I [gsyncd(/rhgs/brick2/data):747:main_i] <top>: syncing: gluster://localhost:upprodvarum -> ssh://geoaccount@gluster11:gluster://localhost:upprodvarum_rep
[2018-05-22 11:59:54.346404] I [changelogagent(agent):73:__init__] ChangelogAgent: Agent listining...
[2018-05-22 11:59:55.316481] I [master(/rhgs/brick1/data):1196:crawl] _GMaster: slave's time: (1512658469, 0)
[2018-05-22 11:59:57.385174] E [syncdutils(/rhgs/brick1/data):296:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 757, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1539, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 573, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1205, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1111, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 994, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 887, in process_change
    rl = errno_wrap(os.readlink, [en], [ENOENT], [ESTALE])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 495, in errno_wrap
    return call(*arg)
OSError: [Errno 22] Invalid argument: '.gfid/b2464acd-855c-42f0-8a4a-f8dffad6cae9/269
Version-Release number of selected component (if applicable):
RHGS 3.2 on RHEL-7 Async
Actual results:
The gsyncd worker crashes in syncdutils with OSError [Errno 22], leaving the geo-replication session in a Faulty state.
Expected results:
The worker should handle the error gracefully instead of crashing, and the session should not go Faulty.
Additional info:
Package Version : glusterfs-3.8.4-18.el7rhgs.x86_64
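The EINVAL most likely comes from os.readlink() being called on a path that exists but is not a symlink: on Linux, readlink(2) returns EINVAL for such a path, and the errno_wrap call in the traceback only tolerates ENOENT and ESTALE, so EINVAL propagates and kills the worker. A minimal sketch of this behavior and of a tolerant wrapper follows; safe_readlink and its ignore list are illustrative, not the actual gsyncd fix:

```python
import errno
import os
import tempfile

# Reproduce the errno from the traceback: readlink() on a regular
# (non-symlink) file raises OSError with errno EINVAL on Linux.
with tempfile.NamedTemporaryFile() as f:
    try:
        os.readlink(f.name)
    except OSError as e:
        assert e.errno == errno.EINVAL

# A wrapper in the spirit of syncdutils.errno_wrap that also treats
# EINVAL as ignorable, so a non-symlink entry is skipped rather than
# crashing the worker (names here are hypothetical):
def safe_readlink(path, ignore=(errno.ENOENT, errno.ESTALE, errno.EINVAL)):
    """Return the symlink target, or None for ignorable errors."""
    try:
        return os.readlink(path)
    except OSError as e:
        if e.errno in ignore:
            return None
        raise
```

With such a wrapper, process_change() would receive None for the entry and could skip it, instead of the whole crawl aborting.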
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:2222