Bug 1693648 - Geo-rep: Geo-replication failing with "cannot allocate memory"
Summary: Geo-rep: Geo-replication failing with "cannot allocate memory"
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kotresh HR
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1670429 1694002 1714526
 
Reported: 2019-03-28 12:07 UTC by Kotresh HR
Modified: 2019-05-28 11:52 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1670429
: 1694002 (view as bug list)
Environment:
Last Closed: 2019-04-03 14:19:23 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
System: Gluster.org Gerrit
ID: 22438
Private: 0
Priority: None
Status: Merged
Summary: geo-rep: Fix syncing multiple rename of symlink
Last Updated: 2019-03-29 07:24:21 UTC

Comment 1 Kotresh HR 2019-03-28 12:12:01 UTC
Description of the Problem:
Geo-rep status is 'Faulty' and data is not syncing.

Slave worker crash:

[2019-01-21 14:46:36.338450] I [resource(slave):1422:connect] GLUSTER: Mounting gluster volume locally...
[2019-01-21 14:46:47.581492] I [resource(slave):1435:connect] GLUSTER: Mounted gluster volume   duration=11.2428
[2019-01-21 14:46:47.582036] I [resource(slave):905:service_loop] GLUSTER: slave listening
[2019-01-21 14:47:36.831804] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 756, in entry_ops
    [ESTALE, EINVAL, EBUSY])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 553, in errno_wrap
    return call(*arg)
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 79, in lsetxattr
    cls.raise_oserr()
  File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 12] Cannot allocate memory
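
For reference, a minimal sketch (not the verbatim gluster source) of the code path in libcxattr.py that this traceback ends in: a thin ctypes wrapper around lsetxattr(2) that maps a -1 return into an OSError built from errno, which is how errno 12 surfaces as "Cannot allocate memory" here.

import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def raise_oserr():
    # Read the errno left behind by the failed libc call and re-raise it,
    # e.g. errno 12 (ENOMEM) -> OSError: Cannot allocate memory.
    errn = ctypes.get_errno()
    raise OSError(errn, os.strerror(errn))

def lsetxattr(path, attr, val):
    # lsetxattr(2) sets the xattr on the symlink itself, not its target;
    # path/attr/val are expected as bytes. Flags fixed to 0 for brevity.
    if libc.lsetxattr(path, attr, val, len(val), 0) == -1:
        raise_oserr()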


Master worker crash:

[2019-01-21 14:46:36.7253] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1700:connect_remote] SSH: Initializing SSH connection between master and slave...
[2019-01-21 14:46:36.7440] I [changelogagent(/glusterfs/glprd01-vsb-pil-modshape000/brick1):73:__init__] ChangelogAgent: Agent listining...
[2019-01-21 14:46:47.585638] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1707:connect_remote] SSH: SSH connection between master and slave established.  duration=11.5781
[2019-01-21 14:46:47.585905] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1422:connect] GLUSTER: Mounting gluster volume locally...
[2019-01-21 14:46:48.650470] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1435:connect] GLUSTER: Mounted gluster volume   duration=1.0644
[2019-01-21 14:46:48.650816] I [gsyncd(/glusterfs/glprd01-vsb-pil-modshape000/brick1):803:main_i] <top>: Worker spawn successful. Acknowledging back to monitor
[2019-01-21 14:46:50.675277] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1583:register] _GMaster: Working dir      path=/var/lib/misc/glusterfsd/pil-vbs-modshape/ssh%3A%2F%2Fgeoaccount%40172.21.142.33%3Agluster%3A%2F%2F127.0.0.1%3Apil-vbs-modshape/5eaac78a29ba1e2e24b401621c5240c3
[2019-01-21 14:46:50.675633] I [resource(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1582:service_loop] GLUSTER: Register time       time=1548082010
[2019-01-21 14:46:50.690826] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):482:mgmt_lock] _GMaster: Didn't get lock Becoming PASSIVE brick=/glusterfs/glprd01-vsb-pil-modshape000/brick1
[2019-01-21 14:46:50.703552] I [gsyncdstatus(/glusterfs/glprd01-vsb-pil-modshape000/brick1):282:set_passive] GeorepStatus: Worker Status Change status=Passive
[2019-01-21 14:47:35.797741] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):436:mgmt_lock] _GMaster: Got lock Becoming ACTIVE brick=/glusterfs/glprd01-vsb-pil-modshape000/brick1
[2019-01-21 14:47:35.802330] I [gsyncdstatus(/glusterfs/glprd01-vsb-pil-modshape000/brick1):276:set_active] GeorepStatus: Worker Status Change  status=Active
[2019-01-21 14:47:35.804092] I [gsyncdstatus(/glusterfs/glprd01-vsb-pil-modshape000/brick1):248:set_worker_crawl_status] GeorepStatus: Crawl Status Change      status=History Crawl
[2019-01-21 14:47:35.804485] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1497:crawl] _GMaster: starting history crawl      turns=1 stime=(1548059316, 0)   entry_stime=(1548059310, 0)     etime=1548082055
[2019-01-21 14:47:36.808142] I [master(/glusterfs/glprd01-vsb-pil-modshape000/brick1):1526:crawl] _GMaster: slave's time        stime=(1548059316, 0)
[2019-01-21 14:47:36.833885] E [repce(/glusterfs/glprd01-vsb-pil-modshape000/brick1):209:__call__] RepceClient: call failed     call=32116:139676615182144:1548082056.82        method=entry_ops        error=OSError
[2019-01-21 14:47:36.834212] E [syncdutils(/glusterfs/glprd01-vsb-pil-modshape000/brick1):349:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 805, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1588, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1535, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1435, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1269, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1165, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 228, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 210, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2019-01-21 14:47:36.846298] I [syncdutils(/glusterfs/glprd01-vsb-pil-modshape000/brick1):289:finalize] <top>: exiting.
[2019-01-21 14:47:36.849236] I [repce(/glusterfs/glprd01-vsb-pil-modshape000/brick1):92:service_loop] RepceServer: terminating
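
The two tracebacks are the same failure seen from both ends of the repce RPC channel. A minimal sketch of that round trip (simplified; message layout assumed, not copied from repce.py): the slave runs the requested method and, on failure, ships the exception object back in place of a result, and the master re-raises it (the "raise res" above), so the slave's ENOMEM from entry_ops aborts the master's history crawl.

def slave_worker(obj, method, *args):
    # Slave side: run the server method; on failure return the exception
    # itself (this is the point logged as "call failed" on the slave).
    try:
        return True, getattr(obj, method)(*args)
    except Exception as exc:
        return False, exc

def master_call(obj, method, *args):
    # Master side: unpack the reply and re-raise a shipped exception,
    # mirroring RepceClient.__call__'s "raise res" in the traceback.
    ok, res = slave_worker(obj, method, *args)
    if not ok:
        raise res
    return res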

Comment 2 Worker Ant 2019-03-29 07:24:23 UTC
REVIEW: https://review.gluster.org/22438 (geo-rep: Fix syncing multiple rename of symlink) merged (#2) on master by Amar Tumballi
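
For context, the pattern the patch title refers to can be produced with a few renames of one symlink on the master volume. A hypothetical reproducer sketch (mount point and file names are illustrative, not taken from this report):

import os

MNT = "/mnt/master"  # assumed client mount of the geo-replicated master volume

d = os.path.join(MNT, "dir")
os.makedirs(d, exist_ok=True)
os.symlink("target", os.path.join(d, "sym0"))
# Rename the same symlink more than once; before the fix, replaying the
# resulting changelog entries on the slave could fail in entry_ops.
os.rename(os.path.join(d, "sym0"), os.path.join(d, "sym1"))
os.rename(os.path.join(d, "sym1"), os.path.join(d, "sym2"))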

Comment 3 Worker Ant 2019-03-29 09:07:11 UTC
REVIEW: https://review.gluster.org/22447 (geo-rep: Fix syncing multiple rename of symlink) posted (#1) for review on release-6 by Kotresh HR

Comment 4 Worker Ant 2019-03-29 09:08:35 UTC
REVISION POSTED: https://review.gluster.org/22447 (geo-rep: Fix syncing multiple rename of symlink) posted (#2) for review on release-6 by Kotresh HR

