Bug 1221175 - [geo-rep]: Session goes to faulty with "Cannot allocate memory" traceback when deletes were performed having trash translators ON
Summary: [geo-rep]: Session goes to faulty with "Cannot allocate memory" traceback when deletes were performed having trash translators ON
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: trash-xlator
Version: 3.7.0
Hardware: x86_64
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: qe_tracker_everglades 1203293
 
Reported: 2015-05-13 12:29 UTC by Rahul Hinduja
Modified: 2017-03-08 10:54 UTC
CC: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2017-03-08 10:54:34 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Rahul Hinduja 2015-05-13 12:29:37 UTC
Description of problem:
=======================

The geo-rep session goes into a faulty state with the traceback below:

[2015-05-13 17:47:26.465090] W [master(/rhs/brick1/b1):792:log_failures] _GMaster: META FAILED: ({'go': '.gfid/8db3cbe6-946e-45cb-bf74-d233c4091003', 'stat': {'atime': 1431518244.1599989, 'gid': 0, 'mtime': 1431518665.03, 'mode': 16877, 'uid': 0}, 'op': 'META'}, 2)
[2015-05-13 17:47:26.669015] E [repce(/rhs/brick1/b1):207:__call__] RepceClient: call 23144:139914667464448:1431519446.51 (entry_ops) failed on peer with OSError
[2015-05-13 17:47:26.669274] E [syncdutils(/rhs/brick1/b1):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1440, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 580, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1150, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1059, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 946, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 902, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 12] Cannot allocate memory
[2015-05-13 17:47:26.670993] I [syncdutils(/rhs/brick1/b1):220:finalize] <top>: exiting.
[2015-05-13 17:47:26.674686] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-05-13 17:47:26.675065] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-05-13 17:47:27.299925] I [monitor(monitor):282:monitor] Monitor: worker(/rhs/brick1/b1) died in startup phase


Deletes were being performed from both the FUSE and NFS mounts with the trash translator ON. The session is configured to use a meta volume.
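For reference, the session state and the relevant options can be checked with the standard gluster CLI. This is only a sketch; the volume and host names (mastervol, slavevol, slavehost) are placeholders, not the actual names from this setup:

# Worker state -- shows up as Faulty for the affected brick
gluster volume geo-replication mastervol slavehost::slavevol status

# Confirm the session uses the shared meta volume
gluster volume geo-replication mastervol slavehost::slavevol config use_meta_volume

# Confirm the trash translator is enabled (listed under "Options Reconfigured")
gluster volume info mastervol | grep features.trash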


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.0beta2-0.0.el6.x86_64


How reproducible:
=================

Reproduced in both of two attempts (2/2).


Steps Carried:
==============
1. Created master cluster 
2. Created and started master volume
3. Created shared volume (gluster_shared_storage)
4. Mounted the shared volume on /var/run/gluster/shared_storage
5. Created Slave cluster
6. Created and Started slave volume
7. Created geo-rep session between master and slave
8. Configured use_meta_volume true
9. Started geo-rep
10. Mounted the master volume on a client over FUSE and NFS
11. Copied /etc to etc.{1..10} from the FUSE mount
12. Copied /etc to etc.{11..20} from the NFS mount
13. Sync completed successfully
14. Set features.trash to ON on both the master and slave volumes
15. Removed etc.2 from the FUSE mount and etc.12 from the NFS mount
16. Both removals first failed with "Directory not empty"; a second attempt succeeded
17. Checked the geo-rep session; it was faulty
18. Checked the logs; they showed the above traceback continuously (a CLI sketch of these steps follows below)
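
A minimal CLI sketch of the steps above, assuming a trivial single-brick layout; the node names (m1, s1), mount points, and volume names are placeholders and not taken from the report (the report created the shared volume manually; enabling cluster.enable-shared-storage is one way to get the same gluster_shared_storage volume):

# On the master cluster: create and start the master volume
gluster volume create mastervol m1:/rhs/brick1/b1
gluster volume start mastervol

# Shared meta volume (gluster_shared_storage), mounted at
# /var/run/gluster/shared_storage on the master nodes
gluster volume set all cluster.enable-shared-storage enable

# On the slave cluster: create and start the slave volume the same way,
# then set up the geo-rep session from the master side
gluster volume geo-replication mastervol s1::slavevol create push-pem
gluster volume geo-replication mastervol s1::slavevol config use_meta_volume true
gluster volume geo-replication mastervol s1::slavevol start

# Client mounts of the master volume (FUSE and gluster NFS)
mount -t glusterfs m1:/mastervol /mnt/fuse
mount -t nfs -o vers=3 m1:/mastervol /mnt/nfs

# Initial data set
for i in $(seq 1 10);  do cp -a /etc /mnt/fuse/etc.$i; done
for i in $(seq 11 20); do cp -a /etc /mnt/nfs/etc.$i;  done

# Enable the trash translator on both volumes, then delete
gluster volume set mastervol features.trash on
gluster volume set slavevol features.trash on      # run on the slave cluster
rm -rf /mnt/fuse/etc.2
rm -rf /mnt/nfs/etc.12

# Session state after the deletes
gluster volume geo-replication mastervol s1::slavevol status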

Actual results:
===============

Worker died


Expected results:
================

The worker should keep running and sync the deletions to the slave.


Additional info:
================

The geo-rep session goes to a faulty state. A similar traceback was seen with renames in bug 1144428 (now closed); that case did not involve the trash translator or a meta volume.

Comment 5 Kaushal 2017-03-08 10:54:34 UTC
This bug is being closed because GlusterFS-3.7 has reached its end-of-life.

Note: This bug is being closed using a script. No verification has been performed to check if it still exists on newer releases of GlusterFS.
If this bug still exists in newer GlusterFS releases, please reopen this bug against the newer release.

