1342785 – [geo-rep]: Worker crashes with permission denied during hybrid crawl caused via replace brick

Bug 1342785 - [geo-rep]: Worker crashes with permission denied during hybrid crawl caused via replace brick

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.4.0
Assignee:	Kotresh HR
QA Contact:	Rochelle
Docs Contact:
URL:
Whiteboard:	rebase
Depends On:
Blocks:	1503134

Reported:	2016-06-05 09:48 UTC by Rahul Hinduja
Modified:	2018-09-14 03:56 UTC
CC List:	5 users
Fixed In Version:	glusterfs-3.12.2-1
Doc Text:	Previously, a metadata change such as an ownership change on a symlink file caused a crash with a "Permission Denied" error. With this fix, geo-replication syncs the metadata of symlink files, so an ownership change on a symlink is replicated properly and does not result in a crash.
Clone Of:
Environment:
Last Closed:	2018-09-04 06:29:40 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2018:2607	0	None	None	None	2018-09-04 06:31:22 UTC

Description Rahul Hinduja 2016-06-05 09:48:01 UTC

Description of problem:
=======================

When files have already been synced to the slave and a replace-brick is issued on the master, the replaced brick starts a hybrid crawl if it becomes ACTIVE. The worker then crashes with "Permission denied" and becomes PASSIVE.

[2016-06-05 09:39:57.549393] E [repce(/rhs/brick3/b9):207:__call__] RepceClient: call 5532:140024108808000:1465119596.96 (meta_ops) failed on peer with OSError
[2016-06-05 09:39:57.549735] E [syncdutils(/rhs/brick3/b9):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1510, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1132, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 992, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 952, in process_change
    failures = self.slave.server.meta_ops(meta_entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 13] Permission denied: '.gfid/86bf33db-52e8-49b7-a7d9-1d00d82b88ef'
[2016-06-05 09:39:57.552012] I [syncdutils(/rhs/brick3/b9):220:finalize] <top>: exiting.
[2016-06-05 09:39:57.561508] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-06-05 09:39:57.561933] I [syncdutils(agent):220:finalize] <top>: exiting.
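
The Doc Text for this bug attributes the failure to ownership changes on symlink files. One way such a metadata operation can go wrong is when chown follows the link to its target instead of acting on the link itself. The following is a minimal Python sketch of that difference, not the actual patch: `set_owner_no_follow` and its policy of returning an errno instead of raising are hypothetical, and the dangling symlink is only a convenient way to make the "follow" case fail without special privileges.

```python
import errno
import os
import tempfile


def set_owner_no_follow(path, uid, gid):
    """Hypothetical helper: chown `path` itself, never the symlink target.

    Returns 0 on success, or the errno for a per-entry failure so that a
    caller could record the failure instead of crashing.
    """
    try:
        os.chown(path, uid, gid, follow_symlinks=False)  # same as lchown
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EPERM, errno.ENOENT):
            return e.errno
        raise
    return 0


with tempfile.TemporaryDirectory() as d:
    link = os.path.join(d, "dangling")
    os.symlink(os.path.join(d, "missing-target"), link)
    st = os.lstat(link)  # ownership of the link itself

    # Default chown dereferences the link, so it fails on a dangling link.
    try:
        os.chown(link, st.st_uid, st.st_gid)
        followed_ok = True
    except OSError:
        followed_ok = False

    # Acting on the link itself does not depend on the target at all.
    result = set_owner_no_follow(link, st.st_uid, st.st_gid)

print("chown following the link succeeded:", followed_ok)
print("chown on the link itself returned:", result)
```

Whether the real fix takes this shape is not stated in this report; the sketch only illustrates why symlink metadata needs no-follow handling.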


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-8

How reproducible:
=================
Observed once; not yet retried.


Steps to Reproduce:
===================
1. Create the master and slave volumes, then create a geo-rep session between them.
2. Create data on the master and let it sync to the slave.
3. Stop the existing session.
4. Run replace-brick commit for one of the bricks.
5. Start geo-rep immediately.
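
Steps 3 to 5 can be sketched as a dry-run helper that only builds and prints the gluster CLI invocations. This is an illustration under assumptions: the function name, volume names, host names, and brick paths below are hypothetical placeholders, and the `commit force` form of replace-brick is assumed because the report says only "replace brick commit". Nothing is executed.

```python
import shlex


def replace_brick_repro_commands(master_vol, slave_host, slave_vol,
                                 old_brick, new_brick):
    """Build (but do not run) the commands for steps 3 to 5."""
    session = ["gluster", "volume", "geo-replication",
               master_vol, f"{slave_host}::{slave_vol}"]
    return [
        session + ["stop"],                                   # step 3
        ["gluster", "volume", "replace-brick", master_vol,
         old_brick, new_brick, "commit", "force"],            # step 4
        session + ["start"],                                  # step 5
        session + ["status"],                                 # observe workers
    ]


# Placeholder values, not taken from the report.
commands = replace_brick_repro_commands(
    "mastervol", "slave-host", "slavevol",
    "master-host:/bricks/old", "master-host:/bricks/new",
)
for argv in commands:
    print(shlex.join(argv))
```

The final status command is where a worker on the replaced brick would be expected to show up as crashed or PASSIVE if the problem reproduces.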

Actual results:
===============

The worker on the replaced brick starts a hybrid crawl and crashes.


Expected results:
=================

The worker should not crash.

Comment 4 Kotresh HR 2017-05-31 12:06:31 UTC

Upstream Patch:
https://review.gluster.org/17389

Comment 5 Kotresh HR 2017-09-21 19:53:46 UTC

The same patch fixes this bug and https://bugzilla.redhat.com/show_bug.cgi?id=1299740

Comment 12 errata-xmlrpc 2018-09-04 06:29:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607
