Bug 1342785 - [geo-rep]: Worker crashes with permission denied during hybrid crawl caused via replace brick
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Kotresh HR
QA Contact: Rochelle
URL:
Whiteboard: rebase
Depends On:
Blocks: 1503134
Reported: 2016-06-05 09:48 UTC by Rahul Hinduja
Modified: 2018-09-14 03:56 UTC (History)

Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Previously, a metadata change on a symlink file, such as an ownership change, caused the geo-replication worker to crash with a "Permission Denied" error. With this fix, geo-replication syncs the metadata of symlink files, so an ownership change on a symlink is replicated properly and no longer results in a crash.
Clone Of:
Environment:
Last Closed: 2018-09-04 06:29:40 UTC


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:31:22 UTC

Description Rahul Hinduja 2016-06-05 09:48:01 UTC
Description of problem:
=======================

In a scenario where files have already been synced to the slave and a replace brick is issued on the master, the replaced brick can start a hybrid crawl. If the worker for the replaced brick becomes ACTIVE, it crashes with "Permission denied" and becomes PASSIVE.

[2016-06-05 09:39:57.549393] E [repce(/rhs/brick3/b9):207:__call__] RepceClient: call 5532:140024108808000:1465119596.96 (meta_ops) failed on peer with OSError
[2016-06-05 09:39:57.549735] E [syncdutils(/rhs/brick3/b9):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1510, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1132, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 992, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 952, in process_change
    failures = self.slave.server.meta_ops(meta_entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 13] Permission denied: '.gfid/86bf33db-52e8-49b7-a7d9-1d00d82b88ef'
[2016-06-05 09:39:57.552012] I [syncdutils(/rhs/brick3/b9):220:finalize] <top>: exiting.
[2016-06-05 09:39:57.561508] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-06-05 09:39:57.561933] I [syncdutils(agent):220:finalize] <top>: exiting.
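
The traceback above ends in meta_ops on the slave raising OSError, and the Doc Text attributes the crash to an ownership change on a symlink file. The following is a minimal sketch of that idea, not the actual upstream patch: it applies ownership to the path itself with os.lchown (so a symlink's target is never followed) and returns failures to the caller instead of raising. The helper names apply_ownership, apply_meta_entries, and demo, and the entry layout, are assumptions made for illustration only.

```python
import errno
import os
import tempfile


def apply_ownership(path, uid, gid):
    """Set ownership on `path` itself, never on a symlink's target.

    Returns None on success, or an (errno, path) tuple on failure so the
    caller can log the entry and continue instead of crashing.
    """
    try:
        # os.lchown operates on the link itself; os.chown would follow
        # the link and act on (or fail on) whatever it points to.
        os.lchown(path, uid, gid)
    except OSError as e:
        return (e.errno, path)
    return None


def apply_meta_entries(entries):
    """Apply a batch of {'path', 'uid', 'gid'} entries and collect failures."""
    failures = []
    for entry in entries:
        failure = apply_ownership(entry["path"], entry["uid"], entry["gid"])
        if failure is not None:
            failures.append(failure)
    return failures


def demo():
    """Run the batch against a dangling symlink and a missing path."""
    with tempfile.TemporaryDirectory() as d:
        link = os.path.join(d, "dangling-link")
        os.symlink(os.path.join(d, "missing-target"), link)
        me = {"uid": os.getuid(), "gid": os.getgid()}
        entries = [
            dict(path=link, **me),                             # symlink: handled via lchown
            dict(path=os.path.join(d, "no-such-entry"), **me),  # reported, not raised
        ]
        return apply_meta_entries(entries)


if __name__ == "__main__":
    for code, path in demo():
        print(errno.errorcode[code], path)
```

The design choice here is that one bad entry is reported in the returned list rather than propagated as an exception, which matches the shape of the `failures = self.slave.server.meta_ops(meta_entries)` call in the traceback. Whether the real fix takes this form is not stated in this report.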


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-8

How reproducible:
=================
Not yet tried a second time.


Steps to Reproduce:
===================
1. Create the Master and Slave volumes. Create a geo-rep session between them.
2. Create data on the Master and let it sync to the Slave.
3. Stop the existing session.
4. Initiate a replace-brick commit for one of the bricks.
5. Start geo-rep immediately.
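
Steps 3 to 5 above can be sketched as a small Python helper that builds the corresponding gluster CLI invocations. This is only an illustration under assumptions: the volume names, host names, and brick paths are hypothetical placeholders, the replace-brick step is assumed to use the `commit force` form, and the helper names are made up for this sketch. By default it only prints the commands (dry run) and does not execute anything.

```python
import subprocess

GLUSTER = "gluster"


def georep_cmd(master_vol, slave_host, slave_vol, action):
    """Build a geo-replication command for the given session and action."""
    return [GLUSTER, "volume", "geo-replication", master_vol,
            f"{slave_host}::{slave_vol}", action]


def replace_brick_cmd(volume, old_brick, new_brick):
    """Build a replace-brick command (assumes the 'commit force' form)."""
    return [GLUSTER, "volume", "replace-brick", volume,
            old_brick, new_brick, "commit", "force"]


def repro_commands(master_vol, slave_host, slave_vol, old_brick, new_brick):
    """Commands for steps 3-5, plus a final status check."""
    return [
        georep_cmd(master_vol, slave_host, slave_vol, "stop"),    # step 3
        replace_brick_cmd(master_vol, old_brick, new_brick),      # step 4
        georep_cmd(master_vol, slave_host, slave_vol, "start"),   # step 5
        georep_cmd(master_vol, slave_host, slave_vol, "status"),  # observe workers
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    run(repro_commands("mastervol", "slavehost", "slavevol",
                       "masterhost:/bricks/old", "masterhost:/bricks/new"))
```

The final status command is included so the ACTIVE/PASSIVE state of the worker on the replaced brick can be watched after the session restarts.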

Actual results:
===============

The worker for the replaced brick starts a hybrid crawl and crashes.


Expected results:
=================

The worker should not crash.

Comment 4 Kotresh HR 2017-05-31 12:06:31 UTC
Upstream Patch:
https://review.gluster.org/17389

Comment 5 Kotresh HR 2017-09-21 19:53:46 UTC
The same patch fixes both this bug and https://bugzilla.redhat.com/show_bug.cgi?id=1299740

Comment 12 errata-xmlrpc 2018-09-04 06:29:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

