Bug 1342785

Summary: [geo-rep]: Worker crashes with permission denied during hybrid crawl caused via replace brick
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: geo-replication
Assignee: Kotresh HR <khiremat>
Status: CLOSED ERRATA
QA Contact: Rochelle <rallan>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: csaba, khiremat, nchilaka, sheggodu, srmukher
Target Milestone: ---
Keywords: ZStream
Target Release: RHGS 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: rebase
Fixed In Version: glusterfs-3.12.2-1
Doc Type: If docs needed, set a value
Doc Text:
Previously, a metadata change such as an ownership change on a symlink file caused the geo-replication worker to crash with a "Permission Denied" error. With this fix, geo-replication syncs the metadata of symlink files correctly, so ownership changes on symlinks are replicated without crashing the worker.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-04 06:29:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1503134

Description Rahul Hinduja 2016-06-05 09:48:01 UTC
Description of problem:
=======================

In a scenario where files have already been synced to the slave and a replace brick is issued on the master, the new brick can trigger a hybrid crawl. If the worker for the replaced brick becomes ACTIVE, it crashes with a permission-denied error and falls back to PASSIVE.

[2016-06-05 09:39:57.549393] E [repce(/rhs/brick3/b9):207:__call__] RepceClient: call 5532:140024108808000:1465119596.96 (meta_ops) failed on peer with OSError
[2016-06-05 09:39:57.549735] E [syncdutils(/rhs/brick3/b9):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 201, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 720, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1510, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 571, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1132, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1107, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 992, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 952, in process_change
    failures = self.slave.server.meta_ops(meta_entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 13] Permission denied: '.gfid/86bf33db-52e8-49b7-a7d9-1d00d82b88ef'
[2016-06-05 09:39:57.552012] I [syncdutils(/rhs/brick3/b9):220:finalize] <top>: exiting.
[2016-06-05 09:39:57.561508] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-06-05 09:39:57.561933] I [syncdutils(agent):220:finalize] <top>: exiting.
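
Reading the traceback: the slave applies metadata through aux-gfid handles such as '.gfid/86bf33db-...', and for a symlink that handle resolves to the link itself. Linux does not allow changing a symlink's mode, so the chmod issued by meta_ops fails with EPERM; repce propagates the OSError back to the master and the worker exits. A minimal sketch of the failing pattern, assuming the pre-fix meta_ops logic (simplified; the real code lives in syncdaemon/resource.py on the slave, and apply_meta here is a made-up name):

    import os

    # Simplified, hypothetical rendering of the pre-fix metadata step.
    # 'path' is an aux-gfid handle which, for a symlink, resolves to the
    # symlink inode itself rather than to its target.
    def apply_meta(path, mode, uid, gid):
        os.chmod(path, mode)      # setattr on a symlink inode -> EPERM,
                                  # since Linux cannot chmod a symlink
        os.chown(path, uid, gid)  # never reached for symlinks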


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-8

How reproducible:
=================
Not yet attempted a second time.


Steps to Reproduce:
===================
1. Create Master and Slave volumes and create a geo-rep session between them.
2. Create data on the Master and let it sync to the Slave.
3. Stop the existing session.
4. Initiate a replace-brick commit for one of the bricks.
5. Start geo-rep immediately (a scripted version of steps 3-5 is sketched below).
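
A scripted version of steps 3-5, as a hedged sketch: the volume names, hosts, and brick paths below are placeholders rather than values from this report, and the invocations assume the one-step "replace-brick ... commit force" form of the gluster CLI:

    #!/usr/bin/env python
    # Hypothetical reproducer for steps 3-5; all names are placeholders.
    import subprocess

    MASTER = "mastervol"
    SLAVE = "slavehost::slavevol"

    def gluster(*args):
        # Run one gluster CLI command, echoing it for the test log.
        cmd = ("gluster",) + args
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # Step 3: stop the existing geo-rep session.
    gluster("volume", "geo-replication", MASTER, SLAVE, "stop")

    # Step 4: replace one brick of the Master volume.
    gluster("volume", "replace-brick", MASTER,
            "host1:/rhs/brick3/b9", "host1:/rhs/brick3/b9_new",
            "commit", "force")

    # Step 5: restart geo-rep immediately so the replaced brick is
    # crawled via hybrid crawl.
    gluster("volume", "geo-replication", MASTER, SLAVE, "start")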

Actual results:
===============

The replace brick triggers a hybrid crawl, and the worker crashes.


Expected results:
=================

Worker should not crash

Comment 4 Kotresh HR 2017-05-31 12:06:31 UTC
Upstream Patch:
https://review.gluster.org/17389
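
The direction of the fix, per the Doc Text above, is to treat symlinks specially when syncing metadata. A minimal sketch of that idea, not the verbatim patch (see the review link for the actual change; apply_meta is a made-up name):

    import os
    import stat

    def apply_meta(path, mode, uid, gid):
        if stat.S_ISLNK(mode):
            # A symlink's ownership can be changed without following the
            # link; its mode cannot (Linux has no lchmod), so skip chmod.
            os.lchown(path, uid, gid)
        else:
            os.chmod(path, mode)
            os.chown(path, uid, gid)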

Comment 5 Kotresh HR 2017-09-21 19:53:46 UTC
The same patch fixes both this bug and https://bugzilla.redhat.com/show_bug.cgi?id=1299740

Comment 12 errata-xmlrpc 2018-09-04 06:29:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607