Bug 1129208

Summary: [Dist-geo-rep]: In a cascaded setup, rm -rf on master did not propagate completely to the level 2 slave.
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Reporter: Vijaykumar Koppad <vkoppad>
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: storage-qa-internal <storage-qa-internal>
CC: avishwan, chrisw, csaba, david.macdonald, mzywusko, nlevinki, smohan
Keywords: ZStream
Whiteboard: cascaded
Doc Type: Bug Fix
Type: Bug
Target Milestone: ---
Target Release: ---
Story Points: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---
Last Closed: 2018-04-16 15:56:17 UTC

Description Vijaykumar Koppad 2014-08-12 10:06:18 UTC
Description of problem: In a cascaded setup, rm -rf on the master did not propagate completely to the level 2 slave.

The imaster (intermediate master) geo-rep logs contained tracebacks such as the following.
===============================================================================
[2014-08-11 17:02:42.58315] I [master(/bricks/brick2/b7):1147:crawl] _GMaster: slave's time: (1407756535, 0)
[2014-08-11 17:05:38.262269] E [repce(/bricks/brick2/b7):207:__call__] RepceClient: call 17308:140607044015872:1407756932.83 (entry_ops) failed on peer with OSError
[2014-08-11 17:05:38.262967] E [syncdutils(/bricks/brick2/b7):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1335, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 513, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1159, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 910, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 874, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 5] Input/output error
[2014-08-11 17:05:38.266292] I [syncdutils(/bricks/brick2/b7):214:finalize] <top>: exiting.
[2014-08-11 17:05:38.269656] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-11 17:05:38.270167] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-11 17:05:38.286237] I [monitor(monitor):109:set_state] Monitor: new state: faulty
[2014-08-11 17:05:38.371052] E [repce(/bricks/brick3/b11):207:__call__] RepceClient: call 17310:139915850999552:1407756928.11 (entry_ops) failed on peer with OSError
[2014-08-11 17:05:38.371644] E [syncdutils(/bricks/brick3/b11):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1335, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 513, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1159, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 910, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 874, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 61] No data available
[2014-08-11 17:05:38.374772] I [syncdutils(/bricks/brick3/b11):214:finalize] <top>: exiting.
[2014-08-11 17:05:38.378520] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2014-08-11 17:05:38.379053] I [syncdutils(agent):214:finalize] <top>: exiting.
[2014-08-11 17:05:48.300900] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-11 17:05:48.301461] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-11 17:05:48.399454] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2014-08-11 17:05:48.400115] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2014-08-11 17:05:48.720443] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
===============================================================================


Version-Release number of selected component (if applicable): glusterfs-3.6.0.27-1.el6rhs


How reproducible: Did not try to reproduce the issue.


Steps to Reproduce:
1. Create and start a cascaded geo-rep setup between the master, imaster, and slave.
2. Create some data on the master and let it sync to the imaster and the slave.
3. Run rm -rf on the master mount-point.
4. Check whether the removal reaches the level 2 slave (see the command sketch below).
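
A minimal command sketch of these steps, assuming hypothetical host and volume names (m-node/mastervol, i-node/imastervol, s-node/slavevol) and that passwordless SSH between the nodes is already in place; adjust to the actual environment.
===============================================================================
# Hop 1: master volume -> intermediate master volume (run on a master-volume node)
gluster volume geo-replication mastervol i-node::imastervol create push-pem
gluster volume geo-replication mastervol i-node::imastervol start

# Hop 2: intermediate master volume -> level 2 slave volume (run on an imaster-volume node)
gluster volume geo-replication imastervol s-node::slavevol create push-pem
gluster volume geo-replication imastervol s-node::slavevol start

# Create some data on the master mount and wait for both hops to catch up
mount -t glusterfs m-node:/mastervol /mnt/master
mkdir -p /mnt/master/dir{1..10}
for i in $(seq 1 100); do
    dd if=/dev/urandom of=/mnt/master/dir$(( (i % 10) + 1 ))/file$i bs=64k count=1
done
gluster volume geo-replication mastervol i-node::imastervol status detail
gluster volume geo-replication imastervol s-node::slavevol status detail

# Remove everything from the master mount-point
rm -rf /mnt/master/*

# Verify the removal propagated to the level 2 slave
mount -t glusterfs s-node:/slavevol /mnt/slave
ls -lR /mnt/slave    # expected: empty; in this bug, entries were left behind
===============================================================================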

Actual results: rm -rf on the master fails to propagate completely to the level 2 slave.


Expected results: rm -rf on the master should remove all the files from all the slaves.
 

Additional info:

Comment 1 Vijaykumar Koppad 2014-08-12 10:19:47 UTC
Sosreports of all master, imaster and slave nodes @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1129208/