Description of problem:
=======================
Two of the three ACTIVE workers of a geo-replication session go FAULTY and remain in that state after a single directory is created on the master, as shown:

[root@dhcp42-18 master]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-09 04:30:01
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    10.70.42.246    Active     History Crawl      2018-07-09 04:28:01
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

[root@dhcp42-18 master]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-09 04:30:01
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

[root@dhcp42-18 master]# gluster volume geo-replication master 10.70.43.116::slave status

MASTER NODE     MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
---------------------------------------------------------------------------------------------------------------------------------------------
10.70.42.18     master        /rhs/brick1/b1    root          10.70.43.116::slave    10.70.42.246    Active     Changelog Crawl    2018-07-09 04:30:01
10.70.42.18     master        /rhs/brick2/b4    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.42.18     master        /rhs/brick3/b7    root          10.70.43.116::slave    N/A             Faulty     N/A                N/A
10.70.41.239    master        /rhs/brick1/b2    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick2/b5    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.41.239    master        /rhs/brick3/b8    root          10.70.43.116::slave    10.70.43.116    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick1/b3    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick2/b6    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A
10.70.43.179    master        /rhs/brick3/b9    root          10.70.43.116::slave    10.70.42.128    Passive    N/A                N/A

The following was the traceback on the master:
----------------------------------------------
[2018-07-09 08:30:04.763963] E [repce(/rhs/brick2/b4):209:__call__] RepceClient: call failed call=28558:139802932234048:1531125004.38 method=entry_ops error=OSError
[2018-07-09 08:30:04.764759] E [syncdutils(/rhs/brick2/b4):348:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 803, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1586, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1396, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1370, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1114, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 228, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 210, in __call__
    raise res
OSError: [Errno 2] No such file or directory: '/rhs/brick1/b1/.glusterfs/6a/0e/6a0e4415-f45c-4dcb-9862-a8925a586f57'
[2018-07-09 08:30:04.800748] I [syncdutils(/rhs/brick2/b4):288:finalize] <top>: exiting.
The following was the traceback on the slave:
---------------------------------------------
[2018-07-09 08:31:18.867358] E [repce(slave):117:worker] <top>: call failed:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 644, in entry_ops
    gfid[2:4], gfid))
OSError: [Errno 2] No such file or directory: '/rhs/brick1/b1/.glusterfs/6a/0e/6a0e4415-f45c-4dcb-9862-a8925a586f57'

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.12.2-13.el7rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create and start a geo-replication session.
2. Mount the master and slave volumes.
3. Create a single directory on the master.

Actual results:
===============
2/3 ACTIVE workers went to FAULTY and did not come back to ACTIVE.

Expected results:
=================
None of the workers should go to FAULTY.
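For context on the failing path: the slave-side `entry_ops` frame in the traceback (resource.py line 644, ending in `gfid[2:4], gfid))`) is building the `.glusterfs` backend path for the entry's gfid, using the first two and next two hex characters of the gfid as subdirectories. The sketch below is illustrative only (it is not the actual resource.py code); the helper name `gfid_backend_path` is made up, but it reproduces exactly the path that appears in the OSError above:

```python
import os

def gfid_backend_path(brick, gfid):
    # Hypothetical helper mirroring the slicing visible in the traceback:
    # <brick>/.glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid>
    return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

path = gfid_backend_path("/rhs/brick1/b1",
                         "6a0e4415-f45c-4dcb-9862-a8925a586f57")
print(path)
# /rhs/brick1/b1/.glusterfs/6a/0e/6a0e4415-f45c-4dcb-9862-a8925a586f57

# If that gfid has not yet been linked into .glusterfs on the slave brick,
# accessing the path raises OSError(ENOENT) -- the error repce then
# propagates back to the master worker, which goes FAULTY.
```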
*** This bug has been marked as a duplicate of bug 1598384 ***