Bug 1577796

Summary:	[Geo-rep]: Worker crashes with OSError: [Errno 116] Stale file handle
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Rochelle <rallan>
Component:	geo-replication	Assignee:	Aravinda VK <avishwan>
Status:	CLOSED DUPLICATE	QA Contact:	Rahul Hinduja <rhinduja>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.4	CC:	csaba, khiremat, rallan, rgowdapp, rhs-bugs, sankarshan, storage-qa-internal
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-05-24 06:39:42 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Rochelle 2018-05-14 07:08:37 UTC

Description of problem:
========================
Not all of the files were removed from the slave while an rmdir was being performed on the slave.

Ran (34) automation case -- Rsync + Fuse -- and encountered the following traceback:


Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 210, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 802, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1676, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 597, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1470, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1370, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1204, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1114, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 228, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 210, in __call__
    raise res
OSError: [Errno 116] Stale file handle
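
For readers decoding the last line of the traceback: errno 116 on Linux is ESTALE. The following is only a minimal sketch of how such an error can be identified from Python; the helper name describe_oserror is hypothetical and is not part of gsyncd.

```python
import errno
import os


def describe_oserror(exc):
    """Return (symbolic errno name, OS message, is_stale) for an OSError.

    Hypothetical helper for illustration only; not taken from gsyncd.
    """
    name = errno.errorcode.get(exc.errno, "UNKNOWN")
    return name, os.strerror(exc.errno), exc.errno == errno.ESTALE


# Build an error carrying the same errno as the one in the traceback
# (ESTALE, which is 116 on Linux) and decode it.
err = OSError(errno.ESTALE, os.strerror(errno.ESTALE))
print(describe_oserror(err))
```

Comparing against errno.ESTALE rather than the literal 116 keeps the check readable and avoids hard-coding a platform-specific number.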

Version-Release number of selected component (if applicable):
==============================================================
[root@dhcp41-226 tmp]# rpm -qa | grep gluster
glusterfs-api-3.12.2-8.el7rhgs.x86_64
glusterfs-3.12.2-8.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-54.8.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.2.x86_64
glusterfs-client-xlators-3.12.2-8.el7rhgs.x86_64
glusterfs-cli-3.12.2-8.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-8.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
python2-gluster-3.12.2-8.el7rhgs.x86_64
glusterfs-server-3.12.2-8.el7rhgs.x86_64
glusterfs-events-3.12.2-8.el7rhgs.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-libs-3.12.2-8.el7rhgs.x86_64
glusterfs-rdma-3.12.2-8.el7rhgs.x86_64
glusterfs-fuse-3.12.2-8.el7rhgs.x86_64


How reproducible:
=================
1/1



Actual results:
==============
The worker crashed and the rmdir was not synced.


Expected results:
================
The worker should not crash

Comment 3 Raghavendra G 2018-05-21 12:20:53 UTC

This looks like a duplicate of bz 1546717, which is fixed in glusterfs-3.12.2-9. Can you try to recreate this bug on glusterfs-3.12.2-9 or higher? Alternatively, you can try with performance.stat-prefetch off, as bz 1546717 was found to be an issue with stat-prefetch.

Also, how serious is the issue? I hear from Kotresh that if this is a transient error that doesn't affect syncing of data to slaves, it is not serious enough to be targeted for rhgs-3.4.0. @Kotresh/Rochelle, can you please let me know whether this bug is serious enough to be considered for rhgs-3.4.0?
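
The retest suggested in comment 3 could be prepared roughly as sketched below: check whether an installed build is at or above glusterfs-3.12.2-9, and otherwise build the volume-set command that turns performance.stat-prefetch off. This is an illustrative sketch under stated assumptions, not a tested procedure. The helper names and the volume name "mastervol" are hypothetical, the comparison looks only at the leading numeric fields and is not a full RPM version comparison, and the report does not say on which volume the option should be changed.

```python
FIXED_IN = "3.12.2-9"  # version named in comment 3 as carrying the fix


def leading_ints(text):
    """Collect leading dot-separated numeric fields: '8.el7rhgs' -> (8,)."""
    parts = []
    for field in text.split("."):
        if not field.isdigit():
            break
        parts.append(int(field))
    return tuple(parts)


def vr_key(version_release):
    """Split 'version-release' and reduce both halves to integer tuples.

    Simplification: ignores dist tags and epochs, unlike real RPM ordering.
    """
    version, _, release = version_release.partition("-")
    return leading_ints(version), leading_ints(release)


def has_fix(installed_vr, fixed_vr=FIXED_IN):
    """True if the installed build is at or above the fixed build."""
    return vr_key(installed_vr) >= vr_key(fixed_vr)


def stat_prefetch_off_cmd(volname):
    """Build (but do not run) the gluster CLI call for the workaround."""
    return ["gluster", "volume", "set", volname,
            "performance.stat-prefetch", "off"]


# Build reported in this bug: glusterfs-3.12.2-8.el7rhgs
installed = "3.12.2-8.el7rhgs"
if not has_fix(installed):
    # "mastervol" is a placeholder; substitute the real volume name.
    print(" ".join(stat_prefetch_off_cmd("mastervol")))
```

The command is only printed, not executed, so the sketch can be read or run without touching a live cluster.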

Comment 8 Raghavendra G 2018-05-24 06:39:42 UTC


*** This bug has been marked as a duplicate of bug 1546717 ***