Bug 1476269 - [geo-rep]: geo-rep worker crashes with "Directory Not Empty" and "No data available"
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 3.3
Hardware: x86_64 Linux
Priority: unspecified
Severity: urgent
Assigned To: Aravinda VK
QA Contact: Rahul Hinduja
 
Reported: 2017-07-28 09:16 EDT by Rahul Hinduja
Modified: 2017-09-28 13:03 EDT
CC List: 3 users

Type: Bug

Description Rahul Hinduja 2017-07-28 09:16:15 EDT
Description of problem:
=======================

While running the geo-replication sanity check with the fops {create, chmod, chown, chgrp, hardlink, symlink, truncate, rename, remove} across the changelog, hybrid, and history crawls, the worker crashed after the remove operation.

Master Volume: EC (disperse, 2 x (4+2))
Slave Volume: DR (distributed-replicate, 2 x 2)
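
For reference, the fop mix above can be sketched as a short Python sequence run against the master mount. Paths, counts, and file names below are hypothetical; this is not the actual sanity-test tooling.

# Hypothetical reproduction sketch of the fop mix (Python 3, run as root
# on a client with the master volume mounted at MNT).
import os
import shutil

MNT = "/mnt/master"                       # hypothetical master mount point
d = os.path.join(MNT, "dir1")

os.makedirs(d, exist_ok=True)
for i in range(10):                       # create
    open(os.path.join(d, "f%d" % i), "w").close()

for name in os.listdir(d):
    p = os.path.join(d, name)
    os.chmod(p, 0o644)                    # chmod
    os.chown(p, 0, 0)                     # chown + chgrp (needs root)
    os.link(p, p + ".hard")               # hardlink
    os.symlink(p, p + ".sym")             # symlink
    os.truncate(p, 0)                     # truncate

os.rename(d, d + "_renamed")              # rename
shutil.rmtree(d + "_renamed")             # remove: crash observed in this phase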

[2017-07-28 12:47:41.743443] E [repce(/bricks/brick1/master_brick6):207:__call__] RepceClient: call 19136:140593553704704:1501246047.99 (entry_ops) failed on peer with OSError
[2017-07-28 12:47:41.744010] E [syncdutils(/bricks/brick1/master_brick6):296:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 780, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1566, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 570, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1211, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1118, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1001, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 942, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 39] Directory not empty: '.gfid/9ea5ba1b-5a50-4dd5-96df-9c8ec582617e/hardlink_to_files'
[2017-07-28 12:47:41.747756] I [syncdutils(/bricks/brick1/master_brick6):237:finalize] <top>: exiting.
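
The "Directory not empty" failure is exactly what os.rmdir() raises when an RMDIR is replayed on the slave while the directory still has entries (for example because the children's removes land in a later changelog batch). A minimal sketch, not the gsyncd code, of tolerating that case instead of letting the OSError propagate through repce and kill the worker:

import errno
import os

def rmdir_if_empty(path):
    # Hypothetical helper: attempt the replayed RMDIR but report
    # ENOTEMPTY as a retriable failure rather than raising it.
    try:
        os.rmdir(path)
    except OSError as e:
        if e.errno == errno.ENOTEMPTY:
            return False      # directory still populated; retry later
        raise
    return True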


[2017-07-28 12:47:58.636365] I [resource(/bricks/brick1/master_brick6):1560:service_loop] GLUSTER: Register time: 1501246078
[2017-07-28 12:48:06.957875] I [master(/bricks/brick1/master_brick6):459:mgmt_lock] _GMaster: Didn't get lock : /bricks/brick1/master_brick6 : Becoming PASSIVE
[2017-07-28 12:48:06.966227] I [gsyncdstatus(/bricks/brick1/master_brick6):276:set_passive] GeorepStatus: Worker Status: Passive
[2017-07-28 12:48:15.567603] E [repce(/bricks/brick0/master_brick0):207:__call__] RepceClient: call 22896:139827161237248:1501246094.35 (entry_ops) failed on peer with OSError
[2017-07-28 12:48:15.568198] E [syncdutils(/bricks/brick0/master_brick0):296:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 204, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 780, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1566, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 570, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1211, in crawl
    self.changelogs_batch_process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1118, in changelogs_batch_process
    self.process(batch)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1001, in process
    self.process_change(change, done, retry)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 942, in process_change
    failures = self.slave.server.entry_ops(entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 61] No data available
[2017-07-28 12:48:15.572015] I [syncdutils(/bricks/brick0/master_brick0):237:finalize] <top>: exiting.
[2017-07-28 12:48:15.575790] I [repce(/bricks/brick0/master_brick0):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-07-28 12:48:15.576379] I [syncdutils(/bricks/brick0/master_brick0):237:finalize] <top>: exiting.
[2017-07-28 12:48:15.592614] I [gsyncdstatus(monitor):240:set_worker_status] GeorepStatus: Worker Status: Faulty
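
The second worker hits the generic "No data available" (errno 61, ENODATA), which on Linux is also what getxattr returns for a missing extended attribute. A minimal sketch, assuming the error originates from a gfid xattr lookup on the slave for an entry whose xattr is already gone (the xattr name is illustrative, not taken from this log):

import errno
import os

def gfid_or_none(path):
    # Hypothetical helper: look up the entry's gfid but map a missing
    # xattr (ENODATA) to None instead of raising.
    try:
        return os.getxattr(path, "glusterfs.gfid.string")
    except OSError as e:
        if e.errno == errno.ENODATA:
            return None       # xattr absent; treat the entry as not present
        raise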



Version-Release number of selected component (if applicable):
=============================================================

glusterfs-geo-replication-3.8.4-35.el6rhs.x86_64


How reproducible:
=================

Ran the same cases more than 10 times on the same build; the crash was observed only once.
