Bug 1258379 - Geo-replication fails with OSError: [Errno 16] Device or resource busy
Geo-replication fails with OSError: [Errno 16] Device or resource busy
Status: POST
Product: GlusterFS
Classification: Community
Component: geo-replication (Show other bugs)
mainline
x86_64 Linux
high Severity high
: ---
: ---
Assigned To: Aravinda VK
: Triaged
Depends On: 1181870
Blocks:
  Show dependency treegraph
 
Reported: 2015-08-31 05:00 EDT by Aravinda VK
Modified: 2016-03-28 18:17 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1181870
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Aravinda VK 2015-08-31 05:00:16 EDT
+++ This bug was initially created as a clone of Bug #1181870 +++

Description of problem:

Brand new 2 brick replica volume, geo-replication set up before any data went into the volume. rsync'd about 40GB of data from a range of small (1k) to large (800MB) files.

I waited for geo-replication to finish and then ran an 'rsync --update' to catch some files that had changed on the source, geo-replication went into Faulty mode.

Error log from gluster brick server:

[2015-01-14 01:05:41.227730] I [monitor(monitor):163:monitor] Monitor: ------------------------------------------------------------
[2015-01-14 01:05:41.227941] I [monitor(monitor):164:monitor] Monitor: starting gsyncd worker
[2015-01-14 01:05:41.329799] I [changelogagent(agent):72:__init__] ChangelogAgent: Agent listining...
[2015-01-14 01:05:41.330574] I [gsyncd(/export/sitedata):633:main_i] <top>: syncing: gluster://localhost:ds0 -> ssh://root@gaige:gluster://localhost:geo-repl
[2015-01-14 01:05:45.691020] I [master(/export/sitedata):75:gmaster_builder] <top>: setting up xsync change detection mode
[2015-01-14 01:05:45.691421] I [master(/export/sitedata):403:__init__] _GMaster: using 'rsync' as the sync engine
[2015-01-14 01:05:45.692125] I [master(/export/sitedata):75:gmaster_builder] <top>: setting up changelog change detection mode
[2015-01-14 01:05:45.692292] I [master(/export/sitedata):403:__init__] _GMaster: using 'rsync' as the sync engine
[2015-01-14 01:05:45.692955] I [master(/export/sitedata):75:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2015-01-14 01:05:45.693108] I [master(/export/sitedata):403:__init__] _GMaster: using 'rsync' as the sync engine
[2015-01-14 01:05:45.703636] I [master(/export/sitedata):1301:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/ds0/ssh%3A%2F%2Froot%4010.42.0.8%3Agluster%3A%2F%2F127.0.0.1%3Ageo-repl/669c18441eb6dec3d3bb2994485b6206/xsync
[2015-01-14 01:05:45.703836] I [resource(/export/sitedata):1318:service_loop] GLUSTER: Register time: 1421157945
[2015-01-14 01:05:45.721772] I [master(/export/sitedata):480:crawlwrap] _GMaster: primary master with volume id 88bb6b5d-cc89-47cf-a74f-1a9c17f194d7 ...
[2015-01-14 01:05:45.731245] I [master(/export/sitedata):491:crawlwrap] _GMaster: crawl interval: 1 seconds
[2015-01-14 01:05:45.747462] I [master(/export/sitedata):1216:crawl] _GMaster: starting history crawl... turns: 1, stime: (1421156284, 0)
[2015-01-14 01:05:45.749725] I [master(/export/sitedata):1245:crawl] _GMaster: slave's time: (1421156284, 0)
[2015-01-14 01:05:48.813282] E [repce(/export/sitedata):207:__call__] RepceClient: call 20573:139902371477312:1421157947.51 (entry_ops) failed on peer with OSError
[2015-01-14 01:05:48.813823] E [syncdutils(/export/sitedata):270:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 164, in main
    main_i()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py", line 643, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1325, in service_loop
    g3.crawlwrap(oneshot=True)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 529, in crawlwrap
    self.crawl(no_stime_update=no_stime_update)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1256, in crawl
    self.process(changes)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 946, in process
    self.process_change(change, done, retry)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 910, in process_change
    self.slave.server.entry_ops(entries)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
    return self.ins(self.meth, *a)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
    raise res
OSError: [Errno 16] Device or resource busy
[2015-01-14 01:05:48.815470] I [syncdutils(/export/sitedata):214:finalize] <top>: exiting.
[2015-01-14 01:05:48.818737] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-01-14 01:05:48.819194] I [syncdutils(agent):214:finalize] <top>: exiting.
[2015-01-14 01:05:49.695061] I [monitor(monitor):222:monitor] Monitor: worker(/export/sitedata) died in startup phase


Error log from geo-replication slave:

[2015-01-14 01:07:07.787431] I [gsyncd(slave):633:main_i] <top>: syncing: gluster://localhost:geo-repl
[2015-01-14 01:07:08.919622] I [resource(slave):761:service_loop] GLUSTER: slave listening
[2015-01-14 01:07:11.168715] E [repce(slave):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 658, in entry_ops
    errno_wrap(os.rename, [entry, en], [ENOENT, EEXIST])
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 470, in errno_wrap
    return call(*arg)
OSError: [Errno 16] Device or resource busy
[2015-01-14 01:07:11.176639] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-01-14 01:07:11.177026] I [syncdutils(slave):214:finalize] <top>: exiting.



Version-Release number of selected component (if applicable):
glusterfs-server 3.6.1-ubuntu1~trusty1 from the Ubuntu launchpad PPA


How reproducible:
Rsync some data, change some files from the source and rsync --update


Steps to Reproduce:
1. rsync files from an external location into gluster
2. change some files in the external location, rsync --update
3. geo-replication failed

Actual results:

geo-replication fails

Expected results:

geo-replication works

Additional info:

This was a fresh gluster 3.6.1 installation, no upgrade was performed and the volume was empty when geo-replication was setup.

--- Additional comment from Aravinda VK on 2015-01-15 01:24:16 EST ---

This error comes if Rebalance is happening in Slave nodes. DHT layer holds lock during Rebalance, hence rename fails with this error. Since Geo-rep monitor restarts the worker dies, this error logged again. Once the Rebalance complete, Geo-rep become normal and continue syncing data from Master to Slave.

We will enhance Geo-rep to handle this situation smoothly.
Comment 1 Aravinda VK 2016-01-13 01:22:25 EST
Upstream Patch is sent http://review.gluster.org/12060
Comment 2 Mike McCune 2016-03-28 18:17:27 EDT
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune@redhat.com with any questions

Note You need to log in before you can comment on or make changes to this bug.