Bug 764642 (GLUSTER-2910) - EOFError on geo-replication
Summary: EOFError on geo-replication
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-2910
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: kaushik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-17 16:37 UTC by Jacob Shucart
Modified: 2011-10-20 05:56 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Jacob Shucart 2011-05-17 16:37:23 UTC
I have two Gluster virtual appliances running on ESXi at a prospect's site.  Both were updated using gluster-app-migrate 3.2, and I verified that all of the expected directories (/var/log/glusterfs/geo-replication*) are present.  I ran /etc/init.d/glusterd.  I have a simple volume on one server and on the remote server.

When I run:

gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 start

The session comes back faulty.  I verified that passwordless SSH is working, NTP is in sync, and all the dependencies are in place.  When I look at the logs, I see an EOFError.  I tried creating new volumes with no data in them and get the same result.  Below are the contents of the logs:

[2011-05-17 11:25:57.867026] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-17 11:25:57.906320] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test -> ssh://172.16.102.244::glustervol1
[2011-05-17 11:25:57.973546] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError

Comment 1 Csaba Henk 2011-05-17 16:39:50 UTC
There are four related logs:

- master gsyncd log
- master glusterfs log
- slave gsyncd log
- slave glusterfs log

The best way you can help would be to:

1. set them all to DEBUG loglevel
2. locate the logfiles and post them

For 1., perform the following steps before starting geo-rep:

- On master machine:

# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config log-level DEBUG
# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config gluster-log-level DEBUG

- On slave (172.16.102.244):

# gluster volume geo-replication :glustervol1 config log-level DEBUG
# gluster volume geo-replication :glustervol1 config gluster-log-level DEBUG

For 2.:

- locating master side logfiles:

  - On master:

  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config log-file
  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config gluster-log-file

- locating slave side logfiles (i.e., they should be looked for on 172.16.102.244):

  - On master:

  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config session-owner

  - On slave (172.16.102.244):

  # gluster volume geo-replication :glustervol1 config log-file
  # gluster volume geo-replication :glustervol1 config gluster-log-file

  The outputs will include the placeholder ${session-owner}. Substitute the value you obtained above on the master side to get the actual logfile paths.
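  The substitution step above can be sketched as follows. Note that the UUID and the path template below are made-up examples for illustration only; use the actual values the config commands print on your systems:

```shell
# Hypothetical values -- substitute what the config commands actually print.
SESSION_OWNER="0b0b7cb8-1e52-4b9a-8b0e-111111111111"   # from the master-side session-owner query
TEMPLATE='/var/log/glusterfs/geo-replication-slaves/${session-owner}:glustervol1.log'  # from the slave-side log-file query

# Replace the ${session-owner} placeholder with the real value to get the path.
LOGFILE=$(printf '%s' "$TEMPLATE" | sed "s/\${session-owner}/$SESSION_OWNER/")
echo "$LOGFILE"
```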

All four log files should exist if the given geo-rep session can start up successfully. If any is missing, that is a sign of an invocation problem, i.e., valuable information as well.

Comment 2 kaushik 2011-10-20 02:56:54 UTC
EOFError means the RPC connection between master and slave has failed. We are not seeing the issue on the master when the setup is as expected. The reasons for EOFError are listed in the documentation: http://gluster.com/community/documentation/index.php/Gluster_3.2:_Troubleshooting_Geo-replication
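The mechanism behind the traceback can be seen in a minimal sketch (this is not the gsyncd code itself, just an illustration): repce.py's recv() is essentially a pickle.load() on the stream connected to the peer, so if the slave side dies or the SSH channel closes before a reply arrives, the stream hits end-of-file and pickle.load() raises the EOFError seen in the log.

```python
import io
import pickle

def recv(inf):
    # Mirrors the shape of repce.py's recv(): deserialize one message
    # from the peer's stream.
    return pickle.load(inf)

# An empty stream simulates the peer closing the connection without
# sending anything -- pickle.load() then raises EOFError.
try:
    recv(io.BytesIO(b""))
except EOFError:
    print("EOFError: peer closed the connection before replying")
```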

Please reopen the bug if the issue occurs again with all the prerequisites satisfied.

