Bug 764642 (GLUSTER-2910)

Summary: EOFError on geo-replication
Product: [Community] GlusterFS
Reporter: Jacob Shucart <jacob>
Component: geo-replication
Assignee: kaushik <kbudiger>
Status: CLOSED WORKSFORME
Severity: medium
Priority: medium
Version: 3.2.0
CC: aavati, bala, csaba, gluster-bugs, platform, vijay
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix

Description Jacob Shucart 2011-05-17 16:37:23 UTC
I have two Gluster virtual appliances running on ESXi at a prospect's site. They were both updated using gluster-app-migrate 3.2, and I verified that all of the directories (/var/log/glusterfs/geo-replication*) are there. I ran /etc/init.d/glusterd. I have a simple volume on one server and on the remote server.

When I run:

gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 start

The status comes back faulty. I verified that passwordless SSH is working, NTP is in sync, and all the dependencies are there. When I look at the logs, it gives an EOFError. I tried creating new volumes with no data in them, and I get the same result. Below are the contents of the logs:

[2011-05-17 11:25:57.867026] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-17 11:25:57.906320] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test -> ssh://172.16.102.244::glustervol1
[2011-05-17 11:25:57.973546] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError
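
For reference, the faulty state can be checked with something like the following (assuming the same volume and slave as above):

# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 status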

Comment 1 Csaba Henk 2011-05-17 16:39:50 UTC
There are four related logs:

- master gsyncd log
- master glusterfs log
- slave gsyncd log
- slave glusterfs log

The best way you can help would be to:

1. set them all to DEBUG loglevel
2. locate the logfiles and post them

For 1., perform the following steps before starting geo-rep:

- On master machine:

# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config log-level DEBUG
# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config gluster-log-level DEBUG

- On slave (172.16.102.244):

# gluster volume geo-replication :glustervol1 config log-level DEBUG
# gluster volume geo-replication :glustervol1 config gluster-log-level DEBUG
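
Optionally, you can confirm a setting took effect by querying the option without a value, which prints its current value (the same pattern as the log-file queries below). For example, on the master:

# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config log-level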

For 2.:

- locating master side logfiles:

  - On master:

  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config log-file
  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config gluster-log-file

- locating slave side logfiles (i.e. they should be looked for on 172.16.102.244):

  - On master:

  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config session-owner

  - On slave (172.16.102.244):

  # gluster volume geo-replication :glustervol1 config log-file
  # gluster volume geo-replication :glustervol1 config gluster-log-file

  The outputs will include the parameter ${session-owner}. Substitute the value you got above on the master side for it to get the actual logfile paths.
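
  A hypothetical illustration (the UUID and filename below are made up): if the master reports a session-owner of aaaabbbb-cccc-dddd-eeee-ffff00001111 and the slave's log-file setting is /var/log/glusterfs/geo-replication-slaves/${session-owner}:remote.log, then the actual slave gsyncd logfile is /var/log/glusterfs/geo-replication-slaves/aaaabbbb-cccc-dddd-eeee-ffff00001111:remote.log.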

All four log files should exist if the given geo-rep session can start up successfully. If any of them is missing, that's a sign of an invocation problem, which is valuable information as well.

Comment 2 kaushik 2011-10-20 02:56:54 UTC
EOFError means the RPC connection between master and slave has failed. We are not seeing this issue on the master when the setup is as expected. We have listed the possible reasons for EOFError in the documentation: http://gluster.com/community/documentation/index.php/Gluster_3.2:_Troubleshooting_Geo-replication
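
A rough manual check of the SSH leg of that connection from the master (assuming root is the remote user, as in a typical setup) would be something like:

# ssh root@172.16.102.244 gluster --version

If that prompts for a password or fails outright, the master's gsyncd cannot talk to the slave and the recv() in the traceback sees EOF.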

Please reopen the bug if the issue is seen again with all the prerequisites satisfied.