Bug 764642 (GLUSTER-2910) - EOFError on geo-replication
Summary: EOFError on geo-replication
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-2910
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: kaushik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-17 16:37 UTC by Jacob Shucart
Modified: 2011-10-20 05:56 UTC
CC: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Jacob Shucart 2011-05-17 16:37:23 UTC
I have two Gluster virtual appliances running on ESXi at a prospect's site.  Both were updated using gluster-app-migrate 3.2, and I verified that all of the expected directories (/var/log/glusterfs/geo-replication*) are present.  I ran /etc/init.d/glusterd.  I have a simple volume on one server and on the remote server.

When I run:

gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 start

The session comes back faulty.  I verified that passwordless SSH is working, NTP is in sync, and all the dependencies are in place.  When I look at the logs, I see an EOFError.  I tried creating new volumes with no data in them and get the same result.  Below are the contents of the logs:

[2011-05-17 11:25:57.867026] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-17 11:25:57.906320] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test -> ssh://172.16.102.244::glustervol1
[2011-05-17 11:25:57.973546] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
    tf(*aa)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen
    rid, exc, res = recv(self.inf)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv
    return pickle.load(inf)
EOFError

Comment 1 Csaba Henk 2011-05-17 16:39:50 UTC
There are four related logs:

- master gsyncd log
- master glusterfs log
- slave gsyncd log
- slave glusterfs log

The best way you can help would be to:

1. set them all to DEBUG loglevel
2. locate the logfiles and post them

For 1., perform the following steps before starting geo-rep:

- On master machine:

# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config log-level DEBUG
# gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config gluster-log-level DEBUG

- On slave (172.16.102.244):

# gluster volume geo-replication :glustervol1 config log-level DEBUG
# gluster volume geo-replication :glustervol1 config gluster-log-level DEBUG

For 2.:

- locating master side logfiles:

  - On master:

  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config log-file
  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config gluster-log-file

- locating slave side logfiles (i.e., they should be looked for on 172.16.102.244):

  - On master:

  # gluster volume geo-replication glustervol1 172.16.102.244::glustervol1 config session-owner

  - On slave (172.16.102.244):

  # gluster volume geo-replication :glustervol1 config log-file
  # gluster volume geo-replication :glustervol1 config gluster-log-file

  The outputs will include the placeholder ${session-owner}. Substitute the value you obtained above on the master side to get the actual logfile paths.
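  The substitution step above can be sketched as follows. Note that the UUID and the path template below are made-up examples for illustration only; use the actual values the config commands print on your systems:

```shell
# Hypothetical values -- substitute what the config commands actually print.
SESSION_OWNER="0b0b7cb8-1e52-4b9a-8b0e-111111111111"   # from the master-side session-owner query
TEMPLATE='/var/log/glusterfs/geo-replication-slaves/${session-owner}:glustervol1.log'  # from the slave-side log-file query

# Replace the ${session-owner} placeholder with the real value to get the path.
LOGFILE=$(printf '%s' "$TEMPLATE" | sed "s/\${session-owner}/$SESSION_OWNER/")
echo "$LOGFILE"
```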

All four log files should exist if the given geo-rep session can start up successfully. If any is missing, that is a sign of an invocation problem, i.e., valuable information as well.

Comment 2 kaushik 2011-10-20 02:56:54 UTC
EOFError means the RPC connection between master and slave has failed. We are not seeing the issue on the master when the setup is as expected. The reasons for EOFError are listed in the documentation: http://gluster.com/community/documentation/index.php/Gluster_3.2:_Troubleshooting_Geo-replication
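The mechanism behind the traceback can be seen in a minimal sketch (this is not the gsyncd code itself, just an illustration): repce.py's recv() is essentially a pickle.load() on the stream connected to the peer, so if the slave side dies or the SSH channel closes before a reply arrives, the stream hits end-of-file and pickle.load() raises the EOFError seen in the log.

```python
import io
import pickle

def recv(inf):
    # Mirrors the shape of repce.py's recv(): deserialize one message
    # from the peer's stream.
    return pickle.load(inf)

# An empty stream simulates the peer closing the connection without
# sending anything -- pickle.load() then raises EOFError.
try:
    recv(io.BytesIO(b""))
except EOFError:
    print("EOFError: peer closed the connection before replying")
```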

Please reopen the bug if the issue occurs again with all the prerequisites satisfied.

