Bug 764615 (GLUSTER-2883) - VSA 3.2 geo-replication always comes back faulty
Summary: VSA 3.2 geo-replication always comes back faulty
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-2883
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: kaushik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-05 16:01 UTC by Jacob Shucart
Modified: 2011-09-28 08:55 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)
Log file for geo-replication (14.76 KB, application/octet-stream)
2011-05-06 12:06 UTC, Jacob Shucart

Description Jacob Shucart 2011-05-05 16:01:01 UTC
I have 3.2 on all my virtual appliances, SSH keys set up from all my storage nodes to my virtual appliance that is supposed to be the "remote" system, and I have tried multiple commands.  The logs are full of python errors which don't tell me anything:

OSError: [Errno 22] Invalid argument
[2011-05-05 15:55:38.177349] I [monitor(monitor):19:set_state] Monitor: new state: faulty
[2011-05-05 15:55:48.180674] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-05-05 15:55:48.180843] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-05 15:55:48.222249] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:mirror -> ssh://172.17.30.158::mirror-dr
[2011-05-05 15:55:57.911958] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.0/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
  File "/opt/glusterfs/3.2.0/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 295, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 401, in service_loop
    GMaster(self, args[0]).crawl_loop()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 116, in crawl_loop
    self.crawl()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 176, in crawl
    volinfo_sys = self.get_sys_volinfo()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 22, in get_sys_volinfo
    fgn_vis, nat_vi = self.master.server.foreign_volume_infos(), \
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 349, in native_volume_info
    return cls._attr_unpack_dict('.'.join([cls.GX_NSPACE, 'volume-mark']))
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 313, in _attr_unpack_dict
    buf = Xattr.lgetxattr('.', xattr, struct.calcsize(fmt_string))
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 34, in lgetxattr
    return cls._query_xattr( path, siz, 'lgetxattr', attr)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
    cls.raise_oserr()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 22] Invalid argument
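For context on the final frame of the traceback: the raise_oserr helper simply converts a raw errno left behind by a failed C-level xattr call into a Python OSError, which is why the log shows only the generic "[Errno 22] Invalid argument" rather than anything gluster-specific. A minimal sketch of that pattern (standalone illustration, not the shipped libcxattr.py) is:

```python
import os

def raise_oserr(errn):
    # Convert a raw errno from a failed C call into a Python OSError,
    # mirroring the last frame of the traceback above.
    raise OSError(errn, os.strerror(errn))

try:
    raise_oserr(22)  # EINVAL, as seen in the log
except OSError as e:
    print(e)  # [Errno 22] Invalid argument
```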

Comment 1 Csaba Henk 2011-05-06 03:41:28 UTC
(In reply to comment #0)
> I have 3.2 on all my virtual appliances, SSH keys set up from all my storage
> nodes to my virtual appliance that is supposed to be the "remote" system, and I
> have tried multiple commands.  The logs are full of python errors which don't
> tell me anything:

What we see here is that gsyncd (the python program which does the synchronization) is crashing due to some irregular behavior of extended attributes in glusterfs. Most likely, the error is in glusterfs. Can you describe your setup? Also, can you set the log level of the glusterfs instance used by gsyncd to debug and send us the resultant logs? This can be done as follows:

# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr config gluster-log-level DEBUG

and to locate the log file:

# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr config gluster-log-file

Comment 2 Jacob Shucart 2011-05-06 12:06:25 UTC
All of my 5 systems are Gluster Virtual Storage Appliances running on XenServer.  I have not installed any third party packages.  I have 4 servers set up in a mirror configuration and 1 server that is also a Gluster VSA that I'm using for the remote:

Volume Name: mirror
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.17.30.109:/export/hdb/mirror
Brick2: 172.17.30.112:/export/hdb/mirror
Brick3: 172.17.30.135:/export/hdb/mirror
Brick4: 172.17.30.136:/export/hdb/mirror
Options Reconfigured:
geo-replication.indexing: on


The remote system is 172.17.30.158

I have not changed the root password on any of these systems, so you can login with a password of "syst3m" if you need to troubleshoot.  I am attaching the log file for you.

Comment 3 Jacob Shucart 2011-05-06 12:06:55 UTC
Created attachment 480


Attached is the log file.

Comment 4 Jacob Shucart 2011-05-06 18:14:07 UTC
This should be considered a P1 blocker.  This will affect anybody who pays us for software and wants to use this feature...

Comment 5 Jacob Shucart 2011-05-06 18:14:32 UTC
a

Comment 6 kaushik 2011-05-09 10:03:01 UTC
(In reply to comment #4)
> This should be considered a P1 blocker.  This will affect anybody who pays us
> for software and wants to use this feature...

The setup is working as of now. We had to start NTP daemons across all the servers in the cluster, since there was a time lag between them, and then restart the geo-replication daemons.

We would also like to look at the AMI setup where geo-replication is not working.
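The time-sync fix described above amounts to something like the following on every node in the cluster (the NTP server and service names are assumptions for these appliances; the volume and slave URL are taken from this report):

```
# On each Gluster node: correct the clock, then keep it in sync.
ntpdate pool.ntp.org          # one-shot correction of the time lag
service ntpd start            # keep clocks synchronized from now on
chkconfig ntpd on             # start NTP at boot

# Then restart the geo-replication session so gsyncd starts with consistent times:
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr stop
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr start
```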

Comment 7 Jacob Shucart 2011-05-09 12:47:38 UTC
Any AMI setup will do.  Nothing special was done to cause the issue...

1. Boot up some AMIs and update them to 3.2.
2. Configure geo-replication

Can we add some error messaging somewhere that tells people that the time is out of sync and that's why things aren't working?
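For reference, the reproduction steps above come down to starting a session and watching its state (volume and slave names taken from this report; assumes passwordless SSH is already configured):

```
# Start a geo-replication session from volume "mirror" to the
# slave volume "mirror-dr" on 172.17.30.158.
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr start

# Check the session state; with skewed clocks it cycles back to "faulty".
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr status
```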

Comment 8 kaushik 2011-06-03 06:20:11 UTC
(In reply to comment #7)
> Any AMI setup will do.  Nothing special was done to cause the issue...
> 
> 1. Boot up some AMIs and update them to 3.2.
> 2. Configure geo-replication
> 
> Can we add some error messaging somewhere that tells people that the time is
> out of sync and that's why things aren't working?


Can you confirm the status with respect to the AMI setup? We cannot reproduce the issue you described.
The issue could be the one observed in http://bugs.gluster.com/show_bug.cgi?id=2901#c4.

Comment 9 Amar Tumballi 2011-09-28 05:55:58 UTC
We are not able to reproduce the issue, and the later versions are running fine on AMI for us.

