I have 3.2 on all my virtual appliances, SSH keys set up from all my storage nodes to my virtual appliance that is supposed to be the "remote" system, and I have tried multiple commands. The logs are full of python errors which don't tell me anything:

OSError: [Errno 22] Invalid argument
[2011-05-05 15:55:38.177349] I [monitor(monitor):19:set_state] Monitor: new state: faulty
[2011-05-05 15:55:48.180674] I [monitor(monitor):42:monitor] Monitor: ------------------------------ ------------------------------
[2011-05-05 15:55:48.180843] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-05 15:55:48.222249] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:mirror -> ssh://172.17.30.158::mirror-dr
[2011-05-05 15:55:57.911958] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.0/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
  File "/opt/glusterfs/3.2.0/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 295, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 401, in service_loop
    GMaster(self, args[0]).crawl_loop()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 116, in crawl_loop
    self.crawl()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 176, in crawl
    volinfo_sys = self.get_sys_volinfo()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 22, in get_sys_volinfo
    fgn_vis, nat_vi = self.master.server.foreign_volume_infos(), \
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 349, in native_volume_info
    return cls._attr_unpack_dict('.'.join([cls.GX_NSPACE, 'volume-mark']))
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 313, in _attr_unpack_dict
    buf = Xattr.lgetxattr('.', xattr, struct.calcsize(fmt_string))
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 34, in lgetxattr
    return cls._query_xattr( path, siz, 'lgetxattr', attr)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
    cls.raise_oserr()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 22] Invalid argument
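[Editorial note: the traceback above shows gsyncd failing while reading the volume-mark extended attribute from its glusterfs mount. As a diagnostic sketch only, the same xattr can be probed by hand with getfattr; the mount point /mnt/mirror and the assumption that the namespace resolves to trusted.glusterfs are illustrative, not taken from the report.]

# mount -t glusterfs localhost:/mirror /mnt/mirror
# getfattr -n trusted.glusterfs.volume-mark -e hex /mnt/mirror
# echo $?      (a non-zero exit with "Invalid argument" would match the OSError in the traceback)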
(In reply to comment #0)
> I have 3.2 on all my virtual appliances, SSH keys set up from all my storage
> nodes to my virtual appliance that is supposed to be the "remote" system, and I
> have tried multiple commands. The logs are full of python errors which don't
> tell me anything:

What we see here is that gsyncd (the python program which does the synchronization) is crashing due to some irregular behavior of extended attributes in glusterfs. Most likely, the error is in glusterfs.

Can you describe your setup? Also, can you set the log level of the glusterfs instance used by gsyncd to debug and send us the resulting logs? This can be done as follows:

# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr config gluster-log-level DEBUG

and to locate the log file:

# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr config gluster-log-file
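[Editorial note: a small follow-up sketch for collecting the DEBUG output once the log file has been located. It assumes the config command prints just the path; if it prints anything else, copy the path out manually.]

# LOGFILE=$(gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr config gluster-log-file)
# tail -f "$LOGFILE"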
All of my 5 systems are Gluster Virtual Storage Appliances running on XenServer. I have not installed any third party packages. I have 4 servers set up in a mirror configuration and 1 server that is also a Gluster VSA that I'm using for the remote:

Volume Name: mirror
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.17.30.109:/export/hdb/mirror
Brick2: 172.17.30.112:/export/hdb/mirror
Brick3: 172.17.30.135:/export/hdb/mirror
Brick4: 172.17.30.136:/export/hdb/mirror
Options Reconfigured:
geo-replication.indexing: on

The remote system is 172.17.30.158

I have not changed the root password on any of these systems, so you can login with a password of "syst3m" if you need to troubleshoot. I am attaching the log file for you.
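[Editorial note: for context, a setup like the one described above would typically be created along the following lines. This is only a sketch: the brick paths and the slave name mirror-dr come from the volume info and the traceback, but the exact create/start invocations are assumptions, not commands quoted from the report.]

# gluster volume create mirror replica 2 transport tcp \
    172.17.30.109:/export/hdb/mirror 172.17.30.112:/export/hdb/mirror \
    172.17.30.135:/export/hdb/mirror 172.17.30.136:/export/hdb/mirror
# gluster volume start mirror
# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr start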
Created attachment 480
Attached is the log file.
This should be considered a P1 blocker. This will affect anybody who pays us for software and wants to use this feature...
(In reply to comment #4)
> This should be considered a P1 blocker. This will affect anybody who pays us
> for software and wants to use this feature...

The setup is working as of now. We had to start ntp daemons across the whole cluster, since there was a time lag between the servers in the gluster cluster, and then restart the daemons. We would also like to look at the AMI setup where geo-replication is not working.
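[Editorial note: a minimal sketch of that remediation, run on each node. The ntpd service name and the geo-replication stop/start sequence are assumptions about this appliance setup, not commands quoted from the thread.]

# service ntpd start     (bring the node's clock in sync with the rest of the cluster)
# chkconfig ntpd on      (keep it synced across reboots)
# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr stop
# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr start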
Any AMI setup will do. Nothing special was done to cause the issue...

1. Boot up some AMIs and update them to 3.2.
2. Configure geo-replication.

Can we add some error messaging somewhere that tells people that the time is out of sync and that's why things aren't working?
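[Editorial note: until such an error message exists, clock skew between nodes can be spotted by hand. A throwaway sketch, assuming SSH access as root to the hosts listed earlier in the thread:]

# for h in 172.17.30.109 172.17.30.112 172.17.30.135 172.17.30.136 172.17.30.158; do
>   ssh root@$h 'date +%s'     (compare the epoch timestamps; a gap of more than a few seconds indicates skew)
> done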
(In reply to comment #7)
> Any AMI setup will do. Nothing special was done to cause the issue...
>
> 1. Boot up some AMIs and update them to 3.2.
> 2. Configure geo-replication.
>
> Can we add some error messaging somewhere that tells people that the time is
> out of sync and that's why things aren't working?

Can you confirm the status w.r.t. AMI, because we cannot reproduce the issue you described. The issue could be the one observed in http://bugs.gluster.com/show_bug.cgi?id=2901#c4.
We are not able to reproduce the issue, and the later versions are running fine on AMI for us.