Bug 764615 (GLUSTER-2883) - VSA 3.2 geo-replication always comes back faulty
Summary: VSA 3.2 geo-replication always comes back faulty
Keywords:
Status: CLOSED WORKSFORME
Alias: GLUSTER-2883
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Assignee: kaushik
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-05-05 16:01 UTC by Jacob Shucart
Modified: 2011-09-28 08:55 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments (Terms of Use)
Log file for geo-replication (14.76 KB, application/octet-stream)
2011-05-06 12:06 UTC, Jacob Shucart

Description Jacob Shucart 2011-05-05 16:01:01 UTC
I have 3.2 on all my virtual appliances, SSH keys set up from all my storage nodes to my virtual appliance that is supposed to be the "remote" system, and I have tried multiple commands.  The logs are full of python errors which don't tell me anything:

OSError: [Errno 22] Invalid argument
[2011-05-05 15:55:38.177349] I [monitor(monitor):19:set_state] Monitor: new state: faulty
[2011-05-05 15:55:48.180674] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-05-05 15:55:48.180843] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-05-05 15:55:48.222249] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:mirror -> ssh://172.17.30.158::mirror-dr
[2011-05-05 15:55:57.911958] E [syncdutils:131:exception] <top>: FAIL:
Traceback (most recent call last):
  File "/opt/glusterfs/3.2.0/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
  File "/opt/glusterfs/3.2.0/local/libexec//glusterfs/python/syncdaemon/gsyncd.py", line 295, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 401, in service_loop
    GMaster(self, args[0]).crawl_loop()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 116, in crawl_loop
    self.crawl()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 176, in crawl
    volinfo_sys = self.get_sys_volinfo()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/master.py", line 22, in get_sys_volinfo
    fgn_vis, nat_vi = self.master.server.foreign_volume_infos(), \
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 349, in native_volume_info
    return cls._attr_unpack_dict('.'.join([cls.GX_NSPACE, 'volume-mark']))
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/resource.py", line 313, in _attr_unpack_dict
    buf = Xattr.lgetxattr('.', xattr, struct.calcsize(fmt_string))
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 34, in lgetxattr
    return cls._query_xattr( path, siz, 'lgetxattr', attr)
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 26, in _query_xattr
    cls.raise_oserr()
  File "/opt/glusterfs/3.2.0/local/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 16, in raise_oserr
    raise OSError(errn, os.strerror(errn))
OSError: [Errno 22] Invalid argument
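For context on the final frame of the traceback: the raise_oserr helper simply converts a raw errno left behind by a failed C-level xattr call into a Python OSError, which is why the log shows only the generic "[Errno 22] Invalid argument" rather than anything gluster-specific. A minimal sketch of that pattern (standalone illustration, not the shipped libcxattr.py) is:

```python
import os

def raise_oserr(errn):
    # Convert a raw errno from a failed C call into a Python OSError,
    # mirroring the last frame of the traceback above.
    raise OSError(errn, os.strerror(errn))

try:
    raise_oserr(22)  # EINVAL, as seen in the log
except OSError as e:
    print(e)  # [Errno 22] Invalid argument
```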

Comment 1 Csaba Henk 2011-05-06 03:41:28 UTC
(In reply to comment #0)
> I have 3.2 on all my virtual appliances, SSH keys set up from all my storage
> nodes to my virtual appliance that is supposed to be the "remote" system, and I
> have tried multiple commands.  The logs are full of python errors which don't
> tell me anything:

What we see here is that gsyncd (the python program which does the synchronization) is crashing due to some irregular behavior of extended attributes in glusterfs. Most likely, the error is in glusterfs. Can you describe your setup? Also, can you set the log level of the glusterfs instance used by gsyncd to debug and send us the resultant logs? This can be done as follows:

# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr config gluster-log-level DEBUG

and to locate the log file:

# gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr config gluster-log-file

Comment 2 Jacob Shucart 2011-05-06 12:06:25 UTC
All of my 5 systems are Gluster Virtual Storage Appliances running on XenServer.  I have not installed any third party packages.  I have 4 servers set up in a mirror configuration and 1 server that is also a Gluster VSA that I'm using for the remote:

Volume Name: mirror
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 172.17.30.109:/export/hdb/mirror
Brick2: 172.17.30.112:/export/hdb/mirror
Brick3: 172.17.30.135:/export/hdb/mirror
Brick4: 172.17.30.136:/export/hdb/mirror
Options Reconfigured:
geo-replication.indexing: on


The remote system is 172.17.30.158

I have not changed the root password on any of these systems, so you can login with a password of "syst3m" if you need to troubleshoot.  I am attaching the log file for you.

Comment 3 Jacob Shucart 2011-05-06 12:06:55 UTC
Created attachment 480


Attached is the log file.

Comment 4 Jacob Shucart 2011-05-06 18:14:07 UTC
This should be considered a P1 blocker.  This will affect anybody who pays us for software and wants to use this feature...

Comment 5 Jacob Shucart 2011-05-06 18:14:32 UTC
a

Comment 6 kaushik 2011-05-09 10:03:01 UTC
(In reply to comment #4)
> This should be considered a P1 blocker.  This will affect anybody who pays us
> for software and wants to use this feature...

The setup is working as of now. We had to start NTP daemons across all the servers in the cluster, since there was a time lag between them, and then restart the geo-replication daemons.

We would also like to look at the AMI setup where geo-replication is not working.
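The time-sync fix described above amounts to something like the following on every node in the cluster (the NTP server and service names are assumptions for these appliances; the volume and slave URL are taken from this report):

```
# On each Gluster node: correct the clock, then keep it in sync.
ntpdate pool.ntp.org          # one-shot correction of the time lag
service ntpd start            # keep clocks synchronized from now on
chkconfig ntpd on             # start NTP at boot

# Then restart the geo-replication session so gsyncd starts with consistent times:
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr stop
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr start
```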

Comment 7 Jacob Shucart 2011-05-09 12:47:38 UTC
Any AMI setup will do.  Nothing special was done to cause the issue...

1. Boot up some AMIs and update them to 3.2.
2. Configure geo-replication

Can we add some error messaging somewhere that tells people that the time is out of sync and that's why things aren't working?
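For reference, the reproduction steps above come down to starting a session and watching its state (volume and slave names taken from this report; assumes passwordless SSH is already configured):

```
# Start a geo-replication session from volume "mirror" to the
# slave volume "mirror-dr" on 172.17.30.158.
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr start

# Check the session state; with skewed clocks it cycles back to "faulty".
gluster volume geo-replication mirror ssh://172.17.30.158::mirror-dr status
```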

Comment 8 kaushik 2011-06-03 06:20:11 UTC
(In reply to comment #7)
> Any AMI setup will do.  Nothing special was done to cause the issue...
> 
> 1. Boot up some AMIs and update them to 3.2.
> 2. Configure geo-replication
> 
> Can we add some error messaging somewhere that tells people that the time is
> out of sync and that's why things aren't working?


Can you confirm the status with respect to the AMI setup? We cannot reproduce the issue you described.
The issue could be the one observed in http://bugs.gluster.com/show_bug.cgi?id=2901#c4.

Comment 9 Amar Tumballi 2011-09-28 05:55:58 UTC
We are not able to reproduce the issue, and the later versions are running fine on AMI for us.

