Bug 1022582

Summary: dist-geo-rep: Worker process crashing because of "Invalid Argument" error in slave
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: geo-replication
Assignee: Ajeet Jha <ajha>
Status: CLOSED ERRATA
QA Contact: M S Vishwanath Bhat <vbhat>
Severity: urgent
Priority: high
Version: 2.1
CC: aavati, ajha, amarts, csaba, grajaiya, mzywusko, nsathyan, vagarwal, vshankar
Keywords: ZStream
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-3.4.0.38rhs-1
Doc Type: Bug Fix
Last Closed: 2013-11-27 15:43:46 UTC
Type: Bug
Attachments:
  Logs from master node
  Log from slave node

Description M S Vishwanath Bhat 2013-10-23 14:58:49 UTC
Created attachment 815447 [details]
Logs from master node

Description of problem:
When I started the geo-rep session and copied some files onto the master, the sessions on two of the nodes went into the faulty state. The workers on both of those machines are crashing because of an Errno 22 (EINVAL) raised on the slave.

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.36rhs-1.el6rhs.x86_64


How reproducible:
Hit once out of many tries.

Steps to Reproduce:
1. Create and start a geo-rep session between 2*2 dist-rep master and slave volumes.
2. Now cp -r /etc/ <master_mount_point>
3. Run geo-rep status or geo-rep status detail

Actual results:
# gluster v geo master hornet::slave status detail
 
                                        MASTER: master  SLAVE: hornet::slave
 
NODE                         HEALTH    UPTIME      FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING   
--------------------------------------------------------------------------------------------------------------------
spitfire.blr.redhat.com      faulty    N/A         N/A            N/A              N/A              N/A               
typhoon.blr.redhat.com       Stable    02:20:19    0              0                0Bytes           0                 
mustang.blr.redhat.com       Stable    02:20:19    0              0                0Bytes           0                 
harrier.blr.redhat.com       faulty    N/A         N/A            N/A              N/A              N/A               



Expected results:
Status should not go into the 'faulty' state.

Additional info:


Logs in master node

[2013-10-23 20:20:43.561261] I [master(/rhs/bricks/brick2):345:crawlwrap] _GMaster: crawl interval: 3 seconds
[2013-10-23 20:20:51.155394] E [repce(/rhs/bricks/brick2):188:__call__] RepceClient: call 31994:139915356534528:1382539850.02 (meta_ops) failed on peer with OSError
[2013-10-23 20:20:51.156507] E [syncdutils(/rhs/bricks/brick2):207:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 150, in main
    main_i()
  File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 530, in main_i
    local.service_loop(*[r for r in [remote] if r])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1074, in service_loop
    g2.crawlwrap()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 369, in crawlwrap
    self.crawl()
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 799, in crawl
    self.process(changes)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 760, in process
    if self.process_change(change, done, retry):
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 740, in process_change
    self.slave.server.meta_ops(meta_entries)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 204, in __call__
    return self.ins(self.meth, *a)
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 189, in __call__
    raise res
OSError: [Errno 22] Invalid argument: '.gfid/1da8224d-aa34-433b-8abf-b07a13e5cfd2'
[2013-10-23 20:20:51.159435] I [syncdutils(/rhs/bricks/brick2):159:finalize] <top>: exiting.
[2013-10-23 20:21:01.268341] I [monitor(monitor):129:monitor] Monitor: ------------------------------------------------------------
[2013-10-23 20:21:01.269103] I [monitor(monitor):130:monitor] Monitor: starting gsyncd worker
[2013-10-23 20:21:01.531034] I [gsyncd(/rhs/bricks/brick2):520:main_i] <top>: syncing: gluster://localhost:master -> ssh://root@hornet:gluster://localhost:slave
[2013-10-23 20:21:04.483725] I [master(/rhs/bricks/brick2):57:gmaster_builder] <top>: setting up xsync change detection mode
[2013-10-23 20:21:04.487170] I [master(/rhs/bricks/brick2):57:gmaster_builder] <top>: setting up changelog change detection mode
[2013-10-23 20:21:04.490206] I [master(/rhs/bricks/brick2):835:register] _GMaster: xsync temp directory: /var/run/gluster/master/ssh%3A%2F%2Froot%4010.70.43.194%3Agluster%3A%2F%2F127.0.0.1%3Aslave/68fa5cc90f61530aea097cdc78c2b376/xsync
[2013-10-23 20:21:04.659682] I [master(/rhs/bricks/brick2):335:crawlwrap] _GMaster: primary master with volume id c2ba3dc1-58f2-4dad-93af-d08c249923d2 ...
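For context on how the slave-side OSError surfaces in the master's worker: repce ships the remote exception back over the RPC channel and re-raises it locally (the `raise res` frame in repce.py in the traceback above). A minimal sketch of that pattern, simplified and not the actual gsyncd wire protocol (`server_dispatch`, `client_call`, and `chmod_on_gfid` are illustrative names):

```python
import errno
import pickle

def server_dispatch(func, *args):
    """Slave side: run the requested op; on failure, ship the
    exception object back instead of a result."""
    try:
        return ('ok', pickle.dumps(func(*args)))
    except OSError as e:
        return ('err', pickle.dumps(e))

def client_call(func, *args):
    """Master side: unpickle the payload and re-raise a remote
    failure locally, mirroring repce.py's `raise res`."""
    status, payload = server_dispatch(func, *args)
    res = pickle.loads(payload)
    if status == 'err':
        raise res
    return res

def chmod_on_gfid(path, mode):
    # Stand-in for the failing os.chmod on the aux-gfid path.
    raise OSError(errno.EINVAL, 'Invalid argument', path)

caught = None
try:
    client_call(chmod_on_gfid, '.gfid/<uuid>', 0o644)
except OSError as e:
    caught = e  # the master's worker sees the remote EINVAL
```

This is why the master log shows the failure as "failed on peer with OSError": the exception originates on the slave but is re-raised inside the master's worker, which then exits.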



Logs from slave node


[2013-10-23 20:21:33.999643] I [repce(slave):78:service_loop] RepceServer: terminating on reaching EOF.
[2013-10-23 20:21:34.35] I [syncdutils(slave):159:finalize] <top>: exiting.
[2013-10-23 20:21:45.735359] I [gsyncd(slave):520:main_i] <top>: syncing: gluster://localhost:slave
[2013-10-23 20:21:46.827519] I [resource(slave):642:service_loop] GLUSTER: slave listening
[2013-10-23 20:21:50.195013] E [repce(slave):103:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 99, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 545, in meta_ops
    errno_wrap(os.chmod, [go, mode], [ENOENT])
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 383, in errno_wrap
    return call(*arg)
OSError: [Errno 22] Invalid argument: '.gfid/0e3d546c-ca85-4913-8e9f-e3ed822fcf46'
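The slave-side failure escapes because errno_wrap only tolerates the errnos it is passed (here just ENOENT, per the `errno_wrap(os.chmod, [go, mode], [ENOENT])` frame above); EINVAL is not whitelisted, so the OSError propagates back to the master and kills the worker. A minimal sketch of that whitelist behaviour, simplified and not the actual syncdutils code:

```python
import errno

def errno_wrap(call, arg=[], tolerated=[]):
    """Invoke call(*arg); swallow only whitelisted errnos,
    re-raise everything else (simplified errno_wrap sketch)."""
    try:
        return call(*arg)
    except OSError as e:
        if e.errno in tolerated:
            return  # tolerated errno: treated as a no-op
        raise

def chmod_missing(path, mode):
    raise OSError(errno.ENOENT, 'No such file or directory', path)

def chmod_invalid(path, mode):
    raise OSError(errno.EINVAL, 'Invalid argument', path)

# ENOENT is whitelisted, so it is silently tolerated:
tolerated_result = errno_wrap(chmod_missing,
                              ['.gfid/<uuid>', 0o644], [errno.ENOENT])

# EINVAL is not, so it escapes -- this is the crash seen above:
escaped = None
try:
    errno_wrap(chmod_invalid, ['.gfid/<uuid>', 0o644], [errno.ENOENT])
except OSError as e:
    escaped = e
```

Under this reading, a chmod on the virtual `.gfid/<uuid>` path returning EINVAL is enough to fault the whole session, since nothing between errno_wrap and the worker's top level handles it.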

Comment 1 M S Vishwanath Bhat 2013-10-23 14:59:24 UTC
Created attachment 815448 [details]
Log from slave node

Comment 3 M S Vishwanath Bhat 2013-10-24 09:27:55 UTC
It's a regression. I tried with the 35rhs build and it works there; only 36rhs has this issue.

Comment 7 M S Vishwanath Bhat 2013-11-02 11:24:35 UTC
The issue is not reproducible with the glusterfs-3.4.0.38rhs-1.el6rhs.x86_64 build. I followed the same steps mentioned earlier in the bug description and the issue is not hit. Moving to verified.

Comment 9 errata-xmlrpc 2013-11-27 15:43:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1769.html