1293595 – [geo-rep]: ChangelogException: [Errno 22] Invalid argument observed upon rebooting the ACTIVE master node

Bug 1293595 - [geo-rep]: ChangelogException: [Errno 22] Invalid argument observed upon rebooting the ACTIVE master node

Summary: [geo-rep]: ChangelogException: [Errno 22] Invalid argument observed upon rebo...

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	geo-replication
Sub Component:
Version:	3.7.6
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Kotresh HR
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	1291988 1292463
Blocks:	glusterfs-3.7.7
TreeView+	depends on / blocked

Reported:	2015-12-22 10:21 UTC by Kotresh HR
Modified:	2016-04-19 07:51 UTC (History)
CC List:	8 users (show)
Fixed In Version:	glusterfs-3.7.7
Clone Of:	1292463
Environment:
Last Closed:	2016-03-22 08:15:21 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Kotresh HR 2015-12-22 10:21:18 UTC

+++ This bug was initially created as a clone of Bug #1292463 +++

+++ This bug was initially created as a clone of Bug #1291988 +++

Description of problem:
=======================

In a geo-rep setup if a ACTIVE master is rebooted observed ChangelogException as:

[2015-12-15 12:30:39.725385] I [resource(/rhgs/brick1/master):1433:service_loop] GLUSTER: Register time: 1450162839
[2015-12-15 12:30:39.741044] I [master(/rhgs/brick1/master):549:crawlwrap] _GMaster: primary master with volume id e560607b-131b-4e35-b6b7-4e9b1e369224 ...
[2015-12-15 12:30:39.744363] I [master(/rhgs/brick1/master):558:crawlwrap] _GMaster: crawl interval: 1 seconds
[2015-12-15 12:30:39.748029] I [master(/rhgs/brick1/master):1173:crawl] _GMaster: starting history crawl... turns: 1, stime: (1450086953, 0), etime: 1450162839
[2015-12-15 12:30:39.748600] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 54, in history
    num_parallel)
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 100, in cl_history_changelog
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 2] No such file or directory
[2015-12-15 12:30:39.749222] E [repce(/rhgs/brick1/master):207:__call__] RepceClient: call 13048:139759777986368:1450162839.75 (history) failed on peer with ChangelogException
[2015-12-15 12:30:39.749366] E [resource(/rhgs/brick1/master):1448:service_loop] GLUSTER: Changelog History Crawl failed, [Errno 2] No such file or directory
[2015-12-15 12:30:39.749612] I [syncdutils(/rhgs/brick1/master):220:finalize] <top>: exiting.
[2015-12-15 12:30:39.753611] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-12-15 12:30:39.753813] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-12-15 12:30:40.705105] I [monitor(monitor):229:monitor] Monitor: ------------------------------------------------------------
[2015-12-15 12:30:40.705337] I [monitor(monitor):230:monitor] Monitor: starting gsyncd worker
[2015-12-15 12:30:40.717370] I [monitor(monitor):290:monitor] Monitor: worker(/rhgs/brick1/master) died in startup phase
[2015-12-15 12:30:40.780814] I [gsyncd(/rhgs/brick2/master):652:main_i] <top>: syncing: gluster://localhost:Master_vol -> ssh://root.35.196:gluster://localhost:Slave_vol
[2015-12-15 12:30:40.781518] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-12-15 12:30:43.313604] I [master(/rhgs/brick2/master):83:gmaster_builder] <top>: setting up xsync change detection mode
[2015-12-15 12:30:43.314054] I [master(/rhgs/brick2/master):404:__init__] _GMaster: using 'rsync' as the sync engine
[2015-12-15 12:30:43.314963] I [master(/rhgs/brick2/master):83:gmaster_builder] <top>: setting up changelog change detection mode
[2015-12-15 12:30:43.315217] I [master(/rhgs/brick2/master):404:__init__] _GMaster: using 'rsync' as the sync engine
[2015-12-15 12:30:43.316255] I [master(/rhgs/brick2/master):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
[2015-12-15 12:30:43.316457] I [master(/rhgs/brick2/master):404:__init__] _GMaster: using 'rsync' as the sync engine
[2015-12-15 12:30:45.333230] I [master(/rhgs/brick2/master):1259:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/Master_vol/ssh%3A%2F%2Froot%4010.70.35.196%3Agluster%3A%2F%2F127.0.0.1%3ASla
ve_vol/78ce16e9d2384148a6abf384c52b8ab1/xsync
[2015-12-15 12:30:45.333684] I [resource(/rhgs/brick2/master):1433:service_loop] GLUSTER: Register time: 1450162845
[2015-12-15 12:30:45.346733] I [master(/rhgs/brick2/master):549:crawlwrap] _GMaster: primary master with volume id e560607b-131b-4e35-b6b7-4e9b1e369224 ...
[2015-12-15 12:30:45.349627] I [master(/rhgs/brick2/master):558:crawlwrap] _GMaster: crawl interval: 1 seconds
[2015-12-15 12:30:45.353549] I [master(/rhgs/brick2/master):1173:crawl] _GMaster: starting history crawl... turns: 1, stime: (1450086953, 0), etime: 1450162845
[2015-12-15 12:30:46.354907] E [repce(agent):117:worker] <top>: call failed: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
    res = getattr(self.obj, rmeth)(*in_data[2:])
  File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 57, in history_scan
    return Changes.cl_history_scan()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 89, in cl_history_scan
    cls.raise_changelog_err()
  File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 27, in raise_changelog_err
    raise ChangelogException(errn, os.strerror(errn))
ChangelogException: [Errno 22] Invalid argument
[2015-12-15 12:30:46.355663] E [repce(/rhgs/brick2/master):207:__call__] RepceClient: call 13117:140461545350976:1450162845.35 (history_scan) failed on peer with ChangelogException
[2015-12-15 12:30:46.355832] E [resource(/rhgs/brick2/master):1448:service_loop] GLUSTER: Changelog History Crawl failed, [Errno 22] Invalid argument
[2015-12-15 12:30:46.356125] I [syncdutils(/rhgs/brick2/master):220:finalize] <top>: exiting.
[2015-12-15 12:30:46.359954] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-12-15 12:30:46.360156] I [syncdutils(agent):220:finalize] <top>: exiting.


At the same time the errors logged in the changes.log
=====================================================


[2015-12-15 07:02:21.780179] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs.so.0(mem_get
0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:02:21.780396] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs.so.0(mem_get
0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:02:21.780516] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f24942e06c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs.so.0(mem_get
0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:02:21.781449] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f249429dc12] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs
.so.0(mem_get0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:02:21.782716] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f249429dc12] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f24942805df] -->/lib64/libglusterfs
.so.0(mem_get0+0x81) [0x7f24942b2811] ) 0-mem-pool: invalid argument [Invalid argument]


ve_vol/78ce16e9d2384148a6abf384c52b8ab1/.history/.processing/CHANGELOG.1450086954 to processed directory
[2015-12-15 07:29:57.229711] I [gf-changelog.c:542:gf_changelog_register_generic] 0-gfchangelog: Registering brick: /rhgs/brick2/master [notify filter: 1]
[2015-12-15 07:29:57.230785] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f49899686c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df] -->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:29:57.230955] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(+0x876c2) [0x7f49899686c2] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df] -->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool: invalid argument [Invalid argument]
[2015-12-15 07:29:57.232027] E [mem-pool.c:417:mem_get0] (-->/lib64/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f4989925c12] -->/lib64/libglusterfs.so.0(_gf_msg+0x79f) [0x7f49899085df] -->/lib64/libglusterfs.so.0(mem_get0+0x81) [0x7f498993a811] ) 0-mem-pool: invalid argument [Invalid argument]


Version-Release number of selected component (if applicable):
=============================================================



How reproducible:
=================

Observed Traceback only on one setup and only once while able to reproduce invalid argument error with every reboot


Steps to Reproduce:
===================
1. Create 2 node master and slave clusters
2. Create and start 2x2 master volume
3. Create and start 2x2 slave volume
4. Setup geo-rep between master and slave volume
5. Reboot master ACTIVE node

Actual results:
===============

Once the node is up the Worker goes faulty for sometime with traceback and errors [Invalid argument]


Expected results:
=================

Once the node is down, the bricks on peer node should be ACTIVE. Once the node comes back, the bricks should be PASSIVE without going to faulty or reporting error.

--- Additional comment from Vijay Bellur on 2015-12-17 09:26:16 EST ---

REVIEW: http://review.gluster.org/12997 (libgfchangelog: Allocate logbuf_pool in master xlator) posted (#1) for review on master by Kotresh HR (khiremat)

--- Additional comment from Vijay Bellur on 2015-12-22 04:33:17 EST ---

COMMIT: http://review.gluster.org/12997 committed in master by Venky Shankar (vshankar) 
------
commit d73f5c9eae0f5c8796598fa1930844c15c2c37d4
Author: Kotresh HR <khiremat>
Date:   Tue Dec 15 19:55:05 2015 +0530

    libgfchangelog: Allocate logbuf_pool in master xlator
    
    The master xlator needs to allocate 'logbuf_pool'
    else 'gf_msg' fails with EINVAL.
    
    Change-Id: I6b2d3450250de7e77126d12b75b0dbc4db414bfb
    BUG: 1292463
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/12997
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Venky Shankar <vshankar>

Comment 1 Vijay Bellur 2015-12-22 10:26:13 UTC

REVIEW: http://review.gluster.org/13064 (libgfchangelog: Allocate logbuf_pool in master xlator) posted (#1) for review on release-3.7 by Kotresh HR (khiremat)

Comment 2 Vijay Bellur 2015-12-30 12:24:59 UTC

COMMIT: http://review.gluster.org/13064 committed in release-3.7 by Venky Shankar (vshankar) 
------
commit a37ea8462cfd87f30bc38c509f0a079aae14e797
Author: Kotresh HR <khiremat>
Date:   Tue Dec 15 19:55:05 2015 +0530

    libgfchangelog: Allocate logbuf_pool in master xlator
    
    The master xlator needs to allocate 'logbuf_pool'
    else 'gf_msg' fails with EINVAL.
    
    BUG: 1293595
    Change-Id: I51a1895b5ff9c8eaf6cc15a9cacc415fa8cd7bdd
    Reviewed-on: http://review.gluster.org/12997
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Venky Shankar <vshankar>
    Signed-off-by: Kotresh HR <khiremat>
    Reviewed-on: http://review.gluster.org/13064
    Reviewed-by: Aravinda VK <avishwan>

Comment 3 Kaushal 2016-04-19 07:51:49 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.7, please open a new bug report.

glusterfs-3.7.7 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2016-February/025292.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

Note You need to log in before you can comment on or make changes to this bug.