1393678 – Worker restarts on log-rsync-performance config update

Summary: Worker restarts on log-rsync-performance config update

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	geo-replication
Sub Component:
Version:	mainline
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	Aravinda VK
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1402727 1402728 1425690

Reported:	2016-11-10 07:11 UTC by Aravinda VK
Modified:	2017-03-06 17:33 UTC
CC List:	2 users
Fixed In Version:	glusterfs-3.10.0
Clone Of:
Clones:	1402727 1402728 1425690
Environment:
Last Closed:	2017-03-06 17:33:37 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Description Aravinda VK 2016-11-10 07:11:17 UTC

Description of problem:
If the log-rsync-performance config is set using the following command, the workers restart and reprocess Changelogs that had already been processed before the config change.

gluster volume geo-replication <MASTER> <SLAVEHOST>::<SLAVEVOL> config log-rsync-performance true

Comment 1 Worker Ant 2016-11-10 07:12:36 UTC

REVIEW: http://review.gluster.org/15816 (geo-rep: Do not restart workers when log-rsync-performance config change) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 2 Worker Ant 2016-11-17 06:33:20 UTC

REVIEW: http://review.gluster.org/15816 (geo-rep: Do not restart workers when log-rsync-performance config change) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 3 Worker Ant 2016-12-02 08:36:56 UTC

REVIEW: http://review.gluster.org/15816 (geo-rep: Do not restart workers when log-rsync-performance config change) posted (#3) for review on master by Aravinda VK (avishwan)

Comment 4 Worker Ant 2016-12-08 06:05:39 UTC

COMMIT: http://review.gluster.org/15816 committed in master by Aravinda VK (avishwan) 
------
commit a268e2865c21ec8d2b4fed26715e986cfcc66fad
Author: Aravinda VK <avishwan>
Date:   Thu Nov 10 12:35:30 2016 +0530

    geo-rep: Do not restart workers when log-rsync-performance config change
    
    Geo-rep restarts workers when any of the configurations is changed. We
    don't need to restart workers if tunables like log-rsync-performance
    are modified.
    
    With this patch, Geo-rep workers will get new "log-rsync-performance"
    config automatically without restart.
    
    BUG: 1393678
    Change-Id: I40ec253892ea7e70c727fa5d3c540a11e891897b
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/15816
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
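
The idea behind this commit can be sketched in Python: decide whether a config change requires a worker restart, and let workers re-read a tunable at the moment they use it instead of caching it at startup. This is a minimal sketch under assumptions, not the gsyncd code: LIVE_TUNABLES, needs_restart, RealtimeConfig and the "geo-rep" section name are hypothetical, and only the get_realtime name and the log_rsync_performance option come from this report.

```python
import configparser

# Hypothetical: options that may change without restarting workers.
# Only log_rsync_performance comes from this bug report.
LIVE_TUNABLES = frozenset({"log_rsync_performance"})


def needs_restart(old, new):
    """Return True if any option other than a live tunable changed."""
    changed = {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}
    return bool(changed - LIVE_TUNABLES)


class RealtimeConfig:
    """Hypothetical stand-in for gconf.configinterface: re-reads the
    config file on every lookup instead of caching values at startup."""

    def __init__(self, path, section="geo-rep"):
        self.path = path
        self.section = section

    def get_realtime(self, key):
        parser = configparser.ConfigParser()
        parser.read(self.path)  # fresh read on each call picks up new values
        # Returns None when the option (or section) is absent; that is the
        # case the follow-up fix in this bug had to handle.
        return parser.get(self.section, key, fallback=None)
```

In this sketch, changing only log_rsync_performance makes needs_restart return False, and a running worker sees the new value on its next get_realtime call, so already-processed Changelogs are not replayed.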

Comment 5 Worker Ant 2016-12-12 07:42:26 UTC

REVIEW: http://review.gluster.org/16102 (geo-rep: Fix log-rsync-performance config issue) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 6 Worker Ant 2016-12-12 09:57:42 UTC

REVIEW: http://review.gluster.org/16102 (geo-rep: Fix log-rsync-performance config issue) posted (#2) for review on master by Aravinda VK (avishwan)

Comment 7 Worker Ant 2016-12-14 10:20:15 UTC

COMMIT: http://review.gluster.org/16102 committed in master by Aravinda VK (avishwan) 
------
commit ff2a58d784bc20ccafab8183d82787ceb8ac471b
Author: Aravinda VK <avishwan>
Date:   Mon Dec 12 13:06:15 2016 +0530

    geo-rep: Fix log-rsync-performance config issue
    
    If log-rsync-performance config is not set, gconf.get_realtime
    will return None. Added a default value of False for when the config
    file doesn't have this option set.
    
    BUG: 1393678
    Change-Id: I89016ab480a16179db59913d635d8553beb7e14f
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/16102
    Smoke: Gluster Build System <jenkins.org>
    Tested-by: Kotresh HR <khiremat>
    Reviewed-by: Kotresh HR <khiremat>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
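
Why the missing default matters can be sketched in Python: when the option is unset, the lookup returns None, and boolify then calls s.lower() on it (the failing line shown in the traceback in Comment 8). This is a simplified, hypothetical re-creation: only the s.lower() call and the default_value keyword are taken from this report; the bool pass-through, the accepted strings and the dict-backed get_realtime are assumptions.

```python
import logging

TRUE_STRINGS = ("true", "yes", "1", "on")    # assumed accepted spellings
FALSE_STRINGS = ("false", "no", "0", "off")


def boolify(s):
    """Simplified re-creation of syncdutils.boolify (assumption)."""
    if isinstance(s, bool):
        return s  # assumption: booleans such as default_value=False pass through
    lstr = s.lower()  # raises AttributeError when s is None
    if lstr in TRUE_STRINGS:
        return True
    if lstr not in FALSE_STRINGS:
        logging.warning("Unknown string %r, defaulting to False", s)
    return False


def get_realtime(config, key, default_value=None):
    """Hypothetical dict-backed lookup mirroring the default_value idea."""
    return config.get(key, default_value)


def log_rsync_performance_enabled(config):
    # Patched call site: an unset option yields False instead of None.
    return boolify(get_realtime(config, "log_rsync_performance", default_value=False))
```

Without default_value=False, an unset option reaches boolify as None and raises AttributeError; with it, the call simply returns False.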

Comment 8 nh2 2017-01-19 04:15:31 UTC

I believe this fix has been incorrectly backported to 3.9, or at least the Ubuntu PPA of 3.9.

Consider
https://launchpadlibrarian.net/302065598/glusterfs_3.8.7-ubuntu1~xenial1_3.8.8-ubuntu1~xenial1.diff.gz and
https://launchpadlibrarian.net/302850916/glusterfs_3.9.0-ubuntu1~xenial6_3.9.1-ubuntu1~xenial1.diff.gz

The former contains 

-        if gconf.log_rsync_performance:
+        log_rsync_performance = boolify(gconf.configinterface.get_realtime(
+            "log_rsync_performance", default_value=False))

but the latter doesn't have `default_value=False`:

-        if gconf.log_rsync_performance:
+        if boolify(gconf.configinterface.get_realtime(
+                "log_rsync_performance")):

So I'm running glusterfs 3.9.1 (as reported by `gluster --version`) from that PPA (3.9.1-ubuntu1~xenial1, to be precise), and I get this error (included here so that people can Google it):

[2017-01-19 03:44:47.201340] I [monitor(monitor):273:monitor] Monitor: starting gsyncd worker(/gluster-brick/brick1/gv0). Slave node: ssh://root@mymachine:gluster://localhost:gv0-geo-sfo2
[2017-01-19 03:44:47.460453] I [changelogagent(/gluster-brick/brick1/gv0):73:__init__] ChangelogAgent: Agent listining...
[2017-01-19 03:44:54.361351] I [master(/gluster-brick/brick1/gv0):1323:register] _GMaster: Working dir: /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40mymachine%3Agluster%3A%2F%2F127.0.0.1%3Agv0-geo-sfo2/e989bdc037f1478d9b2cc6e6ae3d3d0d
[2017-01-19 03:44:54.361701] I [resource(/gluster-brick/brick1/gv0):1584:service_loop] GLUSTER: Register time: 1484797494
[2017-01-19 03:44:54.376507] I [gsyncdstatus(/gluster-brick/brick1/gv0):264:set_active] GeorepStatus: Worker Status: Active
[2017-01-19 03:44:54.377803] I [gsyncdstatus(/gluster-brick/brick1/gv0):237:set_worker_crawl_status] GeorepStatus: Crawl Status: History Crawl
[2017-01-19 03:44:54.378166] I [master(/gluster-brick/brick1/gv0):1239:crawl] _GMaster: starting history crawl... turns: 1, stime: None, etime: 1484797494, entry_stime: None
[2017-01-19 03:44:54.378293] I [resource(/gluster-brick/brick1/gv0):1599:service_loop] GLUSTER: No stime available, using xsync crawl
[2017-01-19 03:44:54.385798] I [master(/gluster-brick/brick1/gv0):1348:crawl] _GMaster: starting hybrid crawl..., stime: None
[2017-01-19 03:44:54.387316] I [gsyncdstatus(/gluster-brick/brick1/gv0):237:set_worker_crawl_status] GeorepStatus: Crawl Status: Hybrid Crawl
[2017-01-19 03:44:55.388740] I [master(/gluster-brick/brick1/gv0):1358:crawl] _GMaster: processing xsync changelog /var/lib/misc/glusterfsd/gv0/ssh%3A%2F%2Froot%40mymachine%3Agluster%3A%2F%2F127.0.0.1%3Agv0-geo-sfo2/e989bdc037f1478d9b2cc6e6ae3d3d0d/xsync/XSYNC-CHANGELOG.1484797494
[2017-01-19 03:44:55.852421] E [syncdutils(/gluster-brick/brick1/gv0):296:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 326, in twrap
    tf(*aa)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1649, in syncjob
    po = self.sync_engine(pb, self.log_err)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1730, in rsync
    log_err=log_err)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 56, in sup
    sys._getframe(1).f_code.co_name)(*a, **kw)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 1041, in rsync
    "log_rsync_performance")):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 368, in boolify
    lstr = s.lower()
AttributeError: 'NoneType' object has no attribute 'lower'
[2017-01-19 03:44:55.854578] I [syncdutils(/gluster-brick/brick1/gv0):237:finalize] <top>: exiting.
[2017-01-19 03:44:55.860104] I [repce(/gluster-brick/brick1/gv0):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-01-19 03:44:55.860390] I [syncdutils(/gluster-brick/brick1/gv0):237:finalize] <top>: exiting.
[2017-01-19 03:44:56.351736] I [monitor(monitor):349:monitor] Monitor: worker(/gluster-brick/brick1/gv0) died in startup phase
[2017-01-19 03:44:56.358608] I [gsyncdstatus(monitor):233:set_worker_status] GeorepStatus: Worker Status: Faulty

I can work around this bug by setting

  gluster volume geo-replication gv0 root@mymachine::gv0-geo config log_rsync_performance true

Could you check whether the fix for this issue was correctly applied to 3.9 and, if not, release a new version? (I assume you are the ones maintaining that PPA.)

Thanks!

Comment 9 Shyamsundar 2017-03-06 17:33:37 UTC

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/
