Bug 1459620 - [geo-rep]: Worker crashed with TypeError: expected string or buffer
Summary: [geo-rep]: Worker crashed with TypeError: expected string or buffer
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Aravinda VK
QA Contact:
URL:
Whiteboard:
Depends On: 1448386
Blocks:
Reported: 2017-06-07 15:25 UTC by Aravinda VK
Modified: 2017-09-05 17:33 UTC (History)

Fixed In Version: glusterfs-3.12.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1448386
Environment:
Last Closed: 2017-09-05 17:33:35 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Aravinda VK 2017-06-07 15:25:53 UTC
+++ This bug was initially created as a clone of Bug #1448386 +++

Description of problem:
=======================

While running the geo-replication sanity check, which performs the following fops (create, chmod, chown, chgrp, hardlink, symlink, truncate, rename, remove), the worker crash below was observed. The checksums at master and slave eventually match, so it is not known after which fop or crawl the crash occurred. The crash was seen only once, and the worker came back online afterwards.

[2017-05-04 17:24:29.679775] I [gsyncdstatus(/bricks/brick0/master_brick1):276:set_passive] GeorepStatus: Worker Status: Passive
[2017-05-04 17:24:30.635793] I [master(/bricks/brick1/master_brick4):1195:crawl] _GMaster: slave's time: (1493917690, 0)
[2017-05-04 17:24:35.496570] E [syncdutils(/bricks/brick1/master_brick4):296:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 326, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1575, in syncjob
    po = self.sync_engine(pb, self.log_err)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1702, in rsync
    log_err=log_err)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 56, in sup
    sys._getframe(1).f_code.co_name)(*a, **kw)
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1025, in rsync
    "log_rsync_performance", default_value=False))
  File "/usr/libexec/glusterfs/python/syncdaemon/configinterface.py", line 264, in get_realtime
    return self.get(opt, printValue=False, default_value=default_value)
  File "/usr/libexec/glusterfs/python/syncdaemon/configinterface.py", line 369, in get
    self.update_to(d, allow_unresolved=True)
  File "/usr/libexec/glusterfs/python/syncdaemon/configinterface.py", line 359, in update_to
    update_from_sect(sect, MultiDict(dct, mad, *self.auxdicts))
  File "/usr/libexec/glusterfs/python/syncdaemon/configinterface.py", line 343, in update_from_sect
    dct[k] = Template(v).safe_substitute(mud)
  File "/usr/lib64/python2.7/string.py", line 205, in safe_substitute
    return self.pattern.sub(convert, self.template)
TypeError: expected string or buffer
[2017-05-04 17:24:35.563095] I [syncdutils(/bricks/brick1/master_brick4):237:finalize] <top>: exiting.
[2017-05-04 17:24:35.572370] I [repce(/bricks/brick1/master_brick4):92:service_loop] RepceServer: terminating on reaching EOF.
[2017-05-04 17:24:35.573046] I [syncdutils(/bricks/brick1/master_brick4):237:finalize] <top>: exiting.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-24.el7rhgs.x86_64


Steps to Reproduce:
==================
Do not know the exact steps since it was seen in the automation run. 
Will work to find out the specific steps and update this space later.

Actual results:
==============

Worker crashed and then came online.
Arequal checksum between master and slave matches.


Expected results:
=================

Worker should not crash.

--- Additional comment from Aravinda VK on 2017-06-06 08:48:33 EDT ---

Easy reproducer:

cd /usr/libexec/glusterfs/python/syncdaemon/
python

    from configinterface import GConffile
    conf = GConffile("/var/lib/glusterd/geo-replication/gsyncd_template.conf", 
                     ["master", "slave"], {})
    print conf.get()
    conf.set("log-rsync-performance", 10)
    print conf.get("log-rsync-performance")

The above script fails with the same traceback.

RCA: The worker is not restarted for some configuration changes (for example, log-rsync-performance). As a result, the in-memory config object can hold a non-string value, which is then passed to Template. A worker restart avoids the problem because the values are then read from the config file as strings instead of their original types.
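The failure described in the RCA can be shown outside of geo-replication. The following is a minimal sketch using only the standard library's string.Template; the helper name substitute and the sample values are hypothetical and are not taken from configinterface.py. Note that under Python 3 the TypeError message reads "expected string or bytes-like object" rather than the Python 2.7 wording in the traceback.

```python
from string import Template


def substitute(value, mapping):
    """Mimic the failing call pattern: Template(value).safe_substitute(mapping).

    Hypothetical helper for illustration; not part of the gsyncd code.
    """
    return Template(value).safe_substitute(mapping)


# A value read back from the config file is a string, so substitution works.
from_file = substitute("${name}.log", {"name": "gsyncd"})

# A value set at runtime without a restart may keep its original type
# (an int here), and the regex substitution inside Template rejects it.
try:
    substitute(10, {})
    failed = False
except TypeError:
    failed = True
```

Template accepts any object at construction time; the TypeError only surfaces when safe_substitute runs its regular expression over the stored value, which matches the last frame of the traceback above.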

Comment 1 Worker Ant 2017-06-07 15:31:57 UTC
REVIEW: https://review.gluster.org/17489 (geo-rep: Fix ConfigInterface Template issue) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 2 Worker Ant 2017-06-08 11:16:24 UTC
COMMIT: https://review.gluster.org/17489 committed in master by Aravinda VK (avishwan) 
------
commit 513984ad90531c53fcb7d6f0d581f198a6afcf93
Author: Aravinda VK <avishwan>
Date:   Tue Jun 6 17:59:59 2017 +0530

    geo-rep: Fix ConfigInterface Template issue
    
    ConfigParser uses string Template to substitute the dynamic values
    for config. For some of the configurations, Geo-rep worker will
    not restart. Due to this conf object may have non string values.
    
    If val is not string in Template(val), then it fails with
    "TypeError: expected string or buffer"
    
    BUG: 1459620
    Change-Id: I25b8bbc1df42f6f29e9563a55b3e27a228321c44
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: https://review.gluster.org/17489
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
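One way to guard against the condition described in the commit message is to coerce each value to a string before building the Template. The sketch below is an assumption about the general approach, not the content of the actual patch; the function name update_from_section, the option names, and the log directory are hypothetical.

```python
from string import Template


def update_from_section(section_items, substitutions):
    """Resolve ${...} placeholders for every (key, value) pair.

    Hypothetical stand-in for the config update step. Each value is
    converted with str() first so that non-string values (for example
    an int set at runtime) no longer raise TypeError.
    """
    resolved = {}
    for key, value in section_items:
        resolved[key] = Template(str(value)).safe_substitute(substitutions)
    return resolved


result = update_from_section(
    [
        ("log_rsync_performance", 10),          # non-string value
        ("log_file", "${logdir}/geo.log"),      # templated string value
    ],
    {"logdir": "/var/log/glusterfs"},
)
```

A consequence of this approach is that every resolved value is a string, so callers that expect typed values need to convert them back.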

Comment 3 Worker Ant 2017-06-12 05:39:17 UTC
REVIEW: https://review.gluster.org/17503 (geo-rep: Fix string format issue caused due to #17489) posted (#1) for review on master by Aravinda VK (avishwan)

Comment 4 Worker Ant 2017-06-13 06:09:09 UTC
COMMIT: https://review.gluster.org/17503 committed in master by Aravinda VK (avishwan) 
------
commit 778ad0e2bbfe60db32df460590e0c3596fdf1aa5
Author: Aravinda VK <avishwan>
Date:   Mon Jun 12 11:05:27 2017 +0530

    geo-rep: Fix string format issue caused due to #17489
    
    With Patch #17489, values from Geo-rep config always represented
    as Unicode string, which is not compatible with rest of the code.
    
    Changed the format with this patch to fix the issue.
    
    BUG: 1459620
    Change-Id: I935fca0d24f02e90757f688f92ef73fad9f9b8e1
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: https://review.gluster.org/17503
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>

Comment 5 Shyamsundar 2017-09-05 17:33:35 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/

