Description of problem:
=======================
During a tarssh sync-up and rmdir from the master NFS client, the following traceback is seen on one of the master nodes:

[2016-05-29 22:40:09.701273] I [master(/bricks/brick0/master_brick1):1192:crawl] _GMaster: slave's time: (1464560999, 0)
[2016-05-29 22:40:10.335872] I [syncdutils(/bricks/brick1/master_brick6):220:finalize] <top>: exiting.
[2016-05-29 22:40:10.336153] E [syncdutils(/bricks/brick0/master_brick1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1575, in syncjob
    po.errfail()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-05-29 22:40:10.336755] E [syncdutils(/bricks/brick0/master_brick1):252:log_raise_exception] <top>: connection to peer is broken
[2016-05-29 22:40:10.340313] E [resource(/bricks/brick0/master_brick1):226:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-jsxe5z/55c207f77f58e7b6352df3f6f7e6779b.sock root.37.88 /nonexistent/gsyncd --session-owner 4f616379-ac10-4dde-a8c8-1a8c5dfb71f8 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2016-05-29 22:40:10.341049] E [resource(/bricks/brick0/master_brick1):230:logerr] Popen: ssh> [2016-05-29 22:30:00.731420] I [cli.c:721:main] 0-cli: Started running /usr/sbin/gluster with version 3.7.9

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-geo-replication-3.7.9-6.el7rhgs.x86_64
glusterfs-3.7.9-6.el7rhgs.x86_64

How reproducible:
=================
Seen only once; not sure about the occurrence. Will update the BZ if it is observed again.

Steps to Reproduce:
===================
1. Run the geo-rep automated cases which do create, rmdir, and other fops.
   Client: NFS protocol; Sync: tarssh
sosreports @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1340756/
I am receiving the same error:

[2016-07-13 19:25:52.16657] I [master(/srv/gluster):1192:crawl] _GMaster: slave's time: (1468363119, 0)
[2016-07-13 19:25:52.583666] E [syncdutils(/srv/gluster):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1575, in syncjob
    po.errfail()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-07-13 19:25:52.585411] I [syncdutils(/srv/gluster):220:finalize] <top>: exiting.
[2016-07-13 19:25:52.588901] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-07-13 19:25:52.589368] I [syncdutils(agent):220:finalize] <top>: exiting.
[2016-07-13 19:25:52.928951] I [monitor(monitor):343:monitor] Monitor: worker(/srv/gluster) died in startup phase

Geo-replication status is Faulty, and it appears as if the gsyncd.py process is unable to start on the slave server.

gluster-server 3.7.11-1~bpo8+1 on Debian 8.
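The traceback in both reports shows `errlog()` reading `self.elines` before that attribute was ever assigned (the worker died before any stderr was captured), so the secondary AttributeError masks the real ssh error. The actual fix is whatever landed in the gsyncd patch; the following is only a hedged illustration with a stand-in class (not the real gsyncd `Popen`) of how a `getattr` default avoids crashing a second time:

```python
class Popen:
    """Minimal stand-in for gsyncd's subprocess wrapper -- NOT the real class."""

    def errlog(self):
        # The failing check was `if self.elines:`, which raises
        # AttributeError when stderr was never captured (e.g. the
        # worker died in the startup phase). Defaulting with getattr
        # keeps error reporting alive instead of raising again.
        return list(getattr(self, "elines", []))


po = Popen()
assert po.errlog() == []            # no AttributeError, even with no stderr captured
po.elines = ["ssh: connection refused"]
assert po.errlog() == ["ssh: connection refused"]
```

With that guard in place, the worker would log the underlying ssh failure (exit status 255) instead of dying on the AttributeError.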
The error I was receiving was due to rsync not being installed on the slave server.
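Since the crash above hides the underlying cause, a quick stdlib check for the rsync binary on the slave node can save some debugging time. This is just a diagnostic sketch (run it on the slave), not part of gsyncd:

```python
import shutil

# geo-replication shells out to rsync by default, so a missing binary on
# the slave only surfaces indirectly as a Popen failure on the master.
rsync_path = shutil.which("rsync")
if rsync_path is None:
    print("rsync is missing on this node -- install it before starting geo-rep")
else:
    print("rsync found at", rsync_path)
```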
Upstream patch sent for review http://review.gluster.org/#/c/15379/
Upstream mainline : http://review.gluster.org/15379
Upstream 3.8      : http://review.gluster.org/15447
Downstream patch  : https://code.engineering.redhat.com/gerrit/#/c/85005
(In reply to Atin Mukherjee from comment #8)
> Upstream mainline : http://review.gluster.org/15379
> Upstream 3.8 : http://review.gluster.org/15447
> downstream patch : https://code.engineering.redhat.com/gerrit/#/c/85005

Correction, the downstream patch link is https://code.engineering.redhat.com/gerrit/#/c/85007
Approving the accelerated fix. Please note that the fix has to be in 3.2 to avoid regression.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html