+++ This bug was initially created as a clone of Bug #1372193 +++

+++ This bug was initially created as a clone of Bug #1340756 +++

Description of problem:
=======================

During tarssh syncup and rmdir from the master NFS client, the following traceback is seen on one of the master nodes:

[2016-05-29 22:40:09.701273] I [master(/bricks/brick0/master_brick1):1192:crawl] _GMaster: slave's time: (1464560999, 0)
[2016-05-29 22:40:10.335872] I [syncdutils(/bricks/brick1/master_brick6):220:finalize] <top>: exiting.
[2016-05-29 22:40:10.336153] E [syncdutils(/bricks/brick0/master_brick1):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1575, in syncjob
    po.errfail()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-05-29 22:40:10.336755] E [syncdutils(/bricks/brick0/master_brick1):252:log_raise_exception] <top>: connection to peer is broken
[2016-05-29 22:40:10.340313] E [resource(/bricks/brick0/master_brick1):226:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-jsxe5z/55c207f77f58e7b6352df3f6f7e6779b.sock root.37.88 /nonexistent/gsyncd --session-owner 4f616379-ac10-4dde-a8c8-1a8c5dfb71f8 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2016-05-29 22:40:10.341049] E [resource(/bricks/brick0/master_brick1):230:logerr] Popen: ssh> [2016-05-29 22:30:00.731420] I [cli.c:721:main] 0-cli: Started running /usr/sbin/gluster with version 3.7.9

How reproducible:
=================

Seen only once; not sure how often it occurs. Will update the BZ if it is observed again.

Steps to Reproduce:
===================

1. Run the geo-rep automated test cases, which perform create, rmdir, and other fops.

Client: NFS protocol and Sync: tarssh

--- Additional comment from Jeff on 2016-07-13 17:46:03 EDT ---

I am receiving the same error:

[2016-07-13 19:25:52.16657] I [master(/srv/gluster):1192:crawl] _GMaster: slave's time: (1468363119, 0)
[2016-07-13 19:25:52.583666] E [syncdutils(/srv/gluster):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1575, in syncjob
    po.errfail()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-07-13 19:25:52.585411] I [syncdutils(/srv/gluster):220:finalize] <top>: exiting.
[2016-07-13 19:25:52.588901] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-07-13 19:25:52.589368] I [syncdutils(agent):220:finalize] <top>: exiting.
[2016-07-13 19:25:52.928951] I [monitor(monitor):343:monitor] Monitor: worker(/srv/gluster) died in startup phase

Geo-replication status is Faulty, and it appears as if the gsyncd.py process is unable to start on the slave server.

gluster-server 3.7.11-1~bpo8+1 on Debian 8.

--- Additional comment from Jeff on 2016-07-14 19:59:55 EDT ---

The error I was receiving was due to rsync not being installed on the slave server.

--- Additional comment from Rahul Hinduja on 2016-08-31 12:29:00 EDT ---

While trying one of the rm cases on a cascaded setup (private build based on 3.1.0+patches), I could reproduce this issue with the following steps:

1. Create a cascaded geo-rep setup with (vol0, vol1, vol2) such that vol0=>vol1 and vol1=>vol2.
2. Mount the vol0 volume and perform:

[root@fan data]# for i in {1..100}; do cp -rf /root/data/new_data/* . ; sleep 5 ; rm -rf * ; sleep 2 ; done
[root@fan data]#

The following traceback is seen on one of the masters:

[2016-08-31 16:20:55.410460] E [syncdutils(/rhs/brick2/b3):276:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1530, in syncjob
    po.errfail()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-08-31 16:20:55.415150] I [syncdutils(/rhs/brick2/b3):220:finalize] <top>: exiting.

--- Additional comment from Worker Ant on 2016-09-01 03:11:15 EDT ---

REVIEW: http://review.gluster.org/15379 (geo-rep: Fix logging sync failures) posted (#1) for review on master by Aravinda VK (avishwan)

--- Additional comment from Worker Ant on 2016-09-05 01:43:27 EDT ---

REVIEW: http://review.gluster.org/15379 (geo-rep: Fix logging sync failures) posted (#2) for review on master by Aravinda VK (avishwan)

--- Additional comment from Worker Ant on 2016-09-08 12:14:53 EDT ---

COMMIT: http://review.gluster.org/15379 committed in master by Aravinda VK (avishwan)
------
commit c0f877c0374d97e0bee17aac4850d7655a35e61b
Author: Aravinda VK <avishwan>
Date:   Thu Sep 1 12:35:46 2016 +0530

    geo-rep: Fix logging sync failures

    If Rsync/Tar subprocess dies, while logging error Geo-rep
    fails with EBADF while accessing error file. Also worker dies
    while accessing elines before it is set.

    BUG: 1372193
    Change-Id: I9cfce116e8aafa4a98654f5190d40a455af8ec95
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/15379
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
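
For readers following the traceback, the second failure named in the commit message (the worker dying while accessing elines before it is set) can be illustrated with a minimal Python sketch. This is not the gsyncd code and not the committed patch: the class names LateInitPopen and SafePopen and the method start_error_collection are hypothetical stand-ins, and only the names elines and errlog are taken from the traceback.

```python
import subprocess
import sys


class LateInitPopen(subprocess.Popen):
    """Hypothetical wrapper: elines is created only by a later, separate step."""

    def start_error_collection(self):
        # In this sketch, the attribute does not exist until this is called.
        self.elines = []

    def errlog(self):
        # Raises AttributeError if start_error_collection() never ran.
        if self.elines:
            return list(self.elines)
        return []


class SafePopen(LateInitPopen):
    """Hypothetical guard: tolerate a missing elines attribute."""

    def errlog(self):
        return list(getattr(self, "elines", []))


def demo():
    # A child process that exits immediately, standing in for rsync/tar.
    cmd = [sys.executable, "-c", "pass"]
    results = {}

    po = LateInitPopen(cmd)
    po.wait()
    try:
        po.errlog()
        results["late"] = "ok"
    except AttributeError as exc:
        results["late"] = str(exc)

    po = SafePopen(cmd)
    po.wait()
    results["safe"] = po.errlog()
    return results


if __name__ == "__main__":
    print(demo())
```

The guarded variant is only one possible way to avoid the crash; initializing the attribute when the object is constructed would work equally well in this sketch.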
REVIEW: http://review.gluster.org/15442 (geo-rep: Fix logging sync failures) posted (#1) for review on release-3.8 by Aravinda VK (avishwan)
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html
COMMIT: http://review.gluster.org/15442 committed in release-3.8 by Aravinda VK (avishwan)
------
commit 0da3b27e1483a55a63511b4c851f4f1f14c9eacd
Author: Aravinda VK <avishwan>
Date:   Thu Sep 1 12:35:46 2016 +0530

    geo-rep: Fix logging sync failures

    If Rsync/Tar subprocess dies, while logging error Geo-rep
    fails with EBADF while accessing error file. Also worker dies
    while accessing elines before it is set.

    > Reviewed-on: http://review.gluster.org/15379
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Kotresh HR <khiremat>

    BUG: 1374596
    Change-Id: I9cfce116e8aafa4a98654f5190d40a455af8ec95
    Signed-off-by: Aravinda VK <avishwan>
    (cherry picked from commit c0f877c0374d97e0bee17aac4850d7655a35e61b)
    Reviewed-on: http://review.gluster.org/15442
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Saravanakumar Arumugam <sarumuga>
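
The first failure named in the commit message (EBADF while accessing the error file once the Rsync/Tar subprocess has died) can be sketched in Python as follows. The function name read_error_output, the 4096-byte read size, and the choice to return an empty string are assumptions made for illustration; this is not the change committed in this review.

```python
import errno
import os


def read_error_output(fd, size=4096):
    """Read pending error output from fd, tolerating an already-closed descriptor.

    Hypothetical helper: returns "" instead of raising when fd is no longer valid.
    """
    try:
        return os.read(fd, size).decode(errors="replace")
    except OSError as exc:
        if exc.errno == errno.EBADF:
            # The descriptor was closed, e.g. because the child process died.
            return ""
        raise


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"rsync: connection unexpectedly closed\n")
    print(repr(read_error_output(r)))
    os.close(r)
    os.close(w)
    # Reading the closed descriptor no longer raises in this sketch.
    print(repr(read_error_output(r)))
```

Any other OSError is re-raised, so only the closed-descriptor case is swallowed.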
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.8.5, please open a new bug report.

glusterfs-3.8.5 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/announce/2016-October/000061.html
[2] https://www.gluster.org/pipermail/gluster-users/