Bug 1374597

Summary: [geo-rep]: AttributeError: 'Popen' object has no attribute 'elines'
Product: [Community] GlusterFS
Reporter: Aravinda VK <avishwan>
Component: geo-replication
Assignee: Aravinda VK <avishwan>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.9
CC: bugs, csaba, jeff, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.9.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1372193
Environment:
Last Closed: 2016-12-06 06:00:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1340756, 1372193
Bug Blocks:

Description Aravinda VK 2016-09-09 07:50:13 UTC
+++ This bug was initially created as a clone of Bug #1372193 +++

+++ This bug was initially created as a clone of Bug #1340756 +++

Description of problem:
=======================

During a tarssh sync and an rmdir from the master NFS client, the following traceback is seen on one of the master nodes:

[2016-05-29 22:40:09.701273] I [master(/bricks/brick0/master_brick1):1192:crawl] _GMaster: slave's time: (1464560999, 0)
[2016-05-29 22:40:10.335872] I [syncdutils(/bricks/brick1/master_brick6):220:finalize] <top>: exiting.
[2016-05-29 22:40:10.336153] E [syncdutils(/bricks/brick0/master_brick1):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1575, in syncjob
    po.errfail()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-05-29 22:40:10.336755] E [syncdutils(/bricks/brick0/master_brick1):252:log_raise_exception] <top>: connection to peer is broken
[2016-05-29 22:40:10.340313] E [resource(/bricks/brick0/master_brick1):226:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-jsxe5z/55c207f77f58e7b6352df3f6f7e6779b.sock root.37.88 /nonexistent/gsyncd --session-owner 4f616379-ac10-4dde-a8c8-1a8c5dfb71f8 -N --listen --timeout 120 gluster://localhost:slave" returned with 255, saying:
[2016-05-29 22:40:10.341049] E [resource(/bricks/brick0/master_brick1):230:logerr] Popen: ssh> [2016-05-29 22:30:00.731420] I [cli.c:721:main] 0-cli: Started running /usr/sbin/gluster with version 3.7.9


How reproducible:
=================

Seen only once so far; the frequency is unknown. Will update the BZ if it is observed again.


Steps to Reproduce:
===================
1. Run the geo-rep automated cases that perform create, rmdir, and other fops. Client: NFS protocol; sync mode: tarssh.

--- Additional comment from Jeff on 2016-07-13 17:46:03 EDT ---

I am receiving the same error:

[2016-07-13 19:25:52.16657] I [master(/srv/gluster):1192:crawl] _GMaster: slave's time: (1468363119, 0)
[2016-07-13 19:25:52.583666] E [syncdutils(/srv/gluster):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/master.py", line 1575, in syncjob
    po.errfail()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-07-13 19:25:52.585411] I [syncdutils(/srv/gluster):220:finalize] <top>: exiting.
[2016-07-13 19:25:52.588901] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2016-07-13 19:25:52.589368] I [syncdutils(agent):220:finalize] <top>: exiting.
[2016-07-13 19:25:52.928951] I [monitor(monitor):343:monitor] Monitor: worker(/srv/gluster) died in startup phase

The geo-replication status is Faulty, and it appears that the gsyncd.py process is unable to start on the slave server.

gluster-server 3.7.11-1~bpo8+1 on Debian 8.

--- Additional comment from Jeff on 2016-07-14 19:59:55 EDT ---

The error I was receiving was due to rsync not being installed on the slave server.
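
A missing sync tool like this can be ruled out early with a quick check run on the slave. The following is a minimal sketch, not part of gsyncd: the helper name `missing_tools` and the default tool list are assumptions chosen for illustration.

```python
import shutil


def missing_tools(tools=("rsync", "tar", "ssh")):
    """Return the names in `tools` that cannot be found on PATH.

    The default list is an assumption for illustration; adjust it to
    the sync mode in use.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("not found on PATH: " + ", ".join(absent))
    else:
        print("all checked tools are present")
```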

--- Additional comment from Rahul Hinduja on 2016-08-31 12:29:00 EDT ---

While trying one of the rm cases in the cascading setup (private build based on 3.1.0+patches), I could reproduce this issue with the following steps:

1. Create a cascaded geo-rep setup with three volumes (vol0, vol1, vol2) such that vol0=>vol1 and vol1=>vol2.
2. Mount the vol0 volume and perform:
[root@fan data]# for i in {1..100}; do cp -rf /root/data/new_data/* . ; sleep 5 ; rm -rf * ; sleep 2 ; done
[root@fan data]#

The following traceback is seen on one of the masters:

[2016-08-31 16:20:55.410460] E [syncdutils(/rhs/brick2/b3):276:log_raise_exception] <top>: FAIL: 
Traceback (most recent call last):
  File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
    tf(*aa)
  File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1530, in syncjob
    po.errfail()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 242, in errfail
    self.errlog()
  File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 223, in errlog
    if self.elines:
AttributeError: 'Popen' object has no attribute 'elines'
[2016-08-31 16:20:55.415150] I [syncdutils(/rhs/brick2/b3):220:finalize] <top>: exiting.

--- Additional comment from Worker Ant on 2016-09-01 03:11:15 EDT ---

REVIEW: http://review.gluster.org/15379 (geo-rep: Fix logging sync failures) posted (#1) for review on master by Aravinda VK (avishwan)

--- Additional comment from Worker Ant on 2016-09-05 01:43:27 EDT ---

REVIEW: http://review.gluster.org/15379 (geo-rep: Fix logging sync failures) posted (#2) for review on master by Aravinda VK (avishwan)

--- Additional comment from Worker Ant on 2016-09-08 12:14:53 EDT ---

COMMIT: http://review.gluster.org/15379 committed in master by Aravinda VK (avishwan) 
------
commit c0f877c0374d97e0bee17aac4850d7655a35e61b
Author: Aravinda VK <avishwan>
Date:   Thu Sep 1 12:35:46 2016 +0530

    geo-rep: Fix logging sync failures
    
    If Rsync/Tar subprocess dies, while logging error Geo-rep fails
    with EBADF while accessing error file. Also worker dies while
    accessing elines before it is set.
    
    BUG: 1372193
    Change-Id: I9cfce116e8aafa4a98654f5190d40a455af8ec95
    Signed-off-by: Aravinda VK <avishwan>
    Reviewed-on: http://review.gluster.org/15379
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Kotresh HR <khiremat>
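
The failure mode described in the commit message (reading `elines` before it is set, and touching the error file after the child has died) can be illustrated with a small standalone sketch. This is not the committed patch: the class `LoggedPopen` and the method `collect_stderr` are hypothetical names, and only `elines` and `errlog` are borrowed from the traceback. The sketch assumes one general way to avoid this class of error, which is to initialize the attribute in the constructor and to tolerate an already-closed stderr.

```python
import subprocess
import sys


class LoggedPopen(subprocess.Popen):
    """Hypothetical Popen subclass that buffers the child's stderr lines."""

    def __init__(self, args, **kw):
        # Set the buffer up front so errlog() can never hit AttributeError,
        # even when the child dies before its stderr is collected.
        self.elines = []
        super().__init__(args, **kw)

    def collect_stderr(self):
        # Nothing to do if stderr was not piped or is already closed.
        if self.stderr is None or self.stderr.closed:
            return
        try:
            for raw in self.stderr:
                self.elines.append(raw.decode(errors="replace").rstrip())
        except (OSError, ValueError):
            # OSError covers EBADF; ValueError covers I/O on a closed file.
            pass
        finally:
            self.stderr.close()

    def errlog(self):
        # Return the lines that would be logged for a failed child.
        return ["child> " + line for line in self.elines]


# A child that writes one line to stderr and exits with status 1.
child_code = "import sys; sys.stderr.write('boom\\n'); sys.exit(1)"
po = LoggedPopen([sys.executable, "-c", child_code], stderr=subprocess.PIPE)

before = po.errlog()   # empty list, not an AttributeError
po.wait()
po.collect_stderr()
po.collect_stderr()    # second call is a no-op because stderr is closed
after = po.errlog()
```

Calling `errlog()` before any stderr has been collected returns an empty list, which is the case that raised `AttributeError` in the tracebacks above.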

Comment 1 Worker Ant 2016-09-09 08:16:53 UTC
REVIEW: http://review.gluster.org/15441 (geo-rep: Fix logging sync failures) posted (#1) for review on release-3.9 by Aravinda VK (avishwan)

Comment 2 Worker Ant 2016-09-15 06:28:41 UTC
COMMIT: http://review.gluster.org/15441 committed in release-3.9 by Aravinda VK (avishwan) 
------
commit 31ad819d8ccfd7a78a6fc35b9e21b673597d7b93
Author: Aravinda VK <avishwan>
Date:   Thu Sep 1 12:35:46 2016 +0530

    geo-rep: Fix logging sync failures
    
    If Rsync/Tar subprocess dies, while logging error Geo-rep fails
    with EBADF while accessing error file. Also worker dies while
    accessing elines before it is set.
    
    > Reviewed-on: http://review.gluster.org/15379
    > NetBSD-regression: NetBSD Build System <jenkins.org>
    > Smoke: Gluster Build System <jenkins.org>
    > CentOS-regression: Gluster Build System <jenkins.org>
    > Reviewed-by: Kotresh HR <khiremat>
    
    BUG: 1374597
    Change-Id: I9cfce116e8aafa4a98654f5190d40a455af8ec95
    Signed-off-by: Aravinda VK <avishwan>
    (cherry picked from commit c0f877c0374d97e0bee17aac4850d7655a35e61b)
    Reviewed-on: http://review.gluster.org/15441
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Saravanakumar Arumugam <sarumuga>

Comment 3 Aravinda VK 2016-10-27 05:22:49 UTC
glusterfs-3.9.0rc2 is released[1] and packages are available for different distributions[2] to test.

[1] http://www.gluster.org/pipermail/maintainers/2016-October/001601.html
[2] http://www.gluster.org/pipermail/maintainers/2016-October/001605.html and http://www.gluster.org/pipermail/maintainers/2016-October/001606.html

Comment 4 Aravinda VK 2016-12-06 06:00:19 UTC
Gluster 3.9 GA is released: http://blog.gluster.org/2016/11/announcing-gluster-3-9/