Bug 1373976

Summary: [geo-rep]: defunct tar process while using tar+ssh sync
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Rahul Hinduja <rhinduja>
Component: geo-replication Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA QA Contact: Rahul Hinduja <rhinduja>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: amukherj, asrivast, csaba, rhs-bugs, storage-qa-internal
Target Milestone: ---   
Target Release: RHGS 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.8.4-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1374286 (view as bug list) Environment:
Last Closed: 2017-03-23 05:46:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1351528, 1374286, 1375541, 1375542, 1375543    

Description Rahul Hinduja 2016-09-07 15:00:38 UTC
Description of problem:
=======================

When data is synced using tar+ssh, the sync completes, but many tar processes are left defunct.

[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12521  4522  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12522  4519  6 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22664/cwd
root     12523  4522  6 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22663/cwd
root     12524  4510  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12525  4510 10 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22665/cwd
root     12526  4498  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12527  4498  0 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22662/cwd
root     12529  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  1 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12523  4522  5 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22663/cwd
root     12524  4510  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12525  4510  7 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22665/cwd
root     12526  4498  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12527  4498  1 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22662/cwd
root     12531  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12533  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12543  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]# 


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-12


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Set up geo-replication between master and slave
2. Set the config parameter use-tarssh to true
3. Start geo-replication
4. Write some data on the master volume
5. Monitor the tar processes on the master nodes using "ps -eaf | grep tar"
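
The check in step 5 can also be scripted. The following is a minimal sketch, not part of geo-replication itself: the function name defunct_processes is hypothetical, and it assumes the "ps -eaf" column layout shown in this report (UID PID PPID C STIME TTY TIME CMD).

```python
def defunct_processes(ps_output, name="tar"):
    """Return (pid, ppid) pairs for defunct processes called `name`.

    `ps_output` is the text printed by `ps -eaf`. Lines that do not
    look like process rows (header, shell prompts) are skipped.
    """
    marker = "[{}] <defunct>".format(name)
    found = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command column stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        # PID and PPID must be numeric; this drops the header row.
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        if fields[7].strip() == marker:
            found.append((int(fields[1]), int(fields[2])))
    return found
```

On a live system the input could come from subprocess.run(["ps", "-eaf"], capture_output=True, text=True).stdout. The PPID in each pair identifies the process that has not yet reaped the child.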

Actual results:
===============

Data on master and slave is synced and the arequal checksums match. However, many tar processes are left defunct.
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12543  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]# 


Expected results:
=================
No tar process should be left defunct.
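
For background, a child process stays defunct until its parent collects its exit status. The ps output above shows each worker starting a tar process that reads a file list from stdin and pipes its archive into ssh. The sketch below illustrates that two-stage pipeline with both children reaped. It is not the gsyncd code and not the patch for this bug; run_pipeline and its parameters are hypothetical names chosen for illustration.

```python
import subprocess


def run_pipeline(producer_cmd, consumer_cmd, data):
    """Run `producer | consumer`, feed `data` to the producer's stdin,
    and wait for both children so neither is left defunct.

    Returns (producer_returncode, consumer_returncode, consumer_stdout).
    """
    producer = subprocess.Popen(
        producer_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE)
    # Drop the parent's copy of the pipe so the consumer sees EOF
    # as soon as the producer exits.
    producer.stdout.close()
    # Fine for small inputs; large inputs would need a writer thread
    # to avoid blocking while the pipes fill up.
    producer.stdin.write(data)
    producer.stdin.close()
    out, _ = consumer.communicate()  # reads output and reaps the consumer
    # Reap the producer as well. Without this call it can linger as
    # <defunct> for as long as the parent keeps running.
    producer.wait()
    return producer.returncode, consumer.returncode, out
```

In the scenario from this report, producer_cmd would correspond to the tar command and consumer_cmd to the ssh command seen in the ps listing, with the file list passed as data.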

Comment 2 Aravinda VK 2016-09-08 12:04:37 UTC
Upstream patch sent to fix the issue:
http://review.gluster.org/15426

Comment 4 Atin Mukherjee 2016-09-19 08:55:54 UTC
Upstream mainline : http://review.gluster.org/15426
Upstream 3.8 : http://review.gluster.org/15489
Downstream patch : https://code.engineering.redhat.com/gerrit/85004

Comment 8 Rahul Hinduja 2016-11-11 12:14:49 UTC
Verified with the build: glusterfs-3.8.4-3.el7rhgs.x86_64

While the sync is in progress, defunct tar processes are present on the system. However, once the sync completes, the defunct processes are cleaned up. Since the fix is to clean up the defunct processes after sync, moving this bug to the verified state.

[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31101 27632  0 15:53 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31124 30560  1 15:54 ?        00:00:00 [tar] <defunct>
root     31126 30563  1 15:54 ?        00:00:00 [tar] <defunct>
root     31127 30558  1 15:54 ?        00:00:00 [tar] <defunct>
root     31128 30563  2 15:54 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.37.207 tar --overwrite -xf - -C /proc/9926/cwd
root     31129 30558  3 15:54 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.37.207 tar --overwrite -xf - -C /proc/9927/cwd
root     31143 27632  0 15:54 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31362 27632  0 16:11 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]#

Comment 10 errata-xmlrpc 2017-03-23 05:46:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html