Bug 1373976 - [geo-rep]: defunct tar process while using tar+ssh sync
Summary: [geo-rep]: defunct tar process while using tar+ssh sync
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Aravinda VK
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Depends On:
Blocks: 1351528 1374286 1375541 1375542 1375543
Reported: 2016-09-07 15:00 UTC by Rahul Hinduja
Modified: 2017-03-23 05:46 UTC (History)

Fixed In Version: glusterfs-3.8.4-1
Doc Text:
Clone Of:
: 1374286
Environment:
Last Closed: 2017-03-23 05:46:29 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Rahul Hinduja 2016-09-07 15:00:38 UTC
Description of problem:
=======================

While syncing data using tar+ssh, the sync completes, but many tar processes become defunct.

[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12521  4522  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12522  4519  6 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22664/cwd
root     12523  4522  6 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22663/cwd
root     12524  4510  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12525  4510 10 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22665/cwd
root     12526  4498  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12527  4498  0 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22662/cwd
root     12529  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  1 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12523  4522  5 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22663/cwd
root     12524  4510  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12525  4510  7 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22665/cwd
root     12526  4498  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12527  4498  1 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22662/cwd
root     12531  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12533  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12543  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]# 


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.9-12


How reproducible:
=================
Always


Steps to Reproduce:
===================
1. Set up geo-rep between master and slave
2. Set config parameter use-tarssh to true
3. Start geo-replication
4. Write some data on the master volume
5. Monitor tar processes on the master nodes using "ps -eaf | grep tar"
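
The monitoring in the last step can also be scripted. The following is only a rough sketch in Python, not part of geo-replication itself: the function names defunct_tar_processes and current_defunct_tar are made up for illustration, and it assumes the column layout of "ps -eaf" shown in this report (UID PID PPID C STIME TTY TIME CMD).

```python
import subprocess


def defunct_tar_processes(ps_output):
    """Return (pid, ppid) pairs for defunct tar entries in `ps -eaf` output.

    Hypothetical helper: assumes the eight-column `ps -eaf` layout
    UID PID PPID C STIME TTY TIME CMD.
    """
    found = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        cmd = fields[7]
        # A reaped-pending child shows up as "[tar] <defunct>".
        if cmd.startswith("[tar]") and "<defunct>" in cmd:
            found.append((int(fields[1]), int(fields[2])))
    return found


def current_defunct_tar():
    """Run `ps -eaf` on this node and report defunct tar processes."""
    result = subprocess.run(
        ["ps", "-eaf"], capture_output=True, text=True, check=True
    )
    return defunct_tar_processes(result.stdout)
```

Calling current_defunct_tar() on a master node while data is syncing returns the PID and parent PID of each defunct tar entry, which makes it easier to see which worker process has not collected its children.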

Actual results:
===============

Data at master and slave is synced and the arequal checksums match. However, many tar processes become defunct.
[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12543  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]# 


Expected results:
=================
No tar process should be defunct

Comment 2 Aravinda VK 2016-09-08 12:04:37 UTC
Upstream Patch sent to fix the issue
http://review.gluster.org/15426

Comment 4 Atin Mukherjee 2016-09-19 08:55:54 UTC
Upstream mainline : http://review.gluster.org/15426
Upstream 3.8 : http://review.gluster.org/15489
Downstream patch : https://code.engineering.redhat.com/gerrit/85004

Comment 8 Rahul Hinduja 2016-11-11 12:14:49 UTC
Verified with the build: glusterfs-3.8.4-3.el7rhgs.x86_64

While the sync is in progress, defunct tar processes are present on the system. However, once the sync completes, the defunct processes are cleaned up. Since the fix is to clean up the defunct processes after sync, moving this bug to the verified state.

[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31101 27632  0 15:53 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31124 30560  1 15:54 ?        00:00:00 [tar] <defunct>
root     31126 30563  1 15:54 ?        00:00:00 [tar] <defunct>
root     31127 30558  1 15:54 ?        00:00:00 [tar] <defunct>
root     31128 30563  2 15:54 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.37.207 tar --overwrite -xf - -C /proc/9926/cwd
root     31129 30558  3 15:54 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.37.207 tar --overwrite -xf - -C /proc/9927/cwd
root     31143 27632  0 15:54 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# 
[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31362 27632  0 16:11 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]#
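
For context on what <defunct> means in the output above: on Linux, a child that has exited stays in the process table as a zombie until its parent collects its exit status. The sketch below illustrates that general mechanism in Python. It is not the gsyncd code or the patch from the reviews linked in this bug, the helper names proc_state and demo_zombie are made up for illustration, and it assumes a Linux /proc filesystem.

```python
import subprocess
import sys
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; split after the last ')'
    # because comm itself may contain spaces or parentheses.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout=5.0):
    """Start a short-lived child, observe it as a zombie, then reap it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Poll /proc rather than child.poll(), since poll() would reap the child.
    deadline = time.monotonic() + timeout
    state = proc_state(child.pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
        state = proc_state(child.pid)
    before = state

    # wait() collects the exit status; the kernel then drops the entry.
    child.wait()
    after = proc_state(child.pid)
    return before, after
```

demo_zombie() returns the child's state before and after the parent calls wait(). The state before is "Z", which ps displays as <defunct>, and the entry is gone afterwards. This matches the pattern seen during verification: defunct entries appear while the sync is running and disappear once it has finished.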

Comment 10 errata-xmlrpc 2017-03-23 05:46:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

