Description of problem:
=======================

While syncing data using tar, the sync completes but many tar processes become defunct.

[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12521  4522  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12522  4519  6 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22664/cwd
root     12523  4522  6 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22663/cwd
root     12524  4510  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12525  4510 10 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22665/cwd
root     12526  4498  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12527  4498  0 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22662/cwd
root     12529  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#

[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  1 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12523  4522  5 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22663/cwd
root     12524  4510  1 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12525  4510  7 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22665/cwd
root     12526  4498  0 17:19 ?        00:00:00 tar --sparse -cf - --files-from -
root     12527  4498  1 17:19 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.41.203 tar --overwrite -xf - -C /proc/22662/cwd
root     12531  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#

[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12533  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#

[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12543  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#

Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.9-12

How reproducible:
=================

Always

Steps to Reproduce:
===================

1. Set up geo-rep between master and slave
2. Set the config parameter use-tarssh to true
3. Start geo-replication
4. Write some data on the master volume
5. Monitor the tar processes on the master nodes using "ps -eaf | grep tar"

Actual results:
===============

Data on master and slave is synced and the arequal checksums match. However, many tar processes become defunct.

[root@dhcp41-167 ~]# ps -eaf | grep tar
root     12520  4519  0 17:19 ?        00:00:00 [tar] <defunct>
root     12521  4522  0 17:19 ?        00:00:00 [tar] <defunct>
root     12524  4510  0 17:19 ?        00:00:00 [tar] <defunct>
root     12526  4498  0 17:19 ?        00:00:00 [tar] <defunct>
root     12543  4186  0 17:19 pts/0    00:00:00 grep tar
[root@dhcp41-167 ~]#

Expected results:
=================

No tar process should be defunct.
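
For anyone repeating the monitoring step, the defunct entries can be picked out of the "ps -eaf" output mechanically instead of by eye. The following is only a minimal sketch: the function name defunct_processes is hypothetical and is not part of glusterfs or geo-replication. It assumes the standard "ps -eaf" column order (UID PID PPID C STIME TTY TIME CMD) seen in the transcripts above.

```python
def defunct_processes(ps_output, name="tar"):
    """Return the PIDs of defunct processes called `name` found in `ps -eaf` output.

    Hypothetical helper for illustration only; it is not part of glusterfs.
    """
    # ps shows a zombie as "[<name>] <defunct>" in the CMD column.
    marker = "[%s] <defunct>" % name
    pids = []
    for line in ps_output.splitlines():
        if marker not in line:
            continue
        fields = line.split()
        # Assumed column order for ps -eaf: UID PID PPID C STIME TTY TIME CMD
        pids.append(int(fields[1]))
    return pids
```

Feeding it the captured output of "ps -eaf" returns a list of PIDs; an empty list corresponds to the expected result stated above.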
Upstream patch sent to fix the issue: http://review.gluster.org/15426
Upstream mainline : http://review.gluster.org/15426
Upstream 3.8      : http://review.gluster.org/15489
Downstream patch  : https://code.engineering.redhat.com/gerrit/85004
Verified with the build: glusterfs-3.8.4-3.el7rhgs.x86_64

While the sync is in progress, defunct tar processes are present on the system. However, after the sync completes, the defunct processes are cleaned up. Since the fix is to clean up the defunct processes after the sync, moving this bug to verified state.

[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31101 27632  0 15:53 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]#
[root@dhcp37-76 scripts]#
[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31124 30560  1 15:54 ?        00:00:00 [tar] <defunct>
root     31126 30563  1 15:54 ?        00:00:00 [tar] <defunct>
root     31127 30558  1 15:54 ?        00:00:00 [tar] <defunct>
root     31128 30563  2 15:54 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.37.207 tar --overwrite -xf - -C /proc/9926/cwd
root     31129 30558  3 15:54 ?        00:00:00 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem -p 22 root.37.207 tar --overwrite -xf - -C /proc/9927/cwd
root     31143 27632  0 15:54 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]#
[root@dhcp37-76 scripts]#
[root@dhcp37-76 scripts]#
[root@dhcp37-76 scripts]# ps -eaf | grep tar
root     31362 27632  0 16:11 pts/0    00:00:00 grep --color=auto tar
[root@dhcp37-76 scripts]#
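
As background on the behaviour verified above: an exited child stays in the process table as defunct until its parent collects its exit status. The following is a generic sketch of a producer/consumer pipeline shaped like the "tar | ssh tar" pair in the transcripts, where the parent waits on both children. It is not the patch under review; run_pipeline and its arguments are hypothetical names used only for illustration.

```python
import subprocess


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer_cmd | consumer_cmd` and reap both children.

    Hypothetical illustration of reaping; not the geo-replication code.
    Returns (consumer stdout, producer exit code, consumer exit code).
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Drop the parent's copy of the pipe's read end so the producer
    # sees a broken pipe if the consumer exits early.
    producer.stdout.close()
    # communicate() reads the consumer's output and waits for it to exit.
    out, _ = consumer.communicate()
    # Collect the producer's exit status as well. An exited child that is
    # never waited on remains defunct until the parent reaps it or exits.
    producer.wait()
    return out, producer.returncode, consumer.returncode
```

In this sketch the two commands are passed as argument lists. Both return codes are filled in only because each child has been waited on, which is also what removes it from the "ps" listing.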
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html