Bug 1100107

Summary: xfsprogs: xfs_copy succeeds but exits with error code
Product: Red Hat Enterprise Linux 6 Reporter: Junxiao Bi <junxiao.bi>
Component: xfsprogs Assignee: Eric Sandeen <esandeen>
Status: CLOSED ERRATA QA Contact: Eryu Guan <eguan>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.5 CC: eguan
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: xfsprogs-3.1.1-16.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1100376 (view as bug list) Environment:
Last Closed: 2014-10-14 07:49:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1100376    

Description Junxiao Bi 2014-05-22 03:27:23 UTC
Description of problem:
xfs_copy uses SIGKILL to kill its child threads before exiting. A SIGKILL sent to a thread terminates the whole process, so xfs_copy exits with status 137 (128 + SIGKILL) even when the copy succeeded. Scripts therefore cannot tell whether the copy succeeded.

The following patch fixes it.

From b3580a15e10e153d7443a2e0c05f570d94b9b5a6 Mon Sep 17 00:00:00 2001
From: Junxiao Bi <junxiao.bi>
Date: Tue, 6 May 2014 14:27:31 +0800
Subject: [PATCH] xfsprogs: xfs_copy: use exit() to replace killall()

Sending a SIGKILL signal to a child thread terminates the whole process,
so xfs_copy returns the error value 137. This makes it hard for a script
to know whether the copy succeeded.

Calling exit() in the main thread terminates the whole process and returns
the right value. Replace killall()+abort() with exit(1) to match the old way
of exiting in the error case. Also remove killall()+pthread_exit(NULL), since
return 0 will be followed by an exit(0) that terminates the process.

Bug story from Christoph Hellwig:
Btw, I think the reason for this cruft is that xfs_copy was originally
written using the IRIX sproc interface, and the port to pthreads didn't
remove this gem:

http://marc.info/?l=linux-xfs&m=99535721110020&w=2

Signed-off-by: Junxiao Bi <junxiao.bi>
Cc: Joe Jin <joe.jin>
Reviewed-by: Christoph Hellwig <hch>
Reviewed-by: John Haxby <john.haxby>
Reviewed-by: Ethan Zhao <ethan.zhao>
---
 copy/xfs_copy.c |   30 +-----------------------------
 1 files changed, 1 insertions(+), 29 deletions(-)

diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
index 39517da..39bb9d7 100644
--- a/copy/xfs_copy.c
+++ b/copy/xfs_copy.c
@@ -217,25 +217,6 @@ handle_error:
 }
 
 void
-killall(void)
-{
-	int i;
-
-	/* only the parent gets to kill things */
-
-	if (getpid() != parent_pid)
-		return;
-
-	for (i = 0; i < num_targets; i++)  {
-		if (target[i].state == ACTIVE)  {
-			/* kill up target threads */
-			pthread_kill(target[i].pid, SIGKILL);
-			pthread_mutex_unlock(&targ[i].wait);
-		}
-	}
-}
-
-void
 handler(int sig)
 {
 	pid_t	pid = getpid();
@@ -400,8 +381,7 @@ read_wbuf(int fd, wbuf *buf, xfs_mount_t *mp)
 	if (buf->length > buf->size)  {
 		do_warn(_("assert error:  buf->length = %d, buf->size = %d\n"),
 			buf->length, buf->size);
-		killall();
-		abort();
+		exit(1);
 	}
 
 	if ((res = read(fd, buf->data, buf->length)) < 0)  {
@@ -591,11 +571,6 @@ main(int argc, char **argv)
 
 	parent_pid = getpid();
 
-	if (atexit(killall))  {
-		do_log(_("%s: couldn't register atexit function.\n"), progname);
-		die_perror();
-	}
-
 	/* open up source -- is it a file? */
 
 	open_flags = O_RDONLY;
@@ -1154,9 +1129,6 @@ main(int argc, char **argv)
 	}
 
 	check_errors();
-	killall();
-	pthread_exit(NULL);
-	/*NOTREACHED*/
 	return 0;
 }
 
-- 
1.7.1


Version-Release number of selected component (if applicable):
xfsprogs-3.1.1

How reproducible:


Steps to Reproduce:
1. xfs_copy source target
2. echo $?
3.

Actual results:


Expected results:


Additional info:

Comment 2 Eric Sandeen 2014-05-22 16:58:05 UTC
Yep, may as well fix this.  It is committed upstream now:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=commitdiff;h=2277ce35c37c75aa3c146261d5abe32f9cc39baa

Comment 4 Eryu Guan 2014-06-29 12:29:03 UTC
Verified with /kernel/filesystems/xfs/1104956-xfs_copy-corrupt, test passed with xfsprogs-3.1.1-16.el6

Comment 5 errata-xmlrpc 2014-10-14 07:49:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1564.html