Description of problem:
xfs_copy uses SIGKILL to stop its child threads before exiting. SIGKILL sent to a thread terminates the whole process, so xfs_copy exits with status 137. A script therefore cannot tell from the exit status whether the copy succeeded. The following patch fixes it.

From b3580a15e10e153d7443a2e0c05f570d94b9b5a6 Mon Sep 17 00:00:00 2001
From: Junxiao Bi <junxiao.bi@oracle.com>
Date: Tue, 6 May 2014 14:27:31 +0800
Subject: [PATCH] xfsprogs: xfs_copy: use exit() to replace killall()

Sending a SIGKILL signal to child thread will terminate the whole
process, xfs_copy will return an error value 137. This cause confuse
for script to know whether the copy successes. Calling exit() in main
thread can terminate the whole process and return the right value.
Replace killall()+abort() with exit(1) to match the old way exit in
error case. Also remove killall()+pthread_exit(NULL) since return 0
will be followed by an exit(0) to terminate the process.

Bug story from Christoph Hellwig:

Btw, I think the reason for this cruft is that xfs_copy was originally
written using the IRIX sproc interface, and the port to pthreads didn't
remove this gem:

http://marc.info/?l=linux-xfs&m=99535721110020&w=2

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joe jin <joe.jin@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Haxby <john.haxby@oracle.com>
Reviewed-by: Ethan Zhao <ethan.zhao@oracle.com>
---
 copy/xfs_copy.c |   30 +-----------------------------
 1 files changed, 1 insertions(+), 29 deletions(-)

diff --git a/copy/xfs_copy.c b/copy/xfs_copy.c
index 39517da..39bb9d7 100644
--- a/copy/xfs_copy.c
+++ b/copy/xfs_copy.c
@@ -217,25 +217,6 @@ handle_error:
 }
 
 void
-killall(void)
-{
-	int	i;
-
-	/* only the parent gets to kill things */
-
-	if (getpid() != parent_pid)
-		return;
-
-	for (i = 0; i < num_targets; i++)  {
-		if (target[i].state == ACTIVE)  {
-			/* kill up target threads */
-			pthread_kill(target[i].pid, SIGKILL);
-			pthread_mutex_unlock(&targ[i].wait);
-		}
-	}
-}
-
-void
 handler(int sig)
 {
 	pid_t	pid = getpid();
@@ -400,8 +381,7 @@ read_wbuf(int fd, wbuf *buf, xfs_mount_t *mp)
 	if (buf->length > buf->size)  {
 		do_warn(_("assert error: buf->length = %d, buf->size = %d\n"),
 			buf->length, buf->size);
-		killall();
-		abort();
+		exit(1);
 	}
 
 	if ((res = read(fd, buf->data, buf->length)) < 0)  {
@@ -591,11 +571,6 @@ main(int argc, char **argv)
 
 	parent_pid = getpid();
 
-	if (atexit(killall))  {
-		do_log(_("%s: couldn't register atexit function.\n"), progname);
-		die_perror();
-	}
-
 	/* open up source -- is it a file? */
 
 	open_flags = O_RDONLY;
@@ -1154,9 +1129,6 @@ main(int argc, char **argv)
 	}
 
 	check_errors();
-	killall();
-	pthread_exit(NULL);
-	/*NOTREACHED*/
 	return 0;
 }
--
1.7.1

Version-Release number of selected component (if applicable):
xfsprogs-3.1.1

How reproducible:

Steps to Reproduce:
1. xfs_copy source target
2. echo $?

Actual results:

Expected results:

Additional info:
Yep, may as well fix this. It is committed upstream now: http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfsprogs.git;a=commitdiff;h=2277ce35c37c75aa3c146261d5abe32f9cc39baa
Verified with /kernel/filesystems/xfs/1104956-xfs_copy-corrupt, test passed with xfsprogs-3.1.1-16.el6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1564.html