Bug 77196
Summary: | GNU tar mishandled -EPIPE | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Russell King <rmk> |
Component: | tar | Assignee: | Martin Stransky <stransky> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Ben Levenson <benl> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 1 | CC: | jballes, redhat.com |
Hardware: | i386 | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2005-01-10 15:08:16 UTC | Type: | --- |
Description
Russell King
2002-11-02 21:55:37 UTC
Thanks for the report. I think this may be related to a problem we're seeing. If we use a CGI script like this:

    #!/bin/sh
    echo "Content-type: application/gzip"
    echo
    tar cf - /bin /usr/bin 2> /dev/null | gzip --fast 2> /dev/null

and hit "stop" while the file is downloading, we're left with a tar process that is eating up CPU time. strace shows that it is looping on SIGPIPE errors. However, this does *not* happen if we use a scratch-built version of tar-1.13; it's only a problem with tar-1.13.25. In examining the source code, I see that 1.13 has some explicit checks to terminate if a SIGPIPE is received, but 1.13.25 does not have those checks.

Does this mean that this is in fact a tar bug then?

That's my suspicion. I can't reproduce the problem from a command-line shell script, only from the CGI script. (I can't reproduce the original problem using cron, either.) So I don't know why cron/httpd changes the behavior, but the fact that the version of tar makes a difference seems significant, particularly since code dealing with SIGPIPE has been removed from the failing tar version.

You can't reproduce the bug from the command line because, by default, SIGPIPE is not ignored. The difference with cron for certain, and maybe httpd (I've not tested httpd), is that they start the process with SIGPIPE ignored. This isn't the "default environment" for executing commands. Note that tar 1.13.25 is a later version than tar 1.13, so maybe it's a bug that was introduced sometime during 1.13 development. That said, how many other programs are buggy wrt EPIPE?

That sounds plausible. I did a quick comparison of the 1.13 and 1.13.25 sources. 1.13 has some code to handle SIGPIPE and terminate if it gets one. 1.13.25 does not have this code. So that probably explains why the two versions behave differently. It's not clear to me why this change was made, but maybe the tar developers would argue that the default handling of SIGPIPE is OK for tar, so it's not a tar bug.
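For readers following the SIGPIPE/EPIPE discussion above, here is a minimal C sketch of the mechanism, not code taken from tar. It assumes a POSIX system; the function name `errno_from_write_to_closed_pipe` is hypothetical. With SIGPIPE ignored, as the thread says cron leaves it, a write to a pipe with no reader does not kill the process. Instead `write()` fails with EPIPE, and the program has to notice that itself.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper (not from tar): write one byte into a pipe whose
 * read end is already closed, with SIGPIPE ignored.
 * Returns the errno set by write(), 0 if the write unexpectedly
 * succeeded, or -1 if the pipe could not be created. */
int errno_from_write_to_closed_pipe(void)
{
    int fds[2];
    int err = 0;
    void (*old)(int);

    if (pipe(fds) != 0)
        return -1;

    /* Mimic the environment described in the thread: SIGPIPE ignored. */
    old = signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                  /* no reader remains */

    if (write(fds[1], "x", 1) < 0)
        err = errno;                /* EPIPE is expected here */

    close(fds[1]);

    /* Put back whatever disposition was in place before. */
    if (old != SIG_ERR)
        signal(SIGPIPE, old);

    return err;
}
```

A writer that retries after this failure, instead of treating it as fatal, would spin in the way the strace output in this report suggests.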
Is there a reason why cron needs to ignore SIGPIPE?

We found a similar problem when running tar from cron, with the command:

    # tar -cvg /tmp/notmind -f /dev/st0 /

Run from the prompt it works fine, but when it is run under cron, tar fails: it just creates an empty index file and that's all. Running strace on the command suggests a SIGPIPE problem; in fact the process is killed by the SIGPIPE signal. The strace output captured under cron:

    strace -o /tmp/STRACE tar -cvg /tmp/SALIDA --label "LABEL" -f /dev/st0 /

    write(2, "\n", 1) = 1
    lstat64("/etc/locale/sv", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
    write(2, "tar: ", 5) = 5
    write(2, "/etc/locale/sv: Directory is new", 32) = -1 EPIPE (Broken pipe)
    --- SIGPIPE (Broken pipe) ---
    +++ killed by SIGPIPE +++

Is there any solution to this problem? We are not able to make incremental backups using tar with an index file.

I seem unable to reproduce the original report with Fedora Core Test 2. Could you check whether you still see the problem with RHL 9 or the current betas?

    tar: Removing leading `/' from member names
    tar (child): /dev/full: Wrote only 0 of 10240 bytes
    tar (child): Error is not recoverable: exiting now
    testjob: line 2: 3744 Broken pipe tar jcf /dev/full /bin

Erm, well actually I can't reproduce it on an up2date RHL 7.3 box either... (I haven't tested the cgi problem.)

I can still reproduce the cgi problem on an RHL 9 system. It behaves quite consistently. If we use /usr/bin/tar and terminate the download early (e.g. with "Cancel" in Mozilla), we are left with a tar process with a PPID of 1, eating up all the CPU time. If we use our privately built version of tar-1.13, it works fine. I don't know if this matters, but we are using a privately built version of Apache 1.3.27, not the stock RHL 9 version.
Not sure what other problems this might introduce, but I was able to solve our problems by adding one line to tar.c, right at the beginning of main():

    signal(SIGPIPE, SIG_DFL);

With tar no longer ignoring SIGPIPE, it terminates properly.

Thanks a lot for comment 11. I suspect the problem still exists in Fedora Core, but if someone could confirm that is the case, that would be much appreciated. :)

Yes, Fedora Core 1 still has this problem. On a nearly stock system, I activated the web server and put this script in /var/www/cgi-bin:

    #!/bin/sh
    cat <<EOF
    Content-type: text/plain

    EOF
    tar czvf - /boot | count=100

After attempting to fetch this page, a tar process is left around eating up CPU time. (In "real life" we have web pages that generate tarballs dynamically. We see this problem if someone runs one of these scripts and then cancels the download.)

Thanks a lot. :) Presumably also in FC2 devel? :^/

I just ran the test (described in comment 13) on a Fedora Core 2-Test 3 system, and it has the same problem.

The cron problem appears gone (and is "undefined" anyway, so it's a bug in your code 8)). The tar one is a bug in tar: it's ignoring the SIGPIPE write failures and not exiting when it gets EPIPE. Moving to tar.

It is fixed in tar-1.14 (FC3 and RHEL4).
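The one-line workaround reported earlier in this thread (`signal(SIGPIPE, SIG_DFL);` at the start of main()) can be illustrated with a small, self-contained C sketch. This is not the tar-1.14 fix, whose details are not given in this report. It assumes a POSIX system, and the name `child_fate_on_broken_pipe` is hypothetical. The parent stands in for cron or httpd by ignoring SIGPIPE before forking; the child either inherits that disposition or resets it to the default, then writes to a pipe that has no reader.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo (not from tar). Returns the number of the signal that
 * terminated the child, 0 if the child exited on its own, or -1 on a
 * setup failure. A non-zero reset_in_child mimics adding
 * signal(SIGPIPE, SIG_DFL) at the top of main(). */
int child_fate_on_broken_pipe(int reset_in_child)
{
    int fds[2];
    int status;
    pid_t pid;
    void (*old)(int);

    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                      /* no reader will ever exist */

    /* Parent plays the role of cron/httpd: SIGPIPE ignored, and the
     * ignored disposition is inherited by the child across fork(). */
    old = signal(SIGPIPE, SIG_IGN);

    pid = fork();
    if (pid < 0) {
        close(fds[1]);
        if (old != SIG_ERR)
            signal(SIGPIPE, old);
        return -1;
    }

    if (pid == 0) {
        if (reset_in_child)
            signal(SIGPIPE, SIG_DFL);   /* the workaround from the thread */
        if (write(fds[1], "x", 1) < 0)
            _exit(1);                   /* write failed, process survived */
        _exit(0);
    }

    close(fds[1]);
    if (old != SIG_ERR)
        signal(SIGPIPE, old);           /* restore the caller's disposition */

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

With the reset, the child is killed by SIGPIPE on the first failed write, which matches the "terminates properly" behaviour described in the thread. Without it, the child survives the failed write and must check for EPIPE itself.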