Bug 77196

Summary: GNU tar mishandles EPIPE
Product: Fedora
Reporter: Russell King <rmk>
Component: tar
Assignee: Martin Stransky <stransky>
Status: CLOSED CURRENTRELEASE
QA Contact: Ben Levenson <benl>
Severity: high
Priority: medium
Version: 1
CC: jballes, redhat.com
Target Milestone: ---
Target Release: ---
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-01-10 15:08:16 UTC

Description Russell King 2002-11-02 21:55:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.0.0) Gecko/20020529

Description of problem:
When a script runs from cron, it appears to be started with SIGPIPE
ignored, rather than with SIGPIPE reset to its default handling
(i.e., killing the process).

This is a problem because if the reader of the output pipe of a
"tar jcf" command exits, tar receives EPIPE errors from write() and
SIGPIPE signals.  Normally the SIGPIPE kills the tar process.
However, with this bug in cron, the tar process continues running
indefinitely, and you end up with multiple tar processes eating lots
of CPU.
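The mechanism described above can be sketched as follows (an illustrative example, not code from this report; it assumes a Linux box with the coreutils `yes` program available): an ignored SIGPIPE disposition survives fork()/exec(), so a daemon that ignores SIGPIPE passes that disposition on to every job it starts.

```python
import signal
import subprocess

def spawn_yes(disposition):
    """Spawn `yes` (which writes to stdout forever) while SIGPIPE has the
    given disposition, then close the read end of its stdout pipe."""
    old = signal.signal(signal.SIGPIPE, disposition)
    try:
        # restore_signals=False lets the child inherit our disposition,
        # just as cron's children inherited its ignored SIGPIPE.
        child = subprocess.Popen(["yes"], stdout=subprocess.PIPE,
                                 stderr=subprocess.DEVNULL,
                                 restore_signals=False)
    finally:
        signal.signal(signal.SIGPIPE, old)  # put the parent's disposition back
    child.stdout.close()                    # break the pipe
    return child.wait()

# Default disposition: the child is killed by SIGPIPE (negative return code).
print(spawn_yes(signal.SIG_DFL))  # -13 on Linux (SIGPIPE is signal 13)
# Ignored disposition: the child survives the broken pipe, sees EPIPE from
# write(), and must exit on its own -- `yes` does; the buggy tar did not.
print(spawn_yes(signal.SIG_IGN))
```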


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Create a script (as shown in "additional information")
2. Set up cron to run it every 5 minutes
3. After 20 minutes, count the number of tar processes actively eating CPU.
	

Actual Results:  Large number of tar processes eating CPU.

Expected Results:  tar should die off because SIGPIPE should NOT be ignored.

Additional info:

We're running: vixie-cron-3.0.1-64

Script:
#!/bin/sh
tar jcf /dev/full /bin

Comment 1 Jens Petersen 2002-11-11 06:30:03 UTC
Thanks for the report.

Comment 2 Peter Fales 2003-01-08 21:21:37 UTC
I think this may be related to a problem we're seeing.  If we use a cgi script
like this:

#!/bin/sh
echo "Content-type: application/gzip"
echo
tar cf - /bin /usr/bin 2> /dev/null | gzip --fast  2> /dev/null

and hit "stop" while the file is downloading, we're left with a tar process
that is eating up CPU time.  strace shows that it is looping on EPIPE errors.
However, this does *not* happen with a scratch-built version of tar-1.13; it's
only a problem with tar-1.13.25.  Examining the source code, I see that 1.13
has some explicit checks to terminate if a SIGPIPE is received, but 1.13.25
does not have those checks.
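A toy model of the loop strace was showing (illustrative only, not tar's actual code): with SIGPIPE ignored -- which Python arranges by default, conveniently mimicking the cron/httpd environment -- every retried write() to a dead pipe fails the same way with EPIPE, so a writer that neither dies from the signal nor acts on the error spins forever.

```python
import errno
import os

# A pipe whose read end is already gone, like the pipe to a cancelled download.
read_fd, write_fd = os.pipe()
os.close(read_fd)

# Python ignores SIGPIPE by default, so write() fails with EPIPE instead of
# killing us -- the same situation tar-1.13.25 was in under cron/httpd.
epipe_count = 0
for _ in range(5):                     # the real bug retried indefinitely
    try:
        os.write(write_fd, b"data")
    except OSError as exc:
        if exc.errno == errno.EPIPE:
            epipe_count += 1           # a careful writer would exit here

os.close(write_fd)
print(epipe_count)  # 5 -- every retry fails identically
```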



Comment 3 Jens Petersen 2003-01-09 05:18:44 UTC
Does this mean that this is in fact a tar bug then?

Comment 4 Peter Fales 2003-01-09 12:57:41 UTC
That's my suspicion.  I can't reproduce the problem from a command-line shell
script, only from the CGI script.  (I can't reproduce the original problem
using cron, either.)  So I don't know why cron/httpd changes the behavior, but
the fact that the version of tar makes a difference seems significant,
particularly since the code dealing with SIGPIPE has been removed from the
failing tar version.


Comment 5 Russell King 2003-01-13 17:37:28 UTC
You can't reproduce the bug from the command line because, by default,
SIGPIPE is not ignored.

The difference, certainly with cron and maybe with httpd (I've not tested
httpd), is that they start the process with SIGPIPE ignored.  This isn't the
"default environment" for executing commands.

Note that tar 1.13.25 is a later version than tar 1.13, so maybe it's a bug
that was introduced sometime during 1.13 development.  That said, how many
other programs are buggy with respect to EPIPE?

Comment 6 Peter Fales 2003-01-13 17:45:58 UTC
That sounds plausible.  I did a quick comparison of the 1.13 and 1.13.25
sources.  1.13 has some code to handle SIGPIPE and terminate if it gets one;
1.13.25 does not have this code.  So that probably explains why the two
versions behave differently.  It's not clear to me why this change was made,
but maybe the tar developers would argue that the default handling of SIGPIPE
is OK for tar, so it's not a tar bug.  Is there a reason why cron needs to
ignore SIGPIPE?


Comment 7 Javier Ballesteros 2003-09-11 14:26:11 UTC
We found a similar problem when running tar from cron.  The command:

#tar -cvg /tmp/notmind -f /dev/st0 /

works fine from an interactive prompt, but when run under cron, tar fails:
it just creates an empty index file.  An strace of the command points to a
SIGPIPE problem; in fact the process is killed by the SIGPIPE signal.  The
strace output captured under cron:

strace -o /tmp/STRACE tar -cvg /tmp/SALIDA --label "LABEL" -f /dev/st0 /

write(2, "\n", 1)                       = 1
lstat64("/etc/locale/sv", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
write(2, "tar: ", 5)                    = 5
write(2, "/etc/locale/sv: Directory is new", 32) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) ---
+++ killed by SIGPIPE +++

Is there any way to solve this?  We are unable to make incremental backups
using tar with an index file.


Comment 8 Jens Petersen 2003-10-08 06:23:01 UTC
I seem unable to reproduce the original report with Fedora Core Test 2.
Could you check whether you still see the problem with RHL 9 or the
current betas?  This is what I see now:

tar: Removing leading `/' from member names
tar (child): /dev/full: Wrote only 0 of 10240 bytes
tar (child): Error is not recoverable: exiting now
testjob: line 2:  3744 Broken pipe             tar jcf /dev/full /bin


Comment 9 Jens Petersen 2003-10-08 06:38:01 UTC
Erm, well actually I can't reproduce it on an up2date RHL 7.3 box either...

(I haven't tested the cgi problem.)

Comment 10 Peter Fales 2003-10-08 13:57:02 UTC
I can still reproduce the cgi problem on an RHL 9 system.  It behaves quite
consistently.   If we use /usr/bin/tar, and terminate the download early (e.g.
with "Cancel" in Mozilla), we are left with a tar process with a PPID of 1, and
eating up all the CPU time.  If we use our privately built version of tar-1.13,
then it works fine.  

I don't know if this matters, but we are using a privately built version of 
Apache 1.3.27, not the stock RHL 9 version.

Comment 11 Peter Fales 2004-04-26 19:56:45 UTC
Not sure what other problems this might introduce, but I was able to
solve our problems by adding one line to tar.c, right at the beginning
of main():

signal(SIGPIPE, SIG_DFL);

With tar no longer ignoring SIGPIPE, it terminates properly.
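The effect of that one-line fix can be sketched like so (a hypothetical Python demo, not the actual tar patch): a child that resets SIGPIPE to SIG_DFL as its first action dies on a broken pipe even when it was started with SIGPIPE ignored, while one that doesn't must notice EPIPE itself.

```python
import signal
import subprocess
import sys

# Child program: Python starts with SIGPIPE ignored (a stand-in for being
# launched by cron/httpd).  In "fixed" mode it applies the equivalent of
# comment 11's signal(SIGPIPE, SIG_DFL) before writing to a dead pipe.
CHILD = """\
import os, signal, sys
if sys.argv[1] == "fixed":
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)   # comment 11's one-liner
while True:
    try:
        os.write(1, b"x" * 65536)
    except OSError:
        sys.exit(42)   # survived the signal; only EPIPE says the pipe is dead
"""

def run_child(mode):
    proc = subprocess.Popen([sys.executable, "-c", CHILD, mode],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
    proc.stdout.close()  # nobody will read the child's output
    return proc.wait()

print(run_child("fixed"))  # -13: killed by SIGPIPE, as tar should be
print(run_child("buggy"))  # 42: outlives the signal, must act on EPIPE itself
```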


Comment 12 Jens Petersen 2004-04-26 21:52:37 UTC
Thanks a lot for comment 11.

I suspect the problem still exists in Fedora Core, but
if someone could confirm that is the case that would be
much appreciated. :)

Comment 13 Peter Fales 2004-04-27 02:41:40 UTC
Yes, Fedora Core 1 still has this problem.   On a nearly stock system,
I activated the web server, and put this script in /var/www/cgi-bin

#!/bin/sh
cat <<EOF
Content-type: text/plain

EOF
tar czvf - /boot  | count=100

After attempting to fetch this page, a tar process is left around
eating up CPU time.  (In "real life" we have web pages that generate
tarballs dynamically.  We see this problem if someone runs one of
these scripts and then cancels the download.)

Comment 14 Jens Petersen 2004-04-27 04:42:16 UTC
Thanks a lot. :)  Presumably also in FC2 devel? :^/

Comment 15 Peter Fales 2004-04-30 19:16:02 UTC
I just ran the test (described in comment 13) on a Fedora Core 2-Test
3 system, and it has the same problem.


Comment 16 Alan Cox 2004-06-21 15:13:22 UTC
The cron problem appears gone (and the behavior is "undefined" anyway, so it's
a bug in your code 8)).  The tar one is a bug in tar - with SIGPIPE ignored,
it ignores the failed writes too, and does not exit when it gets EPIPE.
Moving to tar.


Comment 17 Martin Stransky 2005-01-10 15:08:16 UTC
It is fixed in tar-1.14 (FC3 and RHEL4).