Bug 440682 - wget aborts if invoked with --no-clobber when it encounters existing file
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: wget
Version: 5.1
Hardware: i386 Linux
Priority: low, Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Karsten Hopp
Keywords: Reopened
 
Reported: 2008-04-04 09:59 EDT by Ivan Nejgebauer
Modified: 2015-12-31 16:26 EST (History)
Doc Type: Bug Fix
Last Closed: 2009-05-28 14:24:56 EDT


Attachments
Patch for --no-clobber behavior in wget (347 bytes, patch)
2008-04-04 09:59 EDT, Ivan Nejgebauer

Description Ivan Nejgebauer 2008-04-04 09:59:21 EDT
If wget (1.10.2-7.el5) is run with --no-clobber, it will abort after determining
that a file already exists locally. No further error is displayed. This can
always be reproduced:

    for i in 1 2; do wget -nc http://www.example.com/index.html; done

If the first retrieval was successful, the second will fail.

According to my quick look at wget sources as shipped with CentOS, this occurs
because of code rearrangement in the massive patch (wget-1.10.2-to11.patch)
which is applied to the original. In the patched version, the test for
"noclobber" is no longer within http_loop(), but one level deeper, in gethttp().
When this function returns RETROK after determining local existence of a file, a
switch statement doesn't know how to deal with that value, and aborts.

I created a quick-and-dirty patch which changes the return value from RETROK to
RETRUNNEEDED and should be applied after wget-1.10.2-to11.patch. After applying
that patch, wget no longer aborts in the circumstances described above. Someone
who knows more about wget should check for memory leaks and similar.
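
To illustrate the failure mode described above, here is a minimal sketch in C. It is not the wget source: the function names sketch_gethttp() and sketch_http_loop(), the SKETCH_DOWNLOADED status, the cut-down enum, and the choice to report a skipped file as success are all assumptions made for illustration. Only RETROK and RETRUNNEEDED are taken from the report.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, cut-down status codes. RETROK and RETRUNNEEDED are the
   names used in the report; SKETCH_DOWNLOADED is invented for this sketch. */
typedef enum { RETROK, RETRUNNEEDED, SKETCH_DOWNLOADED } sketch_status;

/* Stand-in for gethttp(): with -nc and an existing local file the download
   is skipped. `patched` selects the return value proposed in the patch. */
sketch_status sketch_gethttp(int file_exists, int patched)
{
    if (file_exists)
        return patched ? RETRUNNEEDED : RETROK;
    return SKETCH_DOWNLOADED;
}

/* Stand-in for the switch in http_loop() that inspects the result. */
sketch_status sketch_http_loop(int file_exists, int patched)
{
    sketch_status err = sketch_gethttp(file_exists, patched);

    switch (err) {
    case SKETCH_DOWNLOADED:
        puts("downloaded");
        return RETROK;
    case RETRUNNEEDED:
        /* Assumption: a skipped file is reported to the caller as success. */
        puts("already there; skipping");
        return RETROK;
    default:
        /* No case for this status, so the program gives up. */
        abort();
    }
}
```

In this sketch, sketch_http_loop(1, 0) falls through to the default branch and calls abort(), which matches the symptom reported here. sketch_http_loop(1, 1) returns normally because the switch has a case for RETRUNNEEDED.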
Comment 1 Ivan Nejgebauer 2008-04-04 09:59:21 EDT
Created attachment 300433
Patch for --no-clobber behavior in wget
Comment 2 Karsten Hopp 2008-04-28 11:17:50 EDT
This is expected behaviour. From the man page:
When running Wget without -N, -nc, -r, or -p, downloading the same file in the
same directory will result in the original copy of file being preserved and the
second copy being named file.1.  If that file is downloaded yet again, the third
copy will be named file.2, and so on.
When -nc is specified, this behavior is suppressed, and Wget will refuse to
download newer copies of file. Therefore, "no-clobber" is actually a misnomer
in this mode---it’s not clobbering that’s prevented (as the numeric suffixes
were already preventing clobbering), but rather the multiple version saving
that’s prevented.

Comment 3 Ivan Nejgebauer 2008-04-29 04:05:33 EDT
I am aware of that passage in the manual, but that was not what the report was
about. Skipping a file (or not creating another version, as the manual puts it)
is fine and expected; completely aborting the program when the first existing
file (out of possibly many) is encountered, not so.

Here is a slightly more real-life way to demonstrate the bug:

1. Populate an HTTP-accessible directory with some files, e.g., half the
published RHEL 5 updates. Enable directory indexing for that directory.

2. Retrieve the contents with "wget -A rpm -r -np -nc http://example.com/path/"

3. Copy the rest of the updates to the directory.

4. Try to retrieve them with the above command. It fails.

That's how I found the bug -- trying to use wget for mirroring CentOS updates.
Comment 4 Jeff Burke 2009-05-28 14:23:17 EDT
wget-1.10.2-7.el5

This issue is causing additional problems in automation testing. When using the -nc option the application core dumps.

Simple test case, run this command twice in a row:
wget -nc http://www.redhat.com/legal/privacy_statement.html

$ wget -nc http://www.redhat.com/legal/privacy_statement.html
--14:20:55--  http://www.redhat.com/legal/privacy_statement.html
Resolving www.redhat.com... 209.132.177.50
Connecting to www.redhat.com|209.132.177.50|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24615 (24K) [text/html]
Saving to: `privacy_statement.html'

100%[======================================================================================================================================================================>] 24,615       139K/s   in 0.2s   

14:20:55 (139 KB/s) - `privacy_statement.html' saved [24615/24615]

$ wget -nc http://www.redhat.com/legal/privacy_statement.html
--14:20:57--  http://www.redhat.com/legal/privacy_statement.html
Resolving www.redhat.com... 209.132.177.50
Connecting to www.redhat.com|209.132.177.50|:80... connected.
HTTP request sent, awaiting response... 200 OK
File `privacy_statement.html' already there; not retrieving.

Aborted (core dumped)
Comment 6 Petr Šplíchal 2009-06-24 08:13:44 EDT
Bug reproduced, fix verified in wget-1.11.4-2.el5.
This issue will be fixed in the upcoming 5.4 release.
