Bug 440682

Summary: wget aborts if invoked with --no-clobber when it encounters existing file
Product: Red Hat Enterprise Linux 5 Reporter: Ivan Nejgebauer <inejge>
Component: wget Assignee: Karsten Hopp <karsten>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: low    
Version: 5.1 CC: as.kmr.sinh+redhat, jburke, psplicha
Target Milestone: rc Keywords: Reopened
Target Release: ---   
Hardware: i386   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2009-05-28 18:24:56 UTC Type: ---
Attachments:
Description Flags
Patch for --no-clobber behavior in wget none

Description Ivan Nejgebauer 2008-04-04 13:59:21 UTC
If wget (1.10.2-7.el5) is run with --no-clobber, it will abort after determining
that a file already exists locally. No further error is displayed. This can
always be reproduced:

    for i in 1 2;do wget -nc http://www.example.com/index.html;done

If the first retrieval was successful, the second will fail.

According to my quick look at wget sources as shipped with CentOS, this occurs
because of code rearrangement in the massive patch (wget-1.10.2-to11.patch)
which is applied to the original. In the patched version, the test for
"noclobber" is no longer within http_loop(), but one level deeper, in gethttp().
When this function returns RETROK after determining local existence of a file, a
switch statement doesn't know how to deal with that value, and aborts.

I created a quick-and-dirty patch which changes the return value from RETROK to
RETRUNNEEDED and should be applied after wget-1.10.2-to11.patch. After applying
that patch, wget no longer aborts in the circumstances described above. Someone
who knows more about wget should check for memory leaks and similar.
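
The failure mode and the effect of the patch can be modelled in a few lines of C. This is a simplified sketch, not the wget source: RETROK and RETRUNNEEDED are the names used in this report, while status_t, fetch_one(), caller_handles() and the remaining enum values are hypothetical stand-ins for gethttp(), the caller's switch statement and the other status codes. The sketch returns false at the point where the real program would abort.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of status codes. Only RETROK and RETRUNNEEDED
   are named in the report; the other two are placeholders. */
typedef enum { RETROK, RETRUNNEEDED, RETRFINISHED, OTHER_ERROR } status_t;

/* Stand-in for gethttp(): decides what to report for one URL.
   `patched` selects the return value proposed in the attached patch. */
status_t fetch_one(bool noclobber, bool file_exists, bool patched)
{
    if (noclobber && file_exists)
        return patched ? RETRUNNEEDED : RETROK;  /* nothing was downloaded */
    return RETRFINISHED;                          /* pretend the download worked */
}

/* Stand-in for the caller's switch. Returns true when the status has a
   case of its own, and false where the real program would end up in a
   branch that aborts. */
bool caller_handles(status_t status)
{
    switch (status) {
    case RETRFINISHED:
    case RETRUNNEEDED:
    case OTHER_ERROR:
        return true;
    default:
        return false;  /* assumption: the real code aborts at this point */
    }
}
```

With these definitions, caller_handles(fetch_one(true, true, false)) is false (the unpatched path), and the same call with patched set to true is handled.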

Comment 1 Ivan Nejgebauer 2008-04-04 13:59:21 UTC
Created attachment 300433 [details]
Patch for --no-clobber behavior in wget

Comment 2 Karsten Hopp 2008-04-28 15:17:50 UTC
This is expected behaviour. From the man page:
When running Wget without -N, -nc, -r, or -p, downloading the same file in the
same directory will result in the original copy of file being preserved and the
second copy being named file.1.  If that file is downloaded yet again, the third
copy will be named file.2, and so on.
When -nc is specified, this behavior is suppressed, and Wget will refuse to
download newer copies of file. Therefore, "no-clobber" is actually a misnomer
in this mode---it’s not clobbering that’s prevented (as the numeric suffixes
were already preventing clobbering), but rather the multiple version saving
that’s prevented.

Comment 3 Ivan Nejgebauer 2008-04-29 08:05:33 UTC
I am aware of that passage in the manual, but that was not what the report was
about. Skipping a file (or not creating another version, as the manual puts it)
is fine and expected; completely aborting the program when the first existing
file (out of possibly many) is encountered, not so.

Here is a slightly more real-life way to demonstrate the bug:

1. Populate an HTTP-accessible directory with some files, e.g., half the
published RHEL 5 updates. Enable directory indexing for that directory.

2. Retrieve the contents with "wget -A rpm -r -np -nc http://example.com/path/"

3. Copy the rest of the updates to the directory.

4. Try to retrieve them with the above command. It fails.

That's how I found the bug -- trying to use wget for mirroring CentOS updates.
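
The difference between the expected and the observed behaviour in the steps above can be sketched in C. This is only an illustrative model under stated assumptions: struct remote_file, mirror() and the stop_on_existing flag are hypothetical and do not appear in wget, and the "download" is reduced to a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one remote file in a recursive retrieval. */
struct remote_file {
    const char *name;
    bool exists_locally;
};

/* Walks the list the way a -nc run is expected to: existing files are
   skipped and everything else is fetched. With stop_on_existing set, it
   instead gives up at the first existing file, which models the abort
   described in this report. Returns the number of files fetched. */
size_t mirror(const struct remote_file *files, size_t count,
              bool stop_on_existing)
{
    size_t fetched = 0;
    for (size_t i = 0; i < count; i++) {
        if (files[i].exists_locally) {
            if (stop_on_existing)
                break;      /* reported behaviour: the run ends here */
            continue;       /* expected behaviour: skip and move on */
        }
        fetched++;          /* placeholder for the actual download */
    }
    return fetched;
}
```

If the first file in the list already exists locally, the stop_on_existing variant fetches nothing, while the expected variant still fetches every missing file.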

Comment 4 Jeff Burke 2009-05-28 18:23:17 UTC
wget-1.10.2-7.el5

This issue is causing additional problems in automation testing. When the -nc option is used, the application dumps core.

Simple test case, run this command twice in a row:
wget -nc http://www.redhat.com/legal/privacy_statement.html

$ wget -nc http://www.redhat.com/legal/privacy_statement.html
--14:20:55--  http://www.redhat.com/legal/privacy_statement.html
Resolving www.redhat.com... 209.132.177.50
Connecting to www.redhat.com|209.132.177.50|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24615 (24K) [text/html]
Saving to: `privacy_statement.html'

100%[======================================================================================================================================================================>] 24,615       139K/s   in 0.2s   

14:20:55 (139 KB/s) - `privacy_statement.html' saved [24615/24615]

$ wget -nc http://www.redhat.com/legal/privacy_statement.html
--14:20:57--  http://www.redhat.com/legal/privacy_statement.html
Resolving www.redhat.com... 209.132.177.50
Connecting to www.redhat.com|209.132.177.50|:80... connected.
HTTP request sent, awaiting response... 200 OK
File `privacy_statement.html' already there; not retrieving.

Aborted (core dumped)

Comment 6 Petr Šplíchal 2009-06-24 12:13:44 UTC
Bug reproduced, fix verified in wget-1.11.4-2.el5.
This issue will be fixed in the upcoming 5.4 release.