Bug 649347

Summary: Fedora 14 anaconda fails to retrieve filelists.xml.gz from FTP installation repository.
Product: [Fedora] Fedora Reporter: Simon <number.cruncher>
Component: curlAssignee: Kamil Dudka <kdudka>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: low    
Version: 14CC: jeff, jonathan, kdudka, kklic, paul, vanmeeuwen+fedora
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: curl-7.20.1-5.fc13 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-28 20:41:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
FTP server network trace during install attempt
none
New network trace
none
Network trace with bz643656.img
none
Network trace on Retry
none
gzip compressed cpio archive with patched libcurl (x86_64)
none
New network trace with latest updates image
none
ext2 floppy image with patched libcurl (x86_64)
none
Network trace showing timeout-related FTP issue none

Description Simon 2010-11-03 15:28:39 UTC
Created attachment 457469 [details]
FTP server network trace during install attempt

Description of problem:
Anaconda fails to retrieve filelists.xml.gz from FTP repository and doesn't send FTP request to server, but simply reports [Error 12] Timeout

Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Started F14 via PXE boot with kickstart file pointing at local ftp server
2. After making package selection, starts to check package dependencies
3. Fails to retrieve filelists
  
Actual results:
Graphical window: "Unable to read package metadata."
Log console: Try 1/10 for ftp://....filelists.xml.gz failed: [Error 12] Timeout on ....


Expected results:
Downloads from FTP and continues installation.

Additional info:
Wireshark trace attached.

Comment 1 Simon 2010-11-03 15:39:49 UTC
Having changed the kickstart url from "ftp://ftp//install/F14/...." to "ftp://ftp.cfs/install/F14/...." (fqdn and no extra forward slash after host name), this now seems to be working.

However, I'm still mystified by the network behaviour which clearly saw anaconda connect to the correct FTP server, but then fail to issue a request....?

Comment 2 Jeffrey C. Ollie 2010-11-03 15:53:38 UTC
I'm assigning this to curl since pycurl is pretty much just a thin shim over curl.

Comment 3 Simon 2010-11-03 16:02:58 UTC
OK. Now its failing to download the first RPM. Seems to be a problem when the installation takes a while (because I'm choosing the specific packages to install). My package selection takes about 200 seconds, with no FTP traffic, then the client sends FTP "Request: QUIT" followed by [RST, ACK], followed by a new [SYN] connection.

Comment 4 Simon 2010-11-03 16:04:33 UTC
Created attachment 457485 [details]
New network trace

Adding attachment showing connection history between client (.118) and FTP server (.121)

Comment 5 Kamil Dudka 2010-11-03 17:44:06 UTC
It may be already fixed:

https://github.com/bagder/curl/commit/0c8e5f7

Are you able to test the following updates image?  attachment #454454 [details]

http://fedoraproject.org/wiki/Anaconda/Updates

Comment 6 Simon 2010-11-04 09:16:38 UTC
I've tried the updates image bz643656.img and this started retrieving RPMs but again fails, this time with pam_pkcs11. The anaconda log reports:
Try 1/10 for ... failed: [Errno 14] FTP Error 229

I'll attach another network trace.

Comment 7 Simon 2010-11-04 09:17:34 UTC
Created attachment 457738 [details]
Network trace with bz643656.img

Comment 8 Simon 2010-11-04 09:18:32 UTC
Note that the server is running vsftpd-2.0.5-16.el5_5.1 from CentOS 5.5

Comment 9 Simon 2010-11-04 09:28:10 UTC
Created attachment 457741 [details]
Network trace on Retry

Attaching network trace of what happens when clicking "Retry" in anaconda GUI. Doesn't seem to send _any_ FTP requests for the supposedly corrupt/missing file.

Comment 10 Kamil Dudka 2010-11-04 10:13:12 UTC
Chris, are you able to narrow it down to a curl-based minimal example?

Comment 11 Simon 2010-11-04 13:19:40 UTC
Another symptom might have been reported in https://bugzilla.redhat.com/show_bug.cgi?id=623067

Comment 12 Simon 2010-11-04 16:50:11 UTC
Duplicate issue: https://bugzilla.redhat.com/show_bug.cgi?id=624431

Comment 13 Kamil Dudka 2010-11-04 17:30:00 UTC
the corresponding thread at upstream ML:

http://thread.gmane.org/gmane.comp.web.curl.library/29663

Comment 15 Kamil Dudka 2010-11-04 21:45:23 UTC
Created attachment 457943 [details]
gzip compressed cpio archive with patched libcurl (x86_64)

Thanks in advance for testing it!

Comment 16 Simon 2010-11-05 09:27:13 UTC
I getting the same behaviour as originally; can't FTP the filelists.xml.gz. I'm not sure the updates mechanism is working.

anaconda.log has:
09:08:49,828 INFO loader: file location: ftp://ftp/install/F14/x86_64/images/bz649347.img
09:08:49,828 INFO loader: transferring ftp://ftp/install/F14/x86_64/images/bz649347.img
09:08:49,871 ERR loader: failed to mount loopback device /dev/loop7 on /tmp/update-disk as /tmp/updates.img: mount: you must specify the filesystem type
09:08:50,065 INFO loader: Running anaconda script /usr/bin/anaconda

But /tmp/updates does contain the new libcurl libraries (.so size 402456 bytes). How do I know if anaconda is using the new updates?

Comment 17 Simon 2010-11-05 09:35:00 UTC
Created attachment 458038 [details]
New network trace with latest updates image

Latest network trace for start of installation phase. Noticed that the first packet is a FTP QUIT from a previous connection that must have been idle for the 3 or 4 minutes that I took choosing packages, etc.

Comment 18 Kamil Dudka 2010-11-05 10:37:00 UTC
(In reply to comment #16)
> 09:08:49,871 ERR loader: failed to mount loopback device /dev/loop7 on
> /tmp/update-disk as /tmp/updates.img: mount: you must specify the filesystem
> type

The error message above is expected.  I guess Anaconda first tries to mount ext2 using a loopback device.  If it fails, it tries cpio ... but I am just guessing, Anaconda is a black box for me.

(In reply to comment #17)
> Created attachment 458038 [details]
> New network trace with latest updates image
> 
> Latest network trace for start of installation phase. Noticed that the first
> packet is a FTP QUIT from a previous connection that must have been idle for
> the 3 or 4 minutes that I took choosing packages, etc.

I can't see anything wrong in the trace, but it reminded me that we should probably send QUIT after ABOR before we close the connection.  I'll send an updated patch to the upstream mailing list.

Any chance to repeat the failure out of Anaconda?  It's nearly impossible to debug it in there...

Comment 19 Kamil Dudka 2010-11-05 11:50:55 UTC
Strange, I've just successfully installed F-14 from ftp://ftp.funet.fi with that image.  It wasn't possible with the previous one.  Is there any public FTP server I can try the update against?

Comment 20 Simon 2010-11-05 12:00:03 UTC
Yes, I've just completed a successful install upon a second attempt. I'm now trying the reproduce the error ;-)

Comment 21 Kamil Dudka 2010-11-05 12:03:07 UTC
Created attachment 458054 [details]
ext2 floppy image with patched libcurl (x86_64)

The attached image uses the latest version of the patch and should avoid mount error in anaconda's log.

Comment 22 Simon 2010-11-05 12:18:09 UTC
Created attachment 458056 [details]
Network trace showing timeout-related FTP issue

OK, I've reproduced the problem and it seems to be related to a timeout.

1. Start installation process, last to download from FTP is *-Fedora-14-comps.xml.gz

2. No ABOR is sent, and the RETR finishes with server "226 File Send OK".

3. FTP session remains connected for another 5 minutes whilst I make package selection/twiddle thumbs

4. 300 seconds later the server sends "421 Timeout" and pushes [FIN, ACK].

5. Eventually I finish package selection. Client sends QUIT request to a closed server port and gets a [RST] response.

6. Client sends new SYN packet to FTP, but now seems to be out of sync with expected server responses and never sends a RETR

If I'm quick and never let the server timeout, installation is successful.

Comment 23 Simon 2010-11-05 12:23:28 UTC
"Retry" button in anaconda make no difference, the client never sends any requests/login to the server after the initial connection.

I'll bump the local servers idle_session_timeout for now...

Comment 24 Simon 2010-11-05 12:32:50 UTC
I can confirm that with the latest updates image,
https://bugzilla.redhat.com/attachment.cgi?id=458054

the timeout issue still remains. If the server session timeout triggers, libcurl doesn't seem to be in the correct state.

Comment 25 Kamil Dudka 2010-11-05 12:58:36 UTC
If there is no ABOR sent, the patch can't make any difference.  It's simply yet another issue.  We should track it separately...

Comment 26 Simon 2010-11-05 14:10:59 UTC
I agree, this is an orthogonal issue to the ABOR, but it probably is the root cause of the particular symptoms I observed. My FTP server never original hung on the disconnects unlike the Pure-FTPD of https://bugzilla.redhat.com/show_bug.cgi?id=643656.

The timeout problem explains my first and third comments; https://bugzilla.redhat.com/show_bug.cgi?id=649347#c1
https://bugzilla.redhat.com/show_bug.cgi?id=649347#c3

Comment 27 Kamil Dudka 2010-11-05 14:34:18 UTC
(In reply to comment #26)
> I agree, this is an orthogonal issue to the ABOR, but it probably is the root
> cause of the particular symptoms I observed. My FTP server never original hung
> on the disconnects unlike the Pure-FTPD of
> https://bugzilla.redhat.com/show_bug.cgi?id=643656.

Do you mean it works also without any updates image, if only you are fast enough in selection of the packages to install?

Comment 28 Simon 2010-11-05 15:13:20 UTC
(In reply to comment #27)
> 
> Do you mean it works also without any updates image, if only you are fast
> enough in selection of the packages to install?

Yes, I did manage to install successfully once when not customizing packages. It's always confusing when there are two separate bugs. The original updates image which I assumed would help, https://bugzilla.redhat.com/attachment.cgi?id=454454 , failed not because of the timeout (when trying to download filelists.xml.gz), but when trying to download some packages (crontab or pam-something), because of the new ABOR.

So to clarify, the original symptoms were most likely caused by the server session timeout not being handled gracefully by curl (can't start installation). The other symptoms (can't finish installation), were cause by the ABOR patch....

Comment 29 Kamil Dudka 2010-11-05 15:32:44 UTC
So the patch fixes the ABOR issue successfully (without re-introducing bug 643656).  Please open a new ticket for the timeout issue with some steps to reproduce.  I'll give it a try next week.

Comment 30 Simon 2010-11-05 15:57:18 UTC
New bug at:
https://bugzilla.redhat.com/show_bug.cgi?id=650255

For now, I can successfully install having raised the FTP idle timeout. I'm using your latest patch without any issues.

Comment 31 Kamil Dudka 2010-11-08 20:28:13 UTC
Thank you for opening the new bug.  I'll get to it probably later this week.  For now, I've pushed the fix for the already resolved one:

https://github.com/bagder/curl/commit/c6b97a8

Comment 32 Kamil Dudka 2010-11-09 17:33:14 UTC
fixed in curl-7.21.2-4.fc15

Comment 33 Kamil Dudka 2010-11-10 09:53:33 UTC
Created attachment 459386 [details]
ext2 floppy image with latest libcurl (x86_64)

Comment 34 Fedora Update System 2010-11-26 14:31:44 UTC
curl-7.21.0-6.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/curl-7.21.0-6.fc14

Comment 35 Fedora Update System 2010-11-26 15:10:39 UTC
curl-7.20.1-5.fc13 has been submitted as an update for Fedora 13.
https://admin.fedoraproject.org/updates/curl-7.20.1-5.fc13

Comment 36 Fedora Update System 2010-11-26 21:05:01 UTC
curl-7.21.0-6.fc14 has been pushed to the Fedora 14 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update curl'.  You can provide feedback for this update here: https://admin.fedoraproject.org/updates/curl-7.21.0-6.fc14

Comment 37 Fedora Update System 2010-11-28 20:41:17 UTC
curl-7.21.0-6.fc14 has been pushed to the Fedora 14 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 38 Fedora Update System 2010-12-03 20:38:46 UTC
curl-7.20.1-5.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 39 Kamil Dudka 2010-12-22 00:23:06 UTC
FYI a new FTP command has been proposed by Anthony Bryan, Tatsuhiro Tsujikawa and Daniel Stenberg for exactly this purpose:

http://daniel.haxx.se/blog/2010/12/20/byte-ranges-for-ftp

The IETF draft is available at:

http://www.ietf.org/id/draft-bryan-ftp-range-00.txt