Bug 650255
Description
Simon
2010-11-05 15:56:17 UTC
I was playing with that and it seems to work in most cases. I set up a testing FTP server (proftpd) with idle timeout set to 10: # grep Idle /etc/proftpd.conf TimeoutIdle 10 Then I prepared a testing client to repeat the issue. It performs two FTP downloads using a specified delay between them. If the delay is 8 or 12, it works fine. If the delay is 10, it does not. Am I observing the same issue as you with Anaconda? Created attachment 459098 [details]
a testing client
Created attachment 459099 [details] output of the testing client taken as; $ curl -sJO 'https://bugzilla.redhat.com/attachment.cgi?id=459098' $ sh bz650255.c 2>&1 | tee bz650255.log Comment on attachment 459098 [details]
a testing client
Oops, one typo in there, but it still works:
s/exho/echo/
Comment on attachment 459099 [details] output of the testing client The following part of the log is interesting for us: > * Re-using existing connection! (#0) with host localhost > * Connected to localhost (127.0.0.1) port 21 (#0) > * Request has same path as previous transfer > > EPSV > * Connect data stream passively > < 421 Idle timeout (10 seconds): closing control connection > * disabling EPSV usage > > PASV > * Recv failure: Connection reset by peer > * Closing connection #0 > * Failure when receiving data from the peer > + exit 56 Created attachment 459106 [details]
Client output connecting to VSFTPD server
Output when re-connecting before timeout.
Could you please try to adjust the timeout, such that it reflects your FTP server's timeout? diff --git a/bz650255.c b/bz650255.c index ca11705..89de371 100644 --- a/bz650255.c +++ b/bz650255.c @@ -1,12 +1,12 @@ #if 0 gcc -pedantic -Wall -Wextra -O0 -g bz650255.c -lcurl || exit $? -./a.out 8 || exit $? -./a.out 12 || exit $? +./a.out 58 || exit $? +./a.out 62 || exit $? echo echo === still OK === -exho +echo set -x -./a.out 10 +./a.out 60 exit $? #endif Created attachment 459170 [details]
proposed fix
The attached fix at least helps to return a more appropriate error code in case of the timeout on server's side. I am not sure if libcurl itself should try to reconnect in that case... Daniel, what do you think about the issue? Is the behavior somehow specified in libcurl's API? I am having a hard time reproducing this with your client code. Perhaps the anaconda code is taking a different code path? Can't fault the above fix, though! Well, let's fix the issue with proftpd and my reproducer for now. Then we can retest with Anaconda. Based on your description, I suspect that Anaconda holds a libcurl handle across all the user interaction during the package selection. To be honest, I don't know if we are supposed to handle such cases transparently within libcurl and silently reconnect on server's idle timeout ... but the response in case of my reproducer looks wrong to me. (In reply to comment #9) > The attached fix at least helps to return a more appropriate error code in case > of the timeout on server's side. I am not sure if libcurl itself should try to > reconnect in that case... > > Daniel, what do you think about the issue? > > Is the behavior somehow specified in libcurl's API? I think that's a fair approach. 421 is documented in RFC959 to be exactly "we're closing down the control connection" and it can come at any time so it makes sense to put the logic in the common response reader function. I don't think libcurl needs to/should reconnect by itself. Please go ahead and commit this upstream as well! Daniel, thanks for looking at the issue. I've just pushed the patch upstream: https://github.com/bagder/curl/commit/12b2412 Anaconda developers have IMO two choices: 1) avoid holding the libcurl handle during packages selection 2) pick up the error code and reconnect on their own responsibility eventually fixed in curl-7.21.2-4.fc15 Such a change would need to be made somewhere farther down in the pile, like either in yum or python-urlgrabber. anaconda does not directly work with libcurl in the package downloading portion, aside from resetting the name server cache stuff. What really frustrated me was the offer to "Retry" the original FTP retrieval with no success. libcurl wasn't actually sending a new, clean request. See https://bugzilla.redhat.com/show_bug.cgi?id=649347#c9 Perhaps this latest patch would fix this behaviour, i.e. timeout cause initial retrieval failure, but subsequent "Retry" in anaconda would then succeed. This would be an acceptable behaviour. Is there any chance of a new updates image for libcurl, that I could test against my F14 install? If that kind of behavior is _not_ fixed with the patch mentioned in here, then there's a separate problem that needs to be dealt with as well... Created attachment 459386 [details] ext2 floppy image with latest libcurl (x86_64) (In reply to comment #16) > Is there any chance of a new updates image for libcurl, that I could test > against my F14 install? Sure thing, testing is appreciated! Created attachment 459453 [details]
FTP timeout during anaconda with curl patch
Still getting exactly the same observed behaviour as originally.
The network trace shows the server sending 421 followed by a TCP connection closure (FIN) which is duly ACK'd by the client TCP stack.
When the curl lib is invoked later, it sends a QUIT to the server *on a port which had previously ACK'd closure* (50816 src port in the full trace, but not shown in the FTP Request: QUIT packet).
The server tries to send a response to QUIT, but gets a connection reset.
I'm no TCP expert; but perhaps the application layer (libcurl) should just close the socket upon a 421 and not try to send new data.
I should add that the "Retry" fails (hence multiple connections with no request in https://bugzilla.redhat.com/attachment.cgi?id=459453 ) I've just added test case 1120 to the curl test suite upstreanm but it doesn't show that behavior... Created attachment 460042 [details]
follow-up fix, solution proposed by Daniel Stenberg
Thank you for the testing. I've attached yet another patch on top of that, which may fix it. Unfortunately I wasn't successful in reproducing the bug myself with proftpd. A new updates image for Anaconda will be available shortly.
Created attachment 460052 [details] ext2 floppy image with latest libcurl (x86_64), including attachment #460042 [details] Created attachment 460092 [details]
Network trace following FTP timeout
Still no luck. Different behaviour (no QUIT sent), but still no successful retry. Is this an anaconda problem, i.e. what gets called when the GUI Retry button is pressed?
(In reply to comment #24) > Still no luck. Different behaviour (no QUIT sent), but still no successful > retry. Is this an anaconda problem, i.e. what gets called when the GUI Retry > button is pressed? Chris, could you please answer ^^^? There are several places where there's a Retry button for fetching metadata, and in all of them we just jump right back into yum. That is, there's nothing going on in anaconda here. Let's look at them all: def doGroupSetup(self, anaconda): while True: try: self.ayum.doGroupSetup() except (GroupsError, NoSuchGroup, RepoError) as e: buttons = [_("_Exit installer"), _("_Retry")] log.error("Unable to read group information: %s" % e) else: break # success rc = anaconda.intf.messageWindow(_("Error"), _("Unable to read group information " "from repositories. This is " "a problem with the generation " "of your install tree."), type="custom", custom_icon="error", custom_buttons = buttons) if rc == 0: sys.exit(0) else: self.ayum._setGroups(None) continue There, self.ayum.doGroupSetup is just the superclass method in yum. while True: try: fn(repo) if callback: callback.disconnect() except RepoError, e: if callback: callback.disconnect() buttons = [_("_Exit installer"), _("Edit"), _("_Retry")] else: break # success if anaconda.ksdata: buttons.append(_("_Continue")) if not fatalerrors: raise RepoError, e rc = anaconda.intf.messageWindow(_("Error"), _("Unable to read package metadata. This may be " "due to a missing repodata directory. Please " "ensure that your install tree has been " "correctly generated.\n\n%s" % e), type="custom", custom_icon="error", custom_buttons=buttons) if rc == 0: # abort sys.exit(0) elif rc == 1: # edit anaconda.intf.editRepoWindow(repo) break elif rc == 2: # retry, but only if button is present continue else: # continue, but only if button is present self.ayum.repos.delete(repo.id) break Here, fn is one of self.ayum.doRepoSetup or self.ayum.doSackSetup. So on Retry, we just jump back and call whichever of those functions we're processing again. And again, those are just the superclass's methods in yum. while True: try: (code, msgs) = self.ayum.buildTransaction() ... except RepoError, e: # FIXME: would be nice to be able to recover here rc = anaconda.intf.messageWindow(_("Error"), _("Unable to read package metadata. This may be " "due to a missing repodata directory. Please " "ensure that your install tree has been " "correctly generated.\n\n%s" % e), type="custom", custom_icon="error", custom_buttons=[_("_Exit installer"), _("_Retry")]) dscb.pop() if rc == 0: sys.exit(0) else: continue And this last one does the same thing - we just jump back and retry calling the superclass's method again. anaconda doesn't actually do anything directly for downloading. It's all in yum or urlgrabber. Created attachment 460111 [details]
minimal example
Simon, could you please try this minimal example against your server and attach its output? You need to either set the timeout <16s, or increase the delay within the test.
okay it's not anaconda. The trick is that anaconda is using the header-retrieving code in yum and nothing else does that anymore. so it is definitely not code that's in anaconda -it's just code that only anaconda ends up using. Seth, how are you confirming that the observed behavior (wrongly handled session idle timeout) is related to the "header-retrieving code"? Created attachment 460518 [details]
Simple client 460052 output
F14 libcurl behaviour following timeout (libcurl-7.21.0-5.fc14.x86_64)
F14 libcurl does not contain the fix. Please try libcurl from the updates image: # curl -JO 'https://bugzilla.redhat.com/attachment.cgi?id=460052' # mkdir updates # mount -o loop bz650255.img updates/ # curl -o bz650255.c 'https://bugzilla.redhat.com/attachment.cgi?id=460111' # LD_PRELOAD=$PWD/updates/libcurl.so sh bz650255.c # umount updates/ Created attachment 460542 [details]
Simple client 460052 client output with patched libcurl
No discernible change. Seems to work fine here.
Comment on attachment 460542 [details] Simple client 460052 client output with patched libcurl >* Connection #0 seems to be dead! >* Closing connection #0 >* About to connect() to ftp port 21 (#0) >* Trying 192.168.10.121... * connected >* Connected to ftp (192.168.10.121) port 21 (#0) >< 220- The difference is at this point ^^^. In contrast to F-14 libcurl, there is no QUIT sent. Unfortunately it means that my test-case was not reflecting the Anaconda problem. a more generic version of attachment #460042 [details] proposed upstream: http://thread.gmane.org/gmane.comp.web.curl.library/29855 curl-7.21.0-6.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/curl-7.21.0-6.fc14 curl-7.20.1-5.fc13 has been submitted as an update for Fedora 13. https://admin.fedoraproject.org/updates/curl-7.20.1-5.fc13 curl-7.21.0-6.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update curl'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/curl-7.21.0-6.fc14 curl-7.21.0-6.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report. curl-7.20.1-5.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report. |