Bug 1028444 - dnf update gets stuck in a cycle trying to download files
Status: CLOSED ERRATA
Product: Fedora
Classification: Fedora
Component: librepo
Version: 20
Hardware/OS: Unspecified / Unspecified
Priority: unspecified
Severity: unspecified
Assigned To: Tomas Mlcoch
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-11-08 08:06 EST by Alberto Ruiz
Modified: 2013-11-24 18:45 EST
Fixed In Version: librepo-1.4.0-1.fc20
Doc Type: Bug Fix
Last Closed: 2013-11-24 18:45:27 EST
Type: Bug

Attachments
A video showing the problem. (207.71 KB, video/webm), 2013-11-08 08:06 EST, Alberto Ruiz
screen shot (646.12 KB, image/png), 2013-11-20 03:36 EST, lnie
screenshot (616.01 KB, image/png), 2013-11-20 04:00 EST, lnie

Description Alberto Ruiz 2013-11-08 08:06:21 EST
Created attachment 821606
A video showing the problem.

Description of problem:
Yesterday I left the system performing a dnf update. When I came back this morning it had timed out because some packages were missing from the mirrors.

So I ran another update and dnf got stuck: it keeps cycling through several package names. I'm attaching a video of the behaviour for better reference.

Version-Release number of selected component (if applicable):
[996][aruiz@kerrigan ~]$ dnf --version
0.4.5
  Installed: dnf-0:0.4.5-1.fc20.noarch at 2013-10-21 13:25
  Built    : Fedora Project at 2013-10-20 12:10

  Installed: rpm-0:4.11.1-7.fc20.x86_64 at 2013-09-26 11:16
  Built    : Fedora Project at 2013-09-09 12:13


How reproducible:
If I close dnf and try again, the problem recurs. However, I am not quite sure how to get the system into this state from scratch.

Actual results:
dnf gets stuck in a cycle

Expected results:
The package shouldn't be missing from the mirrors in the first place. However, if it's gone, dnf should time out instead of falling into an infinite loop.

Additional info:
Comment 1 Zdeněk Pavlas 2013-11-08 08:25:04 EST
This does not look like a bug. The cycling is intentional: it's the multi-file progress meter in action. DNF is downloading 3 packages in parallel.  But it seems to be stuck. How long did that last?  I think that librepo aborts stalled connections after some time, so it should not be a permanent condition.  The "cycling" is driven by a librepo callback, which proves that the curl "perform" loop is still running.

The only thing I don't understand is the large average download speed of 6.5MB/s and the ETA of 0:00.  It's probably due to 350 packages being skipped?

DNF should abort downloading after a few minutes.  If it does not, please try to add more info, e.g. attach the output of "strace -p $(pidof dnf)".
Comment 2 Alberto Ruiz 2013-11-08 09:16:02 EST
The cycling is not the bug; the bug is the fact that it's stuck there forever :-)
Comment 3 Alberto Ruiz 2013-11-08 09:18:44 EST
FWIW, minutes is probably too much here. If after 20-30 seconds (tops) you get no data whatsoever, dnf should abort and inform the user about what is going on and what he/she should do to resume the operation.
Comment 4 Alberto Ruiz 2013-11-08 09:19:40 EST
I just tried again, and now the packages are available, so I am afraid that I can't reproduce it anymore :/
Comment 5 Zdeněk Pavlas 2013-11-08 09:49:13 EST
It turns out librepo does not set CURLOPT_TIMEOUT, nor does it export any API to do so from DNF.  And the default is to never time out :( librepo should set it to something sane.

commit ef77d2660e9e96ba5329f61bee1a38ad35ea7075
Author: Zdenek Pavlas <zpavlas@redhat.com>
Date:   Fri Nov 8 15:42:33 2013 +0100

    Set both CURLOPT_CONNECTTIMEOUT & CURLOPT_TIMEOUT. BZ 1028444
    
    The default CURLOPT_TIMEOUT value is to never time out, and librepo
    provides no means to override it. We should set CONNECTTIMEOUT & TIMEOUT
    at the same time, to have both API and a sensible default.

diff --git a/librepo/handle.c b/librepo/handle.c
index 69fd73d..77440fe 100644
--- a/librepo/handle.c
+++ b/librepo/handle.c
@@ -323,9 +323,13 @@ lr_handle_setopt(LrHandle *handle,
         }
         break;
 
-    case LRO_CONNECTTIMEOUT:
-        c_rc = curl_easy_setopt(c_h, CURLOPT_CONNECTTIMEOUT, va_arg(arg, long));
+    case LRO_CONNECTTIMEOUT: {
+        long timeout = va_arg(arg, long);
+        c_rc = curl_easy_setopt(c_h, CURLOPT_CONNECTTIMEOUT, timeout);
+        if (c_rc == CURLE_OK)
+            c_rc = curl_easy_setopt(c_h, CURLOPT_TIMEOUT, timeout);
         break;
+    }
 
     case LRO_IGNOREMISSING:
         handle->ignoremissing = va_arg(arg, long) ? 1 : 0;
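
For readers following the patch, here is a rough, self-contained model of what a whole-transfer cap such as CURLOPT_TIMEOUT does. It is not librepo or libcurl code. In libcurl, CURLOPT_TIMEOUT limits the total time of the transfer and 0 means no limit. The function names and the figures in the note below are made up for illustration, and the model assumes a steady transfer rate.

```c
#include <stdbool.h>

/* Seconds needed to move size_bytes at a steady rate_bytes_per_s,
 * rounded up. Returns -1 when the rate is zero or negative, i.e. the
 * transfer is stalled and would never finish. */
long transfer_seconds(long size_bytes, long rate_bytes_per_s)
{
    if (rate_bytes_per_s <= 0)
        return -1;
    return (size_bytes + rate_bytes_per_s - 1) / rate_bytes_per_s;
}

/* Models a cap on the whole transfer. timeout_s == 0 means "never
 * time out". Returns true when the cap would cut the transfer off. */
bool hits_total_timeout(long size_bytes, long rate_bytes_per_s, long timeout_s)
{
    long needed = transfer_seconds(size_bytes, rate_bytes_per_s);

    if (timeout_s == 0)
        return false;   /* no cap: a stalled transfer is never cut off */
    if (needed < 0)
        return true;    /* stalled transfer ends when the cap expires */
    return needed > timeout_s;
}
```

With no cap (timeout 0) a stalled transfer is never cut off, which matches the hang reported here. With a hypothetical cap of 120 seconds, hits_total_timeout(50000000, 100000, 120) is also true: a 50 MB file at 100 kB/s needs 500 s, so a whole-transfer cap ends transfers that are still making progress as well as stalled ones.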
Comment 6 Tomas Mlcoch 2013-11-11 07:15:28 EST
Two new options have been added:

LRO_LOWSPEEDTIME
  The time in seconds that the transfer should be below the LRO_LOWSPEEDLIMIT for the library to consider it too slow and abort. Default: 10 (sec)

LRO_LOWSPEEDLIMIT
  The transfer speed in bytes per second that the transfer should be below during LRO_LOWSPEEDTIME seconds for the library to consider it too slow and abort. Default: 1000 (byte/s)

https://github.com/Tojaj/librepo/commit/78c9b645b0fbef1d8eb30e2b54271cd0d07f4b05

These new options and their default values should fix the issue.
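
A minimal sketch of the rule the two options describe, assuming the transfer is sampled once per second. This is a simplified model and not librepo's or curl's implementation; the function name `low_speed_abort_at` and the sample data are hypothetical.

```c
#include <stddef.h>

/* samples[i] is the number of bytes received during second i.
 * Returns the index of the second at which the transfer would be
 * aborted, i.e. the first point where the speed has stayed below
 * limit_bytes_per_s for low_speed_time_s consecutive seconds.
 * Returns -1 if the transfer is never aborted, or if the check is
 * disabled (low_speed_time_s <= 0). */
long low_speed_abort_at(const long *samples, size_t n,
                        long limit_bytes_per_s, long low_speed_time_s)
{
    long slow_run = 0;

    if (low_speed_time_s <= 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < limit_bytes_per_s)
            slow_run++;      /* still too slow, extend the run */
        else
            slow_run = 0;    /* speed recovered, start over */

        if (slow_run >= low_speed_time_s)
            return (long)i;
    }
    return -1;
}
```

With the defaults quoted above (10 seconds, 1000 bytes/s), a connection that delivers nothing at all would be given up on at the tenth silent second in this model, while a connection that dips and recovers within the window keeps going.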
Comment 7 Fedora Update System 2013-11-19 03:52:00 EST
librepo-1.4.0-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/librepo-1.4.0-1.fc20
Comment 8 Fedora Update System 2013-11-19 16:50:22 EST
Package librepo-1.4.0-1.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing librepo-1.4.0-1.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-21715/librepo-1.4.0-1.fc20
then log in and leave karma (feedback).
Comment 9 lnie 2013-11-20 03:35:06 EST
I tried to reproduce this bug with librepo-1.4.0-1. As it's hard for me to find mirrors missing some packages, I just turned off the network and ran a
dnf install, and something weird happened. As you can see from the attachment, the progress just hangs there; I have to press Ctrl+C to kill it, or it will keep hanging.
Comment 10 lnie 2013-11-20 03:36:14 EST
Created attachment 826467
screen shot
Comment 11 Zdeněk Pavlas 2013-11-20 03:49:31 EST
Hi, thanks for the feedback.  First of all, it seems you ran "yum", not "dnf"!  
Also, the un-quoted '&' in URL interacts with bash job control, the process is started in the background, may hold the yum lock, and prevent other Yum instances from running.  You have to quote such URLs with single or double quotes.
Comment 12 Tomas Mlcoch 2013-11-20 03:51:05 EST
Hi Inie,
from your screenshot, it seems that you have been testing Yum, not DNF.

Yum uses urlgrabber, not librepo.
Moreover, you used a metalink address where a package name should be, so the "No package available" message is expected behavior.
Comment 13 lnie 2013-11-20 04:00:01 EST
Created attachment 826470
screenshot
Comment 14 lnie 2013-11-20 04:04:31 EST
(In reply to Zdeněk Pavlas from comment #11)
> Hi, thanks for the feedback.  First of all, it seems you ran "yum", not
> "dnf"!  
> Also, the un-quoted '&' in URL interacts with bash job control, the process
> is started in the background, may hold the yum lock, and prevent other Yum
> instances from running.  You have to quote such URLs with single or double
> quotes.

Hi,
 This screenshot is the right one now.
Comment 15 Zdeněk Pavlas 2013-11-20 04:06:19 EST
Ok, this is the correct one.. again, there are some issues:

1) dnf does not support "dnf install http://..." syntax yet.  It's implemented, but not yet released- see bz 1030297.

2) The same issue with "&" in URL.. this has to be quoted.
Comment 16 lnie 2013-11-20 04:09:10 EST
(In reply to Tomas Mlcoch from comment #12)
> Hi Inie,
> from your screenshot, it seems that you have been testing Yum, not DNF.
> 
> Yum uses urlgrabber, not librepo.

Hi Tomas,
 Actually, just now I was considering whether I should add a comment like
 FYI: yum install got the same result.

> Moreover, you used a metalink address where a package name should be,
> so the "No package available" message is expected behavior.

The problem is that the progress keeps hanging unless I press Ctrl+C.
Comment 17 lnie 2013-11-20 04:19:15 EST
(In reply to Zdeněk Pavlas from comment #15)
> Ok, this is the correct one.. again, there are some issues:
> 
> 1) dnf does not support "dnf install http://..." syntax yet.  It's
> implemented, but not yet released- see bz 1030297.
> 
> 2) The same issue with "&" in URL.. this has to be quoted.
Hi, thanks for your reply. It seems that I have to find mirrors missing some packages if I want to do this update test, yes?
Comment 18 lnie 2013-11-20 04:22:13 EST
> The problem is that the progress keeps hanging unless I press Ctrl+C.
Hi Tomas,

I've figured out the issue now: it was caused by the "&".
Sorry for the erroneous feedback.
Comment 19 Zdeněk Pavlas 2013-11-20 04:43:02 EST
No, this bug involved dnf not timing out when the server/network was extremely slow.  It should try the next mirror or abort with an error instead. The new librepo tries a new mirror when the speed is below 1000 bytes/s for at least 30s. I'm not sure you want to reproduce this.
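
The "try the next mirror or abort with an error" behaviour can be sketched roughly as below. This is a toy simulation, not librepo code: the `struct mirror` type, the `pick_mirror` function, and the idea of a fixed per-mirror rate are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical description of one mirror: a name and the steady rate
 * (bytes/s) it delivers in this simulation. */
struct mirror {
    const char *name;
    long rate_bytes_per_s;
};

/* Walks the mirror list in order and returns the index of the first
 * mirror whose rate is at or above limit_bytes_per_s. A mirror below
 * the limit is treated as "too slow" and skipped, as if the low-speed
 * check had fired. Returns -1 when every mirror is too slow, which is
 * the case where the caller should abort with an error instead of
 * waiting forever. */
int pick_mirror(const struct mirror *mirrors, size_t n, long limit_bytes_per_s)
{
    for (size_t i = 0; i < n; i++) {
        if (mirrors[i].rate_bytes_per_s >= limit_bytes_per_s)
            return (int)i;
    }
    return -1;
}
```

In this model a dead first mirror only costs one low-speed window before the next mirror is tried, and a list made up entirely of dead mirrors ends in an error rather than a hang.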
Comment 20 lnie 2013-11-20 05:04:07 EST
(In reply to Zdeněk Pavlas from comment #19)
> No, this bug involved dnf not timing out when the server/network was
> extremely slow.  It should try the next mirror or abort with an error instead.
> The new librepo tries a new mirror when the speed is below 1000 bytes/s for at
> least 30s. I'm not sure you want to reproduce this.

Yeah, that's not what I want to reproduce. According to Alberto's description,
I think the problem is: dnf just gets stuck when some packages are unavailable from the mirrors.

I drew this conclusion mostly from comment #0 :-):
> Description of problem:
> Yesterday I left the system performing a dnf update. When I came back this morning it had timed out because some packages were missing from the mirrors.
and comment #4:
> I just tried again, and now the packages are available, so I am afraid that I can't reproduce it anymore :/
Comment 21 Fedora Update System 2013-11-24 18:45:27 EST
librepo-1.4.0-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.
