Bug 1678588

Summary: Net install should retry package download if it fails, and not just give up
Product: Red Hat Enterprise Linux 8
Reporter: Jens Petersen <petersen>
Component: dnf
Assignee: Jaroslav Mracek <jmracek>
Status: CLOSED ERRATA
QA Contact: Radek Bíba <rbiba>
Severity: high
Docs Contact:
Priority: medium
Version: 8.0
CC: amatej, james.antill, jblazek, jkonecny, ksrot, mkolman, samuel-rhbugs
Target Milestone: rc
Keywords: Triaged
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: librepo-1.10.6-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-28 16:47:40 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1681084
Bug Blocks:

Description Jens Petersen 2019-02-19 07:13:00 UTC
Description of problem:
It sometimes happens during a netinstall that an rpm fails to download
(due to some temporary network glitch or HTTP timeout).
This appears to cause Anaconda to stop the installation completely,
without even the option to retry downloading the package,
which is rather painful and frustrating.  I am pretty sure
anaconda used to offer a retry in the past, but I am not sure
when it stopped doing so: I don't remember whether RHEL7 anaconda still does.
(If it does, I consider this a regression. In any case I feel it should
be addressed: giving up on an installation like this in 2019 is
not really acceptable.)

Version-Release number of selected component (if applicable):
anaconda-29.19.0.34-1.el8

How reproducible:
easily

Steps to Reproduce:
1. Do a netinstall of RHEL8
2. Sometimes an rpm download error occurs
3. If no error, goto 1

Actual results:
2. Anaconda gives up and pops up a dialog saying a fatal error occurred
with for example:

"Failed to download the following packages: Cannot download Packages/libepoxy-1.5.2-1.el8.x86_64.rpm: All mirrors were tried"

Expected results:
2. Anaconda should retry downloading the package at least a couple of times
or show a dialog allowing the user to trigger a download retry.

Additional info:
I think this affects all RHEL8 images (and recent Fedora releases too probably).
I have already hit this half a dozen times during RHEL 8 development,
it is annoying and a time waster.

Comment 1 Jiri Konecny 2019-02-19 09:32:39 UTC
The download should be retried a few times, but download and installation are handled by DNF :).

Switching to DNF. Feel free to switch it back if we should just adjust some configuration.

Comment 2 Martin Kolman 2019-02-19 12:03:37 UTC
(In reply to Jiri Konecny from comment #1)
> The download should be retried a few times, but download and
> installation are handled by DNF :).
> 
> Switching to DNF. Feel free to switch it back if we should just adjust some
> configuration.

Yeah, I also think DNF does some retries, though unlike the Anaconda+YUM based solution on RHEL7 it might not be visible in the UI. Also the retry logic might be different & less crazy than in RHEL7 (IIRC theoretically up to 10 retries with exponential back-off, waiting up to 30 minutes per package).

Comment 3 Martin Kolman 2019-02-19 17:15:15 UTC
I looked into the code in more detail and this is how the retry logic works & is used in RHEL7 Anaconda:

- a progressive delay function is used to compute the delay before trying again:
-> implementation: https://github.com/rhinstaller/anaconda/blob/rhel7-branch/pyanaconda/iutil.py#L1125
-> for 10 retries the delay starts at 0.5 s and progressively goes up to 256 seconds

- the retry logic is used for individual package downloads:
https://github.com/rhinstaller/anaconda/blob/rhel7-branch/scripts/anaconda-yum#L296

- when populating the transaction set:
https://github.com/rhinstaller/anaconda/blob/rhel7-branch/scripts/anaconda-yum#L131

- when gathering repository metadata:
https://github.com/rhinstaller/anaconda/blob/rhel7-branch/pyanaconda/packaging/yumpayload.py#L704

- and even when downloading the .treeinfo:
https://github.com/rhinstaller/anaconda/blob/rhel7-branch/pyanaconda/packaging/__init__.py#L496


Also, the .treeinfo download is apparently the only place in the Anaconda source code itself where the retry logic is currently used in RHEL8 Anaconda:
https://github.com/rhinstaller/anaconda/blob/rhel-devel/pyanaconda/payload/install_tree_metadata.py#L84
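
For illustration, the progressive-delay retry described above could be sketched roughly as follows. This is not the actual Anaconda code: the names progressive_delays and retry_with_backoff are hypothetical, and the only figures taken from this report are the 10 retries and the delay doubling from 0.5 s up to 256 s.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def progressive_delays(retries: int = 10, first: float = 0.5) -> Iterator[float]:
    """Yield one delay per retry, doubling each time: 0.5, 1, 2, ... seconds."""
    delay = first
    for _ in range(retries):
        yield delay
        delay *= 2


def retry_with_backoff(
    action: Callable[[], T],
    retries: int = 10,
    first: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action(); on OSError wait a progressively longer time and try again.

    Hypothetical helper: the first attempt is followed by up to `retries`
    further attempts. When the delays are exhausted, the last error is re-raised.
    """
    delays = progressive_delays(retries, first)
    while True:
        try:
            return action()
        except OSError:
            delay = next(delays, None)
            if delay is None:
                raise  # no retries left, give up with the last error
            sleep(delay)
```

The sleep parameter exists only so that the waiting can be replaced in tests; a caller would normally pass just the action, for example a function that downloads a single package.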

Comment 4 Jens Petersen 2019-02-20 08:06:03 UTC
Okay, it seems that a single package failing to download due to a temporary network glitch will prevent anaconda from completing the install, and the user has to start all over again.

If the failure dialog had a retry button this could be averted.  With net installs over the internet, the probability of network issues occurring is not that small.
Maybe our customer CDN is more reliable, I don't know...

Comment 9 Jaroslav Mracek 2019-07-07 18:24:31 UTC
I created a pull request (https://github.com/rpm-software-management/librepo/pull/158) that allows targets to be downloaded multiple times even when only one option is available. By default it will try to download each target at least 4 times.

Comment 18 Jaroslav Mracek 2020-03-04 17:37:25 UTC
librepo cannot add any sleep step because it would negatively affect all users (already reported on Fedora). When the network is down, that is not an issue in DNF. DNF at least gives it a try.
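
A rough sketch of the behaviour described in comments 9 and 18, where each target is attempted at least 4 times with no sleep in between, might look like this. It is an illustration only, not librepo's implementation: download_with_retries, the fetch callback, and the round-robin order over mirrors are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def download_with_retries(
    path: str,
    mirrors: Sequence[str],
    fetch: Callable[[str, str], bytes],
    min_attempts: int = 4,
) -> bytes:
    """Try to fetch `path`, cycling over `mirrors` without sleeping.

    Hypothetical helper: makes at least `min_attempts` attempts in total,
    even when only a single mirror is available, and every mirror is tried
    at least once.
    """
    if not mirrors:
        raise ValueError("at least one mirror is required")

    attempts = max(min_attempts, len(mirrors))
    errors: List[str] = []
    for i in range(attempts):
        mirror = mirrors[i % len(mirrors)]  # round-robin over the mirrors
        try:
            return fetch(mirror, path)
        except OSError as err:
            # Record the failure and move on immediately, with no delay.
            errors.append(f"{mirror}: {err}")

    raise OSError(
        f"Cannot download {path}: All mirrors were tried ({'; '.join(errors)})"
    )
```

Because there is no delay between attempts, a sketch like this only helps with short glitches; if the network stays down, all attempts fail quickly.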

Comment 19 Radek Bíba 2020-03-05 10:59:21 UTC
Fair enough. It should be noted though that there will likely be cases where even repeated attempts won't make netinstall happy as they'll all fail.

Comment 22 errata-xmlrpc 2020-04-28 16:47:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1823