From: https://copr-be.cloud.fedoraproject.org/results/thm/lxc3.0/fedora-30-aarch64/01006608-lua-lxc/chroot_scan/var/lib/mock/1006608-fedora-30-aarch64-1565957042.659271/root/var/log/dnf.log

2019-08-16T12:01:31Z DEBUG fedora: using metadata from Wed 24 Oct 2018 10:20:15 PM UTC.
2019-08-16T12:01:32Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64 (https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64).
2019-08-16T12:01:32Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64 (https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64).
2019-08-16T12:01:33Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64 (https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64).
2019-08-16T12:01:33Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64 (https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64).
2019-08-16T12:01:33Z DEBUG Cannot download 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64': Cannot prepare internal mirrorlist: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=updates-released-f29&arch=x86_64.
2019-08-16T12:01:33Z ERROR Failed to download metadata for repo 'updates'

It seems like we really do retry now, but we retry too fast. Would it be possible to insert a sleep() or something similar there? And can we ensure that we do name resolution again? With the default configuration, separate requests to mirrors.fedoraproject.org should end up on different IP addresses.
I'm thinking about a follow-up for: https://github.com/rpm-software-management/librepo/pull/158
Pavel, what exactly do you expect the sleep to help with? I'm not convinced it will be good for anything, however, it surely is going to slow down the whole metadata download process. We should certainly look into the DNS resolution to query different servers though.
(In reply to Lukáš Hrázký from comment #2)
> Pavel, what exactly do you expect the sleep to help with? I'm not convinced
> it will be good for anything, however, it surely is going to slow down the
> whole metadata download process.

I'm not sure it would help, but I expected that error 503 is temporary in nature, so that if we tried a bit later, there would be a much higher chance that the same mirror would start working. Dunno. Still, for those URLs/metalinks which _are not_ backed by a DNS pool of alternative addresses, I'd expect the sleep would help a lot.

> We should certainly look into the DNS resolution to query different servers
> though.

This one should solve our (copr) issues by itself, I guess (copr uses official mirrorlists, which have alternative DNS addresses). Thank you!
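To make the retry-with-delay idea above concrete, here is a minimal Python sketch. It is not librepo code: `fetch_with_retry`, the `fetch` callable, and the attempt and delay values are all hypothetical, chosen only to illustrate backing off when a server answers 503.

```python
import time


def fetch_with_retry(fetch, url, attempts=4, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-503 status or attempts run out.

    fetch is a hypothetical callable returning (status_code, body).
    The delay doubles after every 503 response (an assumed policy).
    """
    status, body = None, None
    for attempt in range(attempts):
        status, body = fetch(url)
        if status != 503:
            break
        if attempt < attempts - 1:
            sleep(delay)  # back off before retrying the same URL
            delay *= 2
    return status, body
```

With the four retries in the log above all happening within about one second, even a short growing delay like this would spread the attempts over several seconds, at the cost of a slower failure when the server stays down.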
A possibly related issue once fixed in yum: https://github.com/rpm-software-management/yum/commit/a7d50db151a2bfef09b3004c7afae5e1eed651e3
There's also a (very long) discussion in the related yum bug that could shed some light on how MirrorManager works: https://bugzilla.redhat.com/show_bug.cgi?id=1520454
Thanks, Michal, but the bug as well as the PR is about HTTP redirects, not DNS balancing. I think both mechanisms need to be properly supported.
You're right, Lukas. I think I considered leveraging the DNS A/AAAA list mechanism as well when dealing with the bug, but I ended up just adding support for MirrorManager's internal round-robin mechanism that works on top of HTTP redirects, as you say. Never mind then (there still might be some useful insights in the bug comments though).
More specifically, I seem to have verified (Comment 12) that curl does support DNS balancing (A/AAAA records): <snip> 4) curl: if the resolved IP fails, try another one from the A/AAAA list returned for the hostname </snip>
So it means that urlgrabber (which is not used by dnf) behaves correctly, right? Can I work around this somehow? Today we again had a series of build failures because of this. Or can we help somehow to get this fixed soon?
urlgrabber itself does employ a simple mechanism to rotate the available mirror URLs (in a randomized fashion), but those have to be passed to it via the API from yum (such as parsed from a metalink.xml or mirrorlist.txt file). What I was referring to was a DNS round-robin mechanism that happens at the curl level.
There are actually 3 layers of mirror handling in a typical yum->urlgrabber->curl scenario; at the highest level, there's yum fetching and parsing a metalink.xml or mirrorlist.txt file from the repository server. After obtaining a list of mirrors, it passes them down to urlgrabber which tries them one by one until it succeeds. And finally, curl looks at the list of addresses returned by the DNS server for a particular URL and does something similar (but I'm not familiar with this part that much).
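A rough Python sketch of the lower two layers described above, for illustration only. It is not yum, urlgrabber, or curl code: `download_from_mirrors`, `resolve`, and `fetch_from` are hypothetical names, and the sketch assumes failures show up as connection-level errors (it does not cover a server that connects fine and then answers 503).

```python
def download_from_mirrors(mirrors, resolve, fetch_from):
    """Try each mirror in turn; for each mirror, try every resolved address.

    mirrors: list of (host, path) pairs, e.g. as parsed from a metalink
    resolve(host): hypothetical callable returning a list of IP strings
    fetch_from(ip, host, path): hypothetical callable returning bytes,
        raising OSError on failure
    """
    errors = []
    for host, path in mirrors:          # mirror-list layer
        for ip in resolve(host):        # per-hostname address layer
            try:
                return fetch_from(ip, host, path)
            except OSError as exc:
                errors.append((host, ip, str(exc)))
    raise OSError(f"all mirrors failed: {errors}")
```

The top layer (fetching and parsing metalink.xml or mirrorlist.txt) is what would produce the `mirrors` list passed in here.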
(In reply to Michal Domonkos from comment #11)
> for a particular URL and does something similar

s/URL/IP/
(In reply to Michal Domonkos from comment #11)
> for a particular URL and does something similar

s/URL/hostname/
(In reply to Michal Domonkos from comment #11)
> And finally, curl
> looks at the list of addresses returned by the DNS server for a particular
> URL and does something similar (but I'm not familiar with this part that
> much).

I doubt this is what is happening at the curl level in the case of librepo (and dnf), because that would mean that all the hosts in `$ host mirrors.fedoraproject.org` are dead sometimes. At least not for error 503 (an immediate failure reported by the server). So speaking of 503 errors, can we turn on some round-robin mechanism?
I created a patch (https://github.com/rpm-software-management/librepo/pull/167) that adds a sleep step after all mirrors were tried. It mostly applies when only one URL is available (metalink, baseurl).
I still think that the issue was not solved properly.
I created an improvement of the first patch: https://github.com/rpm-software-management/librepo/pull/169. I am still working on improving the logging.
It looks like even a delay will not resolve the issue. I suspect that the issue is caused by a dead IP retrieved from DNS. For testing, I used an incorrect metalink URL (https://mirrors.fedoraproject.org/metalinks?repo=updates-released-f30&arch=x86_64, with "metalink" replaced by "metalinks"). During a single run I am nearly unable to force curl to use an IP other than the first one in the list. The problem is with the CURL multi handle (https://curl.haxx.se/libcurl/c/libcurl-multi.html), where setting CURLOPT_DNS_CACHE_TIMEOUT or CURLOPT_DNS_SHUFFLE_ADDRESSES via curl_easy_setopt(CURL *handle, CURLoption option, parameter) has no effect on DNS. What worked was the patch https://github.com/rpm-software-management/librepo/pull/159/commits/ac80f6c26ebbf358f68eb62e31306c22597dbbdc.
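For illustration, the behaviour being asked of the DNS layer (resolve again on each attempt and do not always start with the first address) can be sketched in Python at the application level. This is neither libcurl nor librepo code, and it is not what the linked patch does: `shuffled_addresses` and its `resolver` and `rng` parameters are hypothetical, with the standard library's `socket.getaddrinfo` as the default resolver.

```python
import random
import socket


def shuffled_addresses(host, port=443, resolver=socket.getaddrinfo, rng=random):
    """Resolve host afresh and return its unique addresses in random order."""
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    addrs = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        ip = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if ip not in addrs:
            addrs.append(ip)
    rng.shuffle(addrs)  # avoid always trying the first returned address
    return addrs
```

Calling this once per retry, rather than once per run, means a dead address at the head of the list is not tried first every time; whether the local resolver cache returns fresh records on each call is outside the sketch's control.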
*** Bug 1758383 has been marked as a duplicate of this bug. ***
The required patches were backported to f30.
I'm not reopening because that's not our priority now (in copr we retry at a higher level anyway, which was initially a work-around for other issues). Just FYI, note that we probably rejected the idea of retrying the same URL (with a delay), but per discussion with OpenSUSE users that's exactly how zypper and OpenSUSE mirroring works [1] -- they retry the same URLs through a redirector, and when the redirector recognizes that some mirror is temporarily down, it automatically redirects the next request for the same URL to a different mirror. But the client would have to retry (librepo doesn't seem to, from my attempts on F31).

[1] https://github.com/rpm-software-management/mock/issues/553