Description of problem:
A customer is encountering an issue while installing RHEL 8 due to its specific web server used to host the repositories. The root cause seems to be a change of behavior between yum/urlgrabber which does a urllib.quote() or a urllib.pathname2url(), and dnf/libdnf doesn't. I consider that as a regression compared to the yum behavior, that's why I didn't open it as a RFE.
On RHEL 7, from a syscall trace on connect()/sendto()/recvfrom(), we can see the URL is encoded from the client side (the '+' symbol becomes encoded to '%2B'):
20153 07:26:57.830796 sendto(8<TCP:[192.168.122.33:54206->AA.BB.CC.DD:80]>, "GET /content/dist/rhel/server/7/7Server/x86_64/os/Packages/l/libstdc%2B%2B-4.8.5-39.el7.x86_64.rpm HTTP/1.1\r\nUser-Agent: urlgrabber/3.10 yum/3.4.3\r\nHost: reposerver\r\nAccept: */*\r\n\r\n", 205, MSG_NOSIGNAL, NULL, 0) = 205 <0.000106>
On RHEL 8, the URL isn't encoded before sending the HTTP GET request:
8015 07:31:50.890569 sendto(17<TCP:[192.168.122.2:55536->AA.BB.CC.DD:80]>, "GET /content/dist/rhel8/8.1/x86_64/baseos/os/Packages/l/libstdc++-8.3.1-4.5.el8.x86_64.rpm HTTP/1.1\r\nHost: reposerver\r\nUser-Agent: libdnf\r\nAccept: */*\r\n\r\n", 178, MSG_NOSIGNAL, NULL, 0) = 178 <0.000021>
Version-Release number of selected component (if applicable):
Steps to Reproduce:
# dnf download libstdc++
Non encoded URL from the client side.
Encoded URL from the client side.
An urlEncode() function already exists in libdnf/repo/Repo.cpp, but it is used only for user/password form encoding.
I tested that RHEL in the default setup is not broken (it works even without urlencode):
> $ podman run -it ubi8:8.1
> $ rpm -q dnf libdnf
> $ dnf download libstdc++
> (1/2): libstdc++-8.3.1-4.5.el8.x86_64.rpm ...
> (2/2): libstdc++-8.3.1-4.5.el8.i686.rpm ...
I also verified that the upstream version of dnf/libdnf doesn't urlencode the string:
> sendto(41, "GET /pub/linux/fedora/linux/development/rawhide/Everything/x86_64/os/Packages/l/libstdc++-10.0.1-0.11.fc33.x86_64.rpm...)
It looks like we should really urlencode the paths.
Special URL characters are escaped in URLs when dnf downloads packages.
Christophe, I've been prompted to have a look at the URL RFC and the plus sign is actually not a reserved character in the path part of a URL (see RFC 1738, section 3.3 and possibly 2.2 for globally unsafe characters). Therefore, according to my interpretation, in case of a '+' DNF behavior is correct, it doesn't need to be escaped and the customer server implementation is at fault here.
While there potentially may be other characters that need to be escaped, those are also invalid in RPM package names. Therefore I'm not convinced my earlier PRs should be merged...
We've decided to put the encoding in place to make DNF more compatible. The customer should probably be advised to check their server as well...
Well, to be honest I'm not sure how to interpret that RFC.
As per https://www.w3schools.com/tags/ref_urlencode.ASP : "URL encoding normally replaces a space with a plus (+) sign or with %20."
As we can see in the URL while googling for example. So I think the real + sign has to be encoded.
That's also what we can see in the yum strace, "+" signs are replaced in the package names. So I continue to think there is a change of behaviour with dnf.
It's just too complicated; '+' for encoding a space is allowed only in the query part of a URL, which is usually in the application/x-www-form-urlencoded format, which specifies using '+' for encoding spaces. In path you can only encode as %20 as per RFC 1738. (also the fact that majority of the servers actually work with unencoded '+'es in paths speaks for something)
But it is a change of behaviour from yum, indeed. Not encoding '+' in path is strict behaviour, encoding it is (not entirely correctly) making dnf a bit more compatible with non-conformant servers.
Hmm, yum/urlgrabber used the following urllib function:
# python -c 'import urllib; print(urllib.pathname2url("test++"))'
I don't think this function has been designed with non-conformant servers in mind.
If I search the string "python + urllib" in google, I have the following in my browser url :)
oh okay for the "query part", I understand.
I don't know what's the best thing to do here...
Yeah, query is the part after a '?'. Don't worry, as I've said, we've decided to encode it anyway, for "better compatibility". yum did it before, so it shouldn't cause trouble. I just wanted to describe the situation for completeness.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (yum bug fix and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.