Bug 1817130

Summary: dnf isn't able to download with encoded url
Product: Red Hat Enterprise Linux 8 Reporter: Christophe Besson <cbesson>
Component: libdnfAssignee: Lukáš Hrázký <lhrazky>
Status: CLOSED ERRATA QA Contact: Eva Mrakova <emrakova>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.1CC: lhrazky, mdomonko, nsella, pkratoch, redhat
Target Milestone: rcKeywords: Triaged
Target Release: 8.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: libdnf-0.48.0-1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-11-04 01:52:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1825061    

Description Christophe Besson 2020-03-25 16:07:06 UTC
Description of problem:
A customer is encountering an issue while installing RHEL 8 due to its specific web server used to host the repositories. The root cause seems to be a change of behavior between yum/urlgrabber which does a urllib.quote() or a urllib.pathname2url(), and dnf/libdnf doesn't. I consider that as a regression compared to the yum behavior, that's why I didn't open it as a RFE.

On RHEL 7, from a syscall trace on connect()/sendto()/recvfrom(), we can see the URL is encoded from the client side (the '+' symbol becomes encoded to '%2B'):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
20153 07:26:57.830796 sendto(8<TCP:[192.168.122.33:54206->AA.BB.CC.DD:80]>, "GET /content/dist/rhel/server/7/7Server/x86_64/os/Packages/l/libstdc%2B%2B-4.8.5-39.el7.x86_64.rpm HTTP/1.1\r\nUser-Agent: urlgrabber/3.10 yum/3.4.3\r\nHost: reposerver\r\nAccept: */*\r\n\r\n", 205, MSG_NOSIGNAL, NULL, 0) = 205 <0.000106>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On RHEL 8, the URL isn't encoded before sending the HTTP GET request:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8015  07:31:50.890569 sendto(17<TCP:[192.168.122.2:55536->AA.BB.CC.DD:80]>, "GET /content/dist/rhel8/8.1/x86_64/baseos/os/Packages/l/libstdc++-8.3.1-4.5.el8.x86_64.rpm HTTP/1.1\r\nHost: reposerver\r\nUser-Agent: libdnf\r\nAccept: */*\r\n\r\n", 178, MSG_NOSIGNAL, NULL, 0) = 178 <0.000021>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Version-Release number of selected component (if applicable):
libdnf-0.35.1-8.el8.x86_64
dnf-4.2.7-6.el8.noarch

How reproducible:
100%

Steps to Reproduce:
# dnf download libstdc++

Actual results:
Non encoded URL from the client side.

Expected results:
Encoded URL from the client side.


Additional info:
An urlEncode() function already exists in libdnf/repo/Repo.cpp, but it is used only for user/password form encoding.

Comment 1 Daniel Mach 2020-03-30 11:37:19 UTC
I tested that RHEL in the default setup is not broken (it works even without urlencode):
> $ podman run -it ubi8:8.1
> 
> $ rpm -q dnf libdnf
> dnf-4.2.7-7.el8_1.noarch
> libdnf-0.35.1-9.el8_1.x86_64
> 
> $ dnf download libstdc++
> (1/2): libstdc++-8.3.1-4.5.el8.x86_64.rpm ...
> (2/2): libstdc++-8.3.1-4.5.el8.i686.rpm ...

I also verified that the upstream version of dnf/libdnf doesn't urlencode the string:
> sendto(41, "GET /pub/linux/fedora/linux/development/rawhide/Everything/x86_64/os/Packages/l/libstdc++-10.0.1-0.11.fc33.x86_64.rpm...)

It looks like we should really urlencode the paths.

Comment 2 Lukáš Hrázký 2020-04-16 11:48:53 UTC
PRs:
https://github.com/rpm-software-management/librepo/pull/188
https://github.com/rpm-software-management/ci-dnf-stack/pull/819

AC:
Special URL characters are escaped in URLs when dnf downloads packages.

Comment 3 Lukáš Hrázký 2020-04-20 09:21:26 UTC
Christophe, I've been prompted to have a look at the URL RFC and the plus sign is actually not a reserved character in the path part of a URL (see RFC 1738, section 3.3 and possibly 2.2 for globally unsafe characters). Therefore, according to my interpretation, in case of a '+' DNF behavior is correct, it doesn't need to be escaped and the customer server implementation is at fault here.

While there potentially may be other characters that need to be escaped, those are also invalid in RPM package names. Therefore I'm not convinced my earlier PRs should be merged...

Comment 4 Lukáš Hrázký 2020-04-20 11:52:47 UTC
We've decided to put the encoding in place to make DNF more compatible. The customer should probably be advised to check their server as well...

Comment 5 Christophe Besson 2020-04-20 12:42:08 UTC
Well, to be honest I'm not sure how to interpret that RFC.

As per https://www.w3schools.com/tags/ref_urlencode.ASP : "URL encoding normally replaces a space with a plus (+) sign or with %20."

As we can see in the URL while googling for example. So I think the real + sign has to be encoded.

That's also what we can see in the yum strace, "+" signs are replaced in the package names. So I continue to think there is a change of behaviour with dnf.

Comment 6 Lukáš Hrázký 2020-04-20 13:26:11 UTC
It's just too complicated; '+' for encoding a space is allowed only in the query part of a URL, which is usually in the application/x-www-form-urlencoded format, which specifies using '+' for encoding spaces. In path you can only encode as %20 as per RFC 1738. (also the fact that majority of the servers actually work with unencoded '+'es in paths speaks for something)

But it is a change of behaviour from yum, indeed. Not encoding '+' in path is strict behaviour, encoding it is (not entirely correctly) making dnf a bit more compatible with non-conformant servers.

Comment 7 Christophe Besson 2020-04-20 13:49:20 UTC
Hmm, yum/urlgrabber used the following urllib function:

# python -c 'import urllib; print(urllib.pathname2url("test++"))'
test%2B%2B

I don't think this function has been designed with non-conformant servers in mind.

If I search the string "python + urllib" in google, I have the following in my browser url :)
&q=python+%2B+urllib
          ^^^

Comment 8 Christophe Besson 2020-04-20 14:04:48 UTC
oh okay for the "query part", I understand.
I don't know what's the best thing to do here...

Comment 9 Lukáš Hrázký 2020-04-20 14:09:10 UTC
Yeah, query is the part after a '?'. Don't worry, as I've said, we've decided to encode it anyway, for "better compatibility". yum did it before, so it shouldn't cause trouble. I just wanted to describe the situation for completeness.

Comment 16 errata-xmlrpc 2020-11-04 01:52:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (yum bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4510