Bug 1724245

Summary: leapp should not rely on example.com
Product: Red Hat Enterprise Linux 7
Reporter: Christophe Besson <cbesson>
Component: leapp-repository
Assignee: Leapp Notifications Bot <leapp-notifications-bot>
Status: NEW ---
QA Contact: upgrades-and-conversions
Severity: medium
Docs Contact:
Priority: medium
Version: 7.6
CC: cbesson, cww, fkrska, mbocek, pstodulk
Target Milestone: rc
Keywords: Upgrades
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1818088    

Description Christophe Besson 2019-06-26 14:42:36 UTC
Description of problem:
During the leapp upgrade process, any failure caused by a misconfigured proxy may trigger a connection attempt to "https://example.com" to check internet access.

Some customers have very restrictive proxy rules, so leapp should not rely on this kind of external address.


Version-Release number of selected component (if applicable):
leapp-repository-0.7.0-5.el7_6

How reproducible:


Steps to Reproduce:
1. Configure a correct proxy in rhsm.conf and yum.conf (in my case, 192.168.122.1:3128)
2. Set a "bad" proxy env var to simulate a proxy error such as a 407 Proxy Auth Error; in my case, a non-listening port: export https_proxy=http://192.168.122.1:8080
3. Run leapp upgrade from an up-to-date RHEL7.6

Actual results:
============================================================
                        ERRORS
============================================================

2019-06-26 09:09:03.240470 [ERROR] Actor: prepare_upgrade_transaction Message:  A Leapp Command Error occurred.  . Possible spurious failure: There was probably a problem with internet conection (Failed to open url 'https://example.com' with error: <urlopen error [Errno 113] No route to host>). Check your connection and try again.

Expected results:
At a minimum, replacing it with "redhat.com" would be better.

Additional info:
# Things look good for RHSM and DNF downloads
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
27416 09:00:52.794760 getsockopt(3<TCP:[192.168.122.27:58290->192.168.122.1:3128]>, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000017>
27416 09:00:52.794855 poll([{fd=3<TCP:[192.168.122.27:58290->192.168.122.1:3128]>, events=POLLOUT}], 1, 180000) = 1 ([{fd=3, revents=POLLOUT}]) <0.000015>
27416 09:00:52.794920 sendto(3<TCP:[192.168.122.27:58290->192.168.122.1:3128]>, "CONNECT subscription.rhsm.redhat.com:443 HTTP/1.0\r\n", 51, 0, NULL, 0) = 51 <0.000036>
...
27539 09:01:23.521448 connect(20<TCP:[66554]>, {sa_family=AF_INET, sin_port=htons(3128), sin_addr=inet_addr("192.168.122.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000039>
27539 09:01:23.521653 poll([{fd=20<TCP:[192.168.122.27:58314->192.168.122.1:3128]>, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=20, revents=POLLOUT|POLLWRNORM}]) <0.000013>
27539 09:01:23.521763 getsockopt(20<TCP:[192.168.122.27:58314->192.168.122.1:3128]>, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 <0.000011>
27539 09:01:23.521806 getpeername(20<TCP:[192.168.122.27:58314->192.168.122.1:3128]>, {sa_family=AF_INET, sin_port=htons(3128), sin_addr=inet_addr("192.168.122.1")}, [16]) = 0 <0.000010>
27539 09:01:23.521846 getsockname(20<TCP:[192.168.122.27:58314->192.168.122.1:3128]>, {sa_family=AF_INET, sin_port=htons(58314), sin_addr=inet_addr("192.168.122.27")}, [16]) = 0 <0.000010>
27539 09:01:23.521894 sendto(20<TCP:[192.168.122.27:58314->192.168.122.1:3128]>, "CONNECT cdn.redhat.com:443 HTTP/1.1\r\nHost: cdn.redhat.com:443\r\nUser-Agent: libdnf\r\nProxy-Connection: Keep-Alive\r\n\r\n", 115, MSG_NOSIGNAL, NULL, 0) = 115 <0.000029>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

# A misconfigured proxy leads here to a "No route to host"
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
27331 09:09:03.145710 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 31<TCP:[74014]> <0.000856>
27331 09:09:03.146712 connect(31<TCP:[74014]>, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("192.168.122.1")}, 16) = -1 EHOSTUNREACH (No route to host) <0.000363>
27331 09:09:03.147187 close(31<TCP:[74014]>) = 0 <0.000013>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
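
# Why the env var wins even though rhsm.conf and yum.conf are correct: urlopen() takes its proxy from the process environment and never reads those files. A minimal Python 3 sketch of that lookup follows. The helper name effective_proxy is hypothetical and not part of leapp; on Python 2 the equivalent call is urllib.getproxies().

```python
from urllib.request import getproxies  # Python 2: urllib.getproxies


def effective_proxy(scheme="https"):
    """Return the proxy URL urlopen() would use for `scheme`, or None.

    getproxies() reads variables such as https_proxy from the environment
    (plus system settings on some platforms). It knows nothing about
    rhsm.conf or yum.conf.
    """
    return getproxies().get(scheme)
```

With https_proxy exported as in step 2, effective_proxy() returns the bad address, which is the one seen in the connect() call above.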

# A proxy auth error is difficult to diagnose (here a customer output):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
9778  08:32:19.843879 close(8</var/cache/dnf/rhel-8-for-x86_64-appstream-rpms-1899f526e47881cb/tmpdir.qZnK52/repodata/repomd.xml>) = 0 <0.000009>
9778  08:32:19.843926 write(4</var/log/dnf.librepo.log>, "2019-06-25T06:32:19Z DEBUG check_transfer_statuses: Error during transfer: Curl error (56): Failure when receiving data from the peer for https://**************/pulp/repos/*******/Library/content/dist/rhel8/8/x86_64/appstream/os/repodata/repomd.xml [Received HTTP code 407 from proxy after CONNECT]\n", 307) = 307 <0.000013>
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

# Here is the faulty code from /usr/share/leapp-repository/repositories/system_upgrade/el7toel8/actors/prepareupgradetransaction/libraries/preparetransaction.py:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
def connection_guard(url='https://example.com'):
    def closure():
        try:
            urlopen(url)
            return None
        except URLError as e:
            cause = '''Failed to open url '{url}' with error: {error}'''.format(url=url, error=e)
            return ('There was probably a problem with internet conection ({cause}).'
                    ' Check your connection and try again.'.format(cause=cause))
    return closure
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
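
# One possible direction, as a sketch only and not the actual leapp fix: let the caller pass the URLs the upgrade really needs (for example the RHSM host or a Satellite baseurl) instead of a hardcoded external site, and report a proxy authentication failure explicitly. The urls, opener and timeout parameters are assumptions introduced here for illustration. Written for Python 3.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def connection_guard(urls, opener=urlopen, timeout=10):
    """Build a guard that probes caller-supplied URLs (hypothetical API)."""
    def closure():
        for url in urls:
            try:
                opener(url, timeout=timeout)
            except HTTPError as e:
                # HTTPError must be caught first: it is a subclass of URLError.
                if e.code == 407:
                    return ("The proxy rejected the request for '{url}' "
                            "(HTTP 407). Check the proxy credentials."
                            .format(url=url))
                # Any other HTTP status means the endpoint answered,
                # so this is not a connectivity problem.
            except URLError as e:
                # Note: for https URLs tunnelled through a proxy, a 407 may
                # surface here instead; its text is kept in the message.
                return ("Failed to open url '{url}' with error: {error}"
                        .format(url=url, error=e))
        return None
    return closure
```

The opener argument exists only so the guard can be exercised without network access; by default it is urlopen, as in the original code.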

Comment 2 Petr Stodulka 2019-06-26 15:16:44 UTC
I see. The point of that check was to reduce the significant number of bug reports people send about crashes whose actual cause is the network connection. I guess we will have to come up with a different solution, or print that info without any additional check.

Comment 3 Christophe Besson 2019-07-04 14:29:45 UTC
In the Actor "prepareupgradetransaction", there are at least 3 steps which may call the "guards" (connection_guard, space_guard and permission_guard soon):
- get_rhsm_system_release()
- update_rhel_subscription()
- dnf_plugin_rpm_download()

Any non-zero exit code from the underlying commands triggers these "guard" checks. The checks are not unwelcome, but they do not help to find the root cause of a problem. Several commands may return a non-zero exit code (e.g. iptables-service isn't present, but this is not a blocking error), so a non-zero exit code alone is not sufficient to identify why leapp stops with an undefined error.
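
To illustrate the point, here is a rough Python sketch of a report that leads with what the failed command itself said and only then appends the guard hints as possible causes. The function explain_failure and its parameters are hypothetical, not existing leapp code.

```python
def explain_failure(command, returncode, stderr, guards=()):
    """Build an error report that keeps the original failure visible.

    `guards` is a sequence of callables returning a hint string or None,
    in the same style as connection_guard().
    """
    lines = ["Command {!r} exited with code {}.".format(command, returncode)]
    if stderr.strip():
        lines.append("stderr: " + stderr.strip())
    for guard in guards:
        hint = guard()
        if hint:
            # A guard result is only a guess, so label it as such.
            lines.append("Possible cause: " + hint)
    return "\n".join(lines)
```

That way a guard hint never replaces the real error output, it only complements it.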

-> A customer whose remote repositories are on a Satellite server can't always access an external site, so checking "example.com" isn't good.
-> This customer doesn't need a proxy to reach their repos but configured one anyway, configured it badly, and didn't see that it led to a 407 Proxy Auth Error. Only an strace shows that.
-> Once the proxy issue was resolved, there was still a problem, with the same error message (can't access example.com, please check the internet connection). The 2nd problem was an incomplete repomd.xml: dependencies were missing because of a sync problem on their Satellite server.
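
The 407 case could be spotted without strace, since librepo writes it to dnf.librepo.log (see the customer output in the description). A small Python sketch that looks for that signature in the log text; the function name proxy_errors_in_log is made up for this example:

```python
import re

# Message text as it appears in the dnf.librepo.log excerpt quoted above.
PROXY_CONNECT_ERROR = re.compile(
    r"Received HTTP code (\d{3}) from proxy after CONNECT")


def proxy_errors_in_log(log_text):
    """Return the sorted, de-duplicated HTTP codes a proxy sent after CONNECT."""
    codes = {int(m.group(1)) for m in PROXY_CONNECT_ERROR.finditer(log_text)}
    return sorted(codes)
```

A non-empty result would be a much better hint for the user than "check your connection".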

To help debugging, it would be useful to copy the following logs to /var/log/leapp/dnf-debugdata:
/var/log/rhsm/rhsm.log
/var/log/dnf.log
/var/log/dnf.librepo.log
/var/log/dnf.rpm.log
/var/log/hawkey.log

Indeed, what happens isn't fully logged in a persistent manner, since these files are removed just before unmounting the overlay.
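
A minimal sketch of such a copy step in Python 3, to be run before the overlay is unmounted. The function name preserve_logs and the overlay_root parameter are assumptions for illustration; I don't know where exactly leapp would hook this in.

```python
import os
import shutil

# Log paths from the list above, relative to the root they live under.
DEBUG_LOGS = (
    "var/log/rhsm/rhsm.log",
    "var/log/dnf.log",
    "var/log/dnf.librepo.log",
    "var/log/dnf.rpm.log",
    "var/log/hawkey.log",
)


def preserve_logs(overlay_root, dest_dir="/var/log/leapp/dnf-debugdata",
                  logs=DEBUG_LOGS):
    """Copy the logs that exist under overlay_root into dest_dir.

    Missing files are skipped silently. Returns the destination paths
    of the files that were copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for rel_path in logs:
        src = os.path.join(overlay_root, rel_path)
        if os.path.isfile(src):
            dst = os.path.join(dest_dir, os.path.basename(rel_path))
            shutil.copy2(src, dst)  # copy2 also keeps timestamps
            copied.append(dst)
    return copied
```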