Bug 1750839
Summary: | fence_azure_arm agent intermittently reports the error "ERROR: Failed: unsupported operand type(s) for -=: 'Retry' and 'int'" | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | juholmes | |
Component: | python-requests | Assignee: | Python Maintainers <python-maint> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Jan Kepler <jkejda> | |
Severity: | medium | Docs Contact: | ||
Priority: | high | |||
Version: | 7.5 | CC: | alsin, angovind, asoni, cfeist, cluster-maint, cshabazi, cstratak, guybo, hhorak, jkejda, jruemker, jsturgis, mbasti, mheslin, nwahl, oalbrigt, pviktori, sbradley, torsava | |
Target Milestone: | rc | Keywords: | OtherQA, ZStream | |
Target Release: | --- | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | python-requests-2.6.0-8.el7 | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1776292 1776293 1776294 2131394 (view as bug list) | Environment: | ||
Last Closed: | 2021-03-08 14:18:02 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1776292, 1776293, 1776294, 2131394 | |||
Attachments: |
Description
juholmes
2019-09-10 14:52:20 UTC
Created attachment 1613722 [details]
sosreport containing sos_commands directory (10.6 MB)
Created attachment 1613725 [details]
sosreport with var_logs directory
Created attachment 1613838 [details]
Partial Python trace of reproduction
I was able to reproduce the issue by running a loop of the fence command for a while with `python -m trace -t <command>`.
[clouduser@rhel7-node-2 ~]$ 2019-09-10 23:42:56,807 ERROR: Failed: unsupported operand type(s) for -=: 'Retry' and 'int'
2019-09-10 23:42:56,809 ERROR: Please use '-h' for usage
The trace from 23:42 is attached. There was also an identical exception and trace at 23:39.
My reproducer was run on an updated RHEL 7.7 Azure VM. These aren't complete traces, by the way. I was capturing only the last 60000 lines due to disk space constraints. If we need the full trace I'll try to reproduce again. Created attachment 1613839 [details]
Partial Python trace of fence_azure_arm success for comparison
"Good" result trace for comparison.
I added debugging to /usr/lib/python2.7/site-packages/urllib3/util/retry.py. ``` class Retry(object): ... def __init__(self, total=10, connect=None, read=None, redirect=None, method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None, backoff_factor=0, raise_on_redirect=True, _observed_errors=0, remove_headers_on_redirect=DEFAULT_REDIRECT_HEADERS_BLACKLIST): with open('/tmp/py.out', 'a') as f: print >> f, 'Retry __init__: total is %s' % total ... def increment(self, method=None, url=None, response=None, error=None, _pool=None, _stacktrace=None): ... with open('/tmp/py.out', 'a') as f: print >> f, 'Retry increment: total is %s' % total ``` After each run of fence_azure_arm, this is appended to /tmp/py.out: ``` $ cat /tmp/py.out Retry __init__: total is 3 Retry __init__: total is 3 Retry __init__: total is 0 Retry __init__: total is 0 Retry __init__: total is 10 Retry __init__: total is 0 Retry __init__: total is 0 Retry __init__: total is 0 Retry __init__: total is 0 Retry __init__: total is Retry(total=4, connect=4, read=4, redirect=None) ``` That last one looks like a landmine to me. **Every time**, a Retry object is getting created with `total` getting assigned another Retry object instead of an `int`. However, we don't hit a TypeError unless we get a connection failure and have to call `Retry.increment()`, which then runs `total -= 1`. The `Retry.from_int()` method is broken. @classmethod def from_int(cls, retries, redirect=True, default=None): """ Backwards-compatibility for the old retries format.""" if retries is None: retries = default if default is not None else cls.DEFAULT if isinstance(retries, Retry): return retries redirect = bool(redirect) and None new_retries = cls(retries, redirect=redirect) log.debug("Converted retries value: %r -> %r" % (retries, new_retries)) <-- HERE return new_retries With loglevel set to DEBUG: 2019-09-11 01:16:20,035 DEBUG: Converted retries value: Retry(total=4, connect=4, read=4, redirect=None) -> Retry(total=Retry(total=4, connect=4, read=4, redirect=None), connect=None, read=None, redirect=0) Looks like a urllib3 bug then. from_int() should return retries if retries is an instance of the Retry class. It shouldn't create a new Retry object. But the isinstance check is failing. @classmethod def from_int(cls, retries, redirect=True, default=None): """ Backwards-compatibility for the old retries format.""" log.debug("retries is %r" % retries) log.debug("type of retries is %r" % type(retries)) log.debug("cls is %r" % cls) if retries is None: retries = default if default is not None else cls.DEFAULT if isinstance(retries, Retry): log.debug("YOU ARE HERE") return retries redirect = bool(redirect) and None new_retries = cls(retries, redirect=redirect) log.debug("Converted retries value: %r -> %r" % (retries, new_retries)) <-- HERE return new_retries 2019-09-11 01:41:00,519 DEBUG: retries is Retry(total=4, connect=4, read=4, redirect=None) 2019-09-11 01:41:00,519 DEBUG: type of retries is <class 'urllib3.util.retry.Retry'> 2019-09-11 01:41:00,519 DEBUG: cls is <class 'requests.packages.urllib3.util.retry.Retry'> 2019-09-11 01:41:00,519 DEBUG: Converted retries value: Retry(total=4, connect=4, read=4, redirect=None) -> Retry(total=Retry(total=4, connect=4, read=4, redirect=None), connect=None, read=None, redirect=0) On second look I think the issue is in the caller python-requests, based on: - the type mismatch in Comment 11, and - the fact that the DEBUG messages disappear entirely when I replace requests-2.6 (from yum) with requests-2.22 (from pip). (In reply to Reid Wahl from comment #12) > On second look I think the issue is in the caller python-requests, based on: > - the type mismatch in Comment 11, and > - the fact that the DEBUG messages disappear entirely when I replace > requests-2.6 (from yum) with requests-2.22 (from pip). Correction, DEBUG messages disappeared with requests-2.11, not 2.22. I also found this in the msrest library (from python-msrest in the HA repos): [root@fastvm-rhel-7-6-21 ~]# grep requests /usr/lib/python2.7/site-packages/msrest/pipeline.py | head -2 import requests from urllib3 import Retry # Needs requests 2.16 at least to be safe The latest python-requests that Red Hat ships is 2.6. I think the reason for the comment is that upstream requests-2.16 unbundled a bunch of libraries. - https://github.com/psf/requests/commit/8e17600ef60de4faf632acb55d15cb3c178de9bb - https://github.com/psf/requests/commit/d2713412e5ca97b68b7f38924cf902729ddbf96d#diff-d9ae695c8119e58c2ec435b6deae2d53R9 This version recommendation lines up with the type conflict between the two different Retry classes in Comment 11 and may be related. Our python-requests.spec file unbundles urllib3 and chardet by removing the bundled libs but doesn't change the import statements. I think the VendorAlias() class is pulling in the system libs as though they were in a vendored path. python-requests.spec: 94 # Unbundle chardet and urllib3. 95 rm -rf build/lib/requests/packages/chardet 96 rm -rf build/lib/requests/packages/urllib3 requests-2.6.0/requests/adapters.py: 17 from .packages.urllib3.util.retry import Retry requests-2.6.0/requests/packages/__init__.py: 107 sys.meta_path.append(VendorAlias(["urllib3", "chardet"])) Here's the old VendorAlias() code for anyone interested. - https://github.com/psf/requests/commit/3e3fc7683cab2d401981fdf01ae8cdc40c8d6222 See also the comments on the left side. I am still investigating so this is not a final conclusion, however here are my findings so far. The issue could be that the addition of urllib3 in sys.modules is not done correctly. requests added a mechanism, which is present in the 2.6.0 version we ship, in order to properly work for unvendored packages without changing the import statements, essentially trying to make urllib3 and requests.packages.urllib3 be the same thing: https://github.com/psf/requests/pull/2375 However it doesn't seem to work for all cases as the urllib3 submodules reside in a different namespace: https://github.com/psf/requests/issues/3287 See also: https://github.com/psf/requests/issues/2870 In the end they removed the mechanism and unbundled all the dependencies as well. pip is actually still using it in a quite refined way: https://github.com/pypa/pip/blob/master/src/pip/_vendor/__init__.py The obvious fix here would be to patch requests to just import urllib3 and not meddle with the modules namespace. Another approach would be to refine the mechanism. It shouldn't be that hard to revise the code but I'm quite worried about potential side effects, as requests is being used by many components, thus I'd prefer to just patch the import statements. (In reply to Charalampos Stratakis from comment #17) > The obvious fix here would be to patch requests to just import urllib3 and > not meddle with the modules namespace. That's what newer versions of requests do as of v2.16 (e.g., https://github.com/psf/requests/commit/168109f13c135d719a0be21ce220534e4b657aed). This commit deletes the vendored packages: - https://github.com/psf/requests/commit/8e17600ef60de4faf632acb55d15cb3c178de9bb This commit shows the intent of v2.16 to unvendor everything: - https://github.com/psf/requests/commit/d2713412e5ca97b68b7f38924cf902729ddbf96d#diff-d9ae695c8119e58c2ec435b6deae2d53R9 Has anyone reached out to Microsoft team with bits to test yet? It came up on today's joint engineering call that no one has heard anything yet. Can we get an update for Microsoft? Thanks. (In reply to Mark Heslin from comment #27) > Has anyone reached out to Microsoft team with bits to test yet? It came up > on today's joint engineering call that no one > has heard anything yet. Can we get an update for Microsoft? Thanks. I am curious to confirm if this issue is related to the PoC? It appears that this was opened under 7.5, the build that was provided for the PoC what 7.6? |