RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1750839 - fence_azure_arm agent intermittently reports the error "ERROR: Failed: unsupported operand type(s) for -=: 'Retry' and 'int'"
Summary: fence_azure_arm agent intermittently reports the error "ERROR: Failed: unsupp...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: python-requests
Version: 7.5
Hardware: x86_64
OS: Linux
high
medium
Target Milestone: rc
: ---
Assignee: Python Maintainers
QA Contact: Jan Kepler
URL:
Whiteboard:
Depends On:
Blocks: 1776292 1776293 1776294 2131394
TreeView+ depends on / blocked
 
Reported: 2019-09-10 14:52 UTC by juholmes
Modified: 2023-03-24 15:25 UTC (History)
19 users (show)

Fixed In Version: python-requests-2.6.0-8.el7
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1776292 1776293 1776294 2131394 (view as bug list)
Environment:
Last Closed: 2021-03-08 14:18:02 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
sosreport containing sos_commands directory (10.6 MB) (10.07 MB, application/x-tar)
2019-09-10 19:44 UTC, juholmes
no flags Details
sosreport with var_logs directory (15.73 MB, application/x-tar)
2019-09-10 20:08 UTC, juholmes
no flags Details
Partial Python trace of reproduction (3.66 MB, text/plain)
2019-09-10 23:59 UTC, Reid Wahl
no flags Details
Partial Python trace of fence_azure_arm success for comparison (3.91 MB, text/plain)
2019-09-11 00:04 UTC, Reid Wahl
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4341911 0 None None None 2019-09-10 14:52:20 UTC


Description juholmes 2019-09-10 14:52:20 UTC
Description of problem:

Fence agent fence_azure_arm is logging failures during fence monitor operations with the error fence_azure_arm: Failed: unsupported operand type(s) for -=: 'Retry' and 'int'.

This occurs in the following function:
  /usr/lib/python2.7/site-packages/urllib3/util/retry.py
    def increment(self, method=None, url=None, response=None, error=None, _pool=None, _stacktrace=None):
	if self.total is False and error:
            # Disabled, indicate to re-raise the error.
            raise six.reraise(type(error), error, _stacktrace)

	total = self.total
        if total is not None:
	    total -= 1 <------ Throws exception
        [.....]


Looks like this is issue with the fence-agents-azure-arm, but something in one of the python libraries if this is indeed a bug of some sort. 

Looking at backtrace we see the different packages used based on the files:
  fence-agents-azure-arm
   -> python-azure-sdk
        -> python-msrest 
           -> python-requests
              ->  python-urllib3


It would appear that error is caused because it cannot decrement the variable "total" which is the same value that is in "self.total". 
The error is stating the following:
  - The "total" variable that is a "Retry" (object) cannot be decremented with an int. 



The variable self.total can be the following values based on comments in the code:
  - None
  - False
  - int

  - urllib3.util package — urllib3 1.25.3 documentation 
    https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry

What we know it is not:
  - We know it is not None.
  - We know it is not False and an "error" is not empty.

We know it is a "Retry" object as that is what error states the variable is. 
  Sep  3 19:17:47 sapds6-dhdb00 stonith-ng[1569]: warning: fence_azure_arm[94113] stderr: [ TypeError: unsupported operand type(s) for -=: 'Retry' and 'int' ]



Version-Release number of selected component (if applicable):

RHEL 7.5 
fence-agents-azure-arm-4.2.1-24.el7.x86_64
python-azure-sdk-4.0.0-1.el7.noarch
python-msrestazure-0.5.1-1.el7.noarch
python2-requests-oauthlib-0.8.0-5.el7.noarch
python-msrest-0.5.4-0.el7_6.1.noarch
python-urllib3-1.10.2-7.el7.noarch
python-requests-2.6.0-1.el7_1.noarch



How reproducible:

Allow fence_azure_arm agent to run the monitor operations, and the error will be randomly triggered.

Steps to Reproduce:
1.
2.
3.
We can create similar errors if we try to decrement a string or variable that is set to None.
  # python
  Python 2.7.5 (default, Jun 11 2019, 14:33:56)
  [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> total = 10
  >>> print(total)
  10
  >>> total -= 1
  >>> print(total)
  9
  >>> total = False
  >>> print(total)
  False
  >>> total -= 1
  >>> print(total)
  -1
  >>> total = "nine"
  >>> print(total)
  nine
  >>> total -= 1
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unsupported operand type(s) for -=: 'str' and 'int'
  >>> total = 10
  >>> print(total)
  10
  >>> total -= 1
  >>> print(total) 
  9
  >>> total = None
  >>> print(total)
  None
  >>> total -= 1
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: unsupported operand type(s) for -=: 'NoneType' and 'int'
  >>>


Actual results:

stonith-ng[1569]:  notice: Operation 'monitor' [54508] for device 'rsc_st_azure' returned: -201 (Generic Pacemaker error)
stonith-ng[1569]: warning: fence_azure_arm[56971] stderr: [ 2019-09-02 03:09:24,904 ERROR: Failed: unsupported operand type(s) for -=: 'Retry' and 'int' ]
stonith-ng[1569]: warning: fence_azure_arm[56971] stderr: [  ]
stonith-ng[1569]: warning: fence_azure_arm[56971] stderr: [ 2019-09-02 03:09:24,904 ERROR: Please use '-h' for usage ]
stonith-ng[1569]: warning: fence_azure_arm[56971] stderr: [  ]

Expected results:

No intermittent monitor operation failures.

Additional info:

Comment 3 juholmes 2019-09-10 19:44:54 UTC
Created attachment 1613722 [details]
sosreport containing sos_commands directory (10.6 MB)

Comment 4 juholmes 2019-09-10 20:08:04 UTC
Created attachment 1613725 [details]
sosreport with var_logs directory

Comment 5 Reid Wahl 2019-09-10 23:59:38 UTC
Created attachment 1613838 [details]
Partial Python trace of reproduction

I was able to reproduce the issue by running a loop of the fence command for a while with `python -m trace -t <command>`.

    [clouduser@rhel7-node-2 ~]$ 2019-09-10 23:42:56,807 ERROR: Failed: unsupported operand type(s) for -=: 'Retry' and 'int'

    2019-09-10 23:42:56,809 ERROR: Please use '-h' for usage

The trace from 23:42 is attached. There was also an identical exception and trace at 23:39.

Comment 6 Reid Wahl 2019-09-11 00:00:32 UTC
My reproducer was run on an updated RHEL 7.7 Azure VM.

Comment 7 Reid Wahl 2019-09-11 00:03:44 UTC
These aren't complete traces, by the way. I was capturing only the last 60000 lines due to disk space constraints. If we need the full trace I'll try to reproduce again.

Comment 8 Reid Wahl 2019-09-11 00:04:55 UTC
Created attachment 1613839 [details]
Partial Python trace of fence_azure_arm success for comparison

"Good" result trace for comparison.

Comment 9 Reid Wahl 2019-09-11 01:03:03 UTC
I added debugging to /usr/lib/python2.7/site-packages/urllib3/util/retry.py.

```
class Retry(object):
...
    def __init__(self, total=10, connect=None, read=None, redirect=None,
                 method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None,
                 backoff_factor=0, raise_on_redirect=True, _observed_errors=0,
                 remove_headers_on_redirect=DEFAULT_REDIRECT_HEADERS_BLACKLIST):

        with open('/tmp/py.out', 'a') as f:
            print >> f, 'Retry __init__: total is %s' % total
    ...
    def increment(self, method=None, url=None, response=None, error=None, _pool=None, _stacktrace=None):
        ...
        with open('/tmp/py.out', 'a') as f:
            print >> f, 'Retry increment: total is %s' % total
```

After each run of fence_azure_arm, this is appended to /tmp/py.out:

```
$ cat /tmp/py.out 
Retry __init__: total is 3
Retry __init__: total is 3
Retry __init__: total is 0
Retry __init__: total is 0
Retry __init__: total is 10
Retry __init__: total is 0
Retry __init__: total is 0
Retry __init__: total is 0
Retry __init__: total is 0
Retry __init__: total is Retry(total=4, connect=4, read=4, redirect=None)
```


That last one looks like a landmine to me. **Every time**, a Retry object is getting created with `total` getting assigned another Retry object instead of an `int`. However, we don't hit a TypeError unless we get a connection failure and have to call `Retry.increment()`, which then runs `total -= 1`.

Comment 10 Reid Wahl 2019-09-11 01:17:41 UTC
The `Retry.from_int()` method is broken.

    @classmethod
    def from_int(cls, retries, redirect=True, default=None):
        """ Backwards-compatibility for the old retries format."""
        if retries is None:
            retries = default if default is not None else cls.DEFAULT

        if isinstance(retries, Retry):
            return retries

        redirect = bool(redirect) and None
        new_retries = cls(retries, redirect=redirect)
        log.debug("Converted retries value: %r -> %r" % (retries, new_retries))   <-- HERE
        return new_retries


With loglevel set to DEBUG:

2019-09-11 01:16:20,035 DEBUG: Converted retries value: Retry(total=4, connect=4, read=4, redirect=None) -> Retry(total=Retry(total=4, connect=4, read=4, redirect=None), connect=None, read=None, redirect=0)



Looks like a urllib3 bug then.

Comment 11 Reid Wahl 2019-09-11 01:41:27 UTC
from_int() should return retries if retries is an instance of the Retry class. It shouldn't create a new Retry object.

But the isinstance check is failing.

    @classmethod
    def from_int(cls, retries, redirect=True, default=None):
        """ Backwards-compatibility for the old retries format."""
        log.debug("retries is %r" % retries)
        log.debug("type of retries is %r" % type(retries))
        log.debug("cls is %r" % cls)
        if retries is None:
            retries = default if default is not None else cls.DEFAULT

        if isinstance(retries, Retry):
            log.debug("YOU ARE HERE")
            return retries
        redirect = bool(redirect) and None
        new_retries = cls(retries, redirect=redirect)
        log.debug("Converted retries value: %r -> %r" % (retries, new_retries))   <-- HERE
        return new_retries


2019-09-11 01:41:00,519 DEBUG: retries is Retry(total=4, connect=4, read=4, redirect=None)
2019-09-11 01:41:00,519 DEBUG: type of retries is <class 'urllib3.util.retry.Retry'>
2019-09-11 01:41:00,519 DEBUG: cls is <class 'requests.packages.urllib3.util.retry.Retry'>
2019-09-11 01:41:00,519 DEBUG: Converted retries value: Retry(total=4, connect=4, read=4, redirect=None) -> Retry(total=Retry(total=4, connect=4, read=4, redirect=None), connect=None, read=None, redirect=0)

Comment 12 Reid Wahl 2019-09-11 02:00:38 UTC
On second look I think the issue is in the caller python-requests, based on:
  - the type mismatch in Comment 11, and
  - the fact that the DEBUG messages disappear entirely when I replace requests-2.6 (from yum) with requests-2.22 (from pip).

Comment 13 Reid Wahl 2019-09-11 02:10:38 UTC
(In reply to Reid Wahl from comment #12)
> On second look I think the issue is in the caller python-requests, based on:
>   - the type mismatch in Comment 11, and
>   - the fact that the DEBUG messages disappear entirely when I replace
> requests-2.6 (from yum) with requests-2.22 (from pip).

Correction, DEBUG messages disappeared with requests-2.11, not 2.22.

Comment 14 Reid Wahl 2019-09-11 09:13:40 UTC
I also found this in the msrest library (from python-msrest in the HA repos):

    [root@fastvm-rhel-7-6-21 ~]# grep requests /usr/lib/python2.7/site-packages/msrest/pipeline.py | head -2
    import requests
    from urllib3 import Retry  # Needs requests 2.16 at least to be safe

The latest python-requests that Red Hat ships is 2.6. I think the reason for the comment is that upstream requests-2.16 unbundled a bunch of libraries.
  - https://github.com/psf/requests/commit/8e17600ef60de4faf632acb55d15cb3c178de9bb
  - https://github.com/psf/requests/commit/d2713412e5ca97b68b7f38924cf902729ddbf96d#diff-d9ae695c8119e58c2ec435b6deae2d53R9

This version recommendation lines up with the type conflict between the two different Retry classes in Comment 11 and may be related.

Our python-requests.spec file unbundles urllib3 and chardet by removing the bundled libs but doesn't change the import statements. I think the VendorAlias() class is pulling in the system libs as though they were in a vendored path.

    python-requests.spec:
     94 # Unbundle chardet and urllib3.
     95 rm -rf build/lib/requests/packages/chardet
     96 rm -rf build/lib/requests/packages/urllib3

    requests-2.6.0/requests/adapters.py:
     17 from .packages.urllib3.util.retry import Retry

    requests-2.6.0/requests/packages/__init__.py:
    107 sys.meta_path.append(VendorAlias(["urllib3", "chardet"]))

Comment 16 Reid Wahl 2019-09-11 20:14:48 UTC
Here's the old VendorAlias() code for anyone interested.
  - https://github.com/psf/requests/commit/3e3fc7683cab2d401981fdf01ae8cdc40c8d6222

See also the comments on the left side.

Comment 17 Charalampos Stratakis 2019-10-01 00:57:58 UTC
I am still investigating so this is not a final conclusion, however here are my findings so far.

The issue could be that the addition of urllib3 in sys.modules is not done correctly.

requests added a mechanism, which is present in the 2.6.0 version we ship, in order to properly work for unvendored packages without changing the import statements, essentially trying to make urllib3 and requests.packages.urllib3 be the same thing: https://github.com/psf/requests/pull/2375

However it doesn't seem to work for all cases as the urllib3 submodules reside in a different namespace: https://github.com/psf/requests/issues/3287  See also: https://github.com/psf/requests/issues/2870

In the end they removed the mechanism and unbundled all the dependencies as well. pip is actually still using it in a quite refined way: https://github.com/pypa/pip/blob/master/src/pip/_vendor/__init__.py

The obvious fix here would be to patch requests to just import urllib3 and not meddle with the modules namespace. Another approach would be to refine the mechanism. It shouldn't be that hard to revise the code but I'm quite worried about potential side effects, as requests is being used by many components, thus I'd prefer to just patch the import statements.

Comment 18 Reid Wahl 2019-10-01 02:20:35 UTC
(In reply to Charalampos Stratakis from comment #17)
> The obvious fix here would be to patch requests to just import urllib3 and
> not meddle with the modules namespace.

That's what newer versions of requests do as of v2.16 (e.g., https://github.com/psf/requests/commit/168109f13c135d719a0be21ce220534e4b657aed).

This commit deletes the vendored packages:
  - https://github.com/psf/requests/commit/8e17600ef60de4faf632acb55d15cb3c178de9bb

This commit shows the intent of v2.16 to unvendor everything:
  - https://github.com/psf/requests/commit/d2713412e5ca97b68b7f38924cf902729ddbf96d#diff-d9ae695c8119e58c2ec435b6deae2d53R9

Comment 27 Mark Heslin 🎸 2019-10-21 16:47:03 UTC
Has anyone reached out to Microsoft team with bits to test yet? It came up on today's joint engineering call that no one
has heard anything yet. Can we get an update for Microsoft? Thanks.

Comment 28 Jon Sturgis 2019-10-21 17:20:42 UTC
(In reply to Mark Heslin from comment #27)
> Has anyone reached out to Microsoft team with bits to test yet? It came up
> on today's joint engineering call that no one
> has heard anything yet. Can we get an update for Microsoft? Thanks.

I am curious to confirm if this issue is related to the PoC? It appears that this was opened under 7.5, the build that was provided for the PoC what 7.6?


Note You need to log in before you can comment on or make changes to this bug.