Bug 1896614 - Unexpected StopIteration in in-process cache management
Summary: Unexpected StopIteration in in-process cache management
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-django-horizon
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z15
Target Release: 13.0 (Queens)
Assignee: Radomir Dopieralski
QA Contact: Radomir Dopieralski
URL:
Whiteboard:
Depends On: 1897294
Blocks:
Reported: 2020-11-11 03:48 UTC by Takashi Kajinami
Modified: 2022-08-23 16:37 UTC (History)

Fixed In Version: python-django-horizon-13.0.3-8.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-18 13:08:51 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker DFGUI-1665 0 None None None 2022-08-23 16:37:08 UTC
Red Hat Issue Tracker OSP-3337 0 None None None 2022-08-23 16:26:53 UTC
Red Hat Product Errata RHBA-2021:0932 0 None None None 2021-03-18 13:10:11 UTC

Description Takashi Kajinami 2020-11-11 03:48:34 UTC
Description of problem:

Horizon was observed to hit StopIteration continuously in its cache management logic,
and we had to restart Horizon to clear the error.

Version-Release number of selected component (if applicable):
RHOSP13z7

How reproducible:
The issue was reported once. The condition to reproduce the issue is not yet clear.

Steps to Reproduce:
 - TBD

Actual results:
 - Horizon encounters StopIteration and shows some error messages

Expected results:
 - No errors in Horizon

Additional info:

Comment 3 Radomir Dopieralski 2020-11-12 17:29:28 UTC
Looking at the code of Python's standard library, there is a race condition in their implementation of OrderedDict.popitem():

    def popitem(self, last=True):
        if not self:
            raise KeyError('dictionary is empty')
        key = next(reversed(self) if last else iter(self))
        value = self.pop(key)
        return key, value

You can see that they first check whether the dictionary is empty, and then perform the operations assuming it is not. However, if it becomes empty in the meantime (for example, because another thread removes the last item), next() raises StopIteration instead of the expected KeyError. This seems to be a very rare case, but with this code being executed enough times, sooner or later it will happen.
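
To make the window described above concrete, here is a minimal sketch that replays the same steps as the quoted popitem() on a plain function. The function name racy_popitem and its interleave hook are hypothetical, introduced only for illustration; the hook stands in for another thread that empties the dictionary between the emptiness check and the call to next().

```python
from collections import OrderedDict


def racy_popitem(d, last=True, interleave=None):
    """Same steps as the popitem() quoted above, with a hook in the gap."""
    if not d:
        raise KeyError('dictionary is empty')
    if interleave is not None:
        # Hypothetical hook: simulates another thread running right here.
        interleave()
    key = next(reversed(d) if last else iter(d))
    value = d.pop(key)
    return key, value


cache = OrderedDict(a=1)

try:
    # The dict passes the emptiness check, then is emptied before next().
    racy_popitem(cache, last=False, interleave=cache.clear)
    outcome = "no error"
except KeyError:
    outcome = "KeyError"
except StopIteration:
    outcome = "StopIteration"

print(outcome)
```

A caller that only handles KeyError for the empty case would not catch this exception.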

I'm going to report this as a bug against Python.

Comment 4 Takashi Kajinami 2020-11-16 02:22:50 UTC
Hi Radomir,

Thank you for your investigation.
Based on the recorded error, I agree that this is the cause of the issue.

Interestingly, the issue was observed repeatedly in the deployment, even though the mechanism suggests it is a timing issue.
I guess some behavior in horizon creates a situation where we hit the error consistently, but I've not yet identified how to reproduce it.

I agree that we need to fix the problem in the Python layer, but if OrderedDict is not supposed to be thread safe by design,
I think we should introduce a lock around that clean-up step, IMO.
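
As an illustration of the locking idea suggested above, here is a minimal sketch of a bounded in-process cache whose clean-up step runs under a lock. The class BoundedCache, its methods, and the max_entries parameter are hypothetical names chosen for this example; this is not Horizon's actual cache code.

```python
import threading
from collections import OrderedDict


class BoundedCache(object):
    """Hypothetical bounded cache; every access holds the same lock."""

    def __init__(self, max_entries):
        self._data = OrderedDict()
        self._lock = threading.Lock()
        self._max_entries = max_entries

    def set(self, key, value):
        with self._lock:
            # Re-insert so the key moves to the newest position.
            self._data.pop(key, None)
            self._data[key] = value
            # Clean-up step: no other thread can empty the dict
            # between the size check and popitem().
            while len(self._data) > self._max_entries:
                self._data.popitem(last=False)

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def __len__(self):
        with self._lock:
            return len(self._data)


cache = BoundedCache(max_entries=3)


def worker(n):
    for i in range(200):
        cache.set((n, i), i)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(cache))
```

The cost of this design is that every reader and writer serializes on one lock, which may or may not be acceptable for a given workload.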

Comment 5 Radomir Dopieralski 2020-11-16 15:39:06 UTC
Ultimately we want to switch to using regular dicts, which preserve insertion order in CPython 3.6 and are guaranteed to do so from Python 3.7 onward.
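
A rough sketch of what evicting the oldest entry could look like with a regular dict, assuming Python 3.7 or later. The helper name evict_oldest_plain and the _MISSING sentinel are hypothetical. Defaults are passed to both next() and pop(), so an empty dict, or a key removed by someone else in between, yields None rather than an exception.

```python
_MISSING = object()  # hypothetical sentinel, distinct from any real key or value


def evict_oldest_plain(cache):
    """Remove and return the oldest (key, value) pair, or None if unavailable."""
    # next() with a default never raises StopIteration on an empty dict.
    key = next(iter(cache), _MISSING)
    if key is _MISSING:
        return None
    # pop() with a default never raises KeyError if the key has vanished.
    value = cache.pop(key, _MISSING)
    if value is _MISSING:
        return None
    return key, value


cache = {}
cache["first"] = 1
cache["second"] = 2

evicted = evict_oldest_plain(cache)
print(evicted, cache)
```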

Comment 6 Radomir Dopieralski 2020-11-17 14:33:56 UTC
Since we can't fix the underlying issue in Python 2.7, we can work around this problem by catching the unexpected exception. When the dict is empty the call does nothing anyway.
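
The workaround described above might look roughly like the following sketch. The helper name evict_oldest is hypothetical, and this is not the actual patch shipped in the erratum; it only shows the idea of treating StopIteration the same way as the expected KeyError.

```python
from collections import OrderedDict


def evict_oldest(cache):
    """Pop the oldest entry; treat an empty dict as 'nothing to do'."""
    try:
        return cache.popitem(last=False)
    except (KeyError, StopIteration):
        # KeyError: the dict was empty at the initial check.
        # StopIteration: the dict became empty after that check.
        return None


cache = OrderedDict([("a", 1), ("b", 2)])
first = evict_oldest(cache)           # removes the oldest entry
nothing = evict_oldest(OrderedDict())  # empty dict: no exception escapes
print(first, nothing)
```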

However, it seems that this should be a very rare occurrence, and so far the only information we have about it is from the logs. I wonder if it's worth fixing it now, when OSP13 support is ending and everyone should be switching to OSP16, which already uses Python 3 and doesn't have that problem.

Comment 7 Takashi Kajinami 2020-11-17 23:57:34 UTC
RHOSP13 still has more than 2 years left until its ELS phase ends, so I'm afraid some customers will stay on RHOSP13.

When we hit this issue in the real deployment, it was not resolved until we restarted the horizon container (*1).
Also, the problem was difficult to notice without carefully monitoring the horizon logs.

So I believe that the issue is worth fixing in RHOSP13.

(*1) I guess the existing cache data, which was cleared by restarting horizon, caused the problem, but I've not yet worked out the details.

Comment 19 errata-xmlrpc 2021-03-18 13:08:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 13.0 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0932

