Bug 1950809 - cluster-samples-operator restarts approximately two times per day and logs too many same messages
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Samples
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.z
Assignee: Gabe Montero
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On: 1950808
Blocks:
Reported: 2021-04-18 22:26 UTC by Gabe Montero
Modified: 2021-07-08 06:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Depending on the timing of the events it received, the samples operator could break its contract with k8s SharedInformers and mutate the controller cache for the objects it watches (Samples Config, Templates, Imagestreams). In addition, frequent concurrent updates to the samples config CR instance while tracking imagestream status increased the likelihood of hitting this timing window and incorrectly mutating the controller cache.

Consequence: In many cases, robustness in k8s kept things OK, but we have now seen cases where this violation produced a panic in k8s when the samples operator tried to update the objects it watches.

Fix: Stop mutating the cache through better use of k8s DeepCopy prior to updates. Also adjusted when config information is copied from spec to status in the samples config CR instances. In addition, 4.6.z adds a change that reduces, though it cannot eliminate, concurrent attempts to update the samples operator config CR instance during imagestream event processing.

Result: The samples operator no longer mutates its SharedInformer cache and avoids panics in k8s when updating the objects it manages. The number of update conflicts that occur when concurrent imagestream events update the samples operator CR instances has been greatly reduced. They cannot be eliminated entirely, and a few still occurring is OK.
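The copy-before-mutate pattern described in the fix can be illustrated with a minimal, standard-library-only Go sketch. This is not the operator's actual code: `SamplesConfig`, `informerCache`, `markImportFailed`, and the `ImportFailed` condition key are hypothetical stand-ins, and the hand-written `DeepCopy` only imitates what generated Kubernetes deep-copy methods do. The point is only that an object fetched from a shared cache is copied first, and all changes are applied to the copy.

```go
package main

import "fmt"

// SamplesConfig is a hypothetical stand-in for the samples operator Config CR.
type SamplesConfig struct {
	ResourceVersion     string
	SkippedImagestreams []string
	Conditions          map[string]string
}

// DeepCopy returns a copy that shares no slices or maps with the receiver.
// (Hand-written here; real Kubernetes types get a generated equivalent.)
func (c *SamplesConfig) DeepCopy() *SamplesConfig {
	if c == nil {
		return nil
	}
	out := &SamplesConfig{ResourceVersion: c.ResourceVersion}
	if c.SkippedImagestreams != nil {
		out.SkippedImagestreams = append([]string(nil), c.SkippedImagestreams...)
	}
	if c.Conditions != nil {
		out.Conditions = make(map[string]string, len(c.Conditions))
		for k, v := range c.Conditions {
			out.Conditions[k] = v
		}
	}
	return out
}

// informerCache is a toy stand-in for a SharedInformer store: Get hands back
// the cached pointer itself, so callers must treat the result as read-only.
type informerCache map[string]*SamplesConfig

func (s informerCache) Get(name string) *SamplesConfig { return s[name] }

// markImportFailed builds the object to send in an update request.
// It copies the cached object first and mutates only the copy.
func markImportFailed(cache informerCache, name, reason string) *SamplesConfig {
	cached := cache.Get(name)
	if cached == nil {
		return nil
	}
	updated := cached.DeepCopy() // never write through the cached pointer
	if updated.Conditions == nil {
		updated.Conditions = map[string]string{}
	}
	updated.Conditions["ImportFailed"] = reason // hypothetical condition key
	return updated
}

func main() {
	cache := informerCache{
		"cluster": {ResourceVersion: "1", Conditions: map[string]string{}},
	}
	updated := markImportFailed(cache, "cluster", "True")
	fmt.Println("conditions in cache:", len(cache.Get("cluster").Conditions))
	fmt.Println("conditions in copy: ", len(updated.Conditions))
}
```

Without the `DeepCopy` call, writing to `cached.Conditions` would alter the map that every other reader of the cache sees, which is the kind of contract violation the doc text describes.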
Clone Of: 1950808
Environment:
Last Closed: 2021-05-12 12:18:10 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-samples-operator pull 371 | 0 | None | open | Bug 1950809: add DeepCopy to avoid SharedInformer cache mutation | 2021-04-19 00:12:26 UTC
Red Hat Product Errata | RHBA-2021:1487 | 0 | None | None | None | 2021-05-12 12:18:29 UTC

Comment 6 errata-xmlrpc 2021-05-12 12:18:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.28 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1487

