Bug 1921892 - MAO: controller runtime manager closes event recorder
Summary: MAO: controller runtime manager closes event recorder
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.0
Assignee: Michael Gugino
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-28 18:59 UTC by Michael Gugino
Modified: 2021-07-27 22:37 UTC (History)
0 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:37:10 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-operator pull 809 | 0 | None | open | Bug 1921892: Ensure manager stops before ending the test | 2021-02-24 15:19:48 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:37:37 UTC

Description Michael Gugino 2021-01-28 18:59:19 UTC
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-api-operator/791/pull-ci-openshift-machine-api-operator-master-unit/1354463724173266944

 E0127 16:21:57.389596   10599 controller.go:167] Failed to reconcile MachineSet "ms-test/foo": failed to update machine set status: MachineSet.machine.openshift.io "foo" not found
E0127 16:21:57.389740   10599 controller.go:267] controller-runtime/manager/controller/machineset_controller "msg"="Reconciler error" "error"="failed to update machine set status: MachineSet.machine.openshift.io \"foo\" not found" "name"="foo" "namespace"="ms-test" 
E0127 16:21:57.389870   10599 runtime.go:78] Observed a panic: "send on closed channel" (send on closed channel) 

This appears to be caused by a couple of compounding issues and is affecting our CI unit tests.

First, we call the cancel function for the manager's context: https://github.com/openshift/machine-api-operator/blob/release-4.7/pkg/controller/machineset/machineset_controller_test.go#L68

Second, we don't wait for the manager to exit. What appears to happen is this: one or more subsequent tests run; the canceled manager closes the event recorder before it stops the reconcile loop; the canceled manager then reconciles an object from the next test (or possibly a requeued object from its own test); and the recorder is unable to write because the manager has already closed it, which produces the "send on closed channel" panic above.
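
A minimal sketch of the stop-and-wait pattern that would avoid this race, using only the standard library. fakeManager, starter, and startManager are hypothetical names for illustration; they are not taken from machine-api-operator or controller-runtime. The only assumption about the real manager is that its Start blocks until the context is canceled and then returns.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// starter is the only behavior the helper needs from a manager
// (hypothetical interface, assumed to match a blocking Start method).
type starter interface {
	Start(ctx context.Context) error
}

// fakeManager is a hypothetical stand-in for a real manager.
type fakeManager struct {
	running int32
}

// Start blocks until ctx is canceled, then simulates a slow shutdown
// during which a reconcile loop could still be processing objects.
func (m *fakeManager) Start(ctx context.Context) error {
	atomic.StoreInt32(&m.running, 1)
	defer atomic.StoreInt32(&m.running, 0)
	<-ctx.Done()
	time.Sleep(50 * time.Millisecond)
	return nil
}

func (m *fakeManager) isRunning() bool {
	return atomic.LoadInt32(&m.running) == 1
}

// startManager runs mgr in a goroutine and returns a stop function that
// cancels the context AND waits for Start to return. Canceling without
// waiting is what lets a half-stopped manager leak into the next test.
func startManager(mgr starter) (stop func() error) {
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	var err error
	go func() {
		defer close(done)
		err = mgr.Start(ctx)
	}()
	return func() error {
		cancel()
		<-done // block until the manager has fully exited
		return err
	}
}

func main() {
	mgr := &fakeManager{}
	stop := startManager(mgr)
	if err := stop(); err != nil {
		fmt.Println("manager exited with error:", err)
	}
	fmt.Println("manager still running after stop:", mgr.isRunning())
}
```

In a test, the returned stop function would be called in the teardown (for example via defer), so that the next test does not begin until the previous manager has returned from Start.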

Comment 1 Michael Gugino 2021-01-28 19:04:32 UTC
This is possibly related to this change: https://github.com/kubernetes-sigs/controller-runtime/pull/1089

This is not a blocker as it will only impact our tests.

Comment 8 errata-xmlrpc 2021-07-27 22:37:10 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

