Bug 1782318

Summary: Reconcile failed: [no matches for kind "ImageStream" in version "image.openshift.io/v1"]. See controller logs for details
Product: OpenShift Container Platform
Reporter: John Matthews <jmatthew>
Component: Migration Tooling
Assignee: Scott Seago <sseago>
Status: CLOSED ERRATA
QA Contact: Xin jiang <xjiang>
Severity: medium
Priority: unspecified
Version: 4.2.0
CC: aguadarr, dapark, dymurray, sregidor, sseago, xjiang
Target Release: 4.4.0
Clone Of: 1776120
Last Closed: 2020-05-28 11:09:55 UTC
Bug Blocks: 1776120    

Comment 1 John Matthews 2019-12-12 14:20:55 UTC
Aligning to 4.4.0 (our 1.2.0) as we lack info on a reproducer.

Comment 2 Xin jiang 2019-12-17 02:43:20 UTC
Hi Daein,

Would you please let me know how you stopped both clusters for one day? I am trying to reproduce this issue.

Comment 3 Alejandro G 2020-02-05 16:09:09 UTC
@Xin I have the same issue in the Red Hat OpenShift Container Platform 4 Migration (PILOT) lab in opentlc. I think Daein means that the lab was stopped by opentlc (after 8 hours) and then started again.


The error occurs on the "Create a migration plan" screen. After completing the General and Resources steps, clicking NEXT to reach the "Persistent Volumes" screen (where you choose to move or copy persistent volumes) shows this danger alert:

Reconcile failed: [no matches for kind "DeploymentConfig" in version "apps.openshift.io/v1"]. See controller logs for details.


This is the log from migration-controller-***:

{
  "level": "error",
  "ts": 1580917497.6319182,
  "logger": "plan|bf8jf",
  "msg": "",
  "error": "no matches for kind \"DeploymentConfig\" in version \"apps.openshift.io/v1\"",
  "stacktrace": "github.com/fusor/mig-controller/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/fusor/mig-controller/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/fusor/mig-controller/pkg/logging.Logger.Error\n\t/go/src/github.com/fusor/mig-controller/pkg/logging/logger.go:64\ngithub.com/fusor/mig-controller/pkg/logging.Logger.Trace\n\t/go/src/github.com/fusor/mig-controller/pkg/logging/logger.go:70\ngithub.com/fusor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).ensureClosed\n\t/go/src/github.com/fusor/mig-controller/pkg/controller/migplan/migplan_controller.go:299\ngithub.com/fusor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).handleClosed\n\t/go/src/github.com/fusor/mig-controller/pkg/controller/migplan/migplan_controller.go:285\ngithub.com/fusor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).Reconcile\n\t/go/src/github.com/fusor/mig-controller/pkg/controller/migplan/migplan_controller.go:195\ngithub.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"
}

Comment 4 Scott Seago 2020-03-30 13:08:57 UTC
Does this only happen immediately after starting the cluster? In my development environment, I've noticed that I get the same reconcile failed error when I restart my controller, but it goes away on the next reconcile cycle. It seems to be a startup bootstrapping situation, although I don't know if everyone is seeing this.

Comment 5 John Matthews 2020-03-30 13:34:11 UTC
Daein,

Can you answer Scott's question in comment #4?


Scott:
I suspect that OpenShift requires a specific procedure to be followed for a shutdown/restart; below is a JIRA issue tracking the documentation of that procedure (aligned to OCP 4.5).
https://issues.redhat.com/browse/MSTR-931

From CAM's perspective, let's ensure that our controllers honor whatever is required so that we function correctly after a restart.
I'm assuming we should delay this BZ until after MSTR-931 is complete, so we can verify that CAM works correctly after following the recommended OpenShift procedures.
I will change the Target Release for this BZ so we can revisit it for CAM 1.3.0 (aligned to OpenShift 4.5.0).

Comment 6 Daein Park 2020-03-31 00:57:30 UTC
@Scott,

> Does this only happen immediately after starting the cluster? In my development environment, I've noticed that I get the same reconcile failed error when I restart my controller, but it goes away on the next reconcile cycle. It seems to be a startup bootstrapping situation, although I don't know if everyone is seeing this.

I think the customer's issue is not a temporary one. The customer opened the same case again because the issue did not go away.
I advised them to reinstall CAM, which resolved the issue, and the support case has now been closed.

Comment 7 Scott Seago 2020-04-01 14:55:11 UTC
The immediate task here is to deal with the known bootstrapping issue. This comes up immediately after restarting the controller and goes away shortly. There are two different stack traces here. In one, the failure happens in the close-plan functionality: ensureClosed calls cluster.DeleteResources, which fails when calling `client.List()` with an `ocapi.DeploymentConfigList`, returning 'no matches for kind "DeploymentConfig" in version "apps.openshift.io/v1"'.

In the other case, something very similar happens when attempting to pull a list of ImageStreams: 'no matches for kind "ImageStream" in version "image.openshift.io/v1"'.

This is only happening where we're attempting to access OpenShift-specific resources. There seems to be some delay after startup before the API can cope with requests for the OpenShift CRD-related kinds.

It sounds like the "doesn't go away" case that the customer was having isn't happening now, but if that resurfaces in a non-startup context, there may be a separate issue to contend with here.

Comment 8 Scott Seago 2020-04-01 15:34:30 UTC
It looks like we're only adding DeploymentConfig and ImageStream to the scheme in the remote watch, but not in main.go, so if we reference those kinds from the main controllers before the remote watch starts up, we get an error. The fix is probably to move the scheme registration for these types to main.go.
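To illustrate the failure mode and the fix described above: the real code registers OpenShift types (from github.com/openshift/api) into a controller-runtime `runtime.Scheme`, but a self-contained toy captures the essential behavior. The `Scheme`, `GroupVersionKind`, `Register`, and `List` names below are simplified stand-ins, not the actual mig-controller or apimachinery API:

```go
package main

import "fmt"

// GroupVersionKind identifies an API type, mirroring the triple that
// Kubernetes API machinery uses to look up registered types.
type GroupVersionKind struct {
	Group, Version, Kind string
}

// Scheme is a toy stand-in for a type registry: a set of known kinds.
type Scheme struct {
	known map[GroupVersionKind]bool
}

func NewScheme() *Scheme {
	return &Scheme{known: map[GroupVersionKind]bool{}}
}

// Register adds kinds to the scheme, analogous to calling a package's
// AddToScheme at startup.
func (s *Scheme) Register(gvks ...GroupVersionKind) {
	for _, gvk := range gvks {
		s.known[gvk] = true
	}
}

// List mimics a client list call: it fails with the same "no matches
// for kind" wording seen in the controller logs if the kind was never
// registered in this scheme.
func (s *Scheme) List(gvk GroupVersionKind) error {
	if !s.known[gvk] {
		return fmt.Errorf("no matches for kind %q in version %q",
			gvk.Kind, gvk.Group+"/"+gvk.Version)
	}
	return nil
}

func main() {
	dc := GroupVersionKind{"apps.openshift.io", "v1", "DeploymentConfig"}
	is := GroupVersionKind{"image.openshift.io", "v1", "ImageStream"}

	// Before the fix: only the remote watch registers the OpenShift
	// kinds, so an early reconcile in the main controllers fails.
	mainScheme := NewScheme()
	fmt.Println(mainScheme.List(dc))

	// After the fix: the kinds are registered in main.go, before any
	// controller can reference them, so the lookup succeeds.
	mainScheme.Register(dc, is)
	fmt.Println(mainScheme.List(dc))
}
```

This is why the error is transient in the remote-watch case (once the watch starts, the kinds get registered) but fatal for any reconcile that runs before that registration happens.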

Comment 9 Scott Seago 2020-04-01 18:01:09 UTC
Fix PR is here: https://github.com/konveyor/mig-controller/pull/477

Comment 14 Sergio 2020-05-13 14:51:31 UTC
Verified using CAM 1.2 stage.

openshift-migration-controller-rhel8@sha256:cfa217207d99d11e44df551c1a6a2ccd9b815f8f7bc9a25dfcb8fcdaa4f8e7d6

To reproduce the issue, the controller was restarted while a migration was executing. The issue did not occur.

Comment 16 errata-xmlrpc 2020-05-28 11:09:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2326