Aligning this to OpenShift 4.4.0 (our CAM 1.2.0), as we lack info on a reproducer.
Hi Daein, would you please let me know how you stopped both clusters for one day? I am trying to reproduce this issue.
@Xin I have the same issue in the Red Hat OpenShift Container Platform 4 Migration (PILOT) lab in OPENTLC. I think Daein means the case where the lab is stopped by OPENTLC (after 8 hours) and then started again.

The error occurs on the "Create a migration plan" screen. After completing the General and Resources steps and clicking NEXT to reach the Persistent Volumes screen ("Choose to move or copy persistent volumes"), a danger alert appears:

    Reconcile failed: [no matches for kind "DeploymentConfig" in version "apps.openshift.io/v1"]. See controller logs for details.

This is the log of migration-controller-***:

    {
      "level": "error",
      "ts": 1580917497.6319182,
      "logger": "plan|bf8jf",
      "msg": "",
      "error": "no matches for kind \"DeploymentConfig\" in version \"apps.openshift.io/v1\"",
      "stacktrace":
        github.com/fusor/mig-controller/vendor/github.com/go-logr/zapr.(*zapLogger).Error
            /go/src/github.com/fusor/mig-controller/vendor/github.com/go-logr/zapr/zapr.go:128
        github.com/fusor/mig-controller/pkg/logging.Logger.Error
            /go/src/github.com/fusor/mig-controller/pkg/logging/logger.go:64
        github.com/fusor/mig-controller/pkg/logging.Logger.Trace
            /go/src/github.com/fusor/mig-controller/pkg/logging/logger.go:70
        github.com/fusor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).ensureClosed
            /go/src/github.com/fusor/mig-controller/pkg/controller/migplan/migplan_controller.go:299
        github.com/fusor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).handleClosed
            /go/src/github.com/fusor/mig-controller/pkg/controller/migplan/migplan_controller.go:285
        github.com/fusor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).Reconcile
            /go/src/github.com/fusor/mig-controller/pkg/controller/migplan/migplan_controller.go:195
        github.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
            /go/src/github.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
        github.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
            /go/src/github.com/fusor/mig-controller/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
        github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
            /go/src/github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
        github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil
            /go/src/github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
        github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait.Until
            /go/src/github.com/fusor/mig-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
    }
Does this only happen immediately after starting the cluster? In my development environment, I've noticed that I get the same reconcile failed error when I restart my controller, but it goes away on the next reconcile cycle. It seems to be a startup bootstrapping situation, although I don't know if everyone is seeing this.
Daein, can you answer Scott's question in comment #4?

Scott: I suspect that OpenShift needs a specific procedure to be followed for a shutdown/restart; below is a JIRA issue tracking the documentation of that procedure (aligned to OCP 4.5): https://issues.redhat.com/browse/MSTR-931

From CAM's perspective, let's ensure that our controllers honor whatever is required so that we function correctly after a restart. I'm assuming we should defer this BZ until MSTR-931 is complete, so we can verify that CAM works correctly after following the recommended OpenShift procedures. I will change the Target Release for this BZ so we can revisit it for CAM 1.3.0 (aligned to OpenShift 4.5.0).
@Scott,

> Does this only happen immediately after starting the cluster? In my development environment, I've noticed that I get the same reconcile failed error when I restart my controller, but it goes away on the next reconcile cycle. It seems to be a startup bootstrapping situation, although I don't know if everyone is seeing this.

I think the customer's issue was not a temporary one. The customer reopened the same case because the issue did not go away. I guided them to reinstall CAM, and the issue then disappeared. The support case has now been closed.
The immediate task here is to deal with the known bootstrapping issue: it comes up immediately after restarting the controller and goes away shortly afterward.

There are two different stack traces here. In one, the failure happens in the close-plan functionality: ensureClosed calls cluster.DeleteResources, which fails when calling `client.List()` with an `ocapi.DeploymentConfigList`, yielding 'no matches for kind "DeploymentConfig" in version "apps.openshift.io/v1"'. In the other, something very similar happens when attempting to pull a list of ImageStreams: 'no matches for kind "ImageStream" in version "image.openshift.io/v1"'.

This only happens where we're attempting to access OpenShift-specific resources. There seems to be some delay after startup before the API can cope with requests for OpenShift CRD-related kinds.

It sounds like the "doesn't go away" case the customer was having isn't happening now, but if that resurfaces in a non-startup context, there may be a separate issue to contend with here.
It looks like we're only adding DeploymentConfig and ImageStream to the scheme in the remote watch, not in main.go, so if the main controllers reference those kinds before the remote watch starts up, we get an error. The fix is probably to move the scheme registration for these types to main.go.
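For illustration, here is a toy model of that registration-ordering problem. This is not the real controller-runtime API (the `Scheme` type, `AddKnownKind`, and `List` below are invented for this sketch); it only mimics the mechanism: a client can only list kinds that have been registered in its scheme, and registering them late leaves a window where reconciles fail.

```go
package main

import "fmt"

// Scheme is a toy registry of known group/version/kinds. The real
// controller-runtime Scheme is far richer, but the failure mode modeled
// here is the same in spirit.
type Scheme struct{ kinds map[string]bool }

func NewScheme() *Scheme { return &Scheme{kinds: map[string]bool{}} }

// AddKnownKind registers a kind, like an AddToScheme call in main.go.
func (s *Scheme) AddKnownKind(gvk string) { s.kinds[gvk] = true }

// List mimics the controller's failure when a kind is not yet registered.
func (s *Scheme) List(gvk string) error {
	if !s.kinds[gvk] {
		return fmt.Errorf("no matches for kind %q", gvk)
	}
	return nil
}

func main() {
	// Before the fix: main.go's scheme never registered the OpenShift
	// kinds, so a reconcile running before the remote watch started
	// (which did register them) failed.
	scheme := NewScheme()
	fmt.Println(scheme.List("DeploymentConfig.apps.openshift.io/v1"))

	// After the fix (the idea behind PR #477): register the OpenShift
	// kinds up front in main.go, so early reconciles can list them.
	scheme.AddKnownKind("DeploymentConfig.apps.openshift.io/v1")
	scheme.AddKnownKind("ImageStream.image.openshift.io/v1")
	fmt.Println(scheme.List("DeploymentConfig.apps.openshift.io/v1"))
}
```

The point of the sketch: the fix is not to change List-time behavior but to make registration happen before any controller can run, which is why moving it into main.go (ahead of manager start) closes the startup window.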
Fix PR is here: https://github.com/konveyor/mig-controller/pull/477
Verified using CAM 1.2 stage: openshift-migration-controller-rhel8@sha256:cfa217207d99d11e44df551c1a6a2ccd9b815f8f7bc9a25dfcb8fcdaa4f8e7d6

To reproduce the issue, the controller was restarted while a migration was executing. The issue did not occur.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2326