Bug 2000644 - Invalid migration plan causes "controller" pod to crash
Summary: Invalid migration plan causes "controller" pod to crash
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: Controller
Version: 1.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 1.6.0
Assignee: Jason Montleon
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-02 15:07 UTC by Sergio
Modified: 2021-09-29 14:36 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 14:35:53 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHSA-2021:3694 (last updated 2021-09-29 14:36:09 UTC)

Description Sergio 2021-09-02 15:07:39 UTC
Description of problem:
When a migration plan is malformed and its 'destMigClusterRef' points to a nonexistent MigCluster resource, the migration controller starts crashing.


Version-Release number of selected component (if applicable):
SOURCE CLUSTER: AWS 3.11 (MTC 1.5.1)
TARGET CLUSTER: AWS 4.9 (MTC 1.6.0)

How reproducible:
Always

Steps to Reproduce:
1. Create an empty namespace in the source cluster

$ oc new-project empty-project

2. Create a migration plan to migrate this namespace, named "migplan-ocp-40171-malformed-crds" (a sketch of such a MigPlan manifest appears after these steps)

3. Patch the migplan so that it becomes malformed

$ oc -n openshift-migration  patch migplans migplan-ocp-40171-malformed-crds -p '{"spec":{"destMigClusterRef": {"name": "foo"}}}' --type='merge'
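
For step 2, a minimal MigPlan manifest along the lines below can be used. This is only a sketch, assuming the default openshift-migration namespace; the MigCluster names "source-cluster" and "host" and the MigStorage name "standard-storage" are placeholders for whatever exists in the environment:

$ cat <<EOF | oc create -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  name: migplan-ocp-40171-malformed-crds
  namespace: openshift-migration
spec:
  srcMigClusterRef:
    name: source-cluster          # placeholder: an existing source MigCluster
    namespace: openshift-migration
  destMigClusterRef:
    name: host                    # placeholder: an existing destination MigCluster
    namespace: openshift-migration
  migStorageRef:
    name: standard-storage        # placeholder: an existing MigStorage
    namespace: openshift-migration
  namespaces:
  - empty-project
EOF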


Actual results:
The migration controller pod starts crashing

$ oc get pods
NAME                                    READY   STATUS             RESTARTS       AGE
migration-controller-6d854fd675-ntczb   1/2     CrashLoopBackOff   5 (103s ago)   6m32s
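
The panic shown under "Additional info" below can be pulled from the crashed container with standard oc commands, for example (pod name taken from the listing above; the controller container name can be read from the output of the first command):

$ oc -n openshift-migration get pod migration-controller-6d854fd675-ntczb -o jsonpath='{.spec.containers[*].name}'
$ oc -n openshift-migration logs migration-controller-6d854fd675-ntczb -c <controller-container> --previous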


Expected results:

The migration plan should become "Not ready" and should have a Critical condition describing the problem.
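
Whether the plan ends up in that state can be checked directly on the MigPlan resource, for example:

$ oc -n openshift-migration get migplan migplan-ocp-40171-malformed-crds \
    -o jsonpath='{range .status.conditions[*]}{.category}{"\t"}{.type}{"\t"}{.message}{"\n"}{end}'

A Ready condition should be absent and at least one Critical condition should be listed.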


Additional info:

This is the crash in the migration controller pod:

{"level":"info","ts":1630591811310,"logger":"controller-runtime.manager.controller.migcluster-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1630591811310,"logger":"controller-runtime.manager.controller.migplan-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1630591811311,"logger":"controller-runtime.manager.controller.directimagemigration-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1630591811312,"logger":"controller-runtime.manager.controller.migstorage-controller","msg":"Starting Controller"}
{"level":"info","ts":1630591811312,"logger":"controller-runtime.manager.controller.migstorage-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1630591811313,"logger":"controller-runtime.manager.controller.miganalytic-controller","msg":"Starting workers","worker count":2}
{"level":"info","ts":1630591811313,"logger":"controller-runtime.manager.controller.directvolumemigration-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1630591811313,"logger":"controller-runtime.manager.controller.mighook-controller","msg":"Starting workers","worker count":1}
{"level":"info","ts":1630591811313,"logger":"controller-runtime.manager.controller.directvolumemigrationprogress-controller","msg":"Starting workers","worker count":5}
{"level":"info","ts":1630591811314,"logger":"controller-runtime.manager.controller.migmigration-controller","msg":"Starting workers","worker count":1}
E0902 14:10:11.378266       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 556 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x230f860, 0x3b865b0)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x230f860, 0x3b865b0)
	/opt/rh/go-toolset-1.16/root/usr/lib/go-toolset-1.16-golang/src/runtime/panic.go:965 +0x1b9
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).BuildRestConfig(0x0, 0x2a4ab40, 0xc00139bdc0, 0x25e3ce0, 0x8, 0xc00487e720)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:423 +0x3a
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).GetClient(0x0, 0x2a4ab40, 0xc00139bdc0, 0xc00139bdc0, 0x0, 0x0, 0x0)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:209 +0x5a
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.getPotentialFilePermissionConflictNamespaces(0x2a4a868, 0xc000bb4aa0, 0x2a1fb70, 0xc00054df40, 0xc0002a8230, 0x0, 0x0, 0xc001d10500, 0x3c7c9c0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:307 +0x2c5
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validateNamespaces(0x2a4a868, 0xc000bb4aa0, 0x2a1fb70, 0xc00054df40, 0xc0002a8230, 0x0, 0x0, 0x2a2a020, 0xc002026e70, 0xc001d10500, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:451 +0x389
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validate(0x2a4a868, 0xc000bb4aa0, 0x2a1fb70, 0xc00054df40, 0xc0002a8230, 0x0, 0x0, 0x2a2a020, 0xc002026e70, 0xc001d10500, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:158 +0x39c
github.com/konveyor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).Reconcile(0xc00054df80, 0x2a2a020, 0xc002026e70, 0xc000557590, 0x13, 0xc000c09cc0, 0x20, 0xc002026e00, 0x0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/migplan_controller.go:261 +0x553
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0008c52c0, 0x2a29f78, 0xc000702000, 0x23a4e20, 0xc001e9e240)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0008c52c0, 0x2a29f78, 0xc000702000, 0xc00011de00)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x2a29f78, 0xc000702000)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198 +0x4a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0x37
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00011df50)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc003e15f50, 0x29dbba0, 0xc002026db0, 0xc000702001, 0xc000bca120)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00011df50, 0x3b9aca00, 0x0, 0xc80a01, 0xc000bca120)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x2a29f78, 0xc000702000, 0xc001f75de0, 0x3b9aca00, 0x0, 0x1)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0xa6
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x2a29f78, 0xc000702000, 0xc001f75de0, 0x3b9aca00)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99 +0x57
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:195 +0x497
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x118 pc=0x196c9da]

goroutine 556 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x109
panic(0x230f860, 0x3b865b0)
	/opt/rh/go-toolset-1.16/root/usr/lib/go-toolset-1.16-golang/src/runtime/panic.go:965 +0x1b9
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).BuildRestConfig(0x0, 0x2a4ab40, 0xc00139bdc0, 0x25e3ce0, 0x8, 0xc00487e720)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:423 +0x3a
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).GetClient(0x0, 0x2a4ab40, 0xc00139bdc0, 0xc00139bdc0, 0x0, 0x0, 0x0)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:209 +0x5a
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.getPotentialFilePermissionConflictNamespaces(0x2a4a868, 0xc000bb4aa0, 0x2a1fb70, 0xc00054df40, 0xc0002a8230, 0x0, 0x0, 0xc001d10500, 0x3c7c9c0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:307 +0x2c5
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validateNamespaces(0x2a4a868, 0xc000bb4aa0, 0x2a1fb70, 0xc00054df40, 0xc0002a8230, 0x0, 0x0, 0x2a2a020, 0xc002026e70, 0xc001d10500, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:451 +0x389
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validate(0x2a4a868, 0xc000bb4aa0, 0x2a1fb70, 0xc00054df40, 0xc0002a8230, 0x0, 0x0, 0x2a2a020, 0xc002026e70, 0xc001d10500, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:158 +0x39c
github.com/konveyor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).Reconcile(0xc00054df80, 0x2a2a020, 0xc002026e70, 0xc000557590, 0x13, 0xc000c09cc0, 0x20, 0xc002026e00, 0x0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/migplan_controller.go:261 +0x553
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0008c52c0, 0x2a29f78, 0xc000702000, 0x23a4e20, 0xc001e9e240)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0008c52c0, 0x2a29f78, 0xc000702000, 0xc00011de00)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x2a29f78, 0xc000702000)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198 +0x4a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0x37
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00011df50)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc003e15f50, 0x29dbba0, 0xc002026db0, 0xc000702001, 0xc000bca120)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00011df50, 0x3b9aca00, 0x0, 0xc80a01, 0xc000bca120)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x2a29f78, 0xc000702000, 0xc001f75de0, 0x3b9aca00, 0x0, 0x1)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0xa6
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x2a29f78, 0xc000702000, 0xc001f75de0, 0x3b9aca00)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99 +0x57
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:195 +0x497

Comment 1 Sergio 2021-09-02 15:10:12 UTC
It also happens if the malformed field is 'srcMigClusterRef'.

Comment 2 Sergio 2021-09-02 15:26:09 UTC
Probably a duplicate of this other BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1951869

Comment 3 Erik Nelson 2021-09-03 13:17:17 UTC
We believe this to be resolved as of: https://github.com/konveyor/mig-controller/pull/1186

Comment 7 Sergio 2021-09-08 11:07:57 UTC
Verified using:
SOURCE CLUSTER: AWS OCP 3.11 (MTC 1.5.1) NFS
TARGET CLUSTER: AWS OCP 4.9 (MTC 1.6.0) OCS4

openshift-migration-rhel8-operator@sha256:ef00e934ed578a4acb429f8710284d10acf2cf98f38a2b2268bbea8b5fd7139c
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 27f465b2cd38cee37af5c3d0fd745676086fe0391e3c459d4df18dd3a12e7051
    - name: MIG_UI_REPO
      value: openshift-migration-ui-rhel8@sha256
    - name: MIG_UI_TAG


Now we get two Critical conditions and the migration controller no longer crashes.
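
The pod state and the condition output shown below can be rechecked with, for example:

$ oc -n openshift-migration get pods | grep migration-controller
$ oc -n openshift-migration get migplan migplan-ocp-40171-malformed-crds -o yaml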

For destination cluster:

status:
  conditions:
  - category: Critical
    lastTransitionTime: "2021-09-08T11:04:13Z"
    message: 'The `dstMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'
    reason: NotFound
    status: "True"
    type: InvalidDestinationClusterRef
  - category: Critical
    lastTransitionTime: "2021-09-08T11:04:13Z"
    message: 'Reconcile failed: [destination cluster not found]. See controller logs for details.'
    status: "True"
    type: ReconcileFailed


For source cluster:

status:
  conditions:
  - category: Critical
    lastTransitionTime: "2021-09-08T11:07:07Z"
    message: 'The `srcMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'
    reason: NotFound
    status: "True"
    type: InvalidSourceClusterRef
  - category: Critical
    lastTransitionTime: "2021-09-08T11:07:07Z"
    message: 'Reconcile failed: [source cluster not found]. See controller logs for details.'
    status: "True"
    type: ReconcileFailed


Moved to VERIFIED status

Comment 9 errata-xmlrpc 2021-09-29 14:35:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694

