Description of problem:

After changing the value of srcMigClusterRef to a non-existent cluster name, the migplan still shows "The migration plan is ready" with no error. But after changing the value of destMigClusterRef to a non-existent cluster, the migplan shows an error message like 'The `dstMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'

Version-Release number of selected component (if applicable):
MTC 1.4.3
image: registry.redhat.io/rhmtc/openshift-migration-rhel7-operator@sha256:a0876b81eb32b03a805f84e6ebca656dc2704b69f1a169d695f8a48d861dba45

How reproducible:
Always

Steps to Reproduce:
1. Create a common migplan for an application as normal.
2. Edit the migplan CR to change the value of srcMigClusterRef.name to a non-existent cluster name, such as "foo".

$ oc get migcluster
NAME             READY   URL                                                                HOST    AGE
host             True                                                                       true    4h2m
source-cluster   True    https://api.cam-src-17360.qe.azure.devcluster.openshift.com:6443   false   3h23m

$ oc edit migplan mig-plan-00000-ocp-00000-nginx
.....
  srcMigClusterRef:
    name: foo
    namespace: openshift-migration

3. Check the migplan status.

$ oc get migplan mig-plan-00000-ocp-00000-nginx -o yaml
apiVersion: migration.openshift.io/v1alpha1
.....
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
.....
  srcMigClusterRef:
    name: foo
    namespace: openshift-migration
status:
  conditions:
  - category: Warn
    lastTransitionTime: "2021-04-20T13:10:39Z"
    message: 'Found Pods with `Spec.NodeSelector` or `Spec.NodeName` set in namespaces: [ocp-00000-nginx]. These fields will be cleared on Pods restored into the target cluster.'
    reason: NodeSelectorsDetected
    status: "True"
    type: NamespacesHaveNodeSelectors
  - category: Required
    lastTransitionTime: "2021-04-20T13:10:17Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2021-04-20T13:10:17Z"
    message: The storage resources have been created.
    reason: Done
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2021-04-20T13:10:17Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
  destStorageClasses:
  - accessModes:
    - ReadWriteOnce
    default: true
    name: managed-premium
    provisioner: kubernetes.io/azure-disk
  - accessModes:
    - ReadWriteOnce
    name: ocs-storage.noobaa.io
    provisioner: ocs-storage.noobaa.io/obc
  excludedResources:
  - imagetags
  - templateinstances
  - clusterserviceversions
  - packagemanifests
  - subscriptions
  - servicebrokers
  - servicebindings
  - serviceclasses
  - serviceinstances
  - serviceplans
  - operatorgroups
  - events
  observedDigest: 66ab5a5a9c668eeb5c66d380e2e386fa38c0ec830662dc84737b0e7db64ec831
  srcStorageClasses:
  - accessModes:
    - ReadWriteOnce
    default: true
    name: managed-premium
    provisioner: kubernetes.io/azure-disk

Actual results:
Even though srcMigClusterRef is set to an invalid value, the migplan still goes Ready without any error.

Expected results:
The migplan should report an error message.

Additional info:
When the value of destMigClusterRef is changed to a non-existent cluster name, the migplan shows an error message like 'The `dstMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'

$ oc get migplan mig-plan-00000-ocp-00000-nginx -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
....
spec:
  destMigClusterRef:
    name: foo
    namespace: openshift-migration
.....
status:
  conditions:
.....
  - category: Critical
    lastTransitionTime: "2021-04-20T13:07:17Z"
    message: 'The `dstMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'
    reason: NotFound
    status: "True"
    type: InvalidDestinationClusterRef

Another problem: once destMigClusterRef is set to an invalid value, the migplan reports an error as above, but after the value is changed back to a valid one, such as `host`, the migplan stays in the error status forever, unless the migplan is deleted and recreated.
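For contrast with the working destination-side check, here is a minimal sketch of the symmetric validation one would expect. This is not the actual mig-controller validation code: the checkClusterRef/validateClusterRefs helper names are mine, while the MigPlan/MigCluster types and the condition message mirror the API and the status output above.

package migplan

import (
	"context"
	"fmt"

	migapi "github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1"
	kapi "k8s.io/api/core/v1"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// checkClusterRef (hypothetical helper) returns the Critical condition
// message when the referenced MigCluster does not exist, "" when it does.
func checkClusterRef(ctx context.Context, c client.Client, ref *kapi.ObjectReference, field string) (string, error) {
	if ref == nil || ref.Name == "" {
		return fmt.Sprintf("The `%s` must reference a `migcluster`.", field), nil
	}
	cluster := &migapi.MigCluster{}
	key := types.NamespacedName{Namespace: ref.Namespace, Name: ref.Name}
	err := c.Get(ctx, key, cluster)
	if k8serrors.IsNotFound(err) {
		return fmt.Sprintf("The `%s` must reference a valid `migcluster`, subject: %s/%s.",
			field, ref.Namespace, ref.Name), nil
	}
	return "", err
}

// validateClusterRefs applies the same NotFound check to both refs; the bug
// is that only the destination side was effectively validated, so a bad
// srcMigClusterRef left the plan Ready.
func validateClusterRefs(ctx context.Context, c client.Client, plan *migapi.MigPlan) ([]string, error) {
	refs := map[string]*kapi.ObjectReference{
		"srcMigClusterRef": plan.Spec.SrcMigClusterRef,
		"dstMigClusterRef": plan.Spec.DestMigClusterRef,
	}
	problems := []string{}
	for field, ref := range refs {
		msg, err := checkClusterRef(ctx, c, ref, field)
		if err != nil {
			return nil, err
		}
		if msg != "" {
			problems = append(problems, msg)
		}
	}
	return problems, nil
}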
I'm concerned about this one because it indicates that there's something wrong with validation during reconciliation, but I'm marking it as low severity because the likelihood of this field being set to something bad is basically zero if the UI is being used. If the API is being used, it's usually in a similarly controlled and scripted manner.
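The stuck-in-error behavior described above is consistent with a condition that is set once and never re-evaluated on later reconciles. Below is a self-contained sketch of the generic staged-conditions pattern that avoids this; the types are illustrative, not mig-controller's actual Conditions API. Each reconcile unmarks every condition, validation re-sets the ones that still hold, and a final sweep drops the rest, so fixing the ref back to `host` would clear the Critical condition on the next reconcile instead of leaving it forever.

package main

import "fmt"

// Condition is a stand-in for a status condition like
// InvalidDestinationClusterRef in the YAML above.
type Condition struct {
	Type   string
	staged bool
}

type Conditions struct{ list []Condition }

// BeginStaging unmarks everything at the start of a reconcile.
func (c *Conditions) BeginStaging() {
	for i := range c.list {
		c.list[i].staged = false
	}
}

// Set re-stages an existing condition or appends a new one.
func (c *Conditions) Set(t string) {
	for i := range c.list {
		if c.list[i].Type == t {
			c.list[i].staged = true
			return
		}
	}
	c.list = append(c.list, Condition{Type: t, staged: true})
}

// EndStaging sweeps away conditions that validation did not re-set.
func (c *Conditions) EndStaging() {
	kept := c.list[:0]
	for _, cond := range c.list {
		if cond.staged {
			kept = append(kept, cond)
		}
	}
	c.list = kept
}

func main() {
	conds := &Conditions{}

	// Reconcile 1: destMigClusterRef points at a missing cluster.
	conds.BeginStaging()
	conds.Set("InvalidDestinationClusterRef")
	conds.EndStaging()
	fmt.Println(len(conds.list)) // 1: the Critical condition is reported

	// Reconcile 2: the ref was fixed back to `host`; validation sets nothing.
	conds.BeginStaging()
	conds.EndStaging()
	fmt.Println(len(conds.list)) // 0: the stale condition is swept away
}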
Changing srcMigClusterRef to a non-existent cluster also causes the controller pod to crash.
There is a nil ptr dereference coming from: https://github.com/konveyor/mig-controller/blame/master/pkg/controller/migplan/validation.go#L299

This was not merged yet when I wrote/tested my PR (https://github.com/konveyor/mig-controller/commit/bd4e0a0035ff2ad9ca0483dbdbcb87dea95da568). Testing a fix; looking at the other areas, it should be trivial.

E0901 07:37:37.779754 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 580 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x230f860, 0x3b865b0)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x230f860, 0x3b865b0)
	/opt/rh/go-toolset-1.16/root/usr/lib/go-toolset-1.16-golang/src/runtime/panic.go:965 +0x1b9
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).BuildRestConfig(0x0, 0x2a4ab40, 0xc005910280, 0x25e3ce0, 0x8, 0xc005c702a0)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:423 +0x3a
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).GetClient(0x0, 0x2a4ab40, 0xc005910280, 0xc005910280, 0x0, 0x0, 0xc000e2aa50)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:209 +0x5a
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.getPotentialFilePermissionConflictNamespaces(0x2a4a868, 0xc00047bcc0, 0x2a1fb70, 0xc000d54080, 0xc0003282a0, 0x0, 0x0, 0xc000bbb400, 0x3c7c9c0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:299 +0x165
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validateNamespaces(0x2a4a868, 0xc00047bcc0, 0x2a1fb70, 0xc000d54080, 0xc0003282a0, 0x0, 0x0, 0x2a2a020, 0xc0012d6b40, 0xc000bbb400, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:451 +0x389
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validate(0x2a4a868, 0xc00047bcc0, 0x2a1fb70, 0xc000d54080, 0xc0003282a0, 0x0, 0x0, 0x2a2a020, 0xc0012d6b40, 0xc000bbb400, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:158 +0x39c
github.com/konveyor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).Reconcile(0xc000d540c0, 0x2a2a020, 0xc0012d6b40, 0xc000eba378, 0x13, 0xc0006b97a0, 0x4, 0xc0012d6b00, 0x0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/migplan_controller.go:261 +0x553
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0002fa500, 0x2a29f78, 0xc000784800, 0x23a4e20, 0xc000b2c240)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002fa500, 0x2a29f78, 0xc000784800, 0x0)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x2a29f78, 0xc000784800)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198 +0x4a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0x37
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000136f50)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0025edf50, 0x29dbba0, 0xc0016099e0, 0xc000784801, 0xc00097a060)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000136f50, 0x3b9aca00, 0x0, 0xc000ba2d01, 0xc00097a060)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x2a29f78, 0xc000784800, 0xc000e2bdb0, 0x3b9aca00, 0x0, 0x273f601)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0xa6
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x2a29f78, 0xc000784800, 0xc000e2bdb0, 0x3b9aca00)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99 +0x57
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:195 +0x497
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x118 pc=0x196c9da]
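The panic itself is easy to reproduce in isolation: Go happily invokes a pointer method on a nil receiver, and the crash only happens inside the method body when the receiver is dereferenced, which matches the nil receiver in BuildRestConfig(0x0, ...) in the trace. A minimal standalone sketch follows (the Insecure field is illustrative, not the real MigCluster layout), including the kind of nil guard a fix would add before the call:

package main

import "fmt"

// Stand-in for migapi.MigCluster; the field is illustrative only.
type MigCluster struct {
	Insecure bool
}

// Invoking a pointer method on a nil receiver is legal in Go; the panic
// happens when the body dereferences the receiver, as BuildRestConfig
// does at migcluster_types.go:423 in the trace above.
func (m *MigCluster) BuildRestConfig() error {
	if m.Insecure { // nil pointer dereference when m == nil
		return fmt.Errorf("insecure connections not allowed")
	}
	return nil
}

func main() {
	var cluster *MigCluster // nil, as when the referenced migcluster is not found
	if cluster == nil {     // the kind of guard needed before GetClient/BuildRestConfig
		fmt.Println("skip: srcMigClusterRef did not resolve to a MigCluster")
		return
	}
	_ = cluster.BuildRestConfig()
}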
https://github.com/konveyor/mig-controller/pull/1184
https://github.com/konveyor/mig-controller/pull/1186 for release-1.6.0
Verified with MTC 1.6.0: registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:6a5624104074029dc4e83cacadd1fc48b1561c7e2f237bb032bb73c9a274b9b5
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694