Bug 1951869 - MigPlan custom resource does not detect invalid source cluster reference
Summary: MigPlan custom resource does not detect invalid source cluster reference
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.4.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 1.6.0
Assignee: Jason Montleon
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-04-21 02:48 UTC by whu
Modified: 2021-09-29 14:34 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 14:34:47 UTC
Target Upstream Version:
Embargoed:




Links
System                   ID                                   Last Updated
Github                   konveyor/mig-controller pull 1177    2021-08-24 13:42:03 UTC
Github                   konveyor/mig-controller pull 1184    2021-09-01 13:10:19 UTC
Github                   konveyor/mig-controller pull 1186    2021-09-01 13:40:28 UTC
Red Hat Product Errata   RHSA-2021:3694                       2021-09-29 14:34:55 UTC

Description whu 2021-04-21 02:48:40 UTC
Description of problem:
When the value of srcMigClusterRef is changed to a non-existent cluster name, the migplan still shows "The migration plan is ready" without any error. However, when the value of destMigClusterRef is changed to a non-existent cluster, the migplan shows an error message such as 'The `dstMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'


Version-Release number of selected component (if applicable):
MTC 1.4.3
image: registry.redhat.io/rhmtc/openshift-migration-rhel7-operator@sha256:a0876b81eb32b03a805f84e6ebca656dc2704b69f1a169d695f8a48d861dba45


How reproducible:
Always

Steps to Reproduce:
1. Create a migplan for an application as usual

2. Edit the migplan CR and change the value of srcMigClusterRef.name to a non-existent cluster name, such as "foo"
$ oc get migcluster
NAME             READY   URL                                                                HOST    AGE
host             True                                                                       true    4h2m
source-cluster   True    https://api.cam-src-17360.qe.azure.devcluster.openshift.com:6443   false   3h23m


$ oc edit migplan mig-plan-00000-ocp-00000-nginx
.....
  srcMigClusterRef:
    name: foo
    namespace: openshift-migration


3. Check the migplan status
$ oc get migplan mig-plan-00000-ocp-00000-nginx -o yaml
apiVersion: migration.openshift.io/v1alpha1
.....
spec:
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  .....
  srcMigClusterRef:
    name: foo
    namespace: openshift-migration
status:
  conditions:
  - category: Warn
    lastTransitionTime: "2021-04-20T13:10:39Z"
    message: 'Found Pods with `Spec.NodeSelector` or `Spec.NodeName` set in namespaces: [ocp-00000-nginx]. These fields will be cleared on Pods restored into the target cluster.'
    reason: NodeSelectorsDetected
    status: "True"
    type: NamespacesHaveNodeSelectors
  - category: Required
    lastTransitionTime: "2021-04-20T13:10:17Z"
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: "2021-04-20T13:10:17Z"
    message: The storage resources have been created.
    reason: Done
    status: "True"
    type: StorageEnsured
  - category: Required
    lastTransitionTime: "2021-04-20T13:10:17Z"
    message: The migration plan is ready.
    status: "True"
    type: Ready
  destStorageClasses:
  - accessModes:
    - ReadWriteOnce
    default: true
    name: managed-premium
    provisioner: kubernetes.io/azure-disk
  - accessModes:
    - ReadWriteOnce
    name: ocs-storage.noobaa.io
    provisioner: ocs-storage.noobaa.io/obc
  excludedResources:
  - imagetags
  - templateinstances
  - clusterserviceversions
  - packagemanifests
  - subscriptions
  - servicebrokers
  - servicebindings
  - serviceclasses
  - serviceinstances
  - serviceplans
  - operatorgroups
  - events
  observedDigest: 66ab5a5a9c668eeb5c66d380e2e386fa38c0ec830662dc84737b0e7db64ec831
  srcStorageClasses:
  - accessModes:
    - ReadWriteOnce
    default: true
    name: managed-premium
    provisioner: kubernetes.io/azure-disk

Actual results:
Even though srcMigClusterRef is set to an invalid value, the migplan still goes to Ready without any error.

Expected results:
The migplan should report an error for the invalid source cluster reference.
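
For illustration, the expected behavior would be a Critical condition analogous to the destination-side condition shown under "Additional info" below. The `InvalidSourceClusterRef` type name here is only an assumption by symmetry with `InvalidDestinationClusterRef`, not taken from the controller source:

status:
  conditions:
  - category: Critical
    message: 'The `srcMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'
    reason: NotFound
    status: "True"
    type: InvalidSourceClusterRef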


Additional info:
When the value of destMigClusterRef is changed to a non-existent cluster name, the migplan shows an error message such as 'The `dstMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'

$ oc get migplan mig-plan-00000-ocp-00000-nginx -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
....
spec:
  destMigClusterRef:
    name: foo
    namespace: openshift-migration
  .....
status:
  conditions:
  .....
  - category: Critical
    lastTransitionTime: "2021-04-20T13:07:17Z"
    message: 'The `dstMigClusterRef` must reference a valid `migcluster`, subject: openshift-migration/foo.'
    reason: NotFound
    status: "True"
    type: InvalidDestinationClusterRef

Another problem: once destMigClusterRef has been set to an invalid value, the migplan reports an error, and after the value is changed back to a valid one, such as `host`, the migplan stays in the error status forever unless the migplan is deleted and recreated.

Comment 1 Erik Nelson 2021-06-15 01:38:52 UTC
I'm concerned about this one because it indicates that there's something wrong with verification during reconciliation, but I'm marking it as low because the likelihood of this field being set to something bad is basically zero if the UI is being used. If the API is being used, it's usually in a similarly controlled and scripted manner.

Comment 7 Xin jiang 2021-09-01 07:57:34 UTC
Changing srcMigClusterRef to a non-existent cluster caused the controller pod to crash.

Comment 8 Jason Montleon 2021-09-01 08:09:32 UTC
There is a nil pointer dereference coming from:
https://github.com/konveyor/mig-controller/blame/master/pkg/controller/migplan/validation.go#L299

This was not merged yet when I wrote/tested my PR. (https://github.com/konveyor/mig-controller/commit/bd4e0a0035ff2ad9ca0483dbdbcb87dea95da568)

Testing a fix. Looking at other areas, it should be trivial.

E0901 07:37:37.779754       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 580 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x230f860, 0x3b865b0)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x230f860, 0x3b865b0)
	/opt/rh/go-toolset-1.16/root/usr/lib/go-toolset-1.16-golang/src/runtime/panic.go:965 +0x1b9
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).BuildRestConfig(0x0, 0x2a4ab40, 0xc005910280, 0x25e3ce0, 0x8, 0xc005c702a0)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:423 +0x3a
github.com/konveyor/mig-controller/pkg/apis/migration/v1alpha1.(*MigCluster).GetClient(0x0, 0x2a4ab40, 0xc005910280, 0xc005910280, 0x0, 0x0, 0xc000e2aa50)
	/remote-source/app/pkg/apis/migration/v1alpha1/migcluster_types.go:209 +0x5a
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.getPotentialFilePermissionConflictNamespaces(0x2a4a868, 0xc00047bcc0, 0x2a1fb70, 0xc000d54080, 0xc0003282a0, 0x0, 0x0, 0xc000bbb400, 0x3c7c9c0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:299 +0x165
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validateNamespaces(0x2a4a868, 0xc00047bcc0, 0x2a1fb70, 0xc000d54080, 0xc0003282a0, 0x0, 0x0, 0x2a2a020, 0xc0012d6b40, 0xc000bbb400, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:451 +0x389
github.com/konveyor/mig-controller/pkg/controller/migplan.ReconcileMigPlan.validate(0x2a4a868, 0xc00047bcc0, 0x2a1fb70, 0xc000d54080, 0xc0003282a0, 0x0, 0x0, 0x2a2a020, 0xc0012d6b40, 0xc000bbb400, ...)
	/remote-source/app/pkg/controller/migplan/validation.go:158 +0x39c
github.com/konveyor/mig-controller/pkg/controller/migplan.(*ReconcileMigPlan).Reconcile(0xc000d540c0, 0x2a2a020, 0xc0012d6b40, 0xc000eba378, 0x13, 0xc0006b97a0, 0x4, 0xc0012d6b00, 0x0, 0x0, ...)
	/remote-source/app/pkg/controller/migplan/migplan_controller.go:261 +0x553
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0002fa500, 0x2a29f78, 0xc000784800, 0x23a4e20, 0xc000b2c240)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263 +0x30d
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0002fa500, 0x2a29f78, 0xc000784800, 0x0)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:235 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1(0x2a29f78, 0xc000784800)
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:198 +0x4a
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0x37
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc000136f50)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0025edf50, 0x29dbba0, 0xc0016099e0, 0xc000784801, 0xc00097a060)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000136f50, 0x3b9aca00, 0x0, 0xc000ba2d01, 0xc00097a060)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext(0x2a29f78, 0xc000784800, 0xc000e2bdb0, 0x3b9aca00, 0x0, 0x273f601)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185 +0xa6
k8s.io/apimachinery/pkg/util/wait.UntilWithContext(0x2a29f78, 0xc000784800, 0xc000e2bdb0, 0x3b9aca00)
	/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99 +0x57
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:195 +0x497
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x118 pc=0x196c9da]
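
For illustration only: a minimal sketch of the kind of nil guard that avoids this panic, assuming the validation path looks up the referenced source MigCluster and then calls a method on the result. The types and helper functions below are simplified stand-ins, not the code from the actual pull requests.

package main

import (
	"errors"
	"fmt"
)

// MigCluster is a stand-in for the real v1alpha1.MigCluster type.
type MigCluster struct {
	Name string
}

// GetClient is a stand-in for (*MigCluster).GetClient. Calling it on a nil
// receiver without the guard below would dereference a nil pointer, which is
// what the stack trace above shows.
func (m *MigCluster) GetClient() (string, error) {
	if m == nil {
		return "", errors.New("nil MigCluster")
	}
	return "client-for-" + m.Name, nil
}

// getSourceCluster is a stand-in for resolving spec.srcMigClusterRef; it
// returns nil when the referenced MigCluster does not exist.
func getSourceCluster(name string) *MigCluster {
	known := map[string]*MigCluster{
		"source-cluster": {Name: "source-cluster"},
	}
	return known[name] // nil for unknown names such as "foo"
}

// validateSourceCluster returns a validation error instead of panicking when
// the source cluster reference cannot be resolved, so the reconcile loop can
// set a Critical condition on the MigPlan.
func validateSourceCluster(refName string) error {
	cluster := getSourceCluster(refName)
	if cluster == nil {
		return fmt.Errorf("srcMigClusterRef must reference a valid migcluster, subject: openshift-migration/%s", refName)
	}
	if _, err := cluster.GetClient(); err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println(validateSourceCluster("source-cluster")) // <nil>
	fmt.Println(validateSourceCluster("foo"))            // validation error, no panic
}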

Comment 9 Jason Montleon 2021-09-01 08:19:50 UTC
https://github.com/konveyor/mig-controller/pull/1184

Comment 10 Jason Montleon 2021-09-01 13:49:18 UTC
https://github.com/konveyor/mig-controller/pull/1186 for release-1.6.0

Comment 13 Xin jiang 2021-09-06 13:56:29 UTC
Verified with MTC 1.6.0:
registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:6a5624104074029dc4e83cacadd1fc48b1561c7e2f237bb032bb73c9a274b9b5

Comment 15 errata-xmlrpc 2021-09-29 14:34:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694

