Description of problem:
After the SA token of a source cluster changes (for example because the mig operator and controller were reinstalled on it), the mig-controller keeps using the old token for its remote watch and logs "Unauthorized" errors.

Version-Release number of selected component (if applicable):

OCP4
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0     True        False         5h1m    Cluster version is 4.1.0

OCP3
$ oc version
oc v3.11.126
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://
openshift v3.11.104
kubernetes v1.11.0+d4cacc0

velero
  image: quay.io/ocpmigrate/velero:fusor-dev
  imageID: quay.io/ocpmigrate/velero@sha256:e4e19be179221bf8a298cb7282f5890099633194dbc0c698c813e07b40b29302
  image: quay.io/ocpmigrate/migration-plugin:latest
  imageID: quay.io/ocpmigrate/migration-plugin@sha256:d34af290b3c6d808ad360a1f2d41d91e06bff5aa912f9a5a78fed3ea2f0f8f71

controller
  image: quay.io/ocpmigrate/mig-controller:latest
  imageID: quay.io/ocpmigrate/mig-controller@sha256:24e1dad428ca878d4b19f73148f485785c96a91d9aa9f738e7ee1b4b40726682

How reproducible:

Steps to Reproduce:
1. Set up a normal environment with one OCP4 target cluster and one OCP3 source cluster.
2. Verify that both clusters appear in the UI and both are online.
3. Uninstall the mig operator and controller from the OCP3 source cluster.
4. Install the mig operator and controller in the OCP3 source cluster again. This changes the SA token of this cluster.
5. In the UI, update the token so that the cluster is online again.

Actual results:
1. The source cluster is online again in the UI.
2. There are errors in the controller's logs:

$ oc logs $(oc get pod -l control-plane=controller-manager -o NAME)
E0814 16:06:10.902643       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1.Secret: Unauthorized
E0814 16:06:10.964137       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1.PersistentVolumeClaim: Unauthorized
E0814 16:06:10.989906       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1.StorageClass: Unauthorized
E0814 16:06:10.993861       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1.PersistentVolume: Unauthorized
E0814 16:06:10.995861       1 reflector.go:134] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:126: Failed to list *v1.BackupStorageLocation: Unauthorized

Expected results:
The controller should notice the change in the token.

Additional info:
Once the failures show up in the logs, deleting the controller pod makes the controller pick up the new token and work fine again:

$ oc delete pod $(oc get pod -l control-plane=controller-manager -o jsonpath='{.items[].metadata.name}')
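For illustration only, a minimal Go sketch (not mig-controller code) of why the reflector calls above keep failing: a client-go client holds on to the bearer token its rest.Config was built with, so once that ServiceAccount token is revoked every list returns 401 Unauthorized until a new client is built with the new token. The host and token values below are placeholders.

// Sketch: a client built from a now-revoked SA token fails every list call.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Placeholder values; the token is the one captured when the watch was started.
	cfg := &rest.Config{
		Host:            "https://source-cluster.example.com:8443",
		BearerToken:     "STALE-SA-TOKEN",
		TLSClientConfig: rest.TLSClientConfig{Insecure: true},
	}

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// After the mig operator is reinstalled on the source cluster, the old SA
	// token is invalid, so this list returns "Unauthorized" -- the same error
	// the informer cache logs above.
	_, err = client.CoreV1().Secrets("").List(context.TODO(), metav1.ListOptions{})
	fmt.Println("list secrets:", err)
}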
This is happening because the remote watch system only starts a new remote watch if one isn't already running for a MigCluster. The check simply looks at whether a remote watch was started for a particular MigCluster ns/name. Changing the SA token doesn't change this ns/name, so the old remote watch keeps running with a stale SA token.

https://github.com/fusor/mig-controller/blob/master/pkg/remote/watch.go#L69
https://github.com/fusor/mig-controller/blob/master/pkg/controller/migcluster/migcluster_controller.go#L204-L207

We are currently missing support for stopping remote watches, which is required to handle changes to SA tokens.

https://github.com/fusor/mig-controller/blob/master/pkg/remote/watch.go#L51

@Jeff, we should do some thinking on what kinds of situations should lead to shutdown / restart of a remote watch.
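A simplified Go sketch of the behavior described above, not the actual pkg/remote/watch.go code: the watch bookkeeping is keyed only by the MigCluster's ns/name, so a changed SA token never triggers a restart. EnsureRemoteWatch and StopRemoteWatch are hypothetical names; the stop path is the piece that is currently missing.

// Sketch of remote-watch bookkeeping keyed by MigCluster ns/name.
package remote

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

var (
	mu      sync.Mutex
	watches = map[types.NamespacedName]chan struct{}{} // MigCluster ns/name -> stop channel
)

// EnsureRemoteWatch mirrors today's check: if an entry already exists for this
// MigCluster, nothing is (re)started -- even if the SA token stored for the
// cluster has changed, the old watch keeps its stale credentials.
func EnsureRemoteWatch(key types.NamespacedName, start func(stop <-chan struct{}) error) error {
	mu.Lock()
	defer mu.Unlock()
	if _, ok := watches[key]; ok {
		return nil
	}
	stop := make(chan struct{})
	if err := start(stop); err != nil {
		return err
	}
	watches[key] = stop
	return nil
}

// StopRemoteWatch is the missing capability: it would be called when the
// MigCluster's referenced SA token secret changes, so the next reconcile
// starts a fresh remote watch with the new token.
func StopRemoteWatch(key types.NamespacedName) {
	mu.Lock()
	defer mu.Unlock()
	if stop, ok := watches[key]; ok {
		close(stop)
		delete(watches, key)
	}
}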
Possibly related to https://bugzilla.redhat.com/show_bug.cgi?id=1945251, although in 1945251 I didn't see recovery happen at all.
Closing as stale, please re-open if the issue persists with the current release.