Bug 1737878 - Failed to install app-migration component by mig-operator on ocp 3.7-3.10
Summary: Failed to install app-migration component by mig-operator on ocp 3.7-3.10
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Migration Tooling
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Derek Whatley
QA Contact: Zhang Cheng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-06 10:17 UTC by Zihan Tang
Modified: 2019-10-16 06:35 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:34:52 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:2922 0 None None None 2019-10-16 06:35:01 UTC

Description Zihan Tang 2019-08-06 10:17:46 UTC
Description of problem:
When installing mig-operator and the controller, it fails to create the velero and restic pods and the mig sa.

Version-Release number of selected component (if applicable):
oc v3.9.91
kubernetes v1.9.1+a0ce1bc657

mig-operator image: quay.io/ocpmigrate/mig-operator@sha256:1df62f5ce345f56520a8d0b9795fa9bc55fcac9c04a029f6ddf4da638b055a32

How reproducible:
always

Steps to Reproduce:
1. set up ocp 3.9 env.
2. create mig-operator: 
# oc create -f https://raw.githubusercontent.com/fusor/mig-operator/master/operator.yml

create operator successfully.
[root@ip-172-18-20-152 ~]# oc get pod
NAME                                  READY     STATUS    RESTARTS   AGE
migration-operator-765479f849-vqkmx   2/2       Running   0          1m
# oc logs -f migration-operator-765479f849-vqkmx -c operator
{"level":"info","ts":1565085714.806442,"logger":"cmd","msg":"Go Version: go1.12.7"}
{"level":"info","ts":1565085714.806561,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1565085714.8065784,"logger":"cmd","msg":"Version of operator-sdk: v0.9.0"}
{"level":"info","ts":1565085714.806622,"logger":"cmd","msg":"Watching namespace.","Namespace":"mig"}
{"level":"info","ts":1565085714.9451883,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1565085715.0599945,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1565085715.065651,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1565085715.1911242,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"migration-operator-metrics","Service.Namespace":"mig"}
{"level":"info","ts":1565085715.1928809,"logger":"proxy","msg":"Starting to serve","Address":"127.0.0.1:8888"}
{"level":"info","ts":1565085715.1932511,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"migration.openshift.io","Options.Version":"v1alpha1","Options.Kind":"MigrationController"}
{"level":"info","ts":1565085715.1935658,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"migrationcontroller-controller","source":"kind source: migration.openshift.io/v1alpha1, Kind=MigrationController"}
{"level":"info","ts":1565085715.2955878,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"migrationcontroller-controller"}
{"level":"info","ts":1565085715.3963768,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"migrationcontroller-controller","worker count":1}


3. create migrationcontroller CR to install mig-controller
# oc create -f controller.yml 
migrationcontroller "migration-controller" created
[root@ip-172-18-20-152 ~]# cat controller.yml 
apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  name: migration-controller
  namespace: mig
spec:
  cluster_name: host
  migration_velero: true
  migration_controller: false
  migration_ui: false
  #To install the controller on Openshift 3 you will need to configure the API endpoint:
  #mig_ui_cluster_api_endpoint: https://replace-with-openshift-cluster-hostname:8443/api


Actual results:
3. The velero and restic pods and the mig sa were not created.
# oc logs -f migration-operator-765479f849-vqkmx -c operator
....
{"level":"error","ts":1565085936.5246923,"logger":"reconciler","msg":"Unable to update the status to mark cr as running","job":"8399405010839947362","name":"migration-controller","namespace":"mig","error":"the server could not find the requested resource","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:127\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1565085936.5248399,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"migrationcontroller-controller","request":"mig/migration-controller","error":"the server could not find the requested resource","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}
{"level":"error","ts":1565085937.536295,"logger":"reconciler","msg":"Unable to update the status to mark cr as running","job":"8200720172580174527","name":"migration-controller","namespace":"mig","error":"the server could not find the requested resource","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:127\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime@v0.1.10/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery@v0.0.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"}

# oc get sa
NAME                 SECRETS   AGE
builder              2         5m
default              2         5m
deployer             2         5m
migration-operator   2         5m


Expected results:
1. create velero and restic pod successfully.
2. create mig sa successfully.
3. no errors in operator pod.

Additional info:

Comment 2 Derek Whatley 2019-08-06 17:30:46 UTC
The error message "Unable to update the status to mark cr as running" shows up in an operator-sdk issue matching our scenario of attempting to run an ansible-operator on OpenShift 3.10: https://github.com/operator-framework/operator-sdk/issues/1124

After speaking with some operator-sdk maintainers (Shawn + Fabian), we determined that operator-sdk doesn't support 3.7-3.10; support starts at 3.11. This indicates we'll need a different means to deploy the app migration solution onto earlier versions of OpenShift.

Tests we performed showed that this error isn't related to recent changes in mig-operator. We tested with the 'htb1' image tag, a known good tag of the operator, and saw the same issue in operator logs.

Interestingly, we were able to run mig-operator successfully on Origin 3.10, but experienced this issue on OCP 3.10. The official word from operator-sdk devs is that ansible-operator + 3.10 _may_ work in some cases, but we shouldn't rely on it, and it definitely won't work on earlier versions of OpenShift.
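
To make the recurring failure easier to spot in a long operator log, the error-level entries can be filtered and counted. The following is a minimal sketch, not part of mig-operator: the helper names `error_entries` and `summarize` are hypothetical, and the sample lines are shortened copies of the log entries in the description (stack traces omitted). It assumes the log was saved as text, for example from `oc logs`.

```python
import json


def error_entries(log_text):
    """Return the parsed JSON log entries whose level is "error"."""
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as CLI usage text
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("level") == "error":
            entries.append(entry)
    return entries


def summarize(entries):
    """Count how often each (msg, error) pair occurs."""
    counts = {}
    for entry in entries:
        key = (entry.get("msg", ""), entry.get("error", ""))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Shortened sample lines taken from the operator log in the description.
sample = "\n".join([
    '{"level":"info","ts":1565085715.065651,"logger":"leader","msg":"Became the leader."}',
    '{"level":"error","ts":1565085936.5246923,"logger":"reconciler",'
    '"msg":"Unable to update the status to mark cr as running",'
    '"error":"the server could not find the requested resource"}',
    '{"level":"error","ts":1565085936.5248399,"logger":"kubebuilder.controller",'
    '"msg":"Reconciler error","error":"the server could not find the requested resource"}',
    "Usage:",
])

for (msg, err), count in summarize(error_entries(sample)).items():
    print(f"{count}x {msg}: {err}")
```

Grouping by message and error text keeps the output short even when the reconciler retries every second, as it does in the log above.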

Comment 3 Zihan Tang 2019-08-07 02:31:31 UTC
OK, we can use ocp3.11 to continue testing, and wait for the doc on installing mig-controller components on ocp3.7-3.10 to be ready.

Comment 4 Zihan Tang 2019-08-21 03:25:28 UTC
Hi Derek,
What's the status of this bug? We have tested migration on ocp3.10 and 3.11, and we're planning to test on ocp3.9 and 3.7 this week. Does the mig-operator work on ocp3.9 or ocp3.7 now?
If the operator does not work, can you provide other workarounds to install the migration components on ocp3.7 or 3.9?

Comment 5 Zihan Tang 2019-08-21 06:16:15 UTC
I used the latest operator.yml https://raw.githubusercontent.com/fusor/mig-operator/master/operator.yml on ocp 3.10.
The operator pod also failed to come up.
# oc get pod -n openshift-migration-operator
NAME                                 READY     STATUS             RESTARTS   AGE
migration-operator-978959bfc-r4m5m   1/2       CrashLoopBackOff   7          12m

# oc get pod -o yaml | grep image
      image: quay.io/ocpmigrate/mig-operator:latest
      imagePullPolicy: Always
      image: quay.io/ocpmigrate/mig-operator:latest
      imagePullPolicy: Always
    imagePullSecrets:
      image: quay.io/ocpmigrate/mig-operator:latest
      imageID: docker-pullable://quay.io/ocpmigrate/mig-operator@sha256:e81b8ee3ae8572b6562e308ee7b7b74e86a6914f1468830cd1f5d8e0372f1888
      image: quay.io/ocpmigrate/mig-operator:latest
      imageID: docker-pullable://quay.io/ocpmigrate/mig-operator@sha256:e81b8ee3ae8572b6562e308ee7b7b74e86a6914f1468830cd1f5d8e0372f1888

# oc logs -f migration-operator-978959bfc-r4m5m -c operator
{"level":"info","ts":1566368029.972873,"logger":"cmd","msg":"Go Version: go1.12.7"}
{"level":"info","ts":1566368029.9729533,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1566368029.9729784,"logger":"cmd","msg":"Version of operator-sdk: v0.9.0"}
{"level":"info","ts":1566368029.973028,"logger":"cmd","msg":"Watching namespace.","Namespace":"openshift-migration-operator"}
{"level":"info","ts":1566368030.0616763,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1566368030.1370168,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1566368030.1370509,"logger":"leader","msg":"Continuing as the leader."}
{"level":"error","ts":1566368030.2092516,"logger":"cmd","msg":"Exposing metrics port failed.","Namespace":"openshift-migration-operator","error":"failed to initialize service object for metrics: replicasets.extensions \"migration-operator-978959bfc\" is forbidden: User \"system:serviceaccount:openshift-migration-operator:migration-operator\" cannot get replicasets.extensions in the namespace \"openshift-migration-operator\": User \"system:serviceaccount:openshift-migration-operator:migration-operator\" cannot get replicasets.extensions in project \"openshift-migration-operator\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr@v0.1.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible.Run\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/run.go:103\ngithub.com/operator-framework/operator-sdk/cmd/operator-sdk/run.newRunAnsibleCmd.func1\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/run/ansible.go:38\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra@v0.0.3/command.go:762\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra@v0.0.3/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra@v0.0.3/command.go:800\nmain.main\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/main.go:85\nruntime.main\n\t/home/travis/.gimme/versions/go1.12.7.linux.amd64/src/runtime/proc.go:200"}
Error: failed to initialize service object for metrics: replicasets.extensions "migration-operator-978959bfc" is forbidden: User "system:serviceaccount:openshift-migration-operator:migration-operator" cannot get replicasets.extensions in the namespace "openshift-migration-operator": User "system:serviceaccount:openshift-migration-operator:migration-operator" cannot get replicasets.extensions in project "openshift-migration-operator"
Usage:
  operator-sdk run ansible [flags]

Flags:
  -h, --help                        help for ansible
      --inject-owner-ref            The ansible operator will inject owner references unless this flag is false (default true)
      --max-workers int             Maximum number of workers to use. Overridden by environment variable. (default 1)
      --reconcile-period duration   Default reconcile period for controllers (default 1m0s)
      --watches-file string         Path to the watches file to use (default "./watches.yaml")
      --zap-devel                   Enable zap development mode (changes defaults to console encoder, debug log level, and disables sampling)
      --zap-encoder encoder         Zap log encoding ('json' or 'console')
      --zap-level level             Zap log level (one of 'debug', 'info', 'error' or any integer value > 0) (default info)
      --zap-sample sample           Enable zap log sampling. Sampling will be disabled for integer log levels > 1

Global Flags:
      --verbose   Enable verbose logging

Comment 6 Jason Montleon 2019-08-21 12:19:25 UTC
Make sure you delete the previous migration-operator clusterrolebinding.

I deleted the mig namespace, and did an oc create -f operator.yml and saw the following two messages:
Error from server (AlreadyExists): error when creating "operator.yml": customresourcedefinitions.apiextensions.k8s.io "migrationcontrollers.migration.openshift.io" already exists
Error from server (AlreadyExists): error when creating "operator.yml": clusterrolebindings.rbac.authorization.k8s.io "migration-operator" already exists

Of course, the old version of the clusterrolebinding references the serviceaccount from the old namespace, so it doesn't give the serviceaccount from the new namespace the appropriate permissions, and I saw the same error.

I deleted the clusterrolebinding and namespace, tried again, and this time things came up properly.
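
The stale-binding condition described above can be checked before re-creating the operator. This is a minimal sketch under stated assumptions: `stale_subjects` is a hypothetical helper, and the dictionary is a hand-written stand-in for the `rbac.authorization.k8s.io` ClusterRoleBinding shape (as `oc get clusterrolebinding migration-operator -o json` would return it), reduced to the fields the check needs. The namespaces `mig` and `openshift-migration-operator` are the old and new ones seen in the logs in this report.

```python
def stale_subjects(binding, expected_namespace):
    """Return ServiceAccount subjects bound in a namespace other than the expected one."""
    return [
        subject
        for subject in binding.get("subjects") or []
        if subject.get("kind") == "ServiceAccount"
        and subject.get("namespace") != expected_namespace
    ]


# Hypothetical leftover binding from the earlier install into the "mig" namespace.
old_binding = {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "migration-operator"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "migration-operator", "namespace": "mig"},
    ],
}

for subject in stale_subjects(old_binding, "openshift-migration-operator"):
    print(
        f"binding still points at {subject['namespace']}/{subject['name']}; "
        "delete the clusterrolebinding before re-creating the operator"
    )
```

If the list is non-empty, the serviceaccount in the new namespace gets no permissions from the binding, which matches the "cannot get replicasets.extensions" error in Comment 5.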

Comment 7 Zihan Tang 2019-08-23 02:44:53 UTC
Thanks, the operator now works on OCP3.9 and OCP3.10.

Comment 8 Derek Whatley 2019-08-27 19:32:14 UTC
Hey Zihan,

This BZ should have been resolved by https://github.com/fusor/mig-operator/pull/34. Could you please confirm that you're able to deploy to 3.7-3.11, and also 4.x successfully now?

Comment 10 Zihan Tang 2019-08-28 05:51:29 UTC
Verified.
The migration tool can now be installed on ocp 3.7, 3.9, 3.10, 3.11 and 4.2 using https://raw.githubusercontent.com/fusor/mig-operator/master/operator.yml

Comment 12 errata-xmlrpc 2019-10-16 06:34:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

