Description of problem:
When installing mig-operator and mig-controller on OCP 3.9, the operator fails to create the velero/restic pods and the mig service accounts.

Version-Release number of selected component (if applicable):
oc v3.9.91
kubernetes v1.9.1+a0ce1bc657
mig-operator image: quay.io/ocpmigrate/mig-operator@sha256:1df62f5ce345f56520a8d0b9795fa9bc55fcac9c04a029f6ddf4da638b055a32

How reproducible:
Always

Steps to Reproduce:
1. Set up an OCP 3.9 environment.
2. Create the mig-operator:

# oc create -f https://raw.githubusercontent.com/fusor/mig-operator/master/operator.yml

The operator is created successfully:

[root@ip-172-18-20-152 ~]# oc get pod
NAME                                  READY     STATUS    RESTARTS   AGE
migration-operator-765479f849-vqkmx   2/2       Running   0          1m

# oc logs -f migration-operator-765479f849-vqkmx -c operator
{"level":"info","ts":1565085714.806442,"logger":"cmd","msg":"Go Version: go1.12.7"}
{"level":"info","ts":1565085714.806561,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1565085714.8065784,"logger":"cmd","msg":"Version of operator-sdk: v0.9.0"}
{"level":"info","ts":1565085714.806622,"logger":"cmd","msg":"Watching namespace.","Namespace":"mig"}
{"level":"info","ts":1565085714.9451883,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1565085715.0599945,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1565085715.065651,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1565085715.1911242,"logger":"metrics","msg":"Metrics Service object created","Service.Name":"migration-operator-metrics","Service.Namespace":"mig"}
{"level":"info","ts":1565085715.1928809,"logger":"proxy","msg":"Starting to serve","Address":"127.0.0.1:8888"}
{"level":"info","ts":1565085715.1932511,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"migration.openshift.io","Options.Version":"v1alpha1","Options.Kind":"MigrationController"}
{"level":"info","ts":1565085715.1935658,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"migrationcontroller-controller","source":"kind source: migration.openshift.io/v1alpha1, Kind=MigrationController"}
{"level":"info","ts":1565085715.2955878,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"migrationcontroller-controller"}
{"level":"info","ts":1565085715.3963768,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"migrationcontroller-controller","worker count":1}

3. Create a MigrationController CR to install mig-controller:

# oc create -f controller.yml
migrationcontroller "migration-controller" created

[root@ip-172-18-20-152 ~]# cat controller.yml
apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  name: migration-controller
  namespace: mig
spec:
  cluster_name: host
  migration_velero: true
  migration_controller: false
  migration_ui: false
  #To install the controller on Openshift 3 you will need to configure the API endpoint:
  #mig_ui_cluster_api_endpoint: https://replace-with-openshift-cluster-hostname:8443/api

Actual results:
3. The velero and restic pods and the mig service accounts were not created.

# oc logs -f migration-operator-765479f849-vqkmx -c operator
....
{"level":"error","ts":1565085936.5246923,"logger":"reconciler","msg":"Unable to update the status to mark cr as running","job":"8399405010839947362","name":"migration-controller","namespace":"mig","error":"the server could not find the requested resource","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:127\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime.10/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime.10/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"} {"level":"error","ts":1565085936.5248399,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"migrationcontroller-controller","request":"mig/migration-controller","error":"the server could not find the requested resource","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr.1/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime.10/pkg/internal/controller/controller.go:217\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime.10/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"} {"level":"error","ts":1565085937.536295,"logger":"reconciler","msg":"Unable to update the status to mark cr as running","job":"8200720172580174527","name":"migration-controller","namespace":"mig","error":"the server could not find the requested 
resource","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible/controller.(*AnsibleOperatorReconciler).Reconcile\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/controller/reconcile.go:127\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tpkg/mod/sigs.k8s.io/controller-runtime.10/pkg/internal/controller/controller.go:215\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\tpkg/mod/sigs.k8s.io/controller-runtime.10/pkg/internal/controller/controller.go:158\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:134\nk8s.io/apimachinery/pkg/util/wait.Until\n\tpkg/mod/k8s.io/apimachinery.0-20190221213512-86fb29eff628/pkg/util/wait/wait.go:88"} # oc get sa NAME SECRETS AGE builder 2 5m default 2 5m deployer 2 5m migration-operator 2 5m Expected results: 1. create velero and restic pod successfully. 2. create mig sa successfully. 3. no errors in operator pod. Additional info:
The error message "Unable to update the status to mark cr as running" shows up in an operator-sdk issue matching our scenario of attempting to run an ansible-operator on OpenShift 3.10: https://github.com/operator-framework/operator-sdk/issues/1124 After speaking with some operator-sdk maintainers (Shawn + Fabian), we determined that operator-sdk doesn't support 3.7-3.10, support starts at 3.11. This indicates we'll need a different means to deploy the app migration solution onto earlier versions of OpenShift. Tests we performed showed that this error isn't related to recent changes in mig-operator. We tested with the 'htb1' image tag, a known good tag of the operator, and saw the same issue in operator logs. Interestingly, we were able to run mig-operator successfully on Origin 3.10, but experienced this issue on OCP 3.10. The official word from operator-sdk devs is that ansible-operator + 3.10 _may_ work in some cases, but we shouldn't rely on it, and it definitely won't work on earlier versions of OpenShift.
OK, we can use OCP 3.11 to continue testing, and we'll wait for the documentation on installing the mig-controller components on OCP 3.7-3.10 to be ready.
Hi Derek, what's the status of this bug? We have tested migration on OCP 3.10 and 3.11, and we're planning to test on OCP 3.9 and 3.7 this week. Does the mig-operator work for OCP 3.9 or OCP 3.7 now? If the operator doesn't work, can you provide another workaround to install the migration components on OCP 3.7 or 3.9?
I used the latest operator.yml (https://raw.githubusercontent.com/fusor/mig-operator/master/operator.yml) on OCP 3.10, and it also failed to bring up the operator pod.

# oc get pod -n openshift-migration-operator
NAME                                 READY     STATUS             RESTARTS   AGE
migration-operator-978959bfc-r4m5m   1/2       CrashLoopBackOff   7          12m

# oc get pod -o yaml | grep image
image: quay.io/ocpmigrate/mig-operator:latest
imagePullPolicy: Always
image: quay.io/ocpmigrate/mig-operator:latest
imagePullPolicy: Always
imagePullSecrets:
image: quay.io/ocpmigrate/mig-operator:latest
imageID: docker-pullable://quay.io/ocpmigrate/mig-operator@sha256:e81b8ee3ae8572b6562e308ee7b7b74e86a6914f1468830cd1f5d8e0372f1888
image: quay.io/ocpmigrate/mig-operator:latest
imageID: docker-pullable://quay.io/ocpmigrate/mig-operator@sha256:e81b8ee3ae8572b6562e308ee7b7b74e86a6914f1468830cd1f5d8e0372f1888

# oc logs -f migration-operator-978959bfc-r4m5m -c operator
{"level":"info","ts":1566368029.972873,"logger":"cmd","msg":"Go Version: go1.12.7"}
{"level":"info","ts":1566368029.9729533,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1566368029.9729784,"logger":"cmd","msg":"Version of operator-sdk: v0.9.0"}
{"level":"info","ts":1566368029.973028,"logger":"cmd","msg":"Watching namespace.","Namespace":"openshift-migration-operator"}
{"level":"info","ts":1566368030.0616763,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1566368030.1370168,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1566368030.1370509,"logger":"leader","msg":"Continuing as the leader."}
{"level":"error","ts":1566368030.2092516,"logger":"cmd","msg":"Exposing metrics port failed.","Namespace":"openshift-migration-operator","error":"failed to initialize service object for metrics: replicasets.extensions \"migration-operator-978959bfc\" is forbidden: User \"system:serviceaccount:openshift-migration-operator:migration-operator\" cannot get replicasets.extensions in the namespace \"openshift-migration-operator\": User \"system:serviceaccount:openshift-migration-operator:migration-operator\" cannot get replicasets.extensions in project \"openshift-migration-operator\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tpkg/mod/github.com/go-logr/zapr.1/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible.Run\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/run.go:103\ngithub.com/operator-framework/operator-sdk/cmd/operator-sdk/run.newRunAnsibleCmd.func1\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/run/ansible.go:38\ngithub.com/spf13/cobra.(*Command).execute\n\tpkg/mod/github.com/spf13/cobra.3/command.go:762\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tpkg/mod/github.com/spf13/cobra.3/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\tpkg/mod/github.com/spf13/cobra.3/command.go:800\nmain.main\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/main.go:85\nruntime.main\n\t/home/travis/.gimme/versions/go1.12.7.linux.amd64/src/runtime/proc.go:200"}
Error: failed to initialize service object for metrics: replicasets.extensions "migration-operator-978959bfc" is forbidden: User "system:serviceaccount:openshift-migration-operator:migration-operator" cannot get replicasets.extensions in the namespace "openshift-migration-operator": User "system:serviceaccount:openshift-migration-operator:migration-operator" cannot get replicasets.extensions in project "openshift-migration-operator"
Usage:
  operator-sdk run ansible [flags]

Flags:
  -h, --help                        help for ansible
      --inject-owner-ref            The ansible operator will inject owner references unless this flag is false (default true)
      --max-workers int             Maximum number of workers to use. Overridden by environment variable. (default 1)
      --reconcile-period duration   Default reconcile period for controllers (default 1m0s)
      --watches-file string         Path to the watches file to use (default "./watches.yaml")
      --zap-devel                   Enable zap development mode (changes defaults to console encoder, debug log level, and disables sampling)
      --zap-encoder encoder         Zap log encoding ('json' or 'console')
      --zap-level level             Zap log level (one of 'debug', 'info', 'error' or any integer value > 0) (default info)
      --zap-sample sample           Enable zap log sampling. Sampling will be disabled for integer log levels > 1

Global Flags:
      --verbose   Enable verbose logging
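The forbidden error above is an RBAC gap for the operator's service account. A quick, hedged way to confirm it from the CLI (impersonation requires a user with cluster-admin or equivalent rights):

  # Check whether the operator's service account can read the ReplicaSet it needs
  # for the metrics Service; "no" here matches the CrashLoopBackOff above.
  oc auth can-i get replicasets.extensions \
      --as=system:serviceaccount:openshift-migration-operator:migration-operator \
      -n openshift-migration-operator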
Make sure you delete the previous migration-operator clusterrolebinding. I deleted the mig namespace, ran oc create -f operator.yml, and saw the following two messages:

Error from server (AlreadyExists): error when creating "operator.yml": customresourcedefinitions.apiextensions.k8s.io "migrationcontrollers.migration.openshift.io" already exists
Error from server (AlreadyExists): error when creating "operator.yml": clusterrolebindings.rbac.authorization.k8s.io "migration-operator" already exists

Of course, the old version of the clusterrolebinding references the service account from the old namespace, so it doesn't grant the service account in the new namespace the appropriate permissions, and I saw the same error. I deleted the clusterrolebinding and the namespace, tried again, and this time things came up properly.
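For completeness, a rough sketch of the cleanup described above (the namespace name assumes the old "mig" namespace used earlier in this bug; deleting the CRD is optional and only needed for a fully clean slate):

  # Remove the stale cluster-scoped objects left over from the old install:
  oc delete clusterrolebinding migration-operator
  oc delete namespace mig
  # Optional, only for a completely clean slate:
  oc delete crd migrationcontrollers.migration.openshift.io

  # Re-create the operator from the current manifest:
  oc create -f https://raw.githubusercontent.com/fusor/mig-operator/master/operator.yml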
Thanks, the operator now works on OCP 3.9 and OCP 3.10.
Hey Zihan,

This BZ should have been resolved by https://github.com/fusor/mig-operator/pull/34. Could you please confirm that you're able to deploy to 3.7-3.11, and also 4.x, successfully now?
Verified. The Migration Tool can now be installed on OCP 3.7, 3.9, 3.10, 3.11, and 4.2 using https://raw.githubusercontent.com/fusor/mig-operator/master/operator.yml
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922