+++ This bug was initially created as a clone of Bug #1866867 +++

Description of problem:

When we upgrade CAM 1.2.0 to CAM 1.2.4 in an OCP 4.5 cluster, the migration-operator pod reports a CrashLoopBackOff status. After deleting the pod, the upgrade finishes correctly.

Version-Release number of selected component (if applicable):
CAM 1.2.4
OCP 4.5

How reproducible:
Always

Steps to Reproduce:
1. Install CAM 1.2.0 in an OCP 4.5 cluster
2. Upgrade it to CAM 1.2.4

Actual results:
After the upgrade, the migration-operator pod status is CrashLoopBackOff.

NAME                                    READY   STATUS             RESTARTS   AGE
migration-controller-7d68b9f9cd-x9q8j   2/2     Running            0          12m
migration-operator-547d96f4d4-wxkxq     1/2     CrashLoopBackOff   6          7m6s
migration-ui-d998d7bb9-862vl            1/1     Running            0          12m
restic-6nphn                            1/1     Running            0          13m
restic-h7c5x                            1/1     Running            0          13m
restic-zcbk7                            1/1     Running            0          13m
velero-8446f669f-7kq7m                  1/1     Running            0          13m

If we look at the pod's YAML, the mounted service account token secret is the old one, not the new one.

$ oc get pods migration-operator-547d96f4d4-wxkxq -o yaml | grep -A1 serviceaccount
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: migration-operator-token-pxthl
--
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: migration-operator-token-pxthl

These were the secrets after the upgrade:

$ oc get secret | grep migration-oper
migration-operator-dockercfg-kfnxn   kubernetes.io/dockercfg               1   39s
migration-operator-token-fxtv4       kubernetes.io/service-account-token   4   39s
migration-operator-token-pgp5p       kubernetes.io/service-account-token   4   39s

These were the secrets before the upgrade:

$ oc get secret | grep migration-oper
migration-operator-dockercfg-s9xkz   kubernetes.io/dockercfg               1   4m2s
migration-operator-token-22xwv       kubernetes.io/service-account-token   4   4m2s
migration-operator-token-pxthl       kubernetes.io/service-account-token   4   4m2s

As we can see, the new pod is using the old secrets instead of the new ones.
Expected results:
The new pod should be in Running status and all pods should be updated to the new versions.

Additional info:
If we delete the crashed pod, it is recreated and the upgrade completes properly. This only happened in an OCP 4.5 cluster; it was not happening in 4.2, for instance. I'm not sure about other versions.

We can see this in the namespace events:

16s   Warning   FailedMount   pod/migration-operator-547d96f4d4-wxkxq   MountVolume.SetUp failed for volume "migration-operator-token-pxthl" : secret "migration-operator-token-pxthl" not found

This is the log inside the crashed operator:

Setting up watches.  Beware: since -r was given, this may take a while!
Watches established.

(python2_virtual_env) [fedora@preserve-appmigration-workmachine ~]$ oc logs migration-operator-6bc6c59b84-mjrhd -c operator
{"level":"info","ts":1596722990.8317368,"logger":"cmd","msg":"Go Version: go1.13.4"}
{"level":"info","ts":1596722990.8317828,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1596722990.8317914,"logger":"cmd","msg":"Version of operator-sdk: v0.12.0+git"}
{"level":"info","ts":1596722990.8318172,"logger":"cmd","msg":"Watching namespace.","Namespace":"openshift-migration"}
{"level":"error","ts":1596722990.8651814,"logger":"controller-runtime.manager","msg":"Failed to get API Group-Resources","error":"Unauthorized","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/manager.New\n\tsrc/github.com/operator-framework/operator-sdk/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go:220\ngithub.com/operator-framework/operator-sdk/pkg/ansible.Run\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/run.go:80\ngithub.com/operator-framework/operator-sdk/cmd/operator-sdk/run.newRunAnsibleCmd.func1\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/run/ansible.go:38\ngithub.com/spf13/cobra.(*Command).execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:826\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:864\nmain.main\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/main.go:84\nruntime.main\n\t/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/proc.go:203"}
{"level":"error","ts":1596722990.865287,"logger":"cmd","msg":"Failed to create a new manager.","Namespace":"openshift-migration","error":"Unauthorized","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible.Run\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/run.go:86\ngithub.com/operator-framework/operator-sdk/cmd/operator-sdk/run.newRunAnsibleCmd.func1\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/run/ansible.go:38\ngithub.com/spf13/cobra.(*Command).execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:826\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:864\nmain.main\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/main.go:84\nruntime.main\n\t/opt/rh/go-toolset-1.13/root/usr/lib/go-toolset-1.13-golang/src/runtime/proc.go:203"}
Error: Unauthorized
Usage:
  operator-sdk run ansible [flags]

Flags:
      --ansible-verbosity int            Ansible verbosity. Overridden by environment variable. (default 2)
  -h, --help                             help for ansible
      --inject-owner-ref                 The ansible operator will inject owner references unless this flag is false (default true)
      --max-workers int                  Maximum number of workers to use. Overridden by environment variable. (default 1)
      --reconcile-period duration        Default reconcile period for controllers (default 1m0s)
      --watches-file string              Path to the watches file to use (default "./watches.yaml")
      --zap-devel                        Enable zap development mode (changes defaults to console encoder, debug log level, and disables sampling)
      --zap-encoder encoder              Zap log encoding ('json' or 'console')
      --zap-level level                  Zap log level (one of 'debug', 'info', 'error' or any integer value > 0) (default info)
      --zap-sample sample                Enable zap log sampling. Sampling will be disabled for integer log levels > 1
      --zap-time-encoding timeEncoding   Sets the zap time format ('epoch', 'millis', 'nano', or 'iso8601') (default )

Global Flags:
      --verbose   Enable verbose logging
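The failure mode above can be sketched offline: the crashed pod's spec still references a token secret that no longer exists after the upgrade, so the volume mount fails and the container cannot authenticate. The secret names below are copied from the output in this report; on a real cluster they would come from `oc get pod -o yaml` and `oc get secret`.

```shell
# Token secret mounted by the crashed pod (from its volume mounts):
mounted="migration-operator-token-pxthl"

# Token secrets that actually exist after the upgrade (from 'oc get secret'):
existing="migration-operator-token-fxtv4
migration-operator-token-pgp5p"

# The mount fails because the referenced secret is gone:
if ! printf '%s\n' "$existing" | grep -qx "$mounted"; then
  echo "stale token secret: $mounted"
fi

# Workaround on a live cluster: delete the pod so its Deployment recreates it
# with a current token secret, e.g.:
#   oc delete pod migration-operator-547d96f4d4-wxkxq -n openshift-migration
```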
Can this still be reproduced? I got the impression from Slack conversations that it started working again, possibly with a 4.5 z-stream update.
Verified with MTC 1.3.0.

$ oc get pods -n openshift-migration migration-operator-cb65d55b4-5nv66 -o yaml | grep -A1 serviceaccount
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: migration-operator-token-xjz6z

$ oc get secret -n openshift-migration | grep migration-oper
migration-operator-dockercfg-vm7nz   kubernetes.io/dockercfg               1   55m
migration-operator-token-mghqs       kubernetes.io/service-account-token   4   55m
migration-operator-token-xjz6z       kubernetes.io/service-account-token   4   55m

Operator image:

$ oc get pods -n openshift-migration migration-operator-cb65d55b4-5nv66 -o yaml | grep image
        f:image: {}
        f:imagePullPolicy: {}
    image: quay-enterprise-quay-enterprise.apps.cam-tgt-8790.qe.devcluster.openshift.com/admin/openshift-migration-rhel7-operator@sha256:66efea27fa3d6498ef8c722ef9dec45ceba2a9db695b8092e0e65b5070c94d87
    imagePullPolicy: Always
  imagePullSecrets:
    image: quay-enterprise-quay-enterprise.apps.cam-tgt-8790.qe.devcluster.openshift.com/admin/openshift-migration-rhel7-operator@sha256:66efea27fa3d6498ef8c722ef9dec45ceba2a9db695b8092e0e65b5070c94d87
    imageID: quay-enterprise-quay-enterprise.apps.cam-tgt-8790.qe.devcluster.openshift.com/admin/openshift-migration-rhel7-operator@sha256:66efea27fa3d6498ef8c722ef9dec45ceba2a9db695b8092e0e65b5070c94d87
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Migration Toolkit for Containers (MTC) Tool image release advisory 1.3.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4148