Created attachment 1790270 [details]
must-gather file

Description of problem:

When we migrate from OCP 4.7 to OCP 4.8, all migrations report a failure in the Restore operation.

Version-Release number of selected component (if applicable):

MTC 1.4.5

SOURCE CLUSTER: AWS OCP 4.7
TARGET CLUSTER: AWS OCP 4.8 (Controller)
REPLICATION REPOSITORY: AWS S3

How reproducible:

Always

Steps to Reproduce:
1. Create an empty project in the OCP 4.7 source cluster

   oc new-project empty-project

2. Migrate this project to the OCP 4.8 cluster

Actual results:

The migration fails, reporting a failure in the Restore:

oc get migmigration -o yaml
....
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2021-06-11T13:33:22Z"
    message: 'Final Restore openshift-migration/7be88a90-cab9-11eb-9838-2d8cab6f43fa-h52rz:
      partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed

When we describe the Restore resource we can see this information:

$ velero -n openshift-migration describe restore...

Warnings:
  Namespaces:
    empty-project:  could not restore, configmaps "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version.

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    empty-project:  error restoring rolebindings.authorization.openshift.io/empty-project/admin: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:deployers: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-name"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:image-builders: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:image-pullers: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"

Expected results:

The migration should not report any failure.

Additional info:

Attached the must-gather information and a yaml file with the rolebindings in the empty namespace.
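For illustration only (this program is not part of MTC or Velero): the "net/http: invalid header field name" message above is produced client-side by Go's HTTP transport, which validates header field names before writing a request. A minimal, self-contained sketch that reproduces the same error string by setting the offending impersonation header on an outgoing request:

// Illustrative reproduction of the transport-level error seen in the Restore
// errors above; the hostname and handler are throwaway test fixtures.
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func main() {
	// Throwaway server; the request never reaches it because the client
	// refuses to send a header whose name is not a valid RFC 7230 token.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {}))
	defer srv.Close()

	req, _ := http.NewRequest(http.MethodPost, srv.URL, nil)
	// Header name taken verbatim from the Restore error output.
	req.Header.Set("Impersonate-Extra-authentication.kubernetes.io/pod-uid", "1234")

	_, err := srv.Client().Do(req)
	fmt.Println(err)
	// Prints something like:
	// Post "http://127.0.0.1:...": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
}

This suggests the error is raised by whichever Go client is impersonating the user during the restore, before the request ever reaches the destination API server.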
Created attachment 1790271 [details]
rolebindings
This issue is affecting other components as well: https://bugzilla.redhat.com/show_bug.cgi?id=1971540

The root cause (my best guess) is that something in the OCP 4.8 nightly builds has broken support for these extra impersonation headers. We cannot reproduce this on upstream Kubernetes 1.21, but the header names being rejected are in fact invalid per RFC 7230, section 3.2.6 (https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6), so we need to determine how these headers are being set and fix that.

However, as a temporary workaround we can add all the OCP-specific RBAC API groups to the excluded resources list. That would let us create only the Kubernetes equivalents of these resources and sidestep this error.
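To make the RFC 7230 point concrete, here is a small standalone check of the rejected header name against the tchar grammar from section 3.2.6 (illustrative only, not code from MTC or OCP). It shows that the '/' contributed by the extra key authentication.kubernetes.io/pod-uid is what makes the field name illegal; the '.' characters are allowed.

// Check a header field name against the token grammar of RFC 7230 §3.2.6.
package main

import (
	"fmt"
	"strings"
)

// isTchar reports whether c is a tchar per RFC 7230 §3.2.6.
func isTchar(c rune) bool {
	return (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') ||
		(c >= 'A' && c <= 'Z') ||
		strings.ContainsRune("!#$%&'*+-.^_`|~", c)
}

func main() {
	name := "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
	for _, c := range name {
		if !isTchar(c) {
			fmt.Printf("invalid header field name: %q is not a tchar\n", c)
			return
		}
	}
	fmt.Println("valid header field name")
}
// Output: invalid header field name: '/' is not a tchar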
After some investigation: there is a bug in the latest version of OCP 4.8. A recent change forces all users in OCP to use bound service account tokens, which surfaced an existing bug around not decoding the UTF-8 representation of these impersonation headers. Once a bug is filed against OCP I will link it here, but this is not an issue for MTC to solve.
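For context only, and as a guess at the shape of the eventual fix rather than the actual Kubernetes/OCP patch: the extra keys carried by bound service account tokens (authentication.kubernetes.io/pod-name, authentication.kubernetes.io/pod-uid) contain '/', so whatever builds the Impersonate-Extra-<key> header has to percent-encode the key (and the receiving server has to decode it) for the field name to be a legal token. A minimal sketch of that idea; impersonateExtraHeader is a hypothetical helper, not a real client-go function:

// Hypothetical sketch: percent-encode an impersonation extra key so the
// resulting header field name is a valid RFC 7230 token.
package main

import (
	"fmt"
	"net/url"
)

func impersonateExtraHeader(key string) string {
	// url.PathEscape turns "/" into "%2F"; '%' is a legal tchar, so the
	// encoded name passes net/http's header validation.
	return "Impersonate-Extra-" + url.PathEscape(key)
}

func main() {
	fmt.Println(impersonateExtraHeader("authentication.kubernetes.io/pod-uid"))
	// Impersonate-Extra-authentication.kubernetes.io%2Fpod-uid
}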