Bug 1970996

Summary: All migrations report a failure when migrating to OCP 4.8
Product: Migration Toolkit for Containers
Reporter: Sergio <sregidor>
Component: General
Assignee: Dylan Murray <dymurray>
Status: CLOSED DUPLICATE
QA Contact: Xin jiang <xjiang>
Severity: urgent
Docs Contact: Avital Pinnick <apinnick>
Priority: urgent
Version: 1.4.5
CC: dymurray, ernelson, jmatthew, odepaz, shurley, whu, xjiang
Target Milestone: ---
Target Release: 1.5.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-06-15 19:00:52 UTC
Type: Bug
Attachments:
  must-gather file (flags: none)
  rolebindings (flags: none)

Description Sergio 2021-06-11 15:45:29 UTC
Created attachment 1790270 [details]
must-gather file

Description of problem:
When migrating from OCP 4.7 to OCP 4.8, all migrations report a failure in the Restore operation.

Version-Release number of selected component (if applicable):
MTC 1.4.5
SOURCE CLUSTER: AWS OCP 4.7 
TARGET CLUSTER: AWS OCP 4.8  (Controller)
REPLICATION REPOSITORY: AWS S3

How reproducible:
Always

Steps to Reproduce:
1. Create an empty project in the OCP 4.7 source cluster

oc new-project empty-project

2. Migrate this project to the OCP 4.8 cluster
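
For reference, step 2 can also be performed from the CLI by creating a MigMigration CR that points at an existing MigPlan. This is only a sketch: the migration and plan names below are hypothetical, and it assumes a MigPlan covering the empty-project namespace has already been created.

cat <<EOF | oc apply -f -
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: empty-project-migration        # hypothetical name
  namespace: openshift-migration
spec:
  migPlanRef:
    name: empty-project-plan           # hypothetical MigPlan for the empty-project namespace
    namespace: openshift-migration
  stage: false                         # final (cutover) migration rather than a stage migration
  quiescePods: true
EOF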


Actual results:
The migration fails, reporting a failure in the Restore operation:

oc get migmigration -o yaml 
....
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2021-06-11T13:33:22Z"
    message: 'Final Restore openshift-migration/7be88a90-cab9-11eb-9838-2d8cab6f43fa-h52rz: partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed

When we describe the Restore resource, we can see this information:

$ velero -n openshift-migration describe restore...

  Namespaces:
    empty-project:  could not restore, configmaps "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version.

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    empty-project:  error restoring rolebindings.authorization.openshift.io/empty-project/admin: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:deployers: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-name"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:image-builders: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:image-pullers: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"



Expected results:
The migration should not report any failure

Additional info:

Attached are the must-gather output and a YAML file with the rolebindings in the empty namespace.

Comment 1 Sergio 2021-06-11 15:46:20 UTC
Created attachment 1790271 [details]
rolebindings

Comment 2 Dylan Murray 2021-06-14 15:55:28 UTC
This issue is affecting other components as well: https://bugzilla.redhat.com/show_bug.cgi?id=1971540

The root cause (my best guess) is that something in the OCP 4.8 nightly builds has broken support for these extra headers. We cannot reproduce this in Kubernetes 1.21, but the headers that are being rejected are in fact invalid: https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6.
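
A rough shell check of one rejected name against the RFC 7230 token grammar (illustrative only): header field names may only use the token character set, and the "/" coming from the Kubernetes extra key authentication.kubernetes.io/pod-uid is outside that set, which is why Go's net/http refuses to send the header.

# token characters allowed by RFC 7230 in a header field name:
#   ALPHA / DIGIT / ! # $ % & ' * + - . ^ _ ` | ~
echo 'Impersonate-Extra-authentication.kubernetes.io/pod-uid' \
  | grep -Eq '^[A-Za-z0-9!#$%&'\''*+.^_`|~-]+$' \
  || echo 'invalid header field name (the "/" falls outside the token set)'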

So we need to determine how this is being set and fix it.

However, as a temporary workaround we can add all the OCP-specific RBAC API groups to the excluded resources list. This would allow us to simply create the kube-equivalent versions of these resources and ignore this error.
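
A sketch of what that could look like, assuming the workaround is applied through the excluded_resources list in the MigrationController CR (the CR name and the exact resource group to exclude are assumptions and would need to be confirmed):

oc edit migrationcontroller migration-controller -n openshift-migration

spec:
  ...
  excluded_resources:
  - rolebindings.authorization.openshift.io   # assumed: OCP-specific RBAC group; the kube RBAC equivalents would still be restored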

Comment 3 Dylan Murray 2021-06-15 18:28:48 UTC
After some investigation, there is a bug in the latest version of OCP 4.8. A recent change forces all users in OCP to use bound service account tokens, which surfaced an existing bug around not decoding the UTF-8 representation of the headers.

When a bug is filed, I will attach it here, but this is not an issue for MTC to solve.