Bug 1970996 - All migrations report a failure when migrating to OCP 4.8
Summary: All migrations report a failure when migrating to OCP 4.8
Keywords:
Status: CLOSED DUPLICATE of bug 1972383
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 1.5.0
Assignee: Dylan Murray
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-11 15:45 UTC by Sergio
Modified: 2021-06-15 19:00 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-06-15 19:00:52 UTC
Target Upstream Version:
Embargoed:


Attachments
must-gather file (3.36 MB, application/gzip), uploaded 2021-06-11 15:45 UTC by Sergio
rolebindings (4.31 KB, text/plain), uploaded 2021-06-11 15:46 UTC by Sergio

Description Sergio 2021-06-11 15:45:29 UTC
Created attachment 1790270 [details]
must-gather file

Description of problem:
When we migrate from OCP 4.7 to OCP 4.8, all migrations report a failure in the Restore operation.

Version-Release number of selected component (if applicable):
MTC 1.4.5
SOURCE CLUSTER: AWS OCP 4.7 
TARGET CLUSTER: AWS OCP 4.8  (Controller)
REPLICATION REPOSITORY: AWS S3

How reproducible:
Always

Steps to Reproduce:
1. Create an empty project in the OCP 4.7 source cluster

oc new-project empty-project

2. Migrate this project to the OCP 4.8 cluster
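
For reference, a minimal MigMigration sketch equivalent to what the MTC web console creates when a plan is run (the plan and migration names here are illustrative placeholders, not taken from this environment):

apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  name: empty-project-migration
  namespace: openshift-migration
spec:
  migPlanRef:
    name: empty-project-plan
    namespace: openshift-migration
  quiescePods: true
  stage: false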


Actual results:
The migration fails, reporting a failure in the Restore:

oc get migmigration -o yaml 
....
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2021-06-11T13:33:22Z"
    message: 'Final Restore openshift-migration/7be88a90-cab9-11eb-9838-2d8cab6f43fa-h52rz: partially failed on destination cluster'
    status: "True"
    type: VeleroFinalRestorePartiallyFailed

When we describe the Restore resource, we can see this information:

$ velero -n openshift-migration describe restore...

  Namespaces:
    empty-project:  could not restore, configmaps "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version.
                    could not restore, rolebindings.rbac.authorization.k8s.io "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version.

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    empty-project:  error restoring rolebindings.authorization.openshift.io/empty-project/admin: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:deployers: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-name"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:image-builders: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"
                    error restoring rolebindings.authorization.openshift.io/empty-project/system:image-pullers: Post "https://172.30.0.1:443/apis/rbac.authorization.k8s.io/v1/namespaces/empty-project/rolebindings": net/http: invalid header field name "Impersonate-Extra-authentication.kubernetes.io/pod-uid"



Expected results:
The migration should complete without reporting any failure.

Additional info:

Attached are the must-gather information and a YAML file with the rolebindings in the empty namespace.

Comment 1 Sergio 2021-06-11 15:46:20 UTC
Created attachment 1790271 [details]
rolebindings

Comment 2 Dylan Murray 2021-06-14 15:55:28 UTC
This issue is affecting other components as well: https://bugzilla.redhat.com/show_bug.cgi?id=1971540

My best guess at the root cause is that something in the OCP 4.8 nightly builds has broken support for these extra impersonation headers. We cannot reproduce this in upstream Kubernetes 1.21, but the headers being rejected are in fact invalid per RFC 7230, section 3.2.6: https://datatracker.ietf.org/doc/html/rfc7230#section-3.2.6.
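
For reference, that section requires field names to be tokens, and "/" is not a valid token character, while every rejected header name above contains one:

  field-name = token
  token      = 1*tchar
  tchar      = "!" / "#" / "$" / "%" / "&" / "'" / "*"
             / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
             / DIGIT / ALPHA

Note that "." is allowed, so "authentication.kubernetes.io" on its own would be fine; it is the "/" before "pod-uid" and "pod-name" that makes Go's net/http refuse to send the request, which matches the "net/http: invalid header field name" errors in the Restore output.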

So we need to determine how these headers are being set and fix it.

However, as a temporary workaround, we can add all of the OCP-specific RBAC API groups to the excluded resources list. This would allow us to simply create the kube-equivalent versions of these resources and ignore this error. A sketch of what that could look like follows.
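
A minimal sketch of that workaround, assuming the excluded_resources list in the MigrationController CR is the right knob (field name per the MTC docs; the exact set of OCP RBAC resources to exclude would still need to be confirmed):

$ oc edit migrationcontroller migration-controller -n openshift-migration

apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  name: migration-controller
  namespace: openshift-migration
spec:
  ...
  excluded_resources:
  - rolebindings.authorization.openshift.io
  - roles.authorization.openshift.io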

Comment 3 Dylan Murray 2021-06-15 18:28:48 UTC
After some investigation, it turns out there is a bug in the latest version of OCP 4.8. A recent change forces all users in OCP to use bound service account tokens, which surfaced an existing bug around not decoding the UTF-8 representation of these headers.

When a bug is filed I will attach it here, but this is not an issue within MTC to solve.
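
For illustration of where those extras come from, a TokenReview against a pod-bound token should report the same keys that end up in the rejected impersonation headers (<bound-sa-token>, <pod-name>, and <pod-uid> are placeholders):

$ oc create -f - -o yaml <<EOF
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: <bound-sa-token>
EOF
...
status:
  authenticated: true
  user:
    username: system:serviceaccount:empty-project:default
    extra:
      authentication.kubernetes.io/pod-name:
      - <pod-name>
      authentication.kubernetes.io/pod-uid:
      - <pod-uid>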

