Bug 1945251 - Cluster remains in 'Critical' state after bad service account token is fixed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: Controller
Version: 1.4.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 1.5.0
Assignee: Jaydip Gabani
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-31 14:25 UTC by Derek Whatley
Modified: 2021-07-28 04:08 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-28 04:08:04 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2021:2929 | 0 | None | None | None | 2021-07-28 04:08:11 UTC

Description Derek Whatley 2021-03-31 14:25:27 UTC
Description of problem:
I had to tear down my OCP 3 cluster. After I brought up the new one and fixed the SA token using mig-ui, I still wasn't able to get a good response from the cluster.

The only way I could fix this was to restart mig-controller.

Version-Release number of selected component (if applicable):
Exists in 1.4.2 and in master.

How reproducible:
Seems to be consistent.

Steps to Reproduce:
1. Set up a MigCluster with working SA token
2. Break the SA token (e.g. by removing the cluster, or just editing the token)
3. Try to fix the SA token in mig-ui or manually
4. Observe that the MigCluster fails to become ready


This may be a validation ordering problem: as seen below, the entire reconcile fails due to being unauthorized. It may also be that we aren't requeueing when there is a validation failure.

Status conditions:
  status:
    conditions:
    - category: Critical
      lastTransitionTime: "2021-03-31T14:14:32Z"
      message: 'Test connect failed. '
      reason: ConnectFailed
      status: "True"
      type: TestConnectFailed
    - category: Critical
      lastTransitionTime: "2021-03-30T17:00:05Z"
      message: 'Reconcile failed: [Unauthorized]. See controller logs for details.'
      status: "True"
      type: ReconcileFailed
    observedDigest: 3bdd8189de081b4315990a49028a49b7df9b6dd503490774848d619eea48dfe8
    operatorVersion: latest
    registryPath: docker-registry-default.apps.djwocp3a.mg.dog8code.com
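
The requeue hypothesis above can be illustrated with a minimal sketch. This is not mig-controller code: `Cluster`, `validate`, `reconcile`, `RETRY_SECONDS`, and the shape of the "Ready" condition are all hypothetical names chosen for illustration. It shows a reconcile pass that rebuilds the condition list every time and asks to be requeued whenever validation fails, so a corrected token can clear the Critical condition without a controller restart.

```python
from dataclasses import dataclass, field
from typing import Optional

RETRY_SECONDS = 10  # hypothetical requeue delay


@dataclass
class Cluster:
    """Hypothetical stand-in for a MigCluster resource."""
    token: str
    conditions: list = field(default_factory=list)


def validate(cluster: Cluster, valid_token: str) -> Optional[dict]:
    """Return a Critical condition if the token is rejected, else None."""
    if cluster.token != valid_token:
        return {"type": "TestConnectFailed", "category": "Critical",
                "status": "True", "reason": "ConnectFailed"}
    return None


def reconcile(cluster: Cluster, valid_token: str) -> Optional[int]:
    """Run one pass; return a requeue delay in seconds, or None when ready."""
    # Rebuild conditions from scratch so stale failures do not linger.
    cluster.conditions = []
    failure = validate(cluster, valid_token)
    if failure is not None:
        cluster.conditions.append(failure)
        return RETRY_SECONDS  # ask to be called again instead of giving up
    # Hypothetical "Ready" condition recorded once validation passes.
    cluster.conditions.append(
        {"type": "Ready", "category": "Required", "status": "True"})
    return None
```

If the failing pass returned without a requeue, or if old conditions were kept across passes, the resource could stay Critical after the token is fixed, which would match the symptom reported here.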

Comment 1 Derek Whatley 2021-03-31 14:29:29 UTC
Seems to be related to https://bugzilla.redhat.com/show_bug.cgi?id=1741281, but in this case I didn't see recovery happen.

Comment 2 Erik Nelson 2021-04-08 02:29:00 UTC
This is something I'm pretty confident I often did in the past without issue. I'm wondering if something potentially regressed here?

Comment 3 Derek Whatley 2021-06-17 17:06:40 UTC
Jaydip wasn't able to reproduce this issue. We're waiting on additional reproducer steps from Xin in a related BZ, but this may already be fixed; it could be a UI regression that has since been resolved. Will reach out to Ian to see if he knows anything.

Comment 4 Derek Whatley 2021-06-17 17:19:22 UTC
I think this bug is fixed; I can no longer reproduce the issue. My guess is that Pranav resolved it while testing SA token changes for the client caching work in https://github.com/konveyor/mig-controller/pull/1037, and Ian may have fixed the UI-related parts around waiting to update the status of the MigCluster until changes are made in the modal.
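
To make the client-caching guess concrete, here is a minimal sketch of how a cached client can be kept from going stale when the SA token changes. It illustrates the general technique only and is not a description of what the linked PR does; `FakeClient`, `ClientCache`, and `get` are hypothetical names.

```python
import hashlib


class FakeClient:
    """Hypothetical client that remembers the token it was built with."""

    def __init__(self, token: str):
        self.token = token


class ClientCache:
    """Cache one client per cluster, rebuilt when the token changes."""

    def __init__(self):
        self._entries = {}  # cluster name -> (token digest, client)

    def get(self, cluster_name: str, token: str) -> FakeClient:
        digest = hashlib.sha256(token.encode()).hexdigest()
        cached = self._entries.get(cluster_name)
        if cached is not None and cached[0] == digest:
            return cached[1]  # token unchanged: reuse the client
        client = FakeClient(token)  # token changed or first use: rebuild
        self._entries[cluster_name] = (digest, client)
        return client
```

A cache keyed only on the cluster name would keep returning the client built with the broken token, and only a process restart would drop it. That would be consistent with the restart workaround in the description, though the thread does not confirm this as the root cause.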

Comment 10 Sergio 2021-06-29 12:06:56 UTC
Verified using MTC 1.5.0

   openshift-migration-rhel7-operator@sha256:00e77706ca22bcb557d13c16822180fc877e6ea1639a72fda8eb9f5488b039a2
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 7f657df15e9514df4ef42da3431f558a19b8d3233a2ef1222cd8e27793c93816

After breaking the migcluster connection, I was able to fix it by editing the SA token, and the migcluster resource returned to Ready status.

Moved to VERIFIED.

Comment 16 errata-xmlrpc 2021-07-28 04:08:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) image release advisory 1.5.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:2929

