Bug 2060717 - [MTC] Registry pod goes in CrashLoopBackOff several times when MCG Nooba is used as the Replication Repository
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 1.7.1
Assignee: Jason Montleon
QA Contact: mohamed
Docs Contact: Richard Hoch
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-03-04 04:45 UTC by ssingla
Modified: 2022-05-05 13:51 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-05 13:50:01 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHSA-2022:1734 (last updated 2022-05-05 13:51:01 UTC)

Description ssingla 2022-03-04 04:45:16 UTC
Description of the problem: When MCG NooBaa is used as the Replication Repository, the registry pod is restarted several times, going through the CrashLoopBackOff state before finally reaching a stable Running state.

Severity: Medium

Version-Release number of selected component (if applicable):
MTC 1.7.0 + MCG NooBaa as Replication Repository
Source cluster OCP version: 3.11
Target cluster OCP version: 4.7 (Controller) - MCG installed 

Steps to reproduce:

1. Deploy any application in the source cluster 
2. Log in to the MTC UI and create a migration plan (migplan)
3. Execute a full (cutover) migration



Actual Results: During the StageBackup step, the registry pod crashes several times, which is also visible on the Migrations page.

Expected Results:  The registry pod should not crash.

Additional Info: Even after increasing the liveness/readiness timeouts of the registry pod, it keeps crashing.

Logs
time="2022-03-03T15:54:40.011958156Z" level=debug msg="authorizing request" go.version=go1.16.12 http.request.host="10.129.2.89:5000" http.request.id=2b2a85f8-f3bb-4e57-8a6f-82fc8d00295f http.request.method=GET http.request.remoteaddr="10.129.2.1:54922" http.request.uri="/v2/_catalog?n=5" http.request.useragent="kube-probe/1.20" 
time="2022-03-03T15:54:40.160697615Z" level=debug msg="s3aws.ListObjectsV2Pages(automatic-registry-b9b2251f-bf91-4214-87e5-1ab5e5f27a9d/docker/registry/v2/repositories/django/django-psql-persistent/)" go.version=go1.16.12 http.request.host="10.129.2.89:5000" http.request.id=2b2a85f8-f3bb-4e57-8a6f-82fc8d00295f http.request.method=GET http.request.remoteaddr="10.129.2.1:54922" http.request.uri="/v2/_catalog?n=5" http.request.useragent="kube-probe/1.20" trace.duration=100.414272ms

Comment 1 Jason Montleon 2022-03-08 16:09:49 UTC
I have seen this happen as well with NooBaa in the last week or so. Did you increase the readiness and liveness timeouts in the MigrationController CR on both clusters? The values are set per cluster, so setting them only on the cluster where the controller runs is not sufficient.

If you did increase them on both clusters, what value did you use? If you tried something smaller, can you try a large value like 300 on both clusters and see if that resolves the issue?
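
For reference, a minimal sketch of what such an override could look like in the MigrationController CR on each cluster. The spec keys below are assumptions for illustration only; check the MigrationController CRD or the MTC documentation for the exact parameter names supported by your MTC version.

  apiVersion: migration.openshift.io/v1alpha1
  kind: MigrationController
  metadata:
    name: migration-controller
    namespace: openshift-migration
  spec:
    # Hypothetical keys -- the real parameter names may differ.
    migration_registry_readiness_timeout: 300   # seconds
    migration_registry_liveness_timeout: 300    # seconds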

Comment 3 Jason Montleon 2022-04-06 02:15:14 UTC
To remedy this for the typical case, we're increasing the default liveness and readiness probe timeouts from 3 seconds to 300 seconds.

https://github.com/konveyor/mig-controller/pull/1269 / https://github.com/konveyor/mig-controller/pull/1270

https://github.com/konveyor/mig-controller/commit/0fda45f8771ed3ee4b9bd9a89ce49f50e2ee106f
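
The effect on the migration registry deployment is roughly the following probe configuration. This is an illustrative snippet, not the literal manifest produced by mig-controller; the probe path and port are taken from the kube-probe request visible in the registry log above.

  readinessProbe:
    httpGet:
      path: /v2/_catalog?n=5
      port: 5000
    timeoutSeconds: 300   # previously defaulted to 3
  livenessProbe:
    httpGet:
      path: /v2/_catalog?n=5
      port: 5000
    timeoutSeconds: 300   # previously defaulted to 3

With the old 3-second timeout, slow S3 list calls against the NooBaa backend could cause the probes to fail and the kubelet to restart the pod, which is consistent with the CrashLoopBackOff behaviour reported here.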

Comment 9 errata-xmlrpc 2022-05-05 13:50:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.7.1 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1734

