Bug 1881473 - Stage action is stuck at Running status for a long time due to stage pod OutOfcpu
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 1.6.0
Assignee: Alay Patel
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-22 13:30 UTC by Xin jiang
Modified: 2021-09-01 13:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-01 13:36:56 UTC
Target Upstream Version:
Embargoed:



Description Xin jiang 2020-09-22 13:30:43 UTC
Description of problem:
Because creation of the stage pod failed with OutOfcpu in the source cluster, the Stage action is stuck for a long time (more than 35 minutes).

Version-Release number of selected component (if applicable):
MTC 1.3.0

How reproducible:
always

Steps to Reproduce:
1. To reproduce this issue, add a resource quota that limits the CPU resource in the application's namespace in the source cluster.
2. Create a migplan to migrate the application and launch the Stage action.
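
For step 1, a minimal sketch of generating such a quota is shown here. The helper name quota_manifest, the quota name cpu-quota, and the CPU value are assumptions; the report does not say which quota was used, and whether a given value triggers OutOfcpu depends on the cluster.

```shell
# Hypothetical helper: print a ResourceQuota manifest that caps the total
# CPU requests in a namespace. The quota name "cpu-quota" is an assumption.
quota_manifest() {
  ns="$1"    # namespace of the application in the source cluster
  cpu="$2"   # total CPU requests allowed, for example 100m
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: cpu-quota
  namespace: ${ns}
spec:
  hard:
    requests.cpu: "${cpu}"
EOF
}

# Possible usage on the source cluster (not run here):
#   quota_manifest ocp-29918-hooks 100m | oc apply -f -
```

The manifest is printed to stdout so it can be reviewed before it is piped to oc apply.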

Actual results:
The Stage action is stuck at Stage Running status for a long time

Expected results:
The Stage action should fail after the Stage process has retried for a period of time.

Additional info:
Source cluster:
$ oc get pod -n ocp-29918-hooks
NAME                                            READY   STATUS     RESTARTS   AGE
nginx-deployment-69ff56478c-d6rxn               1/1     Running    0          125m
stage-nginx-deployment-69ff56478c-d6rxn-2lxvt   0/1     OutOfcpu   0          15m
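
A listing like the one above can be filtered to spot stage pods that never became healthy. This is only a sketch: the helper name unhealthy_stage_pods is hypothetical, and it assumes stage pod names start with "stage-", as they do in this listing.

```shell
# Hypothetical helper: read `oc get pod` output on stdin and print
# "<pod> <status>" for every stage pod whose STATUS column is neither
# Running nor Completed. The header line is skipped.
unhealthy_stage_pods() {
  awk 'NR > 1 && $1 ~ /^stage-/ && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Possible usage on the source cluster (not run here):
#   oc get pod -n ocp-29918-hooks | unhealthy_stage_pods
```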

Target cluster:
$ oc get migmigration -n openshift-migration c530ab50-fcb6-11ea-a51b-e794783a4dec
NAME                                   READY   PLAN              STAGE   ITINERARY   PHASE              AGE
c530ab50-fcb6-11ea-a51b-e794783a4dec   True    ocp-29918-hooks   true    Stage       StagePodsCreated   45m

Comment 2 Erik Nelson 2021-06-29 17:57:55 UTC
As of 1.4.z+, we think a status condition should be raised in the event that the pod doesn't become healthy during a certain timeout period. Let's verify that to be the case as of 1.6.0 and we can close as fixed.
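
To illustrate the timeout idea from the outside, here is a rough shell sketch that retries a check and gives up after a fixed period. It is not how mig-controller implements this; the helper name wait_or_fail and its arguments are hypothetical.

```shell
# Hypothetical helper: run the given command every $interval seconds until
# it succeeds. Returns 0 on success, or 1 once $timeout seconds have elapsed
# without a successful check.
wait_or_fail() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

For example, wait_or_fail 600 10 some_check would retry some_check (a placeholder for any command that exits 0 once the stage pod is healthy) every 10 seconds and return a failure after 10 minutes, instead of waiting indefinitely.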

Comment 3 Alay Patel 2021-09-01 13:36:17 UTC
https://github.com/konveyor/mig-controller/pull/747

This PR reports that the stage pod failed and fails the migration. Closing this since the fix is released. Please re-open if the problem is not solved.

