Bug 1881473

Summary: Stage action is stuck at Running status for a long time due to stage pod OutOfcpu
Product: Migration Toolkit for Containers Reporter: Xin jiang <xjiang>
Component: General Assignee: Alay Patel <alpatel>
Status: CLOSED NOTABUG QA Contact: Xin jiang <xjiang>
Severity: medium Docs Contact: Avital Pinnick <apinnick>
Priority: medium    
Version: 1.3.0 CC: chezhang, ernelson, sregidor, whu
Target Milestone: ---   
Target Release: 1.6.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-09-01 13:36:56 UTC Type: Bug

Description Xin jiang 2020-09-22 13:30:43 UTC
Description of problem:
Because creation of the stage pod failed with OutOfcpu in the source cluster, the Stage action stays stuck for a long time (more than 35 minutes).

Version-Release number of selected component (if applicable):
MTC 1.3.0

How reproducible:
always

Steps to Reproduce:
1. To reproduce this issue, add a resource quota that limits the CPU resource in the application's namespace in the source cluster.
2. Create a migplan to migrate the application and launch the Stage action.
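
A minimal sketch of the quota from step 1, assuming a ResourceQuota on `requests.cpu` and `limits.cpu`. The quota name `cpu-quota` and the value `100m` are hypothetical; the report does not say which values were used, and whether a quota alone produces OutOfcpu may depend on the cluster. The printed JSON could be piped to `oc apply -f -` on the source cluster.

```python
import json


def cpu_quota_manifest(namespace, cpu, name="cpu-quota"):
    """Build a ResourceQuota manifest that caps total CPU in a namespace.

    The default name and any CPU value passed in are illustrative only.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"hard": {"requests.cpu": cpu, "limits.cpu": cpu}},
    }


# Namespace taken from the report; "100m" is an assumed value.
print(json.dumps(cpu_quota_manifest("ocp-29918-hooks", "100m"), indent=2))
```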

Actual results:
The Stage action is stuck at Stage Running status for a long time

Expected results:
The Stage should fail after the Stage process has retried for a period of time.
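
The expected behaviour could be sketched roughly as below. This is not the mig-controller implementation: the function name `stage_outcome`, the set of terminal statuses, and the 10-minute timeout are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Statuses treated as unrecoverable in this sketch (an assumption).
TERMINAL_STATUSES = {"OutOfcpu", "Failed"}


def stage_outcome(pod_status, created_at, now, timeout=timedelta(minutes=10)):
    """Classify a stage pod as "healthy", "waiting", or "failed".

    The pod fails immediately on a terminal status, or once it has been
    unhealthy for longer than `timeout` (hypothetical default).
    """
    if pod_status == "Running":
        return "healthy"
    if pod_status in TERMINAL_STATUSES:
        return "failed"
    if now - created_at > timeout:
        return "failed"
    return "waiting"


# Illustrative timestamps: the stage pod in the report had an AGE of 15m.
created = datetime(2020, 9, 22, 13, 15, tzinfo=timezone.utc)
now = created + timedelta(minutes=15)
print(stage_outcome("OutOfcpu", created, now))  # prints "failed"
```

With a rule like this, the Stage would move to a failed state instead of remaining in Running indefinitely.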

Additional info:
Source cluster:
$ oc get pod -n ocp-29918-hooks
NAME                                            READY   STATUS     RESTARTS   AGE
nginx-deployment-69ff56478c-d6rxn               1/1     Running    0          125m
stage-nginx-deployment-69ff56478c-d6rxn-2lxvt   0/1     OutOfcpu   0          15m

Target cluster:
$ oc get migmigration -n openshift-migration c530ab50-fcb6-11ea-a51b-e794783a4dec
NAME                                   READY   PLAN              STAGE   ITINERARY   PHASE              AGE
c530ab50-fcb6-11ea-a51b-e794783a4dec   True    ocp-29918-hooks   true    Stage       StagePodsCreated   45m
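
For triage, the source-cluster listing above can be scanned for stage pods that are not Running. This is a hedged sketch: the helper name `unhealthy_stage_pods` is hypothetical, the `stage-` name prefix is inferred from the pod name in this report, and the parser assumes no column value contains spaces.

```python
def unhealthy_stage_pods(oc_output):
    """Return (name, status) for pods named stage-* whose STATUS is not Running."""
    rows = [line.split() for line in oc_output.strip().splitlines()]
    header, body = rows[0], rows[1:]
    name_i, status_i = header.index("NAME"), header.index("STATUS")
    return [
        (row[name_i], row[status_i])
        for row in body
        if row[name_i].startswith("stage-") and row[status_i] != "Running"
    ]


# Output of `oc get pod -n ocp-29918-hooks` copied from this report.
SAMPLE = """\
NAME                                            READY   STATUS     RESTARTS   AGE
nginx-deployment-69ff56478c-d6rxn               1/1     Running    0          125m
stage-nginx-deployment-69ff56478c-d6rxn-2lxvt   0/1     OutOfcpu   0          15m
"""

print(unhealthy_stage_pods(SAMPLE))
# prints [('stage-nginx-deployment-69ff56478c-d6rxn-2lxvt', 'OutOfcpu')]
```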

Comment 2 Erik Nelson 2021-06-29 17:57:55 UTC
As of 1.4.z+, we think a status condition should be raised in the event that the pod doesn't become healthy during a certain timeout period. Let's verify that to be the case as of 1.6.0 and we can close as fixed.

Comment 3 Alay Patel 2021-09-01 13:36:17 UTC
https://github.com/konveyor/mig-controller/pull/747

This PR will report that the stage pod failed and will fail the migration. Closing this since the fix is released; please re-open if the problem is not solved.