1881473 – Stage action is stuck at Running status for a long time due to stage pod OutOfcpu


Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Migration Toolkit for Containers
Classification:	Red Hat
Component:	General
Sub Component:
Version:	1.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	1.6.0
Assignee:	Alay Patel
QA Contact:	Xin jiang
Docs Contact:	Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-09-22 13:30 UTC by Xin jiang
Modified:	2021-09-01 13:36 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-09-01 13:36:56 UTC
Target Upstream Version:
Embargoed:


Description Xin jiang 2020-09-22 13:30:43 UTC

Description of problem:
Because creation of the stage pod failed with OutOfcpu on the source cluster, the Stage action has been stuck for more than 35 minutes.

Version-Release number of selected component (if applicable):
MTC 1.3.0

How reproducible:
Always

Steps to Reproduce:
1. To reproduce this issue, add a resource quota that limits the CPU resource in the application's namespace on the source cluster.
2. Create a migplan to migrate the application and launch the Stage action.
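
For step 1, the report does not include the quota that was used. The following is a minimal Python sketch that builds a ResourceQuota manifest for the application's namespace. The quota name `stage-cpu-quota` and the `500m` value are assumptions chosen for illustration; only the namespace `ocp-29918-hooks` comes from this report, and the value may need tuning before the stage pod actually ends up in OutOfcpu.

```python
import json


def cpu_quota_manifest(namespace, cpu_limit, name="stage-cpu-quota"):
    """Build a ResourceQuota that caps total CPU requests and limits in a namespace.

    The default name and any cpu_limit passed in are illustrative assumptions,
    not values taken from the bug report.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"hard": {"requests.cpu": cpu_limit, "limits.cpu": cpu_limit}},
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be piped to the cluster CLI.
    print(json.dumps(cpu_quota_manifest("ocp-29918-hooks", "500m"), indent=2))
```

If the sketch is saved as a script (the file name is up to you), its JSON output can be piped to `oc apply -f -` on the source cluster.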

Actual results:
The Stage action is stuck at Stage Running status for a long time.

Expected results:
The Stage should fail after the Stage process has retried for a period of time.

Additional info:
Source cluster:
$ oc get pod -n ocp-29918-hooks
NAME                                            READY   STATUS     RESTARTS   AGE
nginx-deployment-69ff56478c-d6rxn               1/1     Running    0          125m
stage-nginx-deployment-69ff56478c-d6rxn-2lxvt   0/1     OutOfcpu   0          15m

Target cluster:
$ oc get migmigration -n openshift-migration c530ab50-fcb6-11ea-a51b-e794783a4dec
NAME                                   READY   PLAN              STAGE   ITINERARY   PHASE              AGE
c530ab50-fcb6-11ea-a51b-e794783a4dec   True    ocp-29918-hooks   true    Stage       StagePodsCreated   45m

Comment 2 Erik Nelson 2021-06-29 17:57:55 UTC

As of 1.4.z+, we think a status condition should be raised in the event that the pod doesn't become healthy during a certain timeout period. Let's verify that to be the case as of 1.6.0 and we can close as fixed.
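
The timeout behavior described in this comment can be sketched roughly as follows. This is an illustrative Python model, not the mig-controller implementation: the `StagePod` type, the `evaluate_stage_pods` function, the 600-second default, and the set of statuses treated as unrecoverable are all assumptions (only `OutOfcpu` comes from this report).

```python
from dataclasses import dataclass

# Assumed set of unrecoverable statuses; only "OutOfcpu" appears in the report.
TERMINAL_STATUSES = {"OutOfcpu", "Failed"}


@dataclass
class StagePod:
    name: str
    status: str  # value shown in the STATUS column of `oc get pod`
    ready: bool


def evaluate_stage_pods(pods, elapsed_seconds, timeout_seconds=600):
    """Return (outcome, reason), where outcome is "ready", "failed", or "waiting"."""
    # Fail immediately if any stage pod is in a status it cannot recover from.
    for pod in pods:
        if pod.status in TERMINAL_STATUSES:
            return "failed", f"stage pod {pod.name} is {pod.status}"
    # Succeed once every stage pod reports ready.
    if all(pod.ready for pod in pods):
        return "ready", ""
    # Otherwise keep waiting, but only until the timeout expires.
    if elapsed_seconds >= timeout_seconds:
        not_ready = ", ".join(p.name for p in pods if not p.ready)
        return "failed", f"stage pods not ready after {timeout_seconds}s: {not_ready}"
    return "waiting", ""
```

With the pod from the description, `evaluate_stage_pods([StagePod("stage-nginx-deployment-69ff56478c-d6rxn-2lxvt", "OutOfcpu", False)], 900)` yields a `"failed"` outcome instead of leaving the migration in the StagePodsCreated phase.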

Comment 3 Alay Patel 2021-09-01 13:36:17 UTC

https://github.com/konveyor/mig-controller/pull/747

This PR will report that the stage pod failed and will fail the migration. Closing this since the fix is released; please re-open if the problem is not solved.
