Bug 1907828 - Stage pods stuck as a PVC they are trying to attach is in 'Terminating'
Summary: Stage pods stuck as a PVC they are trying to attach is in 'Terminating'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 1.4.0
Assignee: Shawn Hurley
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-15 10:48 UTC by Sergio
Modified: 2021-02-11 12:55 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-11 12:54:50 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | konveyor mig-controller pull 889 | 0 | None | closed | Bug 1907828: Adding claim health checking for a waiting pods | 2021-01-29 09:25:52 UTC
Github | konveyor mig-controller pull 895 | 0 | None | closed | Bug 1907828: Adding claim health checking for a waiting to be created pod | 2021-01-29 09:25:08 UTC
Github | konveyor mig-controller pull 932 | 0 | None | open | Bug 1907828: Adding pvc health checking when pod is pending | 2021-01-28 17:37:30 UTC
Github | konveyor mig-controller pull 933 | 0 | None | open | Bug 1907828: Adding pvc health checking when pod is pending | 2021-01-28 20:02:55 UTC
Red Hat Product Errata | RHBA-2020:5329 | 0 | None | None | None | 2021-02-11 12:55:09 UTC

Description Sergio 2020-12-15 10:48:13 UTC
Description of problem:
Sometimes a migration gets stuck: the stage pods are created but stay in Pending because they cannot mount a PVC that is stuck in the Terminating state.

Version-Release number of selected component (if applicable):
MTC 1.4.0
SOURCE CLUSTER: AZURE 4.3
TARGET CLUSTER: AZURE 4.6
REPLICATION REPOSITORY: AZURE

How reproducible:
Always

Steps to Reproduce:
1) Create a Pod that mounts a PVC
2) Create a MigPlan that references the Pod and PVC, using indirect migration.
3) Delete the PVC while it is mounted to the Pod. The PVC will stay in Terminating.
4) Run a migration with the MigPlan

Actual results:
The stage pod stays in "ContainerCreating" status indefinitely and the migration run is stuck.

$ oc get pods
NAME                                           READY     STATUS              RESTARTS   AGE
nginx-deployment-b99766f9c-2zwfd               1/1       Running             0          36m
stage-nginx-deployment-b99766f9c-2zwfd-4jc75   0/1       ContainerCreating   0          32m

$ oc get pvc
NAME         STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
nginx-html   Terminating   pvc-377f520b-4595-4c35-be3b-cc2c69853ead   1Gi        RWO            managed-premium   36m
nginx-logs   Terminating   pvc-e25278b3-1cc6-45c0-a8c5-2dd066ce70b4   1Gi        RWO            managed-premium   36m
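
For reference, the Terminating value that oc get pvc prints is not a PVC phase. It is shown when metadata.deletionTimestamp is set on the claim while a finalizer (normally kubernetes.io/pvc-protection, because a pod still uses the claim) blocks the actual removal. A minimal Python sketch of spotting such claims in the output of oc get pvc -o json follows. The function name and the sample data (including the timestamp and the "healthy-claim" entry) are illustrative assumptions and are not taken from MTC.

```python
import json


def terminating_pvcs(pvc_list_json: str) -> list[str]:
    """Return the names of PVCs that are marked for deletion.

    Expects the JSON printed by `oc get pvc -o json` (a List object).
    """
    items = json.loads(pvc_list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        # A set deletionTimestamp is what the CLI renders as "Terminating".
        if item["metadata"].get("deletionTimestamp")
    ]


# Illustrative input only: the timestamp and the second claim are made up.
sample = json.dumps({
    "items": [
        {"metadata": {
            "name": "nginx-html",
            "deletionTimestamp": "2020-12-15T09:00:00Z",
            "finalizers": ["kubernetes.io/pvc-protection"],
        }},
        {"metadata": {"name": "healthy-claim"}},
    ]
})

print(terminating_pvcs(sample))
```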

MigMigration status:

  status:
    conditions:
    - category: Advisory
      lastTransitionTime: "2020-12-15T09:32:17Z"
      message: 'Step: 23/47'
      reason: StagePodsCreated
      status: "True"
      type: Running
    - category: Required
      lastTransitionTime: "2020-12-15T09:30:56Z"
      message: The migration is ready.
      status: "True"
      type: Ready
    - category: Required
      durable: true
      lastTransitionTime: "2020-12-15T09:31:48Z"
      message: The migration registries are healthy.
      status: "True"
      type: RegistriesHealthy
    - category: Advisory
      durable: true
      lastTransitionTime: "2020-12-15T09:32:15Z"
      message: '[1] Stage pods created.'
      status: "True"
      type: StagePodsCreated
    itinerary: Final
    observedDigest: a02f199bd5de68b777effef8d73e22948d8990bd9f46678bbef11823f36899fa
    phase: StagePodsCreated
    pipeline:
    - completed: "2020-12-15T09:31:51Z"
      message: Completed
      name: Prepare
      started: "2020-12-15T09:30:55Z"
    - completed: "2020-12-15T09:32:14Z"
      message: Completed
      name: Backup
      progress:
      - 'Backup openshift-migration/ocp-32834-pvc-terminating-mig-1608024651-s2s7w:
        40 out of estimated total of 40 objects backed up (18s)'
      started: "2020-12-15T09:31:51Z"
    - message: Waiting for all Stage Pods to start.
      name: StageBackup
      phase: StagePodsCreated
      progress:
      - 'Pod ocp-32834-pvc-terminating/stage-nginx-deployment-b99766f9c-2zwfd-4jc75:
        Container sleep-0 '
      started: "2020-12-15T09:32:14Z"
    - message: Not started
      name: StageRestore
    - message: Not started
      name: DirectImage
    - message: Not started
      name: DirectVolume
    - message: Not started
      name: Restore
    - message: Not started
      name: Cleanup
    startTimestamp: "2020-12-15T09:30:55Z"
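
The status above shows the hang through its pipeline: StageBackup has a started timestamp but no completed one. A hedged Python sketch of flagging such steps from a parsed status follows. The helper names, the 10-minute limit and the "now" value are assumptions chosen for illustration, and the pipeline is abbreviated from the status above.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    # Timestamps in the status look like 2020-12-15T09:32:14Z (UTC).
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stalled_steps(status: dict, now: datetime, limit: timedelta) -> list[tuple[str, timedelta]]:
    """Return (step name, elapsed time) for steps started but not completed within limit."""
    stalled = []
    for step in status.get("pipeline", []):
        if "started" in step and "completed" not in step:
            elapsed = now - parse_ts(step["started"])
            if elapsed > limit:
                stalled.append((step["name"], elapsed))
    return stalled


# Pipeline abbreviated from the MigMigration status in this report.
status = {"pipeline": [
    {"name": "Prepare", "started": "2020-12-15T09:30:55Z", "completed": "2020-12-15T09:31:51Z"},
    {"name": "Backup", "started": "2020-12-15T09:31:51Z", "completed": "2020-12-15T09:32:14Z"},
    {"name": "StageBackup", "started": "2020-12-15T09:32:14Z", "phase": "StagePodsCreated"},
    {"name": "StageRestore", "message": "Not started"},
]}

now = parse_ts("2020-12-15T10:04:14Z")  # illustrative observation time
result = stalled_steps(status, now, timedelta(minutes=10))
print(result)
```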


Expected results:
The migration should fail instead of hanging.

Additional info:
This is a regression of BZ https://bugzilla.redhat.com/show_bug.cgi?id=1854914

Comment 1 Erik Nelson 2020-12-15 15:26:01 UTC
@dymurray what's the expected behavior here? I'm not sure I understand what would be considered a "Bug" here.

Dylan, can you actually confirm this is a bug here? I'm not sure what we should actually expect to happen under these circumstances.

Comment 2 Erik Nelson 2020-12-15 15:39:51 UTC
Discussing with Dylan, it's not clear this is entirely unexpected behavior given the PVC was actually deleted. Furthermore, we'll be moving away from a stage pod approach as direct migrations become the default mode of copy transfer. We'll keep this BZ around for archival purposes, but will be descoping it for the near future.

Comment 3 Sergio 2020-12-17 08:56:36 UTC
This link is a BZ opened by John long ago: https://bugzilla.redhat.com/show_bug.cgi?id=1854914

As a result the bug was fixed, and since then the migration fails when it finds PVCs in "Terminating" status.

In 1.4.0 the fix is not there any more.

Comment 4 Erik Nelson 2021-01-05 17:14:56 UTC
Okay, it sounds like there is potentially a regression here that has to be looked into.

Comment 12 Xin jiang 2021-01-29 09:42:57 UTC
Verified. It now reports an error in the UI:

Danger alert: This migration has following error conditions:
PVC: ocp-cakephpaaaa/mysql, deleted.
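
The linked pull requests describe the fix as PVC health checking while a stage pod is pending. The following Python sketch only illustrates that idea and is not the mig-controller code (which is written in Go): for a pod still in the Pending phase, it looks up each claim the pod mounts and reports claims that are missing or marked for deletion. The function name, the input shapes and the sample timestamps are assumptions. The message format mirrors the UI text quoted above.

```python
def unhealthy_claims(pod: dict, pvcs_by_name: dict) -> list[str]:
    """Return error messages for claims that keep a Pending pod from starting.

    pod is a Pod object as a dict; pvcs_by_name maps claim names to PVC
    objects from the pod's namespace (both shapes are assumed here).
    """
    # A pod shown as ContainerCreating is still in the Pending phase.
    if pod.get("status", {}).get("phase") != "Pending":
        return []

    namespace = pod["metadata"]["namespace"]
    problems = []
    for volume in pod["spec"].get("volumes", []):
        claim = volume.get("persistentVolumeClaim")
        if claim is None:
            continue  # not backed by a PVC (emptyDir, configMap, ...)
        name = claim["claimName"]
        pvc = pvcs_by_name.get(name)
        # Missing, or marked for deletion (displayed as Terminating).
        if pvc is None or pvc["metadata"].get("deletionTimestamp"):
            problems.append(f"PVC: {namespace}/{name}, deleted.")
    return problems


# Sample shaped after the reproducer in this report; timestamps are made up.
stage_pod = {
    "metadata": {"namespace": "ocp-32834-pvc-terminating",
                 "name": "stage-nginx-deployment-b99766f9c-2zwfd-4jc75"},
    "status": {"phase": "Pending"},
    "spec": {"volumes": [
        {"name": "html", "persistentVolumeClaim": {"claimName": "nginx-html"}},
        {"name": "logs", "persistentVolumeClaim": {"claimName": "nginx-logs"}},
        {"name": "scratch", "emptyDir": {}},
    ]},
}
pvcs = {
    "nginx-html": {"metadata": {"name": "nginx-html", "deletionTimestamp": "2020-12-15T09:00:00Z"}},
    "nginx-logs": {"metadata": {"name": "nginx-logs", "deletionTimestamp": "2020-12-15T09:00:00Z"}},
}

for message in unhealthy_claims(stage_pod, pvcs):
    print(message)
```

With a check of this kind, a controller could fail the migration as soon as it sees a non-empty result instead of waiting for the stage pod to start.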

Comment 14 errata-xmlrpc 2021-02-11 12:54:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5329

