Bug 1899562 - MigMigration custom resource does not display an error message when a migration fails because of volume mount error
Summary: MigMigration custom resource does not display an error message when a migration fails because of volume mount error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.3.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 1.6.0
Assignee: Alay Patel
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-19 14:54 UTC by Sergio
Modified: 2023-09-15 00:51 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 14:34:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
must-gather-file (10.37 MB, application/gzip)
2020-11-19 14:54 UTC, Sergio


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2021:3694 0 None None None 2021-09-29 14:34:55 UTC

Description Sergio 2020-11-19 14:54:03 UTC
Created attachment 1730956 [details]
must-gather-file

Description of problem:
When we execute a migration, stage pod creation can fail while mounting a volume (FailedMount). When this happens, the migration reports success but the volume data is not migrated.


Version-Release number of selected component (if applicable):
CAM 1.2.3
SOURCE CLUSTER: 3.11 AWS
TARGET CLUSTER: 4.2 AWS
REPLICATION REPOSITORY: S3

How reproducible:
Intermittent


Steps to Reproduce:
We have not been able to reproduce it


Actual results:
When the stage pod is created in the target cluster and there are problems mounting the volume, the volume data is not migrated. The migration status is reported as successful, though.

Expected results:
The migration must fail (or at least report a warning).


Additional info:
The must-gather file is attached. The migration with this problem is: ocp-24659-mysql-migplan-1605706788


These are the events in the target cluster's migrated namespace:

$ oc get events
LAST SEEN   TYPE      REASON                   OBJECT                          MESSAGE
115m        Normal    Scheduled                pod/mysql-1-deploy              Successfully assigned ocp-24659-mysql/mysql-1-deploy to ip-10-0-155-228.us-east-2.compute.internal
115m        Normal    Pulled                   pod/mysql-1-deploy              Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:dc4f29574e61490a3136c99f872027e2ed4a3502cb4aefab6353cccc499b9b7d" already present on machine
115m        Normal    Created                  pod/mysql-1-deploy              Created container deployment
115m        Normal    Started                  pod/mysql-1-deploy              Started container deployment
115m        Normal    Scheduled                pod/mysql-1-pjlj5               Successfully assigned ocp-24659-mysql/mysql-1-pjlj5 to ip-10-0-155-228.us-east-2.compute.internal
115m        Normal    SuccessfulAttachVolume   pod/mysql-1-pjlj5               AttachVolume.Attach succeeded for volume "pvc-ddd792c8-29a3-11eb-934a-02177daf6b80"
115m        Normal    Pulling                  pod/mysql-1-pjlj5               Pulling image "quay.io/openshifttest/mysql:5.7"
114m        Normal    Pulled                   pod/mysql-1-pjlj5               Successfully pulled image "quay.io/openshifttest/mysql:5.7"
114m        Normal    Created                  pod/mysql-1-pjlj5               Created container mysql
114m        Normal    Started                  pod/mysql-1-pjlj5               Started container mysql
115m        Normal    SuccessfulCreate         replicationcontroller/mysql-1   Created pod: mysql-1-pjlj5
116m        Normal    WaitForFirstConsumer     persistentvolumeclaim/mysql     waiting for first consumer to be created before binding
116m        Normal    ProvisioningSucceeded    persistentvolumeclaim/mysql     Successfully provisioned volume pvc-ddd792c8-29a3-11eb-934a-02177daf6b80 using kubernetes.io/aws-ebs
115m        Normal    DeploymentCreated        deploymentconfig/mysql          Created new replication controller "mysql-1" for version 1
116m        Normal    Scheduled                pod/stage-mysql-1-9bqt7-jfrw4   Successfully assigned ocp-24659-mysql/stage-mysql-1-9bqt7-jfrw4 to ip-10-0-155-228.us-east-2.compute.internal
116m        Warning   FailedAttachVolume       pod/stage-mysql-1-9bqt7-jfrw4   AttachVolume.Attach failed for volume "pvc-ddd792c8-29a3-11eb-934a-02177daf6b80" : "Error attaching EBS volume \"vol-0304082dd6c6a6561\"" to instance "i-0c12bf5d5091d3855" since volume is in "creating" state
116m        Normal    SuccessfulAttachVolume   pod/stage-mysql-1-9bqt7-jfrw4   AttachVolume.Attach succeeded for volume "pvc-ddd792c8-29a3-11eb-934a-02177daf6b80"
114m        Warning   FailedMount              pod/stage-mysql-1-9bqt7-jfrw4   Unable to mount volumes for pod "stage-mysql-1-9bqt7-jfrw4_ocp-24659-mysql(de0b1637-29a3-11eb-934a-02177daf6b80)": timeout expired waiting for volumes to attach or mount for pod "ocp-24659-mysql"/"stage-mysql-1-9bqt7-jfrw4". list of unmounted volumes=[mysql-data default-token-tbmbv]. list of unattached volumes=[mysql-data default-token-tbmbv]
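The events above show the pattern the controller would need to detect: a Warning event with reason FailedMount (or FailedAttachVolume) on a stage pod, while the migration itself reports success. A minimal sketch of that detection logic follows; this is not MTC's actual controller code, and the `stage-` name prefix and event-dict shape are assumptions based on the output shown above.

```python
# Sketch (not MTC's actual implementation): scan namespace events for
# volume mount failures on stage pods, so a migration controller could
# raise a warning instead of silently reporting success.
# Event dicts loosely mirror the columns of `oc get events`.

STAGE_POD_PREFIX = "stage-"  # assumption: stage pods are named stage-<name>-<hash>

def find_mount_failures(events):
    """Return (object, message) pairs for FailedMount/FailedAttachVolume
    warnings that involve stage pods."""
    failures = []
    for ev in events:
        if ev.get("type") != "Warning":
            continue
        if ev.get("reason") not in ("FailedMount", "FailedAttachVolume"):
            continue
        obj = ev.get("object", "")
        if obj.startswith("pod/" + STAGE_POD_PREFIX):
            failures.append((obj, ev.get("message", "")))
    return failures

events = [
    {"type": "Normal", "reason": "Scheduled",
     "object": "pod/stage-mysql-1-9bqt7-jfrw4", "message": "assigned"},
    {"type": "Warning", "reason": "FailedMount",
     "object": "pod/stage-mysql-1-9bqt7-jfrw4",
     "message": "timeout expired waiting for volumes to attach or mount"},
    # not a stage pod, so it is ignored even though it is a Warning
    {"type": "Warning", "reason": "FailedAttachVolume",
     "object": "pod/mysql-1-pjlj5", "message": "volume is in creating state"},
]

print(find_mount_failures(events))
# → [('pod/stage-mysql-1-9bqt7-jfrw4', 'timeout expired waiting for volumes to attach or mount')]
```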





We can see that the migration execution does not report a failure here:


apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
metadata:
  annotations:
    openshift.io/touch: 09007540-29a4-11eb-bc67-0a580a81020a
  creationTimestamp: "2020-11-18T13:40:15Z"
  generation: 30
  labels:
    controller-tools.k8s.io: "1.0"
    migration.openshift.io/migplan-name: ocp-24659-mysql-migplan-1605706788
  name: ocp-24659-mysql-mig-1605706788
  namespace: openshift-migration
  ownerReferences:
  - apiVersion: migration.openshift.io/v1alpha1
    kind: MigPlan
    name: ocp-24659-mysql-migplan-1605706788
    uid: 87d78219-29a3-11eb-934a-02177daf6b80
  resourceVersion: "95120"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/ocp-24659-mysql-mig-1605706788
  uid: 9726dd4c-29a3-11eb-934a-02177daf6b80
spec:
  migPlanRef:
    name: ocp-24659-mysql-migplan-1605706788
    namespace: openshift-migration
  stage: false
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2020-11-18T13:43:26Z"
    message: The migration has completed successfully.
    reason: Completed
    status: "True"
    type: Succeeded
  itinerary: Final
  observedDigest: b5aa4b8db19eb80e26b8af29c725d849332f5ff9c561c2db78c3ad60df28a89f
  phase: Completed
  startTimestamp: "2020-11-18T13:40:15Z"
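For contrast, the expected behavior would be an extra warning entry under `status.conditions`, roughly like the following. This condition type, category, and message are hypothetical illustrations of what the bug asks for, not actual MTC output:

```yaml
  - category: Warn
    lastTransitionTime: "2020-11-18T13:43:26Z"
    message: Stage pod "stage-mysql-1-9bqt7-jfrw4" failed to mount volume (FailedMount); volume data may not have been migrated.
    status: "True"
    type: StagePodVolumeMountFailed
```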

Comment 2 Erik Nelson 2021-06-29 17:54:13 UTC
As of MTC 1.4.z+, we expect a status condition to be raised that warns in the event that pods are hung.

There are also no stage pods in DVM (direct volume migration), and we expect to deprecate indirect migration as a legacy state transfer method.

Let's verify this as fixed in 1.6.0.

Comment 3 Erik Nelson 2021-08-11 01:15:40 UTC
Indirect migrations will not be deprecated in 1.6.0; let's confirm this status condition on the eng side and hand it to QE for verification.

Comment 8 Aziza Karol 2021-09-22 15:06:04 UTC
Alay,


I do not see any PR attached to this BZ. Can you share the PR that implements this fix?

Thanks,
Aziza

Comment 9 Sergio 2021-09-22 15:38:43 UTC
Hello,

We are having problems reproducing this issue. We are not able to force a failure in the stage pod so that it cannot mount the volume.

We observed this behavior intermittently in MTC 1.2.3, but we no longer see this volume mount error in our runs using MTC 1.6.0.

Comment 10 Sergio 2021-09-23 17:41:08 UTC
Verified using MTC 1.6.0

SOURCE CLUSTER: AWS 4.6 PROXY
TARGET CLUSTER: AWS 4.9 PROXY (CONTROLLER + UI)

openshift-migration-rhel8-operator@sha256:7963e612abfe195c9d7781b45324c3af2d3b0fdca6900bbc9603a643b9b66cac
    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 2cfae5a025cad6e0ec421958ff9bdff1bceb6bec132d1992ea2a9e342be1c04f


When the stage pod cannot mount the volume, it remains in the namespace in ContainerCreating status.
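This "stuck in ContainerCreating" state is what the MTC 1.4.z+ warning condition (Comment 2) is meant to catch. A minimal sketch of that check follows; the 10-minute threshold and function shape are assumptions for illustration, not MTC's actual values or code:

```python
# Sketch (assumed logic, not MTC's implementation): treat a stage pod as
# "hung" if it is still Pending with a container stuck in
# ContainerCreating beyond a grace period.

from datetime import datetime, timedelta

HUNG_THRESHOLD = timedelta(minutes=10)  # assumed value, not MTC's default

def stage_pod_is_hung(phase, container_state, created_at, now):
    """Return True if the pod has been stuck creating for too long."""
    if phase != "Pending" or container_state != "ContainerCreating":
        return False
    return now - created_at > HUNG_THRESHOLD

now = datetime(2021, 9, 23, 12, 0)
# Stuck for 30 minutes: hung.
print(stage_pod_is_hung("Pending", "ContainerCreating",
                        now - timedelta(minutes=30), now))  # → True
# Running normally: not hung.
print(stage_pod_is_hung("Running", "Started",
                        now - timedelta(minutes=30), now))  # → False
```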


Moved to VERIFIED.

Comment 12 errata-xmlrpc 2021-09-29 14:34:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694

Comment 13 Red Hat Bugzilla 2023-09-15 00:51:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

