Bug 1916281

Summary: Different PVCs are using the same DVMP resource
Product: Migration Toolkit for Containers Reporter: Sergio <sregidor>
Component: General Assignee: Alay Patel <alpatel>
Status: CLOSED ERRATA QA Contact: Xin jiang <xjiang>
Severity: high Docs Contact: Avital Pinnick <apinnick>
Priority: high    
Version: 1.4.0 CC: chezhang, ernelson, rjohnson, rpattath, whu, xjiang
Target Milestone: ---   
Target Release: 1.4.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-02-11 12:55:27 UTC Type: Bug

Description Sergio 2021-01-14 13:20:53 UTC
Description of problem:
The DVMP (DirectVolumeMigrationProgress) resource name is derived only from the name of the PVC. Two PVCs with the same name in two different namespaces therefore map to the same DVMP resource and collide.
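
The collision can be illustrated with a minimal sketch. The function `dvmp_name_from_pvc` is hypothetical and only models the behaviour described here (it is not the MTC implementation); the name prefix is the one visible in the resource names pasted in this report.

```python
# Hypothetical model of the naming behaviour described in this bug:
# the DVMP name depends only on the PVC name, so the namespace is lost.
def dvmp_name_from_pvc(pvc_name: str) -> str:
    # Prefix copied from the resource names shown in this report.
    return f"directvolumemigration-rsync-transfer-{pvc_name}"


# Same PVC name deployed in two different namespaces.
pvcs = [("name1", "nginx-logs"), ("name2", "nginx-logs")]

# Map each namespace to the DVMP name its PVC would get.
names = {namespace: dvmp_name_from_pvc(pvc) for namespace, pvc in pvcs}

# Both namespaces end up with the same DVMP name.
collision = names["name1"] == names["name2"]
print(names)
print("collision:", collision)
```

Because all DVMP resources live in the same namespace (openshift-migration in this report), two identical names mean a single shared resource.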

Version-Release number of selected component (if applicable):
MTC 1.4.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy an application that uses a PVC in one namespace

For instance (any application that uses a PVC will do):
oc process -p NAMESPACE=name1 -f  https://gitlab.cee.redhat.com/app-mig/cam-helper/raw/master/ocp-26160/nginx_with_pv_defaultsc_template.yml | oc create -f -


2. Deploy the same application used in step 1 in another namespace 

oc process -p NAMESPACE=name2 -f  https://gitlab.cee.redhat.com/app-mig/cam-helper/raw/master/ocp-26160/nginx_with_pv_defaultsc_template.yml | oc create -f -

3. Create a migration plan for the namespace created in step 1

4. Create a migration plan for the namespace created in step 2 

5. Run both migrations in parallel using DVM.

Actual results:
Only one DVMP resource is created per PVC name, so both migrations share it, which leads to conflicts.

Sometimes one of the migrations gets stuck forever because it cannot find the right pods in the right namespace.

Sometimes MTC also seems unable to find the right secret (not confirmed), and one of the migrations fails with an authentication error.


Expected results:
DVMP naming should take into account that two PVCs with the same name can exist in two different namespaces.
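
One possible way to meet this expectation is to derive the name from both the namespace and the PVC name. The sketch below is an assumption for illustration only and is not taken from the MTC code base: the function `namespaced_dvmp_name`, the `namespace/name` key format, and the use of a truncated SHA-256 digest are all hypothetical choices.

```python
import hashlib


def namespaced_dvmp_name(namespace: str, pvc_name: str) -> str:
    """Hypothetical helper: build a resource name that is unique per namespace."""
    # "/" is not a valid character in a namespace or PVC name,
    # so this key cannot be ambiguous.
    key = f"{namespace}/{pvc_name}"
    # A fixed-length lowercase hex digest is a valid Kubernetes object name.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


name_a = namespaced_dvmp_name("name1", "nginx-logs")
name_b = namespaced_dvmp_name("name2", "nginx-logs")
print(name_a)
print(name_b)
print("distinct:", name_a != name_b)
```

A hash keeps the name length bounded even for long namespace and PVC names. The trade-off is that the name is no longer human-readable, so the pod name and namespace have to be read from the resource spec instead.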



Additional info:

This is probably also related to BZ https://bugzilla.redhat.com/show_bug.cgi?id=1915717


For instance, this is the status of the DVM and DVMP resources when the two migrations are run and the authentication problem occurs:

$ oc get dvm  cef14e00-565d-11eb-805d-432851d7046c-c6rrk -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: DirectVolumeMigration
metadata:
  annotations:
    openshift.io/touch: 2fd89d4d-565e-11eb-92d5-0a580a800208
  creationTimestamp: "2021-01-14T11:45:48Z"
  generateName: cef14e00-565d-11eb-805d-432851d7046c-
  generation: 20
  labels:
    app.kubernetes.io/part-of: openshift-migration
    migmigration: fa816bc7-32b4-43a4-bae2-be22a2ec7357
    migration-direct-volume: fa816bc7-32b4-43a4-bae2-be22a2ec7357
  name: cef14e00-565d-11eb-805d-432851d7046c-c6rrk
  namespace: openshift-migration
  ownerReferences:
  - apiVersion: migration.openshift.io/v1alpha1
    controller: true
    kind: MigMigration
    name: cef14e00-565d-11eb-805d-432851d7046c
    uid: fa816bc7-32b4-43a4-bae2-be22a2ec7357
  resourceVersion: "107962"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/directvolumemigrations/cef14e00-565d-11eb-805d-432851d7046c-c6rrk
  uid: 0ac55b34-52d2-4f0a-a8e4-2ff75490c329
spec:
  createDestinationNamespaces: true
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  persistentVolumeClaims:
  - name: nginx-logs
    namespace: name2
    targetAccessModes:
    - ReadWriteOnce
    targetStorageClass: gp2
  - name: nginx-html
    namespace: name2
    targetAccessModes:
    - ReadWriteOnce
    targetStorageClass: gp2
  srcMigClusterRef:
    name: source-cluster
    namespace: openshift-migration
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2021-01-14T11:46:49Z"
    message: 'The migration has failed.  See: Errors.'
    reason: WaitForRsyncClientPodsCompleted
    status: "True"
    type: Failed
  errors:
  - One or more pods are in error state
  failedPods:
  - name: directvolumemigration-rsync-transfer-nginx-logs
    namespace: name2
  - name: directvolumemigration-rsync-transfer-nginx-html
    namespace: name2
  itinerary: VolumeMigration
  observedDigest: fc2aa0f78223d00730b46da66e46deb3a8543990dfc0d11749067fce3cf2e3f3
  phase: Completed
  phaseDescription: Complete
  startTimestamp: "2021-01-14T11:45:48Z"
$ oc get dvm,dvmp -o yaml
apiVersion: v1
items:
- apiVersion: migration.openshift.io/v1alpha1
  kind: DirectVolumeMigration
  metadata:
    annotations:
      openshift.io/touch: 2fd89d4d-565e-11eb-92d5-0a580a800208
    creationTimestamp: "2021-01-14T11:45:48Z"
    generateName: cef14e00-565d-11eb-805d-432851d7046c-
    generation: 20
    labels:
      app.kubernetes.io/part-of: openshift-migration
      migmigration: fa816bc7-32b4-43a4-bae2-be22a2ec7357
      migration-direct-volume: fa816bc7-32b4-43a4-bae2-be22a2ec7357
    name: cef14e00-565d-11eb-805d-432851d7046c-c6rrk
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      controller: true
      kind: MigMigration
      name: cef14e00-565d-11eb-805d-432851d7046c
      uid: fa816bc7-32b4-43a4-bae2-be22a2ec7357
    resourceVersion: "107962"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/directvolumemigrations/cef14e00-565d-11eb-805d-432851d7046c-c6rrk
    uid: 0ac55b34-52d2-4f0a-a8e4-2ff75490c329
  spec:
    createDestinationNamespaces: true
    destMigClusterRef:
      name: host
      namespace: openshift-migration
    persistentVolumeClaims:
    - name: nginx-logs
      namespace: name2
      targetAccessModes:
      - ReadWriteOnce
      targetStorageClass: gp2
    - name: nginx-html
      namespace: name2
      targetAccessModes:
      - ReadWriteOnce
      targetStorageClass: gp2
    srcMigClusterRef:
      name: source-cluster
      namespace: openshift-migration
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: "2021-01-14T11:46:49Z"
      message: 'The migration has failed.  See: Errors.'
      reason: WaitForRsyncClientPodsCompleted
      status: "True"
      type: Failed
    errors:
    - One or more pods are in error state
    failedPods:
    - name: directvolumemigration-rsync-transfer-nginx-logs
      namespace: name2
    - name: directvolumemigration-rsync-transfer-nginx-html
      namespace: name2
    itinerary: VolumeMigration
    observedDigest: fc2aa0f78223d00730b46da66e46deb3a8543990dfc0d11749067fce3cf2e3f3
    phase: Completed
    phaseDescription: Complete
    startTimestamp: "2021-01-14T11:45:48Z"
- apiVersion: migration.openshift.io/v1alpha1
  kind: DirectVolumeMigration
  metadata:
    annotations:
      openshift.io/touch: 3739eba1-565e-11eb-92d5-0a580a800208
    creationTimestamp: "2021-01-14T11:46:05Z"
    generateName: d0fcfc80-565d-11eb-805d-432851d7046c-
    generation: 20
    labels:
      app.kubernetes.io/part-of: openshift-migration
      migmigration: 9bfbd9d0-e434-4603-9c9f-1fc354c41f50
      migration-direct-volume: 9bfbd9d0-e434-4603-9c9f-1fc354c41f50
    name: d0fcfc80-565d-11eb-805d-432851d7046c-5z4zh
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      controller: true
      kind: MigMigration
      name: d0fcfc80-565d-11eb-805d-432851d7046c
      uid: 9bfbd9d0-e434-4603-9c9f-1fc354c41f50
    resourceVersion: "108074"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/directvolumemigrations/d0fcfc80-565d-11eb-805d-432851d7046c-5z4zh
    uid: 143e3efe-b2c5-44e4-897f-25785b91166b
  spec:
    createDestinationNamespaces: true
    destMigClusterRef:
      name: host
      namespace: openshift-migration
    persistentVolumeClaims:
    - name: nginx-logs
      namespace: name1
      targetAccessModes:
      - ReadWriteOnce
      targetStorageClass: gp2
    - name: nginx-html
      namespace: name1
      targetAccessModes:
      - ReadWriteOnce
      targetStorageClass: gp2
    srcMigClusterRef:
      name: source-cluster
      namespace: openshift-migration
  status:
    conditions:
    - category: Advisory
      durable: true
      lastTransitionTime: "2021-01-14T11:47:01Z"
      message: 'The migration has failed.  See: Errors.'
      reason: WaitForRsyncClientPodsCompleted
      status: "True"
      type: Failed
    errors:
    - One or more pods are in error state
    failedPods:
    - name: directvolumemigration-rsync-transfer-nginx-logs
      namespace: name1
    - name: directvolumemigration-rsync-transfer-nginx-html
      namespace: name1
    itinerary: VolumeMigration
    observedDigest: c3222bbe2028829ef2bdbdcf86b81040fac5e0f3392b1bb136296d3f5f9e08b2
    phase: Completed
    phaseDescription: Complete
    startTimestamp: "2021-01-14T11:46:07Z"
- apiVersion: migration.openshift.io/v1alpha1
  kind: DirectVolumeMigrationProgress
  metadata:
    annotations:
      openshift.io/touch: bdf1896c-565e-11eb-92d5-0a580a800208
    creationTimestamp: "2021-01-14T11:46:08Z"
    generation: 4
    name: directvolumemigration-rsync-transfer-nginx-html
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      controller: true
      kind: DirectVolumeMigration
      name: cef14e00-565d-11eb-805d-432851d7046c-c6rrk
      uid: 0ac55b34-52d2-4f0a-a8e4-2ff75490c329
    resourceVersion: "109612"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/directvolumemigrationprogresses/directvolumemigration-rsync-transfer-nginx-html
    uid: 6b639caa-162d-48bf-9c61-9e95dfbd2ae7
  spec:
    clusterRef:
      name: source-cluster
      namespace: openshift-migration
    podRef:
      name: directvolumemigration-rsync-transfer-nginx-html
      namespace: name2
  status:
    conditions:
    - category: Required
      lastTransitionTime: "2021-01-14T11:46:44Z"
      message: The progress is available
      status: "True"
      type: Ready
    logMessage: |
      @ERROR: auth failed on module nginx-html
      rsync error: error starting client-server protocol (code 5) at main.c(1657) [sender=3.1.3]
      2021/01/14 11:46:47 [1] @ERROR: auth failed on module nginx-html
      2021/01/14 11:46:47 [1] rsync error: error starting client-server protocol (code 5) at main.c(1657) [sender=3.1.3]
    observedDigest: 18c12ae50745e070e374f7d47d9c3e0ee07978fac5eabe1af4c2bc89389e34e1
    phase: Failed
- apiVersion: migration.openshift.io/v1alpha1
  kind: DirectVolumeMigrationProgress
  metadata:
    annotations:
      openshift.io/touch: bd94c0bf-565e-11eb-92d5-0a580a800208
    creationTimestamp: "2021-01-14T11:46:08Z"
    generation: 4
    name: directvolumemigration-rsync-transfer-nginx-logs
    namespace: openshift-migration
    ownerReferences:
    - apiVersion: migration.openshift.io/v1alpha1
      controller: true
      kind: DirectVolumeMigration
      name: cef14e00-565d-11eb-805d-432851d7046c-c6rrk
      uid: 0ac55b34-52d2-4f0a-a8e4-2ff75490c329
    resourceVersion: "109609"
    selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/directvolumemigrationprogresses/directvolumemigration-rsync-transfer-nginx-logs
    uid: 2632cadd-fdd7-4393-9c42-0735804526bb
  spec:
    clusterRef:
      name: source-cluster
      namespace: openshift-migration
    podRef:
      name: directvolumemigration-rsync-transfer-nginx-logs
      namespace: name2
  status:
    conditions:
    - category: Required
      lastTransitionTime: "2021-01-14T11:46:44Z"
      message: The progress is available
      status: "True"
      type: Ready
    logMessage: |
      @ERROR: auth failed on module nginx-logs
      rsync error: error starting client-server protocol (code 5) at main.c(1657) [sender=3.1.3]
      2021/01/14 11:46:47 [1] @ERROR: auth failed on module nginx-logs
      2021/01/14 11:46:47 [1] rsync error: error starting client-server protocol (code 5) at main.c(1657) [sender=3.1.3]
    observedDigest: 178550a1c70033979601eed2ee6e072d99d5367bfdc94f10085260f9d5aa3dac
    phase: Failed
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 1 Alay Patel 2021-01-15 16:25:42 UTC
This PR resolves the bug https://github.com/konveyor/mig-controller/pull/891

Comment 5 Xin jiang 2021-01-25 13:31:34 UTC
Verified.

$ oc get dvmp
NAME                               CLUSTER          POD NAME                                          POD NAMESPACE                                         PROGRESS PERCENT   TRANSFER RATE   AGE
4bb56edb85f9307d7e102572f7447e89   source-cluster   directvolumemigration-rsync-transfer-nginx-logs   ocp-55555-nginx-defaultocp-37316-brokenregistry1111   100%                               5h29m
4fc89b2080ac0039872bf47d0b2c69ba   source-cluster   directvolumemigration-rsync-transfer-nginx-logs   name1                                                                                    34s
59edf630dbb9ea8dd1e3d9c6612ceb65   source-cluster   directvolumemigration-rsync-transfer-postgresql   ocp-24730-django                                      100%               57.89MB/s       5h57m
7e1277a4ff45c72e2c99eaeb1af5add8   source-cluster   directvolumemigration-rsync-transfer-nginx-logs   name2                                                                                    51s
86bba428e80d43c351f600841bc6e14c   source-cluster   directvolumemigration-rsync-transfer-nginx-html   ocp-55555-nginx-defaultocp-37316-brokenregistry1111   100%                               5h29m
98a66b6e8e3480dc1f965b2cb449c1ac   source-cluster   directvolumemigration-rsync-transfer-nginx-html   name2                                                                                    51s
a3c85dbd541aefd794f74110b85a1778   source-cluster   directvolumemigration-rsync-transfer-nginx-html   name1                                                                                    34s
e282bda331a55bf11d14bd8f1368a4a9   source-cluster   directvolumemigration-rsync-transfer-postgresql   ocp-24730-django                                      100%               5.30MB/s        5h47m
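
The relevant part of this listing can be checked mechanically. The sketch below is only an illustration: the four rows for name1 and name2 are copied from the output above, while the variable names and the check itself are assumptions and not part of any MTC tooling.

```python
# (DVMP name, pod name, pod namespace) rows for name1 and name2,
# copied from the `oc get dvmp` output above.
rows = [
    ("4fc89b2080ac0039872bf47d0b2c69ba", "directvolumemigration-rsync-transfer-nginx-logs", "name1"),
    ("7e1277a4ff45c72e2c99eaeb1af5add8", "directvolumemigration-rsync-transfer-nginx-logs", "name2"),
    ("98a66b6e8e3480dc1f965b2cb449c1ac", "directvolumemigration-rsync-transfer-nginx-html", "name2"),
    ("a3c85dbd541aefd794f74110b85a1778", "directvolumemigration-rsync-transfer-nginx-html", "name1"),
]

# Every DVMP resource name must be unique.
dvmp_names = [dvmp for dvmp, _, _ in rows]
all_unique = len(set(dvmp_names)) == len(dvmp_names)

# Group the DVMP names by pod name: each pod name should now have
# one DVMP resource per namespace instead of a single shared one.
dvmps_by_pod = {}
for dvmp, pod, namespace in rows:
    dvmps_by_pod.setdefault(pod, {})[namespace] = dvmp

separated = all(
    len(set(per_namespace.values())) == len(per_namespace)
    for per_namespace in dvmps_by_pod.values()
)
print("unique names:", all_unique)
print("one DVMP per namespace:", separated)
```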



images:
registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:cdf1bd56e353f076693cb7373c0a876be8984593d664ee0d7e1aeae7a3c54c1f
registry.redhat.io/rhmtc/openshift-migration-log-reader-rhel8@sha256:6dbd4c4aa27dcaede49f68159b9923840732d67bfb4f14e4107e8ff28f56defa
registry.redhat.io/rhmtc/openshift-migration-rhel7-operator@sha256:79f524931e7188bfbfddf1e3d23f491b627d691ef7849a42432c7aec2d5f8a54
registry.redhat.io/rhmtc/openshift-migration-ui-rhel8@sha256:8d460632dd50529aa0054b14c95e7a44dd6478ad3116ef5a27a4b904fe4360d7
registry.redhat.io/rhmtc/openshift-migration-velero-plugin-for-aws-rhel8@sha256:7c8d143d1ba9605e33e33392dece4a06607ddbdaccfeece36259b7d4fbbeff96
registry.redhat.io/rhmtc/openshift-migration-velero-plugin-for-gcp-rhel8@sha256:4ef0b71cf9d464d39086c024f26df7579279877afbab31935c4fb00ca7c883b9
registry.redhat.io/rhmtc/openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256:bc8beadfeaac4ca72c9aa185176a097501781ad35064f1785183c34e577505f4
registry.redhat.io/rhmtc/openshift-migration-velero-rhel8@sha256:60b3aa7f53afccbbba9630904b10ba3257921769e7b142fd9ceb3df2c5016302
registry.redhat.io/rhmtc/openshift-velero-plugin-rhel8@sha256:c0f0a0698ae9b1ac8ff4279653f688f5cfbf80615efce50e9a03a194a02ede2a

Comment 7 errata-xmlrpc 2021-02-11 12:55:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) tool image release advisory 1.4.0), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5329