Bug 1908721

Summary: directvolumemigration-rsync-transfer pods completed correctly but DVM status threw out error against pods
Product: Migration Toolkit for Containers Reporter: whu
Component: GeneralAssignee: John Matthews <jmatthew>
Status: CLOSED NOTABUG QA Contact: Xin jiang <xjiang>
Severity: unspecified Docs Contact: Avital Pinnick <apinnick>
Priority: unspecified    
Version: 1.4.0CC: ernelson, sregidor, whu, xjiang
Target Milestone: ---   
Target Release: 1.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-18 14:06:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
the log-reader dump from souce and target cluster during the stage running none

Description whu 2020-12-17 12:32:43 UTC
Created attachment 1739975 [details]
the log-reader dump from souce and target  cluster during the stage running

Description of problem:
When run DVM stage migration for a nginx application, the stage process is complete with warning message "DirectImageMigration (dvm): openshift-migration/34688070-404f-11eb-b9a9-d3558b12b711-hb7r2 failed". Check the DVM status, it showed directvolumemigration-rsync-transfer pods failed. but check these pods, they are completed without any error.


*Version-Release number of selected components (if applicable):*
MTC 1.4.0
source cluster : AWS OCP 4.4
target cluster : AWS OCP 4.7

How reproducible:
Always

Steps to Reproduce:
1. deploy a nginx with 2 pvs application in source cluster

$ oc create namespace ocp-00000
$ oc process -p NAMESPACE= ocp-00000 -f nginx_with_pv_defaultsc_template.yml | oc create -f -
$ oc get pvc -n ocp-00000
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-html   Bound    pvc-21bfa21a-d89e-4387-bf53-c72bdf79aa76   1Gi        RWO            gp2            149m
nginx-logs   Bound    pvc-dcb0476b-036f-43d0-b05c-ee6d4058e5c9   1Gi        RWO            gp2            149m


2. create migration plan for ocp-00000
Just follow common migplan creation process, when enter into "Migration option" phase, checked "Use direct PV migration for filesystem copies" option.

3. trigger stage migration

Actual results:
Stage migration was complete with warning message "Staging succeeded e with warnings,DirectImageMigration (dvm): openshift-migration/34688070-404f-11eb-b9a9-d3558b12b711-hb7r2 failed. See in dvm status.Errors"

The DVM status shows 
status:
 .......
  errors:
  - One or more pods are in error state
  failedPods:
  - name: directvolumemigration-rsync-transfer-nginx-html
    namespace: ocp-00000
  - name: directvolumemigration-rsync-transfer-nginx-logs
    namespace: ocp-00000

But the pods directvolumemigration-rsync-transfer-nginx-html and directvolumemigration-rsync-transfer-nginx-logs completed without any error.

Expected results:
If the related pods completed without any error. the stage migration should finished without any error warning message. 

Additional info:

In target cluster

$ oc get migmigration 34688070-404f-11eb-b9a9-d3558b12b711 -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigMigration
.......
spec:
  migPlanRef:
    name: hwh2
    namespace: openshift-migration
  stage: true
status:
  conditions:
  - category: Warn
    durable: true
    lastTransitionTime: "2020-12-17T10:05:25Z"
    message: 'DirectImageMigration (dvm): openshift-migration/34688070-404f-11eb-b9a9-d3558b12b711-hb7r2 failed. See in dvm status.Errors'
    status: "True"
    type: DirectVolumeMigrationFailed
  - category: Advisory
    durable: true
    lastTransitionTime: "2020-12-17T10:05:27Z"
    message: The migration has completed with warnings, please look at `Warn` conditions.
    reason: Completed
    status: "True"
    type: SucceededWithWarnings
  itinerary: Stage
  observedDigest: 3d65757a3919d0ae0518294eb16229c6f2d8782fa8c412589e1bda7da134b9af
  phase: Completed
  pipeline:
  - completed: "2020-12-17T10:04:38Z"
    message: Completed
    name: Prepare
    started: "2020-12-17T10:04:20Z"
  - completed: "2020-12-17T10:05:19Z"
    message: Completed
    name: StageBackup
    started: "2020-12-17T10:04:38Z"
  - message: Skipped
    name: StageRestore
    skipped: true
  - message: Skipped
    name: DirectImage
    skipped: true
  - completed: "2020-12-17T10:05:25Z"
    message: Completed
    name: DirectVolume
    progress:
    - 'Rsync Client Pod ocp-00000/directvolumemigration-rsync-transfer-nginx-html: Failed'
    - 'Rsync Client Pod ocp-00000/directvolumemigration-rsync-transfer-nginx-logs: Failed'
    started: "2020-12-17T10:05:19Z"
  - completed: "2020-12-17T10:05:27Z"
    message: Completed
    name: Cleanup
    started: "2020-12-17T10:05:25Z"
  startTimestamp: "2020-12-17T10:04:20Z"


$ oc get dvm  34688070-404f-11eb-b9a9-d3558b12b711-hb7r2  -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: DirectVolumeMigration
......
spec:
  createDestinationNamespaces: true
  destMigClusterRef:
    name: host
    namespace: openshift-migration
  persistentVolumeClaims:
  - name: nginx-html
    namespace: ocp-00000
    targetAccessModes:
    - ReadWriteOnce
    targetStorageClass: gp2
  - name: nginx-logs
    namespace: ocp-00000
    targetAccessModes:
    - ReadWriteOnce
    targetStorageClass: gp2
  srcMigClusterRef:
    name: source-cluster
    namespace: openshift-migration
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2020-12-17T10:05:24Z"
    message: 'The migration has failed.  See: Errors.'
    reason: WaitForRsyncClientPodsCompleted
    status: "True"
    type: Failed
  errors:
  - One or more pods are in error state
  failedPods:
  - name: directvolumemigration-rsync-transfer-nginx-html
    namespace: ocp-00000
  - name: directvolumemigration-rsync-transfer-nginx-logs
    namespace: ocp-00000
  itinerary: VolumeMigration
  observedDigest: 28bf54b9b6e3c72a6e029b99f89d52e56036f66628d36f15656737f46f9597e7
  phase: Completed
  phaseDescription: Complete
  startTimestamp: "2020-12-17T10:04:39Z"


In source cluster

$ oc get pod -n ocp-00000
NAME                                              READY   STATUS      RESTARTS   AGE
directvolumemigration-rsync-transfer-nginx-html   0/1     Completed   0          65m
directvolumemigration-rsync-transfer-nginx-logs   0/1     Completed   0          65m
directvolumemigration-stunnel-transfer            1/1     Running     0          65m
nginx-deployment-6578f8dd49-bqmnx                 1/1     Running     0          72m


$ oc logs directvolumemigration-rsync-transfer-nginx-html -n ocp-00000
2020/12/17 10:05:27 [1] building file list
sending incremental file list
2020/12/17 10:05:27 [1] .d..tp.g... ./
             21 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=1/3)2020/12/17 10:05:27 [1] <f+++++++++ index.html
2020/12/17 10:05:27 [1] .d..tp.g... lost+found/
             21 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/3)

2020/12/17 10:05:27 [1] Number of files: 3 (reg: 1, dir: 2)
Number of files: 3 (reg: 1, dir: 2)
2020/12/17 10:05:27 [1] Number of created files: 1 (reg: 1)
Number of created files: 1 (reg: 1)
2020/12/17 10:05:27 [1] Number of deleted files: 0
Number of deleted files: 0
2020/12/17 10:05:27 [1] Number of regular files transferred: 1
Number of regular files transferred: 1
2020/12/17 10:05:27 [1] Total file size: 21 bytes
Total file size: 21 bytes
2020/12/17 10:05:27 [1] Total transferred file size: 21 bytes
Total transferred file size: 21 bytes
2020/12/17 10:05:27 [1] Literal data: 21 bytes
Literal data: 21 bytes
2020/12/17 10:05:27 [1] Matched data: 0 bytes
Matched data: 0 bytes
2020/12/17 10:05:27 [1] File list size: 0
File list size: 0
2020/12/17 10:05:27 [1] File list generation time: 0.001 seconds
File list generation time: 0.001 seconds
2020/12/17 10:05:27 [1] File list transfer time: 0.000 seconds
File list transfer time: 0.000 seconds
2020/12/17 10:05:27 [1] Total bytes sent: 195
Total bytes sent: 195
2020/12/17 10:05:27 [1] Total bytes received: 46
Total bytes received: 46
2020/12/17 10:05:27 [1] sent 195 bytes  received 46 bytes  482.00 bytes/sec
sent 195 bytes  received 46 bytes  482.00 bytes/sec
2020/12/17 10:05:27 [1] total size is 21  speedup is 0.09
total size is 21  speedup is 0.09

$ oc get pod directvolumemigration-rsync-transfer-nginx-html -n ocp-00000 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks-status: ""
    openshift.io/scc: privileged
  creationTimestamp: "2020-12-17T10:05:23Z"
  labels:
    app: directvolumemigration-rsync-transfer
    directvolumemigration: rsync-client
  name: directvolumemigration-rsync-transfer-nginx-html
  namespace: ocp-00000
  resourceVersion: "172587"
  selfLink: /api/v1/namespaces/ocp-00000/pods/directvolumemigration-rsync-transfer-nginx-html
  uid: 4796e8aa-87c6-4824-808e-e4eaa674733c
spec:
  containers:
  - command:
    - rsync
    - --archive
    - --hard-links
    - --human-readable
    - --partial
    - --delete
    - --port
    - "2222"
    - --log-file
    - /dev/stdout
    - --info=COPY2,DEL2,REMOVE2,SKIP2,FLIST2,PROGRESS2,STATS2
    - /mnt/ocp-00000/nginx-html/
    - rsync://root.54.74/nginx-html
    env:
    - name: RSYNC_PASSWORD
      value: hxyNIq
    image: quay.io/konveyor/rsync-transfer:latest
    imagePullPolicy: Always
    name: rsync-client
    ports:
    - containerPort: 22
      name: rsync-client
      protocol: TCP
    resources: {}
    securityContext:
      privileged: true
      readOnlyRootFilesystem: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /mnt/ocp-00000/nginx-html
      name: nginx-html
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-ld5c7
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-qlczm
  nodeName: ip-10-0-144-174.us-east-2.compute.internal
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: nginx-html
    persistentVolumeClaim:
      claimName: nginx-html
  - name: default-token-ld5c7
    secret:
      defaultMode: 420
      secretName: default-token-ld5c7
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T10:05:23Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T10:05:23Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T10:05:23Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-12-17T10:05:23Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://4e388df2a621e67b5010464c1427db17c28bf8d7d0fc6e6f830d21b0c8ad1afc
    image: quay.io/konveyor/rsync-transfer:latest
    imageID: quay.io/konveyor/rsync-transfer@sha256:c38f5aad71e9ea45ed775727114f6b0931c0a2ee0acd3bbbd6792f0312bcaa69
    lastState: {}
    name: rsync-client
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: cri-o://4e388df2a621e67b5010464c1427db17c28bf8d7d0fc6e6f830d21b0c8ad1afc
        exitCode: 0
        finishedAt: "2020-12-17T10:05:27Z"
        reason: Completed
        startedAt: "2020-12-17T10:05:27Z"
  hostIP: 10.0.144.174
  phase: Succeeded
  podIP: 10.129.2.34
  podIPs:
  - ip: 10.129.2.34
  qosClass: BestEffort
  startTime: "2020-12-17T10:05:23Z"

In target cluster, after stage migration 
$ oc get pvc -n ocp-00000
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-html   Bound    pvc-585bee25-6206-4e7c-b775-29df7e8522a2   1Gi        RWO            gp2            62m
nginx-logs   Bound    pvc-060a3710-5548-423b-b659-60e850e9ec9d   1Gi        RWO            gp2            62m

$ oc get all -n ocp-00000
NAME                                       READY   STATUS    RESTARTS   AGE
pod/directvolumemigration-rsync-transfer   2/2     Running   0          68m

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/directvolumemigration-rsync-transfer-svc   ClusterIP   172.30.242.227   <none>        2222/TCP   68m

NAME                                                                  HOST/PORT                                                                                             PATH   SERVICES                                   PORT   TERMINATION   WILDCARD
route.route.openshift.io/directvolumemigration-rsync-transfer-route   directvolumemigration-rsync-transfer-route-ocp-00000.apps.cam-tgt-12870.qe.devcluster.openshift.com          directvolumemigration-rsync-transfer-svc   2222   passthrough   None

Comment 1 whu 2020-12-18 14:06:33 UTC
In the latest MTC 1.4.0 build, this bug has gone. so close this bug.

image: registry.redhat.io/rhmtc/openshift-migration-rhel7-operator@sha256:236d5f964061938f4e845e6d9e62926da68435c0f0304329a0473e060cdb53f1