Bug 2005246 - Rsync pods are failing in parallel migrations
Summary: Rsync pods are failing in parallel migrations
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: General
Version: 1.6.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 1.7.0
Assignee: Pranav Gaikwad
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-17 08:29 UTC by Sergio
Modified: 2022-03-24 06:32 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-03-24 06:32:26 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github konveyor mig-controller pull 1254 0 None open Bug 2005246: Separate rsync password secrets per plan 2022-01-18 15:32:16 UTC
Red Hat Product Errata RHBA-2022:1043 0 None None None 2022-03-24 06:32:38 UTC

Description Sergio 2021-09-17 08:29:52 UTC
Description of problem:
When we execute several direct migrations in parallel, some rsync pods fail with this error:

$ oc logs rsync-vc492 -c rsync
2021/09/16 14:09:18 [26] @ERROR: auth failed on module 81c3b080dad537de7e10e0987a4bf52e
2021/09/16 14:09:18 [26] rsync error: error starting client-server protocol (code 5) at main.c(1661) [sender=3.1.3]
@ERROR: auth failed on module 81c3b080dad537de7e10e0987a4bf52e
rsync error: error starting client-server protocol (code 5) at main.c(1661) [sender=3.1.3]


Version-Release number of selected component (if applicable):
SOURCE CLUSTER: AWS OCP 3.11 (MTC 1.5.1)
TARGET CLUSTER: AWS OCP 4.9 (MTC 1.6.0) (CONTROLLER + UI)
REPLICATION REPOSITORY: AWS S3

How reproducible:
Intermittent

Steps to Reproduce:
1. Execute several direct migrations in parallel

Actual results:
Some rsync pods fail. The logs of the rsync pod in the source cluster show this error:

$ oc logs rsync-vc492 -c rsync
2021/09/16 14:09:18 [26] @ERROR: auth failed on module 81c3b080dad537de7e10e0987a4bf52e
2021/09/16 14:09:18 [26] rsync error: error starting client-server protocol (code 5) at main.c(1661) [sender=3.1.3]
@ERROR: auth failed on module 81c3b080dad537de7e10e0987a4bf52e
rsync error: error starting client-server protocol (code 5) at main.c(1661) [sender=3.1.3]
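
When many rsync pods run at once, it can help to reduce each pod's log to the list of modules that were rejected. The following is only a sketch: failed_auth_modules is a hypothetical helper, not part of MTC, and it assumes the log format shown above, where the module name is the last field of the "@ERROR: auth failed on module" line.

```shell
# Hypothetical helper (not part of MTC): read rsync log text on stdin and
# print each distinct module name that reported an authentication failure.
failed_auth_modules() {
    # Assumption: the module name is the last whitespace-separated field
    # of every "@ERROR: auth failed on module <name>" line.
    awk '/@ERROR: auth failed on module/ { print $NF }' | sort -u
}
```

For the pod in this report it could be used as: oc logs rsync-vc492 -c rsync | failed_auth_modules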


In the YAML of the source cluster's rsync pod (rsync-vc492) we can see this:
    env:
    - name: RSYNC_PASSWORD
    image: registry.redhat.io/rhmtc/openshift-migration-rsync-transfer-rhel8:v1.5.1-3
    imagePullPolicy: IfNotPresent

The RSYNC_PASSWORD entry has a name but no value, so it seems that the password is missing.
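
The same check can be scripted against other failed pods. This is a rough sketch under stated assumptions: env_var_is_empty is a hypothetical helper, not an MTC or oc feature, and it relies on the YAML layout shown above, where a set variable has a "value:" or "valueFrom:" line directly after its "- name:" line. It does not parse YAML in general.

```shell
# Hypothetical helper (not part of MTC): succeed (exit 0) if the pod YAML on
# stdin declares an env entry named "$1" that has no value or valueFrom.
env_var_is_empty() {
    awk -v var="$1" '
        # Found the "- name: <var>" line; the next line decides the outcome.
        $1 == "-" && $2 == "name:" && $3 == var { pending = 1; next }
        pending {
            # Assumption: value:/valueFrom: directly follows the name line.
            if ($1 != "value:" && $1 != "valueFrom:") empty = 1
            pending = 0
        }
        # A name line at the very end of the input also counts as empty.
        END { if (pending) empty = 1; exit(empty ? 0 : 1) }
    '
}
```

For example: oc get pod rsync-vc492 -o yaml | env_var_is_empty RSYNC_PASSWORD && echo "RSYNC_PASSWORD has no value"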

Expected results:
Rsync pods should not fail and the migrations should be executed successfully.

Additional info:

This is the full YAML of the rsync pod:

$ oc get pods -o yaml rsync-vc492
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: rsync-anyuid
  creationTimestamp: 2021-09-16T14:09:15Z
  generateName: rsync-
  labels:
    app: directvolumemigration-rsync-transfer
    app.kubernetes.io/part-of: openshift-migration
    directvolumemigration: ca619596-b358-4f3f-ba44-645b6958d501
    migration.openshift.io/created-for-pvc: 81c3b080dad537de7e10e0987a4bf52e
    migration.openshift.io/dvmp-done: "True"
    migration.openshift.io/migrated-by-migplan: 8686365f-71b3-46b0-9988-6fafdb5ff3f5
    migration.openshift.io/rsync-attempt: "20"
    owner: directvolumemigration
  name: rsync-vc492
  namespace: ocp-24659-mysql
  resourceVersion: "53860"
  selfLink: /api/v1/namespaces/ocp-24659-mysql/pods/rsync-vc492
  uid: ac97e99f-16f7-11ec-8bf3-0e062dacb45f
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - trap "touch /usr/share/rsync/rsync-client-container-done" EXIT SIGINT SIGTERM;
      timeout=120; SECONDS=0; while [ $SECONDS -lt $timeout ]; do nc -z localhost
      6443; rc=$?; if [ $rc -eq 0 ]; then /usr/bin/rsync --recursive --links --perms
      --devices --specials --times --owner --group --hard-links --delete --partial
      --human-readable --log-file=/dev/stdout --info=COPY2,DEL2,REMOVE2,SKIP2,FLIST2,PROGRESS2,STATS2
      /mnt/ocp-24659-mysql/81c3b080dad537de7e10e0987a4bf52e/ rsync://root@localhost/81c3b080dad537de7e10e0987a4bf52e
      --port 6443; rc=$?; break; fi; done; exit $rc;
    env:
    - name: RSYNC_PASSWORD
    image: registry.redhat.io/rhmtc/openshift-migration-rsync-transfer-rhel8:v1.5.1-3
    imagePullPolicy: IfNotPresent
    name: rsync
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 1Gi
    securityContext:
      capabilities:
        drop:
        - MKNOD
        - SETPCAP
      privileged: false
      readOnlyRootFilesystem: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /mnt/ocp-24659-mysql/81c3b080dad537de7e10e0987a4bf52e
      name: mnt
    - mountPath: /usr/share/rsync
      name: rsync-communication
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-n7qg9
      readOnly: true
  - command:
    - /bin/bash
    - -c
    - |-
      /bin/stunnel /etc/stunnel/stunnel.conf
      while true
      do test -f /usr/share/rsync/rsync-client-container-done
      if [ $? -eq 0 ]
      then
      break
      fi
      done
      exit 0
    image: registry.redhat.io/rhmtc/openshift-migration-rsync-transfer-rhel8:v1.5.1-3
    imagePullPolicy: IfNotPresent
    name: stunnel
    ports:
    - containerPort: 6443
      name: stunnel
      protocol: TCP
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 100m
        memory: 1Gi
    securityContext:
      capabilities:
        drop:
        - MKNOD
        - SETPCAP
      privileged: false
      readOnlyRootFilesystem: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/stunnel/stunnel.conf
      name: crane2-stunnel-client-config
      subPath: stunnel.conf
    - mountPath: /etc/stunnel/certs
      name: crane2-stunnel-client-secret
    - mountPath: /usr/share/rsync
      name: rsync-communication
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-n7qg9
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: default-dockercfg-9hhr4
  nodeName: ip-172-18-6-255.ec2.internal
  nodeSelector:
    node-role.kubernetes.io/compute: "true"
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c26,c10
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: mnt
    persistentVolumeClaim:
      claimName: mysql
  - emptyDir: {}
    name: rsync-communication
  - configMap:
      defaultMode: 420
      name: crane2-stunnel-client-config
    name: crane2-stunnel-client-config
  - name: crane2-stunnel-client-secret
    secret:
      defaultMode: 420
      items:
      - key: tls.crt
        path: tls.crt
      - key: tls.key
        path: tls.key
      secretName: crane2-stunnel-client-secret
  - name: default-token-n7qg9
    secret:
      defaultMode: 420
      secretName: default-token-n7qg9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2021-09-16T14:09:15Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2021-09-16T14:09:15Z
    message: 'containers with unready status: [rsync stunnel]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: null
    message: 'containers with unready status: [rsync stunnel]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2021-09-16T14:09:15Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://517d80bf430ff242ea4217ba08a4d760f90a001a0b5fd14bc7f189b203efad44
    image: registry.redhat.io/rhmtc/openshift-migration-rsync-transfer-rhel8:v1.5.1-3
    imageID: docker-pullable://registry.redhat.io/rhmtc/openshift-migration-rsync-transfer-rhel8@sha256:d08650fb7ee7ce1b48e44515d794285dd5f9b9effec984aa034e329845bbe802
    lastState: {}
    name: rsync
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: docker://517d80bf430ff242ea4217ba08a4d760f90a001a0b5fd14bc7f189b203efad44
        exitCode: 5
        finishedAt: 2021-09-16T14:09:18Z
        reason: Error
        startedAt: 2021-09-16T14:09:18Z
  - containerID: docker://807c0e1f4449af089efc7dc6d7ad38d4474ff98b4220612659595cc6b2e3614c
    image: registry.redhat.io/rhmtc/openshift-migration-rsync-transfer-rhel8:v1.5.1-3
    imageID: docker-pullable://registry.redhat.io/rhmtc/openshift-migration-rsync-transfer-rhel8@sha256:d08650fb7ee7ce1b48e44515d794285dd5f9b9effec984aa034e329845bbe802
    lastState: {}
    name: stunnel
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: docker://807c0e1f4449af089efc7dc6d7ad38d4474ff98b4220612659595cc6b2e3614c
        exitCode: 0
        finishedAt: 2021-09-16T14:09:18Z
        reason: Completed
        startedAt: 2021-09-16T14:09:18Z
  hostIP: 172.18.6.255
  phase: Failed
  podIP: 10.130.2.95
  qosClass: Burstable
  startTime: 2021-09-16T14:09:15Z

Comment 7 errata-xmlrpc 2022-03-24 06:32:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Migration Toolkit for Containers (MTC) 1.7.0 release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1043

