Description of problem:
With the current 3-second limit, the registry pod readiness and liveness probes time out during migration.

Event:
Readiness probe failed: Get http://10.128.6.214:5000/v2/_catalog?n=5: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Please allow a configurable timeout for the readiness and liveness probes.

Version-Release number of selected component (if applicable):
v1.4.1

How reproducible:
Always

Steps to Reproduce:
1. Create a migration plan
2. Launch staging
3. When staging reaches the image migration phase, the registry pods in the source and destination clusters start crashlooping

Actual results:
Migration fails

Expected results:
Migration succeeds

Additional info:
We have now introduced timeout configuration values in the MigCluster configmap on each cluster. The upstream documentation describes how to configure the timeouts: https://github.com/konveyor/mig-operator/blob/master/docs/usage/MigClusterConfiguration.md
Verified using MTC 1.5.0

SOURCE CLUSTER: AWS OCP 3.11 (CONTROLLER + UI)
TARGET CLUSTER: AWS OCP 4.7

Operator: registry.redhat.io/rhmtc/openshift-migration-rhel7-operator@sha256:c0375fa6ecff4d50c181fc3f31d66b6c13023fecb8bcef6899197ccd96c50a30

    - name: MIG_CONTROLLER_REPO
      value: openshift-migration-controller-rhel8@sha256
    - name: MIG_CONTROLLER_TAG
      value: 83f26020b731f78dc9e817186d3247ab46d7daedec62c808be3259ed571656aa
    - name: MIG_UI_REPO
      value: openshift-migration-ui-rhel8@sha256
    - name: MIG_UI_TAG
      value: 4e177e58e311ff2d9c37935308591df5680838255e35b138a696b065c03044f8
    - name: VELERO_REPO
      value: openshift-migration-velero-rhel8@sha256
    - name: VELERO_TAG
      value: e776a798ce8c1b1e6fcc10edaded1e70514a5c6cc2c177dead2d82ef562becde

With this configuration, we get the following values for readiness and liveness:

    migration_registry_liveness_timeout: 100
    migration_registry_readiness_timeout: 200

    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /v2/_catalog?n=5
        port: 5000
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 100
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /v2/_catalog?n=5
        port: 5000
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 200

Every OCP cluster in MTC uses the values configured in its own MigrationController resource, so the source and destination clusters can be configured with different values.

We move the status to VERIFIED.
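
For anyone reproducing the verification, the two timeout parameters might be set on the MigrationController resource roughly as sketched below. This is only an illustration: the parameter names and values are the ones reported in this comment, but the apiVersion, the resource name and namespace, and the placement of the parameters directly under spec are assumptions and should be checked against the upstream documentation linked earlier.

```yaml
# Sketch only. apiVersion, metadata.name and metadata.namespace are assumed
# defaults and may differ on your cluster.
apiVersion: migration.openshift.io/v1alpha1
kind: MigrationController
metadata:
  name: migration-controller
  namespace: openshift-migration
spec:
  # Assumed to be in seconds, since they appear as timeoutSeconds on the
  # registry pod's livenessProbe and readinessProbe in the output above.
  migration_registry_liveness_timeout: 100
  migration_registry_readiness_timeout: 200
```

Because each cluster reads its own MigrationController resource, the same fragment with different values could be applied separately on the source and destination clusters.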
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Migration Toolkit for Containers (MTC) image release advisory 1.5.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:2929