Bug 2002420 - "Stage" pod not created for completed application pod, causing the "mig-controller" to stall
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Migration Toolkit for Containers
Classification: Red Hat
Component: Controller
Version: 1.6.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 1.6.0
Assignee: Jaydip Gabani
QA Contact: Xin jiang
Docs Contact: Avital Pinnick
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-08 18:38 UTC by Derek Whatley
Modified: 2021-09-29 14:36 UTC (History)
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-29 14:36:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Logs showing the controller "creating" the stage pod and then waiting (12.61 KB, text/plain)
2021-09-08 18:38 UTC, Derek Whatley


Links
System ID Private Priority Status Summary Last Updated
Github konveyor mig-controller pull 1198 0 None None None 2021-09-08 20:35:12 UTC
Red Hat Product Errata RHSA-2021:3694 0 None None None 2021-09-29 14:36:26 UTC

Description Derek Whatley 2021-09-08 18:38:37 UTC
Created attachment 1821605 [details]
Logs showing the controller "creating" the stage pod and then waiting

Description of problem:
When I create the test app mentioned in the reproducer steps below, the result is:
1. Logs indicate a stage pod is being launched for the completed 'validator' app pod
2. The stage pod is never actually created
3. mig-controller stalls waiting for the stage pod to enter a Running state, which will never happen

I've traced this issue to https://github.com/konveyor/mig-controller/pull/1164 (the bug does not occur before that change and does occur after it), but I'm not sure of the complete intent and reasoning behind the changes made there.

Version-Release number of selected component (if applicable):
MTC 1.6.0. 
Clusters: OCP 4.8 AWS (control) / OCP 3.11 AWS (remote)

How reproducible:
Always

Steps to Reproduce:
1. Create a namespace and a quota

$ oc new-project ocp-31309-quotanoattach

Create this quota

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-quota
  namespace: ocp-31309-quotanoattach
spec:
  hard:
    persistentvolumeclaims: "2"
    services.loadbalancers: "0"
    services.nodeports: "0"
    pods: "1"
    replicationcontrollers: "1"
    secrets: "6"
    configmaps: "4"
    services: "10"
    limits.cpu: "20"
    limits.memory: 20Gi
    requests.cpu: "10"
    requests.memory: 10Gi
EOF

2. Create a PVC

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: quoatdev-test
  namespace: ocp-31309-quotanoattach
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Mi
EOF


3. Provision the PVC

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: provisioner-pod
  namespace: ocp-31309-quotanoattach
  labels:
    app: provision
spec:
  restartPolicy: OnFailure
  containers:
  - name: provisioner
    resources:
      limits:
        cpu: "0.01"
        memory: 128Mi
    image: alpine
    command: [ "/bin/sh", "-c", "--" ]
    args: [ "echo 'data inserted' > /data/vol/data.txt ; dd if=/dev/urandom of=/data/vol/binary.rnd  bs=1000000  count=1" ]
    volumeMounts:
    - name: testvolume
      mountPath: /data/vol
  volumes:
  - name: testvolume
    persistentVolumeClaim:
      claimName: quoatdev-test
EOF


4. Remove the provisioner pod once it's completed

$ oc delete pod provisioner-pod -n ocp-31309-quotanoattach
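
One way to make sure the provisioner has actually finished before deleting it is to poll the pod phase until it reports Succeeded. A minimal sketch of that polling pattern, with a stub function standing in for the real `oc get pod provisioner-pod -n ocp-31309-quotanoattach -o jsonpath='{.status.phase}'` query (the stub and its fixed return value are illustrative only):

```shell
# Stand-in for the oc jsonpath query above; on a real cluster this would
# return Pending/Running/Succeeded as the pod progresses.
phase_of() { echo 'Succeeded'; }

# Poll until the pod reports phase Succeeded, then it is safe to delete it.
until [ "$(phase_of)" = 'Succeeded' ]; do
  sleep 5
done
echo 'provisioner finished; safe to delete'
```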

5. Create a validation Job

$ cat <<EOF | oc create -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: validator-job
  namespace: ocp-31309-quotanoattach
  labels:
    app: validation
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: validator
        image: alpine
        resources:
          limits:
            cpu: "0.01"
            memory: 128Mi
        command: [ "/bin/sh", "-c", "--" ]
        args:
          - set -e;
            echo 'Validating';
            cd /data/vol;
            ls data.txt;
            ls binary.rnd;
            export CONTENT=\$(cat data.txt);
            [[ "\$CONTENT" == 'data inserted' ]] ||  { echo 'Wrong data content' && exit 1; } ;
            export SIZE=\$( wc -c binary.rnd  | cut -d ' ' -f 1 );
            [[ \$SIZE  == '1000000' ]] || { echo 'Wrong binary file size' && exit 1; };
        volumeMounts:
        - name: testvolume
          mountPath: /data/vol
      volumes:
      - name: testvolume
        persistentVolumeClaim:
          claimName: quoatdev-test
  backoffLimit: 4
EOF
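
The validator's checks can also be dry-run locally against a temp directory instead of the PVC mount, which is a quick way to confirm the script logic independently of the cluster. A sketch, using a local directory as a stand-in for /data/vol:

```shell
# Seed a local directory the same way the provisioner pod seeds the volume.
VOL=$(mktemp -d)
echo 'data inserted' > "$VOL/data.txt"
dd if=/dev/urandom of="$VOL/binary.rnd" bs=1000000 count=1 2>/dev/null

# Run the same checks the validator Job performs.
cd "$VOL"
CONTENT=$(cat data.txt)
[ "$CONTENT" = 'data inserted' ] || { echo 'Wrong data content'; exit 1; }
SIZE=$(wc -c < binary.rnd)
[ "$SIZE" = '1000000' ] || { echo 'Wrong binary file size'; exit 1; }
echo 'validation passed'
```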


6. Migrate the namespace once the validator pod is completed (do not delete the validator pod)

Actual results:
The migration gets stuck waiting for stage pods to come online, but the pods were never created.


Expected results:
The migration should either:
1) create stage pods and then wait for them, or
2) neither create stage pods nor wait for them


Additional info:

Comment 2 Jaydip Gabani 2021-09-09 18:08:55 UTC
This PR cherry-picks the change into the release branch - https://github.com/konveyor/mig-controller/pull/1199

Changing the status to MODIFIED

Comment 7 Xin jiang 2021-09-15 09:08:03 UTC
Verified with MTC 1.6.0

registry.redhat.io/rhmtc/openshift-migration-controller-rhel8@sha256:3b5efa9c8197fe0313a2ab7eb184d135ba9749c9a4f0d15a6abb11c0d18b9194

Comment 9 errata-xmlrpc 2021-09-29 14:36:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Migration Toolkit for Containers (MTC) 1.6.0 security & bugfix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3694

