Description of problem:
When creating a migplan for an application with a PV on an AWS OCP 3.9 cluster, the MigAnalytic frequently fails to gather the source cluster resources, and all values are reported as 0. The probability of occurrence is very high.

Version-Release number of selected component (if applicable):
MTC 1.5.0
image: quay-enterprise-quay-enterprise.apps.cam-tgt-21420.qe.devcluster.openshift.com/admin/openshift-migration-rhel7-operator:v1.5.0-23
Source cluster: AWS OCP 3.9 (controller)
Target cluster: AWS OCP 4.8

How reproducible:
1. Prepare an nginx application in the 3.9 source cluster:
   $ ansible-playbook deploy-app.yml -e use_role=ocp-nginxpv -e namespace=ocp-24706-basicvolmig
2. Create an indirect migration plan against nginx
3. Check the analytics message in the migration plan

Actual results:
The migplan reaches Ready status, but with the warning message "Failed gathering extended PV usage information for PVs [nginx-logs nginx-html]" in the migplan. The MigAnalytic reports all values as 0.

Expected results:
The migplan reaches Ready status without warning or error messages, and the MigAnalytic reports the actual values.

Additional info:

$ oc get pvc -n ocp-24706-basicvolmig
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-html   Bound    pvc-602321ba-e51c-11eb-b736-0e2f00abf38f   1Gi        RWO            gp2            1h
nginx-logs   Bound    pvc-601ee567-e51c-11eb-b736-0e2f00abf38f   1Gi        RWO            gp2            1h

$ oc get migplan ocp-24706-basicvolmig-migplan-1626319591 -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  ...
  name: ocp-24706-basicvolmig-migplan-1626319591
  namespace: openshift-migration
spec:
  destMigClusterRef:
    name: target-cluster
    namespace: openshift-migration
  indirectImageMigration: true
  indirectVolumeMigration: true
  migStorageRef:
    name: automatic
    namespace: openshift-migration
  namespaces:
  - ocp-24706-basicvolmig
  persistentVolumes:
  - capacity: 1Gi
    name: pvc-601ee567-e51c-11eb-b736-0e2f00abf38f
    proposedCapacity: "0"
    pvc:
      accessModes:
      - ReadWriteOnce
      hasReference: true
      name: nginx-logs
      namespace: ocp-24706-basicvolmig
    selection:
      action: copy
      copyMethod: filesystem
      storageClass: gp2
    storageClass: gp2
    supported:
      actions:
      - copy
      - move
      copyMethods:
      - filesystem
      - snapshot
  - capacity: 1Gi
    name: pvc-602321ba-e51c-11eb-b736-0e2f00abf38f
    proposedCapacity: "0"
    pvc:
      accessModes:
      - ReadWriteOnce
      hasReference: true
      name: nginx-html
      namespace: ocp-24706-basicvolmig
    selection:
      action: copy
      copyMethod: filesystem
      storageClass: gp2
    storageClass: gp2
    supported:
      actions:
      - copy
      - move
      copyMethods:
      - filesystem
      - snapshot
  srcMigClusterRef:
    name: host
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:36Z
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:36Z
    message: The storage resources have been created.
    reason: Done
    status: "True"
    type: StorageEnsured
  - category: Warn
    lastTransitionTime: 2021-07-15T04:11:44Z
    message: Failed gathering extended PV usage information for PVs [nginx-logs nginx-html],
      please see MigAnalytic openshift-migration/ocp-24706-basicvolmig-migplan-1626319591-szwd6
      for details
    reason: FailedRunningDf
    status: "True"
    type: ExtendedPVAnalysisFailed
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:36Z
    message: The migration plan is ready.
    status: "True"
    type: Ready
  destStorageClasses:
  - accessModes:
    - ReadWriteOnce
    default: true
    name: gp2
    provisioner: kubernetes.io/aws-ebs
  - accessModes:
    - ReadWriteOnce
    name: gp2-csi
    provisioner: ebs.csi.aws.com
  excludedResources:
  ...

$ oc get miganalytic ocp-24706-basicvolmig-migplan-1626319591-szwd6 -n openshift-migration -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigAnalytic
metadata:
  ...
  name: ocp-24706-basicvolmig-migplan-1626319591-szwd6
  namespace: openshift-migration
spec:
  analyzeExtendedPVCapacity: true
  analyzeImageCount: false
  analyzeK8SResources: false
  analyzePVCapacity: false
  migPlanRef:
    name: ocp-24706-basicvolmig-migplan-1626319591
    namespace: openshift-migration
status:
  analytics:
    excludedk8sResourceTotal: 0
    imageCount: 0
    imageSizeTotal: "0"
    incompatiblek8sResourceTotal: 0
    k8sResourceTotal: 0
    namespaces:
    - excludedK8SResourceTotal: 0
      imageCount: 0
      imageSizeTotal: "0"
      incompatibleK8SResourceTotal: 0
      k8sResourceTotal: 0
      namespace: ocp-24706-basicvolmig
      persistentVolumes:
      - actualCapacity: "0"
        comment: No change in PV capacity is needed.
        name: nginx-logs
        proposedCapacity: "0"
        requestedCapacity: 1Gi
      - actualCapacity: "0"
        comment: No change in PV capacity is needed.
        name: nginx-html
        proposedCapacity: "0"
        requestedCapacity: 1Gi
      pvCapacity: "0"
      pvCount: 0
    percentComplete: 100
    plan: ocp-24706-basicvolmig-migplan-1626319591
    pvCapacity: "0"
    pvCount: 0
  conditions:
  - category: Warn
    lastTransitionTime: 2021-07-15T03:26:34Z
    message: Failed gathering extended PV usage information for PVs [nginx-logs nginx-html]
    reason: FailedRunningDf
    status: "True"
    type: ExtendedPVAnalysisFailed
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:34Z
    message: The analytic is ready.
    status: "True"
    type: Ready
  observedGeneration: 1
One similar bug: https://bugzilla.redhat.com/show_bug.cgi?id=1918504
Copying my notes from the 1.4.6 bug: https://bugzilla.redhat.com/show_bug.cgi?id=1982729#c2

This isn't so much a problem with Analytics as with the PV resize feature, which uses the restic daemonset to determine the actual disk usage of the volumes. IIUC the failures are limited to using analytics for PV resize when migrating from older OCP releases (3.7, 3.9) and the application pod comes into existence after the restic daemonset has started. restic uses a hostPath mount to peer into the volume, and bind remount does not exist on these versions, so if the application comes up after the daemonset, the daemonset is oblivious to it. Possible solutions include restarting the daemonset before running the analytic (which I think would be costly performance-wise on large clusters) or creating a pod on the node to run the size check instead of using the restic daemonset, so that it always comes into existence after the application.
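For context, the hostPath pattern described above looks roughly like the following sketch. This is an illustrative fragment only; the container name, mount path, and field values are my assumptions, not copied from the actual MTC operator manifests.

```yaml
# Illustrative restic-style daemonset pod spec fragment (names and paths
# are assumptions, not taken from the MTC manifests).
# The daemonset reads volume data through a hostPath mount of the kubelet's
# pods directory. On OCP 3.7/3.9 nodes, mounts created after the daemonset
# pod started are not propagated into this mount (no bind remount support),
# which is why the df check fails for late-arriving application pods.
spec:
  containers:
  - name: restic
    volumeMounts:
    - name: host-pods
      mountPath: /host_pods
      mountPropagation: HostToContainer   # the propagation older nodes lack
  volumes:
  - name: host-pods
    hostPath:
      path: /var/lib/kubelet/pods
```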
Additional information: after clicking the "Refresh" button, the analytics were updated and got the correct values, but the warning regarding the PV resize is still present in the migplan.
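From the CLI, the UI refresh can presumably be reproduced by deleting the MigAnalytic and letting the migplan controller recreate it. The resource names below come from this report; whether deletion actually triggers a fresh analysis is my assumption, not something confirmed in this bug.

```shell
# Hypothetical CLI equivalent of the UI "Refresh" button (assumption:
# the migplan controller recreates and re-runs the MigAnalytic on deletion).
oc -n openshift-migration delete miganalytic ocp-24706-basicvolmig-migplan-1626319591-szwd6

# Then check whether the recreated MigAnalytic reports non-zero values.
oc -n openshift-migration get miganalytic -o yaml
```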
I captured a note describing this limitation in our upstream docs here: https://github.com/konveyor/mig-operator/pull/716
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1982729, and we do not intend to fix the underlying problem in MTC, as it only affects certain cases on 3.9 clusters and a workaround is available to unblock users. The documentation around this has already merged. As a result, I am closing this issue.

*** This bug has been marked as a duplicate of bug 1982729 ***