Bug 1982604

Summary: MigAnalytic fails to get source cluster resources, all reported as 0 in ocp 3.9 sometimes
Product: Migration Toolkit for Containers
Reporter: whu
Component: General
Assignee: Pranav Gaikwad <pgaikwad>
Status: CLOSED DUPLICATE
QA Contact: Xin jiang <xjiang>
Severity: low
Docs Contact: Avital Pinnick <apinnick>
Priority: high
Version: 1.5.0
CC: ernelson, sregidor, whu, xjiang
Target Milestone: ---   
Target Release: 1.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1982729 (view as bug list) Environment:
Last Closed: 2021-08-26 18:55:00 UTC
Type: Bug
Bug Depends On:    
Bug Blocks: 1982729    

Description whu 2021-07-15 09:12:19 UTC
Description of problem:
When creating a migplan for an application with PVs in an AWS OCP 3.9 cluster, the MigAnalytic sometimes fails to get the source cluster resources and reports all of them as 0. The probability of occurrence is very high.

Version-Release number of selected component (if applicable):
MTC 1.5.0
image: quay-enterprise-quay-enterprise.apps.cam-tgt-21420.qe.devcluster.openshift.com/admin/openshift-migration-rhel7-operator:v1.5.0-23
Source cluster : AWS OCP 3.9 (controller)
Target cluster: AWS OCP 4.8

How reproducible:
Sometimes, with high probability.

Steps to Reproduce:
1. Prepare an nginx application in the 3.9 source cluster:
$ ansible-playbook deploy-app.yml -e use_role=ocp-nginxpv -e namespace=ocp-24706-basicvolmig

2. Create an indirect migration plan for the nginx application.

3. Check the analytics message in the migration plan.

Actual results:
The migplan reaches Ready status, but it carries the warning message "Failed gathering extended PV usage information for PVs [nginx-logs nginx-html]". The MigAnalytic reports all values as 0.

Expected results:
The migplan reaches Ready status without warning or error messages, and the MigAnalytic reports the actual source cluster resource values instead of 0.

Additional info:
$ oc get pvc -n ocp-24706-basicvolmig
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-html   Bound    pvc-602321ba-e51c-11eb-b736-0e2f00abf38f   1Gi        RWO            gp2            1h
nginx-logs   Bound    pvc-601ee567-e51c-11eb-b736-0e2f00abf38f   1Gi        RWO            gp2            1h

$ oc get migplan ocp-24706-basicvolmig-migplan-1626319591  -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigPlan
metadata:
  ……...
  name: ocp-24706-basicvolmig-migplan-1626319591
  namespace: openshift-migration
spec:
  destMigClusterRef:
    name: target-cluster
    namespace: openshift-migration
  indirectImageMigration: true
  indirectVolumeMigration: true
  migStorageRef:
    name: automatic
    namespace: openshift-migration
  namespaces:
  - ocp-24706-basicvolmig
  persistentVolumes:
  - capacity: 1Gi
    name: pvc-601ee567-e51c-11eb-b736-0e2f00abf38f
    proposedCapacity: "0"
    pvc:
      accessModes:
      - ReadWriteOnce
      hasReference: true
      name: nginx-logs
      namespace: ocp-24706-basicvolmig
    selection:
      action: copy
      copyMethod: filesystem
      storageClass: gp2
    storageClass: gp2
    supported:
      actions:
      - copy
      - move
      copyMethods:
      - filesystem
      - snapshot
  - capacity: 1Gi
    name: pvc-602321ba-e51c-11eb-b736-0e2f00abf38f
    proposedCapacity: "0"
    pvc:
      accessModes:
      - ReadWriteOnce
      hasReference: true
      name: nginx-html
      namespace: ocp-24706-basicvolmig
    selection:
      action: copy
      copyMethod: filesystem
      storageClass: gp2
    storageClass: gp2
    supported:
      actions:
      - copy
      - move
      copyMethods:
      - filesystem
      - snapshot
  srcMigClusterRef:
    name: host
    namespace: openshift-migration
status:
  conditions:
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:36Z
    message: The `persistentVolumes` list has been updated with discovered PVs.
    reason: Done
    status: "True"
    type: PvsDiscovered
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:36Z
    message: The storage resources have been created.
    reason: Done
    status: "True"
    type: StorageEnsured
  - category: Warn
    lastTransitionTime: 2021-07-15T04:11:44Z
    message: Failed gathering extended PV usage information for PVs [nginx-logs nginx-html],
      please see MigAnalytic openshift-migration/ocp-24706-basicvolmig-migplan-1626319591-szwd6
      for details
    reason: FailedRunningDf
    status: "True"
    type: ExtendedPVAnalysisFailed
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:36Z
    message: The migration plan is ready.
    status: "True"
    type: Ready
  destStorageClasses:
  - accessModes:
    - ReadWriteOnce
    default: true
    name: gp2
    provisioner: kubernetes.io/aws-ebs
  - accessModes:
    - ReadWriteOnce
    name: gp2-csi
    provisioner: ebs.csi.aws.com
  excludedResources:
………

$ oc get miganalytic ocp-24706-basicvolmig-migplan-1626319591-szwd6  -n openshift-migration -o yaml
apiVersion: migration.openshift.io/v1alpha1
kind: MigAnalytic
metadata:
 ……...
  name: ocp-24706-basicvolmig-migplan-1626319591-szwd6
  namespace: openshift-migration
spec:
  analyzeExtendedPVCapacity: true
  analyzeImageCount: false
  analyzeK8SResources: false
  analyzePVCapacity: false
  migPlanRef:
    name: ocp-24706-basicvolmig-migplan-1626319591
    namespace: openshift-migration
status:
  analytics:
    excludedk8sResourceTotal: 0
    imageCount: 0
    imageSizeTotal: "0"
    incompatiblek8sResourceTotal: 0
    k8sResourceTotal: 0
    namespaces:
    - excludedK8SResourceTotal: 0
      imageCount: 0
      imageSizeTotal: "0"
      incompatibleK8SResourceTotal: 0
      k8sResourceTotal: 0
      namespace: ocp-24706-basicvolmig
      persistentVolumes:
      - actualCapacity: "0"
        comment: No change in PV capacity is needed.
        name: nginx-logs
        proposedCapacity: "0"
        requestedCapacity: 1Gi
      - actualCapacity: "0"
        comment: No change in PV capacity is needed.
        name: nginx-html
        proposedCapacity: "0"
        requestedCapacity: 1Gi
      pvCapacity: "0"
      pvCount: 0
    percentComplete: 100
    plan: ocp-24706-basicvolmig-migplan-1626319591
    pvCapacity: "0"
    pvCount: 0
  conditions:
  - category: Warn
    lastTransitionTime: 2021-07-15T03:26:34Z
    message: Failed gathering extended PV usage information for PVs [nginx-logs nginx-html]
    reason: FailedRunningDf
    status: "True"
    type: ExtendedPVAnalysisFailed
  - category: Required
    lastTransitionTime: 2021-07-15T03:26:34Z
    message: The analytic is ready.
    status: "True"
    type: Ready
  observedGeneration: 1
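
For anyone triaging similar reports, the failure signature in the MigAnalytic above can be checked mechanically. The following is only a rough sketch, not part of MTC: the function name `summarize_extended_pv_analysis` is hypothetical, and it assumes the object has already been loaded into a plain dict (for example from `oc get miganalytic ... -o json`). The field names are the ones shown in the output above.

```python
def summarize_extended_pv_analysis(analytic):
    """Report whether extended PV analysis failed and which PVs show zero usage.

    `analytic` is a MigAnalytic object as a plain dict. This helper is a
    hypothetical illustration and is not part of MTC.
    """
    status = analytic.get("status", {})

    # The failure shows up as a condition of type ExtendedPVAnalysisFailed
    # whose status is the string "True".
    failed = any(
        cond.get("type") == "ExtendedPVAnalysisFailed"
        and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )

    # Collect PVs whose measured usage (actualCapacity) came back as "0".
    zero_usage_pvs = [
        pv["name"]
        for ns in status.get("analytics", {}).get("namespaces", [])
        for pv in ns.get("persistentVolumes", [])
        if pv.get("actualCapacity") == "0"
    ]
    return {"failed": failed, "zero_usage_pvs": zero_usage_pvs}


# Trimmed copy of the status reported in this bug.
sample = {
    "status": {
        "analytics": {
            "namespaces": [
                {
                    "namespace": "ocp-24706-basicvolmig",
                    "persistentVolumes": [
                        {"name": "nginx-logs", "actualCapacity": "0",
                         "requestedCapacity": "1Gi"},
                        {"name": "nginx-html", "actualCapacity": "0",
                         "requestedCapacity": "1Gi"},
                    ],
                }
            ]
        },
        "conditions": [
            {"type": "ExtendedPVAnalysisFailed", "status": "True",
             "reason": "FailedRunningDf"},
            {"type": "Ready", "status": "True"},
        ],
    }
}

result = summarize_extended_pv_analysis(sample)
print(result)
```

Note that a zero `actualCapacity` alone is not proof of the failure, since a volume may simply be empty; the condition is the more reliable signal.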

Comment 1 whu 2021-07-15 09:16:22 UTC
A similar bug: https://bugzilla.redhat.com/show_bug.cgi?id=1918504

Comment 2 Jason Montleon 2021-07-15 17:22:13 UTC
Copying my notes from the 1.4.6 bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1982729#c2

This isn't so much a problem with Analytics as with the PV resize feature, which uses the restic daemonset to determine the actual disk usage of the volumes.

IIUC the failures are limited to using analytics for PV resize when migrating from older OCP releases (3.7, 3.9), and only when the application pod comes into existence after the restic daemonset was started.

restic uses a hostPath mount to peer into the volume. Bind remount does not exist on these versions, so if the application comes up after the daemonset, restic is oblivious to it.

Possible solutions include restarting the daemonset before running the analytic (I think this would be costly performance-wise on large clusters), or creating a pod on the node to run the size check instead of using the restic daemonset, so that the checking pod always starts after the application.
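
To make the ordering condition concrete, here is a small sketch of how one might flag the application pods that the analysis above says restic cannot see: those that started after the restic pod on the same node. Everything in it is an assumption for illustration only. The function name, the input shape (dicts with `name`, `node`, `startTime`), and the sample node and pod names are hypothetical, and it is not how MTC itself checks anything.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a Kubernetes-style UTC timestamp such as 2021-07-15T03:26:34Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def pods_started_after_restic(app_pods, restic_pods):
    """Return names of app pods that started after the restic pod on their node.

    Both arguments are lists of dicts with the keys "name", "node", and
    "startTime" (a hypothetical, simplified shape for this sketch).
    """
    # Map each node to the start time of the restic pod running there.
    restic_start = {p["node"]: parse_ts(p["startTime"]) for p in restic_pods}

    affected = []
    for pod in app_pods:
        started = restic_start.get(pod["node"])
        # Nodes without a restic pod are skipped; this sketch only covers
        # the start-order case described in the comment above.
        if started is not None and parse_ts(pod["startTime"]) > started:
            affected.append(pod["name"])
    return affected


# Hypothetical sample data: node names, pod names, and times are made up.
restic_pods = [
    {"name": "restic-a", "node": "node-1", "startTime": "2021-07-15T01:00:00Z"},
    {"name": "restic-b", "node": "node-2", "startTime": "2021-07-15T01:00:00Z"},
]
app_pods = [
    {"name": "nginx-late", "node": "node-1", "startTime": "2021-07-15T02:00:00Z"},
    {"name": "nginx-early", "node": "node-2", "startTime": "2021-07-15T00:30:00Z"},
]

affected = pods_started_after_restic(app_pods, restic_pods)
print(affected)
```

A check of this kind would only tell you when restarting the daemonset (or running the size check from a fresh pod) is likely to be needed; it does not fix the visibility problem by itself.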

Comment 3 whu 2021-07-16 09:46:49 UTC
Additional information

After clicking the "refresh" button, the analytics were updated and showed the correct values.
However, the warning regarding PV resize is still present in the migplan.

Comment 4 Pranav Gaikwad 2021-07-16 19:11:47 UTC
I captured a note describing this limitation in our upstream docs here: https://github.com/konveyor/mig-operator/pull/716

Comment 5 Pranav Gaikwad 2021-08-26 18:55:00 UTC
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1982729 and we do not intend to fix the underlying problem in MTC as it only affects certain cases on 3.9 clusters and there is a workaround available to unblock the users. The documentation around this is already merged. As a result, I am closing this issue.

*** This bug has been marked as a duplicate of bug 1982729 ***