Bug 2189408

Summary: [FaaS-Migration] After provider migration, the new provider also starts uninstalling after some time
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: suchita <sgatfane>
Component: odf-managed-service
Assignee: Ritesh Chikatwar <rchikatw>
Status: VERIFIED ---
QA Contact: Neha Berry <nberry>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.13
CC: kramdoss, odf-bz-bot, rchikatw, resoni, sgatfane
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description suchita 2023-04-25 06:50:16 UTC
Description of problem:
The FaaS provider also enters the uninstalling state 20-30 minutes after a successful provider migration.

Version-Release number of selected component (if applicable):
$ oc get csv -n fusion-storage
NAME                                      DISPLAY                       VERSION             REPLACES                                  PHASE
managed-fusion-agent.v2.0.11              Managed Fusion Agent          2.0.11                                                        Succeeded
observability-operator.v0.0.20            Observability Operator        0.0.20              observability-operator.v0.0.19            Succeeded
ocs-operator.v4.13.0-168.stable           OpenShift Container Storage   4.13.0-168.stable                                             Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0                                                        Succeeded
route-monitor-operator.v0.1.500-6152b76   Route Monitor Operator        0.1.500-6152b76     route-monitor-operator.v0.1.498-e33e391   Pending

clusterVersion:
NAME      VERSION   
version   4.12.13

How reproducible:
2/2

Steps to Reproduce:
1. Install an appliance-mode cluster.
2. Install the FaaS agent provider.
3. Use the migrate.sh script to migrate the cluster:
 ./migrate.sh -provider <oldClusterID> <newClusterID> -d -dev
 (repository checked out after PR#33 was merged)
4. Wait for the provider migration to complete.
5. Keep a watch on the status of all clusters using the rosa list clusters command.


Actual results:
Provider migration completes and the old provider and its service are deleted, but the new FaaS agent provider cluster is deleted as well.

Expected results:
Provider migration completes and the old provider and its service are deleted. The new FaaS agent provider cluster should remain in a ready state.

Additional info:

Workaround:
The root cause is noted in another bug: https://bugzilla.redhat.com/show_bug.cgi?id=2189409
It causes the new cluster to be deleted while the older appliance provider is being deleted.

The workaround is as follows. As soon as the migration script shows the message below, go to the AWS console -> Volumes and add a filter for volumes with the new provider's name. On the details page of each mon and OSD volume, go to Tags -> Manage tags and delete the tag whose key contains the name of the old provider.
Example: key "kubernetes.io/cluster/sgatfane-p1425-svjlq" with value "owned", where sgatfane-p1425 is the appliance cluster name.
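
The console steps above could also be done from the AWS CLI. The following is a minimal sketch under stated assumptions, not part of migrate.sh: remove_old_owner_tag is a hypothetical helper name, and the volume ID and region must be supplied by the caller. It relies on aws ec2 delete-tags, which removes a tag only when both key and value match if a value is given.

```shell
# Hypothetical helper (not from the migration repo).
# Usage: remove_old_owner_tag VOLUME_ID OLD_INFRA_NAME REGION
# Deletes the tag kubernetes.io/cluster/<OLD_INFRA_NAME>=owned from one EBS volume.
remove_old_owner_tag() {
  aws ec2 delete-tags \
    --resources "$1" \
    --tags "Key=kubernetes.io/cluster/$2,Value=owned" \
    --region "$3"
}
```

For the example tag above, the call would look like: remove_old_owner_tag <volumeID> sgatfane-p1425-svjlq <region>, repeated for each mon and OSD volume.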

Comment 1 Rewant 2023-04-25 10:43:49 UTC
The script fetches the tag key of the AWS EBS volumes using:

aws ec2 describe-volumes --volume-id $volumeID --filters Name=tag:kubernetes.io/created-for/pvc/namespace,Values=openshift-storage  --region $region --query "Volumes[*].Tags" | jq .[] | jq -r '.[]| select (.Value == "owned")|.Key'

The key is then used to replace the tag for the name. If the default output format in aws configure is not set to json, the command fails.

If the "owned" tag for the old provider is not deleted, the EBS volumes are deleted when the old cluster is deleted. We think this might be the reason the new provider got deleted.

We added the --output flag to each command that fetches the tags. That should solve the issue.
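
A minimal sketch of the kind of change described here, not the actual diff from the PR: passing --output json explicitly keeps the jq stage working regardless of the default output format set in aws configure. The function name owned_tag_keys is hypothetical, and --volume-ids is used as the documented spelling of the option.

```shell
# Hypothetical helper (not from the migration repo).
# Usage: owned_tag_keys VOLUME_ID REGION
# Prints, one per line, the keys of all tags on the volume whose value is "owned".
owned_tag_keys() {
  aws ec2 describe-volumes \
    --volume-ids "$1" \
    --filters Name=tag:kubernetes.io/created-for/pvc/namespace,Values=openshift-storage \
    --region "$2" \
    --output json \
    --query "Volumes[*].Tags" |
    jq -r '.[][] | select(.Value == "owned") | .Key'
}
```

Without --output json, a profile configured for text or table output would hand jq non-JSON input, so no key would be returned and the old "owned" tag would stay on the volume.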

PR: https://github.com/rchikatw/odf-managed-service-migration/pull/34

Comment 2 Ritesh Chikatwar 2023-04-25 13:04:24 UTC
Suchita,

PR: https://github.com/rchikatw/odf-managed-service-migration/pull/34 is merged. Please take the latest changes and verify the migration.

Comment 3 suchita 2023-04-26 07:13:09 UTC
The issue was also seen on yesterday's migrated provider even with the workaround applied (removal of the tags containing the appliance-mode provider name from the volumes).
It does not appear immediately after migration; it typically appears 12-14 hours after the first FaaS provider is created.

I will update further after a migration with the changes from PR#34.

Comment 5 suchita 2023-05-08 08:06:28 UTC
After observing 4 migration setups (migrated with the changes from PR#34 or later), the uninstallation of the provider was not observed.
Marking this BZ as verified.