Bug 1565326

Summary: Duplicate container node entries after changing flavor size
Product: Red Hat CloudForms Management Engine
Reporter: Ryan Spagnola <rspagnol>
Component: Providers
Assignee: Oved Ourfali <oourfali>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Einat Pacifici <epacific>
Severity: high
Priority: high
Docs Contact:
Version: 5.9.0
CC: azellner, cben, cpelland, gblomqui, jfrey, jhardy, mfeifer, obarenbo, rspagnol
Target Milestone: GA
Target Release: 5.9.3
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-02 17:31:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: Bug
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1561041

Comment 2 Greg Blomquist 2018-04-09 20:26:06 UTC
Oved, I talked to Ryan, and he said he doesn't see AWS as a provider in CFME.  So, I think this is purely in OpenShift management in CFME.

Comment 5 Beni Paskin-Cherniavsky 2018-04-11 07:39:46 UTC
Ryan, is there reason to have the BZ description confidential?

I'd like to first understand what happens on the OCP side.
Having `oc get node -o yaml` dumps from before and after such an AWS flavor change would be ideal.

CFME uses the node's metadata.uid for the ems_ref.
The info in the BZ description does confirm the uid changed.
Thus, from CFME's perspective, the old node disappeared and a new one appeared.
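To make that concrete, here is a minimal sketch for checking the uid from such dumps; it assumes per-node dumps saved as node_before.yaml / node_after.yaml (hypothetical names, taken via `oc get node <name> -o yaml`) and PyYAML installed:

  # Compare metadata.uid between two saved `oc get node <name> -o yaml` dumps.
  # File names are placeholders; requires PyYAML (pip install pyyaml).
  import yaml

  def node_uid(path):
      with open(path) as f:
          return yaml.safe_load(f)["metadata"]["uid"]

  before = node_uid("node_before.yaml")
  after = node_uid("node_after.yaml")
  print("uid before:", before)
  print("uid after: ", after)
  print("uid changed, CFME will see a new node" if before != after else "uid retained")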

We recently implemented "archiving" of old nodes in CFME.
"Archiving" means old node is "soft-deleted" — remains in DB but gets `deleted_on` column set.  UI hides archived nodes but historical reporting can still see them.
Prior to that, disappeared nodes would simply be deleted from the CFME db.
Bug 1536101 says node archiving was released in 5.9.0.17.  What CFME version is the customer running?
I assume ContainerNode id 10000000000005 exists in the CFME db?
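As a quick way to see what the DB actually holds, here is a rough sketch. It assumes psycopg2, direct access to the vmdb_production database on the appliance, and that container_nodes carries the name, ems_ref and deleted_on columns discussed above; adjust to the real schema:

  # Rough sketch: list container_nodes that share a name, showing which rows are
  # archived (deleted_on set) and which are active. Connection settings and exact
  # column names are assumptions; verify against the customer's vmdb schema.
  import psycopg2

  conn = psycopg2.connect(dbname="vmdb_production", user="root", host="localhost")
  with conn, conn.cursor() as cur:
      cur.execute("""
          SELECT id, name, ems_ref, deleted_on
          FROM container_nodes
          WHERE name IN (SELECT name FROM container_nodes
                         GROUP BY name HAVING COUNT(*) > 1)
          ORDER BY name, deleted_on NULLS FIRST;
      """)
      for row in cur.fetchall():
          print(row)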

=> Do we need a BZ on OpenShift?  From the case, the process was (1) Power Off, (2) Change Flavor from m4.large to r4.large, (3) Power On.  IIUC node storage was retained, so it's not a new node and it ought to retain its UID?

=> Given OCP UID changed, CFME refresh is working as designed.
It must deal with situations where a node is genuinely destroyed and new node(s) are created anyway.
The more interesting question is whether reporting deals with it well.

- Could/should CFME use some other field as a better ems_ref to work around OCP behavior?
  Having full `oc get node -o yaml` before/after dumps would help us understand whether this is even possible.

- Is it possible to edit the customer's DB to squash the duplicates?
  Probably feasible: repoint container_node_id from other tables (including all metrics!) to the latest non-archived node, then drop the archived one; see the rough sketch at the end of this comment.
  But IMHO this is complex, risky and not worth it, especially if resizing nodes will be a recurring scenario...
  Plus, if you do this, you lose some data, for example the AWS instance type the pods were running on before the change.  If, say, you later need a report on how much things were costing you in AWS, you'll have wrong data...

- Reporting idea: could you group by node Name? (or real_name label, or whatever best represents "same node" to you)
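For illustration only, the repointing mentioned in the second bullet above would look roughly like the sketch below. This is not a procedure: the container_groups.container_node_id foreign key and the metrics resource_type/resource_id columns are assumptions about the vmdb schema, and the two ids are placeholders. Take a full DB backup before touching anything, and again, I would not recommend going this route.

  # Illustration only: repoint rows from the archived ContainerNode to the active
  # one, then remove the archived row. Table/column names are assumptions about
  # the vmdb schema; OLD_ID/NEW_ID are placeholders.
  import psycopg2

  OLD_ID = 0  # id of the archived ContainerNode (placeholder)
  NEW_ID = 0  # id of the current, non-archived ContainerNode (placeholder)

  conn = psycopg2.connect(dbname="vmdb_production", user="root", host="localhost")
  with conn, conn.cursor() as cur:
      # Repoint pods (assumed foreign key container_groups.container_node_id).
      cur.execute("UPDATE container_groups SET container_node_id = %s"
                  " WHERE container_node_id = %s", (NEW_ID, OLD_ID))
      # Repoint metrics (assumed polymorphic resource_type/resource_id columns).
      cur.execute("UPDATE metrics SET resource_id = %s"
                  " WHERE resource_type = 'ContainerNode' AND resource_id = %s",
                  (NEW_ID, OLD_ID))
      # Finally drop the archived duplicate.
      cur.execute("DELETE FROM container_nodes WHERE id = %s", (OLD_ID,))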

Comment 6 Ryan Spagnola 2018-04-11 12:30:02 UTC
Hi Beni,

The BZ description is set to private as it contains customer IP addresses (in '-' separated form) embedded in the hostnames.

Customer is running CFME 5.9.0.22

I'll work with the customer to collect the `oc get node -o yaml` dumps, and I'll inquire about the reporting idea you referenced.

Thanks for the information.

Comment 7 Oved Ourfali 2018-04-28 09:36:54 UTC
Putting needinfo on Ryan.