1565326 – Duplicate container node entries after changing flavor size

Summary: Duplicate container node entries after changing flavor size

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	5.9.0
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.9.3
Assignee:	Oved Ourfali
QA Contact:	Einat Pacifici
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1561041

Reported:	2018-04-09 20:18 UTC by Ryan Spagnola
Modified:	2021-06-10 15:44 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-02 17:31:34 UTC
Category:	Bug
Cloudforms Team:	Container Management
Target Upstream Version:
Embargoed:

Comment 2 Greg Blomquist 2018-04-09 20:26:06 UTC

Oved, I talked to Ryan, and he said he doesn't see AWS as a provider in CFME.  So, I think this is purely in OpenShift management in CFME.

Comment 5 Beni Paskin-Cherniavsky 2018-04-11 07:39:46 UTC

Ryan, is there a reason to keep the BZ description confidential?

I'd like to first understand what happens on the OCP side.
Having an `oc get node -o yaml` dump from before and after such an AWS flavor change would be ideal.
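As a rough aid for that comparison, here is a minimal Python sketch that diffs two node dumps. It assumes the dumps were captured as JSON with `oc get node -o json` (the JSON counterpart of the YAML dump) so that only the standard library is needed; the helper names `index_by_name` and `diff_node_dumps` are hypothetical. It matches nodes by `metadata.name` and reports, per node, whether `metadata.uid` changed and which labels changed.

```python
import json


def index_by_name(dump):
    """Map metadata.name -> node object for an `oc get node -o json` List."""
    return {item["metadata"]["name"]: item for item in dump.get("items", [])}


def diff_node_dumps(before, after):
    """Compare two parsed node-list dumps, matching nodes by metadata.name.

    Assumption: the node name survives the flavor change. Nodes present in
    only one of the dumps are not reported by this sketch.
    """
    old, new = index_by_name(before), index_by_name(after)
    report = {}
    for name in sorted(old.keys() & new.keys()):
        old_meta, new_meta = old[name]["metadata"], new[name]["metadata"]
        old_labels = old_meta.get("labels", {})
        new_labels = new_meta.get("labels", {})
        # Keep only labels whose value differs (or that were added/removed).
        changed = {
            key: (old_labels.get(key), new_labels.get(key))
            for key in sorted(old_labels.keys() | new_labels.keys())
            if old_labels.get(key) != new_labels.get(key)
        }
        report[name] = {
            "uid_changed": old_meta.get("uid") != new_meta.get("uid"),
            "label_changes": changed,
        }
    return report


def diff_node_files(before_path, after_path):
    """Load two JSON dump files from disk and diff them."""
    with open(before_path) as f_before, open(after_path) as f_after:
        return diff_node_dumps(json.load(f_before), json.load(f_after))
```

Usage: capture `oc get node -o json > before.json` before the resize and `after.json` after it, then call `diff_node_files("before.json", "after.json")`. Any field that stays identical while `uid_changed` is true is a candidate for identifying the "same node".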

CFME uses the node's metadata.uid as the ems_ref.
The info in the BZ description does confirm that the UID changed.
Thus, from CFME's perspective, the old node disappeared and a new one appeared.
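To make that consequence concrete, the following Python sketch classifies a refresh purely by UID, the way an inventory keyed on `ems_ref = metadata.uid` would see it. It is an illustration only, not CFME's refresh code, and `classify_refresh` is a hypothetical name. A node that kept its name but received a new UID counts as one disappearance plus one appearance.

```python
def classify_refresh(previous_uids, current_uids):
    """Split node UIDs into disappeared, appeared, and unchanged sets."""
    previous, current = set(previous_uids), set(current_uids)
    return {
        "disappeared": previous - current,  # no longer reported by OCP
        "appeared": current - previous,     # new to the inventory
        "unchanged": previous & current,
    }


# One node resized (its UID changed), one node untouched.
result = classify_refresh(["uid-old", "uid-x"], ["uid-new", "uid-x"])
# result["disappeared"] == {"uid-old"}, result["appeared"] == {"uid-new"}
```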

We recently implemented "archiving" of old nodes in CFME.
"Archiving" means the old node is "soft-deleted": it remains in the DB but gets its `deleted_on` column set. The UI hides archived nodes, but historical reporting can still see them.
Prior to that, disappeared nodes would simply be deleted from the CFME DB.
Bug 1536101 says node archiving was released in 5.9.0.17. What CFME version is the customer running?
I assume ContainerNode id 10000000000005 exists in the CFME DB?
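The difference between the old hard delete and archiving can be sketched with a simplified, hypothetical in-memory model (Python, not CFME's actual schema or code). Each record carries an `ems_ref` and a nullable `deleted_on`; refresh stamps `deleted_on` on records whose `ems_ref` is no longer reported instead of removing them, a UI-style listing filters archived rows out, and a historical query still sees every row.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class NodeRecord:
    ems_ref: str                           # e.g. the node's metadata.uid
    name: str
    deleted_on: Optional[datetime] = None  # None means "active"


def refresh(records: List[NodeRecord], live_nodes: Dict[str, str]) -> None:
    """Archive vanished nodes and add new ones; live_nodes maps uid -> name.

    A previously archived UID that reappears is not un-archived here.
    """
    now = datetime.now(timezone.utc)
    known = {r.ems_ref for r in records}
    for record in records:
        if record.deleted_on is None and record.ems_ref not in live_nodes:
            record.deleted_on = now  # soft delete: the row is kept
    for uid, name in live_nodes.items():
        if uid not in known:
            records.append(NodeRecord(ems_ref=uid, name=name))


def active(records: List[NodeRecord]) -> List[NodeRecord]:
    """What a UI-style listing shows: archived rows are hidden."""
    return [r for r in records if r.deleted_on is None]


nodes = [NodeRecord("uid-old", "node-a")]
refresh(nodes, {"uid-new": "node-a"})  # same name, new UID after the resize
# len(nodes) == 2 (history keeps both rows); len(active(nodes)) == 1
```

This is exactly why the customer sees two entries for one machine: the archived row and the new row share a name but not an `ems_ref`.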

=> Do we need a BZ on OpenShift? From the case, the process was (1) power off, (2) change the flavor from m4.large to r4.large, (3) power on. IIUC the node's storage was retained, so it's not a new node and ought to keep its UID?

=> Given that the OCP UID changed, the CFME refresh is working as designed.
It has to handle situations where a node is genuinely destroyed and new node(s) are created anyway.
The more interesting question is whether reporting deals with it well.

- Could/should CFME use some other field as a better ems_ref to work around the OCP behavior?
Having full `oc get node -o yaml` dumps from before and after would help us understand whether this is even possible.

- Is it possible to edit the customer's DB to squash the duplicates?
Probably feasible: repoint container_node_id in other tables (including all metrics!) to the latest non-archived node, then drop the archived one.
But IMHO this is complex, risky and not worth it, especially if resizing nodes will be a recurring scenario...
Plus, if you do this, you lose some data, for example the AWS instance type the pods were running on before the change. If, say, you later need a report on how much things were costing you in AWS, you'll have wrong data...

- Reporting idea: could you group by node Name (or by the real_name label, or whatever best represents the "same node" to you)?
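The reporting idea can be illustrated with a minimal Python sketch. It assumes node rows (archived and live) have been exported with a name and a per-row metric; the row shape, the `cpu_hours` field, and `total_by_key` are hypothetical, not a CFME report definition. Rows that stay separate because their UIDs differ are summed under one logical name, while per-row details such as the instance type before and after the resize remain available for cost reporting.

```python
from collections import defaultdict


def total_by_key(rows, key="name", metric="cpu_hours"):
    """Sum a metric across all node rows that share the same grouping key.

    Pass a different key (e.g. one holding a real_name label value) to
    group on whatever best represents the "same node".
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[metric]
    return dict(totals)


rows = [
    # Same logical node before and after the flavor change: two rows, two UIDs.
    {"uid": "uid-old", "name": "node-a", "instance_type": "m4.large", "cpu_hours": 10.0},
    {"uid": "uid-new", "name": "node-a", "instance_type": "r4.large", "cpu_hours": 5.0},
    {"uid": "uid-b", "name": "node-b", "instance_type": "m4.large", "cpu_hours": 7.5},
]
totals = total_by_key(rows)
# totals == {"node-a": 15.0, "node-b": 7.5}
```

Unlike squashing the DB rows, grouping at report time leaves the history intact.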

Comment 6 Ryan Spagnola 2018-04-11 12:30:02 UTC

Hi Beni,

The BZ description is set to private because it contains customer IP addresses (in '-'-separated form) in the hostnames.

The customer is running CFME 5.9.0.22.

I'll work with the customer to collect the `oc get node -o yaml` output, and I'll ask about the reporting idea you mentioned.

Thanks for the information.

Comment 7 Oved Ourfali 2018-04-28 09:36:54 UTC

Putting needinfo on Ryan.
