1441566 – Some containers have both ems_id and old_ems_id set to nil

Summary: Some containers have both ems_id and old_ems_id set to nil

Keywords:
Status:	CLOSED DUPLICATE of bug 1443661
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	5.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.7.3
Assignee:	Ari Zellner
QA Contact:	Einat Pacifici
Docs Contact:
URL:
Whiteboard:	container
Depends On:
Blocks:

Reported:	2017-04-12 08:46 UTC by Federico Simoncelli
Modified:	2017-12-05 15:59 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-06-14 14:50:41 UTC
Category:	---
Cloudforms Team:	Container Management
Target Upstream Version:
Embargoed:


Description Federico Simoncelli 2017-04-12 08:46:49 UTC

Description of problem:
We have found environments where some containers have both ems_id and old_ems_id set to nil. The issue reproduces fairly consistently: about 2 such containers per day in an environment with 4000 pods.

Steps to Reproduce:
Unknown.

Actual results:
Some containers have both ems_id and old_ems_id set to nil.

Expected results:
A container should always be linked to an EMS through either ems_id (while active) or old_ems_id (once disconnected and archived).
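A minimal sketch of how the invariant above could be checked. It assumes each container record exposes nullable `ems_id` and `old_ems_id` attributes; the plain-Ruby `Container` struct and the `orphaned_containers` helper are hypothetical stand-ins for the ActiveRecord model. Against the real model, an equivalent query would presumably look like `Container.where(ems_id: nil, old_ems_id: nil)`, assuming both columns exist as named in this bug.

```ruby
# Hypothetical stand-in for the ActiveRecord Container model:
# only the two EMS foreign keys relevant to this bug are modeled.
Container = Struct.new(:name, :ems_id, :old_ems_id)

# A container satisfies the invariant if it is linked to an EMS either
# directly (ems_id) or as an archived record (old_ems_id).
def orphaned_containers(containers)
  containers.select { |c| c.ems_id.nil? && c.old_ems_id.nil? }
end

containers = [
  Container.new("active", 1, nil),   # live container
  Container.new("archived", nil, 1), # disconnected, still traceable to its EMS
  Container.new("orphan", nil, nil)  # the state reported in this bug
]

puts orphaned_containers(containers).map(&:name).inspect
# prints ["orphan"]
```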

Comment 7 Federico Simoncelli 2017-05-17 16:53:24 UTC

It was confirmed that this PR prevents the issue from happening:

  https://github.com/ManageIQ/manageiq/pull/14815

However, if I understand correctly, we still don't know why the disconnection is issued twice.

Adam, do you think we should move this to the core team to investigate why the disconnection was issued twice?

If not, this will be closed with the above PR, which only prevents the issue in the OpenShift provider.

Comment 8 Adam Grare 2017-05-17 17:16:18 UTC

I believe the container was disconnected twice because it was disconnected directly by save_container_inventory [0] and then again by container_definition#disconnect_inv [1].
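To illustrate why a second disconnect could leave both columns nil, here is a minimal sketch that assumes `disconnect_inv` copies `ems_id` into `old_ems_id` and then clears `ems_id`. That assumption is consistent with the reported symptom but is not taken from the linked source; `ContainerRecord` is a hypothetical plain-Ruby model, not the ManageIQ class.

```ruby
# Hypothetical plain-Ruby model of a container's EMS link.
class ContainerRecord
  attr_reader :ems_id, :old_ems_id

  def initialize(ems_id)
    @ems_id = ems_id
    @old_ems_id = nil
  end

  # Assumed behavior: archive the current EMS id, then detach.
  # Nothing stops this from running on an already-disconnected record.
  def disconnect_inv
    @old_ems_id = @ems_id
    @ems_id = nil
  end
end

container = ContainerRecord.new(42)

# First path: save_container_inventory disconnects the container.
container.disconnect_inv
p [container.ems_id, container.old_ems_id] # prints [nil, 42]

# Second path: the container definition disconnects it again,
# overwriting old_ems_id with the already-cleared ems_id.
container.disconnect_inv
p [container.ems_id, container.old_ems_id] # prints [nil, nil]
```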

[0] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/ems_refresh/save_inventory_container.rb#L354

[1] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/container_definition.rb#L14

Comment 9 Federico Simoncelli 2017-05-17 17:20:18 UTC

(In reply to Adam Grare from comment #8)
> I believe the reason it was disconnected twice was because the container was
> being disconnected directly by save_container_inventory [0] then also by
> container_definition#disconnect_inv [1]
> 
> [0] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/ems_refresh/save_inventory_container.rb#L354
> 
> [1] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/container_definition.rb#L14

If that's the case, shouldn't it have happened 100% of the time rather than only in rare cases?
(It was only 100 out of 4000, and I don't know how many had already been archived.)

Comment 10 Adam Grare 2017-06-13 13:51:33 UTC

My guess is that it was a timing issue, depending on whether the container was dissociated from the container definition before the definition was disconnected.

From the core ems_refresh point of view, we disconnect both in only one place, and any race on the container definition disconnect should be handled by that method.
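The timing hypothesis above can be sketched by modeling the two interleavings. This sketch assumes, hypothetically, that `disconnect_inv` archives `ems_id` into `old_ems_id` and that the definition disconnect cascades to its container only while the association still exists, so the double disconnect happens in just one ordering, which would fit the low observed rate. It also shows a guard that makes the disconnect idempotent. `ContainerModel`, `DefinitionModel`, and `refresh` are illustrative names, and the guard is not claimed to be the change made in PR 14815.

```ruby
# Hypothetical model of a container's EMS link.
class ContainerModel
  attr_reader :ems_id, :old_ems_id

  def initialize(ems_id, guarded:)
    @ems_id = ems_id
    @old_ems_id = nil
    @guarded = guarded
  end

  def disconnect_inv
    # Idempotent variant: leave an already-disconnected record alone.
    return if @guarded && @ems_id.nil?

    @old_ems_id = @ems_id
    @ems_id = nil
  end
end

# Hypothetical definition that may still reference the container.
class DefinitionModel
  attr_accessor :container

  def disconnect_inv
    # Cascades only while the association is still present.
    container&.disconnect_inv
  end
end

# Runs both disconnect paths; dissociate_first controls the interleaving.
def refresh(guarded:, dissociate_first:)
  ctr = ContainerModel.new(7, guarded: guarded)
  defn = DefinitionModel.new
  defn.container = ctr

  ctr.disconnect_inv                       # direct disconnect of the container
  defn.container = nil if dissociate_first # dissociation may land before...
  defn.disconnect_inv                      # ...or after the definition disconnect
  [ctr.ems_id, ctr.old_ems_id]
end

[false, true].each do |guarded|
  [true, false].each do |dissociate_first|
    result = refresh(guarded: guarded, dissociate_first: dissociate_first)
    puts "guarded=#{guarded} dissociate_first=#{dissociate_first} -> #{result.inspect}"
  end
end
```

In this model, only the unguarded run where the container is still associated at definition-disconnect time ends with both ids nil; with the guard, every ordering keeps `old_ems_id`.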

Comment 11 Federico Simoncelli 2017-06-14 14:23:36 UTC

Based on comment 7, the PR solved the issue in the OpenShift provider:

https://github.com/ManageIQ/manageiq/pull/14815

Comment 12 Satoe Imaishi 2017-06-14 14:50:41 UTC


*** This bug has been marked as a duplicate of bug 1443661 ***
