Description of problem:
We have found systems where containers have both ems_id and old_ems_id set to nil. It seems reproducible enough: roughly 2 containers per day end up with both ems_id and old_ems_id nil in an environment with 4000 pods.

Steps to Reproduce:
Unknown.

Actual results:
Some containers have both ems_id and old_ems_id set to nil.

Expected results:
A container should be attached to an ems by either ems_id or old_ems_id.
It was confirmed that this PR prevents the issue from happening: https://github.com/ManageIQ/manageiq/pull/14815

However, IIUC we still don't understand why the disconnection is issued twice. Adam, do you feel we should move this to the core team to investigate why the disconnection was issued twice? If not, this will be closed with the above PR, which only prevents the issue for the OpenShift provider specifically.
I believe the reason it was disconnected twice is that the container was being disconnected directly by save_container_inventory [0] and then also by container_definition#disconnect_inv [1].

[0] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/ems_refresh/save_inventory_container.rb#L354
[1] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/container_definition.rb#L14
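
To illustrate why a double disconnect could leave both fields nil, here is a minimal Ruby sketch. It is not the ManageIQ code: ContainerSketch and both method names are hypothetical, and it assumes a disconnect copies ems_id into old_ems_id and then clears ems_id. The guarded variant shows one possible way to make a repeated call harmless; it is not a claim about what the PR above does.

```ruby
# Hypothetical stand-in for a container record; NOT the ManageIQ model.
ContainerSketch = Struct.new(:ems_id, :old_ems_id) do
  # Assumed disconnect behaviour: remember the ems, then detach from it.
  # A second call overwrites old_ems_id with the already-nil ems_id.
  def disconnect_unguarded
    self.old_ems_id = ems_id
    self.ems_id = nil
  end

  # One possible guard (an assumption): do nothing if already disconnected.
  def disconnect_guarded
    return if ems_id.nil?

    self.old_ems_id = ems_id
    self.ems_id = nil
  end
end

# Disconnected once: old_ems_id keeps the original ems.
once = ContainerSketch.new(42, nil)
once.disconnect_unguarded

# Disconnected twice without a guard: both fields end up nil.
twice = ContainerSketch.new(42, nil)
2.times { twice.disconnect_unguarded }

# Disconnected twice with the guard: old_ems_id survives.
guarded = ContainerSketch.new(42, nil)
2.times { guarded.disconnect_guarded }
```

Under these assumptions, only the container that goes through both disconnect paths loses its old_ems_id, which matches the symptom reported in the description.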
(In reply to Adam Grare from comment #8)
> I believe the reason it was disconnected twice was because the container was
> being disconnected directly by save_container_inventory [0] then also by
> container_definition#disconnect_inv [1]
>
> [0] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/ems_refresh/save_inventory_container.rb#L354
>
> [1] https://github.com/ManageIQ/manageiq/blob/6e2ed956d8ec84ec0131e9efce9587ad4edd3c85/app/models/container_definition.rb#L14

If that's the case, shouldn't it have happened 100% of the time and not only in rare cases? (It was only 100 out of 4000, and I don't know how many were already archived...)
My guess is that it was a timing issue: it depends on whether the container was dissociated from the container definition before the definition was disconnected. From a core ems_refresh point of view, we disconnect both in only one place, and any races on container definition disconnect should be handled by that method.
Based on comment 7 the PR solved the issue in the openshift provider: https://github.com/ManageIQ/manageiq/pull/14815
*** This bug has been marked as a duplicate of bug 1443661 ***