Bug 1732442 - refresh: Error when saving InventoryCollection:<ContainerImage> ... undefined method `[]=' for nil:NilClass
Summary: refresh: Error when saving InventoryCollection:<ContainerImage> ... undefined...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Providers
Version: 5.10.4
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: GA
: 5.11.0
Assignee: Ladislav Smola
QA Contact: juwatts
Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks: 1749060
TreeView+ depends on / blocked
 
Reported: 2019-07-23 12:06 UTC by Beni Paskin-Cherniavsky
Modified: 2019-12-13 14:54 UTC (History)
10 users (show)

Fixed In Version: 5.11.0.23
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1749060 (view as bug list)
Environment:
Last Closed: 2019-12-13 14:54:48 UTC
Category: Bug
Cloudforms Team: Container Management
Target Upstream Version:


Attachments (Terms of Use)

Description Beni Paskin-Cherniavsky 2019-07-23 12:06:19 UTC
Description of problem:

Seen in customer logs: "prod1-ocp" env refreshes occasionally failed, in 10–20% of runs:

[----] E, [2019-06-17T15:01:55.414115 #17048:4a8f58] ERROR -- : Error when saving InventoryCollection:<ContainerImage> with strategy: , saver_strategy: batch, targeted: false. Message: undefined method `[]=' for
 nil:NilClass
[----] I, [2019-06-17T15:01:55.414492 #17048:4a8f58]  INFO -- : Exception in realtime_block :ems_refresh - Timings: {:collect_inventory_for_targets=>49.55200004577637, :parse_targeted_inventory=>96.7337815761566
2, :save_inventory=>52.38033437728882, :ems_refresh=>198.66671109199524}
[----] E, [2019-06-17T15:01:55.414643 #17048:4a8f58] ERROR -- : MIQ(ManageIQ::Providers::Openshift::ContainerManager::Refresher#refresh) EMS: [prod1-ocp], id: [1000000000004] Refresh failed
[----] E, [2019-06-17T15:01:55.416552 #17048:4a8f58] ERROR -- : [NoMethodError]: undefined method `[]=' for nil:NilClass  Method:[block (2 levels) in <class:LogProxy>]
[----] E, [2019-06-17T15:01:55.419961 #17048:4a8f58] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/inventory/persister/builder/container_manager.rb:252:in `block (3 levels) in custom_reconnect_block
'
/opt/rh/cfme-gemset/gems/activerecord-5.0.7.2/lib/active_record/relation/delegation.rb:38:in `each'
/opt/rh/cfme-gemset/gems/activerecord-5.0.7.2/lib/active_record/relation/delegation.rb:38:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/inventory/persister/builder/container_manager.rb:243:in `block (2 levels) in custom_reconnect_block'
/var/www/miq/vmdb/app/models/manageiq/providers/inventory/persister/builder/container_manager.rb:242:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/inventory/persister/builder/container_manager.rb:242:in `each_slice'
/var/www/miq/vmdb/app/models/manageiq/providers/inventory/persister/builder/container_manager.rb:242:in `block in custom_reconnect_block'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/saver/concurrent_safe_batch.rb:107:in `save!'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/saver/base.rb:84:in `save_inventory_collection!'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/base.rb:37:in `save_inventory'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/base.rb:24:in `save_inventory_object_inventory'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/topological_sort.rb:29:in `block (2 levels) in save_collections'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/topological_sort.rb:28:in `each'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/topological_sort.rb:28:in `block in save_collections'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/topological_sort.rb:26:in `each'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/topological_sort.rb:26:in `each_with_index'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_collection/topological_sort.rb:26:in `save_collections'
/opt/rh/cfme-gemset/gems/inventory_refresh-0.1.2/lib/inventory_refresh/save_inventory.rb:24:in `save_inventory'
/var/www/miq/vmdb/app/models/manageiq/providers/inventory/persister.rb:29:in `persist!'
/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/refresher.rb:189:in `save_inventory'
/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/refresher.rb:109:in `block in refresh_targets_for_ems'

Version-Release number of selected component (if applicable):
5.10.4.3

Steps to Reproduce:
?

Actual results:
10–20% of refreshes failed.

Expected results:
refresh always succeeds.

Additional info:
[Splitting from bug 1722808 to track separately.
Got some analysis there, believe know how to fix.]

Comment 2 Beni Paskin-Cherniavsky 2019-07-23 12:28:55 UTC
Just for clarity: unlike bug 1722808 at same customer, this one is completely on CloudForms side.

According to logs, CF gets all data (>10 requests) from "prod1-ocp" Openshift within 40~50sec without problems, this bug is later in CF saving to its database.

Comment 3 Ladislav Smola 2019-08-07 15:24:21 UTC
I've commented what to do to fix this in https://bugzilla.redhat.com/show_bug.cgi?id=1722808#c12

So applying what we have in our cloud/infra reconnect code https://github.com/Ladas/manageiq/blob/6834b42669bb0d33494736d0653ca1158c25f6ab/app/models/manageiq/providers/inventory/persister/builder/shared.rb#L21 (the MIQ DB is not protected from duplicates)

To OpenShift codebase: https://github.com/ManageIQ/manageiq/blob/hammer-7/app/models/manageiq/providers/inventory/persister/builder/container_manager.rb#L252 (was taken from OpenShift provider)

(or change it to use the shared code)

Comment 6 CFME Bot 2019-09-04 14:45:38 UTC
New commit detected on ManageIQ/manageiq/master:

https://github.com/ManageIQ/manageiq/commit/95c44a3f7950aef12c38f5087e7ae5119090c393
commit 95c44a3f7950aef12c38f5087e7ae5119090c393
Author:     Ladislav Smola <lsmola@redhat.com>
AuthorDate: Wed Sep  4 03:04:12 2019 -0400
Commit:     Ladislav Smola <lsmola@redhat.com>
CommitDate: Wed Sep  4 03:04:12 2019 -0400

    Skip already reconnected object

    The current reconnect code may try to reconnect multiple
    times, because MIQ DB doesn't have constraints preventing
    duplicates.

    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1732442

 app/models/manageiq/providers/inventory/persister/builder/container_manager.rb | 3 +
 1 file changed, 3 insertions(+)

Comment 7 CFME Bot 2019-09-04 19:10:37 UTC
New commit detected on ManageIQ/manageiq/ivanchuk:

https://github.com/ManageIQ/manageiq/commit/d113674aa773fd21bfab53d73091330c034bc131
commit d113674aa773fd21bfab53d73091330c034bc131
Author:     Adam Grare <agrare@redhat.com>
AuthorDate: Wed Sep  4 10:44:12 2019 -0400
Commit:     Adam Grare <agrare@redhat.com>
CommitDate: Wed Sep  4 10:44:12 2019 -0400

    Merge pull request #19251 from Ladas/cover_containers_reconnect_with_duplicates_in_db

    Skip already reconnected object

    (cherry picked from commit 91092dcacfd6663cbed3711ecae913a663e033bd)

    https://bugzilla.redhat.com/show_bug.cgi?id=1732442

 app/models/manageiq/providers/inventory/persister/builder/container_manager.rb | 3 +
 1 file changed, 3 insertions(+)

Comment 9 juwatts 2019-09-09 18:49:00 UTC
Verified in 5.11.0.23.20190904213640_d113674

Added duplicate records into the database (different ids):

vmdb_production=# select * from container_groups where id=1;
-[ RECORD 1 ]------------+------------------------------------------------------------------
id                       | 1
ems_ref                  | 6574ecad-cdd3-11e9-8f26-001a4a520000
name                     | ansible-1-tkqzg
ems_created_on           | 2019-09-02 22:45:43
resource_version         | 72938575
restart_policy           | Always
dns_policy               | ClusterFirst
ems_id                   | 34
container_node_id        | 2
last_perf_capture_on     | 2019-09-09 18:03:00
container_replicator_id  | 
ipaddress                | 10.130.0.174
type                     | ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup
container_project_id     | 3
phase                    | Running
message                  | 
reason                   | 
container_build_pod_id   | 
created_on               | 2019-09-09 15:19:20.332876
deleted_on               | 
old_ems_id               | 
old_container_project_id | 

vmdb_production=# select * from container_groups where id=50;
-[ RECORD 1 ]------------+------------------------------------------------------------------
id                       | 50
ems_ref                  | 3e5d73db-cdd0-11e9-8f26-001a4a520000
name                     | httpd-1-vcvxx
ems_created_on           | 2019-09-02 22:23:09
resource_version         | 72935498
restart_policy           | Always
dns_policy               | ClusterFirst
ems_id                   | 34
container_node_id        | 2
last_perf_capture_on     | 2019-09-09 18:03:00
container_replicator_id  | 
ipaddress                | 10.130.0.166
type                     | ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup
container_project_id     | 3
phase                    | Running
message                  | 
reason                   | 
container_build_pod_id   | 
created_on               | 2019-09-09 15:19:20.332876
deleted_on               | 2019-09-09 18:31:46.173302
old_ems_id               | 
old_container_project_id | 3

vmdb_production=# 

Refreshed the provider and verified that no errors were observed in the evm.log

Comment 10 Beni Paskin-Cherniavsky 2019-11-11 18:07:42 UTC
Requesting 5.9.z for case https://access.redhat.com/support/cases/#/case/02507124.

Backport PR: https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/348


Note You need to log in before you can comment on or make changes to this bug.