Description of problem: Refresh failed 2 times with this backtrace. Version-Release number of selected component (if applicable): Version: 5.7.1.3 Build: 20170221135006_818f133 Diagnostics: [Installed RPMs]: cfme 5.7.1.3-22.el7cf with multiple hotfixes... Expected results: Refresh succeeds Additional info: [----] E, [2017-08-17T11:02:47.798803 #16498:4f1140] ERROR -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001] Refresh failed [----] E, [2017-08-17T11:02:47.800110 #16498:4f1140] ERROR -- : [NoMethodError]: undefined method `[]' for nil:NilClass Method:[rescue in block in refresh] [----] E, [2017-08-17T11:02:47.800321 #16498:4f1140] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:740:in `parse_image_name' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:644:in `parse_container_image' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:624:in `parse_container' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:305:in `block in parse_pod' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:304:in `each' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:304:in `parse_pod' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:58:in `block in get_pods' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:124:in `process_collection_item' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:118:in `block in process_collection' /opt/rh/rh-ruby23/root/usr/share/ruby/delegate.rb:341:in `each' /opt/rh/rh-ruby23/root/usr/share/ruby/delegate.rb:341:in `block in delegating_block' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:118:in `process_collection' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:58:in `get_pods' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:24:in `ems_inv_to_hashes' /var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresh_parser.rb:5:in `ems_inv_to_hashes' /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:7:in `ems_inv_to_hashes' /var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:24:in `parse_legacy_inventory' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:122:in `block in parse_targeted_inventory' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:122:in `parse_targeted_inventory' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:87:in `block in refresh_targets_for_ems' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:86:in `refresh_targets_for_ems' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:24:in `block (2 levels) in refresh' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:24:in `block in refresh' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:14:in `each' /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:14:in `refresh' /var/www/miq/vmdb/app/models/manageiq/providers/base_manager/refresher.rb:10:in `refresh' /var/www/miq/vmdb/app/models/ems_refresh.rb:91:in `block in refresh' /var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `each' /var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `refresh' /var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver' /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout' /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch' /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch' /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch' /opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout' /var/www/miq/vmdb/app/models/miq_queue.rb:343:in `deliver' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:106:in `deliver_queue_message' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop' /var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:334:in `block in do_work_loop' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `loop' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `do_work_loop' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:128:in `start' /var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker' /var/www/miq/vmdb/app/models/miq_worker.rb:343:in `block in start' /opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork' /opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork' /var/www/miq/vmdb/app/models/miq_worker.rb:341:in `start' /var/www/miq/vmdb/app/models/miq_worker.rb:270:in `start_worker' /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:68:in `start_worker_for_ems' /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:46:in `block in sync_workers' /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `each' /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `sync_workers' /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers' /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each' /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers' /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers' /var/www/miq/vmdb/app/models/miq_server.rb:346:in `block in monitor' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block' /var/www/miq/vmdb/app/models/miq_server.rb:346:in `monitor' /var/www/miq/vmdb/app/models/miq_server.rb:368:in `block (2 levels) in monitor_loop' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store' /var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block' /var/www/miq/vmdb/app/models/miq_server.rb:368:in `block in monitor_loop' /var/www/miq/vmdb/app/models/miq_server.rb:367:in `loop' /var/www/miq/vmdb/app/models/miq_server.rb:367:in `monitor_loop' /var/www/miq/vmdb/app/models/miq_server.rb:250:in `start' /var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start' /var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start' /var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>' and again with same backtrace: [----] E, [2017-08-17T11:07:26.086178 #16498:4f1140] ERROR -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001] Refresh failed [----] E, [2017-08-17T11:07:26.087355 #16498:4f1140] ERROR -- : [NoMethodError]: undefined method `[]' for nil:NilClass Method:[rescue in block in refresh] [----] E, [2017-08-17T11:07:26.087576 #16498:4f1140] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:740:in `parse_image_name' ... On that day, these are the only 2 refershes that say "...Complete" (with this error), others run out of time or RAM. But later refresh succeeded. We recorded VCR few days later, but they don't trigger this error (the recording script runs the refresh parser). The code around there has changed; at 5.7.1.3-22 line 740 is: hostname = image_parts[:host] || image_parts[:host2] || image_parts[:localhost] https://gitlab.cloudforms.lab.eng.rdu2.redhat.com/cloudforms/cfme/blob/5.7.1.3-22/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb#L740 which seems to be same file as upstream euwe-3: https://github.com/ManageIQ/manageiq/blob/euwe-3/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb#L740
Beni I assigned this to Erez because he worked with images, if you think that this is a more generic issue in the refresher feel free to take it.
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/94 is a proposed workaround that would let refresh complete and log what we failed on.
This BZ reminds me of https://bugzilla.redhat.com/show_bug.cgi?id=1449278 where some fields that we assumed will always be are not present. I think this is a similar case.
New commit detected on ManageIQ/manageiq-providers-kubernetes/master: https://github.com/ManageIQ/manageiq-providers-kubernetes/commit/22f8ab0e1716c6871239486a76ffef8ebf2e1c5e commit 22f8ab0e1716c6871239486a76ffef8ebf2e1c5e Author: Adam Grare <agrare> AuthorDate: Tue Aug 15 17:00:56 2017 -0400 Commit: Adam Grare <agrare> CommitDate: Fri Aug 25 12:56:41 2017 -0400 Skip invalid container_images https://bugzilla.redhat.com/show_bug.cgi?id=1484337 .../kubernetes/container_manager/refresh_parser.rb | 25 ++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-)
To be clear, this did not solve the root cause, there is still input our refresh parser doesn't understand. We don't yet know what that input looks like. That "skip invalid container images" commit merely lets rest of refresh complete when that happens, and logs the offending input.
Great point Beni, I think the bug here is that refresh fails with undefined method `[]' for nil:NilClass but once we figure out the cause there should be another BZ to handle whatever name was causing this parsing error.
Verified on version: 5.9.0.11.20171127204214_e316988 Verification Steps: 1) Add an OpenShift provider to CFME appliance 2) On the OpenShift provider, created a pod json file and changed the "image" key value from a valid image to "bad-Image-name" 3) Saved and created the pod 4) On the CFME appliance, ran a manual refresh 5) Verified the refresh was successful and the above ERROR was not logged