Bug 1484337

Summary: Refresh fails: undefined method `[]' for nil:NilClass in `parse_image_name'
Product: Red Hat CloudForms Management Engine Reporter: Beni Paskin-Cherniavsky <cben>
Component: ProvidersAssignee: Adam Grare <agrare>
Status: CLOSED CURRENTRELEASE QA Contact: juwatts
Severity: high Docs Contact:
Priority: high    
Version: 5.7.0CC: agrare, cben, efreiber, fsimonce, gblomqui, jfrey, jhardy, obarenbo, simaishi
Target Milestone: GAKeywords: TestOnly
Target Release: 5.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: container
Fixed In Version: 5.9.0.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1487280 1487283 (view as bug list) Environment:
Last Closed: 2018-03-06 14:50:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: Container Management Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1487280, 1487283    

Description Beni Paskin-Cherniavsky 2017-08-23 09:44:43 UTC
Description of problem:
Refresh failed 2 times with this backtrace.

Version-Release number of selected component (if applicable):
Version: 5.7.1.3
Build:   20170221135006_818f133
Diagnostics: [Installed RPMs]: cfme 5.7.1.3-22.el7cf
with multiple hotfixes...

Expected results:
Refresh succeeds

Additional info:

[----] E, [2017-08-17T11:02:47.798803 #16498:4f1140] ERROR -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001] Refresh failed
[----] E, [2017-08-17T11:02:47.800110 #16498:4f1140] ERROR -- : [NoMethodError]: undefined method `[]' for nil:NilClass  Method:[rescue in block in refresh]
[----] E, [2017-08-17T11:02:47.800321 #16498:4f1140] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:740:in `parse_image_name'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:644:in `parse_container_image'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:624:in `parse_container'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:305:in `block in parse_pod'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:304:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:304:in `parse_pod'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:58:in `block in get_pods'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:124:in `process_collection_item'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:118:in `block in process_collection'
/opt/rh/rh-ruby23/root/usr/share/ruby/delegate.rb:341:in `each'
/opt/rh/rh-ruby23/root/usr/share/ruby/delegate.rb:341:in `block in delegating_block'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:118:in `process_collection'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:58:in `get_pods'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:24:in `ems_inv_to_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresh_parser.rb:5:in `ems_inv_to_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:7:in `ems_inv_to_hashes'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:24:in `parse_legacy_inventory'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:122:in `block in parse_targeted_inventory'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:122:in `parse_targeted_inventory'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:87:in `block in refresh_targets_for_ems'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:86:in `refresh_targets_for_ems'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:24:in `block (2 levels) in refresh'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:24:in `block in refresh'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:14:in `each'
/var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:14:in `refresh'
/var/www/miq/vmdb/app/models/manageiq/providers/base_manager/refresher.rb:10:in `refresh'
/var/www/miq/vmdb/app/models/ems_refresh.rb:91:in `block in refresh'
/var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `each'
/var/www/miq/vmdb/app/models/ems_refresh.rb:90:in `refresh'
/var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:91:in `block in timeout'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `block in catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:33:in `catch'
/opt/rh/rh-ruby23/root/usr/share/ruby/timeout.rb:106:in `timeout'
/var/www/miq/vmdb/app/models/miq_queue.rb:343:in `deliver'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:106:in `deliver_queue_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:134:in `deliver_message'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:152:in `block in do_work'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `loop'
/var/www/miq/vmdb/app/models/miq_queue_worker_base/runner.rb:146:in `do_work'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:334:in `block in do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:331:in `do_work_loop'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:153:in `run'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:128:in `start'
/var/www/miq/vmdb/app/models/miq_worker/runner.rb:21:in `start_worker'
/var/www/miq/vmdb/app/models/miq_worker.rb:343:in `block in start'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/opt/rh/cfme-gemset/gems/nakayoshi_fork-0.0.3/lib/nakayoshi_fork.rb:24:in `fork'
/var/www/miq/vmdb/app/models/miq_worker.rb:341:in `start'
/var/www/miq/vmdb/app/models/miq_worker.rb:270:in `start_worker'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:68:in `start_worker_for_ems'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:46:in `block in sync_workers'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `each'
/var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:45:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
/var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:22:in `monitor_workers'
/var/www/miq/vmdb/app/models/miq_server.rb:346:in `block in monitor'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:346:in `monitor'
/var/www/miq/vmdb/app/models/miq_server.rb:368:in `block (2 levels) in monitor_loop'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:11:in `realtime_store'
/var/www/miq/vmdb/gems/pending/util/extensions/miq-benchmark.rb:30:in `realtime_block'
/var/www/miq/vmdb/app/models/miq_server.rb:368:in `block in monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:367:in `loop'
/var/www/miq/vmdb/app/models/miq_server.rb:367:in `monitor_loop'
/var/www/miq/vmdb/app/models/miq_server.rb:250:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start'
/var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start'
/var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'

and again with same backtrace:
[----] E, [2017-08-17T11:07:26.086178 #16498:4f1140] ERROR -- : MIQ(ManageIQ::Providers::OpenshiftEnterprise::ContainerManager::Refresher#refresh) EMS: [CTC Core], id: [10000000000001] Refresh failed
[----] E, [2017-08-17T11:07:26.087355 #16498:4f1140] ERROR -- : [NoMethodError]: undefined method `[]' for nil:NilClass  Method:[rescue in block in refresh]
[----] E, [2017-08-17T11:07:26.087576 #16498:4f1140] ERROR -- : /var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb:740:in `parse_image_name'
...

On that day, these are the only 2 refershes that say "...Complete" (with this error), others run out of time or RAM.  But later refresh succeeded.

We recorded VCR few days later, but they don't trigger this error (the recording script runs the refresh parser).

The code around there has changed; at 5.7.1.3-22 line 740 is:
  hostname = image_parts[:host] || image_parts[:host2] || image_parts[:localhost]
https://gitlab.cloudforms.lab.eng.rdu2.redhat.com/cloudforms/cfme/blob/5.7.1.3-22/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb#L740
which seems to be same file as upstream euwe-3:
https://github.com/ManageIQ/manageiq/blob/euwe-3/app/models/manageiq/providers/kubernetes/container_manager/refresh_parser.rb#L740

Comment 2 Federico Simoncelli 2017-08-23 10:28:09 UTC
Beni I assigned this to Erez because he worked with images, if you think that this is a more generic issue in the refresher feel free to take it.

Comment 3 Beni Paskin-Cherniavsky 2017-08-23 11:51:49 UTC
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/94 is a proposed workaround that would let refresh complete and log what we failed on.

Comment 4 Erez Freiberger 2017-08-23 12:52:53 UTC
This BZ reminds me of https://bugzilla.redhat.com/show_bug.cgi?id=1449278 where some fields that we assumed will always be are not present. I think this is a similar case.

Comment 5 CFME Bot 2017-08-29 15:24:45 UTC
New commit detected on ManageIQ/manageiq-providers-kubernetes/master:
https://github.com/ManageIQ/manageiq-providers-kubernetes/commit/22f8ab0e1716c6871239486a76ffef8ebf2e1c5e

commit 22f8ab0e1716c6871239486a76ffef8ebf2e1c5e
Author:     Adam Grare <agrare>
AuthorDate: Tue Aug 15 17:00:56 2017 -0400
Commit:     Adam Grare <agrare>
CommitDate: Fri Aug 25 12:56:41 2017 -0400

    Skip invalid container_images
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1484337

 .../kubernetes/container_manager/refresh_parser.rb | 25 ++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

Comment 9 Beni Paskin-Cherniavsky 2017-09-23 22:11:45 UTC
To be clear, this did not solve the root cause, there is still input our refresh parser doesn't understand.
We don't yet know what that input looks like.

That "skip invalid container images" commit merely lets rest of refresh complete when that happens, and logs the offending input.

Comment 10 Adam Grare 2017-09-25 12:27:54 UTC
Great point Beni, I think the bug here is that refresh fails with undefined method `[]' for nil:NilClass but once we figure out the cause there should be another BZ to handle whatever name was causing this parsing error.

Comment 12 juwatts 2017-11-29 16:39:06 UTC
Verified on version:
5.9.0.11.20171127204214_e316988

Verification Steps:
1) Add an OpenShift provider to CFME appliance
2) On the OpenShift provider, created a pod json file and changed the "image" key value from a valid image to "bad-Image-name"
3) Saved and created the pod
4) On the CFME appliance, ran a manual refresh
5) Verified the refresh was successful and the above ERROR was not logged