Created attachment 1267211
Process 22215 from March 24 2017 logs of *6824 appliance with multiple errors for msgids: 10000221306827, 10000221346529, 10000221367114, 10000221381043 and many others

Description of problem:
Metric::Capture.perf_capture_timer fails with the reported error while queueing perf_capture_realtime messages, after it has already scheduled dozens of messages for other VMDB instance types.

Version-Release number of selected component (if applicable):
Version: 5.7.1.3

How reproducible:
Seems to be associated with OpenShift.

Steps to Reproduce:
1.
2.
3.

Actual results:
The method fails consistently. It never completes without error, and it never runs long enough to hit its timeout.

Expected results:
Normal method termination after creating all performance capture messages.

Additional info:
Error log sequence follows:
=====
[----] I, [2017-03-24T09:51:05.975054 #22215:49113c] INFO -- : MIQ(MiqPriorityWorker::Runner#get_message_via_drb) Message id: [10000221727923], MiqWorker id: [10000001072912], Zone: [CTC DMZ], Role: [ems_metrics_coordinator], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [Metric::Capture.perf_capture_timer], Timeout: [600], Priority: [20], State: [dequeue], Deliver On: [], Data: [], Args: [], Dequeued in: [4.352411201] seconds
[----] I, [2017-03-24T09:51:05.975459 #22215:49113c] INFO -- : MIQ(Metric::Capture.perf_capture_timer) Queueing performance capture...
.....
[----] I, [2017-03-24T09:54:56.837117 #22215:49113c] INFO -- : MIQ(MiqQueue.put) Message id: [10000221732431], id: [], Zone: [CTC DMZ], Role: [ems_metrics_collector], Server: [], Ident: [openshift_enterprise], Target id: [], Instance id: [10000000003166], Task id: [], Command: [ManageIQ::Providers::Kubernetes::ContainerManager::Container.perf_capture_realtime], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: [2017-03-24 11:09:40 UTC, 2017-03-24 14:54:56 UTC]
[----] E, [2017-03-24T09:55:00.459866 #22215:49113c] ERROR -- : MIQ(MiqQueue#deliver) Message id: [10000221727923], Error: [Unsupported type ManageIQ::Providers::Kubernetes::ContainerManager::Container (id: 10000000028453)]
[----] E, [2017-03-24T09:55:00.460078 #22215:49113c] ERROR -- : [RuntimeError]: Unsupported type ManageIQ::Providers::Kubernetes::ContainerManager::Container (id: 10000000028453) Method:[rescue in deliver]
[----] E, [2017-03-24T09:55:00.460336 #22215:49113c] ERROR -- : /var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:16:in `queue_name_for_metrics_collection'
/var/www/miq/vmdb/app/models/metric/ci_mixin/capture.rb:68:in `perf_capture_queue'
/var/www/miq/vmdb/app/models/metric/capture.rb:215:in `block in queue_captures'
/var/www/miq/vmdb/app/models/metric/capture.rb:210:in `each'
/var/www/miq/vmdb/app/models/metric/capture.rb:210:in `queue_captures'
/var/www/miq/vmdb/app/models/metric/capture.rb:51:in `perf_capture_timer'
/var/www/miq/vmdb/app/models/miq_queue.rb:347:in `block in deliver'
.....
=====
Thomas, can you please provide:
1. The output of the following (from the Rails console):
   ManageIQ::Providers::Kubernetes::ContainerManager::Container.find(10000000028453)
2. The result of this grep run against all logs (including rolled ones):
   grep "Disconnecting Container" evm.log | grep 10000000028453

Quick reference points:
a. https://github.com/manageiq/manageiq/blob/4ad0054c6a92d2b6ee63437b4f1508fb0a6952e5/app/models/container.rb#L45
b. https://github.com/manageiq/manageiq/blob/7bd3090330bf4ee8076737492fbf0dec1001da9c/app/models/metric/ci_mixin/capture.rb#L8
Mooli,

Thanks for looking at the case, but I'm afraid the short answer to your question is that I cannot comply. This particular problem is one of many that have surfaced at an active customer where many other issues are being actively worked, and for that reason I do not have direct access to the customer.

I opened this case because, as with others recently opened for this customer, the current blocker is provider refresh, but the end objective is a complete set of C&U reports once inventory is reliably gathered. This case is one of several that will need to be addressed before complete C&U collection can proceed.

I can provide you access to the full evm.log file if that is of any assistance.
Thomas,

The phenomenon described above (shown in the log) happens during C&U collection when the system cannot resolve the ext_management_system of the target object, i.e. it cannot associate the object with a specific provider. One reason this can happen is a failed inventory refresh, which is exactly what we know is happening in this case. I am not sure this bug stands on its own.
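The failure mode described above can be sketched in plain Ruby. This is a minimal, hypothetical sketch, not the actual ManageIQ code: `ContainerStub`, `EmsStub`, `queue_name_for` and `queue_captures` are illustrative stand-ins for the methods named in the backtrace (`queue_name_for_metrics_collection`, `queue_captures`). It assumes, per the explanation above, that the collector queue name is derived from the target's `ext_management_system` and that a target with no resolvable provider raises. It also shows why a single orphaned container aborts the whole `each` loop seen in the backtrace, so the targets after it are never queued.

```ruby
# Hypothetical stand-ins; these are NOT the real ManageIQ models.
ContainerStub = Struct.new(:id, :ext_management_system)
EmsStub = Struct.new(:name, :queue_name)

# Assumed behaviour, modelled on queue_name_for_metrics_collection:
# the queue name comes from the owning provider, so a target whose
# provider cannot be resolved cannot be routed and raises instead.
def queue_name_for(target)
  ems = target.ext_management_system
  raise "Unsupported type #{target.class.name} (id: #{target.id})" if ems.nil?

  ems.queue_name
end

# Assumed behaviour, modelled on the each-loop in queue_captures:
# nothing rescues per target, so the first raise ends the whole loop.
def queue_captures(targets, queued)
  targets.each { |t| queued << [t.id, queue_name_for(t)] }
  queued
end

# Usage: the second container has lost its provider.
ems = EmsStub.new("openshift", "openshift_enterprise")
targets = [ContainerStub.new(1, ems), ContainerStub.new(28453, nil), ContainerStub.new(3, ems)]
queued = []
begin
  queue_captures(targets, queued)
rescue RuntimeError => e
  puts e.message # prints "Unsupported type ContainerStub (id: 28453)"
end
p queued # prints [[1, "openshift_enterprise"]]; container 3 is never queued
```

In this sketch, every timer run fails at the same orphaned target for as long as it remains unresolved, which matches the "fails consistently, never times out" symptom in the description.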
So, based on your description above, you appear to be saying that the underlying cause of this error will be resolved once all of the EMS refresh activity has completed for all of the providers in this customer's database. Hopefully that state will soon be achieved.
(In reply to Barak from comment #4)
> Thomas,
>
> The phenomena described above (shown in the log) happens on C & U collection,
> when the system can not resolve the ext_management_system of the target
> object (can not reference this object to a specific provider).

From the error message reported in the description, this seems to be a duplicate of bug 1408968. In the end, the issue was fixed in bug 1420721:
https://github.com/ManageIQ/manageiq/pull/13686

If the issue here is indeed a duplicate of the BZs above, then an inventory refresh won't help.
Right, my mistake (I was looking at the already-fixed master branch; I'll make sure to pull the relevant one next time).
+1 for closing as a duplicate of 1420721.
Per comments #6 and #7, moving this bug to CLOSED DUPLICATE. It will be shipped as part of 5.7.2.

*** This bug has been marked as a duplicate of bug 1426683 ***