Description of problem:

Unable to collect inventory from an OpenShift provider with 40,000+ container images, causing the EmsRefresh to fail. The kubeclient in the OpenShift provider raises a KubeException due to a timeout.

[----] E, [2017-03-24T04:10:01.291803 #27999:e53134] ERROR -- : [KubeException]: Timed out reading data from server
Method:[rescue in block in refresh]
[----] E, [2017-03-24T04:10:01.291935 #27999:e53134] ERROR -- : /opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:107:in `rescue in handle_exception'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:99:in `handle_exception'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:258:in `get_entities'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:169:in `block (2 levels) in define_entity_methods'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:16:in `block in fetch_entities'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `each_with_object'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `fetch_entities'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:20:in `block in parse_legacy_inventory'
/var/www/miq/vmdb/app/models/ext_management_system.rb:362:in `with_provider_connection'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:19:in `parse_legacy_inventory'

Version-Release number of selected component (if applicable):
5.7.1.3 (kubeclient version 2.3.1)

We observed that with a raw curl command there was approximately 2min 10sec before any data was returned to the caller. It appears that with this many container images the OpenShift server spends the entire timeout just building the response before sending any data back.
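For anyone reproducing this: the symptom is specifically a long time-to-first-byte, not slow streaming. Below is a minimal Ruby sketch of the same measurement the raw curl command gave us; the host, port, and token are placeholders, not values from the customer environment.

require 'net/http'
require 'openssl'
require 'uri'

# Placeholder endpoint and token -- substitute your own cluster values.
uri = URI('https://openshift.example.com:8443/oapi/v1/images')

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE  # lab-only shortcut
http.read_timeout = 300                       # comfortably above the observed ~2m10s

request = Net::HTTP::Get.new(uri)
request['Authorization'] = "Bearer #{ENV['OPENSHIFT_TOKEN']}"

started = Time.now
first_byte = nil
http.request(request) do |response|
  response.read_body do |_chunk|
    # The first chunk marks when the server finally starts sending data.
    first_byte ||= Time.now - started
  end
end
puts "time to first byte: #{first_byte.round(1)}s, total: #{(Time.now - started).round(1)}s"

On this cluster the first byte arrives after roughly the 2m10s noted above, which is why any client-side read timeout shorter than that fails before receiving a single byte.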
Beni, please link the PRs that should address this.
https://github.com/abonas/kubeclient/pull/244 is the kubeclient side. That PR may get stuck on backward-compatibility concerns, so I'm considering reducing it to something easier. Then we'd need a kubeclient release and a ManageIQ PR (probably 2 after the repo split).
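For reference, the client-side shape that PR is aiming for looks roughly like this (a sketch assuming the timeouts: option as it eventually shipped in kubeclient 2.4.0; URL and token are placeholders):

require 'kubeclient'

client = Kubeclient::Client.new(
  'https://openshift.example.com:8443/api', 'v1',
  auth_options: { bearer_token: ENV['OPENSHIFT_TOKEN'] },
  timeouts: {
    open: 15,  # seconds to establish the TCP/TLS connection
    read: 300  # seconds to wait for the response body; must exceed the
               # ~2m10s the server spends assembling the image list
  }
)

# With the larger read timeout, fetching a large collection no longer
# raises "KubeException: Timed out reading data from server".
client.get_pods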
On the ManageIQ side:
https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/10
https://github.com/ManageIQ/manageiq-providers-openshift/pull/8 (or possibly #7 if deemed backportable)
We will also need a manageiq-gems-pending PR to bump the gemspec entry that refers to kubeclient.
https://github.com/ManageIQ/manageiq-gems-pending/pull/156
New commit detected on ManageIQ/manageiq-gems-pending/master:
https://github.com/ManageIQ/manageiq-gems-pending/commit/dfb49ce830e93b2a9f9aabf3e04aaf18280c9cc6

commit dfb49ce830e93b2a9f9aabf3e04aaf18280c9cc6
Author:     Beni Cherniavsky-Paskin <cben>
AuthorDate: Wed May 10 16:03:00 2017 +0300
Commit:     Beni Cherniavsky-Paskin <cben>
CommitDate: Wed May 10 16:03:00 2017 +0300

    bump kubeclient ~> 2.4.0

    Needed to control timeout.
    https://bugzilla.redhat.com/show_bug.cgi?id=1440950

 manageiq-gems-pending.gemspec | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
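In other words, the gems-pending side of the fix is a one-line version-constraint bump, roughly the following (a sketch; the exact declaration style in manageiq-gems-pending.gemspec is assumed, not copied from the diff):

# manageiq-gems-pending.gemspec
spec.add_dependency "kubeclient", "~> 2.4.0"  # was a lower "~> 2.x" constraint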
Proposing 5.7.z / 5.8.z, because the customer is running 5.7.1.
Yes, they're all on master, but not on fine or euwe. And this BZ is directly targeted at 5.7. Should I move this BZ to POST? Shouldn't this be, say, 5.9, with 5.8 and 5.7 clones, so I can move just the 5.9 one to POST?
(In reply to Beni Paskin-Cherniavsky from comment #9)
> Yes, they're all on master, but not on fine or euwe. And this BZ is
> directly targeted at 5.7. Should I move this BZ to POST?

Yes. Clones will be created after the fix on master has been merged and this BZ is on POST.