1440950 – Unable to collect inventory for 40,000 container images, results in kubeclient timeout

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat CloudForms Management Engine
Classification:	Red Hat
Component:	Providers
Sub Component:
Version:	5.7.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	GA
Target Release:	5.9.0
Assignee:	Beni Paskin-Cherniavsky
QA Contact:	Einat Pacifici
Docs Contact:
URL:
Whiteboard:	container
Depends On:
Blocks:	1454383 1459929

Reported:	2017-04-10 19:26 UTC by Adam Grare
Modified:	2020-06-11 13:33 UTC
CC List:	9 users
Fixed In Version:	5.9.0.1
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1454383 1459929
Environment:
Last Closed:	2018-03-06 15:43:36 UTC
Category:	---
Cloudforms Team:	Container Management
Target Upstream Version:
Embargoed:

Description Adam Grare 2017-04-10 19:26:17 UTC

Description of problem:
Inventory collection from an OpenShift provider with 40,000+ container images does not complete, which causes the EmsRefresh to fail.  The kubeclient used by the OpenShift provider raises a KubeException because it times out while reading the response.

[----] E, [2017-03-24T04:10:01.291803 #27999:e53134] ERROR -- : [KubeException]: Timed out reading data from server  Method:[rescue in block in refresh]
[----] E, [2017-03-24T04:10:01.291935 #27999:e53134] ERROR -- : /opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:107:in `rescue in handle_exception'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:99:in `handle_exception'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:258:in `get_entities'
/opt/rh/cfme-gemset/gems/kubeclient-2.3.0/lib/kubeclient/common.rb:169:in `block (2 levels) in define_entity_methods'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:16:in `block in fetch_entities'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `each'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `each_with_object'
/var/www/miq/vmdb/app/models/manageiq/providers/kubernetes/container_manager/refresher_mixin.rb:14:in `fetch_entities'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:20:in `block in parse_legacy_inventory'
/var/www/miq/vmdb/app/models/ext_management_system.rb:362:in `with_provider_connection'
/var/www/miq/vmdb/app/models/manageiq/providers/openshift/container_manager/refresher.rb:19:in `parse_legacy_inventory'

Version-Release number of selected component (if applicable):
5.7.1.3 (Kubeclient version 2.3.1)

We observed that a raw curl command waited approximately 2 min 10 s before any data was returned to the caller.  It appears that with this many container images the OpenShift server spends longer than the client's entire timeout just building the response, before it sends any data back.
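
The failure mode described above can be sketched with the Ruby standard library alone. This is only an illustration, not the provider's or kubeclient's code: `start_slow_server`, `handle`, `fetch`, and the `/images` path are hypothetical names, and the delays are scaled down from minutes to fractions of a second. A server that sends nothing until its whole response is ready trips the client's read timeout, and the same request succeeds once the timeout exceeds the delay.

```ruby
require "net/http"
require "socket"

# Hypothetical stand-in for a slow API server: reads the request headers,
# waits `delay` seconds before sending the first byte, then replies.
def handle(conn, delay)
  loop do
    line = conn.gets
    break if line.nil? || line == "\r\n"
  end
  sleep(delay)
  body = '{"items":[]}'
  conn.write("HTTP/1.1 200 OK\r\n" \
             "Content-Type: application/json\r\n" \
             "Content-Length: #{body.bytesize}\r\n" \
             "Connection: close\r\n\r\n#{body}")
rescue Errno::EPIPE, Errno::ECONNRESET, IOError
  # The client gave up (timed out) before the response was written.
ensure
  conn.close
end

# Starts the server on a free local port and returns the listening socket.
def start_slow_server(delay)
  server = TCPServer.new("127.0.0.1", 0)
  Thread.new do
    begin
      loop do
        conn = server.accept
        Thread.new(conn) { |c| handle(c, delay) }
      end
    rescue IOError, Errno::EBADF
      # Listening socket was closed; stop accepting.
    end
  end
  server
end

# Issues one GET with the given read timeout (seconds).
# Returns the body, or :timed_out if no data arrived in time.
def fetch(port, read_timeout)
  http = Net::HTTP.new("127.0.0.1", port, nil) # nil: ignore proxy settings
  http.open_timeout = 1
  http.read_timeout = read_timeout
  http.max_retries = 0 # do not silently retry the GET after a timeout
  http.get("/images").body
rescue Net::ReadTimeout
  :timed_out
end

server = start_slow_server(0.5)
port = server.addr[1]
p fetch(port, 0.1) # read timeout shorter than the server's delay
p fetch(port, 5)   # read timeout longer than the server's delay
server.close
```

In the report the same condition surfaces one layer up, as a KubeException with the message "Timed out reading data from server".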

Comment 2 Federico Simoncelli 2017-04-25 07:30:14 UTC

Beni, please link the PRs that should address this.

Comment 3 Beni Paskin-Cherniavsky 2017-04-25 09:30:11 UTC

https://github.com/abonas/kubeclient/pull/244 is the kubeclient side.
That PR may get stuck on backward-compatibility concerns, so I'm considering reducing it to something easier.

After that we'd need a kubeclient release and a manageiq PR (probably 2 after the repo split).

Comment 4 Beni Paskin-Cherniavsky 2017-05-01 13:17:20 UTC

https://github.com/ManageIQ/manageiq-providers-kubernetes/pull/10
https://github.com/ManageIQ/manageiq-providers-openshift/pull/8 (or possibly #7 if deemed backportable)
A manageiq-gems-pending PR will also be needed to bump the kubeclient version that the gemspec refers to.

Comment 5 CFME Bot 2017-05-10 12:43:15 UTC

https://github.com/ManageIQ/manageiq-gems-pending/pull/156

Comment 6 CFME Bot 2017-05-10 13:07:59 UTC

New commit detected on ManageIQ/manageiq-gems-pending/master:
https://github.com/ManageIQ/manageiq-gems-pending/commit/dfb49ce830e93b2a9f9aabf3e04aaf18280c9cc6

commit dfb49ce830e93b2a9f9aabf3e04aaf18280c9cc6
Author:     Beni Cherniavsky-Paskin <cben>
AuthorDate: Wed May 10 16:03:00 2017 +0300
Commit:     Beni Cherniavsky-Paskin <cben>
CommitDate: Wed May 10 16:03:00 2017 +0300

    bump kubeclient ~> 2.4.0
    
    Needed to control timeout.
    https://bugzilla.redhat.com/show_bug.cgi?id=1440950

 manageiq-gems-pending.gemspec | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
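
For reference, the effect of the `~> 2.4.0` constraint in the commit above can be checked with RubyGems' own version classes. This is a small sketch; `allowed?` is a hypothetical helper, and the version list is chosen only for illustration (2.3.0 and 2.3.1 are the versions mentioned in this report).

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the gemspec-style `constraint`?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# "~> 2.4.0" is the pessimistic operator: >= 2.4.0 and < 2.5.0.
%w[2.3.0 2.3.1 2.4.0 2.4.3 2.5.0].each do |version|
  verdict = allowed?("~> 2.4.0", version) ? "allowed" : "rejected"
  puts "kubeclient #{version}: #{verdict}"
end
```

The constraint therefore excludes the 2.3.x releases seen in this report and accepts only the 2.4.x series.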

Comment 7 Beni Paskin-Cherniavsky 2017-05-10 13:36:03 UTC

Proposing 5.7.z? 5.8.z? because customer is running 5.7.1.

Comment 9 Beni Paskin-Cherniavsky 2017-05-20 20:46:42 UTC

Yes, they're all on master, but not on fine or euwe.  And this BZ is directly targeted at 5.7.  Should I move this BZ to POST?  Shouldn't this be, say, 5.9, with 5.8 and 5.7 clones, so that I can move just the 5.9 one to POST?

Comment 10 Federico Simoncelli 2017-05-22 08:20:35 UTC

(In reply to Beni Paskin-Cherniavsky from comment #9)
> Yes, they're all on master, but not on fine or euwe.  And this BZ is
> directly targeted at 5.7.  Should I move this BZ to POST?

Yes.
Clones will be created after the fix on master has been merged and this is on POST.
