Bug 1702757 - Image pruning on api.ci is wedged due to too many images
Summary: Image pruning on api.ci is wedged due to too many images
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.2.0
Assignee: Oleg Bulatov
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On: 1702346
Blocks: 1710561
Reported: 2019-04-24 17:07 UTC by Adam Kaplan
Modified: 2019-10-16 06:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The pruner was fetching all images in a single request.
Consequence: The request took too long and timed out.
Fix: The pruner now uses a pager to fetch the images.
Result: The pruner can fetch all images without hitting the timeout.
Clone Of: 1702346
Clones: 1710561
Environment:
Last Closed: 2019-10-16 06:28:06 UTC
Target Upstream Version:


Links
System ID                              Priority     Status  Summary                                                          Last Updated
Red Hat Bugzilla 1702346               unspecified  CLOSED  [3.11] Image pruning on api.ci is wedged due to too many images  2020-10-14 00:28:05 UTC
Red Hat Product Errata RHBA-2019:2922  None         None    None                                                             2019-10-16 06:28:21 UTC

Internal Links: 1710561

Description Adam Kaplan 2019-04-24 17:07:25 UTC
+++ This bug was initially created as a clone of Bug #1702346 +++

Approximately 55 days ago image pruning stopped working on api.ci, probably due to either a transient failure or hitting a limit.

The current error is

oc logs jobs/image-pruner-clayton-debug
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get images.image.openshift.io)

We have 200k images on the cluster, and it looks like the images call times out trying to load them all into memory. When I ran a paged call (locally in oc get) it took several minutes and eventually hit the compaction window, so I was unable to complete the listing. I am testing locally to see how large the response is.

The cluster needs to be able to prune, and we will need to take action to get it back under the threshold.  We then need to ensure that this failure mode doesn't happen in the future.
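For illustration, the difference between a single huge list call and a paged one can be sketched as follows. This is a minimal, self-contained Python sketch, not the pruner's code: `FakeImageAPI`, `list_page`, and `list_all_paged` are hypothetical names, and the in-memory class only imitates the `limit` / `continue` parameters that Kubernetes-style list endpoints accept. It does not model continue tokens expiring at compaction.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class FakeImageAPI:
    """Hypothetical in-memory stand-in for the images list endpoint."""
    names: List[str]
    requests: int = 0  # number of list calls served so far

    def list_page(
        self, limit: int, continue_token: Optional[str] = None
    ) -> Tuple[List[str], Optional[str]]:
        """Return up to `limit` names and a token for the next page.

        The token is None once the last page has been returned.
        """
        self.requests += 1
        start = int(continue_token) if continue_token else 0
        end = min(start + limit, len(self.names))
        next_token = str(end) if end < len(self.names) else None
        return self.names[start:end], next_token


def list_all_paged(api: FakeImageAPI, page_size: int = 500) -> Iterator[str]:
    """Yield every image name, fetching `page_size` items per request."""
    token: Optional[str] = None
    while True:
        page, token = api.list_page(limit=page_size, continue_token=token)
        yield from page
        if token is None:
            return


if __name__ == "__main__":
    api = FakeImageAPI([f"sha256:{i:064x}" for i in range(2_000)])
    images = list(list_all_paged(api, page_size=500))
    print(len(images), "images fetched in", api.requests, "requests")
```

Each request in this sketch stays small and bounded, so no single call has to serialize the whole collection. The whole walk still has to finish before the server compacts the resource version it started from, which is the limit described above.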

--- Additional comment from Clayton Coleman on 2019-04-23 17:33:29 UTC ---

Testing against the API server directly, the images list was 39M in JSON and took about 3m20s to retrieve. Compaction is set to 5m, so we are close to being unable to read all images before the compaction window closes.
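As a rough check on how close that is, the arithmetic can be written out. The sketch below uses only the figures quoted in this report (3m20s to list, 5m compaction, about 200k images). The projection assumes that list time scales linearly with image count, which is an assumption made here for illustration and not something measured in this bug.

```python
def window_usage(list_seconds: float, compaction_seconds: float) -> float:
    """Fraction of the compaction window consumed by one full list."""
    return list_seconds / compaction_seconds


LIST_SECONDS = 3 * 60 + 20   # 3m20s, from the comment above
COMPACTION_SECONDS = 5 * 60  # 5m compaction interval
IMAGE_COUNT = 200_000        # approximate image count from the description

usage = window_usage(LIST_SECONDS, COMPACTION_SECONDS)
headroom = COMPACTION_SECONDS - LIST_SECONDS

# Assumption: list time grows linearly with the number of images.
projected_limit = int(IMAGE_COUNT * COMPACTION_SECONDS / LIST_SECONDS)

if __name__ == "__main__":
    print(f"window used: {usage:.0%}, headroom: {headroom}s")
    print(f"images at which the list would fill the window: {projected_limit}")
```

Under that linear assumption, roughly two thirds of the window is already used, and growth from 200k to about 300k images would exhaust it.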

Comment 1 Adam Kaplan 2019-04-24 17:09:28 UTC
PR: https://github.com/openshift/origin/pull/22655

Comment 2 Adam Kaplan 2019-04-24 17:12:43 UTC
Backport request to 3.11: https://bugzilla.redhat.com/show_bug.cgi?id=1702346

Comment 4 XiuJuan Wang 2019-06-25 08:34:35 UTC
Verified: pruning 200K images completes when run in a pod with version 4.2 (4.2.0-0.nightly-2019-06-25-003324).
$ ./oc version 
Client Version: version.Info{Major:"4", Minor:"2+", GitVersion:"v4.2.0-201906241832+7a0a2f2-dirty", GitCommit:"7a0a2f2", GitTreeState:"dirty", BuildDate:"2019-06-24T23:20:08Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+952fea3", GitCommit:"952fea3", GitTreeState:"clean", BuildDate:"2019-06-24T23:20:31Z", GoVersion:"go1.12.6", Compiler:"gc", Platform:"linux/amd64"}

$ time ./oc get images  | wc -l 
200210

real	2m55.400s
user	0m14.086s
sys	0m3.886s

$ ./oc adm prune images --registry-url=default-route-openshift-image-registry.apps.xiuwang-42-largeimages.qe.devcluster.openshift.com --certificate-authority=ca.crt --all --loglevel=8 2>> 20kimageprune-2.log >> 20kimageprune-2.log
===========================snip==============================
I0625 08:14:38.409367     174 round_trippers.go:423] Request Headers:
I0625 08:14:38.409374     174 round_trippers.go:426]     Accept: application/json, */*
I0625 08:14:38.409381     174 round_trippers.go:426]     User-Agent: oc/v1.14.0+7a0a2f2 (linux/amd64) kubernetes/7a0a2f2
I0625 08:14:38.409388     174 round_trippers.go:426]     Authorization: Bearer 9QZf_Y30gHVBa1FW6eqdp7124c1i7nr_fxlytnV6o88
I0625 08:14:39.945956     174 round_trippers.go:441] Response Status: 200 OK in 1536 milliseconds
I0625 08:14:39.945993     174 round_trippers.go:444] Response Headers:
I0625 08:14:39.945999     174 round_trippers.go:447]     Content-Type: application/json
I0625 08:14:39.946012     174 round_trippers.go:447]     Date: Tue, 25 Jun 2019 08:14:39 GMT
I0625 08:14:39.946016     174 round_trippers.go:447]     Audit-Id: f8b1c280-c595-457f-b2dc-8c9cdefaeb56
I0625 08:14:39.946021     174 round_trippers.go:447]     Cache-Control: no-store
I0625 08:14:39.946027     174 round_trippers.go:447]     Cache-Control: no-store
I0625 08:14:39.996312     174 request.go:942] Response Body: {"kind":"ImageStreamList","apiVersion":"image.openshift.io/v1","metadata":{"selfLink":"/apis/image.openshift.io/v1/imagestreams","resourceVersion":"272039"},"items":[{"metadata":{"name":"apicast-gateway","namespace":"openshift","selfLink":"/apis/image.openshift.io/v1/namespaces/openshift/imagestreams/apicast-gateway","uid":"23493dea-96fd-11e9-825f-0a580a820014","resourceVersion":"8161","generation":2,"creationTimestamp":"2019-06-25T03:55:57Z","labels":{"samples.operator.openshift.io/managed":"true"},"annotations":{"openshift.io/display-name":"3scale APIcast API Gateway","openshift.io/image.dockerRepositoryCheck":"2019-06-25T03:56:14Z","samples.operator.openshift.io/version":"4.2.0-0.nightly-2019-06-25-003324"}},"spec":{"lookupPolicy":{"local":false},"tags":[{"name":"2.1.0.GA","annotations":{"description":"3scale's APIcast is an NGINX based API gateway used to integrate your internal and external API services with 3scale's API Management Platform. It supports OpenID connect to integrate with external Identity  [truncated 283646 chars]
I0625 08:14:40.001577     174 prune.go:277] Creating image pruner with keepYoungerThan=1h0m0s, keepTagRevisions=3, pruneOverSizeLimit=<nil>, allImages=true
I0625 08:14:40.001604     174 prune.go:356] Adding image "sha256:0089883f8e4387618946cd24378a447b8cf7e5dfaa146b94acab27fc5e170a14" to graph
I0625 08:14:40.001842     174 prune.go:378] Adding image layer "sha256:26e5ed6899dbf4b1e93e0898255e8aaf43465cecd3a24910f26edb5d43dafa3c" to graph
===================================snip=====================================

Comment 5 errata-xmlrpc 2019-10-16 06:28:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

