Summary: Image pruning on api.ci is wedged due to too many images
Cause: The pruner was getting all images in a single request.
Consequence: The request took too long and timed out.
Fix: Use a pager to get all images.
Result: The pruner can get all images without hitting the timeout.

Approximately 55 days ago, image pruning stopped working on api.ci, probably due to either a transient failure or hitting a certain limit.

The current error is:

$ oc logs jobs/image-pruner-clayton-debug
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get images.image.openshift.io)

We have 200k images on the cluster, and it looks like the images call times out trying to load them all into memory.  When I ran a paged call locally (with oc get), it took several minutes and eventually hit the compaction window, so I was unable to complete it.  Testing locally to see the size.

The cluster needs to be able to prune, and we will need to take action to get it back under the threshold.  We then need to ensure that this failure mode doesn't happen in the future.

Testing against the API server directly, the images list was 39M in JSON and took about 3m20s to retrieve.  Compaction is set to 5m, so we are close to being unable to read all images before the compaction window closes.

PR: https://github.com/openshift/origin/pull/22655

Backport request to 3.11: https://bugzilla.redhat.com/show_bug.cgi?id=1702346

Pruning 200K images completes successfully in a pod with a 4.2 build (4.2.0-0.nightly-2019-06-25-003324):
$ ./oc adm prune images --registry-url=default-route-openshift-image-registry.apps.xiuwang-42-largeimages.qe.devcluster.openshift.com --certificate-authority=ca.crt --all --loglevel=8 2>> 20kimageprune-2.log >> 20kimageprune-2.log
