Bug 2015459 - [azure][openstack]When image registry configure an invalid proxy, registry pods are CrashLoopBackOff
Summary: [azure][openstack]When image registry configure an invalid proxy, registry po...
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.11.0
Assignee: Oleg Bulatov
QA Contact: wewang
Depends On:
Reported: 2021-10-19 09:14 UTC by wewang
Modified: 2022-08-10 10:38 UTC (History)
3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2022-08-10 10:38:21 UTC
Target Upstream Version:

Attachments

System ID Private Priority Status Summary Last Updated
Github openshift cluster-image-registry-operator pull 732 0 None Merged Bug 2015459: Enable health check for storage driver 2022-03-16 17:56:36 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:38:47 UTC

Description wewang 2021-10-19 09:14:35 UTC
Description of problem:
When an invalid proxy is added to config.imageregistry/cluster, the registry pods go into CrashLoopBackOff and co/image-registry is degraded.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up a cluster with a global proxy
[wewang@localhost]$  oc get proxy.config -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: Proxy
  metadata:
    creationTimestamp: "2021-10-19T02:49:51Z"
    generation: 1
    name: cluster
    resourceVersion: "568"
    uid: daf12f74-5b4e-44b6-b6f1-0643e79a8990
  spec:
    httpProxy: http://proxy-user1:xxxRZV4DY4PXJbxJK@10.0.xx.xx:31xx
    httpsProxy: http://proxy-user1:xxxx8qRZV4DY4PXJbxJK@10.0.xx.xx:31xx
    noProxy: test.no-proxy.com
    trustedCA:
      name: ""
  status:
    httpProxy: http://proxy-user1:xxxxxRZV4DY4PXJbxJK@10.0.xxx.xx:31xx
    httpsProxy: http://proxy-user1:xxRZV4DY4PXJbxJK@10.0.xx.xx:31xx
    noProxy: .cluster.local,.svc,,10.xx.0.0/14,xx0.0.1,xx.xx.169.254,xxx.xx0.0.0/16,xx-int.wewang-reprod.xx.azure.xxxluster.openshift.com,localhost,test.no-proxy.com

2. Add an invalid proxy to config.image/cluster
[wewang@localhost]$  oc get config.image -oyaml
  spec:
    managementState: Managed
    observedConfig: null
    operatorLogLevel: Normal
    proxy:
      http: http://test:3128
      https: http://test:3128
      noProxy: test.no-proxy.com
    replicas: 2
    requests:
      read:
        maxWaitInQueue: 0s
      write:
        maxWaitInQueue: 0s
    rolloutStrategy: RollingUpdate
    storage:
      azure:
        accountName: imageregistrywewangkxwzj
        cloudName: AzurePublicCloud
        container: wewang-reprod-b6mhg-image-registry-vamlnqokmkldqfvmgtelldhkurj
      managementState: Managed
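For reference, the invalid-proxy stanza in step 2 is the spec.proxy section of the operator config; one way to apply it is a merge patch like the sketch below (the `oc patch` invocation and patch-file shape are assumptions, not copied from this report; the values are the ones used above):

```yaml
# invalid-proxy-patch.yaml -- merged into the operator config with e.g.
#   oc patch configs.imageregistry.operator.openshift.io/cluster \
#     --type=merge --patch-file=invalid-proxy-patch.yaml
# (command shape is an assumption; verify flags against your oc version)
spec:
  proxy:
    http: http://test:3128
    https: http://test:3128
    noProxy: test.no-proxy.com
```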
3. Check the image registry pods 
[wewang@localhost ~]$ oc get pods -n openshift-image-registry
NAME                                               READY   STATUS             RESTARTS        AGE
cluster-image-registry-operator-5998498858-jsc77   1/1     Running            1 (3h47m ago)   3h58m
image-registry-7b557bd6fb-v6n84                    1/1     Running            0               3h45m
image-registry-8554cf844-j7lgd                     0/1     CrashLoopBackOff   6 (116s ago)    10m
image-registry-8554cf844-xrkkc                     0/1     CrashLoopBackOff   6 (100s ago)    10m

Here's the log: http://pastebin.test.redhat.com/1002261

4. Check the image registry operator
[wewang@localhost]$ oc get co/image-registry 
NAME             VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.9.0-0.nightly-2021-10-18-182325   True        True          True       3h53m   Degraded: Registry deployment has timed out progressing: ReplicaSet "image-registry-8554cf844" has timed out progressing.

5. Must-gather info: 

Actual results:
1. Pods are not running

Expected results:
1. Pods are running

Additional info:
This issue was only reproduced on Azure and OpenStack proxy clusters; proxy clusters on other platforms (AWS, vSphere, GCP, and bare metal) did not hit it.

Comment 1 Oleg Bulatov 2021-10-19 10:40:42 UTC
The registry pods cannot reach storage when an invalid proxy is set, so they should become unhealthy and be killed. That's exactly what happens on your cluster. I'd say it's a bug that it doesn't happen on AWS/GCP. It's ok for the registry to stay alive when it doesn't use HTTP connections and uses a regular file system (i.e. PVC).

Comment 2 wewang 2021-11-23 07:47:01 UTC
Verified in version:
Tested on Azure and OpenStack clusters: when an invalid proxy is set in config.image, the registry pods are running.

Comment 3 Oleg Bulatov 2021-11-23 07:57:08 UTC
That's not how it should work. The pod should be unhealthy (and eventually be killed) when an invalid proxy is set.

Comment 4 Oleg Bulatov 2021-11-23 10:25:57 UTC
On my local cluster:

$ oc get config.imageregistry/cluster -o json | jq .spec.proxy
  "http": "http://localhost",
  "https": "http://localhost"

$ oc -n openshift-image-registry get pods -l docker-registry=default
NAME                              READY   STATUS    RESTARTS      AGE
image-registry-6b674466bf-8kp5j   0/1     Running   2 (71s ago)   4m14s
image-registry-6b674466bf-vxqjp   0/1     Running   2 (67s ago)   4m14s

The pod starts to crash (see restarts).

Comment 8 XiuJuan Wang 2022-03-17 10:09:00 UTC
Verified on 4.11.0-0.nightly-2022-03-17-024314
Image registry pods report crashes (restarts) when an invalid proxy is added:
  Warning  ProbeError  10s (x2 over 20s)  kubelet  Readiness probe error: HTTP probe failed with statuscode: 503
body: {"errors":[{"code":"UNAVAILABLE","message":"service unavailable","detail":"health check failed: please see /debug/health"}]}
  Warning  Unhealthy   10s (x2 over 20s)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  ProbeError  10s (x2 over 20s)  kubelet  Liveness probe error: HTTP probe failed with statuscode: 503
body: {"errors":[{"code":"UNAVAILABLE","message":"service unavailable","detail":"health check failed: please see /debug/health"}]}
  Warning  Unhealthy  10s (x2 over 20s)  kubelet  Liveness probe failed: HTTP probe failed with statuscode: 503
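The events above come from kubelet HTTP probes against the registry's health endpoint. A probe stanza of roughly this shape produces them; the path, port, scheme, and timings below are assumptions for illustration, not copied from the operator's actual deployment:

```yaml
# Illustrative liveness/readiness probes on the image-registry container
# (values are assumptions -- check the real deployment with
#  `oc -n openshift-image-registry get deploy image-registry -o yaml`):
livenessProbe:
  httpGet:
    path: /healthz
    port: 5000
    scheme: HTTPS
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /healthz
    port: 5000
    scheme: HTTPS
  periodSeconds: 10
```

A failing readiness probe removes the pod from the service endpoints; repeated liveness failures trigger the restarts seen in comments 4 and 8.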

Comment 10 errata-xmlrpc 2022-08-10 10:38:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

