Description of problem:

The cluster is deployed as OpenStack IPI + OVNKubernetes (4.8.5). Following the documentation below, the customer configured an AWS S3 bucket for the internal image registry.

https://docs.openshift.com/container-platform/4.8/registry/configuring_registry_storage/configuring-registry-storage-aws-user-infrastructure.html

However, the image-registry operator cannot connect to that S3 bucket. The cluster-image-registry-operator pod logs show the following errors:

~~~
2021-09-21T14:54:17.809059328Z E0921 14:54:17.808892 1 controller.go:369] unable to sync: unable to sync storage configuration: RequestError : send request failed
2021-09-21T14:54:17.809059328Z caused by: Head "https://mpp-xxxxx.s3.dualstack.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority, requeuing
~~~

- In the customer's environment, we ran curl and nslookup against "mpp-xxxxx.s3.dualstack.us-east-1.amazonaws.com" from the pod, and both worked fine. Please check comment number 28.

Version-Release number of selected component (if applicable):
OpenShift 4.8.5

Actual results:
The image-registry operator is degraded after configuring the AWS S3 bucket for storage.

Expected results:
The image-registry operator should be working fine.

Additional info:
I will be sharing the must-gather and other details.
The config map kube-cloud-config in the namespace openshift-config-managed overrides the CA trust store for the image registry operator. It is supposed to contain the CAs for S3, but it has only one internal CA.
Hello Oleg,

Thanks for the detailed analysis. One thing I am not sure about is why we need to add the DigiCert CA to that configmap, since it contains details about the underlying cloud provider only (in this case OpenStack) and shouldn't be responsible for the CAs that are added to the node and pod trust stores. Please correct me if I am wrong.
Hello Oleg, I have attached the TCPDUMP for the cluster image registry operator. Please feel free to update if anything else is needed. Regards, Ayush Garg
No, I don't need extra data. If kube-cloud-config exists, the image registry operator uses its ca-bundle.pem to verify S3 certificates. We don't really expect you to use S3 on OpenStack; the recommended storage is Swift. But if you want to use S3 on OpenStack, you should add its CA to the trust bundle as well. We may revisit this implementation detail and use `ca-bundle.pem` as an addition to (not a replacement for) the system-wide trust bundle in future versions (4.10 or later), but in 4.8 it works this way.
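For illustration, the workaround described here amounts to keeping the existing OpenStack CA in ca-bundle.pem and appending the CA that signs the S3 endpoint. Below is a minimal Python sketch of building such a combined bundle. The function names and the file names in the comment are hypothetical, and the code only concatenates PEM text; it does not validate the certificates.

```python
import re
from typing import List

# Matches one PEM-encoded certificate, including its BEGIN/END markers.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem(bundle: str) -> List[str]:
    """Return the individual certificate blocks found in a PEM bundle."""
    return [block.strip() for block in PEM_BLOCK.findall(bundle)]


def merge_bundles(*bundles: str) -> str:
    """Concatenate PEM bundles in order, dropping exact duplicate blocks."""
    seen = set()
    merged = []
    for bundle in bundles:
        for block in split_pem(bundle):
            if block not in seen:
                seen.add(block)
                merged.append(block)
    return "\n".join(merged) + "\n"


# Hypothetical usage (file names are assumptions, not from this report):
#   openstack_ca = open("openstack-ca.pem").read()
#   s3_ca = open("s3-ca.pem").read()
#   open("ca-bundle.pem", "w").write(merge_bundles(openstack_ca, s3_ca))
```

The resulting file would then be what goes under the ca-bundle.pem key of the cloud provider config map, so that the operator sees both CAs rather than only the internal one.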
Okay, got your point. I just wanted to confirm this with you, since ideally the OpenShift cluster already contains the global CA. Is it fine if I convey the same to the customer and ask them to add the CAs manually to the "cloud-provider-config" configmap?
Hello Oleg,

Thanks a lot for sharing the solution. We were able to resolve the issue by adding the DigiCert CA to the configmap and deleting the pods. The only question from the customer's side is whether this is specific to OpenStack and expected behaviour, or whether it is a bug.

Regards,
Ayush Garg
This is a corner case, but I'd say it's not the expected behavior. AFAIK the only affected platform at the moment is OpenStack with a custom CA, but eventually other platforms may start to use this config map and affect the registry operator.
Launch a cluster on OSP.

Add a custom CA to openshift-config/cloud-provider-config:

$ oc set data configmap/cloud-provider-config --from-file=ca-bundle.pem=tls.crt -n openshift-config

Patch the registry config to use the S3 bucket:

$ oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"storage":{"managementState":"Unmanaged","swift":null,"s3":{"bucket":"noobaas3-2aa915bc-8ba0-409d-8048-13822a47b514","encrypt":true,"region":"us-east-2","regionEndpoint":"https://s3-openshift-storage.apps.wxjosp510.qe.devcluster.openshift.com","virtualHostedStyle":false}}}}'

The registry pod runs with the new configuration and does not hit the x509 error. Images can then be pushed to and pulled from the image registry through the route. But the deployer fails to pull the image with an x509 error:

$ oc get pods
NAME                        READY   STATUS             RESTARTS   AGE
httpd-ex-1-build            0/1     Completed          0          120m
httpd-ex-2-build            0/1     Completed          0          31m
httpd-ex-58c95cc947-bwpzd   0/1     ImagePullBackOff   0          30m
httpd-ex-599c9b4676-g6995   0/1     ImagePullBackOff   0          117m
mypod                       0/1     ImagePullBackOff   0          23s
test                        0/1     ImagePullBackOff   0          5m10s

$ podman tag 83aa35aa1c79 default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage:latest
$ podman push default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage --tls-verify=false --authfile=registry-auth

$ oc get is
NAME          IMAGE REPOSITORY                                                                                    TAGS     UPDATED
httpd-ex      default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/httpd-ex      latest   29 minutes ago
mypushimage   default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage   latest   45 seconds ago

$ podman pull default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage:latest --tls-verify=false --authfile=registry-auth
Trying to pull default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage:latest...
Getting image source signatures
Copying blob sha256:76efd6277e67c4b87ed7ddc888cf58b45236080c853748d48e24a15d46ef2a7e
Copying blob sha256:76efd6277e67c4b87ed7ddc888cf58b45236080c853748d48e24a15d46ef2a7e
Copying config sha256:83aa35aa1c79e4b6957e018da6e322bfca92bf3b4696a211b42502543c242d6f
Writing manifest to image destination
Storing signatures
83aa35aa1c79e4b6957e018da6e322bfca92bf3b4696a211b42502543c242d6f

$ oc run mypod --image=image-registry.openshift-image-registry.svc:5000/wxj/mypushimage:latest -- sleep 300
$ oc describe pods mypod
Warning Failed 5m36s kubelet Failed to pull image "image-registry.openshift-image-registry.svc:5000/wxj/mypushimage:latest": rpc error: code = Unknown desc = parsing image configuration: Get "https://s3-openshift-storage.apps.wxjosp510.qe.devcluster.openshift.com/noobaas3-34eda07e-a2be-44db-a0b8-9e1de649654d/docker/registry/v2/blobs/sha256/83/83aa35aa1c79e4b6957e018da6e322bfca92bf3b4696a211b42502543c242d6f/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=paKDtgFqHc7DIi5TeXH7%2F20220510%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20220510T101841Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=f50fd32953fc993eb20fa1a6a4a4bd1546e6cf4dfa9008216e02cd219a69666e": x509: certificate signed by unknown authority

I checked that my CA has been added to /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem in the registry pod.
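Note that the failing request in the kubelet event goes to the S3 endpoint rather than to image-registry.openshift-image-registry.svc:5000. That suggests the registry answered the blob request with a redirect to a presigned S3 URL, so the node had to verify the S3 certificate itself. Here is a small Python sketch for pulling the relevant parts out of such a URL. The helper name is hypothetical, and treating the presence of `X-Amz-Signature` as "presigned" is a simplifying assumption.

```python
from urllib.parse import parse_qs, urlsplit


def describe_blob_url(url: str) -> dict:
    """Summarize where a blob download URL points and whether it looks presigned."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    expires = query.get("X-Amz-Expires", [None])[0]
    return {
        # Host whose TLS certificate the pulling client must trust.
        "host": parts.hostname,
        # SigV4 presigned URLs carry the signature in the query string.
        "presigned": "X-Amz-Signature" in query,
        # Lifetime of the presigned URL in seconds, if present.
        "expires_seconds": int(expires) if expires is not None else None,
    }
```

For the URL in the event above, this reports the host s3-openshift-storage.apps.wxjosp510.qe.devcluster.openshift.com, a presigned URL, and an expiry of 1200 seconds.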
As noted in comment #18, the deployer can't pull images from the image registry.
You should be able to work around it by setting disableRedirect: true. This limitation will be lifted once https://github.com/openshift/cluster-image-registry-operator/pull/759 lands.
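A rough sketch of that workaround as a merge patch, written in Python so the JSON is generated rather than hand-typed. It assumes the field is `spec.disableRedirect` on the configs.imageregistry.operator.openshift.io/cluster resource (the same resource patched in the reproduction steps earlier in this report). The helper names are hypothetical, and the `oc` call is left commented out because it needs a logged-in cluster.

```python
import json
from typing import List

REGISTRY_CONFIG = "configs.imageregistry.operator.openshift.io/cluster"


def disable_redirect_patch() -> str:
    """Build a JSON merge patch that turns off storage redirects."""
    return json.dumps({"spec": {"disableRedirect": True}})


def oc_patch_args(patch: str) -> List[str]:
    """Return the argv for applying the patch with `oc patch --type merge`."""
    return ["oc", "patch", REGISTRY_CONFIG, "--type", "merge", "-p", patch]


# To apply against a live cluster (not executed here):
# import subprocess
# subprocess.run(oc_patch_args(disable_redirect_patch()), check=True)
```

With redirects disabled, the registry is expected to stream blobs itself instead of sending clients to the storage endpoint, so only the registry needs to trust the S3 CA.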
After setting disableRedirect: true and updating configmap/cloud-provider-config, I could push and pull images to the internal registry backed by the S3 bucket with the trusted CA.
But with https://github.com/openshift/cluster-image-registry-operator/pull/759, disableRedirect: true still needs to be configured on an OSP cluster that adds a custom CA to configmap/cloud-provider-config.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069