Bug 2007611 - TLS issues with the internal registry and AWS S3 bucket
Summary: TLS issues with the internal registry and AWS S3 bucket
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.8
Hardware: Unspecified
OS: Linux
medium
high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Oleg Bulatov
QA Contact: XiuJuan Wang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-24 12:01 UTC by aygarg
Modified: 2024-12-20 21:12 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the cluster-scoped CA trust bundle, when provided, was used by the image-registry operator instead of the system trust bundle. Consequence: the image-registry operator doesn't work with AWS S3 on an OpenStack cluster that needs a custom CA. Fix: merge the cluster-scoped CAs with the system trust bundle. Result: the image-registry operator trusts AWS S3 certificates on any platform.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:37:47 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-image-registry-operator pull 770 0 None open Bug 2007611: Merge S3 CA bundle with system CA bundle 2022-04-28 14:08:14 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:38:04 UTC

Description aygarg 2021-09-24 12:01:52 UTC
Description of problem:
The cluster is deployed as OpenStack IPI + OVNKubernetes (4.8.5). As per the following documentation, the customer configured the AWS S3 bucket for the internal image registry.
--> https://docs.openshift.com/container-platform/4.8/registry/configuring_registry_storage/configuring-registry-storage-aws-user-infrastructure.html

However, the image-registry operator isn't able to connect to that S3 bucket, and the cluster-image-registry-operator pod logs show the following errors.
~~~
2021-09-21T14:54:17.809059328Z E0921 14:54:17.808892       1 controller.go:369] unable to sync: unable to sync storage configuration: RequestError: send request failed
2021-09-21T14:54:17.809059328Z caused by: Head "https://mpp-xxxxx.s3.dualstack.us-east-1.amazonaws.com/": x509: certificate signed by unknown authority, requeuing
~~~

- In the customer's environment, we ran the curl command as well as nslookup command on the URL "mpp-xxxxx.s3.dualstack.us-east-1.amazonaws.com" from the pod which worked fine. Please check comment number 28.

Version-Release number of selected component (if applicable):
OpenShift 4.8.5


Actual results:
The image-registry operator is degraded after configuring the AWS S3 bucket for storage.


Expected results:
The image-registry operator should be working fine.

Additional info:
I will be sharing the must-gather and other details.

Comment 2 Oleg Bulatov 2021-09-27 12:15:31 UTC
The config map kube-cloud-config in the namespace openshift-config-managed overrides the CA trust store for the image registry operator. It is supposed to have the CAs for S3, but has only one internal CA.
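
One way to check how many CAs such a config map carries is sketched below. This is a minimal sketch, not a documented procedure: `count_configmap_cas` is a hypothetical helper name, the `ca-bundle.pem` key is taken from comment 8, and a logged-in `oc` session is assumed.

```shell
# Hypothetical helper: count the certificates stored under the ca-bundle.pem
# key of a config map. Assumes a logged-in oc session.
count_configmap_cas() {
    namespace=$1
    name=$2
    # The backslash escapes the dot inside the key name for jsonpath.
    oc get configmap "$name" -n "$namespace" \
        -o 'jsonpath={.data.ca-bundle\.pem}' |
        grep -c 'BEGIN CERTIFICATE'
}
```

For the config map named in this comment, the call would be `count_configmap_cas openshift-config-managed kube-cloud-config`.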

Comment 4 aygarg 2021-09-27 12:30:01 UTC
Hello Oleg,

Thanks for the detailed analysis. One thing I am not sure about is why we need to add the DigiCert CA in that configmap as that configmap contains the details about the underlying CloudProvider only (in this case OpenStack) and shouldn't be responsible for the CAs which are getting added to the nodes and pods truststore. Please correct me if I am wrong.

Comment 7 aygarg 2021-09-27 15:34:46 UTC
Hello Oleg,

I have attached the TCPDUMP for the cluster image registry operator. Please feel free to update if anything else is needed.

Regards,
Ayush Garg

Comment 8 Oleg Bulatov 2021-09-27 16:38:39 UTC
No, I don't need extra data.

If you have kube-cloud-config, the image registry operator will use its ca-bundle.pem to verify S3 certificates.

We don't really expect you to use S3 on OpenStack, the recommended storage is Swift. But if you want to use S3 on OpenStack, you should add its CA into the trust bundle as well.

We may revisit this implementation detail and use `ca-bundle.pem` as an addition to (not a replacement for) the system-wide trust bundle in future versions (4.10 or later), but in 4.8 it works this way.
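
To illustrate the difference between "replacement" and "addition", the following is a rough shell sketch of the additive behaviour: both bundles end up in one PEM file, so CAs from either source are trusted. The function names are hypothetical and the operator itself is not implemented this way; this only demonstrates the idea.

```shell
# Hypothetical helper: write the system bundle followed by the custom bundle
# into a single PEM file.
merge_ca_bundles() {
    system_bundle=$1
    custom_bundle=$2
    merged_bundle=$3
    # The blank line keeps the PEM markers apart if the first file has no
    # trailing newline.
    { cat "$system_bundle"; echo; cat "$custom_bundle"; } > "$merged_bundle"
}

# Count the certificates in a PEM bundle.
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}
```

A possible invocation, using the system bundle path mentioned in comment 18, would be `merge_ca_bundles /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem ca-bundle.pem merged.pem`, followed by `count_certs merged.pem` to confirm that the count is the sum of both inputs.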

Comment 9 aygarg 2021-09-27 16:48:42 UTC
Okay, got your point. I just wanted to confirm this with you, as ideally the OpenShift cluster already contains the global CA. Is it fine if I convey this to the customer and ask them to add the CAs manually to the "cloud-provider-config" configmap?

Comment 11 aygarg 2021-09-28 15:03:18 UTC
Hello Oleg,

Thanks a lot for sharing the solution. We were able to resolve the issue by adding the DigiCert CA to the configmap and deleting the pods. The customer's only remaining question is whether this behaviour is specific to OpenStack and expected, or whether it is a bug.

Regards,
Ayush Garg
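
The workaround described in this comment could be scripted roughly as follows. This is a sketch under assumptions: the config map is taken to be `cloud-provider-config` in `openshift-config` (as named in comments 9 and 18), the helper name is hypothetical, and a logged-in `oc` session is assumed. Because `oc set data --from-file=ca-bundle.pem=<file>` sets the key to the file's content, the sketch first reads the existing bundle so that the CA already configured there is kept. The thread does not say which pods were deleted, so that step is left out.

```shell
# Hypothetical helper: append one extra CA file to the ca-bundle.pem key of
# openshift-config/cloud-provider-config.
add_ca_to_cloud_provider_config() {
    extra_ca=$1
    workdir=$(mktemp -d) || return 1
    status=0
    # Fetch the current bundle so the CA that is already configured is kept.
    oc get configmap cloud-provider-config -n openshift-config \
        -o 'jsonpath={.data.ca-bundle\.pem}' > "$workdir/ca-bundle.pem" || status=1
    if [ "$status" -eq 0 ]; then
        # The blank line protects against a bundle without a trailing newline.
        { echo; cat "$extra_ca"; } >> "$workdir/ca-bundle.pem" || status=1
    fi
    if [ "$status" -eq 0 ]; then
        oc set data configmap/cloud-provider-config -n openshift-config \
            --from-file=ca-bundle.pem="$workdir/ca-bundle.pem" || status=1
    fi
    rm -rf "$workdir"
    return "$status"
}
```

Usage would be along the lines of `add_ca_to_cloud_provider_config digicert-ca.pem`, where `digicert-ca.pem` is a placeholder file name.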

Comment 13 Oleg Bulatov 2021-10-05 07:20:02 UTC
This is a corner case, but I'd say it's not the expected behavior. AFAIK the only affected platform at the moment is OpenStack with a custom CA, but eventually other platforms may start to use this config map and affect the registry operator.

Comment 18 XiuJuan Wang 2022-05-10 10:30:40 UTC
Launch cluster on osp
Add custom CA to openshift-config/cloud-provider-config
$oc set data configmap/cloud-provider-config --from-file=ca-bundle.pem=tls.crt -n openshift-config

Patch to use s3 bucket
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"storage":{"managementState":"Unmanaged","swift":null,"s3":{"bucket":"noobaas3-2aa915bc-8ba0-409d-8048-13822a47b514","encrypt":true,"region":"us-east-2","regionEndpoint":"https://s3-openshift-storage.apps.wxjosp510.qe.devcluster.openshift.com","virtualHostedStyle":false}}}}'

The registry pod runs with the new configuration and does not hit the x509 error.

Images can then be pushed to and pulled from the image registry, but the deployer fails to pull the image with an x509 error.
oc get pods
NAME                        READY   STATUS             RESTARTS   AGE
httpd-ex-1-build            0/1     Completed          0          120m
httpd-ex-2-build            0/1     Completed          0          31m
httpd-ex-58c95cc947-bwpzd   0/1     ImagePullBackOff   0          30m
httpd-ex-599c9b4676-g6995   0/1     ImagePullBackOff   0          117m
mypod                       0/1     ImagePullBackOff   0          23s
test                        0/1     ImagePullBackOff   0          5m10s

$podman tag 83aa35aa1c79  default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage:latest
podman push default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage --tls-verify=false --authfile=registry-auth

$oc get is
NAME          IMAGE REPOSITORY                                                                                    TAGS     UPDATED
httpd-ex      default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/httpd-ex      latest   29 minutes ago
mypushimage   default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage   latest   45 seconds ago

$podman pull default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage:latest --tls-verify=false --authfile=registry-auth
Trying to pull default-route-openshift-image-registry.apps.wxjosp510.qe.devcluster.openshift.com/wxj/mypushimage:latest...
Getting image source signatures
Copying blob sha256:76efd6277e67c4b87ed7ddc888cf58b45236080c853748d48e24a15d46ef2a7e
Copying blob sha256:76efd6277e67c4b87ed7ddc888cf58b45236080c853748d48e24a15d46ef2a7e
Copying config sha256:83aa35aa1c79e4b6957e018da6e322bfca92bf3b4696a211b42502543c242d6f
Writing manifest to image destination
Storing signatures
83aa35aa1c79e4b6957e018da6e322bfca92bf3b4696a211b42502543c242d6f

$oc run mypod --image=image-registry.openshift-image-registry.svc:5000/wxj/mypushimage:latest -- sleep 300

$oc describe pods mypod
  Warning  Failed          5m36s                 kubelet            Failed to pull image "image-registry.openshift-image-registry.svc:5000/wxj/mypushimage:latest": rpc error: code = Unknown desc = parsing image configuration: Get "https://s3-openshift-storage.apps.wxjosp510.qe.devcluster.openshift.com/noobaas3-34eda07e-a2be-44db-a0b8-9e1de649654d/docker/registry/v2/blobs/sha256/83/83aa35aa1c79e4b6957e018da6e322bfca92bf3b4696a211b42502543c242d6f/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=paKDtgFqHc7DIi5TeXH7%2F20220510%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20220510T101841Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=f50fd32953fc993eb20fa1a6a4a4bd1546e6cf4dfa9008216e02cd219a69666e": x509: certificate signed by unknown authority

I checked that my CA has been added to /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem in the registry pod.

Comment 19 XiuJuan Wang 2022-05-12 10:12:27 UTC
As described in comment 18, the deployer can't pull images from the image registry.

Comment 20 Oleg Bulatov 2022-05-30 08:29:41 UTC
You should be able to work around it by setting disableRedirect: true.

This limitation will be lifted once https://github.com/openshift/cluster-image-registry-operator/pull/759 lands.
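
The error in comment 18 shows the kubelet fetching the blob directly from the S3 endpoint through a presigned URL, which is consistent with the registry redirecting clients to the storage backend. With `disableRedirect: true` in the registry config, data is routed through the registry instead. A minimal sketch of applying that setting follows; the helper names are hypothetical, and a logged-in `oc` session with sufficient privileges is assumed.

```shell
# Print a merge patch that sets spec.disableRedirect to the given value.
disable_redirect_patch() {
    case "$1" in
        true|false) printf '{"spec":{"disableRedirect":%s}}\n' "$1" ;;
        *) echo "usage: disable_redirect_patch true|false" >&2; return 1 ;;
    esac
}

# Apply the patch to the cluster-wide image registry config.
set_disable_redirect() {
    patch=$(disable_redirect_patch "$1") || return 1
    oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p "$patch"
}
```

Calling `set_disable_redirect true` applies the workaround; `disable_redirect_patch true` on its own only prints the patch body so it can be reviewed first.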

Comment 21 XiuJuan Wang 2022-06-01 06:48:28 UTC
After setting disableRedirect: true and updating configmap/cloud-provider-config,
I could push and pull images to and from the internal registry configured with the S3 bucket and a trusted CA.

Comment 22 XiuJuan Wang 2022-06-01 06:56:15 UTC
However, even with https://github.com/openshift/cluster-image-registry-operator/pull/759,
disableRedirect: true still needs to be configured on an OSP cluster when a custom CA is added to configmap/cloud-provider-config.

Comment 24 errata-xmlrpc 2022-08-10 10:37:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

