Bug 1810461

Summary: Unable to pull images from swift-backed internal image registry: x509 error with self-signed OSP16
Product: OpenShift Container Platform
Reporter: Robert Sandoval <rsandova>
Component: Documentation
Assignee: Max Bridges <mbridges>
Status: CLOSED CURRENTRELEASE
QA Contact: XiuJuan Wang <xiuwang>
Severity: low
Docs Contact: Latha S <lmurthy>
Priority: high
Version: 4.4
CC: adam.kaplan, adhingra, aos-bugs, bmcelvee, cjanisze, gparente, lmurthy, m.andre, mbridges, meggen, obulatov, pprinett, pweil, racedoro, vkochuku, wzheng, xiuwang
Target Milestone: ---
Flags: mbridges: needinfo-
Target Release: 4.4.z
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-03-04 17:54:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Robert Sandoval 2020-03-05 10:13:08 UTC
Description of problem:

Deployed OCP 4.4 on OSP16 and tried to deploy an app using the s2i builder.
The image registry is backed by Swift.

The app built fine and was pushed to the internal registry. The deployment then failed with

 x509: certificate signed by unknown authority

when trying to pull the image from the internal registry.

Version-Release number of selected component (if applicable):


How reproducible:
With OCP4.4 on OSP16 use an s2i builder to deploy a sample app

Steps to Reproduce:
1.
2.
3.

Actual results:

Error: ImagePullBackOff


Expected results:

Application to deploy normally


Additional info:

Workaround is to edit config.imageregistry/cluster and set spec.disableRedirects = true

This allows the client to pull the image layers through the image registry rather than via redirect links that point directly at Swift. The Swift CA needs to be added to worker nodes.

Comment 1 Oleg Bulatov 2020-03-23 13:20:45 UTC
*** Bug 1816042 has been marked as a duplicate of this bug. ***

Comment 2 Wenjing Zheng 2020-03-24 01:13:08 UTC
This issue only happens on self-signed OSP16. QE has tested with OSP16+Kuryr and saw no such issue.

Comment 3 Wenjing Zheng 2020-03-24 07:01:33 UTC
Hi Oleg, this bug is targeted to 4.5, will we fix this in 4.4 too?

Comment 4 Oleg Bulatov 2020-03-24 10:18:22 UTC
Eventually we may backport it to 4.4, but it's not our highest priority. The workaround is simple: changing one field on the config.imageregistry object.

Comment 6 Chris Janiszewski 2020-05-22 18:10:27 UTC
I just hit this in my environment and there is a typo in the first comment (workaround) as well as the release notes. The variable should be: 
spec.disableRedirect: true

and not
spec.disableRedirects: true

(no s at the end)

Also, for anyone who hits this issue, the CLI command to edit that parameter is:
oc edit configs.imageregistry.operator.openshift.io/cluster

Comment 7 Adam Kaplan 2020-05-29 12:30:45 UTC
Moving this to Docs.

The Swift service needs to use a trusted certificate - either one signed by a globally trusted CA, or a CA that has been added to the cluster trust store [1]. The current docs do not mention this [2].

[1] https://docs.openshift.com/container-platform/4.4/networking/configuring-a-custom-pki.html
[2] https://docs.openshift.com/container-platform/4.4/installing/installing_openstack/installing-openstack-installer-custom.html#installation-osp-enabling-swift_installing-openstack-installer-custom

Comment 8 Chris Janiszewski 2020-05-29 16:14:14 UTC
Adam, can you clarify why you think this is a Documentation issue? I have trusted certs injected in both clouds.yaml and install-config.yaml and I am still getting this error when deploying apps.

error: build error: After retrying 2 times, Pull image still failed due to error: while pulling "docker://image-registry.openshift-image-registry.svc:5000/openshift/python@sha256:cc03f354f2a298de72f0d9dcb39a82178c996faf033321c56f8c4756b0cd3a90" as "image-registry.openshift-image-registry.svc:5000/openshift/python@sha256:cc03f354f2a298de72f0d9dcb39a82178c996faf033321c56f8c4756b0cd3a90": Error parsing image configuration: Get https://10.9.65.100:13808/swift/v1/AUTH_1b80108965b748b7aeff8b6ec2017129/ocpra-nt6wn-image-registry-ilbncpaljsphjuurgihkefoehsjnxouadfx/files/docker/registry/v2/blobs/sha256/7d/7ddee4f67b8369db4795540eadab163e17442a380ee339891069fbf676753a09/data?temp_url_sig=e839524d45193f91872ec6ef5c7c78836a6fd046&temp_url_expires=1590769493: x509: certificate signed by unknown authority
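For readers triaging a similar error, it can help to confirm which endpoint the failing request actually targeted. The sketch below is illustrative only: it parses the URL from the error above with the Python standard library and checks that the host is not the internal registry service and that the query carries Swift temp URL parameters. The helper name describe_blob_url is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the build error above (line wraps removed).
failing_url = (
    "https://10.9.65.100:13808/swift/v1/AUTH_1b80108965b748b7aeff8b6ec2017129/"
    "ocpra-nt6wn-image-registry-ilbncpaljsphjuurgihkefoehsjnxouadfx/"
    "files/docker/registry/v2/blobs/sha256/7d/"
    "7ddee4f67b8369db4795540eadab163e17442a380ee339891069fbf676753a09/data"
    "?temp_url_sig=e839524d45193f91872ec6ef5c7c78836a6fd046"
    "&temp_url_expires=1590769493"
)

# Internal registry service name as it appears in the same error message.
REGISTRY_HOST = "image-registry.openshift-image-registry.svc"


def describe_blob_url(url):
    """Hypothetical helper: summarize where a blob download URL points."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "port": parts.port,
        # False means the client left the registry and went to another endpoint.
        "is_registry": parts.hostname == REGISTRY_HOST,
        # Both parameters are present on the signed storage link in the error.
        "is_temp_url": "temp_url_sig" in query and "temp_url_expires" in query,
    }


info = describe_blob_url(failing_url)
print(info)
```

Here the request went to 10.9.65.100:13808 rather than to the registry service, so the certificate being rejected is the one served by the Swift endpoint.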


(shiftstack) [stack@chrisj-undercloud-osp13 ~]$ cat clouds.yaml | grep cacert
    cacert: /home/stack/ssl/overcloud.pem

(shiftstack) [stack@chrisj-undercloud-osp13 ~]$ cat ocpra-config/install-config.yaml | grep -A2 additionalTrustBundle
additionalTrustBundle: |
    -----BEGIN CERTIFICATE-----
    MIIF4TCCA8mgAwIBAgIJANsI/G7mHc83MA0GCSqGSIb3DQEBCwUAMIGFMQswCQYD

Comment 9 Adam Kaplan 2020-06-25 12:18:07 UTC
@Chris apologies for not getting back to you on this.

I'm moving this back to the Image Registry per your comment. It appears that the installation should have added the CA to the global trust bundle, and the image registry should be able to pick it up.

Comment 10 Chris Janiszewski 2020-06-25 13:39:11 UTC
Thank you. I appreciate you looking into it again.

Comment 11 Oleg Bulatov 2020-06-29 11:03:18 UTC
Lowering severity as workaround is trivial.

> the image registry should be able to pick it up.

The problem is that the client (buildah, I guess) doesn't trust storage (Swift).

One option is for the registry to proxy traffic to storage through itself (i.e. spec.disableRedirect should be true), but in that case the registry will require many more resources, especially network bandwidth.

The other option is for clients to trust the storage certificates.

The registry operator doesn't know whether clients trust this certificate, so it doesn't know whether redirects should be disabled. The operator's expectation is that the object storage is world accessible, like S3, GCS, or Azure Blob Storage.

So it is either a day 2 operation (tuning the registry for clients that don't trust storage) or WONTFIX.
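The trade-off described above can be written down as a toy decision model. This is only a sketch of the reasoning in this comment, not registry code: the function name blob_fetch_path is hypothetical, and it assumes the client already trusts the registry's own certificate.

```python
def blob_fetch_path(disable_redirect, client_trusts_storage):
    """Toy model: return (endpoint serving the blob bytes, pull succeeds).

    Assumption: the client always trusts the registry certificate.
    """
    if disable_redirect:
        # The registry proxies blob data itself, so the client never
        # contacts storage and storage trust does not matter.
        return ("registry", True)
    # The registry redirects the client to a storage URL, so the client
    # must be able to verify the storage certificate on its own.
    return ("storage", client_trusts_storage)


# Self-signed Swift, default settings: the case reported in this bug.
print(blob_fetch_path(disable_redirect=False, client_trusts_storage=False))
# Workaround: redirects disabled, more load on the registry.
print(blob_fetch_path(disable_redirect=True, client_trusts_storage=False))
# Alternative: the storage CA is trusted by clients.
print(blob_fetch_path(disable_redirect=False, client_trusts_storage=True))
```

Only the first combination fails, which matches the two remedies listed in this comment.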

Comment 12 Anil Dhingra 2020-07-21 03:20:31 UTC
Hi,

Is this bug fixed in 4.5 GA, or is it on the radar for a later release? We have another instance of self-signed OSP16 reported.

Comment 13 Wenjing Zheng 2020-07-21 04:56:27 UTC
(In reply to Anil Dhingra from comment #12)
> Hi,
> 
> Is this bug fixed in 4.5 GA, or is it on the radar for a later release? We
> have another instance of self-signed OSP16 reported.

We are using this workaround to fix this error: $ oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"disableRedirect":true}}'
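Since spec.disableRedirect is a boolean field, the merge patch should carry JSON true rather than the string "true". A minimal sketch of generating the patch body programmatically, so the type cannot slip, follows. The helper names are hypothetical, and the script only prints the oc command; it does not run it.

```python
import json
import shlex


def disable_redirect_patch(disabled=True):
    """Hypothetical helper: build the JSON merge patch for spec.disableRedirect."""
    if not isinstance(disabled, bool):
        # Guard against passing the string "true" instead of a boolean.
        raise TypeError("disabled must be a bool, not %r" % (disabled,))
    return json.dumps({"spec": {"disableRedirect": disabled}}, separators=(",", ":"))


def oc_patch_command(patch):
    """Render the oc invocation from this thread as a shell-quoted string."""
    argv = [
        "oc", "patch", "configs.imageregistry.operator.openshift.io", "cluster",
        "--type", "merge", "--patch", patch,
    ]
    return " ".join(shlex.quote(arg) for arg in argv)


patch = disable_redirect_patch(True)
command = oc_patch_command(patch)
print(command)
```

Building the body with json.dumps keeps the value a real boolean, and shlex.quote takes care of the shell quoting around the JSON.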

Comment 17 Max Bridges 2021-12-08 16:33:29 UTC
Continuing in https://github.com/openshift/openshift-docs/pull/39641

Comment 22 Max Bridges 2022-02-23 15:57:47 UTC
Dev ack on PR. Moving to QE.