Bug 1614025 - image pull fails with certificate signed by unknown authority error
Summary: image pull fails with certificate signed by unknown authority error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assignee: Scott Dodson
QA Contact: Johnny Liu
URL:
Whiteboard:
Duplicates: 1615337
Depends On:
Blocks:
 
Reported: 2018-08-08 20:24 UTC by Siva Reddy
Modified: 2018-10-11 07:24 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-11 07:24:08 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHBA-2018:2652 (last updated 2018-10-11 07:24:34 UTC)

Description Siva Reddy 2018-08-08 20:24:48 UTC
Description of problem:
   When trying to build a new app, pulling the built image from the registry fails with a "certificate signed by unknown authority" error.

Version-Release number of selected component (if applicable):
oc v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0

openshift v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0

How reproducible:
Always

Steps to Reproduce:
1. Create a new project 
    oc new-project test
2. Create a new app
    oc new-app --template=cakephp-mysql-example
3. The app pod fails to create with ImagePullBackOff
4. Get the error from the failed event
    oc get events
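
For example, the pull failure can be isolated from the event stream like this (a minimal sketch; the project name comes from step 1 and the grep pattern is illustrative):
    oc get events -n test --sort-by='.lastTimestamp' | grep -i 'failed to pull'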

Actual results:
   The creation of the app pod fails with the following error:
Failed to pull image "xx.xx.xx.xx:5000/@sha256:9a8a4ed1a62182b61c33bbe767240754c7486fafebdf3068ffd18a0def31338f": rpc error: code = Unknown desc = Get https://xxx.xx.xxx.xxx:5000/v2/: x509: certificate signed by unknown authority

Expected results:
   The app pod should be created without any errors

Additional info:

Comment 1 Ben Parees 2018-08-08 20:33:33 UTC
This means you're using an imagestream with pullthrough enabled. If you're using one of our imagestreams, it should be pointing to a registry that has a trusted certificate.

If you're using your own imagestream that points to an untrusted registry, or you've modified our imagestreams to point to an untrusted registry, then you'll need to add the appropriate CA to your openshift registry pod so it can trust the upstream registry:

https://docs.okd.io/latest/install_config/registry/extended_registry_configuration.html#middleware-repository-pullthrough

"You must ensure that your registry has appropriate certificates to trust any external registries you do a pullthrough against. The certificates need to be placed in the /etc/pki/tls/certs directory on the pod. You can mount the certificates using a configuration map or secret. Note that the entire /etc/pki/tls/certs directory must be replaced. You must include the new certificates and replace the system certificates in your secret or configuration map that you mount."

Comment 2 Mike Fiedler 2018-08-09 12:45:57 UTC
The pull error is occurring while trying to pull from the internal OpenShift registry.  Is that an untrusted registry?   This is something that just started happening in 3.11.0-0.11.0.  It did not occur in 3.11.0-0.9.0.   

Pull error after running a successful s2i build on nodejs-mongodb-example:

10m         11m          2         nodejs-mongodb-example-1-6s525.154937464bb75a42    Pod                     spec.containers{nodejs-mongodb-example}   Normal    Pulling                       kubelet, ip-172-31-39-197.us-west-2.compute.internal   pulling image "172.27.101.217:5000/mff/nodejs-mongodb-example@sha256:0109f72e3fbd54f817c806d956121e09efeb6462b967390bd39983293fce8ad2"
10m         11m          2         nodejs-mongodb-example-1-6s525.154937464cfa925b    Pod                     spec.containers{nodejs-mongodb-example}   Warning   Failed                        kubelet, ip-172-31-39-197.us-west-2.compute.internal   Error: ErrImagePull               
10m         11m          2         nodejs-mongodb-example-1-6s525.154937464cfa2c2a    Pod                     spec.containers{nodejs-mongodb-example}   Warning   Failed                        kubelet, ip-172-31-39-197.us-west-2.compute.internal   Failed to pull image "172.27.101.217:5000/mff/nodejs-mongodb-example@sha256:0109f72e3fbd54f817c806d956121e09efeb6462b967390bd39983293fce8ad2": rpc error: code = Unknown desc = Get https://172.27.101.217:5000/v2/: x509: certificate signed by unknown authority                                                          

Comment 3 Ben Parees 2018-08-09 14:02:26 UTC
> The pull error is occurring while trying to pull from the internal OpenShift registry.  Is that an untrusted registry? 

In this context, by "untrusted" I mean a registry which serves content using a certificate that is not trusted by the default system CAs.

I would have expected it to start happening in 3.10 because that's when we started using pullthrough for the default imagestreams we ship.

I'll come by later and see exactly what you guys are doing.

Comment 4 Mike Fiedler 2018-08-09 14:50:55 UTC
I changed the referencePolicy to Source for all imagestreams in the openshift ns (using registry.access.redhat.com as the registry) and the same issue occurs: ImagePullBackOff pulling the newly built image from the local registry.

Comparing this cluster to a working one now.
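
For reference, flipping a tag's reference policy by hand looks roughly like this (a sketch; it patches only the first tag of a single imagestream, and a full pass would loop over every tag of every imagestream in the openshift namespace):

    # illustrative: change tag 0 of the nodejs imagestream to Source
    oc patch is/nodejs -n openshift --type=json \
        -p '[{"op":"replace","path":"/spec/tags/0/referencePolicy/type","value":"Source"}]'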

Comment 5 Mike Fiedler 2018-08-09 16:38:08 UTC
"Fixed" this by adding openshift_docker_hosted_registry_insecure=true to the inventory which adds an entry to /etc/sysconfig/docker for an insecure registry CIDR in the service network range.   Still investigating why we have to add this now.

Comment 6 Ben Parees 2018-08-09 17:04:12 UTC
Sorry, I think I misunderstood the original issue you were having.

It sounds like the certificate the registry is using (which I think is signed by the cluster CA) isn't trusted by your host, which means your host doesn't have the cluster CA for some reason.

Comment 7 Mike Fiedler 2018-08-10 19:41:46 UTC
Re-opening while this is investigated at https://github.com/openshift/origin/issues/20604

Comment 8 Ben Parees 2018-08-10 20:27:16 UTC
There seem to be two issues on the cluster (Mike's cluster):

1) REGISTRY_OPENSHIFT_SERVER_ADDR was not set on the registry DC (it needs to be set to docker-registry.default.svc:5000).

I've fixed that (a hand-applied sketch follows this list), and Scott has a PR to revert the change that caused it to not be set: https://github.com/openshift/openshift-ansible/pull/9533

After triggering a new build, the new pod successfully deployed.

2) It's not clear to me why the certificate is not accepted, because the certificate does appear to contain the service IP as we'd expect.
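
For issue 1, the hand-applied equivalent of the fix is (a sketch; the lasting fix is the openshift-ansible revert linked above):

    # set the registry's advertised address on its deployment config
    oc set env dc/docker-registry -n default \
        REGISTRY_OPENSHIFT_SERVER_ADDR=docker-registry.default.svc:5000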

Comment 9 Ben Parees 2018-08-10 20:59:37 UTC
OK, Scott figured out why the cert isn't accepted: the docker/certs.d directory only contains an entry for the svc hostname.

That said, there is also a missing trust anchor on the node, which should have the cluster CA and might(?) cause docker to trust the IP.

Once I put that in place, the IP is trusted as expected.
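
For the record, the two pieces of node-side trust configuration look roughly like this (a hedged sketch; /etc/origin/master/ca.crt is the cluster CA's location on a master, and the registry IP is the one from comment 2):

    # docker/certs.d entry keyed by the registry's service IP
    mkdir -p /etc/docker/certs.d/172.27.101.217:5000
    cp /etc/origin/master/ca.crt /etc/docker/certs.d/172.27.101.217:5000/ca.crt
    # system-wide trust anchor for the cluster CA
    cp /etc/origin/master/ca.crt /etc/pki/ca-trust/source/anchors/openshift-cluster-ca.crt
    update-ca-trust extract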

So the resolution to this bug is:

1) put the logic back in place to set the registry hostname env var on the registry DC

2) determine why the cluster CA wasn't populated on the node.


Unfortunately this still doesn't explain why in GCP we see certificates with no IP address, but (for better or worse) if the registry hostname var were set, that would not break anything.

Handing over to Scott to shepherd the env var fix through as that is the primary issue.

Comment 10 Scott Dodson 2018-08-13 15:05:52 UTC
*** Bug 1615337 has been marked as a duplicate of this bug. ***

Comment 11 Scott Dodson 2018-08-13 15:06:41 UTC
https://github.com/openshift/openshift-ansible/pull/9533 reverted the commit that introduced this problem.

Comment 12 Johnny Liu 2018-08-14 01:35:21 UTC
Actually this issue was introduced in openshift-ansible-3.11.0-0.13.0.git.0.16dc599None.noarch, not in openshift-ansible-3.11.0-0.11.0.git.0.3c66516None.noarch, so I am removing "3.11.0-0.11.0" from the summary.

Comment 13 Johnny Liu 2018-08-14 05:23:01 UTC
The fix PR has already been merged into openshift-ansible-3.11.0-0.14.0; after a retest, it works well now.
# oc get dc docker-registry -o yaml
<--snip-->
        - name: OPENSHIFT_DEFAULT_REGISTRY
          value: docker-registry.default.svc:5000
<--snip-->
        - name: REGISTRY_OPENSHIFT_SERVER_ADDR
          value: docker-registry.default.svc:5000
<--snip-->

# oc describe po nodejs-mongodb-example-6-48dkm -n install-test
<--snip-->
Events:
  Type    Reason     Age   From                                   Message
  ----    ------     ----  ----                                   -------
  Normal  Scheduled  1h    default-scheduler                      Successfully assigned install-test/nodejs-mongodb-example-6-48dkm to ip-172-18-15-97.ec2.internal
  Normal  Pulled     1h    kubelet, ip-172-18-15-97.ec2.internal  Container image "docker-registry.default.svc:5000/install-test/nodejs-mongodb-example@sha256:4b0722d367e47a43647acd84c91a81b03fa001bc2468f87bca1327c0aeb6a5be" already present on machine
  Normal  Created    1h    kubelet, ip-172-18-15-97.ec2.internal  Created container
  Normal  Started    1h    kubelet, ip-172-18-15-97.ec2.internal  Started container

Comment 14 Scott Dodson 2018-08-14 21:21:07 UTC
In openshift-ansible-3.11.0-0.15.0

Comment 15 Johnny Liu 2018-08-15 09:05:16 UTC
Verified this bug with openshift-ansible-3.11.0-0.15.0.git.0.842d3d1None.noarch, and it passed, with the same result as comment 13.

Comment 17 errata-xmlrpc 2018-10-11 07:24:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652

