Bug 1316341 - [3.6] installer need provide a way to add docker auth to kubelet for auto pulling infra image from an authenticated registry
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 3.6.z
Assigned To: Michael Gugino
QA Contact: Johnny Liu
Keywords: OpsBlocker
Duplicates: 1360900 1377899
Depends On: 1481251
Blocks: 1484063 1484068 1500642
Reported: 2016-03-09 21:04 EST by Johnny Liu
Modified: 2017-10-17 07:45 EDT
CC List: 17 users

Doc Type: Enhancement
Doc Text:
Feature: The installer now accepts the variables 'oreg_auth_user' and 'oreg_auth_password', which set the credentials used to pull infrastructure images from an authenticated registry defined by 'oreg_url'. Reason: Your environment may require credentials to pull infrastructure images from the private registry defined via 'oreg_url'. Result: OCP can now pull images from a private registry that requires username and password credentials.
Clones: 1481251 1484063 1500642
Last Closed: 2017-10-17 07:45:24 EDT
Type: Bug
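For reference, a Docker credentials file (config.json) keys each credential by registry host, and its `auth` value is the base64 encoding of `user:password`. The Python sketch below shows, under stated assumptions, how values like `oreg_url`, `oreg_auth_user`, and `oreg_auth_password` could map onto such an entry. It is not the installer's actual task: `registry_host` and `docker_auth_config` are hypothetical helpers, and it assumes `oreg_url` has the form `host[:port]/path/ose-${component}:${version}`.

```python
import base64
import json


def registry_host(oreg_url: str) -> str:
    """Return the part of an oreg_url-style value before the first '/'.

    Assumption: oreg_url looks like 'host[:port]/path/ose-${component}:${version}'.
    """
    return oreg_url.split("/", 1)[0]


def docker_auth_config(oreg_url: str, user: str, password: str) -> dict:
    """Build a config.json-style dict with a single credential for the registry host."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"auths": {registry_host(oreg_url): {"auth": token}}}


if __name__ == "__main__":
    # Hypothetical example values, not real credentials.
    cfg = docker_auth_config(
        "registry.example.com/openshift3/ose-${component}:${version}",
        "someuser",
        "s3cret",
    )
    print(json.dumps(cfg, indent=2))
```

Keying the entry by host alone matters because, as the comments below show, the credential is only used when an image reference names that same host.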


Comment 2 Avesh Agarwal 2016-04-04 09:33:22 EDT
PR to kube upstream (not yet merged):
https://github.com/kubernetes/kubernetes/pull/23686
Comment 3 Brenton Leanhardt 2016-04-05 08:45:00 EDT
Hi Avesh,

I looked through the PR and it does seem like it would allow the infra image to be pulled.  The other problem we have is that images such as the deployer, the builders, router and registry also have the same problem.  It wasn't clear to me if this PR would handle those.

Ideally the admin would have a way to provide credentials that would be used for any pull that is unauthenticated.  Some environments will have an authenticated registry yet they do not want users to be forced to add pull secrets to every build config simply because they want to pull the stock s2i images from this registry.

When I dug through Kubernetes to find out whether something like this was supported, it appeared as if that was the purpose of pkg/credentialprovider/keyring.go, but I could be mistaken.  Would you mind taking a look?
Comment 4 Avesh Agarwal 2016-04-05 09:12:00 EDT
Hi Brenton,

I will look into that.
Comment 5 Andy Goldstein 2016-04-05 11:38:45 EDT
Every image other than the pod infra container image is pulled using either the user's secrets (standalone pod) or the secrets associated with the service account running the pod (RC/DC/Job/DS). If we need to do something similar for the deployer/builder/etc. images, we should probably create a separate issue.
Comment 6 Brenton Leanhardt 2016-04-05 13:36:06 EDT
After looking through the steps Avesh tried I noticed there was one difference.  He was passing in an argument to the kubelet:

kubeletArguments:
  pod-infra-container-image:
  - "registry.qe.openshift.com/openshift3/ose-pod"

In environments installed by Ansible we only reference the registry hostnames in /etc/sysconfig/docker in either the ADD_REGISTRY, BLOCK_REGISTRY or INSECURE_REGISTRY variables.  The motivation for this approach was that we did not want to customize the imagestreams or templates for each environment.  We wanted these hostnames in one place.  This means in our Node config the imageConfig was configured as follows:

imageConfig:
  format: /openshift3/ose-${component}:${version}
  latest: false

I suspect the lack of a hostname and the reliance on the ADD_REGISTRY settings is what caused this problem.  Somehow the Kubelet doesn't know to associate the request for /openshift3/ose-pod:v3.2.0.11 with the correct configuration in the Docker keyring.  If you fully qualify the hostname everywhere, it works.  This is pretty inconvenient, but I think we could make do if there were no other practical option.  Ideally there would be a simple way for the Kubelet to handle docker pulls that are not fully qualified.  Obviously the docker CLI handles it.
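Per comment 7, upstream can only use node-level credentials when it can match the image pull spec to a set of credentials, which in practice means matching the registry host in the reference. The simplified Python sketch below approximates Docker's host-parsing rule (the first path component counts as a host only if it contains `.` or `:` or is `localhost`); it ignores digests and the implicit `library/` prefix, and `split_registry` is a hypothetical helper, not kubelet code. It illustrates why `openshift3/ose-pod:v3.2.0.11` is attributed to Docker Hub rather than to registry.qe.openshift.com.

```python
DEFAULT_REGISTRY = "docker.io"


def split_registry(image: str) -> tuple[str, str]:
    """Split an image reference into (registry host, repository[:tag]).

    Simplified Docker rule: the first path component is a registry host only if
    it contains '.' or ':' or equals 'localhost'. Otherwise the reference is
    unqualified and is attributed to DEFAULT_REGISTRY (Docker Hub).
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return DEFAULT_REGISTRY, image


if __name__ == "__main__":
    for ref in (
        "openshift3/ose-pod:v3.2.0.11",
        "registry.qe.openshift.com/openshift3/ose-pod:v3.2.0.11",
    ):
        print(ref, "->", split_registry(ref)[0])
```

Only the second reference yields a host that can match a credential stored for registry.qe.openshift.com.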
Comment 7 Andy Goldstein 2016-04-05 13:47:45 EDT
The --add-registry setting for the Docker daemon is a Red Hat only feature, so it's unlikely there is any alternative we can pursue upstream, given that upstream already supports credentials at the node level assuming it can match the image pull spec to a set of credentials.

The workaround for now is to fully qualify any references to images on registry.access.redhat.com. This means setting the kubeletArguments -> pod-infra-container-image in node-config.yaml, the imageConfig in master-config.yaml, and any default image streams and templates. Next, make sure all your nodes have the same Docker credentials file to be able to pull from the registry requiring authentication: http://kubernetes.io/docs/user-guide/images/#configuring-nodes-to-authenticate-to-a-private-repository.
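The workaround amounts to putting the registry host into the imageConfig format itself. The `${component}` and `${version}` placeholders happen to use the same syntax as Python's `string.Template`, so the sketch below uses it to show how the unqualified format from comment 6 and a fully qualified one expand for a given component. This only mimics the substitution for illustration; it is an assumption, not how OpenShift implements it, and `expand` is a hypothetical helper.

```python
from string import Template

# Format strings in the style of imageConfig.format. The qualified host is the
# one named in comment 7 and is used only as an example.
UNQUALIFIED = "/openshift3/ose-${component}:${version}"
QUALIFIED = "registry.access.redhat.com/openshift3/ose-${component}:${version}"


def expand(fmt: str, component: str, version: str) -> str:
    """Substitute ${component} and ${version} into an imageConfig-style format."""
    return Template(fmt).substitute(component=component, version=version)


if __name__ == "__main__":
    for fmt in (UNQUALIFIED, QUALIFIED):
        print(expand(fmt, "pod", "v3.2.0.11"))
```

The unqualified format produces a reference with no host for the credential lookup to match, while the qualified one carries the host through to every component image.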
Comment 8 Brenton Leanhardt 2016-04-05 13:49:28 EDT
We're going to investigate removing the need for --add-registry.
Comment 9 Avesh Agarwal 2016-04-05 14:40:38 EDT
Brenton, why do you say that "docker cli handles it"? The docker CLI only resolves unqualified references against Docker Hub; otherwise a reference to an image must be fully qualified, isn't that right? The exception is RHEL, where we have the add-registry option.

One thing to try in the Kubelet would be prefixing unqualified image references with each registry listed in config.json and then using that registry's credentials to pull the image. If that fails, keep retrying with the other registries in config.json until one succeeds or all fail. I haven't thought much about it, though.
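Avesh's suggestion can be sketched as a loop over the `auths` entries of config.json, trying each host as a prefix for the unqualified name and stopping at the first successful pull. In this Python sketch `pull_unqualified` and the `pull(ref, auth)` hook are hypothetical; the hook stands in for a real pull and is assumed to return True on success.

```python
from typing import Callable, Optional

# Hypothetical pull hook: pull(qualified_ref, auth) returns True on success.
PullFn = Callable[[str, str], bool]


def pull_unqualified(image: str, auths: dict, pull: PullFn) -> Optional[str]:
    """Try an unqualified image against each registry in config.json 'auths'.

    Registries are tried in the order they appear in the file. Returns the
    fully qualified reference that was pulled, or None if every attempt failed.
    """
    for key, entry in auths.items():
        # Strip a scheme and any path: 'https://host/v1/' -> 'host'.
        host = key.split("://", 1)[-1].split("/", 1)[0]
        candidate = f"{host}/{image}"
        if pull(candidate, entry.get("auth", "")):
            return candidate
    return None


if __name__ == "__main__":
    auths = {
        "https://registry.example.com": {"auth": "aaa"},
        "https://registry.qe.openshift.com": {"auth": "bbb"},
    }

    def fake_pull(ref: str, auth: str) -> bool:
        # Stand-in for a real pull: only the second registry has the image.
        return ref.startswith("registry.qe.openshift.com/")

    print(pull_unqualified("openshift3/ose-pod:v3.2.0.11", auths, fake_pull))
```

One consequence of this design is that the result depends on the order of entries in config.json, so the same unqualified name could resolve to different images on nodes whose files differ.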
Comment 10 Brenton Leanhardt 2016-04-05 14:45:41 EDT
I can 'docker login' to registry.qe and my .docker/config.json will look like this:

{
        "auths": {
                "https://registry.qe.openshift.com": {
                        "auth": "XXXX",
                        "email": "XXXX"
                }
        }
}

Yet if I 'docker pull openshift3/ose-pod:v3.2.0.11', it does the right thing if I have --add-registry set.  I don't have to fully qualify the hostname.  The --add-registry implementation likely had to deal with this edge case.
Comment 11 Xiaoli Tian 2016-04-12 03:31:59 EDT
(In reply to Andy Goldstein from comment #7)
> The --add-registry setting for the Docker daemon is a Red Hat only feature,
> so it's unlikely there is any alternative we can pursue upstream, given that
> upstream already supports credentials at the node level assuming it can
> match the image pull spec to a set of credentials.
> 
> The workaround for now is to fully qualify any references to images on
> registry.access.redhat.com. This means setting the kubeletArguments ->
> pod-infra-container-image in node-config.yaml, the imageConfig in
> master-config.yaml, and any default image streams and templates. Next, make
> sure all your nodes have the same Docker credentials file to be able to pull
> from the registry requiring authentication:
> http://kubernetes.io/docs/user-guide/images/#configuring-nodes-to-authenticate-to-a-private-repository.

@Andy, does the above workaround also mean that users don't need to add separate secrets to the build, deploy, and default service accounts in the customer's project when using the authenticated registry?
Comment 12 Andy Goldstein 2016-04-12 09:20:28 EDT
(In reply to Xiaoli Tian from comment #11)
> @Andy, does the above workaround also mean that users don't need to add
> separate secrets to the build, deploy, and default service accounts in the
> customer's project when using the authenticated registry?

Correct
Comment 19 Johnny Liu 2016-08-02 02:05:06 EDT
*** Bug 1360900 has been marked as a duplicate of this bug. ***
Comment 21 Eric Rich 2017-03-02 09:42:17 EST
*** Bug 1377899 has been marked as a duplicate of this bug. ***
Comment 35 Stefanie Forrester 2017-08-16 12:06:16 EDT
Ok, I have a better understanding of our registry challenges now. So the main thing we'll probably want to focus on for this bug is adding an option to the installer that will allow the user to set registry credentials. Until we have that, the migration to the new registry will be blocked.
Comment 36 Michael Gugino 2017-08-17 22:57:41 EDT
I have submitted a PR here:  https://github.com/openshift/openshift-ansible/pull/5128
Comment 38 Michael Gugino 2017-08-21 08:35:34 EDT
(In reply to Johnny Liu from comment #34)
> (In reply to Stefanie Forrester from comment #33)
> > These instructions might help when setting up authentication to the new
> > registry. I've been able to use this in online-int for authentication.
> > 
> > https://github.com/openshift/ops-sop/blob/master/services/opsregistry.asciidoc#using-the-registry-in-openshift-static-automation-user
> > 
> > # Here are the instructions for reference:
> > 
> > oc login --config=~/.kube/reg-aws --username=<your_username> --password=<your_password> https://api.reg-aws.openshift.com
> > 
> > docker --config="/var/lib/origin/.docker" login -u <your_username> -p $(oc --config=~/.kube/reg-aws whoami -t) registry.reg-aws.openshift.com:443
> > 
> > # Edit /etc/origin/master/master-config.yaml and /etc/origin/node/node-config.yaml and add these lines:
> > 
> > imageConfig:
> >   format: registry.reg-aws.openshift.com:443/online/ose-${component}:${version}
> >   latest: false
> > 
> > systemctl restart atomic-openshift-master atomic-openshift-node
> 
> I already did that, but it still does not work.
> 
> After trying again and again, I found that once I manually pulled
> registry.reg-aws.openshift.com:443/online/ose-pod:v3.6.173.0.5 onto the
> node, everything worked, and the "looking for" keyword also appeared in
> the node log.
> 
> So it seems the kubelet in 3.6 changed a little: kubelet auth only starts
> to work when ose-pod already exists on the node in advance.

I am unable to replicate the openshift + secure registry process locally.  I am able to pull images from the registry, but openshift keeps getting 401 errors.  I have placed .docker/config.json in various places on the file system such as /root, /var/lib/docker, /var/lib/origin, to no effect.
Comment 39 Michael Gugino 2017-08-21 15:38:12 EDT
The inability to authenticate has been determined to be an upstream regression.  A PR has been opened against openshift/origin to backport the fix to 3.6: https://github.com/openshift/origin/pull/15880
Comment 41 Johnny Liu 2017-09-07 02:44:11 EDT
After checking the latest openshift-ansible build (openshift-ansible-3.6.173.0.30-1.git.0.b644a5b.el7.noarch), the PR is not included; waiting for a newer build.
Comment 44 Michael Gugino 2017-09-11 13:14:42 EDT
Pull request created to support containerized installs: https://github.com/openshift/openshift-ansible/pull/5359
Comment 45 Michael Gugino 2017-09-14 13:55:41 EDT
Backport created and merged for 3.6: https://github.com/openshift/openshift-ansible/pull/5406
Comment 47 Johnny Liu 2017-09-18 06:48:38 EDT
Verified this bug with openshift-ansible-3.6.173.0.35-1.git.0.6c318bc.el7.noarch, and FAIL.

Several issues:
1). Going through the PR, it does not take a single containerized master into consideration: it does not update roles/openshift_master/templates/master_docker/master.docker.service.j2.
2). "-v /var/lib/origin/.docker:/root/.docker:ro" is not added to /etc/systemd/system/atomic-openshift-node.service after installation. Going through the whole installation log, I found this is because "systemd_units.yml" runs before "registry_auth.yml".

Checking: roles/openshift_node/tasks/main.yml
<--snip-->
- name: Install the systemd units
  include: systemd_units.yml
<--snip-->
- include: registry_auth.yml


After manually updating the node systemd unit files to work around the above two issues, the cluster works well.

Node log:
Sep 18 09:43:07 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:07.384025   17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:07 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:07.384281   17847 config.go:131] looking for config.json at /.docker/config.json
Sep 18 09:43:07 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:07.384301   17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:13 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:13.381718   17847 config.go:131] looking for config.json at /.docker/config.json
Sep 18 09:43:13 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:13.381751   17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:15 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:15.381645   17847 config.go:131] looking for config.json at /.docker/config.json
Sep 18 09:43:15 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:15.381674   17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343815   37481 config.go:131] looking for config.json at /var/lib/origin/openshift.local.volumes/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343845   37481 config.go:131] looking for config.json at /var/lib/origin/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343852   37481 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343939   37481 config.go:139] found valid config.json at /root/.docker/config.json

3). When testing this bug with a containerized install, I did not encounter BZ#1488833, but I reproduced it with an RPM install. Comparing the install logs for the two kinds of installation, I found that the "Create credentials for docker cli registry auth" task is executed in the containerized install but skipped in the RPM install. If the installer made sure that task is executed, /root/.docker would be generated and BZ#1488833 would also be fixed. That would help QE's installations a lot (currently we disable the docker_image_availability check as a workaround).
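The node log in this comment shows the node probing a sequence of locations and stopping at the first valid config.json, which is why, for containerized nodes, /var/lib/origin/.docker needs to be mounted read-only at /root/.docker, one of the probed locations. A minimal Python sketch of that first-match search follows. The candidate order is taken from this log only, "valid" is assumed to mean parseable JSON, and `find_docker_config` is a hypothetical helper, not kubelet code.

```python
import json
import os
import tempfile
from typing import Iterable, Optional


def find_docker_config(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path that holds parseable JSON, else None."""
    for path in candidates:
        print(f"looking for config.json at {path}")
        try:
            with open(path, encoding="utf-8") as fh:
                json.load(fh)
        except (OSError, ValueError):  # missing/unreadable file or invalid JSON
            continue
        print(f"found valid config.json at {path}")
        return path
    return None


if __name__ == "__main__":
    # Stand-in layout in a temp dir; on a node the candidates would be paths
    # such as /var/lib/origin/config.json and /root/.docker/config.json.
    with tempfile.TemporaryDirectory() as root:
        missing = os.path.join(root, "openshift.local.volumes", "config.json")
        broken = os.path.join(root, "config.json")
        valid = os.path.join(root, ".docker", "config.json")
        with open(broken, "w", encoding="utf-8") as fh:
            fh.write("not json")
        os.makedirs(os.path.dirname(valid))
        with open(valid, "w", encoding="utf-8") as fh:
            json.dump({"auths": {}}, fh)
        print(find_docker_config([missing, broken, valid]) == valid)
```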
Comment 48 Michael Gugino 2017-09-18 12:23:11 EDT
PR submitted to make registry_auth.yml run before systemd_units.yml in both masters and nodes:

https://github.com/openshift/openshift-ansible/pull/5441

Backport PRs for 3.6, 1.5, and 1.4 submitted.  These backports also include support for roles/openshift_master/templates/master_docker/master.docker.service.j2, which is not present in the current master branch.

3.6: https://github.com/openshift/openshift-ansible/pull/5444
1.5: https://github.com/openshift/openshift-ansible/pull/5445
1.4: https://github.com/openshift/openshift-ansible/pull/5446

@Johnny Liu:

Regarding issue #3 you found ('Create credentials for docker cli registry auth'), this should not be the case.  Can you confirm there is no file present at /root/.docker/config.json?
Comment 51 Johnny Liu 2017-09-18 23:52:19 EDT
(In reply to Michael Gugino from comment #48)
> @Johnny Liu:
> 
> Regarding issue #3 you found ('Create credentials for docker cli registry
> auth'), this should not be the case.  Can you confirm there is no file
> present at /root/.docker/config.json?
Yes, #3 should be a separate bug, and I can confirm /root/.docker/config.json is not present when the failure described in BZ#1488833 happens. Actually, I want /root/.docker/config.json to be present so that BZ#1488833 can be fixed.

If you want to track the docker_image_availability check failure in BZ#1488833, ignore #3 here and let us fix it in BZ#1488833.
Comment 52 Michael Gugino 2017-09-19 09:28:49 EDT
PR merged: https://github.com/openshift/openshift-ansible/pull/5444
Comment 54 Johnny Liu 2017-09-28 03:58:14 EDT
Re-tested this bug with openshift-ansible-3.6.173.0.43-1, and FAIL.

It seems the PR introduced a typo: "oreg_auth_credentials_replace.changed" should be "oreg_auth_credentials_replace | changed", am I right?

TASK [openshift_node : Setup ro mount of /root/.docker for containerized hosts] ***
Thursday 28 September 2017  06:54:46 +0000 (0:00:01.148)       0:11:37.156 **** 
fatal: [qe-jialiu36-node-registry-router-1.0928-8qo.qe.rhcloud.com]: FAILED! => {
    "failed": true
}

MSG:

The conditional check '(node_oreg_auth_credentials_stat.stat.exists or oreg_auth_credentials_replace or oreg_auth_credentials_replace.changed) | bool' failed. The error was: error while evaluating conditional ((node_oreg_auth_credentials_stat.stat.exists or oreg_auth_credentials_replace or oreg_auth_credentials_replace.changed) | bool): 'bool object' has no attribute 'changed'

The error appears to have been in '/home/slave4/workspace/Launch Environment Flexy/private-openshift-ansible/roles/openshift_node/tasks/registry_auth.yml': line 28, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:

# Container images may need the registry credentials
- name: Setup ro mount of /root/.docker for containerized hosts
  ^ here
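The error arises because the conditional reads `.changed` from `oreg_auth_credentials_replace`, which here is a plain boolean; `changed` is a key of a registered task result, not of a flag. Like Jinja2, Python's `or` short-circuits, so the bad access is only reached when the first two operands are false. The Python sketch below mirrors the logic with a hypothetical `mount_needed` helper that keeps the flag and a registered result as separate values; the actual fix is the PR referenced in comment 57 and may differ.

```python
def mount_needed(stat_exists: bool, replace: bool, create_result: dict) -> bool:
    """Decide whether the read-only /root/.docker mount should be added.

    stat_exists:   whether the credentials file was already present
    replace:       the boolean oreg_auth_credentials_replace flag
    create_result: a registered task result (a dict that may carry 'changed'),
                   kept separate from the boolean flag
    """
    return bool(stat_exists or replace or create_result.get("changed", False))


if __name__ == "__main__":
    # Reproducing the failure mode: asking a bool for '.changed'.
    oreg_auth_credentials_replace = False
    try:
        oreg_auth_credentials_replace.changed
    except AttributeError as exc:
        print(exc)  # 'bool' object has no attribute 'changed'

    print(mount_needed(False, False, {"changed": True}))  # True
```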
Comment 57 Michael Gugino 2017-09-29 11:38:00 EDT
PR Created to solve latest issue of 'Setup ro mount of /root/.docker for containerized hosts' task: https://github.com/openshift/openshift-ansible/pull/5595

3.6 Backport PR: https://github.com/openshift/openshift-ansible/pull/5596
1.5 Backport PR: https://github.com/openshift/openshift-ansible/pull/5597
1.4 Backport PR: https://github.com/openshift/openshift-ansible/pull/5598
Comment 64 Michael Gugino 2017-10-02 19:06:01 EDT
PR Merged: https://github.com/openshift/openshift-ansible/pull/5596
Comment 66 Michael Gugino 2017-10-03 15:24:42 EDT
PR Submitted to ensure docker is started prior to requesting credentials in docker role:  https://github.com/openshift/openshift-ansible/pull/5647
Comment 68 Michael Gugino 2017-10-03 19:25:12 EDT
PR created for fixing image availability check: https://github.com/openshift/openshift-ansible/pull/5650
Comment 71 Johnny Liu 2017-10-11 04:39:23 EDT
Verified this bug with openshift-ansible-3.6.173.0.48-1.git.0.1609d30.el7.noarch, and PASS.

Containerized install is completed successfully.
Comment 73 errata-xmlrpc 2017-10-17 07:45:24 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2900
