PR to kube upstream (not yet merged): https://github.com/kubernetes/kubernetes/pull/23686
Hi Avesh, I looked through the PR and it does seem like it would allow the infra image to be pulled. The other problem we have is that images such as the deployer, the builders, the router, and the registry have the same issue, and it wasn't clear to me whether this PR would handle those. Ideally the admin would have a way to provide credentials that would be used for any pull that is unauthenticated. Some environments have an authenticated registry, yet they do not want users to be forced to add pull secrets to every build config simply because they want to pull the stock s2i images from that registry. When I dug through Kubernetes to find out if something like this was supported, it appeared as if that was the reason for pkg/credentialprovider/keyring.go, but I could be mistaken. Would you mind taking a look?
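For context, here is a minimal sketch (a toy stand-in, not the actual kubelet implementation) of how a node-level keyring like the one in pkg/credentialprovider/keyring.go can map an image pull spec to credentials. The key point is that matching is driven by the registry portion of the reference, so an unqualified image has no hostname to match against:

```go
package main

import (
	"fmt"
	"strings"
)

// lookup is a simplified stand-in for a node-level docker keyring: it
// returns the credentials whose registry entry prefixes the image pull
// spec. The real credentialprovider keyring is more elaborate, but the
// matching is still keyed on the registry part of the image reference.
func lookup(keyring map[string]string, image string) (string, bool) {
	for registry, auth := range keyring {
		if strings.HasPrefix(image, registry+"/") {
			return auth, true
		}
	}
	return "", false
}

func main() {
	keyring := map[string]string{
		"registry.qe.openshift.com": "XXXX", // hypothetical auth entry
	}
	// A fully qualified reference carries the hostname, so it matches.
	fmt.Println(lookup(keyring, "registry.qe.openshift.com/openshift3/ose-pod:v3.2.0.11"))
	// An unqualified reference (the --add-registry case) has nothing to
	// match against, so no credentials are found.
	fmt.Println(lookup(keyring, "openshift3/ose-pod:v3.2.0.11"))
}
```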
Hi Brendon, I will look into that.
Every image other than the pod infra container image is pulled via either the user's secrets (standalone pod) or the secrets associated with the service account running the pod (RC/DC/Job/DS). If we need to do similar things for the deployer/builder/etc. images, we should probably create a separate issue.
After looking through the steps Avesh tried I noticed one difference. He was passing an argument to the kubelet:

```yaml
kubeletArguments:
  pod-infra-container-image:
  - "registry.qe.openshift.com/openshift3/ose-pod"
```

In environments installed by Ansible we only reference the registry hostnames in /etc/sysconfig/docker, in the ADD_REGISTRY, BLOCK_REGISTRY, or INSECURE_REGISTRY variables. The motivation for this approach was that we did not want to customize the image streams or templates for each environment; we wanted these hostnames in one place. This means that in our node config the imageConfig was configured as follows:

```yaml
imageConfig:
  format: /openshift3/ose-${component}:${version}
  latest: false
```

I suspect the lack of a hostname and the reliance on the ADD_REGISTRY setting is what caused this problem. Somehow the kubelet doesn't know to associate the request for /openshift3/ose-pod:v3.2.0.11 with the correct configuration in the Docker keyring. If you fully qualify the hostname everywhere, it works. This is pretty inconvenient, but I think we could make do if there were no other practical option. Ideally there would be a simple way for the kubelet to handle Docker pulls that are not fully qualified. Obviously the docker CLI handles it.
The --add-registry setting for the Docker daemon is a Red Hat only feature, so it's unlikely there is any alternative we can pursue upstream, given that upstream already supports credentials at the node level assuming it can match the image pull spec to a set of credentials. The workaround for now is to fully qualify any references to images on registry.access.redhat.com. This means setting the kubeletArguments -> pod-infra-container-image in node-config.yaml, the imageConfig in master-config.yaml, and any default image streams and templates. Next, make sure all your nodes have the same Docker credentials file to be able to pull from the registry requiring authentication: http://kubernetes.io/docs/user-guide/images/#configuring-nodes-to-authenticate-to-a-private-repository.
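As a concrete illustration of the workaround (hypothetical values; adjust the registry hostname and tag for your environment), the fully qualified settings would look something like:

```yaml
# node-config.yaml (hypothetical values)
kubeletArguments:
  pod-infra-container-image:
  - "registry.access.redhat.com/openshift3/ose-pod:v3.2.0.11"

# master-config.yaml (hypothetical values)
imageConfig:
  format: registry.access.redhat.com/openshift3/ose-${component}:${version}
  latest: false
```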
We're going to investigate removing the need for --add-registry.
Brenton, why do you say that the "docker cli handles it"? The docker CLI only handles unqualified references to Docker Hub; otherwise a reference to an image should be fully qualified, shouldn't it? Except on RHEL, where we have the --add-registry option. One thing to try in the kubelet would be prefixing each registry in config.json to unqualified image references and then using that registry's credentials to pull the image; if it fails, keep retrying with the other registries in config.json until one succeeds or all fail. I haven't thought much about it, though.
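The retry idea above could be sketched roughly like this (a hypothetical helper, not existing kubelet code): prefix the unqualified image with each configured registry in turn, attempt the pull with that registry's credentials, and stop at the first success:

```go
package main

import (
	"errors"
	"fmt"
)

// errUnauthorized is a stand-in for a failed authenticated pull.
var errUnauthorized = errors.New("unauthorized")

// pullWithSearchList tries each configured registry as a prefix for an
// unqualified image reference, calling pull (which would carry that
// registry's credentials) until one succeeds or all fail.
func pullWithSearchList(registries []string, pull func(ref string) error, image string) error {
	lastErr := errors.New("no registries configured")
	for _, registry := range registries {
		if err := pull(registry + "/" + image); err != nil {
			lastErr = err
			continue
		}
		return nil
	}
	return lastErr
}

func main() {
	// Fake pull that only succeeds against the second registry.
	pull := func(ref string) error {
		if ref == "registry.qe.openshift.com/openshift3/ose-pod:v3.2.0.11" {
			return nil
		}
		return errUnauthorized
	}
	registries := []string{"registry.access.redhat.com", "registry.qe.openshift.com"}
	err := pullWithSearchList(registries, pull, "openshift3/ose-pod:v3.2.0.11")
	fmt.Println("pull error:", err)
}
```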
I can 'docker login' to registry.qe and my .docker/config.json will look like this:

```json
{
  "auths": {
    "https://registry.qe.openshift.com": {
      "auth": "XXXX",
      "email": "XXXX"
    }
  }
}
```

Yet if I 'docker pull openshift3/ose-pod:v3.2.0.11' it will do the right thing if I have --add-registry set. I don't have to fully qualify the hostname. The --add-registry implementation likely had to deal with this edge case.
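For reference, the RHEL-specific search-registry behavior comes from /etc/sysconfig/docker; a hypothetical example (value adjusted per environment) would look like:

```
# /etc/sysconfig/docker (RHEL docker only; hypothetical value)
ADD_REGISTRY='--add-registry registry.qe.openshift.com'
```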
(In reply to Andy Goldstein from comment #7)
> The --add-registry setting for the Docker daemon is a Red Hat only feature,
> so it's unlikely there is any alternative we can pursue upstream, given that
> upstream already supports credentials at the node level assuming it can
> match the image pull spec to a set of credentials.
>
> The workaround for now is to fully qualify any references to images on
> registry.access.redhat.com. This means setting the kubeletArguments ->
> pod-infra-container-image in node-config.yaml, the imageConfig in
> master-config.yaml, and any default image streams and templates. Next, make
> sure all your nodes have the same Docker credentials file to be able to pull
> from the registry requiring authentication:
> http://kubernetes.io/docs/user-guide/images/#configuring-nodes-to-authenticate-to-a-private-repository.

@Andy, will the above workaround also mean that a user doesn't need to add separate secrets to the build, deploy, and default service accounts under the customer's project while using the authenticated registry?
(In reply to Xiaoli Tian from comment #11)
> @Andy, will above workaround also mean:
>
> User doesn't need to add separate secrets to the build, deploy, default
> service accounts under customer's project while using the authenticated
> registry?

Correct
*** Bug 1360900 has been marked as a duplicate of this bug. ***
*** Bug 1377899 has been marked as a duplicate of this bug. ***
Ok, I have a better understanding of our registry challenges now. So the main thing we'll probably want to focus on for this bug is adding an option to the installer that will allow the user to set registry credentials. Until we have that, the migration to the new registry will be blocked.
I have submitted a PR here: https://github.com/openshift/openshift-ansible/pull/5128
(In reply to Johnny Liu from comment #34)
> (In reply to Stefanie Forrester from comment #33)
> > These instructions might help when setting up authentication to the new
> > registry. I've been able to use this in online-int for authentication.
> >
> > https://github.com/openshift/ops-sop/blob/master/services/opsregistry.asciidoc#using-the-registry-in-openshift-static-automation-user
> >
> > # Here are the instructions for reference:
> >
> > oc login --config=~/.kube/reg-aws --username=<your_username> --password=<your_password> https://api.reg-aws.openshift.com
> >
> > docker --config="/var/lib/origin/.docker" login -u <your_username> -p $(oc --config=~/.kube/reg-aws whoami -t) registry.reg-aws.openshift.com:443
> >
> > # Edit /etc/origin/master/master-config.yaml and /etc/origin/node/node-config.yaml and add these lines:
> >
> > imageConfig:
> >   format: registry.reg-aws.openshift.com:443/online/ose-${component}:${version}
> >   latest: false
> >
> > systemctl restart atomic-openshift-master atomic-openshift-node
>
> I already did that, but it still does not work.
>
> After trying again and again, I found that once I manually pulled
> registry.reg-aws.openshift.com:443/online/ose-pod:v3.6.173.0.5 onto the
> node, everything worked well, and I could also find the "looking for"
> keyword in the node log.
>
> So it seems the kubelet in 3.6 changed slightly: kubelet auth only starts
> to work when ose-pod already exists in advance.

I am unable to replicate the openshift + secure registry process locally. I am able to pull images from the registry, but openshift keeps getting 401 errors. I have placed .docker/config.json in various places on the file system, such as /root, /var/lib/docker, and /var/lib/origin, to no effect.
The inability to authenticate has been determined to be a regression upstream. A PR has been opened against openshift/origin to backport the fix to 3.6: https://github.com/openshift/origin/pull/15880
After checking the latest openshift-ansible build (openshift-ansible-3.6.173.0.30-1.git.0.b644a5b.el7.noarch), the PR is not included; waiting for a newer build.
Pull request created to support containerized installs: https://github.com/openshift/openshift-ansible/pull/5359
Backport created and merged for 3.6: https://github.com/openshift/openshift-ansible/pull/5406
Verified this bug with openshift-ansible-3.6.173.0.35-1.git.0.6c318bc.el7.noarch, and FAIL. Several issues:

1) Going through the PR: it does not take a single containerized master into consideration, and does not update roles/openshift_master/templates/master_docker/master.docker.service.j2.

2) "-v /var/lib/origin/.docker:/root/.docker:ro" is not added into /etc/systemd/system/atomic-openshift-node.service after installation. Going through the whole installation log, I found this is because "systemd_units.yml" runs prior to "registry_auth.yml". Checking roles/openshift_node/tasks/main.yml:

```
<--snip-->
- name: Install the systemd units
  include: systemd_units.yml
<--snip-->
- include: registry_auth.yml
```

After manually updating the node systemd unit files to work around the above two issues, the cluster is working well. Node log:

```
Sep 18 09:43:07 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:07.384025 17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:07 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:07.384281 17847 config.go:131] looking for config.json at /.docker/config.json
Sep 18 09:43:07 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:07.384301 17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:13 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:13.381718 17847 config.go:131] looking for config.json at /.docker/config.json
Sep 18 09:43:13 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:13.381751 17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:15 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:15.381645 17847 config.go:131] looking for config.json at /.docker/config.json
Sep 18 09:43:15 qe-jialiu36-node-registry-router-1 atomic-openshift-node[17783]: I0918 09:43:15.381674 17847 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343815 37481 config.go:131] looking for config.json at /var/lib/origin/openshift.local.volumes/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343845 37481 config.go:131] looking for config.json at /var/lib/origin/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343852 37481 config.go:131] looking for config.json at /root/.docker/config.json
Sep 18 09:43:58 qe-jialiu36-node-registry-router-1 atomic-openshift-node[37419]: I0918 09:43:58.343939 37481 config.go:139] found valid config.json at /root/.docker/config.json
```

3) When testing this bug with a containerized install I did not encounter BZ#1488833, while I reproduced it in an rpm install. Comparing the install logs for the two kinds of installation, I found the "Create credentials for docker cli registry auth" task is executed in the containerized install, while it is skipped in the rpm install. If the installer could make sure the "Create credentials for docker cli registry auth" task is executed so that /root/.docker is generated, then BZ#1488833 would also be fixed. That would help a lot with QE's installation (currently we are disabling the docker_image_availability check as a workaround).
PR submitted to address the issue of registry_auth.yml happening after systemd_units.yml in both masters and nodes: https://github.com/openshift/openshift-ansible/pull/5441

Backport PRs for 3.6, 1.5, and 1.4 submitted. These backports also include support for roles/openshift_master/templates/master_docker/master.docker.service.j2, which is not present in the current master branch.

3.6: https://github.com/openshift/openshift-ansible/pull/5444
1.5: https://github.com/openshift/openshift-ansible/pull/5445
1.4: https://github.com/openshift/openshift-ansible/pull/5446

@Johnny Liu: For issue #3 you found, the 'Create credentials for docker cli registry auth' task being skipped should not be the case. Can you ensure there is no file present at /root/.docker/config.json?
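As a sketch of the intended ordering (assuming the include layout quoted earlier in this bug; not the exact contents of the PR), registry_auth.yml needs to run before the systemd units are templated so the credentials mount can be added:

```yaml
# roles/openshift_node/tasks/main.yml (sketch, not the actual PR diff)
- include: registry_auth.yml

- name: Install the systemd units
  include: systemd_units.yml
```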
(In reply to Michael Gugino from comment #48)
> @Johnny Liu:
>
> For item issue #3 you found, 'Create credentials for docker cli registry
> auth' this should not be the case. Can you ensure there is no file present
> at /root/.docker/config.json ?

Yeah, #3 should be another bug, and I can confirm /root/.docker/config.json is not present when the failure described in BZ#1488833 happens. Actually I want /root/.docker/config.json to be present so that BZ#1488833 can be fixed. If you want to track the docker_image_availability check failure in BZ#1488833, ignore #3 and let us fix it in BZ#1488833.
PR merged: https://github.com/openshift/openshift-ansible/pull/5444
Re-tested this bug with openshift-ansible-3.6.173.0.43-1, and FAIL. It seems the PR introduced a typo: "oreg_auth_credentials_replace.changed" should be "oreg_auth_credentials_replace | changed", am I right?

```
TASK [openshift_node : Setup ro mount of /root/.docker for containerized hosts] ***
Thursday 28 September 2017 06:54:46 +0000 (0:00:01.148) 0:11:37.156 ****
fatal: [qe-jialiu36-node-registry-router-1.0928-8qo.qe.rhcloud.com]: FAILED! => {
    "failed": true
}

MSG:

The conditional check '(node_oreg_auth_credentials_stat.stat.exists or oreg_auth_credentials_replace or oreg_auth_credentials_replace.changed) | bool' failed. The error was: error while evaluating conditional ((node_oreg_auth_credentials_stat.stat.exists or oreg_auth_credentials_replace or oreg_auth_credentials_replace.changed) | bool): 'bool object' has no attribute 'changed'

The error appears to have been in '/home/slave4/workspace/Launch Environment Flexy/private-openshift-ansible/roles/openshift_node/tasks/registry_auth.yml': line 28, column 3, but may be elsewhere in the file depending on the exact syntax problem. The offending line appears to be:

# Container images may need the registry credentials
- name: Setup ro mount of /root/.docker for containerized hosts
  ^ here
```
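For illustration, one way the conditional could be written so that a plain boolean value of oreg_auth_credentials_replace no longer triggers the attribute error (a sketch, not necessarily the PR's actual fix):

```yaml
# registry_auth.yml (sketch): drop the '.changed' attribute access, which
# fails when oreg_auth_credentials_replace is a plain bool.
- name: Setup ro mount of /root/.docker for containerized hosts
  # ... module arguments elided ...
  when: (node_oreg_auth_credentials_stat.stat.exists or oreg_auth_credentials_replace) | bool
```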
PR created to solve the latest issue with the 'Setup ro mount of /root/.docker for containerized hosts' task: https://github.com/openshift/openshift-ansible/pull/5595

3.6 backport PR: https://github.com/openshift/openshift-ansible/pull/5596
1.5 backport PR: https://github.com/openshift/openshift-ansible/pull/5597
1.4 backport PR: https://github.com/openshift/openshift-ansible/pull/5598
PR Merged: https://github.com/openshift/openshift-ansible/pull/5596
PR Submitted to ensure docker is started prior to requesting credentials in docker role: https://github.com/openshift/openshift-ansible/pull/5647
PR created for fixing image availability check: https://github.com/openshift/openshift-ansible/pull/5650
Verified this bug with openshift-ansible-3.6.173.0.48-1.git.0.1609d30.el7.noarch, and PASS. Containerized install is completed successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2900