Bug 1676399 - Upgrade failed because images are pulled from the wrong registry address
Summary: Upgrade failed because images are pulled from the wrong registry address
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Joseph Callen
QA Contact: liujia
Reported: 2019-02-12 08:02 UTC by liujia
Modified: 2019-06-26 09:08 UTC (History)
Last Closed: 2019-06-26 09:07:54 UTC

Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:1605 | None | None | None | 2019-06-26 09:08:02 UTC

Description liujia 2019-02-12 08:02:00 UTC
Description of problem:
Upgrading OCP from v3.11.69 to v3.11.82 with oreg_url set to a private registry fails. The specified registry address is not written to node-config.yaml or the node configmap, so the ose-pod image cannot be pulled during deployment.

Upgrade failed at TASK [openshift_node : Wait for master API to come back online] *************************************************************************************************************
fatal: [x]: FAILED! => {"changed": false, "elapsed": 600, "msg": "Timeout when waiting for ip-172-18-9-212.ec2.internal:8443"}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade.retry

node log:
Feb 12 00:35:54 ip-172-18-9-212.ec2.internal atomic-openshift-node[6224]: E0212 00:35:54.950592    6224 kuberuntime_manager.go:646] createPodSandbox for pod "master-controllers-ip-172-18-9-212.ec2.internal_kube-system(229f15d98250968e2cd36eb1465d3a73)" failed: rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: "File not found.\""
...
Feb 12 01:31:41 ip-172-18-9-212.ec2.internal atomic-openshift-node[9106]: I0212 01:31:41.165113    9106 kube_docker_client.go:348] Stop pulling image "registry.redhat.io/openshift3/ose-pod:v3.11.82": "Trying to pull repository registry.redhat.io/openshift3/ose-pod ... "


docker log:
t.io/openshift3/ose-pod:v3.11.82/json returned error: No such image: registry.redhat.io/openshift3/ose-pod:v3.11.82"
t.io/openshift3/ose-pod:v3.11.82/json returned error: No such image: registry.redhat.io/openshift3/ose-pod:v3.11.82"

[root@ip-172-18-9-212 ~]# docker images|grep ose-pod
registry.reg-aws.openshift.com:443/openshift3/ose-pod             v3.11               ecc76353758b        13 hours ago        238 MB
registry.redhat.io/openshift3/ose-pod                             v3.11               298223cef55e        2 weeks ago         238 MB
registry.redhat.io/openshift3/ose-pod                             v3.11.69            298223cef55e        2 weeks ago         238 MB


debug info:
# cat /etc/origin/node/node-config.yaml |grep -A 1 imageConfig
imageConfig:
  format: registry.redhat.io/openshift3/ose-${component}:${version}

# cat /etc/origin/master/master-config.yaml |grep -A 1 imageConfig
imageConfig:
  format: registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}

# oc get cm node-config-master -o yaml -n openshift-node > test
# cat test |grep -A 1 imageConfig
    imageConfig:
      format: registry.redhat.io/openshift3/ose-${component}:${version}

# openshift version
openshift v3.11.82
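
The mismatch above can be checked with a small script. This is only a sketch: the helper names (image_format, expand_format, check_formats) are made up for illustration, and it assumes the format: key follows imageConfig: as in the grep output above. Expanding the stale node format for component "pod" gives the image name the node keeps trying to pull.

```shell
#!/bin/sh
# Hypothetical helpers, not part of openshift-ansible.

# Print the imageConfig.format value from a node/master config file.
image_format() {
  awk '/^[[:space:]]*imageConfig:/ { in_block = 1; next }
       in_block && $1 == "format:" { print $2; exit }' "$1"
}

# Substitute ${component} and ${version} in a format string.
# usage: expand_format FORMAT COMPONENT VERSION
expand_format() {
  printf '%s\n' "$1" | sed -e "s/[\$]{component}/$2/" -e "s/[\$]{version}/$3/"
}

# Compare the format in two config files; non-zero exit on mismatch.
check_formats() {
  node_fmt=$(image_format "$1")
  master_fmt=$(image_format "$2")
  if [ "$node_fmt" = "$master_fmt" ]; then
    echo "match: $node_fmt"
  else
    echo "MISMATCH: node=$node_fmt master=$master_fmt"
    return 1
  fi
}

# Example (paths taken from the debug info above):
#   check_formats /etc/origin/node/node-config.yaml /etc/origin/master/master-config.yaml
#   expand_format "$(image_format /etc/origin/node/node-config.yaml)" pod v3.11.82
```

On the failing host this would be expected to report a mismatch, with the node format expanding to the registry.redhat.io ose-pod image seen in the node log.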


Version-Release number of the following components:
openshift-ansible-3.11.82-1.git.0.f29227a.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install ocp v3.11 with default registry

2. Edit inventory file to specify a private registry
oreg_url=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
oreg_auth_user={{ lookup('env','REG_AUTH_USER') }}
oreg_auth_password={{ lookup('env','REG_AUTH_PASSWORD') }}

3. Upgrade the above OCP cluster to the latest v3.11

Actual results:
Upgrade failed

Expected results:
Upgrade succeeds

Additional info:
TASK [openshift_node : Check status of node pod image pre-pull] *************************************************************************************************************
changed: [x] => {"ansible_job_id": "737111034179.2779", "attempts": 1, "changed": true, "cmd": ["docker", "pull", "registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11"], "delta": "0:00:01.326965", "end": "2019-02-12 00:25:42.570251", "failed_when_result": false, "finished": 1, "rc": 0, "start": "2019-02-12 00:25:41.243286", "stderr": "", "stderr_lines": [], "stdout": "Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/ose-pod ... \nv3.11: Pulling from registry.reg-aws.openshift.com:443/openshift3/ose-pod\nc325120ebc8d: Already exists\nc9d123037991: Already exists\n281e96bf3d21: Already exists\ne897b0489f0d: Pulling fs layer\ne897b0489f0d: Verifying Checksum\ne897b0489f0d: Download complete\ne897b0489f0d: Pull complete\nDigest: sha256:a3aaddd8bdafc63203f64debc7faa6a6739aaacbb52a2bd7f480710af002de4e\nStatus: Downloaded newer image for registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11", "stdout_lines": ["Trying to pull repository registry.reg-aws.openshift.com:443/openshift3/ose-pod ... ", "v3.11: Pulling from registry.reg-aws.openshift.com:443/openshift3/ose-pod", "c325120ebc8d: Already exists", "c9d123037991: Already exists", "281e96bf3d21: Already exists", "e897b0489f0d: Pulling fs layer", "e897b0489f0d: Verifying Checksum", "e897b0489f0d: Download complete", "e897b0489f0d: Pull complete", "Digest: sha256:a3aaddd8bdafc63203f64debc7faa6a6739aaacbb52a2bd7f480710af002de4e", "Status: Downloaded newer image for registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.11"]}

Comment 1 Scott Dodson 2019-02-12 20:17:01 UTC
This would be limited in scope to customers who change oreg_url during an upgrade, which I suspect is rare. However, we probably did handle that pattern in 3.9 and earlier, so we may want to attempt to patch all configmaps with the new oreg_url.

Comment 2 Joseph Callen 2019-03-26 18:44:33 UTC
I just finished testing and can confirm that the registry is not being updated:
imageConfig:
  format: registry.redhat.io/openshift3/ose-${component}:${version}
  latest: false

Comment 3 Joseph Callen 2019-03-27 12:38:14 UTC
Hi liujia,

There is a playbook to change the imageConfig.format after upgrade.

See this PR for additional context: https://github.com/openshift/openshift-ansible/pull/9784

Here is the path to the playbook "playbooks/openshift-node/imageconfig.yml"
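
For reference, a sketch of how that playbook might be invoked. The inventory path is an assumption (/etc/ansible/hosts is only Ansible's default), and the install root is inferred from the task paths logged in this report; the helper name imageconfig_cmd is hypothetical. The function only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Hypothetical helper: build the ansible-playbook command line for the
# imageconfig playbook. Both defaults are assumptions; adjust as needed.
# usage: imageconfig_cmd [INVENTORY] [OPENSHIFT_ANSIBLE_ROOT]
imageconfig_cmd() {
  inventory="${1:-/etc/ansible/hosts}"
  root="${2:-/usr/share/ansible/openshift-ansible}"
  printf 'ansible-playbook -i %s %s/playbooks/openshift-node/imageconfig.yml\n' \
    "$inventory" "$root"
}

# Print the command for review; run it manually once it looks right.
imageconfig_cmd
```

The inventory passed here should already contain the new oreg_url, since the playbook changes imageConfig.format.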

Comment 5 Kathryn Alexander 2019-03-28 15:05:19 UTC
I think the PR looks good. @Jia Liu, will you please confirm?

Comment 6 liujia 2019-04-16 08:28:41 UTC
Still hit the same issue on openshift-ansible-3.11.106-1.git.0.2d027da.el7.noarch.

Steps:
1. Install ocp v3.11.88 with default registry.
# docker images|grep ose-pod
registry.redhat.io/openshift3/ose-pod                       v3.11               d5f897cfbb0d        13 days ago         238 MB
registry.redhat.io/openshift3/ose-pod                       v3.11.88            ff8efa1e789c        6 weeks ago         238 MB
2. Edit inventory file to specify a private registry
oreg_url=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
oreg_auth_user={{ lookup('env','REG_AUTH_USER') }}
oreg_auth_password={{ lookup('env','REG_AUTH_PASSWORD') }}
3. Upgrade the above OCP cluster to the latest v3.11 (v3.11.106-)

Upgrade failed at the same task.
TASK [openshift_node : Wait for master API to come back online] ****************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/restart.yml:66
Tuesday 16 April 2019  07:12:09 +0000 (0:00:00.881)       0:12:57.237 ********* 
fatal: [x]: FAILED! => {"changed": false, "elapsed": 600, "msg": "Timeout when waiting for ip-172-18-13-102.ec2.internal:8443"}

some node logs:
Apr 16 03:51:03 ip-172-18-13-102.ec2.internal atomic-openshift-node[5251]: logging error output: "Unauthorized"
Apr 16 03:51:04 ip-172-18-13-102.ec2.internal atomic-openshift-node[5251]: E0416 03:51:04.238172    5251 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: "File not found.\""
Apr 16 03:51:04 ip-172-18-13-102.ec2.internal atomic-openshift-node[5251]: E0416 03:51:04.238228    5251 kuberuntime_sandbox.go:56] CreatePodSandbox for pod "master-api-ip-172-18-13-102.ec2.internal_kube-system(9d066f84b20195c767ec4ed9d7ac3ba2)" failed: rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: "File not found.\""
Apr 16 03:51:04 ip-172-18-13-102.ec2.internal atomic-openshift-node[5251]: E0416 03:51:04.238253    5251 kuberuntime_manager.go:646] createPodSandbox for pod "master-api-ip-172-18-13-102.ec2.internal_kube-system(9d066f84b20195c767ec4ed9d7ac3ba2)" failed: rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: "File not found.\""
Apr 16 03:51:04 ip-172-18-13-102.ec2.internal atomic-openshift-node[5251]: E0416 03:51:04.238355    5251 pod_workers.go:186] Error syncing pod 9d066f84b20195c767ec4ed9d7ac3ba2 ("master-api-ip-172-18-13-102.ec2.internal_kube-system(9d066f84b20195c767ec4ed9d7ac3ba2)"), skipping: failed to "CreatePodSandbox" for "master-api-ip-172-18-13-102.ec2.internal_kube-system(9d066f84b20195c767ec4ed9d7ac3ba2)" with CreatePodSandboxError: "CreatePodSandbox for pod \"master-api-ip-172-18-13-102.ec2.internal_kube-system(9d066f84b20195c767ec4ed9d7ac3ba2)\" failed: rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\""
Apr 16 03:51:04 ip-172-18-13-102.ec2.internal atomic-openshift-node[5251]: I0416 03:51:04.238403    5251 server.go:470] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"master-api-ip-172-18-13-102.ec2.internal", UID:"9d066f84b20195c767ec4ed9d7ac3ba2", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed create pod sandbox: rpc error: code = Unknown desc = error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: "File not found.\""
Apr 16 03:51:05 ip-172-18-13-102.ec2.internal atomic-openshift-node[5251]: E0416 03:51:05.779921    5251 server.go:226] Unable to authenticate the request due to an error: Post https://ip-172-18-13-102.ec2.internal:8443/apis/authentication.k8s.io/v1beta1/tokenreviews: dial tcp 172.18.13.102:8443: connect: connection refused


some docker logs.
Apr 16 03:45:16 ip-172-18-13-102.ec2.internal dockerd-current[4934]: time="2019-04-16T03:45:16.155768049-04:00" level=error msg="Handler for GET /v1.26/images/registry.redhat.io/openshift3/ose-pod:v3.11.106/json returned error: No such image: registry.redhat.io/openshift3/ose-pod:v3.11.106"
Apr 16 03:45:16 ip-172-18-13-102.ec2.internal dockerd-current[4934]: time="2019-04-16T03:45:16.156197806-04:00" level=error msg="Handler for GET /v1.26/images/registry.redhat.io/openshift3/ose-pod:v3.11.106/json returned error: No such image: registry.redhat.io/openshift3/ose-pod:v3.11.106"
Apr 16 03:45:17 ip-172-18-13-102.ec2.internal dockerd-current[4934]: time="2019-04-16T03:45:17.138926749-04:00" level=error msg="Error trying v2 registry: error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\""
Apr 16 03:45:17 ip-172-18-13-102.ec2.internal dockerd-current[4934]: time="2019-04-16T03:45:17.138998697-04:00" level=error msg="Attempting next endpoint for pull after error: error parsing HTTP 404 response body: invalid character 'F' looking for beginning of value: \"File not found.\\\"\""


[root@ip-172-18-13-102 ~]# cat /etc/origin/master/master-config.yaml |grep -A 1 imageConfig
imageConfig:
  format: registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
[root@ip-172-18-13-102 ~]# cat /etc/origin/node/node-config.yaml |grep -A 1 imageConfig
imageConfig:
  format: registry.redhat.io/openshift3/ose-${component}:${version}

# docker images|grep ose-pod
registry.reg-aws.openshift.com:443/openshift3/ose-pod             v3.11               b676db41573e        30 hours ago        238 MB
registry.redhat.io/openshift3/ose-pod                             v3.11               d5f897cfbb0d        13 days ago         238 MB
registry.redhat.io/openshift3/ose-pod                             v3.11.88            ff8efa1e789c        6 weeks ago         238 MB

Comment 10 Joseph Callen 2019-04-23 18:38:30 UTC
PR: https://github.com/openshift/openshift-ansible/pull/11541

Comment 12 Joseph Callen 2019-04-26 13:17:49 UTC
Build: openshift-ansible-3.11.110-1

Comment 14 liujia 2019-06-18 02:08:44 UTC
Per comment 11, added one step before running the upgrade playbook.

Version:
ansible-2.6.17-1.el7ae.noarch
openshift-ansible-3.11.119-1.git.0.c9a8ebf.el7.noarch

1. Install ocp v3.11.104 with default registry.
# docker images|grep ose-pod
registry.redhat.io/openshift3/ose-pod                       v3.11               6759d8752074        3 weeks ago         1.03 GB
registry.redhat.io/openshift3/ose-pod                       v3.11.88            ff8efa1e789c        3 months ago        238 MB

2. Edit inventory file to specify a private registry
oreg_url=registry.reg-aws.openshift.com:443/openshift3/ose-${component}:${version}
oreg_auth_user={{ lookup('env','REG_AUTH_USER') }}
oreg_auth_password={{ lookup('env','REG_AUTH_PASSWORD') }}

3. Run playbooks/openshift-node/imageconfig.yml

4. Run the upgrade to update the above OCP cluster to the latest v3.11 (v3.11.117)

The upgrade succeeded and images were pulled from the specified registry.
# docker images
REPOSITORY                                                                  TAG                 IMAGE ID            CREATED             SIZE
registry.reg-aws.openshift.com:443/openshift3/ose-node                      v3.11               85e87675ef7b        5 days ago          1.2 GB
registry.reg-aws.openshift.com:443/openshift3/ose-control-plane             v3.11               bb262ffdc4ff        5 days ago          820 MB
registry.reg-aws.openshift.com:443/openshift3/ose-deployer                  v3.11.117           146bca0da64b        5 days ago          373 MB
registry.reg-aws.openshift.com:443/openshift3/ose-kube-rbac-proxy           v3.11               b7bd1af18a65        5 days ago          276 MB
registry.reg-aws.openshift.com:443/openshift3/ose-console                   v3.11               0bff93a1dcef        5 days ago          266 MB
registry.reg-aws.openshift.com:443/openshift3/ose-web-console               v3.11               cbddf00dc079        5 days ago          334 MB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                       v3.11               8d0bf3c3b7f3        5 days ago          250 MB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                       v3.11.117           8d0bf3c3b7f3        5 days ago          250 MB
registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog           v3.11               0df332403c62        5 days ago          321 MB
registry.reg-aws.openshift.com:443/openshift3/ose-template-service-broker   v3.11               3b2d527c17b9        5 days ago          324 MB
registry.reg-aws.openshift.com:443/openshift3/prometheus-node-exporter      v3.11               4970129aac25        5 days ago          237 MB
registry.reg-aws.openshift.com:443/openshift3/registry-console              v3.11               33ebc54a7694        5 days ago          246 MB
registry.redhat.io/openshift3/ose-node                                      v3.11               be8a09b5514c        3 weeks ago         1.97 GB
registry.redhat.io/openshift3/ose-control-plane                             v3.11               c33fa4c530a3        3 weeks ago         1.6 GB
registry.redhat.io/openshift3/ose-deployer                                  v3.11.104           1500740029de        3 weeks ago         1.16 GB
registry.redhat.io/openshift3/ose-console                                   v3.11               6e555a73ff6e        3 weeks ago         1.05 GB
registry.redhat.io/openshift3/ose-pod                                       v3.11               6759d8752074        3 weeks ago         1.03 GB
registry.redhat.io/openshift3/ose-service-catalog                           v3.11               410f55e8c706        3 weeks ago         1.1 GB
registry.redhat.io/openshift3/ose-web-console                               v3.11               4c147a14b66f        3 weeks ago         1.12 GB
registry.redhat.io/openshift3/ose-kube-rbac-proxy                           v3.11               cdfa9d0da060        3 weeks ago         1.06 GB
registry.redhat.io/openshift3/ose-template-service-broker                   v3.11               e0f28a2f2555        3 weeks ago         1.11 GB
registry.redhat.io/openshift3/registry-console                              v3.11               38a5af0ed6c5        3 weeks ago         1.03 GB
registry.redhat.io/openshift3/prometheus-node-exporter                      v3.11               0f508556d522        3 weeks ago         1.02 GB
registry.redhat.io/rhel7/etcd                                               3.2.22              d636cc8689ea        2 months ago        259 MB
registry.redhat.io/openshift3/ose-pod                                       v3.11.88            ff8efa1e789c        3 months ago        238 MB
registry.reg-aws.openshift.com:443/openshift3/ose-pod                       v3.11.88            ff8efa1e789c        3 months ago        238 MB
registry.reg-aws.openshift.com:443/rhel7/etcd                               3.2.22              bb2f1d4dd3a7        12 months ago       256 MB
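
A long image list like the one above can be summarized per registry. The following is a sketch with a hypothetical helper name (count_by_registry); it assumes repository names arrive one per line on stdin, for example from docker images --format '{{.Repository}}'.

```shell
#!/bin/sh
# Hypothetical helper: count images per registry host.
# Reads repository names (host/namespace/name) on stdin and prints
# "<host> <count>" lines sorted by host. Names without a "/" are skipped.
count_by_registry() {
  awk -F/ 'NF > 1 { n[$1]++ } END { for (h in n) print h, n[h] }' | LC_ALL=C sort
}

# Example (requires a running docker daemon, so left commented out):
#   docker images --format '{{.Repository}}' | count_by_registry
```

After a successful upgrade, the private registry should account for the newly pulled v3.11.117 images, while the older images from registry.redhat.io remain in the local cache.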

Comment 16 errata-xmlrpc 2019-06-26 09:07:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1605