Bug 1629558
| Summary: | Failed to atomic pull node image because docker service was stopped in previous task | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | liujia <jiajliu> | |
| Component: | Cluster Version Operator | Assignee: | Michael Gugino <mgugino> | |
| Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.11.0 | CC: | aos-bugs, jokerman, mmccomas, wmeng, wsun | |
| Target Milestone: | --- | |||
| Target Release: | 3.11.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1631021 1632865 | Environment: | ||
| Last Closed: | 2018-11-20 03:10:43 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1631021, 1632865 | |||
Blocks upgrade testing against system container nodes.

PR created in master: https://github.com/openshift/openshift-ansible/pull/10125

PR 10135 has been merged into openshift-ansible-3.11.9-1; please check the bug.

Verified on openshift-ansible-3.11.9-1.git.0.63f7970.el7_5.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2018:3537
Description of problem:
Upgrade failed at task [openshift_node : Copy node container image to ostree storage].

FAILED - RETRYING: Copy node container image to ostree storage (3 retries left).
FAILED - RETRYING: Copy node container image to ostree storage (2 retries left).
FAILED - RETRYING: Copy node container image to ostree storage (1 retries left).
fatal: [x]: FAILED! => {"attempts": 3, "changed": false, "cmd": ["atomic", "pull", "--storage=ostree", "docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11"], "delta": "0:00:01.013669", "end": "2018-09-17 05:39:11.586435", "msg": "non-zero return code", "rc": 1, "start": "2018-09-17 05:39:10.572766", "stderr": "time=\"2018-09-17T05:39:11Z\" level=fatal msg=\"Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\" ", "stderr_lines": ["time=\"2018-09-17T05:39:11Z\" level=fatal msg=\"Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\" "], "stdout": "", "stdout_lines": []}

======================

This is most likely caused by the previous task, [openshift_node : stop docker to kill static pods], which stops the docker service before the image is pulled. That task was introduced by PR 10030.
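For triage, the failure above can be told apart from other pull errors (bad tag, registry auth) by looking at the task result itself. The following is a minimal Python sketch, not part of openshift-ansible: the function name `is_docker_daemon_down` and the marker constant are hypothetical, and the sample dictionary is an abbreviated copy of the result reported in this bug.

```python
# Marker text taken from the stderr reported in this bug.
DOCKER_DOWN_MARKER = "Cannot connect to the Docker daemon"


def is_docker_daemon_down(result: dict) -> bool:
    """Return True if a failed task result points at an unreachable Docker daemon.

    `result` is assumed to have the shape of an Ansible command-task result,
    with at least the "rc" and "stderr" keys.
    """
    if result.get("rc", 0) == 0:
        # The command succeeded, so there is nothing to classify.
        return False
    return DOCKER_DOWN_MARKER in result.get("stderr", "")


# Abbreviated copy of the failed result from this report.
sample = {
    "attempts": 3,
    "changed": False,
    "cmd": [
        "atomic", "pull", "--storage=ostree",
        "docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11",
    ],
    "rc": 1,
    "stderr": (
        "Error loading image from docker engine: Cannot connect to the "
        "Docker daemon at unix:///var/run/docker.sock. "
        "Is the docker daemon running?"
    ),
}

print(is_docker_daemon_down(sample))
```

A result whose `stderr` lacks the marker, or whose `rc` is 0, is classified as a different kind of failure or as no failure.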
[root@ip-172-18-14-104 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─custom.conf
           /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: inactive (dead) since Mon 2018-09-17 05:37:11 UTC; 4min 39s ago

[root@ip-172-18-14-104 ~]# atomic pull --storage=ostree docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11
FATA[0000] Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Version-Release number of the following components:
ansible-2.6.4-1.el7ae.noarch
openshift-ansible-3.11.7-1.git.0.911481d.el7_5.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.10 on Atomic Host with a system container node and without the service catalog deployed.
2. Upgrade the above OCP cluster.
3.

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag
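As a workaround sketch only, and not the fix that was merged for this bug, the manual reproduction above suggests checking that docker is active before running `atomic pull`. The Python below assumes it is acceptable to start docker again at this point in the upgrade, which may not hold since the previous task stopped it on purpose. The names `run`, `Runner`, and `pull_node_image` are hypothetical. The commands used are `systemctl is-active --quiet`, `systemctl start`, and the `atomic pull --storage=ostree` invocation from this report.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command as a list of strings and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command on the host and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def pull_node_image(image: str, runner: Runner = run) -> bool:
    """Ensure docker is active, then pull `image` into ostree storage.

    Returns True only if the pull command exits with status 0.
    """
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    if runner(["systemctl", "is-active", "--quiet", "docker"]) != 0:
        # Assumption: restarting docker here is safe for the upgrade flow.
        if runner(["systemctl", "start", "docker"]) != 0:
            return False
    return runner(["atomic", "pull", "--storage=ostree", image]) == 0


# Example call (requires root on an Atomic Host, so it is left commented out):
# pull_node_image("docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11")
```

The runner is passed in as a parameter so the logic can be exercised with a fake command runner on a machine that has neither systemd nor the `atomic` CLI.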