Bug 1629558 - Fail to atomic pull node image due to docker service was stopped in previous task
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.11.0
Assignee: Michael Gugino
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks: 1631021 1632865
Reported: 2018-09-17 06:00 UTC by liujia
Modified: 2018-11-20 03:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1631021 1632865
Environment:
Last Closed: 2018-11-20 03:10:43 UTC
Target Upstream Version:
Embargoed:



Links
System: Red Hat Product Errata
ID: RHBA-2018:3537
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2018-11-20 03:11:19 UTC

Description liujia 2018-09-17 06:00:09 UTC
Description of problem:
Upgrade failed at task [openshift_node : Copy node container image to ostree storage].
FAILED - RETRYING: Copy node container image to ostree storage (3 retries left).
FAILED - RETRYING: Copy node container image to ostree storage (2 retries left).
FAILED - RETRYING: Copy node container image to ostree storage (1 retries left).
fatal: [x]: FAILED! => {"attempts": 3, "changed": false, "cmd": ["atomic", "pull", "--storage=ostree", "docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11"], "delta": "0:00:01.013669", "end": "2018-09-17 05:39:11.586435", "msg": "non-zero return code", "rc": 1, "start": "2018-09-17 05:39:10.572766", "stderr": "time=\"2018-09-17T05:39:11Z\" level=fatal msg=\"Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\" ", "stderr_lines": ["time=\"2018-09-17T05:39:11Z\" level=fatal msg=\"Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\" "], "stdout": "", "stdout_lines": []}
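The retries above cannot succeed on their own, because the stderr text points at a stopped daemon rather than a transient registry problem. A minimal sketch of how a wrapper script might tell the two cases apart, assuming the stderr wording matches the log above. The function name `is_docker_daemon_down` is hypothetical and is not part of openshift-ansible or the atomic CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of openshift-ansible or atomic):
# returns 0 when the given stderr text contains the "daemon unreachable"
# message seen in the failed task output, and 1 otherwise.
is_docker_daemon_down() {
    local stderr_text="$1"
    case "$stderr_text" in
        *"Cannot connect to the Docker daemon"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller could use the result to skip pointless retries and report the stopped service directly.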
======================
This appears to be caused by the preceding task [openshift_node : stop docker to kill static pods], which was introduced by the change merged in PR 10030.

[root@ip-172-18-14-104 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─custom.conf
        /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: inactive (dead) since Mon 2018-09-17 05:37:11 UTC; 4min 39s ago

[root@ip-172-18-14-104 ~]# atomic pull --storage=ostree docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11
FATA[0000] Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? 
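The manual reproduction above suggests that the pull works only while docker.service is active. A minimal sketch of that ordering constraint, assuming it is acceptable to start docker again before the pull; this is an illustration only and is not necessarily how the fix in openshift-ansible addresses the problem. The function name `ensure_docker_then_pull` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure the docker service is active before asking
# atomic to copy an image from the docker daemon into ostree storage.
ensure_docker_then_pull() {
    local image="$1"

    # "systemctl is-active --quiet" exits non-zero when the unit is not active.
    if ! systemctl is-active --quiet docker; then
        systemctl start docker || return 1
    fi

    # Same invocation as in the report, with the image passed as an argument.
    atomic pull --storage=ostree "docker:${image}"
}
```

For example, `ensure_docker_then_pull registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11` would repeat the failing command from the report, but only once the daemon is reachable.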

Version-Release number of the following components:
ansible-2.6.4-1.el7ae.noarch
openshift-ansible-3.11.7-1.git.0.911481d.el7_5.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.10 on Atomic with a system container node and without the service catalog deployed.
2. Upgrade the OCP cluster installed above.

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 liujia 2018-09-18 03:48:43 UTC
This blocks upgrade testing against system container nodes.

Comment 4 Michael Gugino 2018-09-18 14:57:20 UTC
PR Created in master: https://github.com/openshift/openshift-ansible/pull/10125

Comment 5 Scott Dodson 2018-09-18 20:46:58 UTC
https://github.com/openshift/openshift-ansible/pull/10135 release-3.11

Comment 6 Wei Sun 2018-09-19 06:42:08 UTC
PR 10135 has been merged into openshift-ansible-3.11.9-1; please check the bug.

Comment 7 liujia 2018-09-19 06:44:25 UTC
Verified on openshift-ansible-3.11.9-1.git.0.63f7970.el7_5.noarch

Comment 9 errata-xmlrpc 2018-11-20 03:10:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537

