Bug 1629558

Summary: atomic pull of node image fails because the docker service was stopped by the previous task
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: Cluster Version Operator
Assignee: Michael Gugino <mgugino>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: high
Docs Contact:
Priority: high
Version: 3.11.0
CC: aos-bugs, jokerman, mmccomas, wmeng, wsun
Target Milestone: ---
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1631021 1632865
Environment:
Last Closed: 2018-11-20 03:10:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1631021, 1632865    

Description liujia 2018-09-17 06:00:09 UTC
Description of problem:
Upgrade failed at task [openshift_node : Copy node container image to ostree storage].
FAILED - RETRYING: Copy node container image to ostree storage (3 retries left).
FAILED - RETRYING: Copy node container image to ostree storage (2 retries left).
FAILED - RETRYING: Copy node container image to ostree storage (1 retries left).
fatal: [x]: FAILED! => {"attempts": 3, "changed": false, "cmd": ["atomic", "pull", "--storage=ostree", "docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11"], "delta": "0:00:01.013669", "end": "2018-09-17 05:39:11.586435", "msg": "non-zero return code", "rc": 1, "start": "2018-09-17 05:39:10.572766", "stderr": "time=\"2018-09-17T05:39:11Z\" level=fatal msg=\"Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\" ", "stderr_lines": ["time=\"2018-09-17T05:39:11Z\" level=fatal msg=\"Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?\" "], "stdout": "", "stdout_lines": []}
======================
This appears to be caused by the previous task, [openshift_node : stop docker to kill static pods], which was introduced by PR 10030. That task stops the docker service, so the subsequent atomic pull cannot reach the Docker daemon.
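To tell this failure apart from an ordinary registry or network error, the task result can be checked for the daemon-unreachable message. The following is only a minimal sketch: the function name `is_docker_daemon_down` and the trimmed `sample` dictionary are hypothetical and are not part of openshift-ansible; the field names (`rc`, `stderr`) are taken from the failure output above.

```python
# Hypothetical helper, not part of openshift-ansible.
# The marker string is copied from the stderr in the failure above.
DAEMON_DOWN_MARKER = "Cannot connect to the Docker daemon"


def is_docker_daemon_down(task_result: dict) -> bool:
    """Return True if a failed task result shows the Docker daemon was unreachable."""
    if task_result.get("rc", 0) == 0:
        # The command succeeded, so there is nothing to classify.
        return False
    return DAEMON_DOWN_MARKER in task_result.get("stderr", "")


# Trimmed copy of the failed task result from this report.
sample = {
    "cmd": [
        "atomic", "pull", "--storage=ostree",
        "docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11",
    ],
    "rc": 1,
    "stderr": (
        "Error loading image from docker engine: Cannot connect to the "
        "Docker daemon at unix:///var/run/docker.sock. "
        "Is the docker daemon running?"
    ),
}

print(is_docker_daemon_down(sample))  # True
```

A check like this only classifies the symptom; retrying the pull cannot help while the docker service remains stopped.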

[root@ip-172-18-14-104 ~]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─custom.conf
        /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: inactive (dead) since Mon 2018-09-17 05:37:11 UTC; 4min 39s ago

[root@ip-172-18-14-104 ~]# atomic pull --storage=ostree docker:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11
FATA[0000] Error initializing source docker-daemon:registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.11: Error loading image from docker engine: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? 
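
The two commands above show the dependency: atomic pull only works while docker.service is active. As a rough illustration, the state can be read from the "Active:" line of the systemctl status output before attempting the pull. This is a sketch under assumptions: the function names `unit_active_state` and `docker_ready_for_pull` are hypothetical, and the parsing relies on the "Active: <state> (<sub-state>)" layout shown in this report.

```python
import re


def unit_active_state(status_output: str):
    """Extract (state, sub_state) from the 'Active:' line of `systemctl status` output.

    Returns None if no 'Active:' line is found.
    """
    match = re.search(
        r"^\s*Active:\s+(\S+)\s+\(([^)]+)\)", status_output, re.MULTILINE
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


def docker_ready_for_pull(status_output: str) -> bool:
    """Return True only when the unit is reported as active."""
    state = unit_active_state(status_output)
    return state is not None and state[0] == "active"


# Lines copied from the systemctl output in this report.
sample = """\
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Mon 2018-09-17 05:37:11 UTC; 4min 39s ago
"""

print(unit_active_state(sample))      # ('inactive', 'dead')
print(docker_ready_for_pull(sample))  # False
```

On the affected host this check would report that docker is not ready, which matches the observed atomic pull failure.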

Version-Release number of the following components:
ansible-2.6.4-1.el7ae.noarch
openshift-ansible-3.11.7-1.git.0.911481d.el7_5.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.10 on Atomic Host with a system container node and without the service catalog deployed.
2. Upgrade the above OCP cluster to v3.11.

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 liujia 2018-09-18 03:48:43 UTC
This blocks upgrade testing against system container nodes.

Comment 4 Michael Gugino 2018-09-18 14:57:20 UTC
PR Created in master: https://github.com/openshift/openshift-ansible/pull/10125

Comment 5 Scott Dodson 2018-09-18 20:46:58 UTC
https://github.com/openshift/openshift-ansible/pull/10135 release-3.11

Comment 6 Wei Sun 2018-09-19 06:42:08 UTC
PR 10135 has been merged into openshift-ansible-3.11.9-1; please check the bug.

Comment 7 liujia 2018-09-19 06:44:25 UTC
Verified on openshift-ansible-3.11.9-1.git.0.63f7970.el7_5.noarch

Comment 9 errata-xmlrpc 2018-11-20 03:10:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537