1571724 – Fail to upgrade containerized ocp due to node service can not start

Summary: Fail to upgrade containerized ocp due to node service can not start

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Vadim Rutkovsky
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-25 10:03 UTC by liujia
Modified:	2018-07-30 19:14 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-07-30 19:13:48 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:1816	0	None	None	None	2018-07-30 19:14:10 UTC

Description liujia 2018-04-25 10:03:32 UTC

Description of problem:
Ran an upgrade against containerized OCP. The upgrade fails at task [openshift_node : Wait for node to be ready] because the node service cannot start. One likely cause is that the old node service file was not updated to drop its dependency on openvswitch, even though the OVS service file had already been removed before this task ran.

# cat /etc/systemd/system/atomic-openshift-node.service|grep openv
After=openvswitch.service
Wants=openvswitch.service
PartOf=openvswitch.service
  -v /lib/modules:/lib/modules -v /etc/origin/openvswitch:/etc/openvswitch \


# systemctl status openvswitch.service 
Unit openvswitch.service could not be found.
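
The mismatch shown above (a unit file that still references openvswitch.service after that unit has been removed) can be checked with a small script. The following is only a minimal sketch: the helper names `unit_dependencies` and `missing_dependencies`, the choice of directives, and the `SAMPLE_UNIT` text (built from the grep output above, not the full unit file) are assumptions for illustration. It lists references to units that are not in a given set of known units; it does not determine which reference actually prevents the node service from starting.

```python
import re

# Directives treated as references to other units (a subset chosen for this sketch).
DEPENDENCY_KEYS = ("After", "Wants", "Requires", "PartOf")


def unit_dependencies(unit_text):
    """Return {directive: [unit names]} for the dependency lines in a unit file."""
    deps = {}
    for line in unit_text.splitlines():
        match = re.match(r"^\s*(\w+)\s*=\s*(.*)$", line)
        if match and match.group(1) in DEPENDENCY_KEYS:
            # A directive may list several space-separated units.
            deps.setdefault(match.group(1), []).extend(match.group(2).split())
    return deps


def missing_dependencies(unit_text, known_units):
    """Return only the referenced units that are absent from known_units."""
    missing = {}
    for key, units in unit_dependencies(unit_text).items():
        absent = [unit for unit in units if unit not in known_units]
        if absent:
            missing[key] = absent
    return missing


# Hypothetical fragment reconstructed from the grep output in this report.
SAMPLE_UNIT = """\
[Unit]
After=openvswitch.service
Wants=openvswitch.service
PartOf=openvswitch.service
"""

if __name__ == "__main__":
    # An empty set stands in for a host where openvswitch.service no longer exists.
    print(missing_dependencies(SAMPLE_UNIT, known_units=set()))
```

On a real host the set of known units would have to come from systemd itself rather than a hard-coded set; that lookup is left out here to keep the sketch self-contained.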

# systemctl status atomic-openshift-node -l
● atomic-openshift-node.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-04-25 04:17:03 EDT; 1h 8min ago
 Main PID: 22580 (code=exited, status=1/FAILURE)

Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536408   22637 factory.go:116] Factory "docker" was unable to handle container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount"
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536418   22637 factory.go:109] Factory "systemd" can handle container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount", but ignoring.
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536431   22637 manager.go:930] ignoring container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount"
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.718150   22637 docker_server.go:73] Stop docker server
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[23611]: atomic-openshift-node
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: Stopped atomic-openshift-node.service.
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: atomic-openshift-node.service failed.
Apr 25 05:00:36 qe-jliu-c39-master-etcd-1 systemd[1]: Cannot add dependency job for unit atomic-openshift-node.service, ignoring: Unit not found.


Version-Release number of the following components:
openshift-ansible-3.10.0-0.28.0.git.0.439cb5c.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run an upgrade against containerized OCP without setting openshift_use_system_containers.

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 liujia 2018-04-27 07:16:52 UTC

The containerized upgrade cannot proceed. This bug needs to be fixed ASAP.

Comment 2 Scott Dodson 2018-05-03 13:08:41 UTC

https://github.com/openshift/openshift-ansible/pull/8239 WIP

Comment 3 Vadim Rutkovsky 2018-05-07 09:22:12 UTC

Fix is available in openshift-ansible-3.10.0-0.35.0

Comment 4 liujia 2018-05-08 08:31:46 UTC

Verification is blocked by bz1575897. Removing testblocker first.

Comment 5 liujia 2018-05-14 09:24:27 UTC

Version:openshift-ansible-3.10.0-0.41.0.git.0.88119e4.el7.noarch

The original issue that prevented the node service from starting has been fixed. The upgrade against containerized OCP still fails, but that failure is tracked in a separate bug (bz1575507). Marking this bug as verified.

Comment 7 errata-xmlrpc 2018-07-30 19:13:48 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816
