Bug 1571724 - Fail to upgrade containerized ocp due to node service can not start
Summary: Fail to upgrade containerized ocp due to node service can not start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.10.0
Assignee: Vadim Rutkovsky
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-25 10:03 UTC by liujia
Modified: 2018-07-30 19:14 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 19:13:48 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:1816 None None None 2018-07-30 19:14:10 UTC

Description liujia 2018-04-25 10:03:32 UTC
Description of problem:
Ran an upgrade against containerized OCP. The upgrade fails at task [openshift_node : Wait for node to be ready]. On checking, the node service cannot start. One likely reason is that the old node service file was not updated to drop its dependency on openvswitch, even though the OVS service file was removed before this task.

# cat /etc/systemd/system/atomic-openshift-node.service|grep openv
After=openvswitch.service
Wants=openvswitch.service
PartOf=openvswitch.service
  -v /lib/modules:/lib/modules -v /etc/origin/openvswitch:/etc/openvswitch \


# systemctl status openvswitch.service 
Unit openvswitch.service could not be found.
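
The manual check above (grep the node unit file, then ask systemd about the referenced unit) can be sketched as a small shell helper. This is only an illustrative sketch, not part of openshift-ansible: the function names unit_deps, unit_exists and missing_deps are hypothetical, and it assumes that `systemctl cat` exits non-zero when systemd cannot find a unit file for the given name.

```shell
# Hypothetical helpers, not part of openshift-ansible.

# Print the unique unit names referenced by the dependency
# directives (After=, Wants=, Requires=, PartOf=) in a unit file.
unit_deps() {
    sed -nE 's/^(After|Wants|Requires|PartOf)=//p' "$1" \
        | tr -s ' ' '\n' \
        | sed '/^$/d' \
        | sort -u
}

# Succeed if systemd can find a unit file for the given unit name.
# Assumption: `systemctl cat` fails for units that do not exist.
unit_exists() {
    systemctl cat "$1" >/dev/null 2>&1
}

# Print every referenced unit that systemd cannot find.
missing_deps() {
    for unit in $(unit_deps "$1"); do
        unit_exists "$unit" || echo "$unit"
    done
}
```

Calling `missing_deps /etc/systemd/system/atomic-openshift-node.service` on a host in the state shown above would be expected to list openvswitch.service, since the unit file still references it while the unit itself is gone.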

# systemctl status atomic-openshift-node -l
● atomic-openshift-node.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-04-25 04:17:03 EDT; 1h 8min ago
 Main PID: 22580 (code=exited, status=1/FAILURE)

Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536408   22637 factory.go:116] Factory "docker" was unable to handle container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount"
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536418   22637 factory.go:109] Factory "systemd" can handle container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount", but ignoring.
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536431   22637 manager.go:930] ignoring container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount"
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.718150   22637 docker_server.go:73] Stop docker server
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[23611]: atomic-openshift-node
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: Stopped atomic-openshift-node.service.
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: atomic-openshift-node.service failed.
Apr 25 05:00:36 qe-jliu-c39-master-etcd-1 systemd[1]: Cannot add dependency job for unit atomic-openshift-node.service, ignoring: Unit not found.


Version-Release number of the following components:
openshift-ansible-3.10.0-0.28.0.git.0.439cb5c.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run an upgrade against containerized OCP without setting openshift_use_system_containers


Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 liujia 2018-04-27 07:16:52 UTC
Containerized upgrade cannot proceed. This bug needs to be fixed ASAP.

Comment 2 Scott Dodson 2018-05-03 13:08:41 UTC
https://github.com/openshift/openshift-ansible/pull/8239 WIP

Comment 3 Vadim Rutkovsky 2018-05-07 09:22:12 UTC
Fix is available in openshift-ansible-3.10.0-0.35.0

Comment 4 liujia 2018-05-08 08:31:46 UTC
Verification is blocked by bz1575897. Removing the testblocker keyword first.

Comment 5 liujia 2018-05-14 09:24:27 UTC
Version: openshift-ansible-3.10.0-0.41.0.git.0.88119e4.el7.noarch

The original issue that prevented the node service from starting has been fixed. The upgrade against containerized OCP still fails, but that is tracked separately in bz1575507. Marking this bug as verified.

Comment 7 errata-xmlrpc 2018-07-30 19:13:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816

