Bug 1571724

Summary: Fail to upgrade containerized ocp due to node service can not start
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: Cluster Version Operator
Assignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA
QA Contact: liujia <jiajliu>
Severity: high
Priority: high
Version: 3.10.0
CC: aos-bugs, jokerman, mmccomas, vrutkovs, wmeng
Target Milestone: ---
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2018-07-30 19:13:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---
Cloudforms Team: ---

Description liujia 2018-04-25 10:03:32 UTC
Description of problem:
Ran an upgrade against containerized OCP. The upgrade failed at task [openshift_node : Wait for node to be ready]. On inspection, the node service cannot start. One likely cause is that the old node service file was not updated to stop wanting openvswitch.service, even though the OVS service file had already been removed before this task ran.

# cat /etc/systemd/system/atomic-openshift-node.service|grep openv
After=openvswitch.service
Wants=openvswitch.service
PartOf=openvswitch.service
  -v /lib/modules:/lib/modules -v /etc/origin/openvswitch:/etc/openvswitch \


# systemctl status openvswitch.service 
Unit openvswitch.service could not be found.

# systemctl status atomic-openshift-node -l
● atomic-openshift-node.service
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-04-25 04:17:03 EDT; 1h 8min ago
 Main PID: 22580 (code=exited, status=1/FAILURE)

Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536408   22637 factory.go:116] Factory "docker" was unable to handle container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount"
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536418   22637 factory.go:109] Factory "systemd" can handle container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount", but ignoring.
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.536431   22637 manager.go:930] ignoring container "/system.slice/var-lib-origin-openshift.local.volumes-pods-047fedd5\\x2d4861\\x2d11e8\\x2da006\\x2d42010af00014-volumes-kubernetes.io\\x7esecret-sdn\\x2dtoken\\x2d82s7t.mount"
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[22580]: I0425 04:17:02.718150   22637 docker_server.go:73] Stop docker server
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 atomic-openshift-node[23611]: atomic-openshift-node
Apr 25 04:17:02 qe-jliu-c39-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: Stopped atomic-openshift-node.service.
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Apr 25 04:17:03 qe-jliu-c39-master-etcd-1 systemd[1]: atomic-openshift-node.service failed.
Apr 25 05:00:36 qe-jliu-c39-master-etcd-1 systemd[1]: Cannot add dependency job for unit atomic-openshift-node.service, ignoring: Unit not found.
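
The mismatch shown above (a unit file that still names openvswitch.service in After=/Wants=/PartOf= although that unit no longer exists) can be checked for with a small script. This is only a sketch for diagnosis, not the fix from openshift-ansible: the function names unit_deps and missing_unit_deps are hypothetical, and it assumes that `systemctl cat` exits non-zero for a unit that has no unit file.

```shell
#!/usr/bin/env bash
# Sketch only: unit_deps and missing_unit_deps are hypothetical helper names,
# not part of systemd or openshift-ansible.

# Print the unique unit names referenced by dependency directives in a unit file.
unit_deps() {
    local unit_file="$1"
    # Keep only dependency directives, drop the "Key=" prefix,
    # split space-separated values onto separate lines, and de-duplicate.
    grep -E '^(After|Wants|Requires|PartOf)=' "$unit_file" \
        | cut -d= -f2- \
        | tr ' ' '\n' \
        | grep -v '^$' \
        | sort -u
}

# Report every referenced unit that systemd has no unit file for.
# Assumption: "systemctl cat UNIT" fails when UNIT cannot be found.
missing_unit_deps() {
    local unit_file="$1" dep
    for dep in $(unit_deps "$unit_file"); do
        systemctl cat "$dep" >/dev/null 2>&1 || echo "missing: $dep"
    done
}
```

On the host from this report, running `missing_unit_deps /etc/systemd/system/atomic-openshift-node.service` would be expected to flag openvswitch.service, given the "could not be found" output above.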


Version-Release number of the following components:
openshift-ansible-3.10.0-0.28.0.git.0.439cb5c.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Run the upgrade against containerized OCP without setting openshift_use_system_containers.

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 liujia 2018-04-27 07:16:52 UTC
The containerized upgrade cannot proceed. This bug needs to be fixed ASAP.

Comment 2 Scott Dodson 2018-05-03 13:08:41 UTC
https://github.com/openshift/openshift-ansible/pull/8239 WIP

Comment 3 Vadim Rutkovsky 2018-05-07 09:22:12 UTC
Fix is available in openshift-ansible-3.10.0-0.35.0

Comment 4 liujia 2018-05-08 08:31:46 UTC
Verification is blocked by bz1575897. Removing the testblocker keyword first.

Comment 5 liujia 2018-05-14 09:24:27 UTC
Version: openshift-ansible-3.10.0-0.41.0.git.0.88119e4.el7.noarch

The original issue that prevented the node service from starting has been fixed. The upgrade against containerized OCP still fails, but that failure is tracked separately in bz1575507. Marking this bug as verified.

Comment 7 errata-xmlrpc 2018-07-30 19:13:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816