Bug 1548677 - Upgrade failed because ovs 2.9 cannot start when selinux-policy is not updated
Summary: Upgrade failed because ovs 2.9 cannot start when selinux-policy is not updated
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.9.z
Assignee: Aaron Conole
QA Contact: liujia
URL:
Whiteboard:
Depends On: 1549673
Blocks: 1542824
 
Reported: 2018-02-24 11:02 UTC by liujia
Modified: 2018-06-18 18:20 UTC (History)
12 users

Fixed In Version: openshift-ansible-3.9.14-1.git.3.c62bc34.el7.noarch
Doc Type: Bug Fix
Doc Text:
Cause: We were using a version of the OVS rpm that didn't have the right SELinux policy. Consequence: OVS failed to start due to SELinux denials. Fix: Pull in the version of the OVS rpm that has the right rules. Result: OVS works.
Clone Of:
Environment:
Last Closed: 2018-06-18 18:20:32 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2013 normal SHIPPED_LIVE Important: OpenShift Container Platform 3.9 security, bug fix, and enhancement update 2018-06-27 22:01:43 UTC

Description liujia 2018-02-24 11:02:08 UTC
Description of problem:
Upgrade OCP v3.7 to v3.9 with the latest ovs 2.9 repo enabled. The upgrade failed at task [openshift_node : Wait for node to be ready] because the latest ovs 2.9 could not start: selinux-policy was not updated during the upgrade.

TASK [openshift_node : Wait for node to be ready] ******************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade.yml:39
FAILED - RETRYING: Wait for node to be ready (24 retries left).
...
FAILED - RETRYING: Wait for node to be ready (1 retries left).
fatal: [x.x.x.x -> x.x.x.x]: FAILED! => {"attempts": 24, "changed": false, "results": {"cmd": "/usr/bin/oc get node 172.16.120.107 -o json -n default", "results": [{"apiVersion": "v1", "kind": "Node", "metadata": {"annotations": {"volumes.kubernetes.io/controller-managed-attach-detach": "true"}, "creationTimestamp": "2018-02-24T09:20:24Z", "labels": {"beta.kubernetes.io/arch": "amd64", "beta.kubernetes.io/os": "linux", "failure-domain.beta.kubernetes.io/region": "regionOne", "failure-domain.beta.kubernetes.io/zone": "nova", "kubernetes.io/hostname": "172.16.120.107", "node-role.kubernetes.io/master": "true", "role": "node"}, "name": "172.16.120.107", "resourceVersion": "11129", "selfLink": "/api/v1/nodes/172.16.120.107", "uid": "f1fbb47a-1943-11e8-905f-fa163efd75ed"}, "spec": {"externalID": "518a4cd7-bf05-424e-a3a2-4153966e1741", "providerID": "openstack:///518a4cd7-bf05-424e-a3a2-4153966e1741", "unschedulable": true}, "status": {"addresses": [{"address": "172.16.120.107", "type": "InternalIP"}, {"address": "10.8.244.74", "type": "ExternalIP"}, {"address": "172.16.120.107", "type": "Hostname"}], "allocatable": {"cpu": "2", "memory": "3779276Ki", "pods": "250"}, "capacity": {"cpu": "2", "memory": "3881676Ki", "pods": "250"}, "conditions": [{"lastHeartbeatTime": "2018-02-24T10:36:43Z", "lastTransitionTime": "2018-02-24T09:20:24Z", "message": "kubelet has sufficient disk space available", "reason": "KubeletHasSufficientDisk", "status": "False", "type": "OutOfDisk"}, {"lastHeartbeatTime": "2018-02-24T10:36:43Z", "lastTransitionTime": "2018-02-24T10:37:25Z", "message": "Kubelet stopped posting node status.", "reason": "NodeStatusUnknown", "status": "Unknown", "type": "MemoryPressure"}, {"lastHeartbeatTime": "2018-02-24T10:36:43Z", "lastTransitionTime": "2018-02-24T10:37:25Z", "message": "Kubelet stopped posting node status.", "reason": "NodeStatusUnknown", "status": "Unknown", "type": "DiskPressure"}, {"lastHeartbeatTime": "2018-02-24T10:36:43Z", "lastTransitionTime": 
"2018-02-24T10:37:25Z", "message": "Kubelet stopped posting node status.", "reason": "NodeStatusUnknown", "status": "Unknown", "type": "Ready"}], "daemonEndpoints": {"kubeletEndpoint": {"Port": 10250}}, "nodeInfo": {"architecture": "amd64", "bootID": "664978a4-fe4b-4fed-8a3d-a9802377ad76", "containerRuntimeVersion": "docker://1.12.6", "kernelVersion": "3.10.0-693.11.1.el7.x86_64", "kubeProxyVersion": "v1.7.6+a08f5eeb62", "kubeletVersion": "v1.7.6+a08f5eeb62", "machineID": "6578704f71144944bcf05068370a5315", "operatingSystem": "linux", "osImage": "Red Hat Enterprise Linux Server 7.4 (Maipo)", "systemUUID": "518A4CD7-BF05-424E-A3A2-4153966E1741"}}}], "returncode": 0}, "state": "list"}

before upgrade:
# rpm -qa|grep selinux-policy
selinux-policy-targeted-3.13.1-166.el7_4.7.noarch
selinux-policy-3.13.1-166.el7_4.7.noarch

# rpm -qa|grep openv
openvswitch-2.7.3-3.git20180112.el7fdp.x86_64

after upgrade:
# rpm -qa|grep selinux-policy
selinux-policy-targeted-3.13.1-166.el7_4.7.noarch
selinux-policy-3.13.1-166.el7_4.7.noarch

After running "yum update selinux-policy" manually, restarting the ovs and node services succeeds.
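The manual workaround can be sketched as a small script. This is a hypothetical sketch, not part of the fix: the package and service names are taken from this report, and the commands must be run as root on the affected node.

```shell
# Manual workaround sketch for the selinux-policy mismatch (run as root).
# Collected in a function so the steps can be reviewed before applying.
apply_workaround() {
    # Pull in the updated SELinux policy that carries the OVS 2.9 rules
    yum -y update selinux-policy selinux-policy-targeted

    # Restart the services that failed once the policy is in place
    systemctl restart openvswitch.service
    systemctl restart atomic-openshift-node.service
}

# Review the steps above, then apply with:
# apply_workaround
```

The proper fix (per the errata) is to ship an OVS rpm whose dependencies pull in the correct selinux-policy during the upgrade, so this manual step is only a stopgap.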

Version-Release number of the following components:
openshift-ansible-3.9.0-0.51.0.git.0.e26400f.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.7 with external etcd
2. Enable the ose and ovs2.9 repos
3. Upgrade the above OCP

Actual results:
Upgrade failed.

Expected results:
Upgrade succeeds.

Additional info:
# systemctl status atomic-openshift-node.service 
● atomic-openshift-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/atomic-openshift-node.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/atomic-openshift-node.service.d
           └─openshift-sdn-ovs.conf
   Active: failed (Result: exit-code) since Sat 2018-02-24 05:36:49 EST; 11min ago
     Docs: https://github.com/openshift/origin
 Main PID: 25893 (code=exited, status=1/FAILURE)

Feb 24 05:36:49 host-172-16-120-107 systemd[1]: Stopping OpenShift Node...
Feb 24 05:36:49 host-172-16-120-107 atomic-openshift-node[25893]: I0224 05:36:49.576736   25893 docker_server.go:73] Stop docker server
Feb 24 05:36:49 host-172-16-120-107 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=1/FAILURE
Feb 24 05:36:49 host-172-16-120-107 systemd[1]: Stopped OpenShift Node.
Feb 24 05:36:49 host-172-16-120-107 systemd[1]: Unit atomic-openshift-node.service entered failed state.
Feb 24 05:36:49 host-172-16-120-107 systemd[1]: atomic-openshift-node.service failed.
Feb 24 05:37:06 host-172-16-120-107 systemd[1]: Dependency failed for OpenShift Node.
Feb 24 05:37:06 host-172-16-120-107 systemd[1]: Job atomic-openshift-node.service/start failed with result 'dependency'.
Feb 24 05:40:08 host-172-16-120-107 systemd[1]: Dependency failed for OpenShift Node.
Feb 24 05:40:08 host-172-16-120-107 systemd[1]: Job atomic-openshift-node.service/start failed with result 'dependency'.
Hint: Some lines were ellipsized, use -l to show in full.

# systemctl status openvswitch.service 
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openvswitch.service.d
           └─01-avoid-oom.conf
   Active: inactive (dead) since Sat 2018-02-24 05:36:49 EST; 11min ago
 Main PID: 25888 (code=exited, status=0/SUCCESS)

Feb 24 05:36:49 host-172-16-120-107 systemd[1]: Stopping Open vSwitch...
Feb 24 05:36:49 host-172-16-120-107 systemd[1]: Stopped Open vSwitch.
Feb 24 05:37:05 host-172-16-120-107 systemd[1]: Dependency failed for Open vSwitch.
Feb 24 05:37:05 host-172-16-120-107 systemd[1]: Job openvswitch.service/start failed with result 'dependency'.
Feb 24 05:37:06 host-172-16-120-107 systemd[1]: Dependency failed for Open vSwitch.
Feb 24 05:37:06 host-172-16-120-107 systemd[1]: Job openvswitch.service/start failed with result 'dependency'.
Feb 24 05:40:07 host-172-16-120-107 systemd[1]: Dependency failed for Open vSwitch.
Feb 24 05:40:07 host-172-16-120-107 systemd[1]: Job openvswitch.service/start failed with result 'dependency'.
Feb 24 05:40:08 host-172-16-120-107 systemd[1]: Dependency failed for Open vSwitch.
Feb 24 05:40:08 host-172-16-120-107 systemd[1]: Job openvswitch.service/start failed with result 'dependency'.
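The systemd logs above only show a dependency failure; one way to confirm that the root cause is SELinux (rather than packaging) is to look for AVC denials against the OVS daemons. A diagnostic sketch, assuming auditd is running and the audit package (which provides `ausearch`) is installed:

```shell
# Diagnostic sketch: check whether SELinux is blocking Open vSwitch (run as root).
check_ovs_selinux_denials() {
    # Search recent audit records for AVC denials involving the OVS daemons
    ausearch -m avc -ts recent 2>/dev/null \
        | grep -E 'ovs-vswitchd|ovsdb-server' \
        || echo "no recent OVS AVC denials found"
}

# For context, also compare the enforcement mode and installed versions:
# getenforce
# rpm -q selinux-policy openvswitch
```

If denials appear and disappear after `yum update selinux-policy`, that matches the behavior described in this report.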

Comment 2 liujia 2018-02-27 02:20:14 UTC
This blocks the ovs upgrade test.

Comment 6 Aaron Conole 2018-02-28 20:22:01 UTC
This is a duplicate of an issue we already have logged.  Thanks for also identifying it.

*** This bug has been marked as a duplicate of bug 1549673 ***

Comment 7 Xiaoli Tian 2018-03-05 09:02:12 UTC
Keep this bug open for better tracking from OpenShift side.

Once OVS bug 1549673 is fixed, we'll verify this bug from OpenShift side.

Comment 13 liujia 2018-04-13 01:27:43 UTC
Version:
openvswitch-2.9.0-15
openshift-ansible-3.9.14-1.git.3.c62bc34.el7.noarch

Steps:
1. Install OCP v3.7
check ovs and selinux policy version:
# rpm -qa|grep selinux-policy
selinux-policy-targeted-3.13.1-166.el7_4.7.noarch
selinux-policy-3.13.1-166.el7_4.7.noarch

# rpm -qa|grep openv
openvswitch-2.7.3-3.git20180112.el7fdp.x86_64

2. Enable the ose and ovs2.9 repos
3. Upgrade the above OCP

Upgrade succeeded; selinux-policy was updated along with openvswitch.

# rpm -qa|grep openv
openvswitch-2.9.0-15.el7fdp.x86_64
# rpm -qa|grep selinux-policy
selinux-policy-3.13.1-192.el7_5.3.noarch
selinux-policy-targeted-3.13.1-192.el7_5.3.noarch

