Bug 1635257 - OVS Pod is failing
Summary: OVS Pod is failing
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.10.z
Assignee: Casey Callendrello
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-10-02 13:17 UTC by jolee
Modified: 2019-03-29 06:37 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-19 13:00:21 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
Red Hat Knowledge Base (Solution) 3630581 (last updated 2018-10-02 13:22:34 UTC)

Description jolee 2018-10-02 13:17:53 UTC
Description of problem:

atomic-openshift-node: I0926 18:36:18.268991    6814 kuberuntime_manager.go:513] Container {Name:openvswitch Image:registry.access.redhat.com/opens
hift3/ose-node:v3.10 Command:[/bin/bash -c #!/bin/bash
  -node-1 atomic-openshift-node: set -euo pipefail
  -node-1 atomic-openshift-node: # if another process is listening on the cni-server socket, wait until it exits
  -node-1 atomic-openshift-node: trap 'kill $(jobs -p); exit 0' TERM
  -node-1 atomic-openshift-node: retries=0
  -node-1 atomic-openshift-node: while true; do
  -node-1 atomic-openshift-node: if /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then
  -node-1 atomic-openshift-node: echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1
  -node-1 atomic-openshift-node: sleep 15 & wait
  -node-1 atomic-openshift-node: (( retries += 1 ))
  -node-1 atomic-openshift-node: else
  -node-1 atomic-openshift-node: break
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: if [[ "${retries}" -gt 40 ]]; then
  -node-1 atomic-openshift-node: echo "error: Another process is currently managing OVS, exiting" 2>&1
  -node-1 atomic-openshift-node: exit 1
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: done
  -node-1 atomic-openshift-node: # launch OVS
  -node-1 atomic-openshift-node: function quit {
  -node-1 atomic-openshift-node: /usr/share/openvswitch/scripts/ovs-ctl stop
  -node-1 atomic-openshift-node: exit 0
  -node-1 atomic-openshift-node: }
  -node-1 atomic-openshift-node: trap quit SIGTERM
  -node-1 atomic-openshift-node: /usr/share/openvswitch/scripts/ovs-ctl start --system-id=random
  -node-1 atomic-openshift-node: # Restrict the number of pthreads ovs-vswitchd creates to reduce the
  -node-1 atomic-openshift-node: # amount of RSS it uses on hosts with many cores
  -node-1 atomic-openshift-node: # https://bugzilla.redhat.com/show_bug.cgi?id=1571379
  -node-1 atomic-openshift-node: # https://bugzilla.redhat.com/show_bug.cgi?id=1572797
  -node-1 atomic-openshift-node: if [[ `nproc` -gt 12 ]]; then
  -node-1 atomic-openshift-node: ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=4
  -node-1 atomic-openshift-node: ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=10
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: while true; do sleep 5; done
  -node-1 atomic-openshift-node: ] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Form
at:DecimalSI} memory:{i:{value:419430400 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:314572800
 scale:0} d:{Dec:<nil>} s:300Mi Format:BinarySI}]} VolumeMounts:[{Name:host-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:host-run-ovs ReadOnly:false Mou
ntPath:/run/openvswitch SubPath: MountPropagation:<nil>} {Name:host-run-ovs ReadOnly:false MountPath:/var/run/openvswitch SubPath: MountPropagation:<nil>} {Name:host-sys ReadOnly:true MountPa
th:/sys SubPath: MountPropagation:<nil>} {Name:host-config-openvswitch ReadOnly:false MountPath:/etc/openvswitch SubPath: MountPropagation:<nil>} {Name:sdn-token-q47s4 ReadOnly:true MountPath
:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-lo
g TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:*0,RunAsNonRoot:nil,ReadOnlyRootFi
lesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
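
For context, the command echoed above is the openvswitch container's entrypoint: it waits (retrying roughly 40 times at 15-second intervals) while any other process is already managing OVS, then starts OVS itself and caps its thread counts on hosts with more than 12 cores. As a rough sketch, the same "is something else managing OVS?" condition can be checked directly on the affected node (assuming OVS is meant to run only inside the daemonset pod, not as a host service):

# Same check the container entrypoint performs
/usr/share/openvswitch/scripts/ovs-ctl status

# Host-level openvswitch service; normally stopped/disabled when OVS runs in a pod
systemctl status openvswitch

# Any stray OVS daemons running outside the pod?
ps -ef | grep -E 'ovs-vswitchd|ovsdb-server' | grep -v grep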



Version-Release number of selected component (if applicable):

OCP 3.9 is being used (v3.9.30).


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

The message "...OpenShift SDN network process is not (yet?) available..." continually recurs.


Expected results:

SDN network to start


Additional info:

Comment 1 Casey Callendrello 2018-10-02 13:32:17 UTC
That is just the kubelet echoing out the command it's about to run; it's not an error.

You'll need to debug the SDN the same way you'd debug any other process. Look it up with "oc get pods", check its logs with "oc logs", and watch for events with "oc get events".
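
For reference, a minimal sequence along those lines (the openshift-sdn namespace and the pod-name placeholder are assumptions; adjust to your cluster):

# List the SDN/OVS pods and find the one on the affected node
oc get pods -n openshift-sdn -o wide

# Read the logs of that pod (placeholder name)
oc logs <ovs-pod-name> -n openshift-sdn

# Watch for restart and scheduling events in the namespace
oc get events -n openshift-sdn --sort-by='.lastTimestamp'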

Comment 8 jolee 2018-10-09 13:22:57 UTC
Please confirm that the openshift-sdn plugin is being used as the SDN plugin, per the Known Issues for upgrading to OpenShift 3.10 [1]. At present, using anything other than the default openshift-sdn plugin hits a known issue that will not be addressed until "mid/late October 2018".



Diagnostic Steps
grep os_sdn_network_plugin_name -r * 2>/dev/null
<file>:os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'\

[1]
https://access.redhat.com/solutions/3631141
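
As a hedged sketch of that check (the /etc/ansible/hosts path is an assumption; the in-tree openshift-sdn values are redhat/openshift-ovs-subnet, redhat/openshift-ovs-multitenant and redhat/openshift-ovs-networkpolicy):

# Find which SDN plugin the Ansible inventory configures
grep -r os_sdn_network_plugin_name /etc/ansible/hosts* 2>/dev/null

# The default openshift-sdn plugin (subnet mode) would appear as:
# os_sdn_network_plugin_name='redhat/openshift-ovs-subnet'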

