Bug 1635257

Summary: OVS Pod is failing
Product: OpenShift Container Platform Reporter: jolee
Component: Networking    Assignee: Casey Callendrello <cdc>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Meng Bo <bmeng>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.10.0    CC: aconole, aos-bugs, cdc, jolee, wmeng
Target Milestone: ---   
Target Release: 3.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-19 13:00:21 UTC Type: Bug

Description jolee 2018-10-02 13:17:53 UTC
Description of problem:

atomic-openshift-node: I0926 18:36:18.268991    6814 kuberuntime_manager.go:513] Container {Name:openvswitch Image:registry.access.redhat.com/openshift3/ose-node:v3.10 Command:[/bin/bash -c #!/bin/bash
  -node-1 atomic-openshift-node: set -euo pipefail
  -node-1 atomic-openshift-node: # if another process is listening on the cni-server socket, wait until it exits
  -node-1 atomic-openshift-node: trap 'kill $(jobs -p); exit 0' TERM
  -node-1 atomic-openshift-node: retries=0
  -node-1 atomic-openshift-node: while true; do
  -node-1 atomic-openshift-node: if /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then
  -node-1 atomic-openshift-node: echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1
  -node-1 atomic-openshift-node: sleep 15 & wait
  -node-1 atomic-openshift-node: (( retries += 1 ))
  -node-1 atomic-openshift-node: else
  -node-1 atomic-openshift-node: break
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: if [[ "${retries}" -gt 40 ]]; then
  -node-1 atomic-openshift-node: echo "error: Another process is currently managing OVS, exiting" 2>&1
  -node-1 atomic-openshift-node: exit 1
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: done
  -node-1 atomic-openshift-node: # launch OVS
  -node-1 atomic-openshift-node: function quit {
  -node-1 atomic-openshift-node: /usr/share/openvswitch/scripts/ovs-ctl stop
  -node-1 atomic-openshift-node: exit 0
  -node-1 atomic-openshift-node: }
  -node-1 atomic-openshift-node: trap quit SIGTERM
  -node-1 atomic-openshift-node: /usr/share/openvswitch/scripts/ovs-ctl start --system-id=random
  -node-1 atomic-openshift-node: # Restrict the number of pthreads ovs-vswitchd creates to reduce the
  -node-1 atomic-openshift-node: # amount of RSS it uses on hosts with many cores
  -node-1 atomic-openshift-node: # https://bugzilla.redhat.com/show_bug.cgi?id=1571379
  -node-1 atomic-openshift-node: # https://bugzilla.redhat.com/show_bug.cgi?id=1572797
  -node-1 atomic-openshift-node: if [[ `nproc` -gt 12 ]]; then
  -node-1 atomic-openshift-node: ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=4
  -node-1 atomic-openshift-node: ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=10
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: while true; do sleep 5; done
  -node-1 atomic-openshift-node: ] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:419430400 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:314572800 scale:0} d:{Dec:<nil>} s:300Mi Format:BinarySI}]} VolumeMounts:[{Name:host-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:host-run-ovs ReadOnly:false MountPath:/run/openvswitch SubPath: MountPropagation:<nil>} {Name:host-run-ovs ReadOnly:false MountPath:/var/run/openvswitch SubPath: MountPropagation:<nil>} {Name:host-sys ReadOnly:true MountPath:/sys SubPath: MountPropagation:<nil>} {Name:host-config-openvswitch ReadOnly:false MountPath:/etc/openvswitch SubPath: MountPropagation:<nil>} {Name:sdn-token-q47s4 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:*0,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
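
For context, the container command quoted above waits for any other process already managing OVS: it polls "ovs-ctl status" and, after more than 40 retries (15s apart), prints "Another process is currently managing OVS, exiting" and exits 1. A quick way to check for that condition on the affected node is sketched below (assuming the standard RHEL openvswitch package; the ovs-ctl path is the same one the container uses):

  # Is OVS already running on the host (e.g. started by the host's
  # openvswitch.service rather than by the openvswitch pod)?
  /usr/share/openvswitch/scripts/ovs-ctl status
  systemctl status openvswitch
  # Which processes own the OVS daemons?
  ps -ef | grep -E 'ovs-vswitchd|ovsdb-server' | grep -v grep

If the host service is active, the pod's startup script will keep logging the "Another process is currently managing OVS" warning and eventually exit.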



Version-Release number of selected component (if applicable):

OCP 3.9 is being used (v3.9.30)


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

The message "...OpenShift SDN network process is not (yet?) available..." continually recurs.


Expected results:

SDN network to start


Additional info:

Comment 1 Casey Callendrello 2018-10-02 13:32:17 UTC
That is just the kubelet echoing out the command it's about to run; it's not an error.

You'll need to debug the SDN the same way you'd debug any other process: look it up with "oc get pods", check its logs with "oc logs", and watch for events with "oc get events".
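
A minimal sketch of that workflow, assuming the default namespace and names used by the 3.10 SDN/OVS daemonsets (namespace openshift-sdn; the container in the quoted spec is named "openvswitch"; adjust pod names to match your cluster):

  # Locate the sdn/ovs pods on the affected node:
  oc get pods -n openshift-sdn -o wide
  # Inspect the failing pod and recent events:
  oc describe pod <ovs-pod-name> -n openshift-sdn
  oc get events -n openshift-sdn
  # Container logs for the OVS container:
  oc logs <ovs-pod-name> -c openvswitch -n openshift-sdn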

Comment 8 jolee 2018-10-09 13:22:57 UTC
Please confirm which openshift-sdn plugin is being used as the SDN plugin, per the Known Issues when upgrading to OpenShift 3.10 [1]. Currently, using anything other than the default openshift-sdn plugin runs into a known issue that will not be addressed until "mid/late October 2018".



Diagnostic Steps
grep os_sdn_network_plugin_name -r * 2>/dev/null
<file>:os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
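
The inventory setting only shows what the installer was told; to verify what the cluster is actually running, something like the following can be used (a sketch, assuming the standard 3.9/3.10 node config path and the ClusterNetwork API object):

  # Plugin recorded cluster-wide:
  oc get clusternetwork default -o yaml | grep -i plugin
  # Plugin configured in the node config:
  grep networkPluginName /etc/origin/node/node-config.yaml

Note that redhat/openshift-ovs-multitenant differs from the installer's default (redhat/openshift-ovs-subnet), which appears to be what makes the known issue in [1] relevant here.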

[1]
https://access.redhat.com/solutions/3631141