Bug 1635257 - OVS Pod is failing [NEEDINFO]
Status: NEW
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.10.z
Assigned To: Casey Callendrello
QA Contact: Meng Bo
Depends On:
Blocks:
Reported: 2018-10-02 09:17 EDT by jolee
Modified: 2018-10-26 09:28 EDT
CC List: 5 users

Doc Type: If docs needed, set a value
Type: Bug
Flags: aconole: needinfo? (jolee)




External Trackers:
Red Hat Knowledge Base (Solution) 3630581 - Last Updated 2018-10-02 09:22 EDT

Description jolee 2018-10-02 09:17:53 EDT
Description of problem:

atomic-openshift-node: I0926 18:36:18.268991    6814 kuberuntime_manager.go:513] Container {Name:openvswitch Image:registry.access.redhat.com/openshift3/ose-node:v3.10 Command:[/bin/bash -c #!/bin/bash
  -node-1 atomic-openshift-node: set -euo pipefail
  -node-1 atomic-openshift-node: # if another process is listening on the cni-server socket, wait until it exits
  -node-1 atomic-openshift-node: trap 'kill $(jobs -p); exit 0' TERM
  -node-1 atomic-openshift-node: retries=0
  -node-1 atomic-openshift-node: while true; do
  -node-1 atomic-openshift-node: if /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then
  -node-1 atomic-openshift-node: echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1
  -node-1 atomic-openshift-node: sleep 15 & wait
  -node-1 atomic-openshift-node: (( retries += 1 ))
  -node-1 atomic-openshift-node: else
  -node-1 atomic-openshift-node: break
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: if [[ "${retries}" -gt 40 ]]; then
  -node-1 atomic-openshift-node: echo "error: Another process is currently managing OVS, exiting" 2>&1
  -node-1 atomic-openshift-node: exit 1
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: done
  -node-1 atomic-openshift-node: # launch OVS
  -node-1 atomic-openshift-node: function quit {
  -node-1 atomic-openshift-node: /usr/share/openvswitch/scripts/ovs-ctl stop
  -node-1 atomic-openshift-node: exit 0
  -node-1 atomic-openshift-node: }
  -node-1 atomic-openshift-node: trap quit SIGTERM
  -node-1 atomic-openshift-node: /usr/share/openvswitch/scripts/ovs-ctl start --system-id=random
  -node-1 atomic-openshift-node: # Restrict the number of pthreads ovs-vswitchd creates to reduce the
  -node-1 atomic-openshift-node: # amount of RSS it uses on hosts with many cores
  -node-1 atomic-openshift-node: # https://bugzilla.redhat.com/show_bug.cgi?id=1571379
  -node-1 atomic-openshift-node: # https://bugzilla.redhat.com/show_bug.cgi?id=1572797
  -node-1 atomic-openshift-node: if [[ `nproc` -gt 12 ]]; then
  -node-1 atomic-openshift-node: ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=4
  -node-1 atomic-openshift-node: ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=10
  -node-1 atomic-openshift-node: fi
  -node-1 atomic-openshift-node: while true; do sleep 5; done
  -node-1 atomic-openshift-node: ] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:419430400 scale:0} d:{Dec:<nil>} s: Format:BinarySI}] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI} memory:{i:{value:314572800 scale:0} d:{Dec:<nil>} s:300Mi Format:BinarySI}]} VolumeMounts:[{Name:host-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:host-run-ovs ReadOnly:false MountPath:/run/openvswitch SubPath: MountPropagation:<nil>} {Name:host-run-ovs ReadOnly:false MountPath:/var/run/openvswitch SubPath: MountPropagation:<nil>} {Name:host-sys ReadOnly:true MountPath:/sys SubPath: MountPropagation:<nil>} {Name:host-config-openvswitch ReadOnly:false MountPath:/etc/openvswitch SubPath: MountPropagation:<nil>} {Name:sdn-token-q47s4 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:*0,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
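
The script above is the openvswitch container's launch command, echoed by the kubelet as part of the container spec. Its retry loop waits for any process that is already managing OVS before starting ovs-vswitchd itself. A minimal sketch of how one might check for such a competing OVS instance directly on the affected node (assuming root access; the openvswitch.service unit name is an assumption based on standard RHEL packaging):

  # Is the host's own Open vSwitch systemd service running? (unit name assumed)
  systemctl status openvswitch.service

  # The same check the container script performs before starting OVS
  /usr/share/openvswitch/scripts/ovs-ctl status

  # Any stray OVS daemons running outside the pod?
  ps -ef | grep -E 'ovs-vswitchd|ovsdb-server' | grep -v grep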



Version-Release number of selected component (if applicable):

OCP 3.9 is in use (v3.9.30).


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:

The message "...OpenShift SDN network process is not (yet?) available..." continually recurs.
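
One hedged way to confirm how often the message recurs on the node (the atomic-openshift-node.service unit name is assumed from the log prefix above):

  # Follow the node service log and filter for the recurring SDN message
  journalctl -u atomic-openshift-node.service -f | grep -i 'SDN network process'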


Expected results:

The OpenShift SDN network should start.


Additional info:
Comment 1 Casey Callendrello 2018-10-02 09:32:17 EDT
That is just the kubelet echoing out the command it's about to run; it's not an error.

You'll need to debug the SDN the same way you'd debug any other process: look it up with "oc get pods", check its logs with "oc logs", and watch for events with "oc get events".
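
For example, a hedged sketch of those steps against the 3.10 SDN daemonsets (the openshift-sdn namespace and the pod names are assumptions based on a default install; adjust to match the cluster):

  # List the SDN and OVS pods and the nodes they run on (namespace assumed)
  oc get pods -n openshift-sdn -o wide

  # Logs from the failing OVS pod (substitute the real pod name)
  oc logs -n openshift-sdn <ovs-pod-name>

  # Recent events in the namespace, oldest first
  oc get events -n openshift-sdn --sort-by='.lastTimestamp'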
Comment 8 jolee 2018-10-09 09:22:57 EDT
Please confirm whether the default openshift-sdn plugin is being used, per the Known Issues when upgrading to OpenShift 3.10 [1]. Currently, using anything other than the default openshift-sdn plugin runs into a known issue that will not be addressed until "mid/late October 2018".



Diagnostic Steps
grep os_sdn_network_plugin_name -r * 2>/dev/null
<file>:os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'

[1]
https://access.redhat.com/solutions/3631141
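
On a cluster that is already installed, a hedged way to cross-check which plugin is actually configured (the file paths below are the 3.9/3.10 defaults and may differ in this environment):

  # Ansible inventory, same variable as the grep above (default path assumed)
  grep os_sdn_network_plugin_name /etc/ansible/hosts 2>/dev/null

  # Plugin name in the rendered node and master configuration
  grep networkPluginName /etc/origin/node/node-config.yaml /etc/origin/master/master-config.yaml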
