Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1595349

Summary: sdn daemonset pods under project openshift-sdn remain in pending state
Product: OpenShift Container Platform
Component: Installer
Version: 3.10.0
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Reporter: Chris Callegari <ccallega>
Assignee: Scott Dodson <sdodson>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, bbennett, ccallega, cdc, jokerman, mmccomas, xtian
Severity: unspecified
Priority: unspecified
Last Closed: 2018-07-03 15:18:55 UTC
Type: Bug
Attachments: inventory file

Description Chris Callegari 2018-06-26 17:08:31 UTC
Description of problem:
sdn daemonset pods under project openshift-sdn remain in pending state

Version-Release number of selected component (if applicable):
3.10.x

How reproducible:
Always

Steps to Reproduce:
1. Deploy OpenShift

Actual results:
I see the following...
# oc get all -n openshift-sdn
NAME            READY     STATUS    RESTARTS   AGE
pod/ovs-6g5nx   0/1       Pending   0          54m
pod/ovs-nrkb4   0/1       Pending   0          54m
pod/ovs-tdd6d   0/1       Pending   0          54m
pod/sdn-djhwt   0/1       Pending   0          54m
pod/sdn-h8xq7   0/1       Pending   0          54m
pod/sdn-km6mj   0/1       Pending   0          54m

NAME                 DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/ovs   3         3         0         3            0           <none>          54m
daemonset.apps/sdn   3         3         0         3            0           <none>          54m

NAME                                  DOCKER REPO                                           TAGS      UPDATED
imagestream.image.openshift.io/node   docker-registry.default.svc:5000/openshift-sdn/node   v3.10     About an hour ago


# oc get events -n openshift-sdn
LAST SEEN   FIRST SEEN   COUNT     NAME                   KIND        SUBOBJECT   TYPE      REASON             SOURCE                 MESSAGE
56m         56m          1         ovs.153bc14f3fca39ef   DaemonSet               Warning   FailedCreate       daemonset-controller   Error creating: pods "ovs-" is forbidden: error looking up service account openshift-sdn/sdn: serviceaccount "sdn" not found
56m         56m          1         ovs.153bc14f7c85aef7   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: ovs-6g5nx
56m         56m          1         ovs.153bc14f7d11d1a6   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: ovs-tdd6d
56m         56m          1         ovs.153bc14f7d47a9be   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: ovs-nrkb4
56m         56m          1         sdn.153bc14f43805f00   DaemonSet               Warning   FailedCreate       daemonset-controller   Error creating: Pod "sdn-xpbmg" is invalid: spec.containers[0].image: Invalid value: " ": must not have leading or trailing whitespace
56m         56m          1         sdn.153bc14f4471bbd3   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: sdn-h8xq7
56m         56m          1         sdn.153bc14f450a8570   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: sdn-km6mj
56m         56m          1         sdn.153bc14f45104ebe   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: sdn-djhwt


# oc describe is node
Error from server (NotFound): imagestreams.image.openshift.io "node" not found
[root@ip-172-31-57-19 ~]# oc describe is node -n openshift-sdn
Name:			node
Namespace:		openshift-sdn
Created:		About an hour ago
Labels:			<none>
Annotations:		<none>
Docker Pull Spec:	docker-registry.default.svc:5000/openshift-sdn/node
Image Lookup:		local=false
Unique Images:		1
Tags:			1

v3.10
  reference to registry registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10

  * registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10
      About an hour ago


Expected results:
I expect pods to deploy correctly

Additional info:

Comment 1 Chris Callegari 2018-06-26 17:31:52 UTC
Created attachment 1454768 [details]
inventory file

Comment 2 Ben Bennett 2018-06-27 13:38:22 UTC
This looks like an installation problem since the sdn service account is not present and the image in the pod spec is not being set.
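
(For context: the "must not have leading or trailing whitespace" event above comes from generic pod-spec validation rejecting an image field that was left as a single space. A minimal illustrative sketch of that check, in Python rather than the actual Kubernetes validation code, with a hypothetical `validate_image` helper:)

```python
def validate_image(image: str) -> list[str]:
    """Illustrative sketch of the pod-spec image validation that produced
    the FailedCreate event above; not the real Kubernetes implementation."""
    errors = []
    if image != image.strip():
        # An unresolved image trigger can leave the field as " ",
        # which fails exactly this check.
        errors.append(
            f'spec.containers[0].image: Invalid value: "{image}": '
            "must not have leading or trailing whitespace"
        )
    return errors

print(validate_image(" "))
print(validate_image("registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10"))
```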

Comment 3 Chris Callegari 2018-06-27 13:40:52 UTC
The sdn sa is available within the openshift-sdn project

Comment 4 Chris Callegari 2018-06-27 13:41:54 UTC
# oc get sa -n openshift-sdn
NAME       SECRETS   AGE
builder    2         16h
default    2         16h
deployer   2         16h
sdn        2         16h

Comment 5 Chris Callegari 2018-06-27 13:47:53 UTC
# oc describe daemonset.apps/ovs -n openshift-sdn
Name:           ovs
Selector:       app=ovs
Node-Selector:  <none>
Labels:         app=ovs
                component=network
                openshift.io/component=network
                type=infra
Annotations:    image.openshift.io/triggers=[{"from":{"kind":"ImageStreamTag","name":"node:v3.10"},"fieldPath":"spec.template.spec.containers[?(@.name==\"openvswitch\")].image"}]

  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{"image.openshift.io/triggers":"[{\"from\":{\"kind\":\"ImageStreamTag\",\"name\":\...
  kubernetes.io/description=This daemon set launches the openvswitch daemon.

Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 3 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=ovs
                    component=network
                    openshift.io/component=network
                    type=infra
  Annotations:      scheduler.alpha.kubernetes.io/critical-pod=
  Service Account:  sdn
  Containers:
   openvswitch:
    Image:      registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      #!/bin/bash
set -euo pipefail

# if another process is listening on the cni-server socket, wait until it exits
trap 'kill $(jobs -p); exit 0' TERM
retries=0
while true; do
  if /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then
    echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1
    sleep 15 & wait
    (( retries += 1 ))
  else
    break
  fi
  if [[ "${retries}" -gt 40 ]]; then
    echo "error: Another process is currently managing OVS, exiting" 2>&1
    exit 1
  fi
done

# launch OVS
function quit {
    /usr/share/openvswitch/scripts/ovs-ctl stop
    exit 0
}
trap quit SIGTERM
/usr/share/openvswitch/scripts/ovs-ctl start --system-id=random

# Restrict the number of pthreads ovs-vswitchd creates to reduce the
# amount of RSS it uses on hosts with many cores
# https://bugzilla.redhat.com/show_bug.cgi?id=1571379
# https://bugzilla.redhat.com/show_bug.cgi?id=1572797
if [[ `nproc` -gt 12 ]]; then
    ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=4
    ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=10
fi
while true; do sleep 5; done

    Limits:
      cpu:     200m
      memory:  400Mi
    Requests:
      cpu:        100m
      memory:     300Mi
    Environment:  <none>
    Mounts:
      /etc/openvswitch from host-config-openvswitch (rw)
      /lib/modules from host-modules (ro)
      /run/openvswitch from host-run-ovs (rw)
      /sys from host-sys (ro)
      /var/run/openvswitch from host-run-ovs (rw)
  Volumes:
   host-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
   host-run-ovs:
    Type:          HostPath (bare host directory volume)
    Path:          /run/openvswitch
    HostPathType:
   host-sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
   host-config-openvswitch:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/origin/openvswitch
    HostPathType:
Events:            <none>

Comment 6 Chris Callegari 2018-06-27 13:48:21 UTC
# oc describe daemonset.apps/sdn -n openshift-sdn
Name:           sdn
Selector:       app=sdn
Node-Selector:  <none>
Labels:         app=sdn
                component=network
                openshift.io/component=network
                type=infra
Annotations:    image.openshift.io/triggers=[
  {"from":{"kind":"ImageStreamTag","name":"node:v3.10"},"fieldPath":"spec.template.spec.containers[?(@.name==\"sdn\")].image"}
]

  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{"image.openshift.io/triggers":"[\n  {\"from\":{\"kind\":\"ImageStreamTag\",\"name...
  kubernetes.io/description=This daemon set launches the OpenShift networking components (kube-proxy, DNS, and openshift-sdn).
It expects that OVS is running on the node.

Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 3 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=sdn
                    component=network
                    openshift.io/component=network
                    type=infra
  Annotations:      scheduler.alpha.kubernetes.io/critical-pod=
  Service Account:  sdn
  Containers:
   sdn:
    Image:      registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10
    Port:       10256/TCP
    Host Port:  10256/TCP
    Command:
      /bin/bash
      -c
      #!/bin/bash
set -euo pipefail

# if another process is listening on the cni-server socket, wait until it exits
trap 'kill $(jobs -p); exit 0' TERM
retries=0
while true; do
  if echo 'test' | socat - UNIX-CONNECT:/var/run/openshift-sdn/cni-server.sock >/dev/null; then
    echo "warning: Another process is currently listening on the CNI socket, waiting 15s ..." 2>&1
    sleep 15 & wait
    (( retries += 1 ))
  else
    break
  fi
  if [[ "${retries}" -gt 40 ]]; then
    echo "error: Another process is currently listening on the CNI socket, exiting" 2>&1
    exit 1
  fi
done
# if the node config doesn't exist yet, wait until it does
retries=0
while true; do
  if [[ ! -f /etc/origin/node/node-config.yaml ]]; then
    echo "warning: Cannot find existing node-config.yaml, waiting 15s ..." 2>&1
    sleep 15 & wait
    (( retries += 1 ))
  else
    break
  fi
  if [[ "${retries}" -gt 40 ]]; then
    echo "error: No existing node-config.yaml, exiting" 2>&1
    exit 1
  fi
done

# Take over network functions on the node
rm -Rf /etc/cni/net.d/*
rm -Rf /host/opt/cni/bin/*
cp -Rf /opt/cni/bin/* /host/opt/cni/bin/

if [[ -f /etc/sysconfig/origin-node ]]; then
  set -o allexport
  source /etc/sysconfig/origin-node
fi

# use either the bootstrapped node kubeconfig or the static configuration
file=/etc/origin/node/node.kubeconfig
if [[ ! -f "${file}" ]]; then
  # use the static node config if it exists
  # TODO: remove when static node configuration is no longer supported
  for f in /etc/origin/node/system*.kubeconfig; do
    echo "info: Using ${f} for node configuration" 1>&2
    file="${f}"
    break
  done
fi
# Use the same config as the node, but with the service account token
oc config "--config=${file}" view --flatten > /tmp/kubeconfig
oc config --config=/tmp/kubeconfig set-credentials sa "--token=$( cat /var/run/secrets/kubernetes.io/serviceaccount/token )"
oc config --config=/tmp/kubeconfig set-context "$( oc config --config=/tmp/kubeconfig current-context )" --user=sa
# Launch the network process
exec openshift start network --config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig --loglevel=${DEBUG_LOGLEVEL:-2}

    Requests:
      cpu:     100m
      memory:  200Mi
    Environment:
      OPENSHIFT_DNS_DOMAIN:  cluster.local
    Mounts:
      /etc/cni/net.d from host-etc-cni-netd (rw)
      /etc/origin/node/ from host-config (ro)
      /etc/sysconfig/origin-node from host-sysconfig-node (ro)
      /host/opt/cni/bin from host-opt-cni-bin (rw)
      /var/lib/cni/networks/openshift-sdn from host-var-lib-cni-networks-openshift-sdn (rw)
      /var/run from host-var-run (rw)
      /var/run/dbus/ from host-var-run-dbus (ro)
      /var/run/kubernetes/ from host-var-run-kubernetes (ro)
      /var/run/openshift-sdn from host-var-run-openshift-sdn (rw)
      /var/run/openvswitch/ from host-var-run-ovs (ro)
  Volumes:
   host-config:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/origin/node
    HostPathType:
   host-sysconfig-node:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/sysconfig/origin-node
    HostPathType:
   host-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
   host-var-run:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run
    HostPathType:
   host-var-run-dbus:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/dbus
    HostPathType:
   host-var-run-ovs:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/openvswitch
    HostPathType:
   host-var-run-kubernetes:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/kubernetes
    HostPathType:
   host-var-run-openshift-sdn:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/openshift-sdn
    HostPathType:
   host-opt-cni-bin:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
   host-etc-cni-netd:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
   host-var-lib-cni-networks-openshift-sdn:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/cni/networks/openshift-sdn
    HostPathType:
Events:            <none>

Comment 7 Chris Callegari 2018-06-27 13:49:42 UTC
# docker images
REPOSITORY                                                        TAG                 IMAGE ID            CREATED             SIZE
registry.reg-aws.openshift.com:443/openshift3/ose-control-plane   v3.10               3048d4947e06        39 hours ago        635 MB
registry.reg-aws.openshift.com:443/openshift3/ose-node            v3.10               e451f33f4929        2 days ago          1.21 GB
registry.reg-aws.openshift.com:443/openshift3/ose-pod             v3.10.0-0.69.0      4034d27ececb        12 days ago         214 MB
registry.reg-aws.openshift.com:443/rhel7/etcd                     3.2.15              4f35b6516d22        2 months ago        256 MB


# docker ps
CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS               NAMES
ed96f1525fe6        e451f33f4929                                                           "/bin/bash -c '#!/..."   16 hours ago        Up 16 hours                             k8s_sync_sync-cp6bq_openshift-node_8d8f8e1c-7983-11e8-a2b4-129b8a71d970_0
172cefe4b448        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago        Up 16 hours                             k8s_POD_sync-cp6bq_openshift-node_8d8f8e1c-7983-11e8-a2b4-129b8a71d970_0
d7b2dce5e1cb        4f35b6516d22                                                           "/bin/sh -c '#!/bi..."   16 hours ago        Up 16 hours                             k8s_etcd_master-etcd-ip-172-31-48-171.ec2.internal_kube-system_75433b6b24af16ed51e88817ff1459fd_0
ebe5fe7a7001        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago        Up 16 hours                             k8s_POD_master-etcd-ip-172-31-48-171.ec2.internal_kube-system_75433b6b24af16ed51e88817ff1459fd_0
6e81462a6a0a        3048d4947e06                                                           "/bin/bash -c '#!/..."   16 hours ago        Up 16 hours                             k8s_controllers_master-controllers-ip-172-31-48-171.ec2.internal_kube-system_f8a3ee7782c553ccbfe2ee36e97d899a_0
aa64ed52c4a4        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago        Up 16 hours                             k8s_POD_master-controllers-ip-172-31-48-171.ec2.internal_kube-system_f8a3ee7782c553ccbfe2ee36e97d899a_0
876a4de98e66        3048d4947e06                                                           "/bin/bash -c '#!/..."   16 hours ago        Up 16 hours                             k8s_api_master-api-ip-172-31-48-171.ec2.internal_kube-system_15ee3413164360daf0ba1dab4d07c041_0
6641f1ab745c        registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago        Up 16 hours                             k8s_POD_master-api-ip-172-31-48-171.ec2.internal_kube-system_15ee3413164360daf0ba1dab4d07c041_0


It sure looks like the ose-node image IS available and downloads correctly.

Comment 8 Ben Bennett 2018-06-27 14:17:47 UTC
Do you have an imagestream and imagestream tag with the right name?

Comment 10 Chris Callegari 2018-06-27 17:38:53 UTC
# oc get imagestream -n openshift-sdn
NAME      DOCKER REPO                                           TAGS      UPDATED
node      docker-registry.default.svc:5000/openshift-sdn/node   v3.10     12 minutes ago

# oc get imagestreamtag -n openshift-sdn
No resources found.
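
(For context: the daemonsets' `image.openshift.io/triggers` annotation, shown in Comments 5 and 6, points at ImageStreamTag `node:v3.10`. With `oc get imagestreamtag` returning nothing, the image trigger controller has no image to inject into the pod spec, which is consistent with the whitespace-image error in the events. A hedged sketch of that resolution step, assuming the annotation shape shown above:)

```python
import json

# Trigger annotation as shown on the ovs daemonset in Comment 5.
triggers_annotation = (
    '[{"from":{"kind":"ImageStreamTag","name":"node:v3.10"},'
    '"fieldPath":"spec.template.spec.containers'
    '[?(@.name==\\"openvswitch\\")].image"}]'
)

# Tags actually present, per `oc get imagestreamtag -n openshift-sdn`.
available_tags = set()  # "No resources found."

for trigger in json.loads(triggers_annotation):
    ref = trigger["from"]
    if ref["kind"] == "ImageStreamTag" and ref["name"] not in available_tags:
        # With no matching tag, the trigger controller cannot fill in the
        # container image at fieldPath, leaving it for validation to reject.
        print("unresolved trigger:", ref["name"], "->", trigger["fieldPath"])
```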

Comment 12 Chris Callegari 2018-07-02 19:36:08 UTC
I redeployed this morning using 3.10.10-1

Control plane deploys successfully.

I'm good with closing this bugzilla.