Bug 1595349
| Summary: | sdn daemonset pods under project openshift-sdn remain in pending state | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Chris Callegari <ccallega> |
| Component: | Installer | Assignee: | Scott Dodson <sdodson> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Johnny Liu <jialiu> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10.0 | CC: | aos-bugs, bbennett, ccallega, cdc, jokerman, mmccomas, xtian |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-07-03 15:18:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 1454768 [details]
inventory file
This looks like an installation problem, since the sdn service account is not present and the image in the pod spec is not being set.

The sdn sa is available within the openshift-sdn project:

# oc get sa -n openshift-sdn
NAME       SECRETS   AGE
builder    2         16h
default    2         16h
deployer   2         16h
sdn        2         16h

# oc describe daemonset.apps/ovs -n openshift-sdn
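The missing-service-account symptom can be checked mechanically. Below is a small, hypothetical bash helper (not part of the installer) that scans `oc get sa`-style output for a named service account; the sample data is the output captured in this bug, so the script runs without a cluster.

```shell
#!/bin/bash
# Hypothetical helper: check `oc get sa` output (read from stdin) for a
# service account by name. Against a live cluster you would feed it with:
#   oc get sa -n openshift-sdn | has_sa sdn
has_sa() {
  local name="$1"
  # Skip the header row, then look for an exact match in the NAME column.
  awk -v sa="$name" 'NR > 1 && $1 == sa { found = 1 } END { exit !found }'
}

# Sample output captured from this bug report:
sample='NAME       SECRETS   AGE
builder    2         16h
default    2         16h
deployer   2         16h
sdn        2         16h'

if printf '%s\n' "$sample" | has_sa sdn; then
  echo "sdn service account present"
else
  echo "sdn service account missing"
fi
```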
Name: ovs
Selector: app=ovs
Node-Selector: <none>
Labels: app=ovs
component=network
openshift.io/component=network
type=infra
Annotations: image.openshift.io/triggers=[{"from":{"kind":"ImageStreamTag","name":"node:v3.10"},"fieldPath":"spec.template.spec.containers[?(@.name==\"openvswitch\")].image"}]
kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{"image.openshift.io/triggers":"[{\"from\":{\"kind\":\"ImageStreamTag\",\"name\":\...
kubernetes.io/description=This daemon set launches the openvswitch daemon.
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 3 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=ovs
component=network
openshift.io/component=network
type=infra
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Service Account: sdn
Containers:
openvswitch:
Image: registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
#!/bin/bash
set -euo pipefail
# if another process is listening on the cni-server socket, wait until it exits
trap 'kill $(jobs -p); exit 0' TERM
retries=0
while true; do
if /usr/share/openvswitch/scripts/ovs-ctl status &>/dev/null; then
echo "warning: Another process is currently managing OVS, waiting 15s ..." 2>&1
sleep 15 & wait
(( retries += 1 ))
else
break
fi
if [[ "${retries}" -gt 40 ]]; then
echo "error: Another process is currently managing OVS, exiting" 2>&1
exit 1
fi
done
# launch OVS
function quit {
/usr/share/openvswitch/scripts/ovs-ctl stop
exit 0
}
trap quit SIGTERM
/usr/share/openvswitch/scripts/ovs-ctl start --system-id=random
# Restrict the number of pthreads ovs-vswitchd creates to reduce the
# amount of RSS it uses on hosts with many cores
# https://bugzilla.redhat.com/show_bug.cgi?id=1571379
# https://bugzilla.redhat.com/show_bug.cgi?id=1572797
if [[ `nproc` -gt 12 ]]; then
ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=4
ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=10
fi
while true; do sleep 5; done
Limits:
cpu: 200m
memory: 400Mi
Requests:
cpu: 100m
memory: 300Mi
Environment: <none>
Mounts:
/etc/openvswitch from host-config-openvswitch (rw)
/lib/modules from host-modules (ro)
/run/openvswitch from host-run-ovs (rw)
/sys from host-sys (ro)
/var/run/openvswitch from host-run-ovs (rw)
Volumes:
host-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
host-run-ovs:
Type: HostPath (bare host directory volume)
Path: /run/openvswitch
HostPathType:
host-sys:
Type: HostPath (bare host directory volume)
Path: /sys
HostPathType:
host-config-openvswitch:
Type: HostPath (bare host directory volume)
Path: /etc/origin/openvswitch
HostPathType:
Events: <none>
# oc describe daemonset.apps/sdn -n openshift-sdn
Name: sdn
Selector: app=sdn
Node-Selector: <none>
Labels: app=sdn
component=network
openshift.io/component=network
type=infra
Annotations: image.openshift.io/triggers=[
{"from":{"kind":"ImageStreamTag","name":"node:v3.10"},"fieldPath":"spec.template.spec.containers[?(@.name==\"sdn\")].image"}
]
kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{"image.openshift.io/triggers":"[\n {\"from\":{\"kind\":\"ImageStreamTag\",\"name...
kubernetes.io/description=This daemon set launches the OpenShift networking components (kube-proxy, DNS, and openshift-sdn).
It expects that OVS is running on the node.
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 3
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status: 0 Running / 3 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=sdn
component=network
openshift.io/component=network
type=infra
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Service Account: sdn
Containers:
sdn:
Image: registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10
Port: 10256/TCP
Host Port: 10256/TCP
Command:
/bin/bash
-c
#!/bin/bash
set -euo pipefail
# if another process is listening on the cni-server socket, wait until it exits
trap 'kill $(jobs -p); exit 0' TERM
retries=0
while true; do
if echo 'test' | socat - UNIX-CONNECT:/var/run/openshift-sdn/cni-server.sock >/dev/null; then
echo "warning: Another process is currently listening on the CNI socket, waiting 15s ..." 2>&1
sleep 15 & wait
(( retries += 1 ))
else
break
fi
if [[ "${retries}" -gt 40 ]]; then
echo "error: Another process is currently listening on the CNI socket, exiting" 2>&1
exit 1
fi
done
# if the node config doesn't exist yet, wait until it does
retries=0
while true; do
if [[ ! -f /etc/origin/node/node-config.yaml ]]; then
echo "warning: Cannot find existing node-config.yaml, waiting 15s ..." 2>&1
sleep 15 & wait
(( retries += 1 ))
else
break
fi
if [[ "${retries}" -gt 40 ]]; then
echo "error: No existing node-config.yaml, exiting" 2>&1
exit 1
fi
done
# Take over network functions on the node
rm -Rf /etc/cni/net.d/*
rm -Rf /host/opt/cni/bin/*
cp -Rf /opt/cni/bin/* /host/opt/cni/bin/
if [[ -f /etc/sysconfig/origin-node ]]; then
set -o allexport
source /etc/sysconfig/origin-node
fi
# use either the bootstrapped node kubeconfig or the static configuration
file=/etc/origin/node/node.kubeconfig
if [[ ! -f "${file}" ]]; then
# use the static node config if it exists
# TODO: remove when static node configuration is no longer supported
for f in /etc/origin/node/system*.kubeconfig; do
echo "info: Using ${f} for node configuration" 1>&2
file="${f}"
break
done
fi
# Use the same config as the node, but with the service account token
oc config "--config=${file}" view --flatten > /tmp/kubeconfig
oc config --config=/tmp/kubeconfig set-credentials sa "--token=$( cat /var/run/secrets/kubernetes.io/serviceaccount/token )"
oc config --config=/tmp/kubeconfig set-context "$( oc config --config=/tmp/kubeconfig current-context )" --user=sa
# Launch the network process
exec openshift start network --config=/etc/origin/node/node-config.yaml --kubeconfig=/tmp/kubeconfig --loglevel=${DEBUG_LOGLEVEL:-2}
Requests:
cpu: 100m
memory: 200Mi
Environment:
OPENSHIFT_DNS_DOMAIN: cluster.local
Mounts:
/etc/cni/net.d from host-etc-cni-netd (rw)
/etc/origin/node/ from host-config (ro)
/etc/sysconfig/origin-node from host-sysconfig-node (ro)
/host/opt/cni/bin from host-opt-cni-bin (rw)
/var/lib/cni/networks/openshift-sdn from host-var-lib-cni-networks-openshift-sdn (rw)
/var/run from host-var-run (rw)
/var/run/dbus/ from host-var-run-dbus (ro)
/var/run/kubernetes/ from host-var-run-kubernetes (ro)
/var/run/openshift-sdn from host-var-run-openshift-sdn (rw)
/var/run/openvswitch/ from host-var-run-ovs (ro)
Volumes:
host-config:
Type: HostPath (bare host directory volume)
Path: /etc/origin/node
HostPathType:
host-sysconfig-node:
Type: HostPath (bare host directory volume)
Path: /etc/sysconfig/origin-node
HostPathType:
host-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
host-var-run:
Type: HostPath (bare host directory volume)
Path: /var/run
HostPathType:
host-var-run-dbus:
Type: HostPath (bare host directory volume)
Path: /var/run/dbus
HostPathType:
host-var-run-ovs:
Type: HostPath (bare host directory volume)
Path: /var/run/openvswitch
HostPathType:
host-var-run-kubernetes:
Type: HostPath (bare host directory volume)
Path: /var/run/kubernetes
HostPathType:
host-var-run-openshift-sdn:
Type: HostPath (bare host directory volume)
Path: /var/run/openshift-sdn
HostPathType:
host-opt-cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
host-etc-cni-netd:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-var-lib-cni-networks-openshift-sdn:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks/openshift-sdn
HostPathType:
Events: <none>
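Both entrypoint scripts in the describes above use the same bounded wait pattern: poll a condition, sleep via a background job so a TERM trap can interrupt promptly, and give up after roughly 40 retries. A stripped-down, runnable sketch of that pattern follows; the condition and the short delay are illustrative, not the actual product code. (Note the originals redirect their warnings with `2>&1`, i.e. to stdout; the sketch uses `1>&2`.)

```shell
#!/bin/bash
# Stripped-down version of the bounded wait loop used by the ovs and sdn
# entrypoints. wait_for runs a condition command until it succeeds,
# sleeping between attempts, and gives up after max_retries attempts.
wait_for() {
  local max_retries="$1"; shift
  local retries=0
  while ! "$@"; do
    (( retries += 1 ))
    if (( retries > max_retries )); then
      echo "error: condition never became true, exiting" 1>&2
      return 1
    fi
    # Sleeping as a background job and waiting on it lets a TERM trap
    # interrupt the delay immediately instead of after the full sleep.
    sleep 0.01 &
    wait $!
  done
  return 0
}

# Illustrative condition: a file existing (stands in for node-config.yaml).
tmp="$(mktemp)"
wait_for 40 test -f "$tmp" && echo "condition met"
rm -f "$tmp"
```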
# docker images
REPOSITORY                                                        TAG              IMAGE ID       CREATED        SIZE
registry.reg-aws.openshift.com:443/openshift3/ose-control-plane   v3.10            3048d4947e06   39 hours ago   635 MB
registry.reg-aws.openshift.com:443/openshift3/ose-node            v3.10            e451f33f4929   2 days ago     1.21 GB
registry.reg-aws.openshift.com:443/openshift3/ose-pod             v3.10.0-0.69.0   4034d27ececb   12 days ago    214 MB
registry.reg-aws.openshift.com:443/rhel7/etcd                     3.2.15           4f35b6516d22   2 months ago   256 MB

# docker ps
CONTAINER ID   IMAGE                                                                  COMMAND                  CREATED        STATUS        NAMES
ed96f1525fe6   e451f33f4929                                                           "/bin/bash -c '#!/..."   16 hours ago   Up 16 hours   k8s_sync_sync-cp6bq_openshift-node_8d8f8e1c-7983-11e8-a2b4-129b8a71d970_0
172cefe4b448   registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago   Up 16 hours   k8s_POD_sync-cp6bq_openshift-node_8d8f8e1c-7983-11e8-a2b4-129b8a71d970_0
d7b2dce5e1cb   4f35b6516d22                                                           "/bin/sh -c '#!/bi..."   16 hours ago   Up 16 hours   k8s_etcd_master-etcd-ip-172-31-48-171.ec2.internal_kube-system_75433b6b24af16ed51e88817ff1459fd_0
ebe5fe7a7001   registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago   Up 16 hours   k8s_POD_master-etcd-ip-172-31-48-171.ec2.internal_kube-system_75433b6b24af16ed51e88817ff1459fd_0
6e81462a6a0a   3048d4947e06                                                           "/bin/bash -c '#!/..."   16 hours ago   Up 16 hours   k8s_controllers_master-controllers-ip-172-31-48-171.ec2.internal_kube-system_f8a3ee7782c553ccbfe2ee36e97d899a_0
aa64ed52c4a4   registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago   Up 16 hours   k8s_POD_master-controllers-ip-172-31-48-171.ec2.internal_kube-system_f8a3ee7782c553ccbfe2ee36e97d899a_0
876a4de98e66   3048d4947e06                                                           "/bin/bash -c '#!/..."   16 hours ago   Up 16 hours   k8s_api_master-api-ip-172-31-48-171.ec2.internal_kube-system_15ee3413164360daf0ba1dab4d07c041_0
6641f1ab745c   registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.0-0.69.0   "/usr/bin/pod"           16 hours ago   Up 16 hours   k8s_POD_master-api-ip-172-31-48-171.ec2.internal_kube-system_15ee3413164360daf0ba1dab4d07c041_0

It sure looks like the ose-node image IS available and downloads correctly.

Do you have an imagestream and imagestream tag with the right name?

# oc get imagestream -n openshift-sdn
NAME   DOCKER REPO                                           TAGS    UPDATED
node   docker-registry.default.svc:5000/openshift-sdn/node   v3.10   12 minutes ago

# oc get imagestreamtag -n openshift-sdn
No resources found.

I redeployed this morning using 3.10.10-1. The control plane deploys successfully. I'm good with closing this bugzilla.
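For context on why a missing ImageStreamTag matters here: the `image.openshift.io/triggers` annotation on each daemonset (visible in the describe output above) tells the image trigger controller to resolve the named ImageStreamTag and write the resulting image reference into the field selected by `fieldPath`; if the tag never resolves, the container image stays blank. The sdn daemonset's annotation, pretty-printed from the output above:

```json
[
  {
    "from": {"kind": "ImageStreamTag", "name": "node:v3.10"},
    "fieldPath": "spec.template.spec.containers[?(@.name==\"sdn\")].image"
  }
]
```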
Description of problem:
sdn daemonset pods under project openshift-sdn remain in pending state

Version-Release number of selected component (if applicable):
3.10.x

How reproducible:
Always

Steps to Reproduce:
1. Deploy OpenShift

Actual results:
I see the following...

# oc get all -n openshift-sdn
NAME            READY   STATUS    RESTARTS   AGE
pod/ovs-6g5nx   0/1     Pending   0          54m
pod/ovs-nrkb4   0/1     Pending   0          54m
pod/ovs-tdd6d   0/1     Pending   0          54m
pod/sdn-djhwt   0/1     Pending   0          54m
pod/sdn-h8xq7   0/1     Pending   0          54m
pod/sdn-km6mj   0/1     Pending   0          54m

NAME                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/ovs   3         3         0       3            0           <none>          54m
daemonset.apps/sdn   3         3         0       3            0           <none>          54m

NAME                                  DOCKER REPO                                           TAGS    UPDATED
imagestream.image.openshift.io/node   docker-registry.default.svc:5000/openshift-sdn/node   v3.10   About an hour ago

# oc get events -n openshift-sdn
LAST SEEN   FIRST SEEN   COUNT   NAME                   KIND        SUBOBJECT   TYPE      REASON             SOURCE                 MESSAGE
56m         56m          1       ovs.153bc14f3fca39ef   DaemonSet               Warning   FailedCreate       daemonset-controller   Error creating: pods "ovs-" is forbidden: error looking up service account openshift-sdn/sdn: serviceaccount "sdn" not found
56m         56m          1       ovs.153bc14f7c85aef7   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: ovs-6g5nx
56m         56m          1       ovs.153bc14f7d11d1a6   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: ovs-tdd6d
56m         56m          1       ovs.153bc14f7d47a9be   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: ovs-nrkb4
56m         56m          1       sdn.153bc14f43805f00   DaemonSet               Warning   FailedCreate       daemonset-controller   Error creating: Pod "sdn-xpbmg" is invalid: spec.containers[0].image: Invalid value: " ": must not have leading or trailing whitespace
56m         56m          1       sdn.153bc14f4471bbd3   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: sdn-h8xq7
56m         56m          1       sdn.153bc14f450a8570   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: sdn-km6mj
56m         56m          1       sdn.153bc14f45104ebe   DaemonSet               Normal    SuccessfulCreate   daemonset-controller   Created pod: sdn-djhwt

# oc describe is node
Error from server (NotFound): imagestreams.image.openshift.io "node" not found

[root@ip-172-31-57-19 ~]# oc describe is node -n openshift-sdn
Name:              node
Namespace:         openshift-sdn
Created:           About an hour ago
Labels:            <none>
Annotations:       <none>
Docker Pull Spec:  docker-registry.default.svc:5000/openshift-sdn/node
Image Lookup:      local=false
Unique Images:     1
Tags:              1

v3.10
  reference to registry registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10
    * registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10
      About an hour ago

Expected results:
I expect pods to deploy correctly

Additional info:
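The second FailedCreate event above is the API server's image validation rejecting the blank value the unresolved trigger left behind. A minimal bash sketch of that rule follows; it mirrors the validation message only and is not OpenShift's actual implementation.

```shell
#!/bin/bash
# Illustrative re-implementation of the rule in the event message: an image
# value must be non-blank and must not have leading or trailing whitespace.
validate_image() {
  local img="$1"
  # Strip leading whitespace, then trailing whitespace, via pattern removal.
  local trimmed="${img#"${img%%[![:space:]]*}"}"
  trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"
  if [[ -z "$trimmed" ]]; then
    echo "invalid: image is blank"
  elif [[ "$img" != "$trimmed" ]]; then
    echo "invalid: must not have leading or trailing whitespace"
  else
    echo "ok: $img"
  fi
}

validate_image " "   # the value rejected in this bug
validate_image "registry.reg-aws.openshift.com:443/openshift3/ose-node:v3.10"
```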