Bug 1810505 - Failed create pod sandbox due to sandbox already exists
Summary: Failed create pod sandbox due to sandbox already exists
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Dan Winship
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-03-05 11:58 UTC by Junqi Zhao
Modified: 2020-08-04 18:03 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: iptables locking problems.
Consequence: In rare circumstances, a pod could fail to start, and "oc describe pod" would show an event including the text "Failed create pod sandbox ... could not set up pod iptables rules: Another app is currently holding the xtables lock." Subsequent retries then fail with "pod sandbox ... already exists", which is the error in this bug's summary.
Fix: We now pass "-w" to iptables in the relevant piece of code.
Result: iptables waits for the lock and no longer fails spuriously. (A minimal sketch of this pattern follows the Links section below.)
Clone Of:
Environment:
Last Closed: 2020-08-04 18:03:44 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
* GitHub: openshift/sdn pull 115 (closed) - Bug 1810505: Pass -w to iptables when adding anti-metadata-server rules (last updated 2020-11-17 23:06:30 UTC)
* Red Hat Product Errata: RHBA-2020:2409 (last updated 2020-08-04 18:03:46 UTC)
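
For context, the linked fix is in openshift-sdn, which is written in Go and shells out to the iptables binary when setting up pod rules. Below is a minimal, hypothetical sketch of the pattern the Doc Text describes, not the actual openshift/sdn code: the helper name and the specific rule are made up for illustration; only the "-w" flag is the point.

package main

import (
	"fmt"
	"os/exec"
)

// addPodRule is a hypothetical helper. Passing "-w" makes iptables block
// until it can take the xtables lock, instead of exiting with "Another
// app is currently holding the xtables lock."
func addPodRule(podIP string) error {
	args := []string{"-w", "-A", "FORWARD", "-s", podIP, "-j", "ACCEPT"}
	if out, err := exec.Command("iptables", args...).CombinedOutput(); err != nil {
		return fmt.Errorf("iptables %v failed: %v: %s", args, err, out)
	}
	return nil
}

func main() {
	// Example invocation with an illustrative pod IP.
	if err := addPodRule("10.128.0.5"); err != nil {
		fmt.Println(err)
	}
}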

Description Junqi Zhao 2020-03-05 11:58:10 UTC
Description of problem:
The openshift-state-metrics pod was scheduled on a RHEL worker but failed to start. This is the first time we have seen this error; it is not reproducible every time.
# oc -n openshift-monitoring get pod -o wide 
NAME                                           READY   STATUS              RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
...
openshift-state-metrics-5d4477d447-xjr5j       0/3     ContainerCreating   0          5h54m   <none>        qe-lpt-481-xmb5l-rhel-3            <none>           <none>

# oc -n openshift-monitoring describe pod openshift-state-metrics-5d4477d447-xjr5j
Events:
  Type     Reason                  Age                           From                              Message
  ----     ------                  ----                          ----                              -------
  Warning  FailedScheduling        5h38m (x9 over 5h45m)         default-scheduler                 0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling        5h37m (x2 over 5h37m)         default-scheduler                 0/4 nodes are available: 4 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling        5h37m (x2 over 5h37m)         default-scheduler                 0/5 nodes are available: 5 node(s) had taints that the pod didn't tolerate.
  Warning  FailedScheduling        5h37m                         default-scheduler                 0/7 nodes are available: 7 node(s) had taints that the pod didn't tolerate.
  Normal   Scheduled               5h37m                         default-scheduler                 Successfully assigned openshift-monitoring/openshift-state-metrics-5d4477d447-xjr5j to qe-lpt-481-xmb5l-rhel-3
  Warning  FailedMount             5h37m (x3 over 5h37m)         kubelet, qe-lpt-481-xmb5l-rhel-3  MountVolume.SetUp failed for volume "openshift-state-metrics-tls" : couldn't propagate object cache: timed out waiting for the condition
  Warning  FailedMount             5h37m (x3 over 5h37m)         kubelet, qe-lpt-481-xmb5l-rhel-3  MountVolume.SetUp failed for volume "openshift-state-metrics-token-g46pr" : couldn't propagate object cache: timed out waiting for the condition
  Warning  FailedCreatePodSandBox  5h36m                         kubelet, qe-lpt-481-xmb5l-rhel-3  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring_8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f_0(a2995de158cc47d5370774976284faba332fdb2589aa5ce011d45f3deaea1f1b): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": could not set up pod iptables rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
  Warning  FailedCreatePodSandBox  <invalid> (x1559 over 5h36m)  kubelet, qe-lpt-481-xmb5l-rhel-3  Failed create pod sandbox: rpc error: code = Unknown desc = pod sandbox with name "k8s_openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring_8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f_0" already exists

# oc get node  --show-labels
NAME                               STATUS   ROLES    AGE     VERSION             LABELS
qe-lpt-481-xmb5l-control-plane-0   Ready    master   7h56m   v1.14.6+6f6155bd9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lpt-481-xmb5l-control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lpt-481-xmb5l-control-plane-1   Ready    master   7h56m   v1.14.6+6f6155bd9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lpt-481-xmb5l-control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lpt-481-xmb5l-control-plane-2   Ready    master   7h56m   v1.14.6+6f6155bd9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lpt-481-xmb5l-control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos
qe-lpt-481-xmb5l-rhel-0            Ready    worker   6h58m   v1.14.6+6f6155bd9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lpt-481-xmb5l-rhel-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhel
qe-lpt-481-xmb5l-rhel-1            Ready    worker   6h58m   v1.14.6+6f6155bd9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lpt-481-xmb5l-rhel-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhel
qe-lpt-481-xmb5l-rhel-2            Ready    worker   6h58m   v1.14.6+6f6155bd9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lpt-481-xmb5l-rhel-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhel
qe-lpt-481-xmb5l-rhel-3            Ready    worker   6h58m   v1.14.6+6f6155bd9   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=qe-lpt-481-xmb5l-rhel-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhel


check the journal logs, same error:
******************************************************
Mar 04 23:57:19 qe-lpt-481-xmb5l-rhel-3 hyperkube[1258]: I0304 23:57:19.547093    1258 status_manager.go:524] Status for pod "alertmanager-main-2_openshift-monitoring(8dcc91a3-5e9c-11ea-8dc2-fa163effc51f)" updated successfully: (1, {Phase:Pending Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-03-04 23:57:00 -0500 EST Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-03-04 23:57:00 -0500 EST Reason:ContainersNotReady Message:containers with unready status: [alertmanager config-reloader alertmanager-proxy]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-03-04 23:57:00 -0500 EST Reason:ContainersNotReady Message:containers with unready status: [alertmanager config-reloader alertmanager-proxy]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-03-04 23:57:01 -0500 EST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.99.37 PodIP: StartTime:2020-03-04 23:57:00 -0500 EST InitContainerStatuses:[] ContainerStatuses:[{Name:alertmanager State:{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4abb5e5901b8a47af9c0e23455aeca604809fc25cd213af469bfe2ec82a3253 ImageID: ContainerID:} {Name:alertmanager-proxy State:{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7ee6e583b29b9879fe539674281a8c3a961c3599e76637299ec4265d08b4fd70 ImageID: ContainerID:} {Name:config-reloader State:{Waiting:&ContainerStateWaiting{Reason:ContainerCreating,Message:,} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:nil} Ready:false RestartCount:0 Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2e6be7edcd47f45897b42e052bd9ceebac65f9caa460bf49c7450cb130e228a9 ImageID: ContainerID:}] QOSClass:Burstable})
Mar 04 23:57:19 qe-lpt-481-xmb5l-rhel-3 hyperkube[1258]: E0304 23:57:19.584910    1258 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring_8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f_0(a2995de158cc47d5370774976284faba332fdb2589aa5ce011d45f3deaea1f1b): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": could not set up pod iptables rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Mar 04 23:57:19 qe-lpt-481-xmb5l-rhel-3 hyperkube[1258]: E0304 23:57:19.585072    1258 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring(8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring_8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f_0(a2995de158cc47d5370774976284faba332fdb2589aa5ce011d45f3deaea1f1b): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": could not set up pod iptables rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Mar 04 23:57:19 qe-lpt-481-xmb5l-rhel-3 hyperkube[1258]: E0304 23:57:19.585122    1258 kuberuntime_manager.go:697] createPodSandbox for pod "openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring(8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring_8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f_0(a2995de158cc47d5370774976284faba332fdb2589aa5ce011d45f3deaea1f1b): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": could not set up pod iptables rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Mar 04 23:57:19 qe-lpt-481-xmb5l-rhel-3 hyperkube[1258]: E0304 23:57:19.585337    1258 pod_workers.go:190] Error syncing pod 8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f ("openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring(8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f)"), skipping: failed to "CreatePodSandbox" for "openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring(8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f)" with CreatePodSandboxError: "CreatePodSandbox for pod \"openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring(8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f)\" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring_8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f_0(a2995de158cc47d5370774976284faba332fdb2589aa5ce011d45f3deaea1f1b): Multus: Err adding pod to network \"openshift-sdn\": Multus: error in invoke Delegate add - \"openshift-sdn\": could not set up pod iptables rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?"
Mar 04 23:57:19 qe-lpt-481-xmb5l-rhel-3 hyperkube[1258]: I0304 23:57:19.585395    1258 event.go:209] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-monitoring", Name:"openshift-state-metrics-5d4477d447-xjr5j", UID:"8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f", APIVersion:"v1", ResourceVersion:"37857", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring_8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f_0(a2995de158cc47d5370774976284faba332fdb2589aa5ce011d45f3deaea1f1b): Multus: Err adding pod to network "openshift-sdn": Multus: error in invoke Delegate add - "openshift-sdn": could not set up pod iptables rules: Another app is currently holding the xtables lock. Perhaps you want to use the -w option?
Mar 04 23:57:19 qe-lpt-481-xmb5l-rhel-3 hyperkube[1258]: I0304 23:57:19.637884    1258 kubelet_pods.go:1346] Generating status for "openshift-state-metrics-5d4477d447-xjr5j_openshift-monitoring(8d5a2f0e-5e9c-11ea-8dc2-fa163effc51f)"
******************************************************
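
The "xtables lock" in these messages is an exclusive flock(2) on /run/xtables.lock that every iptables invocation takes. A hedged reproduction sketch, assuming a Linux host with root privileges: holding that lock from another process makes iptables without "-w" fail immediately with exactly the message above, while with "-w" it would simply wait.

package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Take the xtables lock the same way iptables does: an exclusive
	// flock(2) on /run/xtables.lock.
	f, err := os.OpenFile("/run/xtables.lock", os.O_CREATE|os.O_RDWR, 0600)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		panic(err)
	}

	// While we hold the lock, iptables without -w fails with "Another app
	// is currently holding the xtables lock."; adding -w would make it
	// block until the lock is released.
	out, err := exec.Command("iptables", "-L", "-n").CombinedOutput()
	fmt.Printf("err=%v\noutput: %s", err, out)
}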

Version-Release number of selected component (if applicable):
4.2.22

How reproducible:
rarely

Steps to Reproduce:
1. See the description.

Actual results:
The pod remains in ContainerCreating. The first sandbox creation attempt fails with "could not set up pod iptables rules: Another app is currently holding the xtables lock", and every subsequent attempt fails with "pod sandbox with name ... already exists".

Expected results:
The pod sandbox is created and the pod starts normally.

Additional info:

Comment 4 zhaozhanqi 2020-03-10 06:43:52 UTC
Verified this bug on 4.5.0-0.nightly-2020-03-10-002435.

Comment 6 errata-xmlrpc 2020-08-04 18:03:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5 image release advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

