Created attachment 1339803 [details]
Events from jenkins pod

Description of problem:

[root@starter-ca-central-1-master-692e9 ~]# oc get pods
NAME               READY     STATUS              RESTARTS   AGE
jenkins-1-57f3m    0/1       ContainerCreating   0          6m
jenkins-1-deploy   1/1       Running             0          6m

[root@starter-ca-central-1-master-692e9 ~]# oc get pods
NAME               READY     STATUS    RESTARTS   AGE
jenkins-1-deploy   0/1       Error     0          10m

[root@starter-ca-central-1-master-692e9 ~]# oc logs jenkins-1-deploy
--> Scaling jenkins-1 to 1
error: update acceptor rejected jenkins-1: pods for rc 'jmp-test/jenkins-1' took longer than 600 seconds to become available

Version-Release number of selected component (if applicable):
Master: oc v3.7.0-0.143.3
Nodes: oc v3.6 GA

Steps to Reproduce:
1. Instantiate the Jenkins ephemeral template
$ oc get pod -o wide
NAME               READY     STATUS    RESTARTS   AGE       IP             NODE
jenkins-1-deploy   0/1       Error     0          1h        10.131.34.57   ip-172-31-20-86.ca-central-1.compute.internal

On the node the pod was assigned to:

$ oc describe node ip-172-31-25-45.ca-central-1.compute.internal
...
Events:
  FirstSeen  LastSeen  Count  From                                                    SubObjectPath  Type     Reason        Message
  ---------  --------  -----  ----                                                    -------------  -------- ------        -------
  8d         1h        202    kubelet, ip-172-31-25-45.ca-central-1.compute.internal                 Normal   NodeNotReady  Node ip-172-31-25-45.ca-central-1.compute.internal status is now: NodeNotReady
  8d         1h        202    kubelet, ip-172-31-25-45.ca-central-1.compute.internal                 Normal   NodeReady     Node ip-172-31-25-45.ca-central-1.compute.internal status is now: NodeReady
  2d         41m       10     kubelet, ip-172-31-25-45.ca-central-1.compute.internal                 Warning  SystemOOM     System OOM encountered

So this node is not healthy, which explains the timeout.
The deploy worked for me:

$ oc project
Using project "sjenning-demo" on server "https://api.starter-ca-central-1.openshift.com:443".

$ oc get pod -o wide
NAME              READY     STATUS    RESTARTS   AGE       IP              NODE
jenkins-1-16hqj   1/1       Running   0          11m       10.131.38.196   ip-172-31-22-176.ca-central-1.compute.internal

However, I was on the edge of timing out. The sandbox creation keeps failing due to the iptables issue:

RunPodSandbox from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod "jenkins-1-deploy_sjenning-demo" network: CNI request failed with status 400: 'Failed to execute iptables-restore: exit status 4 (Another app is currently holding the xtables lock. Perhaps you want to use the -w option?

Eventually it did succeed.
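To illustrate why lock contention puts the deploy "on the edge of timing out", here is a minimal sketch of a retry loop racing a deadline. It is not the kubelet's or the CNI plugin's actual logic; `retry_until_deadline`, its parameters, and the backoff values are hypothetical names chosen for illustration. The only values taken from this report are exit status 4 (from the iptables-restore error above) and the 600-second deployment timeout.

```python
import time
from typing import Callable, Optional

# Exit status reported in the iptables-restore error above.
XTABLES_LOCK_BUSY = 4


def retry_until_deadline(
    attempt: Callable[[], int],
    deadline_s: float,
    initial_delay_s: float = 0.5,   # assumed value, for illustration only
    max_delay_s: float = 8.0,       # assumed value, for illustration only
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call attempt() until it returns 0 or the deadline passes.

    Returns the number of attempts made on success, or None on timeout.
    Any exit status other than 0 or XTABLES_LOCK_BUSY raises RuntimeError.
    """
    start = clock()
    delay = initial_delay_s
    attempts = 0
    while True:
        attempts += 1
        status = attempt()
        if status == 0:
            return attempts
        if status != XTABLES_LOCK_BUSY:
            raise RuntimeError(f"unexpected exit status {status}")
        remaining = deadline_s - (clock() - start)
        if remaining <= 0:
            return None  # deadline exceeded while the lock was still held
        # Never sleep past the deadline; double the delay up to the cap.
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

With `deadline_s=600`, a sandbox setup that keeps hitting the lock either succeeds after some number of retries (as in my case) or runs out the clock (as in the original report).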
Created attachment 1341136 [details]
node-combined.log

These are the interleaved logs from the node running the deploy pod (jenkins-1-deploy) and the node running the pod to be deployed (jenkins-1-16hqj).
I imagine there is already a bug tracking this, but I can't find it at the moment. Sending to Networking for processing.
*** Bug 1505167 has been marked as a duplicate of this bug. ***
This is probably the same as https://bugzilla.redhat.com/show_bug.cgi?id=1451902

I'm checking that we have the patch that reduces the number of calls to iptables, but the real fix will come when the kernel change for https://bugzilla.redhat.com/show_bug.cgi?id=1503702 lands.
*** This bug has been marked as a duplicate of bug 1451902 ***