Description of problem:
The router pod goes into 'CrashLoopBackOff' when ROUTER_BIND_PORTS_AFTER_SYNC is set to true for the router. This issue happens in 'system container' install environments; the rpm install works well.

Version-Release number of selected component (if applicable):
oc v3.9.0-0.53.0
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
always

Steps to Reproduce:
1. Set up the env using the 'system container' install
2. Check that the router pod is running
3. Set ROUTER_BIND_PORTS_AFTER_SYNC to true for the router
   oc env dc router ROUTER_BIND_PORTS_AFTER_SYNC=true
4. Check the router pod

Actual results:
step 4: the router pod crashes, see 'oc describe pod router-10-xxx':

                 router=enabled
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason                 Age               From                                      Message
  ----     ------                 ----              ----                                      -------
  Warning  FailedScheduling       6m (x5 over 7m)   default-scheduler                         0/2 nodes are available: 1 MatchNodeSelector, 1 PodFitsHostPorts.
  Normal   Scheduled              6m                default-scheduler                         Successfully assigned router-10-t7fmv to qe-zzhao-node-registry-router-1
  Normal   SuccessfulMountVolume  6m                kubelet, qe-zzhao-node-registry-router-1  MountVolume.SetUp succeeded for volume "server-certificate"
  Normal   SuccessfulMountVolume  6m                kubelet, qe-zzhao-node-registry-router-1  MountVolume.SetUp succeeded for volume "router-token-sh6jl"
  Normal   Killing                5m (x2 over 6m)   kubelet, qe-zzhao-node-registry-router-1  Killing container with id docker://router:Container failed liveness probe.. Container will be killed and recreated.
  Normal   Created                5m (x3 over 6m)   kubelet, qe-zzhao-node-registry-router-1  Created container
  Normal   Started                5m (x3 over 6m)   kubelet, qe-zzhao-node-registry-router-1  Started container
  Warning  Unhealthy              5m (x6 over 6m)   kubelet, qe-zzhao-node-registry-router-1  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy              5m (x6 over 6m)   kubelet, qe-zzhao-node-registry-router-1  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled                 1m (x7 over 6m)   kubelet, qe-zzhao-node-registry-router-1  Container image "registry.reg-aws.openshift.com:443/openshift3/ose-haproxy-router:v3.9.0-0.53.0" already present on machine

Expected results:
The router can run with ROUTER_BIND_PORTS_AFTER_SYNC=true

Additional info:
FYI: the rpm install works well
Can you still reproduce this? I saw the behaviour, but after I deleted the first pod that the deployment attempted, the dc was able to spawn a valid pod. Does that happen on your setup?
Yes, this issue can still be reproduced in the system container installed env. The router cannot run when 'ROUTER_BIND_PORTS_AFTER_SYNC' is enabled, even if I delete the first attempted pod.
PR https://github.com/openshift/origin/issues/19009
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/978d2bc3de43445e4809193016ee7f658ca1348a
Differentiate liveness and readiness probes for router

Add a backend to the router controller "/livez" that always returns true. This differentiates the liveness and readiness probes so that a router can be alive and not ready.

Bug 1550007
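To illustrate the idea in the commit message above, here is a minimal Python sketch of a process that is alive but not yet ready. It is not the router's actual code (the real fix lives in the openshift/origin Go sources). Only the "/livez" path comes from the commit message; the "/healthz" readiness path, the RouterState class, and the helper names are assumptions made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RouterState:
    """Hypothetical stand-in for the router's 'initial sync done' flag."""

    def __init__(self):
        self.synced = threading.Event()


def make_handler(state):
    """Build a request handler bound to the given state object."""

    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/livez":
                # Liveness: always succeeds while the process can answer.
                self._reply(200, "ok")
            elif self.path == "/healthz":
                # Readiness (assumed path): succeeds only after the sync.
                if state.synced.is_set():
                    self._reply(200, "ok")
                else:
                    self._reply(500, "not synced")
            else:
                self._reply(404, "not found")

        def _reply(self, code, body):
            data = body.encode()
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, fmt, *args):
            # Keep the example quiet; drop per-request logging.
            pass

    return ProbeHandler


def start_server(state):
    """Start the probe server on an ephemeral local port in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port, path):
    """Issue a GET like an HTTP probe would and return the status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()
```

With a single shared endpoint, a 500 before the sync completes fails the liveness probe too, which matches the "Liveness probe failed: HTTP probe failed with statuscode: 500" events in the description. With separate endpoints, a liveness probe pointed at "/livez" keeps passing while readiness reports not ready until the state is marked as synced.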
Verified in openshift v3.11.0-0.21.0; the issue has been fixed.

Operating System: Red Hat Enterprise Linux Atomic Host release 7.5
Cluster Install Method: system container
kernel: Linux qe-master-etcd-1 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2652