Bug 1421035

Summary: Router pod keeps restarting then becomes CrashLoopBackOff in container network mode
Product: OpenShift Container Platform
Component: Networking
Networking sub component: router
Reporter: Yan Du <yadu>
Assignee: Weibin Liang <weliang>
QA Contact: zhaozhanqi <zzhao>
CC: aos-bugs, bbennett, ichavero, weliang, yadu
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: 3.5.0
Keywords: Reopened
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Story Points: ---
Last Closed: 2017-03-01 17:01:06 UTC
Type: Bug
Attachments: kube describe router

Description Yan Du 2017-02-10 07:51:19 UTC
Description of problem:
When a router is created in container network mode (with the option --host-network=false), the router pod keeps restarting and eventually goes into CrashLoopBackOff.


Version-Release number of selected component (if applicable):
openshift v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4
etcd 3.1.0
ose-haproxy-router   v3.5.0.18           109538c1aad4


How reproducible:
always


Steps to Reproduce:
1. Create router in container network mode
# oadm router router --host-network=false


Actual results:
The router pod keeps restarting and then goes into CrashLoopBackOff:
# oc get pod -w
NAME             READY     STATUS    RESTARTS   AGE
router-2-ln8h9   0/1       Running   5         2m
router-2-ln8h9   0/1       CrashLoopBackOff   5         3m


The relevant error events from the pod are shown below:

  11m    11m    1    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Created        Created container with docker id 4c939ff9195b; Security:[seccomp=unconfined]
  11m    11m    1    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Started        Started container with docker id 4c939ff9195b
  11m    11m    1    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Killing        Killing container with docker id 4c939ff9195b: pod "router-2-ln8h9_default(c2b5b998-ef5b-11e6-8298-42010af00031)" container "router" is unhealthy, it will be killed and re-created.
  5m    5m    1    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Created        Created container with docker id 92bea969e15f; Security:[seccomp=unconfined]
  5m    5m    1    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Started        Started container with docker id 92bea969e15f
  5m    5m    1    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Killing        Killing container with docker id 92bea969e15f: pod "router-2-ln8h9_default(c2b5b998-ef5b-11e6-8298-42010af00031)" container "router" is unhealthy, it will be killed and re-created.
  18m    5m    12    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Warning    Unhealthy    Readiness probe failed: Get http://localhost:1936/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  18m    5m    12    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Warning    Unhealthy    Liveness probe failed: Get http://localhost:1936/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  5m    5m    1    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Killing        (events with common reason combined)
  11m    15s    48    {kubelet qe-yadu-node-registry-router-1}                Warning    FailedSync    Error syncing pod, skipping: failed to "StartContainer" for "router" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=router pod=router-2-ln8h9_default(c2b5b998-ef5b-11e6-8298-42010af00031)"

  16m    15s    73    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Warning    BackOff    Back-off restarting failed docker container
  18m    3s    11    {kubelet qe-yadu-node-registry-router-1}    spec.containers{router}    Normal    Pulled    Container image "registry.ops.openshift.com/openshift3/ose-haproxy-router:v3.5.0.18" already present on machine


Expected results:
The router should work in container network mode.
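The failing readiness and liveness probes in the events above are plain HTTP checks against http://localhost:1936/healthz. A minimal sketch of how to inspect the probe definition and hit the endpoint by hand (the pod name is a placeholder, and this assumes curl is available in the router image):

# oc get dc/router -o yaml | grep -B 2 -A 6 livenessProbe
# oc exec <router-pod> -- curl -sS --max-time 5 http://localhost:1936/healthz

If the manual request also times out, the probe failure is reproducible outside the kubelet.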

Comment 1 Ben Bennett 2017-02-14 13:54:30 UTC
Jake can't reproduce this with the latest Origin.  Weibin, can you try with OSE please?

Comment 2 Weibin Liang 2017-02-14 14:41:34 UTC
The same v3.5.0.18 build fails in my environment.


[root@dhcp-41-87 byo]# oc version
oc v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp-41-87.bos.redhat.com:8443
openshift v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4
[root@dhcp-41-87 byo]# 
[root@dhcp-41-87 byo]# oc new-project http
Already on project "http" on server "https://dhcp-41-87.bos.redhat.com:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
[root@dhcp-41-87 byo]# oadm policy add-scc-to-user privileged -z http-user
[root@dhcp-41-87 byo]# oadm router router-http --replicas=1  --service-account=http-user -n http --host-network=false
info: password for stats user admin has been set to IS4WwULmfW
--> Creating router router-http ...
    serviceaccount "http-user" created
    warning: clusterrolebinding "router-router-http-role" already exists
    deploymentconfig "router-http" created
    service "router-http" created
--> Success
[root@dhcp-41-87 byo]# oc get pods
NAME                   READY     STATUS              RESTARTS   AGE
router-http-1-deploy   0/1       ContainerCreating   0          4s
[root@dhcp-41-87 byo]# oc get pods
NAME                   READY     STATUS         RESTARTS   AGE
router-http-1-4b3g8    0/1       ErrImagePull   0          12s
router-http-1-deploy   1/1       Running        0          19s
[root@dhcp-41-87 byo]# oc logs router-http-1-4b3g8
Error from server (BadRequest): container "router" in pod "router-http-1-4b3g8" is waiting to start: trying and failing to pull image
[root@dhcp-41-87 byo]# oc get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                  KIND      SUBOBJECT                 TYPE      REASON       SOURCE                                 MESSAGE
1m         1m          1         router-http-1-4b3g8   Pod                                 Normal    Scheduled    {default-scheduler }                   Successfully assigned router-http-1-4b3g8 to dhcp-41-106.bos.redhat.com
3s         1m          4         router-http-1-4b3g8   Pod       spec.containers{router}   Normal    Pulling      {kubelet dhcp-41-106.bos.redhat.com}   pulling image "openshift/origin-haproxy-router:v3.5.0.18"
43s        1m          3         router-http-1-4b3g8   Pod       spec.containers{router}   Warning   Failed       {kubelet dhcp-41-106.bos.redhat.com}   Failed to pull image "openshift/origin-haproxy-router:v3.5.0.18": manifest unknown: manifest unknown
43s        1m          3         router-http-1-4b3g8   Pod                                 Warning   FailedSync   {kubelet dhcp-41-106.bos.redhat.com}   Error syncing pod, skipping: failed to "StartContainer" for "router" with ErrImagePull: "manifest unknown: manifest unknown"

1m        1m        1         router-http-1-4b3g8   Pod                 Warning   FailedSync   {kubelet dhcp-41-106.bos.redhat.com}   Error syncing pod, skipping: failed to "SetupNetwork" for "router-http-1-4b3g8_http" with SetupNetworkError: "Failed to setup network for pod \"router-http-1-4b3g8_http(29c09530-f2c3-11e6-996e-525400bef5b7)\" using network plugins \"cni\": CNI request failed with status 400: 'Cannot open hostport 80 for pod router-http-1-4b3g8_http: listen tcp :80: bind: address already in use\n'; Skipping pod"

17s       1m        6         router-http-1-4b3g8   Pod       spec.containers{router}   Normal    BackOff      {kubelet dhcp-41-106.bos.redhat.com}   Back-off pulling image "openshift/origin-haproxy-router:v3.5.0.18"
17s       1m        6         router-http-1-4b3g8   Pod                                 Warning   FailedSync   {kubelet dhcp-41-106.bos.redhat.com}   Error syncing pod, skipping: failed to "StartContainer" for "router" with ImagePullBackOff: "Back-off pulling image \"openshift/origin-haproxy-router:v3.5.0.18\""

1m        1m        1         router-http-1-deploy   Pod                                                   Normal    Scheduled           {default-scheduler }                  Successfully assigned router-http-1-deploy to dhcp-41-55.bos.redhat.com
1m        1m        1         router-http-1-deploy   Pod                     spec.containers{deployment}   Normal    Pulled              {kubelet dhcp-41-55.bos.redhat.com}   Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/openshift3/ose-deployer:v3.5.0.18" already present on machine
1m        1m        1         router-http-1-deploy   Pod                     spec.containers{deployment}   Normal    Created             {kubelet dhcp-41-55.bos.redhat.com}   Created container with docker id cb0bffdbec33; Security:[seccomp=unconfined]
1m        1m        1         router-http-1-deploy   Pod                     spec.containers{deployment}   Normal    Started             {kubelet dhcp-41-55.bos.redhat.com}   Started container with docker id cb0bffdbec33
1m        1m        1         router-http-1          ReplicationController                                 Normal    SuccessfulCreate    {replication-controller }             Created pod: router-http-1-4b3g8
1m        1m        1         router-http            DeploymentConfig                                      Normal    DeploymentCreated   {deploymentconfig-controller }        Created new replication controller "router-http-1" for version 1
[root@dhcp-41-87 byo]#

Comment 3 Weibin Liang 2017-02-14 16:08:59 UTC
The router pod can be deployed after running "oadm router router-container --replicas=1 --service-account=https-user -n https --images='openshift3/ose-${component}:${version}' --host-network=false".

[root@dhcp-41-87 origin]# oc version
oc v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dhcp-41-87.bos.redhat.com:8443
openshift v3.5.0.18+9a5d1aa
kubernetes v1.5.2+43a9be4

[root@dhcp-41-87 origin]# oadm router router-container --replicas=1 --service-account=https-user -n https --images='openshift3/ose-${component}:${version}' --host-network=false

[root@dhcp-41-87 origin]# oc get pods -o wide -w
NAME                       READY     STATUS    RESTARTS   AGE       IP            NODE
router-container-1-nffrd   1/1       Running   0          8m        10.130.0.17   dhcp-41-106.bos.redhat.com
router1-1-rx2h6            1/1       Running   0          9m        10.18.41.55   dhcp-41-55.bos.redhat.com


I also tried --host-network=false on AWS with v3.5.0.20; it works fine there too.

Comment 4 Yan Du 2017-02-15 06:48:17 UTC
I can still reproduce the issue with the latest OCP environment:
openshift v3.5.0.20+87266c6
kubernetes v1.5.2+43a9be4
etcd 3.1.0


After changing hostNetwork to false, monitoring the pod:
# oc get pod -w
router-2-7jps5   0/1       Pending   0         1s
router-2-7jps5   0/1       ContainerCreating   0         1s
router-2-deploy   0/1       Completed   0         7s
router-2-deploy   0/1       Terminating   0         7s
router-2-deploy   0/1       Terminating   0         7s
router-2-7jps5   0/1       Running   0         4s
router-2-7jps5   0/1       Running   1         44s
router-2-7jps5   0/1       Running   2         1m
router-2-7jps5   0/1       Running   3         1m
router-2-7jps5   0/1       Running   4         1m
router-2-7jps5   0/1       CrashLoopBackOff   4         2m

Attaching part of the log:
r}	Normal	Started		Started container with docker id 7d8478ed2dd6
  3m	3m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Killing		Killing container with docker id 93465e7c66dd: pod "router-2-7jps5_default(dfff0717-f349-11e6-8d27-fa163ebf4833)" container "router" is unhealthy, it will be killed and re-created.
  3m	3m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Created		Created container with docker id 7d8478ed2dd6; Security:[seccomp=unconfined]
  3m	3m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Killing		Killing container with docker id 7d8478ed2dd6: pod "router-2-7jps5_default(dfff0717-f349-11e6-8d27-fa163ebf4833)" container "router" is unhealthy, it will be killed and re-created.
  3m	3m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Started		Started container with docker id 93c43f56795a
  3m	3m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Created		Created container with docker id 93c43f56795a; Security:[seccomp=unconfined]
  2m	2m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Killing		Killing container with docker id 93c43f56795a: pod "router-2-7jps5_default(dfff0717-f349-11e6-8d27-fa163ebf4833)" container "router" is unhealthy, it will be killed and re-created.
  2m	2m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Created		Created container with docker id 01a67a2462cc; Security:[seccomp=unconfined]
  2m	2m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Started		Started container with docker id 01a67a2462cc
  2m	2m	1	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}	spec.containers{router}	Normal	Killing		Killing container with docker id 01a67a2462cc: pod "router-2-7jps5_default(dfff0717-f349-11e6-8d27-fa163ebf4833)" container "router" is unhealthy, it will be killed and re-created.
  2m	2m	4	{kubelet host-8-174-95.host.centralci.eng.rdu2.redhat.com}				Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "router" with CrashLoopBackOff: "Back-off 40s restarting failed container=router pod=router-2-7jps5_default(dfff0717-f349-11e6-8d27-fa163ebf4833)"

Comment 6 Weibin Liang 2017-02-15 15:47:17 UTC
Yandu,

I logged in to your master. After deleting your existing router and recreating it with "oadm router router --images='openshift3/ose-${component}:${version}' --host-network=false", the router has been stable for 15 minutes.

When the router is created without the --images option, you will see the ErrImagePull error; DE is working on that issue now.


[root@host-8-175-216 ~]# oc get all
NAME                  DOCKER REPO                                   TAGS      UPDATED
is/registry-console   172.30.222.93:5000/default/registry-console   3.5       

NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/docker-registry    2          1         1         config
dc/registry-console   1          1         1         config
dc/router             5          1         1         config

NAME                    DESIRED   CURRENT   READY     AGE
rc/docker-registry-1    0         0         0         1d
rc/docker-registry-2    1         1         1         1d
rc/registry-console-1   1         1         1         1d
rc/router-3             0         0         0         29m
rc/router-4             0         0         0         25m
rc/router-5             1         1         0         22m

NAME                      HOST/PORT                                          PATH      SERVICES           PORT               TERMINATION   WILDCARD
routes/docker-registry    docker-registry-default.0214-bfs.qe.rhcloud.com              docker-registry    5000-tcp           passthrough   None
routes/registry-console   registry-console-default.0214-bfs.qe.rhcloud.com             registry-console   registry-console   passthrough   None

NAME                   CLUSTER-IP       EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry    172.30.222.93    <none>        5000/TCP                  1d
svc/kubernetes         172.30.0.1       <none>        443/TCP,53/UDP,53/TCP     1d
svc/registry-console   172.30.190.18    <none>        9000/TCP                  1d
svc/router             172.30.211.236   <none>        80/TCP,443/TCP,1936/TCP   1d

NAME                          READY     STATUS             RESTARTS   AGE
po/docker-registry-2-z2ltg    1/1       Running            2          1d
po/registry-console-1-7gqhf   1/1       Running            2          1d
po/router-5-deploy            1/1       Running            0          8m
po/router-5-h1tkw             0/1       CrashLoopBackOff   7          8m
[root@host-8-175-216 ~]# oc delete dc/router
deploymentconfig "router" deleted
[root@host-8-175-216 ~]# oc delete svc/router
service "router" deleted
[root@host-8-175-216 ~]# oc get pods
NAME                       READY     STATUS      RESTARTS   AGE
docker-registry-2-z2ltg    1/1       Running     2          1d
registry-console-1-7gqhf   1/1       Running     2          1d
router-5-deploy            0/1       Completed   0          8m
[root@host-8-175-216 ~]# oc delete po/router-5-deploy
pod "router-5-deploy" deleted
[root@host-8-175-216 ~]# oc delete svc/router
Error from server (NotFound): services "router" not found
[root@host-8-175-216 ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-z2ltg    1/1       Running   2          1d
registry-console-1-7gqhf   1/1       Running   2          1d
[root@host-8-175-216 ~]# oc get all
NAME                  DOCKER REPO                                   TAGS      UPDATED
is/registry-console   172.30.222.93:5000/default/registry-console   3.5       

NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/docker-registry    2          1         1         config
dc/registry-console   1          1         1         config

NAME                    DESIRED   CURRENT   READY     AGE
rc/docker-registry-1    0         0         0         1d
rc/docker-registry-2    1         1         1         1d
rc/registry-console-1   1         1         1         1d

NAME                      HOST/PORT                                          PATH      SERVICES           PORT               TERMINATION   WILDCARD
routes/docker-registry    docker-registry-default.0214-bfs.qe.rhcloud.com              docker-registry    5000-tcp           passthrough   None
routes/registry-console   registry-console-default.0214-bfs.qe.rhcloud.com             registry-console   registry-console   passthrough   None

NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)                 AGE
svc/docker-registry    172.30.222.93   <none>        5000/TCP                1d
svc/kubernetes         172.30.0.1      <none>        443/TCP,53/UDP,53/TCP   1d
svc/registry-console   172.30.190.18   <none>        9000/TCP                1d

NAME                          READY     STATUS    RESTARTS   AGE
po/docker-registry-2-z2ltg    1/1       Running   2          1d
po/registry-console-1-7gqhf   1/1       Running   2          1d
[root@host-8-175-216 ~]# oadm router router --images='openshift3/ose-${component}:${version}' --host-network=false
info: password for stats user admin has been set to MjkCCV8Wyy
--> Creating router router ...
    warning: serviceaccounts "router" already exists
    warning: clusterrolebinding "router-router-role" already exists
    deploymentconfig "router" created
    service "router" created
--> Success

[root@host-8-175-216 ~]# oc get all
NAME                  DOCKER REPO                                   TAGS      UPDATED
is/registry-console   172.30.222.93:5000/default/registry-console   3.5       

NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/docker-registry    2          1         1         config
dc/registry-console   1          1         1         config
dc/router             1          1         1         config

NAME                    DESIRED   CURRENT   READY     AGE
rc/docker-registry-1    0         0         0         1d
rc/docker-registry-2    1         1         1         1d
rc/registry-console-1   1         1         1         1d
rc/router-1             1         1         0         9s

NAME                      HOST/PORT                                          PATH      SERVICES           PORT               TERMINATION   WILDCARD
routes/docker-registry    docker-registry-default.0214-bfs.qe.rhcloud.com              docker-registry    5000-tcp           passthrough   None
routes/registry-console   registry-console-default.0214-bfs.qe.rhcloud.com             registry-console   registry-console   passthrough   None

NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry    172.30.222.93   <none>        5000/TCP                  1d
svc/kubernetes         172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     1d
svc/registry-console   172.30.190.18   <none>        9000/TCP                  1d
svc/router             172.30.13.116   <none>        80/TCP,443/TCP,1936/TCP   9s

NAME                          READY     STATUS    RESTARTS   AGE
po/docker-registry-2-z2ltg    1/1       Running   2          1d
po/registry-console-1-7gqhf   1/1       Running   2          1d
po/router-1-21xtk             0/1       Running   0          7s
po/router-1-deploy            1/1       Running   0          9s
[root@host-8-175-216 ~]# oc get pods -w
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-2-z2ltg    1/1       Running   2          1d
registry-console-1-7gqhf   1/1       Running   2          1d
router-1-21xtk             0/1       Running   0          17s
router-1-deploy            1/1       Running   0          19s
NAME             READY     STATUS    RESTARTS   AGE
router-1-21xtk   1/1       Running   0          21s
router-1-deploy   0/1       Completed   0         23s
router-1-deploy   0/1       Terminating   0         23s
router-1-deploy   0/1       Terminating   0         23s
^C[root@host-8-175-216 ~]# oc get all
NAME                  DOCKER REPO                                   TAGS      UPDATED
is/registry-console   172.30.222.93:5000/default/registry-console   3.5       

NAME                  REVISION   DESIRED   CURRENT   TRIGGERED BY
dc/docker-registry    2          1         1         config
dc/registry-console   1          1         1         config
dc/router             1          1         1         config

NAME                    DESIRED   CURRENT   READY     AGE
rc/docker-registry-1    0         0         0         1d
rc/docker-registry-2    1         1         1         1d
rc/registry-console-1   1         1         1         1d
rc/router-1             1         1         1         4m

NAME                      HOST/PORT                                          PATH      SERVICES           PORT               TERMINATION   WILDCARD
routes/docker-registry    docker-registry-default.0214-bfs.qe.rhcloud.com              docker-registry    5000-tcp           passthrough   None
routes/registry-console   registry-console-default.0214-bfs.qe.rhcloud.com             registry-console   registry-console   passthrough   None

NAME                   CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
svc/docker-registry    172.30.222.93   <none>        5000/TCP                  1d
svc/kubernetes         172.30.0.1      <none>        443/TCP,53/UDP,53/TCP     1d
svc/registry-console   172.30.190.18   <none>        9000/TCP                  1d
svc/router             172.30.13.116   <none>        80/TCP,443/TCP,1936/TCP   4m

NAME                          READY     STATUS    RESTARTS   AGE
po/docker-registry-2-z2ltg    1/1       Running   2          1d
po/registry-console-1-7gqhf   1/1       Running   2          1d
po/router-1-21xtk             1/1       Running   0          4m
[root@host-8-175-216 ~]# oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP            NODE
docker-registry-2-z2ltg    1/1       Running   2          1d        10.129.0.19   host-8-174-95.host.centralci.eng.rdu2.redhat.com
registry-console-1-7gqhf   1/1       Running   2          1d        10.129.0.20   host-8-174-95.host.centralci.eng.rdu2.redhat.com
router-1-21xtk             1/1       Running   0          10m       10.129.0.39   host-8-174-95.host.centralci.eng.rdu2.redhat.com
[root@host-8-175-216 ~]# 

[root@host-8-175-216 ~]# oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP            NODE
docker-registry-2-z2ltg    1/1       Running   2          1d        10.129.0.19   host-8-174-95.host.centralci.eng.rdu2.redhat.com
registry-console-1-7gqhf   1/1       Running   2          1d        10.129.0.20   host-8-174-95.host.centralci.eng.rdu2.redhat.com
router-1-21xtk             1/1       Running   0          13m       10.129.0.39   host-8-174-95.host.centralci.eng.rdu2.redhat.com
[root@host-8-175-216 ~]# 
[root@host-8-175-216 ~]# oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP            NODE
docker-registry-2-z2ltg    1/1       Running   2          1d        10.129.0.19   host-8-174-95.host.centralci.eng.rdu2.redhat.com
registry-console-1-7gqhf   1/1       Running   2          1d        10.129.0.20   host-8-174-95.host.centralci.eng.rdu2.redhat.com
router-1-21xtk             1/1       Running   0          15m       10.129.0.39   host-8-174-95.host.centralci.eng.rdu2.redhat.com
[root@host-8-175-216 ~]#

Comment 7 Ben Bennett 2017-02-15 15:55:40 UTC
How did you deploy?  I went to your system and ran:
  oadm router --host-network=false --images='openshift3/ose-${component}:${version}'

And that image is happily running.

Somehow, before, the liveness checks were contacting localhost directly, but localhost was resolving to the IPv6 address and the checks were failing (look in /etc/hosts).
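A quick way to confirm that kind of name-resolution problem (a sketch, not commands captured from the affected host):

# getent hosts localhost
# curl -g -sS --max-time 5 'http://[::1]:1936/healthz'; echo
# curl -sS --max-time 5 'http://127.0.0.1:1936/healthz'; echo

If only the IPv4 request answers, the probe's http://localhost:1936/healthz URL is failing on name resolution rather than on the router itself.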

Comment 8 Yan Du 2017-02-16 03:21:51 UTC
I found the steps to reproduce the bug:

1. Create router 
# oadm router router --images='openshift3/ose-${component}:${version}'
info: password for stats user admin has been set to YzFMuOsfDv
--> Creating router router ...
    warning: serviceaccounts "router" already exists
    warning: clusterrolebinding "router-router-role" already exists
    deploymentconfig "router" created
    service "router" created
--> Success

# oc get pod
NAME                       READY     STATUS    RESTARTS   AGE
router-1-pb4d6             1/1       Running   0          42s

2. Edit the dc of the router and set hostNetwork: false (it was hostNetwork: true). The router is then re-deployed, and the pod keeps restarting and goes into CrashLoopBackOff.
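For reference, the same change can be made non-interactively; this is a sketch of an equivalent patch (the steps above used a manual oc edit):

# oc patch dc/router -p '{"spec":{"template":{"spec":{"hostNetwork":false}}}}'

Since the deploymentconfig has a config-change trigger, the patch rolls out a new deployment just like the manual edit does.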

# oc get pod -w
NAME                       READY     STATUS    RESTARTS   AGE
router-2-5s803   0/1       Running   1         43s
router-2-5s803   0/1       Running   2         1m
router-2-5s803   0/1       Running   3         1m
router-2-5s803   0/1       Running   4         1m
router-2-5s803   0/1       CrashLoopBackOff   4         2m
router-2-5s803   0/1       Running   5         2m
router-2-5s803   0/1       CrashLoopBackOff   5         3m

But if I create the router with --host-network=false directly, it runs successfully.
# oadm router router --images='openshift3/ose-${component}:${version}' --host-network=false
info: password for stats user admin has been set to 16ir20Yf1h
--> Creating router router ...
    warning: serviceaccounts "router" already exists
    warning: clusterrolebinding "router-router-role" already exists
    deploymentconfig "router" created
    service "router" created
--> Success

# oc get pod
NAME                       READY     STATUS    RESTARTS   AGE
router-1-gghdn             1/1       Running   0          1m

So I'm not quite sure what the difference is between [1] creating the router with --host-network=false and [2] updating the router dc to hostNetwork: false and re-deploying it, but both ways worked well before.
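One way to see what differs between the two paths is to dump both specs and diff them (a sketch; the router name and file names here are made up):

# oc get dc/router -o yaml > dc-edited.yaml
# oadm router router-check --dry-run -o yaml --images='openshift3/ose-${component}:${version}' --host-network=false > dc-generated.yaml
# diff dc-edited.yaml dc-generated.yaml

Comparing the hostNetwork field and the probe definitions in the two outputs should show whether the hand-edited dc really ends up equivalent to the generated one.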

Comment 9 Weibin Liang 2017-02-16 18:39:54 UTC
Ben, following the same steps mentioned in comment 8, after editing dc/router to change hostNetwork to false, I see the same router pod CrashLoopBackOff issue in my local environment.

Comment 10 Ivan Chavero 2017-02-21 07:38:29 UTC
Created attachment 1255994 [details]
kube describe router

Comment 11 Ivan Chavero 2017-02-21 07:39:02 UTC
According to kubectl describe, the router container has problems executing. Checking other logs.

Comment 12 Ivan Chavero 2017-02-23 22:07:41 UTC
Could not reproduce this problem from the latest source: v3.5.0.32-1+4f84c83-1100

Weibin, can you confirm this?

Comment 13 Yan Du 2017-02-24 08:48:56 UTC
Retested on OCP 3.5.0.33:
ose-haproxy-router    v3.5.0.33           25f705e32e9b

The issue can still be reproduced.
[root@host-8-174-87 ~]# oc get pod -w
NAME             READY     STATUS    RESTARTS   AGE
router-2-fc97k   0/1       Running   1          44s
router-2-fc97k   0/1       Running   2         1m
router-2-fc97k   0/1       Running   3         1m
router-2-fc97k   0/1       Running   4         1m
router-2-fc97k   0/1       CrashLoopBackOff   4

Comment 14 Ivan Chavero 2017-02-28 18:21:04 UTC
I was testing on an all-in-one setup; I'm checking with a multi-node setup to see if this is part of the problem.

Comment 15 Ben Bennett 2017-03-01 17:01:06 UTC
Yan Du:

Somehow, before, the liveness checks were contacting localhost directly, but localhost was resolving to the IPv6 address and the checks were failing (look in /etc/hosts).

If you remove the IPv6 address for localhost, then there is no problem.
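A sketch of the check and workaround (exact /etc/hosts contents vary by host, so edit with care):

# grep -n '^::1' /etc/hosts

A typical problematic entry looks like:

  ::1   localhost localhost.localdomain localhost6 localhost6.localdomain6

Removing "localhost" (and "localhost.localdomain") from that line leaves localhost resolving only through the 127.0.0.1 entry, so the probe URL http://localhost:1936/healthz goes back to IPv4:

# sed -i 's/^::1[[:space:]].*/::1   localhost6 localhost6.localdomain6/' /etc/hosts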