Bug 1384541 - CreatingLoadBalancerFailed - routes return http 503
Summary: CreatingLoadBalancerFailed - routes return http 503
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Routing
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-13 13:50 UTC by Jiří Fiala
Modified: 2016-10-26 19:41 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-26 19:41:20 UTC
Target Upstream Version:
Embargoed:


Attachments
events and routes (138.13 KB, image/png), 2016-10-13 13:50 UTC, Jiří Fiala

Description Jiří Fiala 2016-10-13 13:50:39 UTC
Created attachment 1210124 [details]
events and routes

Description of problem:
A user reported HTTP 503 when accessing their app on the dev-preview public cluster, even though the app otherwise appears to be working (no errors while deploying, no errors in the logs):
http://windward-php-windward.44fs.preview.openshiftapps.com/
A screenshot of the user's events and route descriptions is attached.

I was able to induce this by deploying the cakephp example app from source (https://github.com/openshift/cakephp-ex.git):
http://cakephp-pub.44fs.preview.openshiftapps.com

The app itself seems to be working just fine: when rsh'd into the pod, the content is served. When accessed via the route, it returns 503.
Aside from repeated warnings that the service is having trouble creating a load balancer:
-----
18s       4h        116       cakephp     Service                           Warning   CreatingLoadBalancerFailed   {service-controller }            (events with common reason combined)
-----
...only the following warning was presented for the deploy pod:
-----
1h         1h          1         cakephp-2-deploy   Pod                                                   Warning   FailedSync         {kubelet ip-172-31-8-217.ec2.internal}    Error syncing pod, skipping: failed to "TeardownNetwork" for "cakephp-2-deploy_pub" with TeardownNetworkError: "Failed to teardown network for pod \"61c98211-9138-11e6-ba92-0e63b9c1c48f\" using network plugins \"redhat/openshift-ovs-multitenant\": Error running network teardown script: Could not find IP address for container a4b042079b8dfb79c860881e7ca78c1a077935ce80426d02810264942d6a727d"
-----

Here are all recent events:
-----
$ oc get events | grep cakephp
1h         1h          1         cakephp-1-93ox8    Pod                     spec.containers{cakephp}      Normal    Killing            {kubelet ip-172-31-3-39.ec2.internal}     Killing container with docker id 042e3178a31f: Need to kill pod.
1h         1h          1         cakephp-1          ReplicationController                                 Normal    SuccessfulDelete   {replication-controller }                 Deleted pod: cakephp-1-93ox8
1h         1h          1         cakephp-2-845mq    Pod                                                   Normal    Scheduled          {default-scheduler }                      Successfully assigned cakephp-2-845mq to ip-172-31-10-174.ec2.internal
1h         1h          1         cakephp-2-845mq    Pod                     spec.containers{cakephp}      Normal    Pulling            {kubelet ip-172-31-10-174.ec2.internal}   pulling image "172.30.47.227:5000/pub/cakephp@sha256:c9da248b4bbe412e1a5be51fc708a80b8046d953172a567bb422240e71a4477a"
1h         1h          1         cakephp-2-845mq    Pod                     spec.containers{cakephp}      Normal    Pulled             {kubelet ip-172-31-10-174.ec2.internal}   Successfully pulled image "172.30.47.227:5000/pub/cakephp@sha256:c9da248b4bbe412e1a5be51fc708a80b8046d953172a567bb422240e71a4477a"
1h         1h          1         cakephp-2-845mq    Pod                     spec.containers{cakephp}      Normal    Created            {kubelet ip-172-31-10-174.ec2.internal}   Created container with docker id 2011f6b0b2ac
1h         1h          1         cakephp-2-845mq    Pod                     spec.containers{cakephp}      Normal    Started            {kubelet ip-172-31-10-174.ec2.internal}   Started container with docker id 2011f6b0b2ac
40m        40m         1         cakephp-2-845mq    Pod                     spec.containers{cakephp}      Normal    Killing            {kubelet ip-172-31-10-174.ec2.internal}   Killing container with docker id 2011f6b0b2ac: Need to kill pod.
40m        40m         1         cakephp-2-crktz    Pod                                                   Normal    Scheduled          {default-scheduler }                      Successfully assigned cakephp-2-crktz to ip-172-31-8-230.ec2.internal
40m        40m         1         cakephp-2-crktz    Pod                     spec.containers{cakephp}      Normal    Pulling            {kubelet ip-172-31-8-230.ec2.internal}    pulling image "172.30.47.227:5000/pub/cakephp@sha256:c9da248b4bbe412e1a5be51fc708a80b8046d953172a567bb422240e71a4477a"
40m        40m         1         cakephp-2-crktz    Pod                     spec.containers{cakephp}      Normal    Pulled             {kubelet ip-172-31-8-230.ec2.internal}    Successfully pulled image "172.30.47.227:5000/pub/cakephp@sha256:c9da248b4bbe412e1a5be51fc708a80b8046d953172a567bb422240e71a4477a"
40m        40m         1         cakephp-2-crktz    Pod                     spec.containers{cakephp}      Normal    Created            {kubelet ip-172-31-8-230.ec2.internal}    Created container with docker id 69bd880b993e
40m        40m         1         cakephp-2-crktz    Pod                     spec.containers{cakephp}      Normal    Started            {kubelet ip-172-31-8-230.ec2.internal}    Started container with docker id 69bd880b993e
1h         1h          1         cakephp-2-deploy   Pod                                                   Normal    Scheduled          {default-scheduler }                      Successfully assigned cakephp-2-deploy to ip-172-31-8-217.ec2.internal
1h         1h          1         cakephp-2-deploy   Pod                     spec.containers{deployment}   Normal    Pulling            {kubelet ip-172-31-8-217.ec2.internal}    pulling image "registry.ops.openshift.com/openshift3/ose-deployer:v3.3.0.33"
1h         1h          1         cakephp-2-deploy   Pod                     spec.containers{deployment}   Normal    Pulled             {kubelet ip-172-31-8-217.ec2.internal}    Successfully pulled image "registry.ops.openshift.com/openshift3/ose-deployer:v3.3.0.33"
1h         1h          1         cakephp-2-deploy   Pod                     spec.containers{deployment}   Normal    Created            {kubelet ip-172-31-8-217.ec2.internal}    Created container with docker id 93142e66fe31
1h         1h          1         cakephp-2-deploy   Pod                     spec.containers{deployment}   Normal    Started            {kubelet ip-172-31-8-217.ec2.internal}    Started container with docker id 93142e66fe31
1h         1h          1         cakephp-2-deploy   Pod                     spec.containers{deployment}   Normal    Killing            {kubelet ip-172-31-8-217.ec2.internal}    Killing container with docker id 93142e66fe31: Need to kill pod.
1h         1h          1         cakephp-2-deploy   Pod                                                   Warning   FailedSync         {kubelet ip-172-31-8-217.ec2.internal}    Error syncing pod, skipping: failed to "TeardownNetwork" for "cakephp-2-deploy_pub" with TeardownNetworkError: "Failed to teardown network for pod \"61c98211-9138-11e6-ba92-0e63b9c1c48f\" using network plugins \"redhat/openshift-ovs-multitenant\": Error running network teardown script: Could not find IP address for container a4b042079b8dfb79c860881e7ca78c1a077935ce80426d02810264942d6a727d"
1h        1h        1         cakephp-2   ReplicationController             Normal    SuccessfulCreate             {replication-controller }        Created pod: cakephp-2-845mq
40m       40m       1         cakephp-2   ReplicationController             Normal    SuccessfulCreate             {replication-controller }        Created pod: cakephp-2-crktz
18s       4h        116       cakephp     Service                           Warning   CreatingLoadBalancerFailed   {service-controller }            (events with common reason combined)
1h        1h        1         cakephp     DeploymentConfig                  Normal    DeploymentCreated            {deploymentconfig-controller }   Created new deployment "cakephp-2" for version 2
-----

Version-Release number of selected component (if applicable):
OpenShift Master:
    v3.3.0.33
Kubernetes Master:
    v1.3.0+52492b4 

How reproducible:
always

Steps to Reproduce:
1. Create a pod, service, and route.
2. Access the service via its HTTP route (see the sketch below).
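A hedged reproduction sketch using the cakephp example mentioned above; the project name ("pub") and the route hostname are assumptions and will differ per cluster:
-----
# Deploy the example app and expose it through a route (project "pub" assumed)
$ oc new-app https://github.com/openshift/cakephp-ex.git --name=cakephp
$ oc expose svc/cakephp

# The pod and service come up cleanly, but per this report the route returns 503
$ curl -s -o /dev/null -w '%{http_code}\n' http://cakephp-pub.44fs.preview.openshiftapps.com/
503
-----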

Actual results:
http 503

Expected results:
content delivery from the service

Additional info:
seems similar to bug 1372619

Comment 1 Ben Bennett 2016-10-13 14:18:37 UTC
Can you get more logs for the service-controller?  That error seems like the culprit.

I would also be interested in seeing the output from:
  oc get svc cakephp -o yaml

And:
  oc get ep cakephp

Thanks
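
(For reference, a hedged sketch of where those controller logs usually live on a 3.x cluster; the systemd unit name is an assumption and varies by install, e.g. atomic-openshift-master on single-master setups:)
  # On a master host, with cluster-admin/root access
  journalctl -u atomic-openshift-master-controllers --since "1 hour ago" | grep -i loadbalancer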

Comment 2 Jiří Fiala 2016-10-13 14:45:26 UTC
Could you please provide steps on how to get more logs for the service-controller? The CreatingLoadBalancerFailed warning occurred many times, so the messages are combined together:
$ oc get events           
LASTSEEN   FIRSTSEEN   COUNT     NAME       KIND      SUBOBJECT   TYPE      REASON                       SOURCE                  MESSAGE
26s        5h          182       cakephp    Service               Warning   CreatingLoadBalancerFailed   {service-controller }   (events with common reason combined)
5s         5h          135       nodejsex   Service               Warning   CreatingLoadBalancerFailed   {service-controller }   (events with common reason combined)

$ oc get svc cakephp -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftWebConsole
  creationTimestamp: 2016-10-13T07:41:35Z
  labels:
    app: cakephp
  name: cakephp
  namespace: pub
  resourceVersion: "233880834"
  selfLink: /api/v1/namespaces/pub/services/cakephp
  uid: 77b64a42-9118-11e6-ae04-0ebeb1070c7f
spec:
  clusterIP: 172.30.23.123
  portalIP: 172.30.23.123
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    deploymentconfig: cakephp
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

$ oc get ep cakephp         
NAME      ENDPOINTS        AGE
cakephp   10.1.60.2:8080   6h
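
(Aside: the full text behind "(events with common reason combined)" can usually be recovered without controller access; a hedged sketch:)
$ oc describe svc cakephp     # the Events section prints the full messages
$ oc get events -o yaml       # each event's message: field carries the complete text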

Comment 3 Tomas Schlosser 2016-10-13 15:09:27 UTC
Hello Ben,
I originally reported this issue. Here are my findings so far:

Going directly to the pod works, and going through the service works as well, but going through the route is broken.

You can check my namespace on Dev Preview (tschloss) to see the problem; it's the EAP application.

$ oc get route
NAME      HOST/PORT                                         PATH      SERVICE        TERMINATION   LABELS
eap-app   eap-app-tschloss.44fs.preview.openshiftapps.com             eap-app:http                 app=eap-app,application=eap-app,template=eap70-mysql-persistent-s2i,xpaas=1.3.1

$ oc get svc
NAME             CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
database-mysql   172.30.166.247   <none>        3306/TCP   2h
eap-app          172.30.84.68     <none>        8080/TCP   2h

$ oc get ep
NAME             ENDPOINTS        AGE
database-mysql   10.1.60.6:3306   2h
eap-app          10.1.87.2:8080   2h

$ oc get svc eap-app -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    description: The web server's http and https ports.
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2016-10-13T12:18:19Z
  labels:
    app: eap-app
    application: eap-app
    template: eap70-mysql-persistent-s2i
    xpaas: 1.3.1
  name: eap-app
  namespace: tschloss
  resourceVersion: "234438053"
  selfLink: /api/v1/namespaces/tschloss/services/eap-app
  uid: 20b5b31f-913f-11e6-ad3a-0e3d364e19a5
spec:
  clusterIP: 172.30.84.68
  portalIP: 172.30.84.68
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    deploymentConfig: eap-app
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

$ oc rsh database-mysql-1-ec8ue
sh-4.2$ curl -s -o /dev/null -D - 10.1.87.2:8080
HTTP/1.1 200 OK
Connection: keep-alive
X-Powered-By: Undertow/1
Server: JBoss-EAP/7
Content-Type: text/html;charset=UTF-8
Content-Length: 2005
Date: Thu, 13 Oct 2016 15:08:10 GMT

sh-4.2$ curl -s -o /dev/null -D - 172.30.84.68:8080
HTTP/1.1 200 OK
Connection: keep-alive
X-Powered-By: Undertow/1
Server: JBoss-EAP/7
Content-Type: text/html;charset=UTF-8
Content-Length: 2005
Date: Thu, 13 Oct 2016 15:08:32 GMT

sh-4.2$ curl -s -o /dev/null -D - eap-app-tschloss.44fs.preview.openshiftapps.com                      
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html

Comment 4 Ben Bennett 2016-10-13 15:13:41 UTC
What does:
  oc get route eap-app -o yaml

Return?

Comment 5 Tomas Schlosser 2016-10-13 15:37:33 UTC
$ oc get route eap-app -o yaml
apiVersion: v1
kind: Route
metadata:
  annotations:
    description: Route for application's http service.
    openshift.io/generated-by: OpenShiftNewApp
    openshift.io/host.generated: "true"
  creationTimestamp: 2016-10-13T12:18:20Z
  labels:
    app: eap-app
    application: eap-app
    template: eap70-mysql-persistent-s2i
    xpaas: 1.3.1
  name: eap-app
  namespace: tschloss
  resourceVersion: "234438065"
  selfLink: /oapi/v1/namespaces/tschloss/routes/eap-app
  uid: 20ca8be4-913f-11e6-ad3a-0e3d364e19a5
spec:
  host: eap-app-tschloss.44fs.preview.openshiftapps.com
  port:
    targetPort: http
  to:
    kind: Service
    name: eap-app
status:
  ingress:
  - conditions:
    - lastTransitionTime: 2016-10-13T12:18:20Z
      status: "True"
      type: Admitted
    host: eap-app-tschloss.44fs.preview.openshiftapps.com
    routerName: router

Comment 7 Bing Li 2016-10-14 11:14:41 UTC
I can reproduce this issue in Online prod, and the prod environment is extremely slow now.
Error syncing pod, skipping: failed to "TeardownNetwork" for "cakephp-mysql-example-1-deploy_bingli4" with TeardownNetworkError: "Failed to teardown network for pod \"00e8e93e-91fd-11e6-ba92-0e63b9c1c48f\" using network plugins \"redhat/openshift-ovs-multitenant\": Error running network teardown script: Could not find IP address for container f5583130ec15e0143b9a9fe01204b955ea44ab928e8df9538a1c0874c931bb06"

Comment 8 ljladmin 2016-10-14 13:29:22 UTC
➜  ~ oc get svc cake -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftWebConsole
  creationTimestamp: 2016-10-14T11:06:26Z
  labels:
    app: cake
  name: cake
  namespace: nifty
  resourceVersion: "237197024"
  selfLink: /api/v1/namespaces/nifty/services/cake
  uid: 40352961-91fe-11e6-ba92-0e63b9c1c48f
spec:
  clusterIP: 172.30.196.123
  portalIP: 172.30.196.123
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    deploymentconfig: cake
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
➜  ~ oc get ep cake
NAME      ENDPOINTS        AGE
cake      10.1.96.6:8080   13m

Comment 9 Ben Bennett 2016-10-14 14:05:10 UTC
It looks like the problem is that you have a targetPort of http (i.e. 80) specified in the route, but the service is on 8080.
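
(For reference, the router resolves the route's spec.port.targetPort against the ports the service exposes, typically by the service port's name or the target port number; a hypothetical sketch with illustrative values only:)
Service side (oc get svc ... -o yaml):
  ports:
  - name: 8080-tcp
    port: 8080
    targetPort: 8080

Route side (oc get route ... -o yaml):
  port:
    targetPort: 8080-tcp   # must resolve to a port the service actually exposes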

Comment 10 ljladmin 2016-10-14 14:42:10 UTC
1. But all my operations are the same as before today, and it has been OK until now.
2. 80 and 8080 cannot be changed in my setup.

Comment 11 ljladmin 2016-10-14 14:42:47 UTC
This target port will route to Service Port 8080 → Container Port 8080 (TCP).

Comment 12 ljladmin 2016-10-14 15:07:57 UTC
11:07:24 PM	Warning	Creating load balancer failed 	Error creating load balancer (will retry): Error getting LB for service react/cake: AccessDenied: User: arn:aws:iam::507479335359:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers status code: 403, request id: e9b826bb-921f-11e6-aeea-75cc499f5b37

Comment 13 ljladmin 2016-10-14 15:11:05 UTC
-> Cgroups memory limit is set, using HTTPD_MAX_REQUEST_WORKERS=34
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.1.5.6. Set the 'ServerName' directive globally to suppress this message
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.1.5.6. Set the 'ServerName' directive globally to suppress this message
[Fri Oct 14 10:54:24.207285 2016] [auth_digest:notice] [pid 1] AH01757: generating secret for digest authentication ...
[Fri Oct 14 10:54:24.216426 2016] [http2:warn] [pid 1] AH02951: mod_ssl does not seem to be enabled
[Fri Oct 14 10:54:24.217053 2016] [lbmethod_heartbeat:notice] [pid 1] AH02282: No slotmem from mod_heartmonitor
[Fri Oct 14 10:54:24.396521 2016] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.18 (Red Hat) configured -- resuming normal operations
[Fri Oct 14 10:54:24.396553 2016] [core:notice] [pid 1] AH00094: Command line: 'httpd -D FOREGROUND'

Comment 14 Ben Bennett 2016-10-14 17:07:21 UTC
ljladmin: OK, that "Creating load balancer" message is for creating a service of type LoadBalancer. But those aren't supported in Online, and the service above is not of that type, so that's probably unrelated.
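
(A quick, hedged way to confirm that from the output already posted in comment 8:)
# CreatingLoadBalancerFailed only applies to services of type LoadBalancer
$ oc get svc cake -o yaml | grep 'type:'
  type: ClusterIP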

Tomas: Sorry, I was looking at the wrong service.  I see that yours names a port http.  So the route is fine.  I think we need to look at the logs for the router to see what's going on.
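
(As an aside, a hedged sketch of how those router logs are typically pulled, and how to check whether the route made it into the generated HAProxy config; the router name, namespace, and config path are assumptions based on the default 3.x router:)
# Router logs from the default router deployment
$ oc -n default logs dc/router | grep -i eap-app

# Check the generated HAProxy config inside a router pod
$ oc -n default rsh <router-pod> grep eap-app /var/lib/haproxy/conf/haproxy.config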

Comment 15 Ben Bennett 2016-10-14 17:16:12 UTC
ljladmin: Miciah points out that your issues are the same as https://bugzilla.redhat.com/show_bug.cgi?id=1367229

But Tomas' are separate, and I'll keep looking at those.

Comment 16 Ben Bennett 2016-10-14 18:01:02 UTC
Miciah got the logs, and we see:
  W1014 04:24:08.510172       1 router.go:690] a edge terminated route with host cakephp-mysql-persistent-cakephp.44fs.preview.openshiftapps.com does not have the required certificates.  The route will still be created but no certificates will be written

Jiří Fiala: Can you post the yaml for your route please?

Comment 17 Ben Bennett 2016-10-14 18:02:07 UTC
Oh, and Tomas, are your routes still present?  There's nothing in the logs about eap-app.

Comment 18 Tomas Schlosser 2016-10-14 20:39:08 UTC
Ben, I have recreated them a few times throughout the day to see if the issue is fixed. I wanted to use OSO in a live demo. Some other time, perhaps.

Right now, I have two routes deployed in my namespace (eap-app and secure-eap-app).

Comment 19 Matt Googinis 2016-10-14 21:01:57 UTC
Hi Guys

Thank you for looking into this matter... I was the fellow who first reported the error.

My OpenShift Preview site is essentially broken until this is resolved.

Is there something I should be doing, or should I just sit tight and be patient? I'm totally cool with that, just not sure of the next steps.

Thanks in advance for your help.

- Googs

Comment 20 Thomas Sandford 2016-10-15 14:32:20 UTC
I am having exactly the same problem with my preview.openshift.com account.

I've tried a number of the demo applications, and in each case any attempt to access the route fails with the same "503 Service Unavailable No server is available to handle this request." error.

I am also seeing the LoadBalancer failure messages, e.g.:
Error creating load balancer (will retry): Error getting LB for service tdgsandf-cmd-nodejs/nodejs-mongodb-example: AccessDenied: User: arn:aws:iam::507479335359:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers status code: 403, request id: d3c8de66-92c8-11e6-b538-f3dcdae1fb63

I've also seen the TeardownNetwork message but don't have an example in my current logs/events.

As Matt Googinis says above, this makes the OpenShift Preview effectively useless at present, as there is no way to interact with a running service from outside.

Reproducible: Yes - always
Steps to reproduce: 
  clear any existing project
  follow the steps in the "basic walkthrough" - https://docs.openshift.com/online/getting_started/basic_walkthrough.html
  until step "Viewing your Running Application"

Expected result: as per walkthrough documentation
Actual result: "503 Service Unavailable No server is available to handle this request."

Comment 21 Thomas Sandford 2016-10-16 08:55:57 UTC
This appears to have been fixed (for me at least) overnight.

Comment 22 Matt Googinis 2016-10-16 12:30:01 UTC
Me too! :-) My Service is now available! Thank you Red Hat!

Comment 23 ljladmin 2016-10-17 03:29:51 UTC
Me too! Thank you!

Comment 24 Ben Bennett 2016-10-17 13:04:25 UTC
We saw:
  E1013 23:52:33.630700       1 ratelimiter.go:52] error reloading router: wait: no child processes
in the logs. Restarting the router fixed it, but we don't know the root cause yet.
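
(For anyone hitting the same symptom, a hedged sketch of the workaround applied here, i.e. restarting the router; the namespace and labels assume the default router install:)
# Recreate the router pods (HAProxy and its reload loop restart with them)
$ oc -n default delete pod -l deploymentconfig=router

# ...or trigger a fresh router deployment
$ oc -n default deploy router --latest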

Comment 25 Jiří Fiala 2016-10-19 07:35:38 UTC
(In reply to Ben Bennett from comment #16)
> Jiří Fiala: Can you post the yaml for your route please?

I'm sorry for the delay; here's the route in question:
---
$ oc get route/cakephp -o yaml
apiVersion: v1
kind: Route
metadata:
  annotations:
    openshift.io/host.generated: "true"
  creationTimestamp: 2016-10-13T12:17:02Z
  labels:
    app: cakephp
  name: cakephp
  namespace: pub
  resourceVersion: "234435454"
  selfLink: /oapi/v1/namespaces/pub/routes/cakephp
  uid: f2ca8b63-913e-11e6-ae04-0ebeb1070c7f
spec:
  host: cakephp-pub.44fs.preview.openshiftapps.com
  port:
    targetPort: 8080-tcp
  to:
    kind: Service
    name: cakephp
    weight: 100
status:
  ingress:
  - conditions:
    - lastTransitionTime: 2016-10-13T12:17:02Z
      status: "True"
      type: Admitted
    host: cakephp-pub.44fs.preview.openshiftapps.com
    routerName: router
---

Comment 28 Ben Bennett 2016-10-26 19:41:20 UTC
I'm closing this because the router issue was resolved by Miciah.

