| Summary: | CreatingLoadBalancerFailed - routes return http 503 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Online | Reporter: | Jiří Fiala <jfiala> | ||||
| Component: | Routing | Assignee: | Ben Bennett <bbennett> | ||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | zhaozhanqi <zzhao> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 3.x | CC: | aos-bugs, bbennett, bingli, jfiala, ljladmin, matt.googins, thomas, tschloss | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2016-10-26 19:41:20 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
Description
Jiří Fiala
2016-10-13 13:50:39 UTC
Can you get more logs for the service-controller? That error seems like the culprit. I would also be interested in seeing the output from:
oc get svc cakephp -o yaml
and:
oc get ep cakephp
Thanks
Could you please provide steps on how to get more logs for the service-controller? The CreatingLoadBalancerFailed warning occurred many times, so the messages are combined:
$ oc get events
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
26s 5h 182 cakephp Service Warning CreatingLoadBalancerFailed {service-controller } (events with common reason combined)
5s 5h 135 nodejsex Service Warning CreatingLoadBalancerFailed {service-controller } (events with common reason combined)
$ oc get svc cakephp -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftWebConsole
  creationTimestamp: 2016-10-13T07:41:35Z
  labels:
    app: cakephp
  name: cakephp
  namespace: pub
  resourceVersion: "233880834"
  selfLink: /api/v1/namespaces/pub/services/cakephp
  uid: 77b64a42-9118-11e6-ae04-0ebeb1070c7f
spec:
  clusterIP: 172.30.23.123
  portalIP: 172.30.23.123
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    deploymentconfig: cakephp
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
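The CreatingLoadBalancerFailed events come from the service-controller trying to provision a cloud load balancer. Later in this thread they are judged unrelated, because that path applies only to services of type LoadBalancer, and this service is type ClusterIP. A minimal sketch for listing which services, if any, are of type LoadBalancer; lb_services is a hypothetical helper, and the jsonpath range expression in the usage comment assumes an oc client that supports it.

```shell
# lb_services (hypothetical helper): read "NAME TYPE" pairs on stdin and
# print the names of services whose type is LoadBalancer. Only such
# services should produce CreatingLoadBalancerFailed events.
lb_services() {
    awk '$2 == "LoadBalancer" { print $1 }'
}

# Usage (assumes an oc client with jsonpath range support):
#   oc get svc -o jsonpath='{range .items[*]}{.metadata.name} {.spec.type}{"\n"}{end}' | lb_services
# An empty result means the load balancer events do not belong to any
# service in the project.
```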
$ oc get ep cakephp
NAME ENDPOINTS AGE
cakephp 10.1.60.2:8080 6h
Hello Ben,
I originally reported the issue. Here are my findings so far:
Going directly to the pod works and going through the service works as well, but going through the route is broken.
You can check my namespace on Dev Preview (tschloss) to see the problem, it's the EAP application.
$ oc get route
NAME HOST/PORT PATH SERVICE TERMINATION LABELS
eap-app eap-app-tschloss.44fs.preview.openshiftapps.com eap-app:http app=eap-app,application=eap-app,template=eap70-mysql-persistent-s2i,xpaas=1.3.1
$ oc get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
database-mysql 172.30.166.247 <none> 3306/TCP 2h
eap-app 172.30.84.68 <none> 8080/TCP 2h
$ oc get ep
NAME ENDPOINTS AGE
database-mysql 10.1.60.6:3306 2h
eap-app 10.1.87.2:8080 2h
$ oc get svc eap-app -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    description: The web server's http and https ports.
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2016-10-13T12:18:19Z
  labels:
    app: eap-app
    application: eap-app
    template: eap70-mysql-persistent-s2i
    xpaas: 1.3.1
  name: eap-app
  namespace: tschloss
  resourceVersion: "234438053"
  selfLink: /api/v1/namespaces/tschloss/services/eap-app
  uid: 20b5b31f-913f-11e6-ad3a-0e3d364e19a5
spec:
  clusterIP: 172.30.84.68
  portalIP: 172.30.84.68
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    deploymentConfig: eap-app
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
$ oc rsh database-mysql-1-ec8ue
sh-4.2$ curl -s -o /dev/null -D - 10.1.87.2:8080
HTTP/1.1 200 OK
Connection: keep-alive
X-Powered-By: Undertow/1
Server: JBoss-EAP/7
Content-Type: text/html;charset=UTF-8
Content-Length: 2005
Date: Thu, 13 Oct 2016 15:08:10 GMT
sh-4.2$ curl -s -o /dev/null -D - 172.30.84.68:8080
HTTP/1.1 200 OK
Connection: keep-alive
X-Powered-By: Undertow/1
Server: JBoss-EAP/7
Content-Type: text/html;charset=UTF-8
Content-Length: 2005
Date: Thu, 13 Oct 2016 15:08:32 GMT
sh-4.2$ curl -s -o /dev/null -D - eap-app-tschloss.44fs.preview.openshiftapps.com
HTTP/1.0 503 Service Unavailable
Cache-Control: no-cache
Connection: close
Content-Type: text/html
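The three curl calls above test each hop in turn: the pod IP (10.1.87.2:8080), the service cluster IP (172.30.84.68:8080), and the route hostname, and only the route returns 503. A minimal sketch of the same check as reusable shell functions; first_failing_hop and http_status are hypothetical helper names, and treating any status outside 2xx/3xx as a failure is an assumption of this sketch.

```shell
# first_failing_hop POD_STATUS SVC_STATUS ROUTE_STATUS (hypothetical helper)
# Prints the first hop (pod, service, route) whose HTTP status is not
# 2xx/3xx, or "none" if all hops look healthy. curl reports 000 when it
# cannot connect at all, which this treats as a failure.
first_failing_hop() {
    for hop in pod service route; do
        code=${1:-000}
        [ "$#" -gt 0 ] && shift
        case "$code" in
            2??|3??) ;;                  # healthy, check the next hop
            *) echo "$hop"; return 0 ;;  # first failing hop found
        esac
    done
    echo "none"
}

# http_status URL: print only the HTTP status code curl received.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Usage, run from a pod inside the cluster (as with oc rsh above):
#   first_failing_hop "$(http_status 10.1.87.2:8080)" \
#                     "$(http_status 172.30.84.68:8080)" \
#                     "$(http_status eap-app-tschloss.44fs.preview.openshiftapps.com)"
```

With the results shown above (200, 200, 503) this reports "route", which points the investigation at the router rather than the service or pod.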
What does oc get route eap-app -o yaml return?
$ oc get route eap-app -o yaml
apiVersion: v1
kind: Route
metadata:
  annotations:
    description: Route for application's http service.
    openshift.io/generated-by: OpenShiftNewApp
    openshift.io/host.generated: "true"
  creationTimestamp: 2016-10-13T12:18:20Z
  labels:
    app: eap-app
    application: eap-app
    template: eap70-mysql-persistent-s2i
    xpaas: 1.3.1
  name: eap-app
  namespace: tschloss
  resourceVersion: "234438065"
  selfLink: /oapi/v1/namespaces/tschloss/routes/eap-app
  uid: 20ca8be4-913f-11e6-ad3a-0e3d364e19a5
spec:
  host: eap-app-tschloss.44fs.preview.openshiftapps.com
  port:
    targetPort: http
  to:
    kind: Service
    name: eap-app
status:
  ingress:
  - conditions:
    - lastTransitionTime: 2016-10-13T12:18:20Z
      status: "True"
      type: Admitted
    host: eap-app-tschloss.44fs.preview.openshiftapps.com
    routerName: router
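Comparing this route with the eap-app service above: the route's spec.port.targetPort is http, and the service's only port is named http. As we understand the OpenShift Route API, a string targetPort is looked up by name among the ports of the service's endpoints (whose names come from the service's spec.ports[].name) rather than read as a well-known port number. A minimal sketch that checks a named targetPort against the service's port names; route_port_matches is a hypothetical helper, and numeric targetPort values are out of scope here.

```shell
# route_port_matches TARGET PORT_NAME... (hypothetical helper)
# Succeeds (exit 0) if the route's named targetPort TARGET appears among
# the service port names passed as the remaining arguments.
route_port_matches() {
    target=$1
    shift
    for name in "$@"; do
        [ "$name" = "$target" ] && return 0
    done
    return 1
}

# Usage:
#   target=$(oc get route eap-app -o jsonpath='{.spec.port.targetPort}')
#   names=$(oc get svc eap-app -o jsonpath='{.spec.ports[*].name}')
#   route_port_matches "$target" $names && echo "route port OK"
```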
I can reproduce this issue in Online prod, and the prod environment is extremely slow now.
Error syncing pod, skipping: failed to "TeardownNetwork" for "cakephp-mysql-example-1-deploy_bingli4" with TeardownNetworkError: "Failed to teardown network for pod \"00e8e93e-91fd-11e6-ba92-0e63b9c1c48f\" using network plugins \"redhat/openshift-ovs-multitenant\": Error running network teardown script: Could not find IP address for container f5583130ec15e0143b9a9fe01204b955ea44ab928e8df9538a1c0874c931bb06"
➜ ~ oc get svc cake -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftWebConsole
  creationTimestamp: 2016-10-14T11:06:26Z
  labels:
    app: cake
  name: cake
  namespace: nifty
  resourceVersion: "237197024"
  selfLink: /api/v1/namespaces/nifty/services/cake
  uid: 40352961-91fe-11e6-ba92-0e63b9c1c48f
spec:
  clusterIP: 172.30.196.123
  portalIP: 172.30.196.123
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    deploymentconfig: cake
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
➜ ~ oc get ep cake
NAME ENDPOINTS AGE
cake 10.1.96.6:8080 13m
It looks like the problem is that you have a targetPort of http (i.e. 80) specified in the route, but the service is on 8080.
1. But all my operations are the same as before today, and everything had been OK before.
2. 80 and 8080 can't be changed in my operation. This target port will route to Service Port 8080 → Container Port 8080 (TCP).
11:07:24 PM Warning Creating load balancer failed Error creating load balancer (will retry): Error getting LB for service react/cake: AccessDenied: User: arn:aws:iam::507479335359:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers status code: 403, request id: e9b826bb-921f-11e6-aeea-75cc499f5b37
->
Cgroups memory limit is set, using HTTPD_MAX_REQUEST_WORKERS=34
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.1.5.6. Set the 'ServerName' directive globally to suppress this message
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.1.5.6. Set the 'ServerName' directive globally to suppress this message
[Fri Oct 14 10:54:24.207285 2016] [auth_digest:notice] [pid 1] AH01757: generating secret for digest authentication ...
[Fri Oct 14 10:54:24.216426 2016] [http2:warn] [pid 1] AH02951: mod_ssl does not seem to be enabled
[Fri Oct 14 10:54:24.217053 2016] [lbmethod_heartbeat:notice] [pid 1] AH02282: No slotmem from mod_heartmonitor
[Fri Oct 14 10:54:24.396521 2016] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.18 (Red Hat) configured -- resuming normal operations
[Fri Oct 14 10:54:24.396553 2016] [core:notice] [pid 1] AH00094: Command line: 'httpd -D FOREGROUND'
ljladmin: OK, that "Creating load balancer" message is for creating a service of type LoadBalancer. But those aren't supported in Online, and the service above is not of that type, so that's probably unrelated.
Tomas: Sorry, I was looking at the wrong service. I see that yours names a port http, so the route is fine.
I think we need to look at the logs for the router to see what's going on.
ljladmin: Miciah points out your issues are the same as https://bugzilla.redhat.com/show_bug.cgi?id=1367229
But Tomas' are separate, and I'll keep looking at those.
Miciah got the logs, and we see:
W1014 04:24:08.510172 1 router.go:690] a edge terminated route with host cakephp-mysql-persistent-cakephp.44fs.preview.openshiftapps.com does not have the required certificates. The route will still be created but no certificates will be written
Jiří Fiala: Can you post the yaml for your route please?
Oh, and Tomas, are your routes still present? There's nothing in the logs about eap-app.
Ben, I have recreated them a few times throughout the day to see if the issue is fixed. I wanted to use OSO in a live demo; some other time perhaps. Right now, I have two routes deployed in my namespace (eap-app and secure-eap-app).
Hi Guys, thank you for looking into this matter... I was the fellow who first reported the error. My OpenShift Preview site is essentially broken until this is resolved. Is there something I should be doing, or should I just sit tight and be patient? I'm totally cool with that, just not sure of next steps. Thanks in advance for your help. - Googs
I am having exactly the same problem with my preview.openshift.com account. I've tried a number of the demo applications, and in each case any attempt to access the route fails with the same "503 Service Unavailable No server is available to handle this request." error. I am also seeing the LoadBalancer failure messages, e.g.:
Error creating load balancer (will retry): Error getting LB for service tdgsandf-cmd-nodejs/nodejs-mongodb-example: AccessDenied: User: arn:aws:iam::507479335359:user/cloud_provider is not authorized to perform: elasticloadbalancing:DescribeLoadBalancers status code: 403, request id: d3c8de66-92c8-11e6-b538-f3dcdae1fb63
I've also seen the TeardownNetwork message but don't have an example in my current logs/events.
As Matt Googins says above, this makes the OpenShift Preview effectively useless at present, as there is no means to interact with a running service from outside.
Reproducible: Yes, always
Steps to reproduce:
1. Clear any existing project.
2. Follow the steps in the "basic walkthrough" at https://docs.openshift.com/online/getting_started/basic_walkthrough.html until the step "Viewing your Running Application".
Expected result: as per the walkthrough documentation
Actual result: "503 Service Unavailable No server is available to handle this request."
This appears to have been fixed (for me at least) overnight.
Me too! :-) My Service is now available! Thank you Red Hat!
Me too! Thank you!
We saw:
E1013 23:52:33.630700 1 ratelimiter.go:52] error reloading router: wait: no child processes
in the logs. Restarting the router fixed it, but we don't know the root cause yet.
(In reply to Ben Bennett from comment #16)
> Jiří Fiala: Can you post the yaml for your route please?
I'm sorry for the delay, here's the route in question:
---
$ oc get route/cakephp -o yaml
apiVersion: v1
kind: Route
metadata:
  annotations:
    openshift.io/host.generated: "true"
  creationTimestamp: 2016-10-13T12:17:02Z
  labels:
    app: cakephp
  name: cakephp
  namespace: pub
  resourceVersion: "234435454"
  selfLink: /oapi/v1/namespaces/pub/routes/cakephp
  uid: f2ca8b63-913e-11e6-ae04-0ebeb1070c7f
spec:
  host: cakephp-pub.44fs.preview.openshiftapps.com
  port:
    targetPort: 8080-tcp
  to:
    kind: Service
    name: cakephp
    weight: 100
status:
  ingress:
  - conditions:
    - lastTransitionTime: 2016-10-13T12:17:02Z
      status: "True"
      type: Admitted
    host: cakephp-pub.44fs.preview.openshiftapps.com
    routerName: router
---
I'm closing this because the router issue was resolved by Miciah.
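The router-side symptoms in this bug show up as specific lines in the router log: the reload failure (error reloading router: wait: no child processes) and the edge-route warning about missing certificates. A minimal sketch that counts both in a log stream; scan_router_log is a hypothetical helper, its two patterns are simply the messages quoted in this thread, and reading the router log requires cluster-admin access, which Online users did not have.

```shell
# scan_router_log (hypothetical helper): read router log lines on stdin
# and count the two failure messages quoted in this bug. Uninitialized
# awk counters print as 0 when no line matches.
scan_router_log() {
    awk '
        /error reloading router/                  { reload++ }
        /does not have the required certificates/ { certs++ }
        END { printf "reload_errors=%d missing_certs=%d\n", reload, certs }
    '
}

# Usage, assuming the router runs as dc/router in the default project
# (a common 3.x layout, but an assumption here):
#   oc logs dc/router -n default | scan_router_log
```

A non-zero reload_errors count matches the case that was fixed here by restarting the router; it does not identify the root cause, which the thread leaves open.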