Bug 1458587 - [Starter]Route can't be accepted by a router
Summary: [Starter]Route can't be accepted by a router
Keywords:
Status: CLOSED DUPLICATE of bug 1447928
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Routing
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-04 15:32 UTC by Marcin Tojek
Modified: 2023-09-14 03:58 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-22 15:10:30 UTC
Target Upstream Version:
Embargoed:



Description Marcin Tojek 2017-06-04 15:32:04 UTC
Description of problem:
I created a new app running on one pod and exposed a service. Then I created a route. Unfortunately, I can't access the web app through a browser because of:
"The route is not accepting traffic yet because it has not been admitted by a router."

Version-Release number of selected component (if applicable):

OpenShift Master:
v3.5.5.19 (online version 3.5.0.20)
Kubernetes Master:
v1.5.2+43a9be4


How reproducible:


Steps to Reproduce:
1. Create a new app (deployment config).
2. Spawn 1 pod.
3. Create a service.
4. Expose one non-standard port (9000/TCP).
5. Create a new route:
5.1 Use random hostname: https://<name>-<namespace>.1d35.starter-us-east-1.openshiftapps.com/ 

Actual results:

I have been waiting 2-3 hours so far and have tried a few times to expose the app.

1. I see the spinner with a balloon saying: The route is not accepting traffic yet because it has not been admitted by a router.
2. If I browse to https://<name>-<namespace>.1d35.starter-us-east-1.openshiftapps.com/ , I see a standard "Application is not available" page.
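The "not admitted" message above can also be checked against the route object itself. The following is a minimal sketch, assuming the route has been dumped as JSON (for example with `oc get route <name> -o json`) and that each router reports an `Admitted` condition under `status.ingress`. The sample objects, route names, and hostnames are hypothetical and only illustrate the shape.

```python
from typing import List


def admitted_routers(route: dict) -> List[str]:
    """Return the names of routers that report the route as admitted."""
    names = []
    # A route no router has processed yet has no status.ingress entries.
    for ingress in route.get("status", {}).get("ingress") or []:
        for cond in ingress.get("conditions") or []:
            if cond.get("type") == "Admitted" and cond.get("status") == "True":
                names.append(ingress.get("routerName", "<unknown>"))
    return names


# Hypothetical route that no router has reported on yet.
pending_route = {
    "metadata": {"name": "myapp"},
    "spec": {"host": "myapp-myproject.example.com"},
    "status": {},
}

# Hypothetical route after a router named "router" has admitted it.
admitted_route = {
    "metadata": {"name": "myapp"},
    "spec": {"host": "myapp-myproject.example.com"},
    "status": {
        "ingress": [
            {
                "host": "myapp-myproject.example.com",
                "routerName": "router",
                "conditions": [{"type": "Admitted", "status": "True"}],
            }
        ]
    },
}

print(admitted_routers(pending_route))
print(admitted_routers(admitted_route))
```

An empty result corresponds to the state described here: the route exists, but no router has admitted it.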

Expected results:
1. Exposed application at https://<name>-<namespace>.1d35.starter-us-east-1.openshiftapps.com/


Additional info:
1. The application is using health checks which are green (HTTP 200).
2. No database is used.
3. The app has been deployed using Docker image pushed to internal OpenShift Online repository.

Comment 2 Ben Bennett 2017-06-05 14:10:05 UTC
Is there anything in the router logs?

Can you curl the endpoint directly from the node the router is on?

Comment 3 Eric Paris 2017-06-05 17:31:48 UTC
Are you trying to create a wildcard route? I see 3 of those not being admitted (correctly)


There are 65 other routes on the starter-us-east-1 cluster which were not admitted because they requested the same pathname as an existing route. The oldest route to claim it wins (again, correctly).
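The "oldest claim wins" rule can be sketched as follows. This is a simplified illustration, not the router's actual admission code: it assumes a claim is keyed on host plus path and ignores any other admission rules. The route names, hosts, and timestamps are hypothetical.

```python
from datetime import datetime


def admit_oldest_first(routes):
    """Split routes into winners and rejected, oldest claim first.

    routes: list of dicts with "name", "host", "path", and "created" keys.
    Returns (winners, rejected), where winners maps (host, path) to the
    name of the admitted route and rejected lists the losing route names.
    """
    winners = {}
    rejected = []
    for route in sorted(routes, key=lambda r: r["created"]):
        claim = (route["host"], route.get("path", ""))
        if claim in winners:
            rejected.append(route["name"])  # an older route already holds this claim
        else:
            winners[claim] = route["name"]
    return winners, rejected


# Hypothetical routes: two of them ask for the same host and path.
routes = [
    {"name": "newer", "host": "app.example.com", "path": "/", "created": datetime(2017, 6, 4)},
    {"name": "older", "host": "app.example.com", "path": "/", "created": datetime(2017, 5, 1)},
    {"name": "other", "host": "api.example.com", "path": "/", "created": datetime(2017, 6, 5)},
]

winners, rejected = admit_oldest_first(routes)
print(winners)
print(rejected)
```

Under this model, a newly created route that asks for an already claimed host and path is never admitted, regardless of how long it waits.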

Without specific details of which route in which namespace, it's impossible to be any more specific. But at this point this appears to be 'notabug'.

Comment 4 Marcin Tojek 2017-06-05 17:42:26 UTC
Thank you for your interest.

Let me update the status first and give you more details.

1. I managed to expose the application route after 1-2h, so the propagation time is really long.

2. While waiting for the propagation, I removed and recreated the configuration multiple times. I made a couple of attempts, but every time the result was a bit strange:
- I exposed the app on all ports, including unwanted, internal ones
- I got a warning about a version conflict (an older one is reportedly published, but in fact I deleted it)
- I couldn't recreate a deleted route because of a "route already exposed" error.

3. Pushing an update of the Docker image triggers a rollout of new pods, which results in the HTTP 503/"Application is not available" banner for a few hours (although some requests succeed), even though there is always one pod available. Shouldn't it be refreshed automatically?

4. Many requests coming to the route end up with a 503. I am the only client reaching the application endpoint.

@Ben Bennett
Sure, I can check the router logs if they're accessible on the OpenShift Online platform. Can you give me some advice/commands? Do I need special permissions?

Comment 9 Ben Bennett 2017-06-20 18:56:32 UTC
My hunch is that this is one of:
 * https://bugzilla.redhat.com/show_bug.cgi?id=1451854
 * https://bugzilla.redhat.com/show_bug.cgi?id=1447928

Is the backend correct in /var/lib/haproxy/conf/haproxy.conf?  If so, this may be that ARP problem :-(

Comment 10 Ben Bennett 2017-06-22 15:10:30 UTC
Ok. I'm fairly certain that you are hitting the event queue problems we found and fixed while working on https://bugzilla.redhat.com/show_bug.cgi?id=1447928

That bug does not have the same symptom, but while chasing it down we realized that events for deleted routes can be added back into the event queue. These build up over time, and the event queue can get really slow.
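The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the router's actual code or the actual fix: `RouteEventQueue`, `skip_deleted`, and `backlog_after_churn` are hypothetical names, and the resync is modeled as replaying a stale list of route keys.

```python
from collections import deque


class RouteEventQueue:
    """Toy model of a router event queue with a periodic cache resync."""

    def __init__(self, skip_deleted=True):
        self.skip_deleted = skip_deleted
        self.live = set()       # keys of routes that currently exist
        self.pending = deque()  # (event, key) pairs awaiting processing

    def add(self, key):
        self.live.add(key)
        self.pending.append(("sync", key))

    def delete(self, key):
        self.live.discard(key)
        self.pending.append(("delete", key))

    def resync(self, snapshot):
        """Re-queue keys from a possibly stale cache snapshot."""
        for key in snapshot:
            if self.skip_deleted and key not in self.live:
                continue  # the route was deleted after the snapshot was taken
            self.pending.append(("sync", key))


def backlog_after_churn(skip_deleted, rounds=100):
    """Create and delete `rounds` routes, then resync from a stale snapshot."""
    queue = RouteEventQueue(skip_deleted)
    stale = []
    for i in range(rounds):
        key = "ns/route-%d" % i
        queue.add(key)
        stale.append(key)
        queue.delete(key)
    queue.pending.clear()  # pretend the router caught up before the resync
    queue.resync(stale)
    return len(queue.pending)


print(backlog_after_churn(skip_deleted=False))
print(backlog_after_churn(skip_deleted=True))
```

With `skip_deleted=False` every deleted route comes back as a pending event on resync, which is the kind of backlog that would slow down handling of new routes. With the check enabled, the backlog stays empty.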

Responding to your comment above:

> 1. I managed to expose the application route after 1-2h, so the propagation time is really long.

Explained by things building up in the event queue cache and being erroneously added.


> 2. While waiting for the propagation, I removed and recreated the configuration multiple times. I made a couple of attempts, but every time the result was a bit strange:
> - I exposed the app on all ports, including unwanted, internal ones

Not sure what you mean here...


> - I got a warning about a version conflict (an older one is reportedly published, but in fact I deleted it)

This makes sense because the old one would reappear for a little while when the cache is resynced, but would then be deleted when we re-list.


> - I couldn't recreate a deleted route because of a "route already exposed" error.

This is the same as the above where the old version is occasionally revived :-(


> 3. Pushing an update of the Docker image triggers a rollout of new pods, which results in the HTTP 503/"Application is not available" banner for a few hours (although some requests succeed), even though there is always one pod available. Shouldn't it be refreshed automatically?

It should, but because the queue is overwhelmed with deleted junk (because of the bug), it can take a long time.  


> 4. Many requests coming to the route end up with a 503. I am the only client reaching the application endpoint.

I am not sure what that means... are you saying it sometimes works?  Once the route is admitted it should stay.


We will be backporting the needed fix early next week.  In the meantime, restarting the routers will make them responsive again.

*** This bug has been marked as a duplicate of bug 1447928 ***

Comment 11 Red Hat Bugzilla 2023-09-14 03:58:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

