Bug 1486322

Summary: Router image is not getting all routes
Product: OpenShift Container Platform
Reporter: Vladislav Walek <vwalek>
Component: Networking
Assignee: Ravi Sankar <rpenta>
Networking sub component: router
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: high
Priority: high
CC: aos-bugs, bbennett
Version: 3.4.0
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-09-11 15:47:49 UTC
Type: Bug

Description Vladislav Walek 2017-08-29 13:11:09 UTC
Description of problem:

The customer has configured 2 routers and uses the NAMESPACE_LABELS environment variable to shard the routes by namespace. They also set the ROUTER_ALLOWED_DOMAINS environment variable on each router.
However, when the router reloads, it does not get any routes. The logs show only:

E0825 16:48:36.851076       1 host_admitter.go:121] Route example/route not admitted: host not in the allowed list of domains
E0825 16:48:36.851095       1 controller.go:169] host not in the allowed list of domains

The problem is that the router is not loading all the routes. In some cases it gets a route and in others it does not.
For some reason the router was reloading after 1 second (based on the "Router reloaded" message).

The reload time was then set to 10m with the --resync-interval option of openshift-router.
However, even after that the router does not get all the routes.

The labels are OK. The configuration is OK too, since the same route is admitted at some points and not admitted at others. The behavior looks as if the router does not download all the routes.
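
For reference, the "host not in the allowed list of domains" message comes from the check of the route host against ROUTER_ALLOWED_DOMAINS. The sketch below is only an illustrative model of such a check, assuming a host is admitted when it equals an allowed domain or is a subdomain of one. It is not the code from host_admitter.go, and the function names and example domains are hypothetical.

```python
def parse_allowed_domains(env_value):
    """Split a comma-separated ROUTER_ALLOWED_DOMAINS-style value into a list."""
    return [d.strip() for d in env_value.split(",") if d.strip()]


def host_in_allowed_domains(host, allowed_domains):
    """Return True if host equals an allowed domain or is a subdomain of one.

    Assumption: this mirrors the documented intent of the setting, not the
    router's actual implementation.
    """
    host = host.lower().rstrip(".")
    for domain in allowed_domains:
        domain = domain.lower().rstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Hypothetical values, for illustration only.
allowed = parse_allowed_domains("apps.example.com, internal.example.org")
print(host_in_allowed_domains("myapp.apps.example.com", allowed))
print(host_in_allowed_domains("myapp.other.com", allowed))
```

A check of this shape gives the same answer every time for a given host and domain list, which is consistent with the suspicion above that the intermittent behavior comes from which routes the router has loaded rather than from the domain configuration.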

Version-Release number of selected component (if applicable):
OpenShift container platform 3.4

Comment 3 Ravi Sankar 2017-08-31 23:23:16 UTC
I have tested this use case on the latest 3.4 release (v3.4.1.44.18) and also on the latest code on master. ROUTER_ALLOWED_DOMAINS and NAMESPACE_LABELS on the router worked as expected, with known caveats.

Looking at the logs, the router config looks good.
I suspect one of these could have caused the issue:

(1) Unlike a non-sharded router, a sharded router can take up to 20 minutes to refresh its state with a 10-minute resync interval. The 10-minute resync updates the namespaces, but the route and endpoint resources are also resynced every 10 minutes, so the router can take 2 resync cycles to catch up (existing issue: https://bugzilla.redhat.com/show_bug.cgi?id=1479295).
This might explain why you sometimes see the routes and sometimes do not after 10 minutes.
To validate this, you can try:
- Create a route that matches the namespace labels and whose host is in the allowed domains.
- Check whether the route is working (it may not, as per this bug).
- Now, scale down the router to 0 (oc scale dc/<router-name> --replicas=0)
- Now, scale up the router to 2 (oc scale dc/<router-name> --replicas=2)
- Check the route again. If it works now, you are hitting this issue.
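
The 20-minute figure in (1) is just the resync interval multiplied by the number of cycles the sharded router may need. A minimal sketch of that arithmetic, assuming the worst case is exactly 2 full cycles as described above (the function name is hypothetical):

```python
from datetime import timedelta


def worst_case_refresh(resync_interval, cycles_needed=2):
    """Upper bound on the time before a sharded router reflects a change.

    Assumption: the router needs `cycles_needed` full resync cycles, one for
    namespaces and one more for routes/endpoints, as suggested in (1).
    """
    return resync_interval * cycles_needed


print(worst_case_refresh(timedelta(minutes=10)))  # 0:20:00
```

So with the 10m interval from the description, a route that is still missing after 10 minutes may only show up after the next cycle.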

(2) There are a few issues related to the router event queue, such as
 https://github.com/openshift/ose/pull/669 and https://github.com/openshift/ose/pull/559, that could cause this issue. These fixes have been backported to 3.4; check whether your 3.4 version includes them.

Comment 4 Ben Bennett 2017-09-11 15:47:49 UTC

*** This bug has been marked as a duplicate of bug 1479295 ***