Bug 1415112 - [3.5] [networking_public_407] the router configuration not reloaded after the namespace label changed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.5.z
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1434574 1436296
Reported: 2017-01-20 10:08 UTC by Hongan Li
Modified: 2022-08-04 22:20 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1434574 1436296
Environment:
Last Closed: 2017-04-26 05:36:10 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ose pull 657 | 0 | None | None | None | 2020-08-24 08:54:30 UTC
Red Hat Product Errata | RHBA-2017:1129 | 0 | normal | SHIPPED_LIVE | OpenShift Container Platform 3.5, 3.4, 3.3, and 3.2 bug fix update | 2017-04-26 09:35:35 UTC

Description Hongan Li 2017-01-20 10:08:35 UTC
Description of problem:
If NAMESPACE_LABELS is set on the router first, then adding the label to a namespace or removing the label from it afterwards does not cause the router configuration to be reloaded.
But if the router's NAMESPACE_LABELS is changed last, the configuration is reloaded.

Version-Release number of selected component (if applicable):
openshift v3.5.0.6+87f6173
kubernetes v1.5.2+43a9be4
etcd 3.1.0-rc.0


How reproducible:
always

Steps to Reproduce:
1. create project, pod, service and route.
   # oc new-project u1p1
   # oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/caddy-docker.json
   # oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/edge/service_unsecure.json
   # oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/edge/route_edge.json

2. add label to the project.
   # oc label namespace u1p1 team=red

3. create pod, service, route in another project but without label.

4. add NAMESPACE_LABELS to router
   # oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:default:router
   # oc env dc/router NAMESPACE_LABELS=team=red

5. check the router configuration
   # oc rsh router-2-xxxxx cat haproxy.config

6. remove the label from the project.
   # oc label namespace u1p1 team-

7. check the router configuration again.

Actual results:
In step 5, the router configuration is reloaded and only the route in the labelled namespace is shown.
But in step 7, the router configuration is not reloaded; the route is still there even though the label has been removed.

Expected results:
The router configuration should be reloaded after the namespace label changes (in step 7).
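
The expected result can be written down as a small model. This is only a sketch of the expectation, not router code: it handles plain `key=value` selectors such as `team=red`, and the second project name `u1p2` is a made-up stand-in for the unlabelled project from step 3.

```python
def parse_selector(selector):
    """Parse a simple equality selector such as 'team=red' into a dict."""
    return dict(part.split("=", 1) for part in selector.split(",") if part)


def matching_namespaces(namespaces, selector):
    """Return the names of namespaces whose labels satisfy the selector."""
    wanted = parse_selector(selector)
    return {
        name
        for name, labels in namespaces.items()
        if all(labels.get(key) == value for key, value in wanted.items())
    }


def expected_routes(routes, namespaces, selector):
    """Routes are 'namespace/name' strings; keep those in matching namespaces."""
    allowed = matching_namespaces(namespaces, selector)
    return sorted(r for r in routes if r.split("/", 1)[0] in allowed)


# State after step 4: only u1p1 carries team=red ("u1p2" is hypothetical).
namespaces = {"u1p1": {"team": "red"}, "u1p2": {}}
routes = ["u1p1/secured-edge-route", "u1p2/secured-edge-route"]
step5 = expected_routes(routes, namespaces, "team=red")

# Step 6 removes the label, so step 7 should show no routes at all.
del namespaces["u1p1"]["team"]
step7 = expected_routes(routes, namespaces, "team=red")
```

Here `step5` holds only the u1p1 route and `step7` is empty, which is what haproxy.config should reflect once the router picks up the label change.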

Additional info:

Comment 1 Maru Newby 2017-01-24 17:35:12 UTC
The router only updates namespaces on an interval (controllable via --resync-interval and defaulting to 10m).  Changes to namespace labelling will not be reflected in the routes served by a given router instance until the interval is hit or the instance is restarted.
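
As a rough illustration of what interval-based syncing means for the delay (a sketch only; `first_resync_after` is a hypothetical helper, and the 600 second default simply mirrors the 10m mentioned above):

```python
import math


def first_resync_after(change_time, last_resync, interval=600.0):
    """Return the time of the first resync tick at or after change_time.

    All values are seconds on the same clock. Ticks are assumed to occur at
    last_resync + n * interval for n = 1, 2, ...
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    elapsed = max(0.0, change_time - last_resync)
    ticks = max(1, math.ceil(elapsed / interval))
    return last_resync + ticks * interval


# A label change 30s after a resync is picked up at the next tick, 570s later.
delay = first_resync_after(30.0, 0.0) - 30.0
```

Under these assumptions the worst case is a label change made just after a tick, which waits almost the full interval.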

*** This bug has been marked as a duplicate of bug 1355711 ***

Comment 2 Hongan Li 2017-03-14 10:39:38 UTC
Retested in the latest OCP 3.5.0.50 and found that the router is not updated within 10m, so this is not the same as bug 1355711; reopening this one.

I wonder whether the default 10m resync-interval was removed by the PR: https://github.com/openshift/origin/pull/12242/

Comment 3 Maru Newby 2017-03-15 15:12:28 UTC
(In reply to hongli from comment #2)
> Retested in the latest OCP 3.5.0.50 and found that the router is not
> updated within 10m, so this is not the same as bug 1355711; reopening
> this one.
> 
> I wonder whether the default 10m resync-interval was removed by the
> PR: https://github.com/openshift/origin/pull/12242/

The resync interval was not removed.  That PR only prevents reloads if the route state has not changed, but a change in the set of namespaces a router targets should still result in a state change.
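
A toy illustration of that reasoning (this is not the code from the PR; `ReloadGate` and `commit` are hypothetical names): a reload is skipped when the served state is identical to the last one written, and because the served state is derived from the selected namespaces, a change in that set produces a different state and therefore a reload.

```python
import hashlib
import json


class ReloadGate:
    """Toy model: reload only when the served route state has changed."""

    def __init__(self):
        self._last_digest = None
        self.reloads = 0

    def commit(self, routes):
        """Record the current set of routes; return True if a reload is needed."""
        payload = json.dumps(sorted(set(routes))).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self._last_digest:
            return False  # nothing changed, skip the reload
        self._last_digest = digest
        self.reloads += 1
        return True


gate = ReloadGate()
first = gate.commit(["u1p1/secured-edge-route"])   # initial state: reload
repeat = gate.commit(["u1p1/secured-edge-route"])  # periodic resync, same state: skip
after_unlabel = gate.commit([])                    # namespace no longer selected: reload
```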

It would be helpful to increase the logging verbosity on the router and provide those logs. There is logging around the updating of watched namespaces.

Comment 4 Hongan Li 2017-03-16 10:23:28 UTC
Hi Maru, the reproduction steps are the same as in the original report, just in a different order (step 4 is run first). The steps are as follows:

1. add NAMESPACE_LABELS to router
2. create project, pod, service and route.
3. add label to the project.

After waiting more than 10 minutes, the route is still not reloaded. The router logs are below; "forcing resync" is observed at 10:04, but the route is not reloaded.

I0316 09:54:41.882547       1 router.go:390] Writing the router state
I0316 09:54:41.883141       1 router.go:395] Writing the router config
I0316 09:54:41.915215       1 router.go:400] Reloading the router
I0316 09:54:41.977765       1 reaper.go:24] Signal received: child exited
I0316 09:54:41.977804       1 reaper.go:32] Reaped process with pid 53
I0316 09:54:42.001333       1 router.go:475] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0316 09:54:42.001410       1 reaper.go:24] Signal received: child exited
I0316 09:55:46.704353       1 controller.go:305] Processing Route: u1p1/secured-edge-route -> service-unsecure
I0316 09:55:46.704373       1 controller.go:306]            Alias: test-edge.example.com
I0316 09:55:46.704378       1 controller.go:307]            Path: 
I0316 09:55:46.704382       1 controller.go:308]            Event: ADDED
I0316 09:55:46.704391       1 router.go:129] host test-edge.example.com admitted
I0316 10:01:32.208521       1 reflector.go:392] github.com/openshift/origin/pkg/router/template/service_lookup.go:30: Watch close - *api.Service total 1 items received
I0316 10:04:27.335563       1 controller.go:150] Updating watched namespaces: map[u1p1:{}]
I0316 10:04:36.772510       1 reflector.go:273] github.com/openshift/origin/pkg/router/template/service_lookup.go:30: forcing resync
I0316 10:04:37.146518       1 reflector.go:273] github.com/openshift/origin/pkg/router/controller/factory/factory.go:75: forcing resync
I0316 10:04:37.146630       1 reflector.go:273] github.com/openshift/origin/pkg/router/controller/factory/factory.go:68: forcing resync
I0316 10:04:37.146657       1 controller.go:305] Processing Route: default/docker-registry -> docker-registry
I0316 10:04:37.146662       1 controller.go:306]            Alias: docker-registry-default.0316-yny.qe.rhcloud.com
I0316 10:04:37.146667       1 controller.go:307]            Path: 
I0316 10:04:37.146670       1 controller.go:308]            Event: 
I0316 10:04:37.146681       1 router.go:129] host docker-registry-default.0316-yny.qe.rhcloud.com admitted
I0316 10:04:37.146698       1 plugin.go:151] Processing 1 Endpoints for Name: service-unsecure ()
I0316 10:04:37.146706       1 plugin.go:154]   Subset 0 : api.EndpointSubset{Addresses:[]api.EndpointAddress{api.EndpointAddress{IP:"10.2.2.21", Hostname:"", NodeName:(*string)(0xc42011f630), TargetRef:(*api.ObjectReference)(0xc42046b730)}}, NotReadyAddresses:[]api.EndpointAddress(nil), Ports:[]api.EndpointPort{api.EndpointPort{Name:"http", Port:8080, Protocol:"TCP"}}}
I0316 10:09:14.213867       1 reflector.go:392] github.com/openshift/origin/pkg/router/template/service_lookup.go:30: Watch close - *api.Service total 0 items received
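
For reading logs like the one above, a small helper can summarize what happens after the last 'Updating watched namespaces' entry. This is a sketch: the regular expression is an assumption based on the line layout seen in this log, and `after_namespace_update` is a hypothetical name.

```python
import re

# Assumed layout: <level><MMDD> <time> <pid> <file>:<line>] <message>
GLOG = re.compile(
    r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ ([\w.]+):(\d+)\] (.*)$"
)


def parse(lines):
    """Yield (time, source_file, message) for each matching line; skip others."""
    for line in lines:
        match = GLOG.match(line)
        if match:
            yield match.group(2), match.group(3), match.group(5)


def after_namespace_update(lines):
    """Summarize log entries following the last namespace update, or None."""
    entries = list(parse(lines))
    marks = [
        i for i, entry in enumerate(entries)
        if entry[2].startswith("Updating watched namespaces")
    ]
    if not marks:
        return None
    tail = entries[marks[-1] + 1:]
    prefix = "Processing Route: "
    return {
        "namespace_update_at": entries[marks[-1]][0],
        "routes_processed": [
            msg[len(prefix):].split(" ->")[0]
            for _, _, msg in tail if msg.startswith(prefix)
        ],
        "reloaded": any(
            msg.startswith(("Reloading the router", "Router reloaded"))
            for _, _, msg in tail
        ),
    }
```

Fed the lines above (for example, saved output of `oc logs` for the router pod), it reports the namespace update at 10:04:27.335563, only default/docker-registry processed afterwards, and no reload.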

Comment 5 Maru Newby 2017-03-18 16:05:56 UTC
(In reply to hongli from comment #4)
> Hi Maru, the reproduction steps are the same as in the original report,
> just in a different order (step 4 is run first). The steps are as follows:
> 
> 1. add NAMESPACE_LABELS to router
> 2. create project, pod, service and route.
> 3. add label to the project.
> 
> After waiting more than 10 minutes, the route is still not reloaded. The
> router logs are below; "forcing resync" is observed at 10:04, but the
> route is not reloaded.

There are 2 types of resync - namespace and everything else.  The 2 use different mechanisms, and namespace sync is intended to be triggered just before the sync of route data.  Does the text 'Updating watched namespaces' appear in the log?
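
To make the intended ordering concrete, here is a toy model (not the router's actual code; `ToyRouter` and its methods are hypothetical): the namespace set is refreshed first, then route data is replayed against it, so a route that was filtered out earlier is picked up on the next resync.

```python
class ToyRouter:
    """Toy model only: namespace sync runs before the route resync."""

    def __init__(self):
        self.watched = set()  # namespaces currently selected
        self.served = set()   # routes currently in the config

    @staticmethod
    def _namespace(route):
        return route.split("/", 1)[0]

    def sync_namespaces(self, selected):
        """Replace the watched set and drop routes that fell out of it."""
        self.watched = set(selected)
        self.served = {r for r in self.served if self._namespace(r) in self.watched}

    def handle_route(self, route):
        """Admit a route only if its namespace is currently watched."""
        if self._namespace(route) in self.watched:
            self.served.add(route)
            return True
        return False

    def resync(self, selected, all_routes):
        self.sync_namespaces(selected)  # namespace sync first
        for route in all_routes:        # then replay route data
            self.handle_route(route)


router = ToyRouter()
# The route arrives before its namespace is labelled: filtered out.
admitted_early = router.handle_route("u1p1/secured-edge-route")
# After labelling, the next resync replays routes against the new set.
router.resync({"u1p1"}, ["u1p1/secured-edge-route"])
```

In this model the route only appears if the resync actually replays it after the namespace update, which is why a missing route event in the resync would matter.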

Comment 6 Maru Newby 2017-03-18 16:14:53 UTC
(In reply to Maru Newby from comment #5)
> (In reply to hongli from comment #4)
> > Hi Maru, the reproduction steps are the same as in the original report,
> > just in a different order (step 4 is run first). The steps are as
> > follows:
> > 
> > 1. add NAMESPACE_LABELS to router
> > 2. create project, pod, service and route.
> > 3. add label to the project.
> > 
> > After waiting more than 10 minutes, the route is still not reloaded. The
> > router logs are below; "forcing resync" is observed at 10:04, but the
> > route is not reloaded.
> 
> There are 2 types of resync - namespace and everything else.  The 2 use
> different mechanisms, and namespace sync is intended to be triggered just
> before the sync of route data.  Does the text 'Updating watched namespaces'
> appear in the log?

Never mind, I see it.  So the namespace sync is being triggered. I don't see an event for the route in the resync, though; is that the end of the log?

Comment 7 Maru Newby 2017-03-18 23:23:21 UTC
PR is up.  This issue looks to have been latent for a long time, good catch.

Comment 9 Ben Bennett 2017-03-28 17:26:10 UTC
PR https://github.com/openshift/ose/pull/657 is waiting for 3.5.0 to release before we can commit it.

The change has already landed in master.

Comment 10 Troy Dawson 2017-04-18 20:05:11 UTC
This has been merged into ocp and is in OCP v3.5.5.7 or newer.

Comment 12 Hongan Li 2017-04-19 02:57:20 UTC
Verified in atomic-openshift-3.5.5.7-1.git.0.644a8c2.el7.x86_64; the issue has been fixed.

The router is reloaded within 10 minutes; the logs are below:

[root@ip-172-18-11-131 ~]# oc logs router-2-2thp3 
I0419 02:35:37.085170       1 router.go:479] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0419 02:35:37.085258       1 router.go:228] Router is only using routes in namespaces matching team=red
I0419 02:35:37.159813       1 router.go:479] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0419 02:35:41.991900       1 router.go:479] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0419 02:45:37.173565       1 router.go:479] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
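
The spacing between reloads can be checked directly from such output. A minimal sketch, assuming the timestamp layout shown above; `reload_gaps` is a hypothetical helper, and the year is passed in only because these log lines do not carry one.

```python
import re
from datetime import datetime

# Assumed layout: <level><MMDD> <HH:MM:SS.micro> ... ] Router reloaded:
STAMP = re.compile(r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s.*\] Router reloaded:")


def reload_gaps(lines, year=2017):
    """Return the seconds elapsed between consecutive 'Router reloaded' lines."""
    times = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %m%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

For the four reload lines above, the last gap (02:35:41.991900 to 02:45:37.173565) comes out at about 595 seconds, i.e. just under the 10 minute interval.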

Comment 14 errata-xmlrpc 2017-04-26 05:36:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1129

Comment 15 Ben Bennett 2017-06-19 17:24:28 UTC
*** Bug 1462675 has been marked as a duplicate of this bug. ***
