Bug 1434574 - [3.4] [networking_public_407] the router configuration not reloaded after the namespace label changed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.4.z
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1415112 1436296
Blocks: 1267746
Reported: 2017-03-21 19:41 UTC by Ben Bennett
Modified: 2022-08-04 22:20 UTC (History)

Fixed In Version:
Doc Text:
Clone Of: 1415112
Environment:
Last Closed: 2017-04-04 14:28:59 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift ose pull 669 0 None None None 2017-03-27 15:43:24 UTC
Red Hat Product Errata RHBA-2017:0865 0 normal SHIPPED_LIVE OpenShift Container Platform 3.4.1.12, 3.3.1.17-4, and 3.2.1.30 bug fix update 2017-04-04 18:27:43 UTC

Description Ben Bennett 2017-03-21 19:41:18 UTC
+++ This bug was initially created as a clone of Bug #1415112 +++

Description of problem:
If NAMESPACE_LABELS is added to the router first, the router configuration is not reloaded when a label is later added to or removed from a namespace.
But if the router's NAMESPACE_LABELS is changed last, the configuration is reloaded.

Version-Release number of selected component (if applicable):
openshift v3.5.0.6+87f6173
kubernetes v1.5.2+43a9be4
etcd 3.1.0-rc.0


How reproducible:
always

Steps to Reproduce:
1. create project, pod, service and route.
   # oc new-project u1p1
   # oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/caddy-docker.json
   # oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/edge/service_unsecure.json
   # oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/routing/edge/route_edge.json

2. add label to the project.
   # oc label namespace u1p1 team=red

3. create pod, service, route in another project but without label.

4. add NAMESPACE_LABELS to router
   # oadm policy add-cluster-role-to-user cluster-reader system:serviceaccount:default:router
   # oc env dc/router NAMESPACE_LABELS=team=red

5. check the router configuration
   # oc rsh router-2-xxxxx cat haproxy.config

6. remove the label from the project.
   # oc label namespace u1p1 team-

7. check the router configuration again.
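
To make the selector in step 4 concrete: NAMESPACE_LABELS=team=red is an equality-based label selector, and the steps above toggle whether namespace u1p1 matches it. The following is a minimal Python sketch of that matching, not router code. The helpers parse_selector and matches are hypothetical and only handle comma-separated key=value terms.

```python
from typing import Dict


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse comma-separated key=value terms (equality-based selectors only)."""
    terms = {}
    for term in selector.split(","):
        term = term.strip()
        if not term:
            continue
        key, sep, value = term.partition("=")
        if not sep:
            raise ValueError(f"unsupported selector term: {term!r}")
        terms[key.strip()] = value.strip()
    return terms


def matches(selector: str, labels: Dict[str, str]) -> bool:
    """Return True when every selector term is present in the namespace labels."""
    return all(labels.get(k) == v for k, v in parse_selector(selector).items())


# Labels on namespace u1p1 after step 2 and after step 6 respectively.
after_step_2 = {"team": "red"}
after_step_6 = {}

print(matches("team=red", after_step_2))
print(matches("team=red", after_step_6))
```

Under this model the route in u1p1 should be served after step 2 and dropped after step 6, which is the behaviour the expected results below ask for.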

Actual results:
In step 5, the router configuration is reloaded and only the route in the labelled namespace is shown.
But in step 7, the router configuration is not reloaded; the route is still there even though the label has been removed.

Expected results:
The router configuration should be reloaded after the namespace label is changed (in step 7).

Additional info:

--- Additional comment from Maru Newby on 2017-01-24 12:35:12 EST ---

The router only updates namespaces on an interval (controllable via --resync-interval and defaulting to 10m).  Changes to namespace labelling will not be reflected in the routes served by a given router instance until the interval is hit or the instance is restarted.
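
The interval behaviour described in this comment can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the router's implementation: the class RouterModel, its tick method, and the second namespace u1p2 are hypothetical, and time is a plain number of seconds.

```python
class RouterModel:
    """Toy model: the watched namespace set is refreshed only on resync."""

    def __init__(self, selector_key, selector_value, resync_interval=600):
        self.selector = (selector_key, selector_value)
        self.resync_interval = resync_interval  # seconds; 600 mirrors the 10m default
        self.watched = set()
        self.last_sync = None

    def tick(self, now, namespaces):
        """Refresh the watched set if the interval has elapsed (or on first call)."""
        if self.last_sync is None or now - self.last_sync >= self.resync_interval:
            key, value = self.selector
            self.watched = {name for name, labels in namespaces.items()
                            if labels.get(key) == value}
            self.last_sync = now
            return True
        return False


namespaces = {"u1p1": {"team": "red"}, "u1p2": {}}  # u1p2 is a made-up second project
router = RouterModel("team", "red")

router.tick(0, namespaces)        # initial sync: u1p1 is watched
del namespaces["u1p1"]["team"]    # label removed right after the sync
router.tick(300, namespaces)      # 5 minutes later: interval not reached, no refresh
stale = set(router.watched)
router.tick(600, namespaces)      # interval reached: watched set is refreshed
fresh = set(router.watched)
print(stale, fresh)
```

In this model the label change is invisible until the next resync, so a delay of up to one interval is expected behaviour; a change that is still not reflected after the interval has passed is not.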

--- Additional comment from hongli on 2017-03-14 06:39:38 EDT ---

Retested in the latest OCP 3.5.0.50 and found the router was not updated within 10m, so this is not the same as bug 1355711; reopening this one.

I wonder whether the default 10m resync-interval was removed by the PR: https://github.com/openshift/origin/pull/12242/

--- Additional comment from Maru Newby on 2017-03-15 11:12:28 EDT ---

(In reply to hongli from comment #2)
> Retested in the latest OCP 3.5.0.50 and found the router was not updated
> within 10m, so this is not the same as bug 1355711; reopening this one.
> 
> I wonder whether the default 10m resync-interval was removed by the
> PR: https://github.com/openshift/origin/pull/12242/

The resync interval was not removed.  That PR only prevents reloads if the route state has not changed, but a change in the set of namespaces a router targets should still result in a state change.
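
As a rough illustration of "prevent reloads if the route state has not changed", the sketch below compares a digest of the served routes between commits. It is an assumption-laden Python model, not the code from the PR: ReloadGate and commit are hypothetical names, and hashing the route list is only one possible way to detect a change.

```python
import hashlib
import json


class ReloadGate:
    """Skip reloads when the serialized route state is unchanged."""

    def __init__(self):
        self._last_digest = None
        self.reloads = 0

    def commit(self, routes):
        """Reload only if the set of served routes differs from the last commit."""
        payload = json.dumps(sorted(routes), separators=(",", ":")).encode()
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self._last_digest:
            return False  # state unchanged: no reload
        self._last_digest = digest
        self.reloads += 1
        return True


gate = ReloadGate()
first = gate.commit(["u1p1/secured-edge-route", "default/docker-registry"])
repeat = gate.commit(["default/docker-registry", "u1p1/secured-edge-route"])
# The namespace filter drops u1p1, so the served route set shrinks.
after_label_removed = gate.commit(["default/docker-registry"])
print(first, repeat, after_label_removed, gate.reloads)
```

The point of the model is the last call: once the namespace set changes, the served routes differ, so a gate of this kind should still let the reload through.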

It would be helpful to increase the logging verbosity on the router and provide those logs. There is logging around the updating of watched namespaces.

--- Additional comment from hongli on 2017-03-16 06:23:28 EDT ---

Hi Maru, the reproduction steps differ from the original bug only in their order (step 4 is run first). The steps are as follows:

1. add NAMESPACE_LABELS to router
2. create project, pod, service and route.
3. add label to the project.

Waited more than 10 minutes but the route was not reloaded. Below are the router logs; "forcing resync" is observed at 10:04 but the route is not reloaded.

I0316 09:54:41.882547       1 router.go:390] Writing the router state
I0316 09:54:41.883141       1 router.go:395] Writing the router config
I0316 09:54:41.915215       1 router.go:400] Reloading the router
I0316 09:54:41.977765       1 reaper.go:24] Signal received: child exited
I0316 09:54:41.977804       1 reaper.go:32] Reaped process with pid 53
I0316 09:54:42.001333       1 router.go:475] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0316 09:54:42.001410       1 reaper.go:24] Signal received: child exited
I0316 09:55:46.704353       1 controller.go:305] Processing Route: u1p1/secured-edge-route -> service-unsecure
I0316 09:55:46.704373       1 controller.go:306]            Alias: test-edge.example.com
I0316 09:55:46.704378       1 controller.go:307]            Path: 
I0316 09:55:46.704382       1 controller.go:308]            Event: ADDED
I0316 09:55:46.704391       1 router.go:129] host test-edge.example.com admitted
I0316 10:01:32.208521       1 reflector.go:392] github.com/openshift/origin/pkg/router/template/service_lookup.go:30: Watch close - *api.Service total 1 items received
I0316 10:04:27.335563       1 controller.go:150] Updating watched namespaces: map[u1p1:{}]
I0316 10:04:36.772510       1 reflector.go:273] github.com/openshift/origin/pkg/router/template/service_lookup.go:30: forcing resync
I0316 10:04:37.146518       1 reflector.go:273] github.com/openshift/origin/pkg/router/controller/factory/factory.go:75: forcing resync
I0316 10:04:37.146630       1 reflector.go:273] github.com/openshift/origin/pkg/router/controller/factory/factory.go:68: forcing resync
I0316 10:04:37.146657       1 controller.go:305] Processing Route: default/docker-registry -> docker-registry
I0316 10:04:37.146662       1 controller.go:306]            Alias: docker-registry-default.0316-yny.qe.rhcloud.com
I0316 10:04:37.146667       1 controller.go:307]            Path: 
I0316 10:04:37.146670       1 controller.go:308]            Event: 
I0316 10:04:37.146681       1 router.go:129] host docker-registry-default.0316-yny.qe.rhcloud.com admitted
I0316 10:04:37.146698       1 plugin.go:151] Processing 1 Endpoints for Name: service-unsecure ()
I0316 10:04:37.146706       1 plugin.go:154]   Subset 0 : api.EndpointSubset{Addresses:[]api.EndpointAddress{api.EndpointAddress{IP:"10.2.2.21", Hostname:"", NodeName:(*string)(0xc42011f630), TargetRef:(*api.ObjectReference)(0xc42046b730)}}, NotReadyAddresses:[]api.EndpointAddress(nil), Ports:[]api.EndpointPort{api.EndpointPort{Name:"http", Port:8080, Protocol:"TCP"}}}
I0316 10:09:14.213867       1 reflector.go:392] github.com/openshift/origin/pkg/router/template/service_lookup.go:30: Watch close - *api.Service total 0 items received
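
A quick way to read a log like the one above is to list which routes are processed after the namespace sync. The Python sketch below does that for three lines copied from this log; the function name routes_after_namespace_sync is hypothetical, and the matching relies only on the message text visible in these lines.

```python
import re

# Three lines copied from the router log above.
LOG = """\
I0316 09:55:46.704353       1 controller.go:305] Processing Route: u1p1/secured-edge-route -> service-unsecure
I0316 10:04:27.335563       1 controller.go:150] Updating watched namespaces: map[u1p1:{}]
I0316 10:04:37.146657       1 controller.go:305] Processing Route: default/docker-registry -> docker-registry
"""

ROUTE_RE = re.compile(r"Processing Route: (\S+) ->")


def routes_after_namespace_sync(log_text):
    """Return routes processed after the first 'Updating watched namespaces' line."""
    seen_sync = False
    routes = []
    for line in log_text.splitlines():
        if "Updating watched namespaces" in line:
            seen_sync = True
            continue
        match = ROUTE_RE.search(line)
        if seen_sync and match:
            routes.append(match.group(1))
    return routes


print(routes_after_namespace_sync(LOG))
```

For this excerpt only default/docker-registry shows up after the namespace update; u1p1/secured-edge-route is not processed again even though u1p1 is in the watched set.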

--- Additional comment from Maru Newby on 2017-03-18 12:05:56 EDT ---

(In reply to hongli from comment #4)
> Hi Maru, the reproduction steps differ from the original bug only in their
> order (step 4 is run first). The steps are as follows:
> 
> 1. add NAMESPACE_LABELS to router
> 2. create project, pod, service and route.
> 3. add label to the project.
> 
> Waited more than 10 minutes but the route was not reloaded. Below are the
> router logs; "forcing resync" is observed at 10:04 but the route is not
> reloaded.

There are 2 types of resync - namespace and everything else.  The 2 use different mechanisms, and namespace sync is intended to be triggered just before the sync of route data.  Does the text 'Updating watched namespaces' appear in the log?

--- Additional comment from Maru Newby on 2017-03-18 12:14:53 EDT ---

(In reply to Maru Newby from comment #5)
> (In reply to hongli from comment #4)
> > Hi Maru, the reproduction steps differ from the original bug only in their
> > order (step 4 is run first). The steps are as follows:
> > 
> > 1. add NAMESPACE_LABELS to router
> > 2. create project, pod, service and route.
> > 3. add label to the project.
> > 
> > Waited more than 10 minutes but the route was not reloaded. Below are the
> > router logs; "forcing resync" is observed at 10:04 but the route is not
> > reloaded.
> 
> There are 2 types of resync - namespace and everything else.  The 2 use
> different mechanisms, and namespace sync is intended to be triggered just
> before the sync of route data.  Does the text 'Updating watched namespaces'
> appear in the log?

Never mind, I see it.  So the namespace sync is being triggered. I don't see an event for the route in the resync, though. Is that the end of the log?

--- Additional comment from Maru Newby on 2017-03-18 19:23:21 EDT ---

PR is up.  This issue looks to have been latent for a long time, good catch.

Comment 3 Ben Bennett 2017-03-28 15:20:37 UTC
PR https://github.com/openshift/ose/pull/669

Comment 4 Ben Bennett 2017-03-28 15:27:57 UTC
*** Bug 1434707 has been marked as a duplicate of this bug. ***

Comment 5 Hongan Li 2017-03-29 08:33:18 UTC
Today's environment is OCP 3.4.1.11 and the issue is still reproducible; will verify ASAP once a v3.4.1.12 environment is ready.

Comment 6 Hongan Li 2017-03-30 01:57:27 UTC
Verified in 3.4.1.12 (atomic-openshift-3.4.1.12-1.git.0.57d7e1d.el7.x86_64); the issue has been fixed.

The router is reloaded after 10 minutes and curl succeeds.

 [root@host-8-174-62 ~]# oc logs router-3-l36r1 
I0330 01:36:16.503068       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0330 01:36:16.503153       1 router.go:221] Router is only using routes in namespaces matching team=red
I0330 01:36:16.552631       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0330 01:36:21.230074       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0330 01:46:16.546185       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
I0330 01:46:16.583721       1 router.go:456] Router reloaded:
 - Checking HAProxy /healthz on port 1936 ...
 - HAProxy port 1936 health check ok : 0 retry attempt(s).
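
The roughly 10-minute spacing can be checked mechanically from the timestamps. The Python sketch below parses the "Router reloaded:" lines copied from this log and computes the largest gap between consecutive reloads. The function reload_times is a hypothetical helper, and because these log lines carry no year, 2017 is assumed from the comment date.

```python
import re
from datetime import datetime

# "Router reloaded:" lines copied from the log above.
LOG = """\
I0330 01:36:16.503068       1 router.go:456] Router reloaded:
I0330 01:36:16.552631       1 router.go:456] Router reloaded:
I0330 01:36:21.230074       1 router.go:456] Router reloaded:
I0330 01:46:16.546185       1 router.go:456] Router reloaded:
I0330 01:46:16.583721       1 router.go:456] Router reloaded:
"""

LINE_RE = re.compile(
    r"^I(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] Router reloaded:"
)


def reload_times(log_text, year=2017):
    """Return a datetime for each 'Router reloaded:' line (year is an assumption)."""
    times = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            stamp = f"{year}{m.group(1)} {m.group(2)}"
            times.append(datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f"))
    return times


times = reload_times(LOG)
largest_gap = max(later - earlier for earlier, later in zip(times, times[1:]))
print(largest_gap)
```

The largest gap falls between the 01:36:21 and 01:46:16 reloads, just under 10 minutes, which is consistent with a reload driven by the default resync interval.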

Comment 8 errata-xmlrpc 2017-04-04 14:28:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0865

