Bug 1905748
| Summary: | with sharded ingresscontrollers, all shards reload when any endpoint changes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Dan Yocum <dyocum> |
| Component: | Networking | Assignee: | Andrew McDermott <amcdermo> |
| Networking sub component: | router | QA Contact: | Hongan Li <hongli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | urgent | CC: | amcdermo, aos-bugs, aos-network-edge-staff, dahernan, ltitov, openshift-bugs-escalate, sgreene, svaughn |
| Version: | 4.5 | Keywords: | Reopened |
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:41:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1918194, 1918197 | ||
Description
Dan Yocum
2020-12-09 00:20:55 UTC
There is an existing BZ that describes broadly the same issue: https://bugzilla.redhat.com/show_bug.cgi?id=1839989

As this is really an enhancement to the current design, it is now captured in the following RFE: https://issues.redhat.com/browse/NE-391

Re-opening this BZ. That RFE refers to an old v3.11 BZ and does not address the issue the customer is experiencing. They have 6 router shards. When a single endpoint changes (create/delete/migrate), *ALL* the routers reload, not just the router that handles that endpoint. This isn't an HAProxy issue; it's a k8s issue. The customer has dug into the code, and this is what they have to say:

"This [issue] is a result of the "Kind: endpoints/endpointslice" changing in k8s, not haproxy noticing dead backends."

(In reply to Dan Yocum from comment #0)
> Description of problem:
> We have a number of ingresscontrollers setup to handle different routes:
>
> NAME        AGE
> crcshard-0  159d
> crcshard-1  159d
> crcshard-2  159d
> crcshard-3  159d
> crcshard-4  159d
> crcshard-5  159d
> default     286d
> public      138d
>
> Each of these has different routeSelector, but no namespaceSelector. Each
> of the routes in our clusters match one, and only one of these
> routeSelectors (handled by custom webhook/operator).

Could you please attach the YAML output for all of these ingresscontrollers:

$ oc get ingresscontrollers --all-namespaces -o yaml

Tested with 4.7.0-0.nightly-2021-01-22-134922 and passed.
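To make the expected behaviour concrete: a shard should only need to reload when the changed Endpoints/EndpointSlices belong to a service that backs a route the shard has admitted. The sketch below is a simplified, hypothetical model of that check, not the OpenShift router's code; the `Shard`, `ServiceKey` and `shards_to_reload` names and the `team-a`/`team-b` namespaces are illustrative assumptions. The behaviour reported here corresponds to every shard reloading without any such filter.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceKey:
    """Identifies a Service (and its Endpoints/EndpointSlices) by namespace and name."""
    namespace: str
    name: str


@dataclass
class Shard:
    """Hypothetical, simplified router shard: only the services backing routes it admitted."""
    name: str
    services: set = field(default_factory=set)

    def should_reload(self, changed: ServiceKey) -> bool:
        # Reload only if the changed endpoints belong to a service this shard serves.
        return changed in self.services


def shards_to_reload(shards, changed):
    """Names of the shards that should reload for a single endpoints change."""
    return [s.name for s in shards if s.should_reload(changed)]


# Illustrative shards; the namespaces and service names are assumptions.
shards = [
    Shard("crcshard-0", {ServiceKey("team-a", "web")}),
    Shard("crcshard-1", {ServiceKey("team-b", "api")}),
]

print(shards_to_reload(shards, ServiceKey("team-a", "web")))  # ['crcshard-0']
```

With the filter in place, an endpoint change for `team-a/web` touches only `crcshard-0`, and a change for a service no shard serves touches none.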
1. Create route shards with two additional custom ingresscontrollers, one selecting by namespace label and one selecting by route label:
   Namespace-label shard:
   spec:
     namespaceSelector:
       matchLabels:
         namespace: router-test
   Route-label shard:
   spec:
     routeSelector:
       matchLabels:
         route: router-test
2. Create three projects (ns1, ns2, ns3), each with pods, a service and a route. Label ns1 with "namespace=router-test", and label route2 in ns2 with "route=router-test".
3. Scale the pods in ns3 up and down to change its endpoints: neither of the labelled router pods reloads.
4. Scale the pods in ns2 up and down: the router pod with the namespace label does not reload.
5. Scale the pods in ns1 up and down: the router pod with the route label does not reload.
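The shard selection in steps 1 and 2 follows `matchLabels` semantics: every listed key/value pair must be present on the namespace (for `namespaceSelector`) or on the route (for `routeSelector`), and an unset selector places no restriction. The sketch below models only `matchLabels` (not `matchExpressions`). The shard names `nslabel`/`routelabel` are taken from the router pod names in the logs; the route names other than route2 are hypothetical. It shows why an endpoints change in ns3 concerns neither labelled shard, a change in ns2 concerns only the route-label shard, and a change in ns1 concerns only the namespace-label shard.

```python
def matches(match_labels, labels):
    """matchLabels semantics: every key/value pair must appear in labels.
    An empty or missing selector matches everything."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def admits(shard, namespace_labels, route_labels):
    """A shard admits a route only if every selector it sets matches."""
    return (matches(shard.get("namespaceSelector", {}), namespace_labels)
            and matches(shard.get("routeSelector", {}), route_labels))


# Shards from step 1 (selectors copied from the spec snippets in that step).
SHARDS = {
    "nslabel": {"namespaceSelector": {"namespace": "router-test"}},
    "routelabel": {"routeSelector": {"route": "router-test"}},
}

# Routes from step 2 as (namespace labels, route labels).
# Only ns1 and route2 carry labels; route1/route3 are hypothetical names.
ROUTES = {
    "ns1/route1": ({"namespace": "router-test"}, {}),
    "ns2/route2": ({}, {"route": "router-test"}),
    "ns3/route3": ({}, {}),
}


def shards_admitting(route):
    """Sorted names of the labelled shards that admit the given route."""
    ns_labels, route_labels = ROUTES[route]
    return sorted(name for name, shard in SHARDS.items()
                  if admits(shard, ns_labels, route_labels))


for route in ROUTES:
    print(route, shards_admitting(route))
```

A shard that does not admit a route has no reason to reload when that route's endpoints change, which is the outcome steps 3 to 5 check for.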
logs:
$ oc -n openshift-ingress logs router-nslabel-6b9c5d77b-l25mj | tail -n2
I0125 09:15:24.735305 1 router.go:578] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0125 09:25:34.356529 1 router.go:578] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
$ oc -n openshift-ingress logs router-routelabel-ff4dfdd4-bfbtg | tail -n2
I0125 09:17:10.457487 1 router.go:578] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0125 09:23:26.001079 1 router.go:578] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
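Reading these results comes down to comparing the "router reloaded" timestamps with the times at which each scale up/down was done (the scaling times are not recorded in this report). A small helper such as the hypothetical `reload_times` below can pull those timestamps out of `oc -n openshift-ingress logs <pod>` output. It assumes the klog header format seen in the lines above (severity letter, MMDD, time with microseconds, thread id, `file:line]`), which carries no year.

```python
import re

# klog header: severity letter, MMDD, hh:mm:ss.micro, thread id, file:line]
KLOG_LINE = re.compile(
    r'^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) '
    r'(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ (?P<src>[^\]]+)\] (?P<body>.*)$'
)


def reload_times(lines):
    """Return 'MM-DD hh:mm:ss' for each 'router reloaded' log line."""
    times = []
    for line in lines:
        m = KLOG_LINE.match(line)
        if m and '"msg"="router reloaded"' in m.group("body"):
            # Drop the microseconds; second resolution is enough to line up with scaling.
            times.append(f'{m.group("month")}-{m.group("day")} {m.group("time")[:8]}')
    return times


# Sample taken from the router-nslabel log output above.
SAMPLE = [
    r'I0125 09:15:24.735305 1 router.go:578] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"',
    r'I0125 09:25:34.356529 1 router.go:578] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"',
]

print(reload_times(SAMPLE))
```

In practice the input would be a saved log, e.g. from `oc -n openshift-ingress logs router-nslabel-6b9c5d77b-l25mj > nslabel.log`, read line by line.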
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633