Bug 2004632
| Summary: | When LE takes a large amount of time, multiple whereabouts are seen | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Martin Kennelly <mkennell> | ||||
| Component: | Networking | Assignee: | Douglas Smith <dosmith> | ||||
| Networking sub component: | multus | QA Contact: | Weibin Liang <weliang> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | urgent | CC: | bbennett | ||||
| Version: | 4.10 | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.10.0 | ||||||
| Hardware: | All | ||||||
| OS: | All | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | | Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 2009493 | Environment: | |||||
| Last Closed: | 2022-03-10 16:10:53 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | | Category: | --- | ||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 2009493 | ||||||
| Attachments: |
Description
Martin Kennelly
2021-09-15 17:04:06 UTC
I think I see the issue in the code, in pkg/storage/kubernetes/ipam.go:
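```go
go func() {
	defer wg.Done()
	// WithCancel sets no deadline: if this pod never becomes leader,
	// nothing cancels ctx and le.Run(ctx) keeps blocking.
	ctx, cancel := context.WithCancel(context.Background())
	res := make(chan error)
	go func() {
		logging.Debugf("Started leader election")
		le.Run(ctx)
		logging.Debugf("Finished leader election")
		res <- nil
	}()
	// ... (remainder of the function omitted)
```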
Leader election (LE) never finishes in this case, so it needs a timeout: the context should be created with context.WithTimeout() rather than context.WithCancel().
Here is a possible implementation of a fix: https://github.com/k8snetworkplumbingwg/whereabouts/pull/142

Following the steps in the description, all pods get unique IP addresses from the two whereabouts (WB) instances. Tested in 4.10.0-0.nightly-2021-10-15-025303:
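```console
[weliang@weliang whereabouts-stopwatch]$ oc get pods | grep test | awk '{print $1}' | xargs -I {} oc exec -t {} -- ip a | grep "inet 10.10" | awk '{print $2}' | sort | uniq | wc -l
406
[weliang@weliang whereabouts-stopwatch]$ oc get pod | grep Running | wc -l
406
```

The number of distinct 10.10.x.x addresses (406) matches the number of running pods (406), so no address was assigned twice.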
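The same uniqueness check can be expressed as a small function that also reports which addresses collide, rather than only comparing two counts. This is a sketch under assumptions: `findDuplicateIPs` is a hypothetical helper, its input maps a pod name to the address collected from that pod (as the `ip a | grep "inet 10.10"` step does), and the pod names and addresses below are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// findDuplicateIPs returns, sorted, every address that appears on more
// than one pod. podIPs maps a pod name to the address seen on that pod.
func findDuplicateIPs(podIPs map[string]string) []string {
	counts := make(map[string]int)
	for _, ip := range podIPs {
		counts[ip]++
	}
	var dups []string
	for ip, n := range counts {
		if n > 1 {
			dups = append(dups, ip)
		}
	}
	sort.Strings(dups)
	return dups
}

func main() {
	// Hypothetical data: test-c was handed the same address as test-a.
	pods := map[string]string{
		"test-a": "10.10.0.1/16",
		"test-b": "10.10.0.2/16",
		"test-c": "10.10.0.1/16",
	}
	fmt.Println(findDuplicateIPs(pods)) // [10.10.0.1/16]
}
```

An empty result corresponds to the passing case above, where the unique-address count equals the pod count.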
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056