Bug 2079012
| Field | Value |
|---|---|
| Summary | egressIP not migrated to correct workers after deleting machineset it was assigned |
| Product | OpenShift Container Platform |
| Reporter | Anand T N <atn> |
| Component | Networking |
| Assignee | Surya Seetharaman <surya> |
| Networking sub component | ovn-kubernetes |
| QA Contact | huirwang |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| CC | anusaxen, ffernand, huirwang, pdiak, surya |
| Version | 4.10 |
| Flags | surya: needinfo- |
| Target Release | 4.11.0 |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | No Doc Update |
| Clones | 2105657 |
| Last Closed | 2022-08-10 11:08:39 UTC |
| Type | Bug |
| Bug Depends On | 2094039 |
| Bug Blocks | 2105657 |
Description
Anand T N
2022-04-26 17:22:56 UTC
Initial status:

$ oc get machineset -A
NAMESPACE               NAME                               DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   ci-ln-8tni372-c1627-72fgn-worker   3         3         3       3           20m

$ oc get nodes -owide -A
NAME                                     STATUS   ROLES    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
ci-ln-8tni372-c1627-72fgn-master-0       Ready    master   20m   v1.23.5+1f952b3   192.168.51.12   192.168.51.12   Red Hat Enterprise Linux CoreOS 410.84.202203290245-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.23.2-3.rhaos4.10.gitcbe78bd.el8
ci-ln-8tni372-c1627-72fgn-master-1       Ready    master   20m   v1.23.5+1f952b3   192.168.51.21   192.168.51.21   Red Hat Enterprise Linux CoreOS 410.84.202203290245-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.23.2-3.rhaos4.10.gitcbe78bd.el8
ci-ln-8tni372-c1627-72fgn-master-2       Ready    master   20m   v1.23.5+1f952b3   192.168.51.20   192.168.51.20   Red Hat Enterprise Linux CoreOS 410.84.202203290245-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.23.2-3.rhaos4.10.gitcbe78bd.el8
ci-ln-8tni372-c1627-72fgn-worker-952kx   Ready    worker   10m   v1.23.5+1f952b3   192.168.51.14   192.168.51.14   Red Hat Enterprise Linux CoreOS 410.84.202203290245-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.23.2-3.rhaos4.10.gitcbe78bd.el8
ci-ln-8tni372-c1627-72fgn-worker-pndm8   Ready    worker   10m   v1.23.5+1f952b3   192.168.51.19   192.168.51.19   Red Hat Enterprise Linux CoreOS 410.84.202203290245-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.23.2-3.rhaos4.10.gitcbe78bd.el8
ci-ln-8tni372-c1627-72fgn-worker-svzph   Ready    worker   10m   v1.23.5+1f952b3   192.168.51.30   192.168.51.30   Red Hat Enterprise Linux CoreOS 410.84.202203290245-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.23.2-3.rhaos4.10.gitcbe78bd.el8

$ oc get machineset -A
NAMESPACE               NAME                               DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   ci-ln-8tni372-c1627-72fgn-worker   3         3         3       3           3h3m

Spec:
  Egress I Ps:
    192.168.51.15
    192.168.51.16
  Namespace Selector:
    Match Labels:
      Env:  prod
Status:
  Items:
    Egress IP:  192.168.51.15
    Node:       ci-ln-8tni372-c1627-72fgn-worker-952kx
    Egress IP:  192.168.51.16
    Node:       ci-ln-8tni372-c1627-72fgn-worker-pndm8
Events:  <none>

I tested egress IPs while scaling the machineset down, and reassignment does happen:

[surya@hidden-temple openshift]$ oc scale machineset -n openshift-machine-api ci-ln-8tni372-c1627-72fgn-worker --replicas=2
machineset.machine.openshift.io/ci-ln-8tni372-c1627-72fgn-worker scaled

[surya@hidden-temple openshift]$ oc get machineset -A
NAMESPACE               NAME                               DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   ci-ln-8tni372-c1627-72fgn-worker   2         2         2       2           3h4m

[surya@hidden-temple openshift]$ oc get nodes
NAME                                     STATUS                     ROLES    AGE    VERSION
ci-ln-8tni372-c1627-72fgn-master-0       Ready                      master   3h4m   v1.23.5+1f952b3
ci-ln-8tni372-c1627-72fgn-master-1       Ready                      master   3h4m   v1.23.5+1f952b3
ci-ln-8tni372-c1627-72fgn-master-2       Ready                      master   3h4m   v1.23.5+1f952b3
ci-ln-8tni372-c1627-72fgn-worker-952kx   Ready,SchedulingDisabled   worker   174m   v1.23.5+1f952b3
ci-ln-8tni372-c1627-72fgn-worker-pndm8   Ready                      worker   174m   v1.23.5+1f952b3
ci-ln-8tni372-c1627-72fgn-worker-svzph   Ready                      worker   174m   v1.23.5+1f952b3

Spec:
  Egress I Ps:
    192.168.51.15
    192.168.51.16
  Namespace Selector:
    Match Labels:
      Env:  prod
Status:
  Items:
    Egress IP:  192.168.51.16
    Node:       ci-ln-8tni372-c1627-72fgn-worker-pndm8
    Egress IP:  192.168.51.15
    Node:       ci-ln-8tni372-c1627-72fgn-worker-svzph
Events:  <none>

E0510 17:05:32.588294 1 egressip.go:747] Allocator error: EgressIP: egressips-prod assigned to node: ci-ln-8tni372-c1627-72fgn-worker-952kx which is not reachable, will attempt rebalancing
I0510 17:05:32.588697 1 egressip.go:1353] Successful assignment of egress IP: 192.168.51.15 on node: &{egressIPConfig:0xc000180600 mgmtIPs:[[10 128 2 2]] allocations:map[] isReady:true isReachable:true isEgressAssignable:true name:ci-ln-8tni372-c1627-72fgn-worker-svzph}
I0510 17:05:32.588825 1 kube.go:244] Patching status on EgressIP egressips-prod

Egress IP 192.168.51.15 moves correctly from ci-ln-8tni372-c1627-72fgn-worker-952kx to the other node, ci-ln-8tni372-c1627-72fgn-worker-svzph. Next, I will work out how to create a new machineset and run the same test described in this bug.
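The check done by eye above asks whether any egress IP in the EgressIP status is still assigned to a node that is gone or unhealthy. The sketch below automates that check and makes a verification run repeatable. It makes several assumptions:

- Its input is the parsed JSON from `oc get egressip egressips-prod -o json` and `oc get nodes -o json`.
- It reads the EgressIP fields `status.items[].egressIP` and `status.items[].node`.
- It uses the Node `Ready` condition as a rough stand-in for reachability. ovn-kubernetes runs its own reachability check (see the "not reachable" allocator log above), so this is only an approximation.
- The helper names `ready_node_names` and `stale_assignments` are hypothetical.

```python
from __future__ import annotations


def ready_node_names(nodes_json: dict) -> set[str]:
    """Return the names of nodes whose "Ready" condition has status "True".

    Assumes nodes_json is the parsed output of `oc get nodes -o json`
    (a List object whose "items" are Node objects).
    """
    ready = set()
    for node in nodes_json.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready.add(node["metadata"]["name"])
    return ready


def stale_assignments(egressip_json: dict, nodes_json: dict) -> list[tuple[str, str]]:
    """List (egress IP, node) pairs whose node is missing or not Ready.

    Assumes egressip_json is the parsed output of `oc get egressip <name> -o json`,
    with assignments under status.items[] as {"egressIP": ..., "node": ...}.
    An empty result means every assigned egress IP sits on a Ready node.
    """
    ready = ready_node_names(nodes_json)
    stale = []
    for item in egressip_json.get("status", {}).get("items", []):
        # A node that was deleted with its machineset will not appear in the node list at all.
        if item.get("node") not in ready:
            stale.append((item.get("egressIP"), item.get("node")))
    return stale
```

To use it, save both outputs to files, load them with `json.load`, and pass the results to `stale_assignments`. After deleting or scaling down a machineset, a non-empty list would point to the failure this bug describes.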
Hey Huiran! Could you please try to reproduce the same issue with this image: quay.io/itssurya/dev-images:e5a4884b-83c7-4728-9117-936093a25c7d? Basically: install OCP, scale down the CVO, edit the CNO deployment to change the OVNK image to the one provided above, and redo the MC test you did in https://bugzilla.redhat.com/show_bug.cgi?id=2079012#c9.

We need to solve https://bugzilla.redhat.com/show_bug.cgi?id=2094039#c0 first. Every time we try to reproduce this using machineset deletion, we seem to hit that bug. It causes panics and restarts in OVNK, which makes it difficult to actually verify this bug's fix. Marking this as dependent on 2094039.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069