Bug 2018188
| Summary: | VRRP ID conflict between keepalived-ipfailover and cluster VIPs | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Bram Verschueren <bverschu> |
| Component: | Networking | Assignee: | Ryan Fredette <rfredette> |
| Networking sub component: | router | QA Contact: | Melvin Joseph <mjoseph> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | achernet, aos-bugs, bperkins, dcbw, dmoessne, hongli, mmasters, pmannidi, rfredette |
| Version: | 4.10 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-08-10 10:39:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Bram Verschueren
2021-10-28 13:36:51 UTC
Moving to Routing component as the fix seems to be in the keepalived image which is owned by that team.
Setting blocker- because this is requesting new functionality. We'll discuss with PM how we want to handle this request.
melvinjoseph@mjoseph-mac openshift-tests-private % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.ci-2022-03-02-053737 True False 169m Cluster version is 4.10.0-0.ci-2022-03-02-053737
melvinjoseph@mjoseph-mac Downloads % oc get po -n openshift-vsphere-infra -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-mjoseph-vip2-r2b7g-master-0 2/2 Running 0 5h18m 172.31.249.59 mjoseph-vip2-r2b7g-master-0 <none> <none>
coredns-mjoseph-vip2-r2b7g-master-1 2/2 Running 0 5h18m 172.31.249.167 mjoseph-vip2-r2b7g-master-1 <none> <none>
coredns-mjoseph-vip2-r2b7g-master-2 2/2 Running 0 5h18m 172.31.249.248 mjoseph-vip2-r2b7g-master-2 <none> <none>
coredns-mjoseph-vip2-r2b7g-worker-7z985 2/2 Running 0 5h8m 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
coredns-mjoseph-vip2-r2b7g-worker-hmxrm 2/2 Running 0 5h8m 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>
haproxy-mjoseph-vip2-r2b7g-master-0 2/2 Running 0 5h18m 172.31.249.59 mjoseph-vip2-r2b7g-master-0 <none> <none>
haproxy-mjoseph-vip2-r2b7g-master-1 2/2 Running 0 5h19m 172.31.249.167 mjoseph-vip2-r2b7g-master-1 <none> <none>
haproxy-mjoseph-vip2-r2b7g-master-2 2/2 Running 0 5h17m 172.31.249.248 mjoseph-vip2-r2b7g-master-2 <none> <none>
keepalived-mjoseph-vip2-r2b7g-master-0 2/2 Running 0 5h18m 172.31.249.59 mjoseph-vip2-r2b7g-master-0 <none> <none>
keepalived-mjoseph-vip2-r2b7g-master-1 2/2 Running 0 5h19m 172.31.249.167 mjoseph-vip2-r2b7g-master-1 <none> <none>
keepalived-mjoseph-vip2-r2b7g-master-2 2/2 Running 0 5h18m 172.31.249.248 mjoseph-vip2-r2b7g-master-2 <none> <none>
keepalived-mjoseph-vip2-r2b7g-worker-7z985 2/2 Running 0 5h7m 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
keepalived-mjoseph-vip2-r2b7g-worker-hmxrm 2/2 Running 0 5h8m 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>
melvinjoseph@mjoseph-mac Downloads % oc debug node/mjoseph-vip2-r2b7g-master-0 -- tcpdump -i any vrrp
Starting pod/mjoseph-vip2-r2b7g-master-0-debug ...
To use host binaries, run `chroot /host`
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
08:19:19.959553 IP ip-172-31-249-221.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 219, prio 90, authtype simple, intvl 1s, length 20
08:19:19.993250 IP ip-172-31-249-8.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 165, prio 90, authtype simple, intvl 1s, length 20
08:19:20.096232 IP ip-172-31-249-230.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 180, prio 65, authtype simple, intvl 1s, length 20
08:19:20.192349 IP ip-172-31-249-184.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 28, prio 65, authtype simple, intvl 1s, length 20
08:19:20.320090 IP mjoseph-vip2-r2b7g-master-2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 131, prio 65, authtype simple, intvl 1s, length 20
08:19:20.415158 IP ip-172-31-249-195.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 136, prio 65, authtype simple, intvl 1s, length 20
08:19:20.415372 IP mjoseph-vip2-r2b7g-worker-7z985 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 226, prio 90, authtype simple, intvl 1s, length 20
08:19:20.501468 IP ip-172-31-249-252.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 118, prio 65, authtype simple, intvl 1s, length 20
08:19:20.633687 IP ip-172-31-249-117.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 71, prio 90, authtype simple, intvl 1s, length 20
08:19:20.634236 IP ip-172-31-249-216.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 99, prio 90, authtype simple, intvl 1s, length 20
08:19:20.959732 IP ip-172-31-249-221.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 219, prio 90, authtype simple, intvl 1s, length 20
08:19:20.993428 IP ip-172-31-249-8.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 165, prio 90, authtype simple, intvl 1s, length 20
08:19:21.096310 IP ip-172-31-249-230.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 180, prio 65, authtype simple, intvl 1s, length 20
08:19:21.192391 IP ip-172-31-249-184.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 28, prio 65, authtype simple, intvl 1s, length 20
08:19:21.320173 IP mjoseph-vip2-r2b7g-master-2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 131, prio 65, authtype simple, intvl 1s, length 20
08:19:21.415256 IP ip-172-31-249-195.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 136, prio 65, authtype simple, intvl 1s, length 20
08:19:21.415462 IP mjoseph-vip2-r2b7g-worker-7z985 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 226, prio 90, authtype simple, intvl 1s, length 20
08:19:21.501519 IP ip-172-31-249-252.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 118, prio 65, authtype simple, intvl 1s, length 20
08:19:21.633784 IP ip-172-31-249-117.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 71, prio 90, authtype simple, intvl 1s, length 20
08:19:21.634341 IP ip-172-31-249-216.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 99, prio 90, authtype simple, intvl 1s, length 20
08:19:21.959775 IP ip-172-31-249-221.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 219, prio 90, authtype simple, intvl 1s, length 20
08:19:21.993622 IP ip-172-31-249-8.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 165, prio 90, authtype simple, intvl 1s, length 20
08:19:22.096440 IP ip-172-31-249-230.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 180, prio 65, authtype simple, intvl 1s, length 20
08:19:22.192473 IP ip-172-31-249-184.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 28, prio 65, authtype simple, intvl 1s, length 20
08:19:22.320351 IP mjoseph-vip2-r2b7g-master-2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 131, prio 65, authtype simple, intvl 1s, length 20
08:19:22.415327 IP ip-172-31-249-195.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 136, prio 65, authtype simple, intvl 1s, length 20
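The capture above repeats the same advertisements every second, so the set of VRRP IDs already in use on the segment is easier to read once it is de-duplicated. A minimal sketch, assuming the tcpdump text format shown above; `list_vrids` is a hypothetical helper name, not part of any OpenShift tooling:

```shell
# Hypothetical helper: list the distinct VRRP IDs seen in tcpdump text output.
# Reads tcpdump lines on stdin and prints one ID per line, sorted numerically.
list_vrids() {
  grep -oE 'vrid [0-9]+' | awk '{print $2}' | sort -n -u
}

# Example usage (same capture command as above; -c 50 stops tcpdump after
# 50 packets so that the pipeline terminates; <node> is a placeholder):
# oc debug node/<node> -- tcpdump -i any -c 50 vrrp | list_vrids
```

Any ID printed here is a candidate for exclusion when choosing VRRP IDs for an ipfailover deployment on the same network.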
Creating ipfailover pods
melvinjoseph@mjoseph-mac Downloads % oc create sa ipfailover
serviceaccount/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user priviledged -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:priviledged added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user hostnetwork -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:hostnetwork added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc create -f work_docs/deploy-ipfailover.yaml
deployment.apps/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc create -f work_docs/web-server-1.yaml
replicationcontroller/web-server-rc created
melvinjoseph@mjoseph-mac Downloads % oc get all -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/ipfailover-788b595477-49k9t 1/1 Running 0 26s 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>
pod/ipfailover-788b595477-7m2vv 1/1 Running 0 26s 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
pod/web-server-rc-gjzgs 1/1 Running 0 17s 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
pod/web-server-rc-htks8 1/1 Running 0 17s 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicationcontroller/web-server-rc 2 2 2 18s nginx quay.io/openshifttest/nginx-alpine@sha256:5d3f3372288b8a93fc9fc7747925df2328c24db41e4b4226126c3af293c5ad88 name=web-server-rc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 172.30.0.1 <none> 443/TCP 5h27m <none>
service/openshift ExternalName <none> kubernetes.default.svc.cluster.local <none> 5h23m <none>
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/ipfailover 2/2 2 2 27s ipfailover-keepalived quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:644bf2d63cc24035ec82a39e0b14e6d61e3ca4ba39181b409590132f59bfc2cf ipfailover=ipfailover
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/ipfailover-788b595477 2 2 2 28s ipfailover-keepalived quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:644bf2d63cc24035ec82a39e0b14e6d61e3ca4ba39181b409590132f59bfc2cf ipfailover=ipfailover,pod-template-hash=788b595477
melvinjoseph@mjoseph-mac Downloads % oc logs ipfailover-788b595477-49k9t
- Loading ip_vs module ...
- Checking if ip_vs module is available ...
ip_vs 172032 0
- Module ip_vs is loaded.
- check for iptables rule for keepalived multicast (224.0.0.18) ...
chroot: cannot change root directory to '/host': No such file or directory
- adding iptables rule to INPUT to access 224.0.0.18.
chroot: cannot change root directory to '/host': No such file or directory
- Generating and writing config to /etc/keepalived/keepalived.conf
- Starting failover services ...
Wed Mar 2 08:24:23 2022: Starting Keepalived v2.1.5 (07/13,2020)
Wed Mar 2 08:24:23 2022: Running on Linux 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Mon Jan 17 07:06:06 EST 2022 (built for Linux 4.18.0)
Wed Mar 2 08:24:23 2022: Command line: '/usr/sbin/keepalived' '-D' '-n' '--log-console'
Wed Mar 2 08:24:23 2022: Opening file '/etc/keepalived/keepalived.conf'.
Wed Mar 2 08:24:23 2022: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Wed Mar 2 08:24:23 2022: Starting VRRP child process, pid=74
Wed Mar 2 08:24:23 2022: Registering Kernel netlink reflector
Wed Mar 2 08:24:23 2022: Registering Kernel netlink command channel
Wed Mar 2 08:24:23 2022: Opening file '/etc/keepalived/keepalived.conf'.
Wed Mar 2 08:24:23 2022: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Wed Mar 2 08:24:23 2022: (/etc/keepalived/keepalived.conf: Line 29) Truncating auth_pass to 8 characters
Wed Mar 2 08:24:23 2022: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Wed Mar 2 08:24:23 2022: (ipfailover_VIP_1) Warning - nopreempt will not work with initial state MASTER - clearing
Wed Mar 2 08:24:23 2022: Assigned address 172.31.249.89 for interface ens192
Wed Mar 2 08:24:23 2022: Assigned address fe80::116c:b36c:b2f8:4154 for interface ens192
Wed Mar 2 08:24:23 2022: Registering gratuitous ARP shared channel
Wed Mar 2 08:24:23 2022: (ipfailover_VIP_1) removing VIPs.
Wed Mar 2 08:24:23 2022: VRRP sockpool: [ifindex( 2), family(IPv4), proto(112), fd(10,11)]
Wed Mar 2 08:24:23 2022: Script `chk_ipfailover` now returning 1
Wed Mar 2 08:24:23 2022: VRRP_Script(chk_ipfailover) failed (exited with status 1)
Wed Mar 2 08:24:23 2022: (ipfailover_VIP_1) Entering FAULT STATE
Wed Mar 2 08:24:33 2022: Script `chk_ipfailover` now returning 0
Wed Mar 2 08:24:33 2022: VRRP_Script(chk_ipfailover) succeeded
Wed Mar 2 08:24:33 2022: (ipfailover_VIP_1) Entering BACKUP STATE
melvinjoseph@mjoseph-mac Downloads % oc set env deployment.apps/ipfailover OPENSHIFT_HA_VIP_GROUPS=255
melvinjoseph@mjoseph-mac Downloads % oc set env deployment.apps/ipfailover OPENSHIFT_HA_VRRP_ID_OFFSET=0
melvinjoseph@mjoseph-mac Downloads % oc set env deployment.apps/ipfailover OPENSHIFT_HA_VIRTUAL_IPS=172.31.248.1-255
melvinjoseph@mjoseph-mac Downloads % oc exec -n openshift-vsphere-infra keepalived-mjoseph-vip2-r2b7g-master-0 -c keepalived -- grep 226 -B3 /etc/keepalived/keepalived.conf
vrrp_instance mjoseph-vip2_INGRESS {
state BACKUP
interface ens192
virtual_router_id 226
melvinjoseph@mjoseph-mac Downloads % oc exec -n openshift-vsphere-infra keepalived-mjoseph-vip2-r2b7g-master-2 -c keepalived -- grep 131 -B3 /etc/keepalived/keepalived.conf
vrrp_instance mjoseph-vip2_API {
state BACKUP
interface ens192
virtual_router_id 131
melvinjoseph@mjoseph-mac Downloads % oc exec ipfailover-8d8b9cb5c-d6l24 -- grep 'ipfailover_VIP_226' -A3 /etc/keepalived/keepalived.conf
vrrp_instance ipfailover_VIP_226 {
interface ens192
state MASTER
virtual_router_id 226
It seems the VRRP ID of the ingress vrrp_instance (226) is not omitted from the ipfailover deployment: both configurations use virtual_router_id 226.
Please let me know if any additional configuration is required in the verification steps.
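The manual grep comparison above can be generalized. The following is a rough sketch, assuming both keepalived.conf files have been copied locally (for example with `oc exec ... -- cat /etc/keepalived/keepalived.conf`); `router_ids` and `conflicting_ids` are hypothetical helper names introduced only for illustration:

```shell
# Hypothetical helper: print the virtual_router_id values found in a
# keepalived.conf passed on stdin, one per line, sorted for use with comm.
router_ids() {
  awk '$1 == "virtual_router_id" {print $2}' | sort -u
}

# Print the IDs that appear in both config files.
# $1: cluster VIP keepalived.conf, $2: ipfailover keepalived.conf
conflicting_ids() {
  cluster_ids=$(mktemp)
  failover_ids=$(mktemp)
  router_ids < "$1" > "$cluster_ids"
  router_ids < "$2" > "$failover_ids"
  # comm -12 keeps only the lines common to both sorted inputs.
  comm -12 "$cluster_ids" "$failover_ids"
  rm -f "$cluster_ids" "$failover_ids"
}

# Example usage (file names are placeholders):
# conflicting_ids cluster-keepalived.conf ipfailover-keepalived.conf
```

An empty result means the two configurations share no VRRP ID; in the setup above, 226 would be reported.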
Excluding vrrp is configured using the HA_EXCLUDED_VRRP_IDS env var (https://github.com/openshift/images/blob/86494446733fc171ee757e8166191e32d5931eb9/ipfailover/keepalived/lib/config-generators.sh#L190) on the ipfailover deployment. Bear in mind that your setup uses the full range of vrrp-ids [1], so skipping (either using existing offset or recently merged exclusion mechanism) will go out-of-range.
[1] https://docs.openshift.com/container-platform/4.9/networking/configuring-ipfailover.html#nw-ipfailover-vrrp-ip-offset_configuring-ipfailover
melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-03-15-060211 True False 9h Cluster version is 4.11.0-0.nightly-2022-03-15-060211
melvinjoseph@mjoseph-mac Downloads % oc create sa ipfailover
serviceaccount/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user privileged -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user hostnetwork -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:hostnetwork added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc apply -f deployment.yaml
deployment.apps/ipfailover-keepalived created
melvinjoseph@mjoseph-mac Downloads % oc exec -n openshift-openstack-infra keepalived-mjoseph-bug35-4s6ms-master-0 -c keepalived -- grep virtual_router_id /etc/keepalived/keepalived.conf
virtual_router_id 10
virtual_router_id 73
melvinjoseph@mjoseph-mac Downloads % oc get po
NAME READY STATUS RESTARTS AGE
ipfailover-keepalived-6d9c998b4b-5429t 1/1 Running 0 20s
ipfailover-keepalived-6d9c998b4b-kkjt4 1/1 Running 0 20s
melvinjoseph@mjoseph-mac Downloads % oc set env deploy/ipfailover-keepalived OPENSHIFT_HA_VIRTUAL_IPS=1.1.1.1-253
deployment.apps/ipfailover-keepalived updated
melvinjoseph@mjoseph-mac Downloads % oc logs ipfailover-keepalived-c8ddc898b-fqmjc |tail
Wed Mar 16 16:12:08 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:09 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:10 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:11 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:12 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:13 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:14 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:15 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:16 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:17 2022: (ipfailover_VIP_10) received an invalid passwd!
melvinjoseph@mjoseph-mac Downloads % oc exec ipfailover-keepalived-c8ddc898b-fqmjc -- grep virtual_router_id /etc/keepalived/keepalived.conf
virtual_router_id 1
virtual_router_id 2
virtual_router_id 3
virtual_router_id 4
virtual_router_id 5
virtual_router_id 6
virtual_router_id 7
virtual_router_id 8
virtual_router_id 9
virtual_router_id 10 <<<
melvinjoseph@mjoseph-mac Downloads % oc set env deploy/ipfailover-keepalived HA_EXCLUDED_VRRP_IDS="10 73"
deployment.apps/ipfailover-keepalived updated
melvinjoseph@mjoseph-mac Downloads % oc get po
NAME READY STATUS RESTARTS AGE
ipfailover-keepalived-d99ff8cfd-qkvl5 1/1 Running 0 3s
ipfailover-keepalived-d99ff8cfd-xfwzh 1/1 Running 0 3s
melvinjoseph@mjoseph-mac Downloads % oc exec ipfailover-keepalived-d99ff8cfd-qkvl5 -- grep virtual_router_id /etc/keepalived/keepalived.conf
virtual_router_id 1
virtual_router_id 2
virtual_router_id 3
virtual_router_id 4
virtual_router_id 5
virtual_router_id 6
virtual_router_id 7
virtual_router_id 8
virtual_router_id 9
virtual_router_id 11 <<< skipped 10
Hence marking as verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:5069
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days
|
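The verification in this report (ID 10 replaced by 11 once HA_EXCLUDED_VRRP_IDS="10 73" was set) and the earlier warning about running out of range can be illustrated with a small sketch. This is not the logic from config-generators.sh; `assign_vrrp_ids` and its three positional parameters are hypothetical, and the only external fact assumed is that VRRP virtual router IDs are limited to 1-255:

```shell
# Sketch only, not the ipfailover image's implementation.
# assign_vrrp_ids COUNT OFFSET "EXCLUDED IDS"
# Prints COUNT VRRP IDs starting at OFFSET+1, skipping excluded IDs.
assign_vrrp_ids() {
  count=$1
  offset=$2
  excluded=$3
  id=$offset
  assigned=0
  while [ "$assigned" -lt "$count" ]; do
    id=$((id + 1))
    # VRRP virtual router IDs are limited to 1-255.
    if [ "$id" -gt 255 ]; then
      echo "error: ran out of VRRP IDs after assigning $assigned of $count" >&2
      return 1
    fi
    case " $excluded " in
      *" $id "*) continue ;;   # excluded: try the next ID
    esac
    echo "$id"
    assigned=$((assigned + 1))
  done
}

# Ten VIPs, no offset, IDs 10 and 73 excluded: prints 1..9 and then 11.
assign_vrrp_ids 10 0 "10 73"
```

With 255 VIPs and any exclusion, the sketch fails with an error, which mirrors the point made above: a deployment that already uses the full 1-255 range has no spare IDs to skip to.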