Bug 2018188
Summary: | VRRP ID conflict between keepalived-ipfailover and cluster VIPs | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Bram Verschueren <bverschu> |
Component: | Networking | Assignee: | Ryan Fredette <rfredette> |
Networking sub component: | router | QA Contact: | Melvin Joseph <mjoseph> |
Status: | CLOSED ERRATA | | |
Severity: | medium | | |
Priority: | medium | CC: | achernet, aos-bugs, bperkins, dcbw, dmoessne, hongli, mmasters, pmannidi, rfredette |
Version: | 4.10 | | |
Target Release: | 4.11.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Last Closed: | 2022-08-10 10:39:06 UTC | Type: | Bug |
Description
Bram Verschueren
2021-10-28 13:36:51 UTC
Moving to Routing component as the fix seems to be in the keepalived image which is owned by that team.

Setting blocker- because this is requesting new functionality. We'll discuss with PM how we want to handle this request.

melvinjoseph@mjoseph-mac openshift-tests-private % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.ci-2022-03-02-053737 True False 169m Cluster version is 4.10.0-0.ci-2022-03-02-053737

melvinjoseph@mjoseph-mac Downloads % oc get po -n openshift-vsphere-infra -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-mjoseph-vip2-r2b7g-master-0 2/2 Running 0 5h18m 172.31.249.59 mjoseph-vip2-r2b7g-master-0 <none> <none>
coredns-mjoseph-vip2-r2b7g-master-1 2/2 Running 0 5h18m 172.31.249.167 mjoseph-vip2-r2b7g-master-1 <none> <none>
coredns-mjoseph-vip2-r2b7g-master-2 2/2 Running 0 5h18m 172.31.249.248 mjoseph-vip2-r2b7g-master-2 <none> <none>
coredns-mjoseph-vip2-r2b7g-worker-7z985 2/2 Running 0 5h8m 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
coredns-mjoseph-vip2-r2b7g-worker-hmxrm 2/2 Running 0 5h8m 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>
haproxy-mjoseph-vip2-r2b7g-master-0 2/2 Running 0 5h18m 172.31.249.59 mjoseph-vip2-r2b7g-master-0 <none> <none>
haproxy-mjoseph-vip2-r2b7g-master-1 2/2 Running 0 5h19m 172.31.249.167 mjoseph-vip2-r2b7g-master-1 <none> <none>
haproxy-mjoseph-vip2-r2b7g-master-2 2/2 Running 0 5h17m 172.31.249.248 mjoseph-vip2-r2b7g-master-2 <none> <none>
keepalived-mjoseph-vip2-r2b7g-master-0 2/2 Running 0 5h18m 172.31.249.59 mjoseph-vip2-r2b7g-master-0 <none> <none>
keepalived-mjoseph-vip2-r2b7g-master-1 2/2 Running 0 5h19m 172.31.249.167 mjoseph-vip2-r2b7g-master-1 <none> <none>
keepalived-mjoseph-vip2-r2b7g-master-2 2/2 Running 0 5h18m 172.31.249.248 mjoseph-vip2-r2b7g-master-2 <none> <none>
keepalived-mjoseph-vip2-r2b7g-worker-7z985 2/2 Running 0 5h7m 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
keepalived-mjoseph-vip2-r2b7g-worker-hmxrm 2/2 Running 0 5h8m 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>

melvinjoseph@mjoseph-mac Downloads % oc debug node/mjoseph-vip2-r2b7g-master-0 -- tcpdump -i any vrrp
Starting pod/mjoseph-vip2-r2b7g-master-0-debug ...
To use host binaries, run `chroot /host`
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
08:19:19.959553 IP ip-172-31-249-221.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 219, prio 90, authtype simple, intvl 1s, length 20
08:19:19.993250 IP ip-172-31-249-8.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 165, prio 90, authtype simple, intvl 1s, length 20
08:19:20.096232 IP ip-172-31-249-230.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 180, prio 65, authtype simple, intvl 1s, length 20
08:19:20.192349 IP ip-172-31-249-184.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 28, prio 65, authtype simple, intvl 1s, length 20
08:19:20.320090 IP mjoseph-vip2-r2b7g-master-2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 131, prio 65, authtype simple, intvl 1s, length 20
08:19:20.415158 IP ip-172-31-249-195.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 136, prio 65, authtype simple, intvl 1s, length 20
08:19:20.415372 IP mjoseph-vip2-r2b7g-worker-7z985 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 226, prio 90, authtype simple, intvl 1s, length 20
08:19:20.501468 IP ip-172-31-249-252.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 118, prio 65, authtype simple, intvl 1s, length 20
08:19:20.633687 IP ip-172-31-249-117.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 71, prio 90, authtype simple, intvl 1s, length 20
08:19:20.634236 IP ip-172-31-249-216.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 99, prio 90, authtype simple, intvl 1s, length 20
08:19:20.959732 IP ip-172-31-249-221.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 219, prio 90, authtype simple, intvl 1s, length 20
08:19:20.993428 IP ip-172-31-249-8.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 165, prio 90, authtype simple, intvl 1s, length 20
08:19:21.096310 IP ip-172-31-249-230.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 180, prio 65, authtype simple, intvl 1s, length 20
08:19:21.192391 IP ip-172-31-249-184.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 28, prio 65, authtype simple, intvl 1s, length 20
08:19:21.320173 IP mjoseph-vip2-r2b7g-master-2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 131, prio 65, authtype simple, intvl 1s, length 20
08:19:21.415256 IP ip-172-31-249-195.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 136, prio 65, authtype simple, intvl 1s, length 20
08:19:21.415462 IP mjoseph-vip2-r2b7g-worker-7z985 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 226, prio 90, authtype simple, intvl 1s, length 20
08:19:21.501519 IP ip-172-31-249-252.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 118, prio 65, authtype simple, intvl 1s, length 20
08:19:21.633784 IP ip-172-31-249-117.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 71, prio 90, authtype simple, intvl 1s, length 20
08:19:21.634341 IP ip-172-31-249-216.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 99, prio 90, authtype simple, intvl 1s, length 20
08:19:21.959775 IP ip-172-31-249-221.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 219, prio 90, authtype simple, intvl 1s, length 20
08:19:21.993622 IP ip-172-31-249-8.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 165, prio 90, authtype simple, intvl 1s, length 20
08:19:22.096440 IP ip-172-31-249-230.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 180, prio 65, authtype simple, intvl 1s, length 20
08:19:22.192473 IP ip-172-31-249-184.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 28, prio 65, authtype simple, intvl 1s, length 20
08:19:22.320351 IP mjoseph-vip2-r2b7g-master-2 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 131, prio 65, authtype simple, intvl 1s, length 20
08:19:22.415327 IP ip-172-31-249-195.us-west-2.compute.internal > vrrp.mcast.net: VRRPv2, Advertisement, vrid 136, prio 65, authtype simple, intvl 1s, length 20

Creating ipfailover pods

melvinjoseph@mjoseph-mac Downloads % oc create sa ipfailover
serviceaccount/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user priviledged -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:priviledged added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user hostnetwork -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:hostnetwork added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc create -f work_docs/deploy-ipfailover.yaml
deployment.apps/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc create -f work_docs/web-server-1.yaml
replicationcontroller/web-server-rc created

melvinjoseph@mjoseph-mac Downloads % oc get all -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/ipfailover-788b595477-49k9t 1/1 Running 0 26s 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>
pod/ipfailover-788b595477-7m2vv 1/1 Running 0 26s 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
pod/web-server-rc-gjzgs 1/1 Running 0 17s 172.31.249.162 mjoseph-vip2-r2b7g-worker-7z985 <none> <none>
pod/web-server-rc-htks8 1/1 Running 0 17s 172.31.249.89 mjoseph-vip2-r2b7g-worker-hmxrm <none> <none>
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicationcontroller/web-server-rc 2 2 2 18s nginx quay.io/openshifttest/nginx-alpine@sha256:5d3f3372288b8a93fc9fc7747925df2328c24db41e4b4226126c3af293c5ad88 name=web-server-rc

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 172.30.0.1 <none> 443/TCP 5h27m <none>
service/openshift ExternalName <none> kubernetes.default.svc.cluster.local <none> 5h23m <none>

NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/ipfailover 2/2 2 2 27s ipfailover-keepalived quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:644bf2d63cc24035ec82a39e0b14e6d61e3ca4ba39181b409590132f59bfc2cf ipfailover=ipfailover

NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/ipfailover-788b595477 2 2 2 28s ipfailover-keepalived quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:644bf2d63cc24035ec82a39e0b14e6d61e3ca4ba39181b409590132f59bfc2cf ipfailover=ipfailover,pod-template-hash=788b595477

melvinjoseph@mjoseph-mac Downloads % oc logs ipfailover-788b595477-49k9t
- Loading ip_vs module ...
- Checking if ip_vs module is available ...
ip_vs 172032 0
- Module ip_vs is loaded.
- check for iptables rule for keepalived multicast (224.0.0.18) ...
chroot: cannot change root directory to '/host': No such file or directory
- adding iptables rule to INPUT to access 224.0.0.18.
chroot: cannot change root directory to '/host': No such file or directory
- Generating and writing config to /etc/keepalived/keepalived.conf
- Starting failover services ...
Wed Mar 2 08:24:23 2022: Starting Keepalived v2.1.5 (07/13,2020)
Wed Mar 2 08:24:23 2022: Running on Linux 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Mon Jan 17 07:06:06 EST 2022 (built for Linux 4.18.0)
Wed Mar 2 08:24:23 2022: Command line: '/usr/sbin/keepalived' '-D' '-n' '--log-console'
Wed Mar 2 08:24:23 2022: Opening file '/etc/keepalived/keepalived.conf'.
Wed Mar 2 08:24:23 2022: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Wed Mar 2 08:24:23 2022: Starting VRRP child process, pid=74
Wed Mar 2 08:24:23 2022: Registering Kernel netlink reflector
Wed Mar 2 08:24:23 2022: Registering Kernel netlink command channel
Wed Mar 2 08:24:23 2022: Opening file '/etc/keepalived/keepalived.conf'.
Wed Mar 2 08:24:23 2022: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Wed Mar 2 08:24:23 2022: (/etc/keepalived/keepalived.conf: Line 29) Truncating auth_pass to 8 characters
Wed Mar 2 08:24:23 2022: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
Wed Mar 2 08:24:23 2022: (ipfailover_VIP_1) Warning - nopreempt will not work with initial state MASTER - clearing
Wed Mar 2 08:24:23 2022: Assigned address 172.31.249.89 for interface ens192
Wed Mar 2 08:24:23 2022: Assigned address fe80::116c:b36c:b2f8:4154 for interface ens192
Wed Mar 2 08:24:23 2022: Registering gratuitous ARP shared channel
Wed Mar 2 08:24:23 2022: (ipfailover_VIP_1) removing VIPs.
Wed Mar 2 08:24:23 2022: VRRP sockpool: [ifindex( 2), family(IPv4), proto(112), fd(10,11)]
Wed Mar 2 08:24:23 2022: Script `chk_ipfailover` now returning 1
Wed Mar 2 08:24:23 2022: VRRP_Script(chk_ipfailover) failed (exited with status 1)
Wed Mar 2 08:24:23 2022: (ipfailover_VIP_1) Entering FAULT STATE
Wed Mar 2 08:24:33 2022: Script `chk_ipfailover` now returning 0
Wed Mar 2 08:24:33 2022: VRRP_Script(chk_ipfailover) succeeded
Wed Mar 2 08:24:33 2022: (ipfailover_VIP_1) Entering BACKUP STATE

melvinjoseph@mjoseph-mac Downloads % oc set env deployment.apps/ipfailover OPENSHIFT_HA_VIP_GROUPS=255
melvinjoseph@mjoseph-mac Downloads % oc set env deployment.apps/ipfailover OPENSHIFT_HA_VRRP_ID_OFFSET=0
melvinjoseph@mjoseph-mac Downloads % oc set env deployment.apps/ipfailover OPENSHIFT_HA_VIRTUAL_IPS=172.31.248.1-255

melvinjoseph@mjoseph-mac Downloads % oc exec -n openshift-vsphere-infra keepalived-mjoseph-vip2-r2b7g-master-0 -c keepalived -- grep 226 -B3 /etc/keepalived/keepalived.conf
vrrp_instance mjoseph-vip2_INGRESS {
    state BACKUP
    interface ens192
    virtual_router_id 226

melvinjoseph@mjoseph-mac Downloads % oc exec -n openshift-vsphere-infra keepalived-mjoseph-vip2-r2b7g-master-2 -c keepalived -- grep 131 -B3 /etc/keepalived/keepalived.conf
vrrp_instance mjoseph-vip2_API {
    state BACKUP
    interface ens192
    virtual_router_id 131

melvinjoseph@mjoseph-mac Downloads % oc exec ipfailover-8d8b9cb5c-d6l24 -- grep 'ipfailover_VIP_226' -A3 /etc/keepalived/keepalived.conf
vrrp_instance ipfailover_VIP_226 {
    interface ens192
    state MASTER
    virtual_router_id 226

It seems the ingress vrrp_instance is not omitted from the ipfailover deployments.
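The overlap seen above, where the cluster ingress instance and `ipfailover_VIP_226` both use `virtual_router_id 226`, could also be found mechanically by intersecting the IDs of the two configurations. The following is only a minimal sketch: it assumes both `keepalived.conf` files have already been copied to local files, and the function names `vrrp_ids` and `vrrp_conflicts` are hypothetical helpers, not part of any OpenShift or keepalived tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the unique virtual_router_id values found in
# the keepalived.conf file given as $1, one ID per line.
vrrp_ids() {
  awk '$1 == "virtual_router_id" { print $2 }' "$1" | sort -u
}

# Hypothetical helper: print every ID used in the second file ($2) that is
# also used in the first file ($1). Prints nothing when there is no overlap.
vrrp_conflicts() {
  cluster_ids=$(vrrp_ids "$1")
  # Without any ID in the first file there is nothing to conflict with.
  [ -n "$cluster_ids" ] || return 0
  # -F: fixed strings, -x: whole-line match; the newline-separated list in
  # $cluster_ids is treated as one pattern per line.
  vrrp_ids "$2" | grep -Fx -e "$cluster_ids"
}
```

A call such as `vrrp_conflicts cluster-keepalived.conf ipfailover-keepalived.conf` (both file names are placeholders) lists the IDs that would need to be excluded or offset on the ipfailover side.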
Please correct me if any further configuration is required in the verification steps.

Excluding VRRP IDs is configured using the HA_EXCLUDED_VRRP_IDS env var (https://github.com/openshift/images/blob/86494446733fc171ee757e8166191e32d5931eb9/ipfailover/keepalived/lib/config-generators.sh#L190) on the ipfailover deployment. Bear in mind that your setup uses the full range of VRRP IDs [1], so skipping IDs (using either the existing offset or the recently merged exclusion mechanism) will go out of range.

[1] https://docs.openshift.com/container-platform/4.9/networking/configuring-ipfailover.html#nw-ipfailover-vrrp-ip-offset_configuring-ipfailover

melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-03-15-060211 True False 9h Cluster version is 4.11.0-0.nightly-2022-03-15-060211

melvinjoseph@mjoseph-mac Downloads % oc create sa ipfailover
serviceaccount/ipfailover created
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user privileged -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:privileged added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc adm policy add-scc-to-user hostnetwork -z ipfailover
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:hostnetwork added: "ipfailover"
melvinjoseph@mjoseph-mac Downloads % oc apply -f deployment.yaml
deployment.apps/ipfailover-keepalived created

melvinjoseph@mjoseph-mac Downloads % oc exec -n openshift-openstack-infra keepalived-mjoseph-bug35-4s6ms-master-0 -c keepalived -- grep virtual_router_id /etc/keepalived/keepalived.conf
virtual_router_id 10
virtual_router_id 73

melvinjoseph@mjoseph-mac Downloads % oc get po
NAME READY STATUS RESTARTS AGE
ipfailover-keepalived-6d9c998b4b-5429t 1/1 Running 0 20s
ipfailover-keepalived-6d9c998b4b-kkjt4 1/1 Running 0 20s

melvinjoseph@mjoseph-mac Downloads % oc set env deploy/ipfailover-keepalived OPENSHIFT_HA_VIRTUAL_IPS=1.1.1.1-253
deployment.apps/ipfailover-keepalived updated

melvinjoseph@mjoseph-mac Downloads % oc logs ipfailover-keepalived-c8ddc898b-fqmjc |tail
Wed Mar 16 16:12:08 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:09 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:10 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:11 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:12 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:13 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:14 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:15 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:16 2022: (ipfailover_VIP_10) received an invalid passwd!
Wed Mar 16 16:12:17 2022: (ipfailover_VIP_10) received an invalid passwd!

melvinjoseph@mjoseph-mac Downloads % oc exec ipfailover-keepalived-c8ddc898b-fqmjc -- grep virtual_router_id /etc/keepalived/keepalived.conf
virtual_router_id 1
virtual_router_id 2
virtual_router_id 3
virtual_router_id 4
virtual_router_id 5
virtual_router_id 6
virtual_router_id 7
virtual_router_id 8
virtual_router_id 9
virtual_router_id 10 <<<

melvinjoseph@mjoseph-mac Downloads % oc set env deploy/ipfailover-keepalived HA_EXCLUDED_VRRP_IDS="10 73"
deployment.apps/ipfailover-keepalived updated

melvinjoseph@mjoseph-mac Downloads % oc get po
NAME READY STATUS RESTARTS AGE
ipfailover-keepalived-d99ff8cfd-qkvl5 1/1 Running 0 3s
ipfailover-keepalived-d99ff8cfd-xfwzh 1/1 Running 0 3s

melvinjoseph@mjoseph-mac Downloads % oc exec ipfailover-keepalived-d99ff8cfd-qkvl5 -- grep virtual_router_id /etc/keepalived/keepalived.conf
virtual_router_id 1
virtual_router_id 2
virtual_router_id 3
virtual_router_id 4
virtual_router_id 5
virtual_router_id 6
virtual_router_id 7
virtual_router_id 8
virtual_router_id 9
virtual_router_id 11 <<< skipped 10

Hence marking as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days.
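The behaviour verified in this thread (ID 10 skipped once `HA_EXCLUDED_VRRP_IDS="10 73"` is set, and the earlier warning that a setup using the full ID range goes out of range when IDs are skipped) can be illustrated with a small sketch. This is not the code from `config-generators.sh`; the function name `assign_vrrp_ids` is hypothetical, and it assumes that IDs are handed out sequentially starting at offset + 1 and that 255 is the largest valid VRRP ID.

```shell
#!/bin/sh
# Hypothetical sketch, not the real generator: print one VRRP ID per VIP.
#   $1  number of VIPs that need an ID
#   $2  starting offset (plays the role of OPENSHIFT_HA_VRRP_ID_OFFSET)
#   $3  space-separated IDs to skip (plays the role of HA_EXCLUDED_VRRP_IDS)
assign_vrrp_ids() {
  count=$1
  id=$2
  excluded=$3
  assigned=0
  while [ "$assigned" -lt "$count" ]; do
    id=$((id + 1))
    # Assumption: 255 is the upper bound of the valid VRRP ID range.
    if [ "$id" -gt 255 ]; then
      echo "error: ran out of VRRP IDs (valid range is 1-255)" >&2
      return 1
    fi
    # Skip any ID that is reserved for another keepalived instance.
    case " $excluded " in
      *" $id "*) continue ;;
    esac
    echo "$id"
    assigned=$((assigned + 1))
  done
}
```

Under these assumptions, `assign_vrrp_ids 10 0 "10 73"` yields the IDs 1 through 9 followed by 11, matching the `grep virtual_router_id` output above. Requesting 253 IDs with the same two exclusions still fits, while requesting 255 fails, which is the out-of-range situation the first verification attempt would have run into.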