Bug 1477552

| Summary: | Upgrade incompatibility: keepalived forces vrrp_version 3 if IPv6 is used for keepalived-internal connections | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Robert Scheck <redhat-bugzilla> |
| Component: | keepalived | Assignee: | Ryan O'Hara <rohara> |
| Status: | CLOSED WONTFIX | QA Contact: | Brandon Perkins <bperkins> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.4 | CC: | anthony.mcmahon, cfeist, cluster-maint, jruemker, redhat-bugzilla, robert.scheck, sababu |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1722910 | Environment: | |
| Last Closed: | 2020-11-11 21:57:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

Description
Robert Scheck
2017-08-02 11:35:04 UTC
Cross-filed ticket 01903060 on the Red Hat customer portal.

Please post a scrubbed version of your configuration file. I am unable to tell if you are setting vrrp_version correctly in the config file without seeing it. If you want VRRPv2, there is no need to set this parameter.

It sounds like you are receiving VRRP advertisements from another node that is using a different VRRP version.

I definitely am not seeing this problem. Without any vrrp_version set, run this:

    # keepalived -dln

and look for "VRRP default protocol version". Or if you prefer to grep stderr ...

    # keepalived -dln 2>&1 >/dev/null | grep "VRRP default protocol version"
    VRRP default protocol version = 2
    VRRP default protocol version = 2
    ^C

Now if I go change vrrp_version to 3 in the config file and run that command again ...

    # keepalived -dln 2>&1 >/dev/null | grep "VRRP default protocol version"
    VRRP default protocol version = 3
    VRRP default protocol version = 3

Seems correct to me.

Sorry, here we go:

--- snipp ---

    global_defs {
      router_id tux1
      enable_script_security # Keepalived yells about scripts?!
      # script_user root root # Keepalived yells anyway?! RHBZ#1477563
      vrrp_iptables # Empty to avoid iptables rules
      # vrrp_ipset # Empty to avoid ipsets; does not work, RHBZ#1477572
      vrrp_version 2 # tux2 still believes this is VRRP 3, RHBZ#1477552
    }

    vrrp_sync_group VRRP_GROUP {
      group {
        VRRP_INSTANCE
      }
      notify_master "/etc/conntrackd/primary-backup.sh primary"
      notify_backup "/etc/conntrackd/primary-backup.sh backup"
      notify_fault "/etc/conntrackd/primary-backup.sh fault"
    }

    vrrp_instance VRRP_INSTANCE {
      interface em2
      state BACKUP
      virtual_router_id 51
      priority 150
      track_interface {
        bond0
        bond1
      }
      native_ipv6
      unicast_src_ip 2001:db8::1
      unicast_peer {
        2001:db8::2
      }
      virtual_ipaddress {
        192.0.2.1/30 dev bond1.1000
        fe80::1/64 dev bond1.1000
        2001:db8:0:1000::1/64 dev bond1.1000
        192.0.2.250/29 dev bond0
        2001:db8:0:4003::2/64 dev bond0
      }
      virtual_routes {
        blackhole 192.0.2.0/24
        blackhole 2001:db8::/32
      }
      advert_int 1
      nopreempt
      garp_master_delay 0
      dont_track_primary
    }

--- snapp ---

As per documentation, this should lead to VRRP 2 (because no vrrp_version explicitly set), but tux2 (with keepalived 1.2.x) yells that it receives VRRP 3 from tux1.

    tux1> keepalived -dln 2>&1 >/dev/null | grep "VRRP default protocol version"
    VRRP default protocol version = 2
    VRRP default protocol version = 2
    ^C

But on tux2:

    Aug 2 22:26:51 tux2 Keepalived_vrrp[2264]: invalid version. 3 and expect 2
    Aug 2 22:26:51 tux2 Keepalived_vrrp[2264]: bogus VRRP packet received on em2 !!!
    Aug 2 22:26:51 tux2 Keepalived_vrrp[2264]: VRRP_Instance(VRRP_INSTANCE) Dropping received VRRP packet...

The connection between tux1 and tux2 is a dedicated copper cable on em2 at both sides.

As the STDERR output above says...it should be VRRP2, but then the protocol is still incompatible on the wire between keepalived 1.2.x and 1.3.x?!

(In reply to Robert Scheck from comment #4)
> But on tux2:
> Aug 2 22:26:51 tux2 Keepalived_vrrp[2264]: invalid version. 3 and expect 2
> Aug 2 22:26:51 tux2 Keepalived_vrrp[2264]: bogus VRRP packet received on em2 !!!
> Aug 2 22:26:51 tux2 Keepalived_vrrp[2264]: VRRP_Instance(VRRP_INSTANCE) Dropping received VRRP packet...
>
> The connection between tux1 and tux2 is a dedicated copper cable on em2 at both sides.
>
> As the STDERR output above says...it should be VRRP2, but then the protocol is still incompatible on the wire between keepalived 1.2.x and 1.3.x?!

It is not on my test machines. Works perfectly. Consult the VRRP spec, perhaps?

Putting the configuration from comment #4 into a RHEL 7.3 on node A and the same configuration into a RHEL 7.4 with keepalived on node B, it leads to the errors on node A outlined at the end of comment #4 - which is a regression, or a bug in keepalived IMHO. This especially breaks upgrade scenarios, given node A and node B switch both to master state (and obviously don't want to talk to each other any longer). So even if keepalived 1.3.x speaks VRRP 2 as it says, keepalived 1.2.x doesn't understand it but treats it as VRRP 3. And this is in the end my main point here.

(In reply to Robert Scheck from comment #6)
> Putting the configuration from comment #4 into a RHEL 7.3 on node A and the same configuration into a RHEL 7.4 with keepalived on node B, it leads to the errors on node A outlined at the end of comment #4 - which is a regression, or a bug in keepalived IMHO. This especially breaks upgrade scenarios, given node A and node B switch both to master state (and obviously don't want to talk to each other any longer). So even if keepalived 1.3.x speaks VRRP 2 as it says, keepalived 1.2.x doesn't understand it but treats it as VRRP 3. And this is in the end my main point here.

That did not occur in my testing, nor QA testing. If you believe this is actually a bug, I suggest taking careful notes about 1) what version of keepalived is running on each node and 2) what the configuration is and how it differs. I was able to successfully run the old (1.2.13) version of keepalived with new (1.3.5) versions without any problem.

1) keepalived-1.2.13-9.el7_3.x86_64 vs. keepalived-1.3.5-1.el7.x86_64
2) keepalived.conf from comment #4 on both nodes (without "vrrp_iptables" and "enable_script_security" keywords)

https://github.com/acassen/keepalived/commit/485847cd30503c1ec2370713c2593a2216f19bb1#diff-bb37771a5dd629fb6332c05768e92a95R1606 makes me believe that using IPv6 for unicast_* leads to a VRRP 2 -> 3 upgrade, while vrrp_version is only honored for IPv4 unicast_* somehow. I guess QA testing was IPv4 only, or at least IPv4 for unicast_*, right?

Based on a bunch of tests, QA must have done either IPv4-only or IPv6-only tests, especially if these tests succeeded. A mixed configuration like in my comment #4 was definitely not QA'ed, because these tests would definitely have failed.

keepalived-1.2.13-9.el7_3.x86_64 (RHEL 7.3) allowed this:

--- snipp ---

    vrrp_instance VRRP_INSTANCE {
      # …
      native_ipv6
      unicast_src_ip 2001:db8::1
      unicast_peer {
        2001:db8::2
      }
      virtual_ipaddress {
        192.0.2.1/30 dev bond1.1000
        fe80::1/64 dev bond1.1000
        2001:db8:0:1000::1/64 dev bond1.1000
        192.0.2.250/29 dev bond0
        2001:db8:0:4003::2/64 dev bond0
      }
    }

--- snipp ---

Using keepalived-1.3.5-1.el7.x86_64 (RHEL 7.4), the following happens and applies:

- The VRRP version is set to 3 no matter if or what vrrp_version keyword is configured to, because IPv6 is used for inter-keepalived-communication
- Keepalived 1.2.13-9.el7_3 does only support VRRP 2, not VRRP 3
- Keepalived 1.2.13-9.el7_3 supports IPv6 for inter-keepalived-communication only using the "native_ipv6" keyword
- Keepalived in 1.3.5-1.el7 ignores the "native_ipv6" keyword, but upgrades VRRP silently from 2 to 3 (see first point)

It is NOT possible to make the configuration above work with keepalived 1.2.x AND 1.3.x, because of the difference "native_ipv6" vs. forced VRRP 3.

Any upgrade, when having IPv6 for inter-keepalived-communication, requires a configuration change when upgrading from keepalived 1.2.x to 1.3.x.
It is not possible to run a 1.2.x and 1.3.x mixed keepalived cluster when having IPv6 for inter-keepalived-communication.

Above configuration needs to be rewritten for keepalived-1.3.5-1.el7.x86_64 (RHEL 7.4) like this:

--- snipp ---

    vrrp_sync_group vrrp_group {
      # …
      group {
        vrrp_ipv4
        vrrp_ipv6
      }
    }

    vrrp_instance vrrp_ipv4 {
      # …
      unicast_src_ip 192.0.2.5
      unicast_peer {
        192.0.2.6
      }
      virtual_ipaddress {
        192.0.2.1/30 dev bond1.1000
        192.0.2.250/29 dev bond0
      }
    }

    vrrp_instance vrrp_ipv6 {
      # …
      native_ipv6
      unicast_src_ip 2001:db8::1
      unicast_peer {
        2001:db8::2
      }
      virtual_ipaddress {
        fe80::1/64 dev bond1.1000
        2001:db8:0:1000::1/64 dev bond1.1000
        2001:db8:0:4003::2/64 dev bond0
      }
    }

--- snipp ---

Please update the RHEL 7.4 release notes to reflect these findings, to warn other customers about this keepalived incompatibility when upgrading from RHEL 7.3.

I had a copy & paste mistake in my previous configuration snippet, thus now without a mistake:

--- snipp ---

    vrrp_sync_group vrrp_group {
      # …
      group {
        vrrp_ipv4
        vrrp_ipv6
      }
    }

    vrrp_instance vrrp_ipv4 {
      # …
      unicast_src_ip 192.0.2.5
      unicast_peer {
        192.0.2.6
      }
      virtual_ipaddress {
        192.0.2.1/30 dev bond1.1000
        192.0.2.250/29 dev bond0
      }
    }

    vrrp_instance vrrp_ipv6 {
      # …
      unicast_src_ip 2001:db8::1
      unicast_peer {
        2001:db8::2
      }
      virtual_ipaddress {
        fe80::1/64 dev bond1.1000
        2001:db8:0:1000::1/64 dev bond1.1000
        2001:db8:0:4003::2/64 dev bond0
      }
    }

--- snipp ---

When running above configuration with keepalived-1.3.5-1.el7.x86_64 on tux1, and with keepalived-1.2.13-9.el7_3.x86_64 on tux2, the log on tux2 looks like this (just to prove my previous statement about incompatibility with IPv6):

    Aug 3 17:07:19 tux2 Keepalived_vrrp[81293]: receive an invalid ip number count associated with VRID!
    Aug 3 17:07:19 tux2 Keepalived_vrrp[81293]: bogus VRRP packet received on em2 !!!
    Aug 3 17:07:19 tux2 Keepalived_vrrp[81293]: VRRP_Instance(vrrp_ipv6) ignoring received advertisment...

After upgrading tux2 to keepalived-1.3.5-1.el7.x86_64, everything works.

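The incompatibility reported above only concerns instances that use IPv6 for the communication between the keepalived nodes. As a rough pre-upgrade aid, the following Python sketch lists such instances in a keepalived.conf. It is not part of keepalived: the function name `ipv6_unicast_instances` is hypothetical, and the parser is a deliberate simplification which assumes that `#` and `!` start comments and never occur inside quoted strings.

```python
import ipaddress
import re
from typing import Dict, List


def _is_ipv6(text: str) -> bool:
    """Return True if text is a literal IPv6 address."""
    try:
        return ipaddress.ip_address(text).version == 6
    except ValueError:
        return False


def ipv6_unicast_instances(conf_text: str) -> Dict[str, List[str]]:
    """Map each vrrp_instance name to the settings that make it use IPv6
    between the keepalived nodes (hypothetical helper, simplified parser)."""
    # Strip comments, then tokenize; braces become tokens of their own.
    stripped = [re.split(r"[#!]", line, maxsplit=1)[0]
                for line in conf_text.splitlines()]
    tokens = re.findall(r"[{}]|[^\s{}]+", "\n".join(stripped))

    findings: Dict[str, List[str]] = {}
    instance = None    # name of the vrrp_instance being scanned, if any
    depth = 0          # brace depth relative to the instance block
    in_peers = False   # True while inside that instance's unicast_peer block
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if instance is None:
            # Look for the opening "vrrp_instance NAME {".
            if tok == "vrrp_instance" and i + 2 < len(tokens) and tokens[i + 2] == "{":
                instance, depth, in_peers = tokens[i + 1], 1, False
                i += 3
                continue
        elif tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            in_peers = False
            if depth == 0:
                instance = None
        elif in_peers and depth == 2:
            if _is_ipv6(tok):
                findings.setdefault(instance, []).append("unicast_peer " + tok)
        elif depth == 1 and tok == "native_ipv6":
            findings.setdefault(instance, []).append("native_ipv6")
        elif depth == 1 and tok == "unicast_src_ip" and i + 1 < len(tokens):
            if _is_ipv6(tokens[i + 1]):
                findings.setdefault(instance, []).append("unicast_src_ip " + tokens[i + 1])
            i += 1  # skip the address token
        elif depth == 1 and tok == "unicast_peer":
            in_peers = True
        i += 1
    return findings


SAMPLE = """
vrrp_instance vrrp_ipv4 {
    unicast_src_ip 192.0.2.5
    unicast_peer {
        192.0.2.6
    }
}
vrrp_instance VRRP_INSTANCE {
    interface em2   # dedicated link
    native_ipv6
    unicast_src_ip 2001:db8::1
    unicast_peer {
        2001:db8::2
    }
    virtual_ipaddress {
        192.0.2.1/30 dev bond1.1000
        2001:db8:0:1000::1/64 dev bond1.1000
    }
}
"""

for name, reasons in ipv6_unicast_instances(SAMPLE).items():
    print(name, reasons)
```

Only the addresses used between the nodes are inspected; IPv6 entries in virtual_ipaddress are ignored on purpose, since the report is about the unicast_* settings and "native_ipv6". Any instance the sketch lists would, according to this report, need both nodes on the same keepalived version before it can work again.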
Final conclusion remains: Please update Red Hat documentation to reflect the IPv6 keepalived incompatibility during upgrade from RHEL 7.3 to 7.4.

(In reply to Robert Scheck from comment #9)
> Using keepalived-1.3.5-1.el7.x86_64 (RHEL 7.4), the following happens and applies:
>
> - The VRRP version is set to 3 no matter if or what vrrp_version keyword is configured to, because IPv6 is used for inter-keepalived-communication
> - Keepalived 1.2.13-9.el7_3 does only support VRRP 2, not VRRP 3
> - Keepalived 1.2.13-9.el7_3 supports IPv6 for inter-keepalived-communication only using the "native_ipv6" keyword
> - Keepalived in 1.3.5-1.el7 ignores the "native_ipv6" keyword, but upgrades VRRP silently from 2 to 3 (see first point)

This is mostly correct. It is true that "native_ipv6" will force VRRP version 3. This is due to the fact that VRRPv2 technically did not have IPv6 support -- it was a modification done in keepalived, not part of the RFC. VRRPv3 introduced IPv6 support in the RFC.

> It is NOT possible to make the configuration above work with keepalived 1.2.x AND 1.3.x, because of the difference "native_ipv6" vs. forced VRRP 3.
>
> Any upgrade, when having IPv6 for inter-keepalived-communication, requires a configuration change when upgrading from keepalived 1.2.x to 1.3.x. It is not possible to run a 1.2.x and 1.3.x mixed keepalived cluster when having IPv6 for inter-keepalived-communication.

Right. We do not support using mixed versions of keepalived, nor rolling upgrades. Technically it will work if you are using VRRPv2, but as you noticed the "native_ipv6" keyword is forcing VRRPv3, which will not work alongside older versions of keepalived.

(In reply to Ryan O'Hara from comment #11)
> Right. We do not support using mixed versions of keepalived, nor rolling upgrades. Technically it will work if you are using VRRPv2, but as you noticed the "native_ipv6" keyword is forcing VRRPv3, which will not work alongside older versions of keepalived.

My enterprise expectation is that keepalived should avoid downtimes of any kind, thus provide as many backward and forward compatibilities as possible; thus I treat it as even more important to make these points part of regular RHEL documentation.

I've spent some time investigating and discovered a few things. First, the "VRRP default protocol version" that is shown in the configuration dump is a hard-coded default. It is simply stating what the default VRRP version is, which is 2. It can be overwritten/changed later due to other logic. More on this below. Note that the real indicator of what protocol version is being *used* is shown under the VRRP instance in the configuration dump:

    ------< VRRP Topology >------
    VRRP Instance = VRRP_01
    Using VRRPv3
    Using Native IPv6
    ...

It seems that the 'native_ipv6' option is forcing VRRPv3, which I verified in the code. The vrrp_native_ipv6_handler function (in vrrp/vrrp_parser.c), which is responsible for handling the 'native_ipv6' option, explicitly sets the VRRP version to VRRPv3:

    vrrp->version = VRRP_VERSION_3;

Is there any resolution to this ticket if native_ipv6 still forces VRRPv3, or is it possible to support both VRRPv2 and VRRPv3 as in comment #6?

Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.
From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2 and will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7
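
The investigation earlier in this bug distinguishes the "VRRP default protocol version" line of the configuration dump, which only reports the built-in default, from the per-instance "Using VRRPvN" line, which shows what is actually used. A minimal Python sketch for pulling the per-instance value out of a captured dump follows. The function name `used_vrrp_versions` is hypothetical, and the line layout is assumed from the excerpt quoted in this bug; other keepalived versions may format the dump differently.

```python
import re
from typing import Dict

# Patterns assumed from the dump excerpt quoted in this bug report.
INSTANCE_RE = re.compile(r"VRRP Instance\s*=\s*(\S+)")
VERSION_RE = re.compile(r"Using VRRPv(\d+)")


def used_vrrp_versions(dump_text: str) -> Dict[str, int]:
    """Return {instance name: VRRP version in use} from a configuration dump."""
    versions: Dict[str, int] = {}
    current = None  # instance whose section is currently being read
    for line in dump_text.splitlines():
        match = INSTANCE_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = VERSION_RE.search(line)
        if match and current is not None:
            versions[current] = int(match.group(1))
    return versions


DUMP = """\
VRRP default protocol version = 2
------< VRRP Topology >------
VRRP Instance = VRRP_01
Using VRRPv3
Using Native IPv6
"""

print(used_vrrp_versions(DUMP))
```

The dump text could be captured with the `keepalived -dln 2>&1 >/dev/null` invocation used earlier in this bug. The "default protocol version" line is ignored on purpose, because it does not reflect what an instance ends up using.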