Bug 1842706
| Summary: | keepalived vrrp address lost after nmcli modification | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Justin <jherron> |
| Component: | keepalived | Assignee: | Ryan O'Hara <rohara> |
| Status: | CLOSED WONTFIX | QA Contact: | Brandon Perkins <bperkins> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.7 | CC: | cluster-maint, cutaylor |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-11 21:42:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7. From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed.

From the RHEL life cycle page: https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes: https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7
Description of problem:

Case 02644340: the customer reports that when any modification is made to the NetworkManager-managed profile of the interface that keepalived is listening on, the VRRP address assigned to that interface is lost. Failover does not occur, and the VRRP address becomes unreachable. I was able to fully reproduce this problem with two VMs on a single subnet.

[Diagram: node01 and node02 on a single subnet, sharing the VIP 10.170.1.50]

VIP address:
-----------------------------------------

Version-Release number of selected component (if applicable):
keepalived-1.3.5-16.el7.x86_64

How reproducible:
----------------------------------------------------

Steps to Reproduce:
1. yum install keepalived -y
2. Set up an instance of keepalived with MASTER|BACKUP
3. Edit the interface via nmcli, then reset or reapply the interface.

Actual results:

/*Address is assigned to ens192*/
----------------------------------------------------
~]# ip addr list | grep 131
inet 131.232.67.212/24 brd 131.232.67.255 scope global noprefixroute ens192
inet 131.232.67.214/24 scope global secondary ens192
----------------------------------------------------

/*Modify the interface that is listed in the keepalived.conf instance, i.e. the directive `interface ens192`*/
----------------------------------------------------
~]# nmcli con mod ens192 ipv4.dns "131.232.3.99"
~]# ip addr list | grep 131 ; date
inet 131.232.67.212/24 brd 131.232.67.255 scope global noprefixroute ens192
inet 131.232.67.214/24 scope global secondary ens192
----------------------------------------------------

Now run device reapply to update the settings. Notice below that the VIP address is no longer assigned; however, keepalived doesn't appear to be aware of this, as the link is still up.
----------------------------------------------------
~]# nmcli device reapply ens192
Connection successfully reapplied to device 'ens192'.
~]# ip addr list | grep 131
inet 131.232.67.212/24 brd 131.232.67.255 scope global noprefixroute ens192
----------------------------------------------------

Now restart keepalived and the VIP gets assigned again. However, failover did not occur and the IP address was not reassigned in the meantime, so the VIP was unreachable.
----------------------------------------------------
~]# systemctl restart keepalived
~]# ip addr list | grep 131 ; date
inet 131.232.67.212/24 brd 131.232.67.255 scope global noprefixroute ens192
~]# ip addr list | grep 131 ; date
inet 131.232.67.212/24 brd 131.232.67.255 scope global noprefixroute ens192
inet 131.232.67.214/24 scope global secondary ens192

Expected results:
The expected result is either for failover to be initiated or for NetworkManager to not *remove* the VRRP address from the interface.

Additional info:
From what I have gathered, keepalived only monitors the state of the device assigned to an instance in keepalived.conf, while NetworkManager only knows about the IP addresses that are defined in the interface profiles. Once the commands `nmcli con up <int>; nmcli dev reapply <int>` are run, NetworkManager deletes the VIP address and keepalived is completely unaware.
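The symptom described above (VIP silently removed while the link stays up) can be checked from a script rather than by reading `ip addr` output by eye. The following is a minimal sketch, not part of keepalived or NetworkManager: `has_vip` is a hypothetical helper name, and it assumes the one-line output format of `ip -o -4 addr show`, where the third field is `inet` and the fourth is ADDRESS/PREFIX.

```bash
#!/bin/bash
# has_vip VIP
# Reads `ip -o -4 addr show` output on stdin and succeeds (exit 0) only if
# VIP is among the configured IPv4 addresses. Hypothetical helper name.
has_vip() {
    local vip="$1"
    # Field 4 looks like 131.232.67.214/24; compare only the address part.
    awk -v vip="$vip" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit !found }
    '
}
```

Example use on the MASTER node, with the interface and VIP from the reproduction above: `ip -o -4 addr show dev ens192 | has_vip 131.232.67.214 || echo "VIP missing on ens192"`. On a BACKUP node the VIP is expected to be absent, so the check is only meaningful where keepalived believes it is MASTER.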
ip addr
inet 10.170.1.10/24 brd 10.170.1.255 scope global noprefixroute eth0
inet 10.170.1.50/32 scope global eth0
inet 10.170.1.40/24 scope global secondary eth0

systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2020-06-01 17:22:03 EDT; 10min ago
Process: 1358 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Jun 01 17:27:17 node01.example.com Keepalived_vrrp[1361]: Sending gratuitous ARP on eth0 for 10.170.1.40
Jun 01 17:27:17 node01.example.com Keepalived_vrrp[1361]: Sending gratuitous ARP on eth0 for 10.170.1.40

ip addr
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 0c:1a:4e:44:ba:00 brd ff:ff:ff:ff:ff:ff
inet 10.170.1.10/24 brd 10.170.1.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::9e80:3cbf:b409:d551/64 scope link noprefixroute
valid_lft forever preferred_lft forever

/*Workaround*/
NetworkManager doesn't mark the interface as down when it loses an IP address. So the question is: how do we remedy this and actually cause the failover to occur? I used the NetworkManager dispatcher environment variables to have the link actually taken down and then brought back up.

/etc/NetworkManager/dispatcher.d/pre-down.d/
```
#!/bin/bash
IFACE=$DEVICE_IP_IFACE        # interface name exported by NetworkManager
ACTION=$NM_DISPATCHER_ACTION  # dispatcher action, e.g. "up" or "down"
ADVRT=1                       # seconds; match advert_int in keepalived.conf
case "$ACTION" in
  down)
    ip link set dev "$IFACE" down
    ;;
  up)
    # Hold the link down for one advertisement interval, then restore it.
    ip link set dev "$IFACE" down
    sleep "$ADVRT"
    ip link set dev "$IFACE" up
    logger -p user.info "Interface $IFACE has been reset!"
    ;;
esac
```

I set the ADVRT variable above to 1, as I observed that it needs to equal the integer value of advert_int defined in the keepalived.conf file. This gives the keepalived daemon time, equal to advert_int, to detect that the link went down and that it is not able to send its VRRP hello packets out. This is a rather crude way of doing it, but it does work in my testing and implementation. It does not work with nmcli device reapply, as from what I have found this is a different type of call that doesn't invoke the scripts in dispatcher.d.

/*RESEARCH*/
After digging a bit, I found the upstream commit and changelog entry where this was implemented, but it was implemented in version 2.0.0. It might be something to backport into the Red Hat Software Collections keepalived 1.5; from 1.3 it is a lot of commits and is probably not worth the effort.

Keepalived ChangeLog Release 2.0.0 - https://www.keepalived.org/changelog.html
* Monitor VIP/eVIP deletion and transition to backup if a VIP/eVIP is removed unless it is configured with the no-track option.

https://github.com/acassen/keepalived/commits/v2.0.0
> Add no-track option for VIPs/eVIPs
https://github.com/acassen/keepalived/issues/836
> Add tracking of VIPs/eVIPs on interfaces other the vrrp instance i/f
https://github.com/acassen/keepalived/commit/979727e5db1f0307149b2932267ed214ecd0850d
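Since the changelog entry places VIP deletion tracking in release 2.0.0, a host can be checked against that threshold before deciding whether a dispatcher workaround is still needed. This is a minimal sketch under stated assumptions: `version_ge` and `tracks_vip_deletion` are hypothetical helper names, the comparison relies on `sort -V`, and a distribution package may carry backports that a plain version comparison cannot see.

```bash
#!/bin/bash
# version_ge A B
# Succeeds when dotted version A is greater than or equal to B.
# Uses version sort (sort -V); the smaller of the two sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# tracks_vip_deletion VERSION
# Succeeds when VERSION is at least 2.0.0, the upstream release whose
# changelog lists "Monitor VIP/eVIP deletion". Hypothetical helper name.
tracks_vip_deletion() {
    version_ge "$1" 2.0.0
}
```

Example use on an RPM-based system: `tracks_vip_deletion "$(rpm -q --qf '%{VERSION}' keepalived)" || echo "keepalived predates VIP deletion tracking"`. For the keepalived-1.3.5-16.el7 package in this report, the version field is 1.3.5, so the check fails and the message is printed.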