Bug 2077016 - [OSP16.1] HA L3 router/keepalived stability issues (ML2/OVS)
Summary: [OSP16.1] HA L3 router/keepalived stability issues (ML2/OVS)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z9
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Slawek Kaplonski
QA Contact: Fiorella Yanac
URL:
Whiteboard:
Depends On: 1869355
Blocks: 2096223
Reported: 2022-04-20 13:17 UTC by ggrimaux
Modified: 2022-12-08 15:23 UTC (History)

Fixed In Version: openstack-neutron-15.2.1-1.20220421073454.40d217c.el8ost
Doc Type: No Doc Update
Doc Text:
Clone Of: 1869355
Clones: 2096223
Environment:
Last Closed: 2022-12-07 20:28:59 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-14776 0 None None None 2022-04-20 13:34:37 UTC
Red Hat Product Errata RHSA-2022:8870 0 None None None 2022-12-07 20:29:44 UTC

Comment 5 ldenny 2022-06-07 04:40:51 UTC
Hi Team, 

For now we should be able to tune the `ha_vrrp_advert_int` param to 15 seconds[1] to avoid this issue.[2]

Checking downstream, I can see the patch has been part of 16.1-trunk-patches for a long time now:
```
[ldenny@redhat-jumpbox neutron]$ git log -S 'vrrp_garp_master_delay'
commit 9d1a942729b7ea03c042bdceb161f3145cfac8c1
Author: Rodolfo Alonso Hernandez <ralonsoh>
Date:   Tue Sep 15 16:04:45 2020 +0000
```

However, the latest 16.1 neutron-server container ships `openstack-neutron-15.2.1-1.20220112133420.el8ost.noarch`. That is a higher version than the Fixed In Version of `openstack-neutron-12.1.1-38.el7ost`, yet the patch is missing:

```
❯ podman create registry.redhat.io/rhosp-rhel8/openstack-neutron-server:16.1.8-10

❯ containerfs=$(podman mount -l)

❯ grep -A8 'def _init_keepalived_manager' $containerfs/usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py
    def _init_keepalived_manager(self, process_monitor):
        self.keepalived_manager = keepalived.KeepalivedManager(
            self.router['id'],
            keepalived.KeepalivedConf(),
            process_monitor,
            conf_path=self.agent_conf.ha_confs_path,
            namespace=self.ha_namespace,
            throttle_restart_value=(
                self.agent_conf.ha_vrrp_advert_int * THROTTLER_MULTIPLIER))
```

@ralonsoh is this expected? 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1869355#c46
[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/networking_guide/index#tune-keepalived_common-network-tasks

Comment 6 Rodolfo Alonso 2022-06-07 09:02:16 UTC
Hello Lewis:

Please use "git log --pretty=fuller" to check not the "AuthorDate" but the "CommitDate" (that is actually the date the commit was merged):

commit 9d1a942729b7ea03c042bdceb161f3145cfac8c1
Author:     Rodolfo Alonso Hernandez <ralonsoh>
AuthorDate: Tue Sep 15 16:04:45 2020 +0000
Commit:     Rodolfo Alonso Hernandez <ralonsoh>
CommitDate: Thu Apr 21 07:22:31 2022 +0000

    Add "vrrp_garp_master_delay" and "vrrp_garp_master_repeat" parameters
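
The gap between the two dates can be reproduced in a throwaway repository. This is a minimal sketch, assuming only that `git` is installed; the identity and commit message are made up for illustration, and the two dates are borrowed from the commit above. Git records the author date when the change is first written and the committer date when the commit object is created, which for a cherry-pick or backport is when it is applied to the branch:

```shell
#!/bin/sh
set -eu

# Throwaway repository; nothing here touches a real neutron checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical identity so the commit works without any git configuration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Author date: when the change was written.
# Committer date: when this commit object was created on the branch.
GIT_AUTHOR_DATE="2020-09-15 16:04:45 +0000" \
GIT_COMMITTER_DATE="2022-04-21 07:22:31 +0000" \
git -c commit.gpgsign=false commit -q --allow-empty -m "example backport"

# %ad is the author date, %cd the committer date; --date=short prints YYYY-MM-DD.
author_date=$(git log -1 --date=short --format=%ad)
commit_date=$(git log -1 --date=short --format=%cd)

echo "AuthorDate: $author_date"
echo "CommitDate: $commit_date"
```

A plain `git log` prints only the author date, which is why the search in comment 5 suggested the patch had been present since 2020.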


In 16.1 (and this is the goal of this BZ), the patch was merged in April. "openstack-neutron-15.2.1-1.20220112133420.el8ost.noarch" can't have it.
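
The same check can be sketched against the package name itself. This assumes the 14-digit field in the release string is a build timestamp in YYYYMMDDHHMMSS form; that is an assumption, although it is consistent with the Fixed In Version of this bug (20220421073454) falling just after the CommitDate. The variable names are illustrative:

```shell
#!/bin/sh
set -eu

# Package shipped in the 16.1.8-10 container, as quoted in comment 5.
shipped="openstack-neutron-15.2.1-1.20220112133420.el8ost.noarch"
# CommitDate of the 16.1 backport, rewritten as YYYYMMDDHHMMSS (UTC).
commit_stamp="20220421072231"

# Assumption: the first 14-digit run in the name is the build timestamp.
build_stamp=$(printf '%s\n' "$shipped" | grep -oE '[0-9]{14}' | head -n 1)

# Both values are equal-width digit strings, so numeric order is time order.
if [ "$build_stamp" -lt "$commit_stamp" ]; then
    verdict="built before the backport landed"
else
    verdict="built after the backport landed"
fi
echo "$build_stamp: $verdict"
```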

Regards.

Comment 7 ldenny 2022-06-08 04:29:30 UTC
oh great, thanks for the tip Rodolfo, that makes much more sense... sorry for the confusion!

Comment 36 errata-xmlrpc 2022-12-07 20:28:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenStack Platform 16.1.9 (openstack-neutron) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8870

