Bug 2077016

Summary: [OSP16.1] HA L3 router/keepalived stability issues (ML2/OVS)
Product: Red Hat OpenStack
Reporter: ggrimaux
Component: openstack-neutron
Assignee: Slawek Kaplonski <skaplons>
Status: CLOSED ERRATA
QA Contact: Fiorella Yanac <fyanac>
Severity: urgent
Priority: urgent
Version: 16.1 (Train)
CC: ahyder, alolivei, averdagu, bcafarel, bdobreli, bperkins, bsawyers, bshephar, ccamposr, chrisw, cluster-maint, dalvarez, dhill, ekuris, eolivare, fleitner, jdolling, jhardee, jschluet, ldenny, ltamagno, mflusche, oblaut, pveiga, ralonsoh, rdiwakar, rohara, scohen, skaplons, sputhenp, sukar, takirby
Target Milestone: z9
Keywords: TestCannotAutomate, Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: x86_64
OS: Linux
Fixed In Version: openstack-neutron-15.2.1-1.20220421073454.40d217c.el8ost
Doc Type: No Doc Update
Clone Of: 1869355
: 2096223
Last Closed: 2022-12-07 20:28:59 UTC
Bug Depends On: 1869355
Bug Blocks: 2096223

Comment 5 ldenny 2022-06-07 04:40:51 UTC
Hi Team, 

For now we should be able to tune the `ha_vrrp_advert_int` param to 15 seconds[1] to avoid this issue.[2]

Checking downstream, I can see the patch has been part of 16.1-trunk-patches for a long time now:
```
[ldenny@redhat-jumpbox neutron]$ git log -S 'vrrp_garp_master_delay'
commit 9d1a942729b7ea03c042bdceb161f3145cfac8c1
Author: Rodolfo Alonso Hernandez <ralonsoh>
Date:   Tue Sep 15 16:04:45 2020 +0000
```

But checking the latest 16.1 neutron-server container, we are shipping `openstack-neutron-15.2.1-1.20220112133420.el8ost.noarch`, which is higher than the Fixed In Version of `openstack-neutron-12.1.1-38.el7ost`, yet the patch is indeed missing:

```
❯ podman create registry.redhat.io/rhosp-rhel8/openstack-neutron-server:16.1.8-10

❯ containerfs=$(podman mount -l)

❯ grep -A8 'def _init_keepalived_manager' $containerfs/usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py
    def _init_keepalived_manager(self, process_monitor):
        self.keepalived_manager = keepalived.KeepalivedManager(
            self.router['id'],
            keepalived.KeepalivedConf(),
            process_monitor,
            conf_path=self.agent_conf.ha_confs_path,
            namespace=self.ha_namespace,
            throttle_restart_value=(
                self.agent_conf.ha_vrrp_advert_int * THROTTLER_MULTIPLIER))
```
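
A rough Python equivalent of the grep above, in case the check needs to be scripted across several mounted images. It only tests whether a source file mentions a given string; `file_mentions` is a hypothetical helper name, and the path in the usage comment is the one from the grep, with `containerfs` standing for whatever `podman mount` returned.

```python
from pathlib import Path


def file_mentions(path, needle):
    """Return True if the text file at `path` contains the string `needle`."""
    # errors="replace" keeps the check from failing on stray non-UTF-8 bytes.
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return needle in text


# Usage sketch (assumes `containerfs` holds the mount point of the image):
#   ha_router = Path(containerfs) / "usr/lib/python3.6/site-packages/neutron/agent/l3/ha_router.py"
#   file_mentions(ha_router, "vrrp_garp_master_delay")
```

A plain substring match says nothing about which file the patch touches, so a negative result on one file is only a hint, not proof that the patch is absent.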

@ralonsoh is this expected? 

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1869355#c46
[2] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html-single/networking_guide/index#tune-keepalived_common-network-tasks

Comment 6 Rodolfo Alonso 2022-06-07 09:02:16 UTC
Hello Lewis:

Please use "git log --pretty=fuller" and check the "CommitDate" rather than the "AuthorDate" (the "CommitDate" is actually the date the commit was merged):

commit 9d1a942729b7ea03c042bdceb161f3145cfac8c1
Author:     Rodolfo Alonso Hernandez <ralonsoh>
AuthorDate: Tue Sep 15 16:04:45 2020 +0000
Commit:     Rodolfo Alonso Hernandez <ralonsoh>
CommitDate: Thu Apr 21 07:22:31 2022 +0000

    Add "vrrp_garp_master_delay" and "vrrp_garp_master_repeat" parameters


In 16.1 (and this is the goal of this BZ), the patch was merged in April 2022. "openstack-neutron-15.2.1-1.20220112133420.el8ost.noarch" carries a January 2022 timestamp (20220112) in its release string, so it can't have it.

Regards.
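
The date comparison in this comment can be sketched in Python as follows. This is an illustration, not a supported tool: it assumes the 14-digit field in the package release string is a UTC snapshot or build timestamp, and the function names are hypothetical. The sample log text is copied from the `git log --pretty=fuller` output quoted above.

```python
import re
from datetime import datetime, timezone

# Excerpt of the `git log --pretty=fuller` output quoted in this comment.
FULLER_SAMPLE = """\
commit 9d1a942729b7ea03c042bdceb161f3145cfac8c1
Author:     Rodolfo Alonso Hernandez <ralonsoh>
AuthorDate: Tue Sep 15 16:04:45 2020 +0000
Commit:     Rodolfo Alonso Hernandez <ralonsoh>
CommitDate: Thu Apr 21 07:22:31 2022 +0000
"""


def commit_date_from_fuller(log_text):
    """Return the first CommitDate in the log text as a timezone-aware datetime."""
    match = re.search(r"^CommitDate:\s*(.+)$", log_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no CommitDate line found")
    # git's default date format, e.g. "Thu Apr 21 07:22:31 2022 +0000"
    return datetime.strptime(match.group(1).strip(), "%a %b %d %H:%M:%S %Y %z")


def build_time_from_nvr(nvr):
    """Return the 14-digit timestamp embedded in the release string.

    Assumption: the field is YYYYMMDDhhmmss in UTC.
    """
    match = re.search(r"\.(\d{14})\.", nvr)
    if match is None:
        raise ValueError("no 14-digit timestamp found in %r" % nvr)
    stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


def build_predates_commit(nvr, log_text):
    """True if the package timestamp is older than the CommitDate."""
    return build_time_from_nvr(nvr) < commit_date_from_fuller(log_text)


shipped = "openstack-neutron-15.2.1-1.20220112133420.el8ost.noarch"
fixed = "openstack-neutron-15.2.1-1.20220421073454.40d217c.el8ost"
print(build_predates_commit(shipped, FULLER_SAMPLE))  # True: January build, April commit
print(build_predates_commit(fixed, FULLER_SAMPLE))    # False: stamped after the commit
```

A package stamped before the CommitDate cannot contain the patch. A later stamp only means it may contain it, so checking the shipped file is still worthwhile.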

Comment 7 ldenny 2022-06-08 04:29:30 UTC
oh great, thanks for the tip Rodolfo, that makes much more sense... sorry for the confusion!

Comment 36 errata-xmlrpc 2022-12-07 20:28:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenStack Platform 16.1.9 (openstack-neutron) security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8870