Bug 2187281

Summary: [openstack] [neutron-l3-agent] Keepalived crashing when being used over IPoIB with ipv6 config
Product: Red Hat Enterprise Linux 9 Reporter: Waleed Mousa <waleedm>
Component: keepalived Assignee: Ryan O'Hara <rohara>
Status: ASSIGNED --- QA Contact: cluster-qe <cluster-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: CentOS Stream CC: bstinson, cluster-maint, jwboyer, rohara
Target Milestone: rc Keywords: Triaged
Target Release: --- Flags: waleedm: needinfo? (rohara)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Waleed Mousa 2023-04-17 11:01:42 UTC
Description of problem:
Keepalived crashes on a system with an IPoIB networking stack whenever an IPv6 entry is used in its config file.
The crash happens while Sending/queueing Unsolicited Neighbour Adverts for the IPv6 address.

The issue was first seen in an OpenStack cloud, where keepalived is used for the HA virtual router, its config is automatically created by OpenStack components, and the IPoIB interfaces are created in a dedicated namespace. According to the logs, the crash in this case is due to "buffer overflow detected".
The issue is also reproduced on a standalone CentOS 8 machine on which the IPoIB interface is configured manually. This time a core dump is created for keepalived.
When the IPv6 address entry is removed from the conf file, keepalived does not crash.
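
For illustration only: this report does not state the root cause of the "buffer overflow detected" message. The sketch below just shows one way such an overflow could arise, assuming (hypothetically) that a Neighbour Advert builder reserves space for an Ethernet-sized link-layer address option and is then handed an IPoIB hardware address. The 6-byte and 20-byte address lengths and the 8-byte option padding are standard; the function names are made up for this example and are not keepalived code.

```python
# Link-layer (hardware) address lengths in bytes.
ETH_ALEN = 6          # Ethernet MAC address
INFINIBAND_ALEN = 20  # IPoIB hardware address

# An ND option starts with a 1-byte type and a 1-byte length field,
# and its total size is a multiple of 8 bytes.
ND_OPT_HEADER_LEN = 2


def lladdr_option_size(hw_addr_len: int) -> int:
    """Bytes needed for a link-layer address option, padded to 8 bytes."""
    raw = ND_OPT_HEADER_LEN + hw_addr_len
    return (raw + 7) // 8 * 8


def fits(reserved_bytes: int, hw_addr_len: int) -> bool:
    """Hypothetical check: does the option fit in the reserved space?"""
    return lladdr_option_size(hw_addr_len) <= reserved_bytes


eth_option = lladdr_option_size(ETH_ALEN)
ib_option = lladdr_option_size(INFINIBAND_ALEN)

print("Ethernet option bytes:", eth_option)
print("IPoIB option bytes:", ib_option)
print("IPoIB fits in Ethernet-sized space:", fits(eth_option, INFINIBAND_ALEN))
```

If space were reserved only for the Ethernet case, copying an IPoIB address into it would write past the end, which is the kind of error a fortified libc reports as "buffer overflow detected". Whether this is what actually happens in keepalived should be checked against the patch referenced in the comments.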

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
The bug is already fixed in keepalived version 2.2.7 (https://redmine.mellanox.com/issues/2971898), but this version has not been picked up in the CentOS Stream 8/9 repos.

Actual results:


Expected results:


Additional info:

Comment 1 Ryan O'Hara 2023-04-17 17:23:37 UTC
Since it seems to be fixed in keepalived 2.2.7, we could potentially rebase in the next RHEL9 release. Has the patch that fixes this problem been identified? Also, this may be difficult to test/verify as I do not have access to IB hardware.

Comment 3 Waleed Mousa 2023-04-19 12:46:06 UTC
(In reply to Ryan O'Hara from comment #2)
> I think this is the patch:
> 
> https://github.com/acassen/keepalived/pull/2101/commits/b5d8aed6032a5c6a37c2880045fd929b80157eaa

(y)

Comment 4 Waleed Mousa 2023-05-07 05:38:12 UTC
(In reply to Ryan O'Hara from comment #1)
> Since it seems to be fixed in keepalived 2.2.7, we could potentially rebase
> in the next RHEL9 release. Has the patch that fixes this problem been
> identified? Also, this may be difficult to test/verify as I do not have
> access to IB hardware.

I've already replaced the RPM in our OpenStack deployment and tested it on our IB hardware, and the issue has been resolved.
And yes, this is the patch that solved the issue: https://github.com/acassen/keepalived/pull/2101/commits/b5d8aed6032a5c6a37c2880045fd929b80157eaa

Comment 5 Waleed Mousa 2023-05-08 05:31:17 UTC
Just to clarify, the latest version of keepalived, 2.2.7-6, doesn't have the fix, so we need to build an RPM with that patch and push it to the next RHEL9 release.
I built the RPM from master with the fix, and it solved the issue.

Comment 6 Waleed Mousa 2023-05-29 07:26:34 UTC
@Ryan O'Hara any updates here?

Comment 7 Ryan O'Hara 2023-06-08 15:42:23 UTC
(In reply to Waleed Mousa from comment #6)
> @Ryan O'Hara any updates here?

Plan is to have this fixed in RHEL 9.3