Bug 1574914

Summary: heartbeat: IPsrcaddr: Routes are not replaced when containing different metric and tos values
Product: Red Hat Enterprise Linux 7 Reporter: freaker56
Component: resource-agents Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED WONTFIX QA Contact: cluster-qe <cluster-qe>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: agk, cluster-maint, fdinitto, mlisik
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-02-15 07:38:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description freaker56 2018-05-04 10:55:32 UTC
Description of problem:
The IPsrcaddr resource agent replaces IP routes to change the source IP address used for routing.

The resource is meant to replace the existing route. However, if the route it should replace contains metric or tos values that differ from the defaults, it adds a second route instead of replacing the existing one.
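
The following is a minimal sketch of why this happens, assuming that the kernel identifies an IPv4 route by its prefix, tos, and metric (this matches the output under "Actual results" but is stated here as an assumption). route_key is a hypothetical helper, not part of the IPsrcaddr agent; it prints the identifying fields of one line of 'ip route show' output, treating a missing tos or metric as 0.

```shell
#!/bin/sh
# Hypothetical helper (not part of IPsrcaddr): print the fields that are
# assumed to identify a route, given one line of `ip route show` output.
route_key() {
    # Split the route line into positional parameters (local to this function).
    set -- $1
    prefix=$1
    shift
    tos=0
    metric=0
    while [ $# -gt 0 ]; do
        case $1 in
            tos)    [ $# -ge 2 ] && { tos=$2; shift; } ;;
            metric) [ $# -ge 2 ] && { metric=$2; shift; } ;;
        esac
        shift
    done
    printf '%s tos=%s metric=%s\n' "$prefix" "$tos" "$metric"
}

# The route created at boot and the route written by the agent:
route_key '192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100'
route_key '192.168.0.0/20 dev eth0  scope link  src 192.168.0.253'
```

The two keys differ in the metric field, so under the stated assumption 'ip route replace' finds no matching route and adds a new one.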

This is the issue the author of https://bugzilla.redhat.com/show_bug.cgi?id=1400172 is facing.
The author has IP routes with metrics, and because of this issue the resource adds a route instead of replacing the existing one.

In that bug report a fix is suggested and implemented:
https://github.com/ClusterLabs/resource-agents/pull/904/commits

However, this causes other issues. When the system is booted, the IP route has the 'proto kernel' attribute. Once the resource has been started and stopped, the route no longer has this attribute and subsequent startups fail.

Version-Release number of selected component (if applicable):
resource-agents-3.9.5-105.el7_4.11.x86_64

How reproducible:
Always.

Steps to Reproduce:
(Copied from bug 1400172)
Run:

export OCF_ROOT=/usr/lib/ocf
export PATH="/usr/sbin:/sbin:$PATH"
export OCF_RESKEY_ipaddress=192.168.0.253
export OCF_RESKEY_cidr_netmask=20

export OCF_RESKEY_nic=eth0
export OCF_RESKEY_ip=192.168.0.253
export OCF_RESKEY_cidr_netmask=20

/usr/lib/ocf/resource.d/heartbeat/IPaddr2 start

ip route show dev eth0

/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr start

ip route show dev eth0

/usr/lib/ocf/resource.d/heartbeat/IPsrcaddr stop

ip route show dev eth0

/usr/lib/ocf/resource.d/heartbeat/IPaddr2 stop


Actual results:

default via 192.168.0.1 dev eth0  proto static  metric 100 
192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100

default via 192.168.0.1 dev eth0  proto static  src 192.168.0.253  metric 100 
192.168.0.0/20 dev eth0  scope link  src 192.168.0.253 
192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100

Error: either "to" is duplicate, or "192.168.0.0/20" is a garbage.
ocf-exit-reason:command 'ip route replace 192.168.0.0/20
192.168.0.0/20 dev eth0' failed

default via 192.168.0.1  proto static  src 192.168.0.253  metric 100 
192.168.0.0/20 dev eth0  scope link  src 192.168.0.253 
192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100

Expected results:

default via 192.168.0.1 dev eth0  proto static  metric 100 
192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100

default via 192.168.0.1 dev eth0  proto static  src 192.168.0.253  metric 100 
192.168.0.0/20 dev eth0  scope link  src 192.168.0.253  metric 100 

default via 192.168.0.1  proto static  metric 100 
192.168.0.0/20 dev eth0  scope link  src 192.168.0.196  metric 100


Additional info:
I suggest reverting the initial fix made for bug 1400172, to prevent the issues that the 'fix' itself causes.

Secondly, I suggest reading the existing IP route before processing, replacing only its 'src x.x.x.x' attribute, removing 'proto kernel' if present, and then using 'ip route replace'. This preserves all other options and guarantees an actual replace instead of a possible add.

'proto kernel' should be removed because the resource has changed the route, so we can no longer claim that the kernel set it.
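
The suggested approach can be sketched as follows. This is a rough illustration under stated assumptions, not the agent's code: rewrite_route is a hypothetical helper that works purely on the text of one 'ip route show' line, and the new source address is appended at the end on the assumption that 'ip route' accepts its attributes in any order.

```shell
#!/bin/sh
# Hypothetical helper (not part of IPsrcaddr): rewrite one line of
# `ip route show` output so that it carries a new source address.
#   $1: the existing route line
#   $2: the new source address
rewrite_route() {
    route=$(printf '%s\n' "$1" | sed \
        -e 's/[[:space:]]\{1,\}proto[[:space:]]\{1,\}kernel//' \
        -e 's/[[:space:]]\{1,\}src[[:space:]]\{1,\}[^[:space:]]\{1,\}//' \
        -e 's/[[:space:]]\{1,\}/ /g' \
        -e 's/ $//')
    # The sed expressions above, in order: drop 'proto kernel', drop the old
    # 'src' attribute, squeeze runs of whitespace, trim a trailing space.
    # Everything else (metric, tos, scope, ...) is kept as it was.
    printf '%s src %s\n' "$route" "$2"
}

# Example with the route from this report:
rewrite_route '192.168.0.0/20 dev eth0  proto kernel  scope link  src 192.168.0.196  metric 100' 192.168.0.253
# prints: 192.168.0.0/20 dev eth0 scope link metric 100 src 192.168.0.253

# Intended use (needs root, so it is left commented out here); the command
# substitution is deliberately unquoted so the line splits into arguments:
#   ip route replace $(rewrite_route "$existing_route" "$new_src")
```

Because the metric and tos values are carried over unchanged, the rewritten route should match the existing one, so 'ip route replace' is expected to replace it rather than add a second entry.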

Comment 5 RHEL Program Management 2021-02-15 07:38:45 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.